Technical
Reinforcement Learning from Human Feedback (RLHF)
A training technique in which AI models are fine-tuned using feedback from human evaluators to align model outputs with human preferences. Used to train modern LLMs including ChatGPT, Claude, and Gemini to be helpful and reduce harmful outputs. The quality of RLHF depends heavily on the diversity and representativeness of human feedback.
Referenced in frameworks
NIST AI RMF