Collection Methods
Rating outputs: People rate AI-generated content on scales such as good/bad, helpful/harmful, or preferred/less preferred.
Pairwise comparisons: Given two outputs, humans pick which is better.
Direct edits or suggestions: Annotators or users improve the AI’s output (e.g., rewriting text or correcting errors).
Specialized feedback: Domain experts (e.g., lawyers, doctors, teachers) review content for accuracy in specialized fields.
This feedback is then converted into training signals for techniques such as Reinforcement Learning from Human Feedback (RLHF) or Direct Preference Optimization (DPO).
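As a rough illustration of how collected feedback can become a training signal, the sketch below converts a pairwise comparison and a direct edit into (prompt, chosen, rejected) preference pairs, then evaluates the standard DPO loss for a single pair. The names `PreferencePair`, `pair_from_comparison`, `pair_from_edit`, and `dpo_loss` are hypothetical, and treating an annotator's edit as "preferred over the original" is an assumption, not something the text prescribes. The log-probabilities are plain numbers here; in practice they would come from the policy model and a frozen reference model.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """Hypothetical record: one prompt with a preferred and a less preferred output."""
    prompt: str
    chosen: str
    rejected: str


def pair_from_comparison(prompt: str, output_a: str, output_b: str, winner: str) -> PreferencePair:
    """Turn a pairwise comparison into a preference pair. `winner` is 'a' or 'b'."""
    if winner not in ("a", "b"):
        raise ValueError("winner must be 'a' or 'b'")
    if winner == "a":
        return PreferencePair(prompt, chosen=output_a, rejected=output_b)
    return PreferencePair(prompt, chosen=output_b, rejected=output_a)


def pair_from_edit(prompt: str, original: str, edited: str) -> PreferencePair:
    """Assumption: the annotator's edited text is preferred over the original output."""
    return PreferencePair(prompt, chosen=edited, rejected=original)


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one pair: -log(sigmoid(beta * margin)).

    margin = (policy - reference log-prob on chosen)
           - (policy - reference log-prob on rejected)
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # Numerically stable form of log(1 + exp(-z)).
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


if __name__ == "__main__":
    pair = pair_from_comparison("Summarize the memo.", "Draft A", "Draft B", winner="b")
    print(pair)
    # Illustrative log-probabilities only; not measured values.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

When the policy and reference models assign identical log-probabilities, the margin is zero and the loss equals log 2; the loss drops below that as the policy favors the chosen output more strongly than the reference does. Plain good/bad ratings do not form pairs on their own, so under RLHF they would more typically be used to train a reward model.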
