Overlapping authors with tesslerAICanHelp2024. Fine-tunes a 70B-parameter LLM to generate consensus statements that maximize expected approval across people with diverse preferences.
- Participants write opinions on thousands of moral and political questions
- Participants then rate LLM-generated candidate consensus statements for agreement and quality
- Reward model trained to predict individual preferences, enabling it to quantify and rank consensus statements in terms of their appeal to the overall group
- Fine-tuned model produces consensus statements that humans prefer over those from prompted LLMs
- Best model's consensus statements are preferred over the best human-written opinions
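
The ranking step described above can be sketched as follows. This is only an illustration, not the paper's implementation: it assumes the reward model has already produced one predicted score per participant for each candidate, and it aggregates those scores with a plain mean as a stand-in for "expected approval". The function name `rank_candidates` and the example scores are hypothetical.

```python
from statistics import mean


def rank_candidates(predicted_rewards):
    """Rank candidate statements by mean predicted per-participant reward.

    predicted_rewards maps a candidate label to a list of scores, one per
    participant (hypothetical reward-model outputs). The mean is an assumed
    aggregation rule; other welfare functions could be substituted here.
    """
    scores = {cand: mean(vals) for cand, vals in predicted_rewards.items()}
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


# Made-up scores for three candidates and three participants.
predicted = {
    "A": [0.9, 0.1, 0.2],  # strongly liked by one participant only
    "B": [0.6, 0.6, 0.6],  # moderately liked by everyone
    "C": [0.2, 0.3, 0.4],
}

ranking, scores = rank_candidates(predicted)
print(ranking)  # ['B', 'A', 'C']
```

With a mean, the broadly acceptable candidate B outranks A, which only one participant strongly likes. A different aggregation rule (for example, one that weights the least satisfied participant more heavily) would change how such trade-offs are resolved.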