Overlapping authors with tesslerAICanHelp2024. Fine-tunes a 70B-parameter LLM to generate statements that maximize the expected approval of a group of people with diverse preferences.

  1. Participants write opinions on thousands of moral and political questions
  2. Participants rate the LLM’s candidate consensus statements for agreement and quality
  3. A reward model is trained to predict individual preferences, so candidate consensus statements can be scored and ranked by their appeal to the group as a whole
  4. Result: the fine-tuned model’s consensus statements are preferred by humans over those from prompted LLMs
  5. Result: the best model’s consensus statements are preferred over the best human-written opinions
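
The ranking idea in step 3 can be sketched as follows. This is a minimal illustration, not the paper's implementation: the per-participant scores are hypothetical stand-ins for reward-model predictions, the name `rank_candidates` is made up here, and aggregating by the plain mean is an assumption (the notes above do not say how individual scores are combined into a group score).

```python
from statistics import mean

def rank_candidates(predicted_scores):
    """Rank candidate statements by aggregate predicted approval.

    predicted_scores maps each candidate statement to a list of
    predicted approval scores, one per participant. In the real system
    these would come from the trained reward model; here they are
    supplied by hand.
    """
    # Assumption: group appeal is the mean of the individual scores.
    group_appeal = {c: mean(s) for c, s in predicted_scores.items()}
    # Highest group appeal first.
    return sorted(group_appeal, key=group_appeal.get, reverse=True)

# Hypothetical predictions for three candidates and three participants.
predicted = {
    "A": [0.9, 0.2, 0.4],  # liked by one participant, disliked by another
    "B": [0.6, 0.6, 0.7],  # moderately liked by everyone
    "C": [0.3, 0.8, 0.5],
}

ranking = rank_candidates(predicted)
print(ranking)  # ['B', 'C', 'A']
```

With the mean as the aggregate, candidate B ranks first because every participant moderately approves of it, even though A has the single highest individual score. A different aggregation rule (for example, one that weights the least satisfied participant more heavily) could change the order.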