Named after Jürgen Habermas.

Two-model system:

  • Generative model:
    • Built on a 70B-parameter model (Chinchilla)
    • Prompted with the question, the participants’ opinions and, in the critique round, the winning initial group statement and its critiques
    • Fine-tuned with SFT on statements from previous rounds that were rated as high quality
  • Personalized reward model (PRM):
    • 1.4B-parameter variant with an added linear layer that predicts a single scalar reward
    • Predicts how much each participant would endorse each statement based on their individual opinions
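
A minimal sketch of what "an added linear layer for a single scalar reward" could look like, assuming the language model yields one fixed-size hidden vector per (opinion, statement) pair. The function name, the toy vector, and the weights are hypothetical illustrations, not the actual PRM.

```python
def scalar_reward(hidden, weights, bias):
    """Hypothetical linear head: map a hidden vector to one scalar reward."""
    if len(hidden) != len(weights):
        raise ValueError("hidden and weights must have the same length")
    # Dot product plus bias yields a single number per (participant, statement) pair.
    return sum(h * w for h, w in zip(hidden, weights)) + bias


# Toy values standing in for a model's pooled hidden state and learned parameters.
example_reward = scalar_reward([1.0, 2.0], [0.5, -0.25], 0.1)
```

A higher scalar would be read as stronger predicted endorsement of the statement by that participant.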

Operation:

  1. The system takes participants’ written opinions on a question
  2. It generates multiple candidate consensus statements
  3. The PRM estimates how each participant would rank these statements
  4. Rankings are aggregated via the Schulze ranked-choice voting method, yielding a single “winning” statement
  5. In a second “critique” phase, participants critique the initial statement; the model integrates these critiques to generate revised statements, again selecting the best via social choice aggregation
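
Steps 3 and 4 can be sketched as follows: predicted per-participant scores are turned into rankings, and the rankings are aggregated with the Schulze method (pairwise preferences, then strongest paths). This is an illustrative sketch, not the system's code; the function names, the example scores, and the tie handling (ties keep candidate order) are assumptions.

```python
def ranking_from_scores(scores):
    """Order statements best-first by predicted score (ties keep input order)."""
    return sorted(scores, key=lambda s: -scores[s])


def schulze_winners(rankings, candidates):
    """Return the Schulze winner(s) given best-first rankings of all candidates."""
    # d[a][b]: number of voters who rank a above b.
    d = {a: {b: 0 for b in candidates} for a in candidates}
    for ranking in rankings:
        pos = {c: i for i, c in enumerate(ranking)}
        for a in candidates:
            for b in candidates:
                if a != b and pos[a] < pos[b]:
                    d[a][b] += 1
    # p[a][b]: strength of the strongest path from a to b.
    p = {a: {b: (d[a][b] if d[a][b] > d[b][a] else 0) for b in candidates}
         for a in candidates}
    for k in candidates:  # k is the intermediate candidate on the path
        for i in candidates:
            if i == k:
                continue
            for j in candidates:
                if j != i and j != k:
                    p[i][j] = max(p[i][j], min(p[i][k], p[k][j]))
    # A winner is at least as strong against every rival as the rival is against it.
    return [a for a in candidates
            if all(p[a][b] >= p[b][a] for b in candidates if b != a)]


# Hypothetical predicted scores for three participants and three statements.
predicted = [
    {"A": 0.9, "B": 0.5, "C": 0.1},
    {"A": 0.6, "B": 0.8, "C": 0.2},
    {"A": 0.7, "B": 0.3, "C": 0.6},
]
candidates = ["A", "B", "C"]
rankings = [ranking_from_scores(s) for s in predicted]
winners = schulze_winners(rankings, candidates)  # ["A"] for these toy scores
```

The function returns a list because the Schulze method can produce ties; a real system would need an extra tie-breaking rule to pick a single statement.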