Named after Jürgen Habermas.
Two-model system:
- Generative model:
  - Built on a 70B-parameter model (Chinchilla)
  - Prompted with the question and the participants’ opinions; in the critique round, also with the winning initial group statement and the critiques of it
  - Fine-tuned with SFT on statements from previous rounds that were rated as high quality
- Personalized reward model (PRM):
  - 1.4B-parameter variant with an added linear layer that outputs a single scalar reward prediction
  - Predicts how much each participant would endorse each statement, based on that participant’s own opinion
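To make the PRM's output shape concrete, here is a minimal sketch of a linear head producing one scalar per (participant, statement) pair. Everything in it is an assumption for illustration: `toy_encode` is a hypothetical stand-in for the 1.4B model's learned representation, and the weights are hand-picked rather than trained.

```python
from typing import Callable, List, Sequence


def linear_reward_head(features: Sequence[float],
                       weights: Sequence[float],
                       bias: float) -> float:
    """Map a feature vector to a single scalar (dot product plus bias)."""
    return sum(f * w for f, w in zip(features, weights)) + bias


def toy_encode(opinion: str, statement: str) -> List[float]:
    """Hypothetical stand-in for the language model's representation.

    Features: number of shared words, and statement length in words.
    """
    shared = set(opinion.lower().split()) & set(statement.lower().split())
    return [float(len(shared)), float(len(statement.split()))]


def score_matrix(opinions: Sequence[str],
                 statements: Sequence[str],
                 encode: Callable[[str, str], List[float]],
                 weights: Sequence[float],
                 bias: float) -> List[List[float]]:
    """scores[i][j] = predicted endorsement of statement j by participant i."""
    return [[linear_reward_head(encode(op, st), weights, bias)
             for st in statements]
            for op in opinions]


opinions = ["raise taxes on fuel", "cut taxes"]
statements = ["raise fuel taxes gradually", "cut all taxes"]
scores = score_matrix(opinions, statements, toy_encode, [1.0, -0.1], 0.0)
```

Each row of `scores` holds one participant's predicted endorsements, so sorting a row gives that participant's predicted ranking of the candidates.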
Operation:
- The system takes participants’ written opinions on a question
- It generates multiple candidate consensus statements
- The PRM estimates how each participant would rank these statements
- Rankings are aggregated via the Schulze ranked-choice voting method, yielding a single “winning” statement
- In a second “critique” phase, participants critique the initial statement; the model integrates these critiques to generate revised statements, again selecting the best via social choice aggregation
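The aggregation step can be sketched as follows: turn per-participant predicted scores into rankings, then apply the Schulze method (pairwise preference counts followed by strongest-path comparison). This is a generic textbook version, not the system's actual code; the function names and the example scores are made up, and tie-breaking between co-winners is left out.

```python
from typing import List, Sequence


def rankings_from_scores(scores: Sequence[Sequence[float]]) -> List[List[int]]:
    """For each participant, list candidate indices from most to least preferred."""
    return [sorted(range(len(row)), key=lambda c: -row[c]) for row in scores]


def schulze_winners(rankings: Sequence[Sequence[int]], n: int) -> List[int]:
    """Return all Schulze winners among candidates 0..n-1 (more than one on a tie)."""
    # d[a][b]: number of participants who rank a above b.
    d = [[0] * n for _ in range(n)]
    for ranking in rankings:
        pos = {c: i for i, c in enumerate(ranking)}
        for a in range(n):
            for b in range(n):
                if a != b and pos[a] < pos[b]:
                    d[a][b] += 1

    # p[a][b]: strength of the strongest path from a to b.
    p = [[0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a != b and d[a][b] > d[b][a]:
                p[a][b] = d[a][b]
    for k in range(n):  # k is the intermediate candidate
        for a in range(n):
            if a == k:
                continue
            for b in range(n):
                if b == k or b == a:
                    continue
                p[a][b] = max(p[a][b], min(p[a][k], p[k][b]))

    # a wins if no other candidate has a strictly stronger path against it.
    return [a for a in range(n)
            if all(p[a][b] >= p[b][a] for b in range(n) if b != a)]


# Hypothetical predicted scores: 3 participants x 3 candidate statements.
example_scores = [[0.9, 0.5, 0.1],
                  [0.2, 0.8, 0.4],
                  [0.3, 0.7, 0.6]]
example_rankings = rankings_from_scores(example_scores)
print(schulze_winners(example_rankings, 3))  # [1]
```

The same routine would apply in the critique phase, with the revised statements as candidates.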