pstore

Can AI Truly Represent Your Voice in Deliberations? A Comprehensive Study of Large-Scale Opinion Aggregation with LLMs

Nov 24, 2025, 2 min read

tags
  • lit
  • agent
  • analyst
  • interaction/AIs-AIs
link
http://arxiv.org/abs/2510.05154
zotero
zotero://select/library/items/HW3PJUUA
itemType
preprint
authors
  • Shenzhe Zhu
  • Shu Yang
  • Michiel A. Bakker
  • Alex Pentland
  • Jiaxin Pei
pubDate
2025-10-08
retDate
2025-11-20
relatedProjects
Machine-assisted deliberation
tlkr
Evaluation of LLM deliberation, post-deliberation synthesis.

Abstract

Large-scale public deliberations generate thousands of free-form contributions that must be synthesized into representative and neutral summaries for policy use. While LLMs have been shown as a promising tool to generate summaries for large-scale deliberations, they also risk underrepresenting minority perspectives and exhibiting bias with respect to the input order, raising fairness concerns in high-stakes contexts. Studying and fixing these issues requires a comprehensive evaluation at a large scale, yet current practice often relies on LLMs as judges, which show weak alignment with human judgments. To address this, we present DeliberationBank, a large-scale human-grounded dataset with (1) opinion data spanning ten deliberation questions created by 3,000 participants and (2) summary judgment data annotated by 4,500 participants across four dimensions (representativeness, informativeness, neutrality, policy approval). Using these datasets, we train DeliberationJudge, a fine-tuned DeBERTa model that can rate deliberation summaries from individual perspectives. DeliberationJudge is more efficient and more aligned with human judgements compared to a wide range of LLM judges. With DeliberationJudge, we evaluate 18 LLMs and reveal persistent weaknesses in deliberation summarization, especially underrepresentation of minority positions. Our framework provides a scalable and reliable way to evaluate deliberation summarization, helping ensure AI systems are more representative and equitable for policymaking.
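The abstract says DeliberationJudge aligns more closely with human judgments than LLM judges do, but it does not state here how alignment is measured. The sketch below assumes alignment is scored separately for each dimension (e.g. representativeness, neutrality) as the Spearman rank correlation between human ratings and judge ratings of the same summaries. That metric is an assumption, and the paper's actual procedure may differ. The function names (`average_ranks`, `spearman`, `alignment_by_dimension`) and the example ratings are hypothetical and purely illustrative, not data from the paper.

```python
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..n; tied values share the mean of their 1-based positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of average ranks (handles ties)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


def alignment_by_dimension(
    human: dict[str, list[float]], judge: dict[str, list[float]]
) -> dict[str, float]:
    """Hypothetical helper: per-dimension rho between human and judge scores.

    Both dicts map a dimension name to scores for the same summaries,
    listed in the same order.
    """
    return {dim: spearman(human[dim], judge[dim]) for dim in human}


if __name__ == "__main__":
    # Hypothetical 1-5 ratings for five summaries; illustrative only.
    human = {"representativeness": [4, 2, 5, 3, 1], "neutrality": [3, 3, 4, 2, 5]}
    judge = {"representativeness": [4, 3, 5, 2, 1], "neutrality": [2, 3, 4, 3, 5]}
    for dim, rho in alignment_by_dimension(human, judge).items():
        print(f"{dim}: Spearman rho = {rho:.2f}")
```

Rank correlation is used here because it only asks whether the judge orders summaries the way humans do, which tolerates a judge whose scores are shifted or compressed relative to the human scale. Comparing several judges would mean running `alignment_by_dimension` once per judge against the same human ratings.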
