A collaboration between the Polis team and Anthropic. LLMs can help with scalable deliberation in the following ways, each carrying risks:

  • Topic modeling
    • Comments were assigned topics in batches (several comments per prompt rather than one at a time); a batching sketch appears after this list.
    • Risk: assigned topics miss contextual nuance.
      • Mitigation: human review, and the ability to override assigned topics
  • Summarization and reporting
    • Summaries were generated from a small set of comments together with each comment's agree and disagree vote counts; a prompt-construction sketch appears after this list.
    • Risk: LLMs can hallucinate
      • Mitigation: show automated summaries back to participants for review and feedback
  • Facilitating conversations by synthesizing group identity and consensus
    • A dialectic process: the facilitator reflects back its understanding and recursively asks for confirmation until a fixed point is reached and each side feels understood; a loop sketch appears after this list.
    • Risks: How much machine influence is acceptable in a process which aims to surface human opinions?
      • Difficult problem. High risk.
  • Vote prediction
    • Theoretically, this asks whether LLMs are good at representing someone’s position; practically, it can help fill in missing data in Polis votes (a prediction sketch appears after this list).
      • Apparently, large Polis conversations can have over 90% of the data missing.
    • Apparently, base LLMs predicted people’s opinions (based on their previous votes) with high accuracy (90%) and well-calibrated confidence
    • Risks: misrepresentation, especially for those who deviate from stereotypical ideological configurations
      • “it would be catastrophic for deliberation at scale if the remarkable capabilities of LLMs lead to a replacement of whole groups of individuals by simulacrums designed by a very different population.”
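
A minimal sketch of the batched topic assignment above, assuming a hypothetical `call_llm(prompt) -> str` stand-in (here a stub with a canned reply) in place of a real model API; the prompt wording, the "index: topic" output format, and the function names are illustrative, not the actual pipeline.

```python
from typing import Dict, List


def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call; returns a canned response."""
    return "1: housing\n2: transit\n3: housing"


def assign_topics_in_batches(
    comments: List[str], topics: List[str], batch_size: int = 20
) -> Dict[int, str]:
    """Label each comment with one topic, several comments per prompt."""
    labels: Dict[int, str] = {}
    for start in range(0, len(comments), batch_size):
        batch = comments[start:start + batch_size]
        numbered = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(batch))
        prompt = (
            "Assign each comment exactly one topic from this list: "
            f"{', '.join(topics)}.\n"
            "Answer with one 'index: topic' pair per line.\n\n"
            f"{numbered}"
        )
        # Parse the model's answer back onto the global comment indices.
        for line in call_llm(prompt).splitlines():
            idx, _, topic = line.partition(":")
            if idx.strip().isdigit():
                labels[start + int(idx.strip()) - 1] = topic.strip()
    return labels


print(assign_topics_in_batches(
    ["Rents are too high", "Buses should run later", "Build more homes"],
    ["housing", "transit"],
))
```

Keeping the assignments as an explicit index-to-topic map is what makes the human-review mitigation cheap: reviewers can inspect and override individual labels without re-running the batch.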
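
A minimal sketch of summarization over a small set of comments plus their vote tallies, again with a hypothetical `call_llm` stub; the prompt, including the instruction not to add unsupported claims (the hallucination risk above), is an assumption about how such a prompt might look.

```python
from dataclasses import dataclass
from typing import List


def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call; returns a canned response."""
    return "Most participants agree on X; opinions are split on Y."


@dataclass
class Comment:
    text: str
    agrees: int
    disagrees: int


def summarize(comments: List[Comment]) -> str:
    """Build a prompt from comments plus their vote tallies and summarize."""
    lines = "\n".join(
        f'- "{c.text}" (agree: {c.agrees}, disagree: {c.disagrees})'
        for c in comments
    )
    prompt = (
        "Summarize the main points of agreement and disagreement below. "
        "Use the vote counts to judge how widely each view is held, and do "
        "not add claims that are not present in the comments.\n\n"
        f"{lines}"
    )
    return call_llm(prompt)
```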
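
A minimal sketch of the reflect-and-confirm facilitation loop above: the model restates a position and asks for confirmation, stopping when the participant accepts the restatement (the fixed point). Both `call_llm` and `ask_participant` are hypothetical stubs; a real deployment would route the question back to the participant.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call; returns a canned restatement."""
    return "You feel the current proposal ignores renters' concerns."


def ask_participant(question: str) -> str:
    """Placeholder for the participant's reply in a real conversation."""
    return "Yes, that captures it."


def reflect_until_understood(statement: str, max_rounds: int = 5) -> str:
    """Restate a position and ask for confirmation until it is accepted."""
    working = statement
    restatement = statement
    for _ in range(max_rounds):
        restatement = call_llm(
            "Restate this position neutrally and concisely:\n" + working
        )
        reply = ask_participant(f"Does this capture your view?\n{restatement}")
        if reply.lower().startswith("yes"):
            break  # fixed point: the participant feels understood
        # Fold the correction back in and try again.
        working = f"{working}\nCorrection from participant: {reply}"
    return restatement
```

Even in this toy form, the machine-influence risk is visible: every restatement is the model's wording of the participant's view, which is exactly the concern raised above.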
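
A minimal sketch of vote prediction for filling in a sparse vote matrix: condition on a participant's earlier votes, ask for a predicted vote with a confidence, and check calibration by comparing confidence buckets against observed accuracy. The `call_llm` stub, the prompt, and the answer format are assumptions; the calibration helper only illustrates what "well-calibrated" means, it is not the paper's evaluation code.

```python
from typing import List, Tuple


def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call; returns a canned prediction."""
    return "agree 0.85"


def predict_vote(
    previous_votes: List[Tuple[str, str]], new_comment: str
) -> Tuple[str, float]:
    """Predict one held-out vote from a participant's earlier votes."""
    history = "\n".join(f'- "{c}": {v}' for c, v in previous_votes)
    prompt = (
        "A participant voted as follows on earlier comments:\n"
        f"{history}\n\n"
        f'How would they vote on: "{new_comment}"?\n'
        "Answer as '<agree|disagree|pass> <probability between 0 and 1>'."
    )
    vote, confidence = call_llm(prompt).split()
    return vote, float(confidence)


def calibration_curve(
    predictions: List[Tuple[float, bool]], n_bins: int = 10
) -> List[Tuple[float, float]]:
    """Bucket (confidence, was_correct) pairs by confidence.

    Well-calibrated predictions have observed accuracy close to each
    bucket's confidence (e.g. ~80% correct among 0.8-confidence answers).
    """
    bins: List[List[bool]] = [[] for _ in range(n_bins)]
    for confidence, correct in predictions:
        bins[min(int(confidence * n_bins), n_bins - 1)].append(correct)
    return [
        ((i + 0.5) / n_bins, sum(b) / len(b))
        for i, b in enumerate(bins)
        if b
    ]
```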

At the time of writing, Claude models had a relatively small context window, which imposed a ceiling on many of the tasks they tested. When a larger-context model was released, they tested it and found it improved on the previous results.