Abstract
Achieving consensus among diverse opinions through multi-round discussions can be a complex process. The advent of large language models (LLMs) offers promising avenues for resolving this challenge, given their prowess in understanding and analyzing human sentiments. However, existing approaches typically focus on single-round discussion, limiting their effectiveness in real-world discussion scenarios. In response, we propose a two-layer facilitation agent that models a multi-round discussion as a Markov decision process (MDP) to foster efficient agreement. The model comprises a high-level reinforcement learning-based agent that decides the optimal facilitation action, such as when to facilitate and which facilitation prompt to use, and a low-level large language model that generates the facilitation message based on the chosen action. Our agent dynamically chooses facilitation moments, generates novel content, and directs the discussion towards consensus. Our methodology was validated on discussions across several different topics, demonstrating excellent performance in achieving agreement swiftly across all of them.
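The two-layer split can be pictured with a short sketch: a high-level policy (the RL layer) picks a facilitation action from the discussion state, and a low-level LLM turns that action into an actual message. This is only an illustration of the division of labor; the class and parameter names below (TwoLayerFacilitator, high_level_policy, llm) are hypothetical, not from the paper.

```python
from dataclasses import dataclass
from enum import Enum, auto

class FacilitationAction(Enum):
    DO_NOTHING = auto()
    FACILITATE = auto()            # generate a facilitation message
    PROMPT_FOR_OPINIONS = auto()   # ask participants for additional opinions

@dataclass
class DiscussionState:
    agreement_scores: list[float]  # one agreement score per participant
    transcript: list[str]          # discussion history so far

class TwoLayerFacilitator:
    def __init__(self, high_level_policy, llm):
        self.policy = high_level_policy  # RL-trained: DiscussionState -> FacilitationAction
        self.llm = llm                   # text generator: prompt string -> message string

    def step(self, state: DiscussionState) -> str | None:
        """Pick a facilitation action; if it requires a message, let the LLM write it."""
        action = self.policy(state)
        if action is FacilitationAction.DO_NOTHING:
            return None
        # The low-level LLM conditions on the chosen action and the transcript so far.
        prompt = f"Facilitation action: {action.name}\nDiscussion so far:\n" + "\n".join(state.transcript)
        return self.llm(prompt)
```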
MDP (a minimal environment sketch follows this list):
- State: a list of agreement scores, one per participant; the discussion reaches the "win" condition when the aggregate score exceeds a threshold
- Action:
  - Do nothing
  - Generate facilitation message
  - Prompt for additional opinions
- Reward:
  - If the agent’s action leads the discussion to the “win” condition, it gets a large positive reward
  - If the discussion continues but agreement isn’t reached, it gets a smaller, standard unit reward
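A minimal environment sketch of this MDP, assuming the aggregate is the mean agreement score; the threshold, the reward magnitudes, and the simulate_turn hook (which would re-score participants after each facilitator action) are illustrative assumptions, not values from the paper.

```python
class DiscussionEnv:
    """Sketch of the MDP above: the state is a list of per-participant agreement scores."""

    WIN_REWARD = 10.0   # large positive reward when the "win" condition is met (assumed value)
    STEP_REWARD = 1.0   # standard unit reward while the discussion continues

    def __init__(self, simulate_turn, participants: int, threshold: float = 0.8):
        # simulate_turn: (scores, action) -> new scores, e.g. by re-scoring
        # LLM-simulated participant replies after the facilitator acts.
        self.simulate_turn = simulate_turn
        self.participants = participants
        self.threshold = threshold
        self.scores = [0.0] * participants

    def reset(self) -> list[float]:
        self.scores = [0.0] * self.participants
        return list(self.scores)

    def step(self, action):
        self.scores = self.simulate_turn(self.scores, action)
        # "Win" when the aggregate (mean) agreement exceeds the threshold.
        done = sum(self.scores) / len(self.scores) > self.threshold
        reward = self.WIN_REWARD if done else self.STEP_REWARD
        return list(self.scores), reward, done
```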
- They train the agent on 500 discussions generated with an LLM and report good results.
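The notes do not say which RL algorithm is used, so the sketch below only shows the rollout structure over LLM-generated discussions, reusing the environment sketched above and leaving the policy update abstract; num_discussions=500 matches the note, while max_turns is an assumption.

```python
def train(policy, update, make_env, num_discussions=500, max_turns=20):
    """policy: state -> action; update: learns from a trajectory (algorithm unspecified)."""
    for _ in range(num_discussions):
        env = make_env()                 # a fresh LLM-simulated discussion
        state, trajectory = env.reset(), []
        for _ in range(max_turns):
            action = policy(state)
            next_state, reward, done = env.step(action)
            trajectory.append((state, action, reward))
            state = next_state
            if done:                     # aggregate agreement crossed the threshold
                break
        update(trajectory)               # e.g. a policy-gradient or Q-learning step
```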