Used the InVivoGPT dataset. Personal data is defined per GDPR.
Used calls to the ChatGPT bio tool to find times when memory was stored/updated.
Found that:
- 96% of memories are created unilaterally by the conversational system
- 28% of memories contain GDPR-defined personal data, and 52% contain psychological insights about participants
- Two authors manually annotated a random subset of 100 memories according to the definitions in GDPR
- Psychological framework is based on Theory of Mind; categories include emotions, desires, intentions, percepts, knowledge, beliefs, and mentalistic understanding. Classification was done by GPT-4o.
- Human eval yielded 96% agreement between authors and 93% average agreement.
- 84% of memories are directly grounded in user context, indicating faithful representation of conversations
- Syntactic matching: calculated the overlap between each memory and the user messages, evaluated against the full user history
- Calculated cosine similarity using an embedding model; scores remain high (greater than 0.51) across all variations
- LLM as a judge: graded grounding on a 5-point Likert scale
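
A rough sketch of how the syntactic-overlap and cosine-similarity grounding checks could be computed. This is not the paper's implementation. The tokenization, the overlap definition (share of distinct memory tokens found in the user messages), and the helper names `tokenize`, `token_overlap`, and `cosine_similarity` are all assumptions for illustration. Vectors passed to `cosine_similarity` stand in for the output of whatever embedding model is used.

```python
import math
import re


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (an assumed tokenization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_overlap(memory, messages):
    """Fraction of distinct memory tokens that appear anywhere in the user messages."""
    memory_tokens = set(tokenize(memory))
    if not memory_tokens:
        return 0.0
    history_tokens = set()
    for message in messages:
        history_tokens.update(tokenize(message))
    return len(memory_tokens & history_tokens) / len(memory_tokens)


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical example data, not taken from the dataset.
memory = "User prefers vegetarian recipes"
messages = [
    "I am looking for vegetarian recipes for dinner",
    "Nothing too spicy please",
]
overlap = token_overlap(memory, messages)
print(overlap)  # 2 of the 4 memory tokens appear in the messages -> 0.5
```

Under this definition a memory scores low if it paraphrases the user rather than quoting them, which is presumably why the notes also list embedding similarity and an LLM judge alongside syntactic matching.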