Used the InVivoGPT dataset. GDPR.

Used calls to the ChatGPT bio tool to find times when memory was stored/updated.

Found that:

  1. 96% of memories are created unilaterally by the conversational system
  2. 28% of memories contain GDPR-defined personal data, and 52% contain psychological insights about participants
    1. Two authors manually annotate random subset of 100 memories according to definitions in GDPR
    2. Psychological framework based on Theory of Mind, categories include emotions, desires, intentions, percepts, knowledge, beliefs, and mentalistic understanding. Classification was done by GPT-4o.
      1. Human eval yielded 96% agreement between authors and 93% average agreement.
  3. 84% of memories are directly grounded in user context, indicating faithful representation of conversations
    1. Syntactic matching: calculated overlap between grounding of memory to messages when evaluating against full user history
    2. Calculated cosine similarity using embedding model, remain high (greater than 0.51) across all variations
    3. LLM as a judge: grade grounding on 5-point Likert scale