Because the LLM sees many examples showing both what the user values and what the user prefers, the mismatched pairs create a forced choice: if the model chooses s1, it is catering to the preferences demonstrated in the given examples; if it chooses s2, it is catering to the user's values.

I would say this test isn't entirely fair to the LLM, because the model is never prompted to "tend to the values". For each use case, I believe this is not the model's responsibility but the user's: users have their own needs, and it should be up to them to decide whether they want the LLM to stick to their second-order preferences (values) or their first-order preferences.