Llama 2 models follow a very specific prompt format at training time, and that format needs to be respected at inference time as well. In particular, assigning a system prompt to a conversation has to be done with the right special tokens. Hugging Face posted this information on r/LocalLLaMA at this link.
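As a concrete illustration, here is a minimal sketch of building a single-turn prompt with the special tokens Hugging Face documents (`[INST]`, `[/INST]`, `<<SYS>>`, `<</SYS>>`). The helper function and its names are my own for illustration, not an official API:

```python
# Llama 2 chat special tokens, per Hugging Face's documentation.
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

def build_llama2_prompt(system_prompt: str, user_message: str) -> str:
    """Wrap a system prompt and a first user turn in Llama 2's format.

    The <s> BOS token is omitted here on the assumption that the
    tokenizer adds it when encoding the string.
    """
    return f"{B_INST} {B_SYS}{system_prompt}{E_SYS}{user_message} {E_INST}"

prompt = build_llama2_prompt(
    system_prompt="You are a helpful, respectful and honest assistant.",
    user_message="What is the capital of France?",
)
print(prompt)
```

Note that the system prompt is tucked inside the first `[INST]` block rather than sitting in its own turn, which is easy to get wrong if you format the string by hand.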

They note at the end that the interesting part of open-access models is that you aren't forced to use a prescribed system prompt, and that this is a point of experimentation available to researchers and users.

Getting the formatting right is essential to getting the model working on a local system, and education on topics like this really matters to the community.