PSA that trailing whitespace at the end of the prompt reliably triggered numbered-list generation.
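To make the failure mode concrete, here is a minimal sketch, assuming a LLaMA-family SentencePiece tokenizer (the model id is illustrative, and the exact token splits shown in the comments are examples, not guaranteed output):

```python
# Minimal sketch of the trailing-whitespace issue (model id illustrative).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

print(tok.tokenize("Here is a list:"))   # e.g. ['▁Here', '▁is', '▁a', '▁list', ':']
print(tok.tokenize("Here is a list: "))  # e.g. [..., ':', '▁']  <- stranded space token
# In training text the space is almost always fused to the next word
# ('▁apples', '▁The', ...), so a bare '▁' at the end of the context is
# off-distribution and the model drifts toward odd continuations.
```

The stranded space token is the whole problem: the model rarely conditioned on it during training, so its next-token distribution at that position is skewed.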
One of the commenters knows a solution: “This is a solved problem, but the solution isn’t easy to implement so most implementations just don’t. https://github.com/guidance-ai/guidance/blob/main/notebooks/token_healing.ipynb” This is a dead link now, but token healing is an inference-time trick: trim the last token(s) off the prompt, then constrain the first generated tokens to reproduce the trimmed text, so the model itself picks the natural token boundary instead of being stuck with an awkward one.
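A rough sketch of the idea (my own reconstruction, not the guidance library's actual implementation; the model id and `heal_prompt` helper are illustrative): back off the last prompt token, then allow only first-step tokens whose string extends the trimmed one.

```python
# Hedged sketch of token healing (a reconstruction, not guidance's code).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

def heal_prompt(prompt: str):
    """Trim the last prompt token; list which first tokens may replace it."""
    ids = tok(prompt, add_special_tokens=False)["input_ids"]
    trimmed = tok.convert_ids_to_tokens(ids[-1])      # e.g. '▁' for a trailing space
    allowed = [i for t, i in tok.get_vocab().items()  # tokens whose string
               if t.startswith(trimmed)]              # extends the trimmed one
    return ids[:-1], allowed

# At generation time, mask the first-step logits to `allowed` (e.g. via
# generate()'s prefix_allowed_tokens_fn), so the model is free to emit a
# fused token like '▁The' even though the raw prompt ended with a bare space.
```

The "isn't easy to implement" part is the integration, not the idea: the logit mask has to be threaded through the serving stack's sampling loop, which most inference implementations don't expose.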
The general sentiment here is that the LLaMA tokenizer is bad in many ways, and while this post is old (2023-10-22), these were the kinds of problems users had to fix on their own. Worse, because the tokenizer is baked into the training process, its problems aren’t something you can “update” or “fix” after the fact: addressing them requires training an entirely new model with a new tokenizer.