- A very naïve way to reduce refusals: setting a logit bias on the individual tokens of specific trigger phrases like “but as an AI language model”.
- The comments in this post point out that there could be negative downstream consequences for the LLM (especially from changing the logits of very frequently used tokens like “as” and “an”). Still, it documents an attempt from a time when robust methods were not easily available to everyone.
- Arguably, there is a bit of a hacker ethos here.
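
To make the downstream concern concrete, here is a minimal sketch of what a per-token logit bias does to a next-token distribution. Everything in it is an assumption for illustration: the toy vocabulary, the logit values, the `-100.0` bias, and the whitespace split standing in for a real tokenizer. It is not the method from the post, only the general idea of adding a constant to selected logits before the softmax.

```python
import math

def softmax(logits):
    """Turn a {token: logit} dict into a {token: probability} dict."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def apply_logit_bias(logits, bias):
    """Add a per-token bias to the logits; unlisted tokens are unchanged."""
    return {tok: v + bias.get(tok, 0.0) for tok, v in logits.items()}

# Hypothetical next-token logits over a toy vocabulary (made-up values).
logits = {"as": 2.0, "an": 1.5, "AI": 1.0, "the": 1.8, "sure": 0.5}

# Bias every token of the trigger phrase. A whitespace split stands in
# for a real tokenizer here, which is an assumption of this sketch.
trigger_phrase = "but as an AI language model"
bias = {tok: -100.0 for tok in trigger_phrase.split()}

before = softmax(logits)
after = softmax(apply_logit_bias(logits, bias))

for tok in logits:
    print(f"{tok:>5}: {before[tok]:.4f} -> {after[tok]:.4f}")
```

Because the bias is attached to tokens rather than to the phrase, “as” and “an” are suppressed in every context, not only inside the refusal boilerplate, and their probability mass shifts to whatever tokens remain unbiased.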