This post was submitted not by the authors but by someone wanting to share their work: Microsoft's ALMA model, which is first fine-tuned on monolingual data and then fine-tuned on parallel translation data. Interestingly, the authors find that fine-tuning on 5M examples instead of 10K actually dilutes the model's existing knowledge of Russian. This also breaks the assumption that these posts have to come from the authors themselves.
Commenters get into a discussion about the validity of BLEU as a benchmark, and whether the claim that the model performs better than GPT-4 can actually be verified qualitatively.
The author of the paper!
Hello, I’m the author of the paper and I’d like to express my gratitude for the post as well as for converting the model checkpoints to the GGUF version! I’ve noticed some complaints regarding the unexpected translation performance of ALMA. Here are some tips that may help you achieve better translation results as intended:

* ALMA-13B-Pretrain is not specifically designed for translation; it’s a general language model. Like other LLMs, it may produce crazy hallucination outputs when translating. For effective translation, it should be used in conjunction with ALMA-13B-Pretrain-LoRA. An example of how to use these together can be found in our GitHub repo.
* The GGUF checkpoint conversion is indeed a valuable resource for those with limited computational capabilities. However, if you’re finding that the performance with the GGUF version is bad, we recommend using our original checkpoints. The 7B ALMA model requires 28GB of memory, and the 13B version requires 52GB. We also offer a model-parallel evaluation method on our GitHub, allowing multiple GPUs to support a single model for evaluation. This will yield translation performance identical to what we reported in the paper.
* If you have any additional questions, please feel free to open an issue in the GitHub repo. Thank you for your interest in our work!
Interestingly, while the author is thankful for the community's GGUF conversion, they also point out that the original checkpoints might work better.
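To make the author's first tip concrete, here is a minimal sketch of pairing the base checkpoint with its LoRA adapter for translation. It assumes the Hugging Face transformers and peft libraries, the hub IDs haoranxu/ALMA-13B-Pretrain and haoranxu/ALMA-13B-Pretrain-LoRA, a GPU, and a "Translate this from X to Y:" prompt style; the authoritative example lives in the authors' GitHub repo.

```python
# Sketch: load the ALMA base model and attach the translation LoRA adapter.
# Hub IDs, prompt format, and generation settings are assumptions; see the
# authors' GitHub repo for the reference setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "haoranxu/ALMA-13B-Pretrain"       # general LLM, not translation-tuned on its own
lora_id = "haoranxu/ALMA-13B-Pretrain-LoRA"  # adapter that carries the translation fine-tuning

# Load the base model in fp16 and stack the LoRA weights on top of it.
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(model, lora_id)
tokenizer = AutoTokenizer.from_pretrained(base_id, padding_side="left")

# Prompt in the "Translate this from <src> to <tgt>:" style.
prompt = "Translate this from German to English:\nGerman: Maschinelles Lernen ist großartig!\nEnglish:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()

with torch.no_grad():
    generated = model.generate(input_ids=input_ids, num_beams=5, max_new_tokens=40)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```

The key point the author stresses is that the pretrain checkpoint alone is just a base LLM; the translation behavior comes from the adapter sitting on top of it.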