This post is the source of most of the sfrs, accounting for 55 of the 87. This is because it is a post dedicated to asking the community for suggestions for a future release: someone who works at Google wants to know what the community wants to see in the upcoming Gemma model.
It is interesting to see the nuance of what people in this community ask for, perhaps as a lens onto their ideal model:
- Much of what they ask for in this post is multilinguality, since (at the time of the post) they had fewer options for multilingual open models than for closed ones. They see this not only as the ability to speak more languages, but also as learning cultural knowledge.
- Better writing quality with fewer literary clichés (what they perceive as “GPT-slop”)
- Support for multi-character and multi-persona conversations, without imposing the “user-assistant” model of user-LLM interaction
- More specifically, for an LLM to recognize when it is not being addressed or when it has nothing of value to contribute, and to simply generate nothing.
- Similarly, another person suggests training for multi-user chats, since most current models get confused by them (see the message-format sketch after this list)
- A parameter count sized for 5-bit quantization, which they see as the nearly lossless quantization level, specifically so the model fits on consumer GPUs with 24 GB of VRAM or less (see the sizing sketch after this list)
- People also consider Gemma 2, the predecessor of the model in question, to be very adept at storytelling, and suggest to Google that it could carve out a reputation for making models that are good at storytelling.
- Large context, up to 1M tokens, which would let people use models without needing fine-tuning (by putting reference material directly in the prompt instead)
- They would like to see Google attempt specialized Gemma models for specific domains, such as scientific research, creative writing, or code generation.
- Ability to draw vector graphics
- Audio generation like the Gemini models (when a producer builds both open and closed models, people want the closed models’ capabilities to trickle down to the open ones)
- Less censorship. More specifically, the classic argument that filtering inconvenient or questionable content out at the pretraining level is harmful, and people have started suspecting that the distilled Gemma models are “censored” at the pretraining level.
- Better training for reliable tool calling, especially since, at the time of the post, small models were bad at it.
- Some have very pointed feedback informed by what Google is up to: knowing they hired Noam Shazeer back, they implore Google to apply some of his expertise in inference-friendly optimizations to local machines, and to do 8-bit native training.
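To make the multi-user request concrete, here is a minimal sketch of the difference between today’s two-role chat format and what commenters are asking for. The `speaker` field and the participant names are hypothetical, for illustration only, not an existing chat-template schema:

```python
# Today's chat templates hard-code two alternating roles:
standard_chat = [
    {"role": "user", "content": "Bob, did you push the fix?"},
    {"role": "assistant", "content": "Yes, it's on main."},
]

# What commenters want: arbitrary named participants, where the model is
# just one speaker among several and may decline the turn entirely.
# (The "speaker" field is a hypothetical schema, not a real standard.)
multi_user_chat = [
    {"speaker": "Alice", "content": "Bob, did you push the fix?"},
    {"speaker": "Bob", "content": "Yes, it's on main."},
    # A model trained for this setting should notice it is not being
    # addressed here and generate nothing rather than interject.
]
```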
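And to spell out the 5-bit sizing arithmetic: a back-of-the-envelope calculation (idealized bit widths, ignoring the per-block overhead of real formats like llama.cpp’s Q5_K_M, and ignoring the KV cache) shows why a model around Gemma 2 27B’s size pairs well with a 24 GB card:

```python
def quantized_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight memory for n_params weights at a given bit width."""
    return n_params * bits_per_weight / 8 / 2**30

# Illustrative run for a ~27B-parameter model (roughly Gemma 2 27B):
#   16-bit: ~50.3 GiB  (needs multiple GPUs)
#    8-bit: ~25.1 GiB  (just misses a 24 GB card)
#    5-bit: ~15.7 GiB  (fits, with room left for KV cache and activations)
#    4-bit: ~12.6 GiB  (fits, but with more quality loss)
for bits in (16, 8, 5, 4):
    print(f"{bits}-bit: {quantized_weight_gib(27e9, bits):.1f} GiB")
```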
They also ask Google to contribute more to the OSS community:
- “[don’t] forget to support the ppl who port the models to llama.cpp, exl2 etc”
- Gemma.cpp was being developed internally at Google at this time, an in-house effort for fast inference of Gemma models. Others point out that the community already has established and widely used tools, and that directing Google’s efforts toward third-party OSS would benefit everyone.
- “Here’s what currently happens when a model gets released: 1. Someone posts that it has been released. 2. Literally the first thing people ask is, “where are the gguf’s?” (or other file types, but mostly gguf). 3. We realize that the new model either a) doesn’t support llama.cpp/gguf (and others) or b) that there’s a bug in the tokenization etc. 4. People lose interest. 5. A few weeks later the already overworked llama.cpp maintainers add support or fix that bug, but many people either don’t know about that, or have already moved on to the next released model… my idea would be to just assign someone for maybe 1 or 2 days to write (and properly test) support before releasing the model.” (A sketch of the gguf workflow this quote describes follows the list.)
- More tooling to help unfamiliar or even semi-familiar people use models beyond simple inference would be huge: “drop dead simple fine tuning” and “press this button to get something besides just a chat it spun up”
- “I would very much like a, “I am an adult and accept total and full legal responsibility for the output of my LLM” button that completely disables censorship of every sort.”
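To ground the “where are the gguf’s?” step in the quote above: once a working conversion exists, running the model locally takes only a few lines, for example with the llama-cpp-python bindings. A minimal sketch; the model filename is hypothetical:

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Hypothetical filename: a community-made 5-bit (Q5_K_M) quantization.
llm = Llama(model_path="./gemma-2-27b-it-Q5_K_M.gguf", n_ctx=8192)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "In one sentence, what is a gguf file?"}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```

This is exactly the step that breaks when a new architecture or tokenizer lands without llama.cpp support: the gguf either can’t be produced at all or produces garbled output until the maintainers catch up, which is the lag the commenter wants Google to absorb before release.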