The OP is essentially trying to clarify that RAG is not limited to a single retrieval method (such as vector databases), but is an ensemble of technologies grouped under a loosely defined concept.

One notable comment thread started with “i feel like some of us need to get together and make a huge write up of techniques, there’s no one size fits all, it’s HIGHLY dependent on various factors…” and ended with https://github.com/Skyrider3/GENAI-LLM-Repository, a repository intended to let people share different ways of implementing RAG.

Others follow up with their trial and error:

  • Setting up an evaluation pipeline early, educating yourself on SoTA approaches… and a list of repos they use. “This is mostly trial and error… tech is still too new to have a solid “best practice” guide.”
    • This seems like an important point about RAG, and about why this thread exists: the technique is new and people are excited about it, but reliable information is hard to find.
  • People share their own takes and experiences:
    • “…once you finetune that knowledge into the model your information loses that priority and distinction.”
  • “When people ask for “RAG techniques” it’s kind of meaningless because it’s too vague as there are two separate things: retrieval and augmentation.”
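That retrieval/augmentation split can be made concrete with a minimal sketch. Everything here is illustrative (the tiny corpus, the naive word-overlap scoring, and the prompt template are assumptions, not anything from the thread); a real system would use an actual retriever and an LLM call in place of the final `print`.

```python
# Minimal sketch of the two separate halves of RAG: retrieval, then augmentation.
# Corpus, scoring, and prompt format are all invented for illustration.

corpus = [
    "RAG retrieves documents relevant to a query.",
    "Fine-tuning bakes knowledge into model weights.",
    "Vector databases are one retrieval backend among many.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Retrieval: rank documents by naive word overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: -len(q & set(d.lower().split())))
    return scored[:k]

def augment(query: str, docs: list[str]) -> str:
    """Augmentation: splice the retrieved context into the prompt."""
    context = "\n".join(f"- {d}" for d in docs)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

query = "How does RAG retrieval work?"
prompt = augment(query, retrieve(query, corpus))
print(prompt)  # this prompt would then go to the LLM
```

Swapping the `retrieve` function for a vector-database lookup, a keyword index, or a SQL query changes only the retrieval half; the augmentation half stays the same, which is one way to read the commenter's point that the two deserve separate treatment.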

OP and commenters also begin to brainstorm about the given problem. One example is wondering whether retrieval can go beyond augmenting the prompt and instead be nested into the LLM’s decoding process.
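The brainstormed idea can be sketched as control flow: instead of retrieving once up front, retrieval is re-triggered during generation whenever the decoder signals low confidence. Everything below is a toy stand-in (the stub retriever, the stub decoder, and the confidence threshold are all invented) meant only to show where retrieval would sit inside the decoding loop.

```python
# Toy sketch: retrieval interleaved with decoding rather than done once up front.
# retrieve() and decode_step() are placeholders for a real retriever and model.

def retrieve(query: str) -> str:
    """Stand-in retriever: returns a canned snippet for the query."""
    return f"[snippet about: {query}]"

def decode_step(context: str, step: int) -> tuple[str, float]:
    """Stand-in decoder: returns (token, confidence)."""
    confidence = 0.3 if step == 2 else 0.9  # pretend step 2 is uncertain
    return f"tok{step}", confidence

def generate(query: str, max_steps: int = 4, threshold: float = 0.5) -> str:
    context = retrieve(query)
    tokens = []
    for step in range(max_steps):
        token, conf = decode_step(context, step)
        if conf < threshold:
            # Retrieval nested in decoding: refresh context mid-generation,
            # conditioning the next attempt on what has been emitted so far.
            context += " " + retrieve(" ".join(tokens))
            token, conf = decode_step(context, step)
        tokens.append(token)
    return " ".join(tokens)

print(generate("what is RAG?"))
```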

“Thanks this really helped break down a mental block and open my mind to understand rag better. Turns out my sql assisted ai thingy I’m working on is a RAG setup!”
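The commenter's realization (a SQL-backed assistant is itself a RAG setup) can be sketched with the database query playing the retrieval role. The schema, rows, and prompt below are invented for illustration; nothing is known about the commenter's actual app.

```python
import sqlite3

# Sketch of "SQL as the retriever": a parameterized query retrieves rows,
# and the rows are spliced into the prompt as grounding context.
# Schema and data are invented for illustration.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "alice", "shipped"), (2, "bob", "pending")])

def retrieve_rows(customer: str) -> list[tuple]:
    """Retrieval step: a SQL lookup instead of a vector search."""
    cur = conn.execute("SELECT id, status FROM orders WHERE customer = ?",
                       (customer,))
    return cur.fetchall()

def build_prompt(question: str, rows: list[tuple]) -> str:
    """Augmentation step: pass the rows to the LLM as context."""
    facts = "\n".join(f"order {i}: {s}" for i, s in rows)
    return f"Records:\n{facts}\n\nQuestion: {question}\nAnswer:"

prompt = build_prompt("Where is alice's order?", retrieve_rows("alice"))
print(prompt)  # this prompt would then go to the LLM
```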

And there are some critiques:

“Your solution to pull data from a RAG dictionary and use that to feed a fine tuning function does seem like the correct approach, but I understand that this solution is a heavier lift that involves significantly more compute and technical effort than just setting up an open source llm with a basic RAG architecture. I hope I’m wrong about this and someone could point out a fine tuning framework that is relatively simple to get off the ground, because I would like to try this approach if I can.”

Some ask the OP for advice. Their use case involves users asking questions that require multiple steps before an answer can be generated. They ask whether a multi-agentic flow or fine-tuning would work best, and the OP answers that a mix of the two is best:

“But then I got another idea. What if I added another column to the dataset that would contain the inner monologue of both white and black player, and then have an LLM go through each row of the dataset where it was inputted with the game history and also 3 moves into the future, and then ask it to write a tactical assessment of the situation leading up to the move, all while not revealing it knows the future. Took me a week with 2 x RTX 3090s to get through the entire dataset, but when I tried finetuning a model on it, I saw improvements even after the first epoch though not much. I then tried running it for I think around 3 epochs, and it went from adhering to the prompt template maybe 10% of the time to adhering to it around 80%-90% of the time.”
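The annotation loop described above can be sketched as follows. `annotate` is a placeholder for the real LLM call (the OP's actual prompt, model, and dataset are unknown); the sketch only shows the row-by-row structure of giving the model the history plus a three-move lookahead.

```python
# Sketch of the described dataset pass: for each position, the model sees the
# game history plus the next three moves and writes an "inner monologue" that
# must not reveal the future. annotate() is a placeholder for a real LLM call;
# the sample game is invented.

def annotate(history: list[str], future: list[str]) -> str:
    """Placeholder for an LLM call that writes the tactical assessment.
    A real implementation would prompt a model with both move lists."""
    return f"Assessment after {len(history)} moves (peeked {len(future)} ahead)"

def build_monologue_column(moves: list[str], lookahead: int = 3) -> list[str]:
    """Build the extra inner-monologue column, row by row."""
    column = []
    for i in range(1, len(moves) + 1):
        history = moves[:i]
        future = moves[i:i + lookahead]  # seen by the annotator, hidden in output
        column.append(annotate(history, future))
    return column

game = ["e4", "e5", "Nf3", "Nc6", "Bb5"]
for line in build_monologue_column(game):
    print(line)
```

Each game of N moves yields N annotated rows, which is why a full dataset pass took the OP a week on two RTX 3090s.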