Harnessing AI: Tailoring Language Models

Let’s dive into a topic that is near and dear to our hearts here at Betty Bot: tailoring large language models to make them more focused, smarter, and more reliable in your area of expertise.

There are several options for this, each of which has its own complexities and costs. So today, we’re exploring each pathway at a high level, and then we’ll dig into the strategies we’ve adopted for Betty Bot. 

 
Custom Building: A Ground-Up Approach 

One approach to tailoring large language models is to build the model from scratch. This involves gathering a vast amount of general and specialized content to train the model to become an expert in a specific area. The major hurdles here are cost, time, and the need for continual updates and retraining, which can be slow and resource-intensive. A bespoke model might be the gold standard for focusing on and performing a specific task, but it’s not realistic (yet) for most organizations, in the association industry or beyond.

 

Fine-Tuning: Modifying Existing Models 

An alternative to custom building is fine-tuning, which adjusts an existing model by training it on specific tasks. This method is more cost-effective and faster than building a new model from the ground up. Given numerous examples of the desired task, the model learns to perform that function. The drawback of this approach, though, is that it focuses on adapting to tasks rather than learning new content. For high-quality fine-tuning that results in a tool users can leverage to interact with content, you need a dataset of example user interactions – and a lot of them.
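
To make that concrete, here is a rough sketch of what a few training examples might look like in the chat-style JSONL format several hosted fine-tuning services accept. The file name and example content are invented for illustration; a real dataset would need hundreds or thousands of these interaction pairs.

```python
import json

# Hypothetical examples of the desired task: answering member questions.
# Each record is one complete example interaction for the model to learn from.
examples = [
    {"messages": [
        {"role": "system", "content": "You are a helpful assistant for association members."},
        {"role": "user", "content": "How do I renew my membership?"},
        {"role": "assistant", "content": "You can renew online under Account > Membership, or call the member services office."},
    ]},
    {"messages": [
        {"role": "system", "content": "You are a helpful assistant for association members."},
        {"role": "user", "content": "When does early-bird conference pricing end?"},
        {"role": "assistant", "content": "Early-bird pricing details are posted on the Events page; check there for the current deadline."},
    ]},
]

# Write one JSON object per line -- the JSONL layout most fine-tuning
# pipelines expect as training input.
with open("training_examples.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```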

 

Prompt Engineering: Influencing Outputs 

Whether they realize it or not, everyone who has used generative AI has used our third approach – prompt engineering. While it sounds fancy, prompt engineering simply means managing the input to the model to influence the output. For tailoring language models, this involves providing detailed context to help the model understand and generate the desired responses. While effective, this method is limited by the amount of information that can be input, the potential costs, and the necessity of having the right content to begin with. For more information on prompt engineering, check out Betty’s Guide to Prompt Engineering.  
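
In code, prompt engineering can be as simple as wrapping the user’s question in instructions and whatever context you have on hand. A minimal sketch (the instructions and context below are invented for illustration):

```python
def build_prompt(question: str, context: str) -> str:
    """Assemble a prompt that steers the model with instructions and
    supplies the content it should answer from."""
    return (
        "You are an assistant for an association. Answer using only the "
        "context below. If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# The caller must already have the right content to include -- the key
# limitation noted above.
prompt = build_prompt(
    question="What discount do members get on conference registration?",
    context="Members receive a 20% discount on annual conference registration.",
)
print(prompt)
```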

 

Enhancing Prompt Engineering with Retrieval Augmented Generation (RAG) 

To make content interactive without requiring users to provide the right documentation, we can augment prompt engineering with a retrieval system. This technique, known as Retrieval Augmented Generation, or RAG, enhances prompt engineering by using a query to identify relevant content, which is then fed into the model alongside the initial question. This gives the model richer context from which to generate accurate responses. RAG integrates seamlessly into various applications, from conversational help documentation to more complex interactions that simulate memory.
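
At its core, a RAG pipeline is retrieval followed by prompt assembly. Here is a deliberately simplified sketch: the bag-of-words scoring stands in for a real embedding model, and the assembled prompt would then be sent to a language model.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

documents = [
    "Members receive a 20% discount on annual conference registration.",
    "Membership renewal opens every January through the member portal.",
    "The certification exam is offered twice a year, in spring and fall.",
]

def rag_prompt(question: str, top_k: int = 2) -> str:
    """Retrieve the most relevant documents, then fold them into the prompt."""
    query_vec = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    context = "\n".join(ranked[:top_k])
    return (
        f"Answer the question using only this context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

print(rag_prompt("How much of a discount do members get on the conference?"))
```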

Challenges with RAG 

The typical retrieval process for RAG systems is based on ‘semantic’ search, in which the meaning of the user’s input is compared to the meaning of your content to find the most relevant items. This is very powerful, crazy fast, and handles most cases smoothly, ensuring the prompt is fed the information most likely to answer the user’s request.
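
To see what “comparing meaning” looks like in practice, here is a small sketch using the open-source sentence-transformers library. The model choice and sample content are illustrative:

```python
from sentence_transformers import SentenceTransformer, util

# A small open-source embedding model; production systems often use
# larger or hosted embedding services.
model = SentenceTransformer("all-MiniLM-L6-v2")

content = [
    "Renew your membership through the member portal each January.",
    "Registration fees for the annual conference are discounted for members.",
]

# The query shares almost no keywords with the first document, but their
# embeddings land close together because the meaning matches.
query = "How do I sign up again after my membership lapses?"

scores = util.cos_sim(model.encode(query), model.encode(content))[0]
for text, score in sorted(zip(content, scores.tolist()), key=lambda p: -p[1]):
    print(f"{score:.3f}  {text}")
```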

Like any of the first three approaches, RAG does struggle in certain scenarios.  

  1. First, semantic search doesn’t focus on keywords or search terms. That means the user doesn’t need to use specific language. This is normally great, but sometimes, when users know the exact resource they are looking for and use very specific language to find it, they might not get the expected results. This is because semantic search matches on meaning rather than exact words.
  2. Second, in a typical RAG system, relevancy is the only filter. The user can’t narrow the results to a particular category of content or publication date range. With a RAG system, you won’t necessarily get the most recent content if older pieces are more relevant, and a semantic search alone won’t provide the results you are looking for with a question like “What events are coming up soon?”.
 
How Betty Works 

Betty Bot utilizes a RAG system with added retrieval layers that interpret the user’s question and the context of the conversation in ways that dramatically improve Betty’s retrieval success rates. To address some of the challenges with RAG outlined above, we are currently working on integrating a new approach that we are calling ‘hybrid’ search. This will allow us to layer two approaches – traditional methods focused on date or categorical search and semantic systems focused on meaning – to get the best of both worlds.
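
As a rough illustration of the hybrid idea (this is a sketch of the general technique, not Betty Bot’s actual implementation): filter on structured fields like category and date first, then rank what survives with a blend of semantic and keyword relevance. The weighting and fields below are invented.

```python
from datetime import date

def hybrid_score(semantic: float, keyword: float, alpha: float = 0.7) -> float:
    """Blend semantic and keyword relevance; alpha controls the mix."""
    return alpha * semantic + (1 - alpha) * keyword

def search(query_terms, docs, category=None, published_after=None):
    """Apply traditional filters first, then rank the survivors."""
    results = []
    for doc in docs:
        if category and doc["category"] != category:
            continue
        if published_after and doc["published"] < published_after:
            continue
        keyword = sum(term in doc["text"].lower() for term in query_terms) / len(query_terms)
        results.append((hybrid_score(doc["semantic"], keyword), doc))
    return sorted(results, key=lambda pair: -pair[0])

docs = [
    {"text": "Annual conference early-bird registration opens soon.",
     "category": "events", "published": date(2024, 4, 1), "semantic": 0.81},
    {"text": "A look back at last year's annual conference highlights.",
     "category": "news", "published": date(2023, 6, 1), "semantic": 0.88},
]

# For "What events are coming up soon?", the category and date filters keep
# the older-but-more-similar recap article from crowding out the upcoming event.
for score, doc in search(["conference", "registration"], docs,
                         category="events", published_after=date(2024, 1, 1)):
    print(f"{score:.2f}  {doc['text']}")
```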

Another critical component of training Betty Bot involves structuring content in ways that play nicely with semantic systems. For example, a long table full of statistics doesn’t have much information that would identify it as relevant to a user’s inquiry or request. For this type of content, we utilize translation techniques that preserve the information while making it findable in a RAG system. 
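
One way to picture that translation step: restate each row of a statistics table as a plain sentence before indexing, so an embedding model has natural language to match against. A hypothetical example:

```python
# A flat table of statistics carries little semantic signal on its own.
rows = [
    {"year": 2022, "metric": "membership", "value": 12400},
    {"year": 2023, "metric": "membership", "value": 13150},
]

def row_to_sentence(row: dict) -> str:
    """Restate a table row as prose so semantic search can find it."""
    return f"In {row['year']}, total {row['metric']} was {row['value']:,}."

# These sentences, not the raw table, are what get embedded and indexed;
# the underlying numbers are preserved.
for row in rows:
    print(row_to_sentence(row))
# -> In 2022, total membership was 12,400.
# -> In 2023, total membership was 13,150.
```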

 
Continuous Improvement and Future Directions 

There are still some limitations in the underlying technologies (the language models) and services (the hosting providers). As these limitations are addressed and resolved, however, we will see significant improvements in the quality and reliability of language models, even beyond what we see now. As performance improves and prices drop, Betty will have more knowledge available for every question and request, and she will be able to ‘think’ more about her answers before responding. We’ve already mapped this out and tested it with great results, but doing it today would mean much longer delays and higher costs for the technology.

All of this is underway, with the recent release of Llama-3 from Meta providing a big opportunity. Llama-3 is not as smart as the current top-end models, but it is open source, meaning it (and the fine-tuned versions that are coming) can be hosted by providers like Groq, making it insanely fast relative to other available services. At Betty Bot, we’re always watching closely for industry updates and experimenting to see when we can make shifts to bring Betty to the next level. Our goal is to make sure she’s always providing the most valuable, reliable, and intelligent assistance to your members and community.

 

For more information about how Betty works, ask Betty herself!