Enhancing Gen AI model output - Adding external knowledge

using prompt engineering, RAG, RAG with Knowledge Graph and adding external knowledge to an LLM
Written by
Clio AI
Published on
March 14, 2024

Key Takeaways

  • You can add knowledge to a model by using prompt engineering to supply additional examples, by using RAG, or by training models.
  • RAG works well when the dataset is bigger than the context length but still manageable, e.g. on the order of ~100M tokens.
  • Above that, you are better off training an embedding model to generate more relevant embeddings, e.g. ColBERT.
  • With more than ~10B tokens, it is better to train a custom model and merge it with an open-source model.


Large language models have been a breakthrough in the field of generative AI. With large parameter counts, we finally have systems that can generate human-readable, understandable text across a wide range of verticals. With the rise of ChatGPT and its use for wide-ranging tasks, one way to enhance a model's output is to ground its answers in external knowledge, that is, knowledge outside the training corpus of the model. For companies and enterprises alike, this enables an LLM to understand what a company does and generate responses grounded in the organization's knowledge, making the output directly actionable and useful for employees.

Over the last few months, as ChatGPT has been a revelation, various techniques have emerged for generating relevant and accurate responses from an LLM based on a specific dataset. That dataset could be a set of documents proprietary to an organization, a topic outside the training data, or simply a set of docs curated by an individual. There are many popular techniques for grounding an LLM's responses in a particular dataset. Here, we go through some of the most popular ones:

Method #1: Including Context with the prompt

With context windows for LLMs now exceeding 200K tokens, for most ad hoc tasks it's easiest to put the entire context directly into the prompt. Everything the model needs is included in the prompt, along with an instruction not to use any external knowledge. With GPT-4 and Mixtral, generations are even more accurate at answering directly from the supplied context. Two popular forms of this are "zero-shot learning" and "few-shot learning".

Zero Shot Learning

This is as simple as passing the context with the prompt: you give the model a context and an instruction and expect it to accomplish the task based on that instruction. It is very useful in simple, common use cases where you are not expecting a specialized output.

Image source: Ludwig AI
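A zero-shot prompt of this kind can be sketched as a simple string template. This is a minimal illustration, not a specific library's API; the function name and wording of the instruction are our own choices.

```python
def build_zero_shot_prompt(context: str, instruction: str) -> str:
    """Combine the supplied context and an instruction into a single prompt,
    telling the model not to reach for outside knowledge."""
    return (
        "Answer using only the context below. Do not use outside knowledge.\n\n"
        f"Context:\n{context}\n\n"
        f"Instruction: {instruction}"
    )

prompt = build_zero_shot_prompt(
    context="Acme Corp's refund window is 30 days from delivery.",
    instruction="What is the refund window?",
)
```

The resulting string is what you would send to the model as a single user message.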

Few Shot Learning

Here you pass a few examples to the LLM so the model understands how it needs to behave. In the example below, you have the same instruction, but you provide examples of the expected output before giving the final instruction. This grounds the model in the kind of output you are looking for and helps it generate something more specific, directly useful, and actionable for your tasks.

Image source: Ludwig AI

In this method, if you add JSON to your examples, like {"language":"fr", "translation":...}, the model will generate the answer in the same format. If you were to ask it to switch to German, it would generate the answer in German with a JSON like {"language":"de", "translation":... }
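A few-shot prompt with JSON-formatted examples can be assembled like this. The translation task, example pairs, and labels ("Text:", "Output:") are illustrative choices, not a fixed convention:

```python
import json

def build_few_shot_prompt(examples: list[dict], query: str) -> str:
    """Prepend input/output example pairs so the model mimics the JSON format."""
    parts = ["Translate the English text. Respond in the JSON format shown."]
    for ex in examples:
        parts.append(f"Text: {ex['text']}")
        # json.dumps keeps the examples in a consistent, parseable format.
        parts.append("Output: " + json.dumps(ex["output"]))
    # The final, unanswered item is what we want the model to complete.
    parts.append(f"Text: {query}")
    parts.append("Output:")
    return "\n".join(parts)

examples = [
    {"text": "Good morning", "output": {"language": "fr", "translation": "Bonjour"}},
    {"text": "Thank you", "output": {"language": "fr", "translation": "Merci"}},
]
prompt = build_few_shot_prompt(examples, "See you soon")
```

Because every example ends in the same JSON shape, the model's completion after the trailing "Output:" tends to follow that shape too.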

Advanced Tactics to improve output

For reasoning-based tasks, you can prompt an LLM with phrases like "Think step by step". Google researchers even tried a strange but effective "Take a deep breath". Prompts like these have been circulating recently, and all of them get ChatGPT to perform better:

Image source: WhatsApp
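In practice these trigger phrases are just appended to the task prompt. A minimal sketch, where the exact phrase wordings are paraphrases rather than canonical strings:

```python
# Reasoning trigger phrases (paraphrased from common prompting practice).
REASONING_TRIGGERS = {
    "step": "Think step by step.",
    "breath": "Take a deep breath and work on this problem step by step.",
}

def with_reasoning_trigger(prompt: str, style: str = "step") -> str:
    """Append a reasoning trigger phrase to the end of a task prompt."""
    return f"{prompt}\n\n{REASONING_TRIGGERS[style]}"

cot_prompt = with_reasoning_trigger(
    "A train travels 120 km in 1.5 hours. What is its average speed?"
)
```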

There are further techniques, like chain-of-thought and tree-of-thought reasoning, though they are not central to data-grounded use cases. We will cover each of them in a separate blog post.

Method #2: Retrieval Augmented Generation

Retrieval Augmented Generation, or RAG, has been an effective method to enhance the quality of the output by 1/ retrieving context relevant to your query and 2/ adding it to the prompt (Method #1) at runtime. Various engineering tactics are used to further enhance the output, and a lot of work in this regard is done by open-source libraries like LlamaIndex, Haystack, and LangChain.


RAG is mostly driven by a semantic search that finds the nearest neighbours to your query in a list of documents (or chunked documents). It returns the most relevant (already indexed) documents for your query, using either semantic search alone or a combination of keyword and semantic search. Think of it as a search over your data: it finds the most relevant results and sends them as part of your prompt.

A typical RAG pipeline schematic. Image source: Towards Data Science
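The retrieval step can be sketched in a few lines. For a self-contained illustration we use a bag-of-words vector as a stand-in for a learned embedding model; the corpus, tokenizer, and function names are all our own toy choices, and a real pipeline would call an embedding model and a vector store instead:

```python
import numpy as np

# Toy corpus standing in for an indexed document store.
DOCS = [
    "Refunds are processed within 30 days of the request.",
    "Our office is closed on public holidays.",
    "To request a refund, email support with your order id.",
]

def tokenize(text: str) -> list[str]:
    return text.lower().replace(",", " ").replace(".", " ").split()

def embed(text: str, vocab: dict[str, int]) -> np.ndarray:
    """Bag-of-words vector; a real pipeline would use a learned embedding model."""
    vec = np.zeros(len(vocab))
    for w in tokenize(text):
        if w in vocab:
            vec[vocab[w]] += 1.0
    return vec

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by cosine similarity to the query and return the top k."""
    vocab = {w: i for i, w in enumerate(sorted({t for d in docs for t in tokenize(d)}))}
    q = embed(query, vocab)
    sims = [
        float(q @ embed(d, vocab))
        / (np.linalg.norm(q) * np.linalg.norm(embed(d, vocab)) + 1e-9)
        for d in docs
    ]
    ranked = sorted(zip(sims, docs), reverse=True)
    return [d for _, d in ranked[:k]]

# The retrieved text becomes the context for Method #1's prompt.
context = retrieve("how do I request a refund", DOCS, k=1)[0]
```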

While there are various tactics to enhance this, RAG is specifically limited by the human factor. If you have a large amount of data, you need more specific and longer queries, but most users write short, generic queries that fail to produce a satisfactory answer. Some amount of personalization can help, but this is a data science (and human) issue, not an engineering problem, so be careful applying too many hacks here.

Method #3: RAG with Knowledge graph

One specific way of enhancing RAG is to build a knowledge graph over your data. We highlight this because it is relatively untouched and has given very good results in the experiments we have conducted so far. The key idea is that every entity in your data is related to others. Instead of traversing with semantic search and nearest neighbours, you traverse a knowledge graph, whose explicit relationships are stronger signals than semantic matching.

The setup here is non-trivial. Source for this approach: https://doi.org/10.3390/math11153269
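The core retrieval idea can be illustrated with a small in-memory graph. The entities, relations, and function names below are invented for illustration; a production setup would involve entity extraction and a graph database:

```python
# Toy knowledge graph: entity -> list of (relation, target entity) edges.
GRAPH = {
    "Acme Corp": [("sells", "Widget X"), ("headquartered_in", "Berlin")],
    "Widget X": [("made_of", "Aluminium"), ("shipped_from", "Hamburg")],
    "Berlin": [],
    "Aluminium": [],
    "Hamburg": [],
}

def expand(entity: str, hops: int = 2) -> list[str]:
    """Collect facts reachable within `hops` edges of an entity (breadth-first).
    These triples become the retrieved context for the prompt."""
    facts, frontier, seen = [], [entity], {entity}
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for relation, target in GRAPH.get(node, []):
                facts.append(f"{node} {relation} {target}")
                if target not in seen:
                    seen.add(target)
                    next_frontier.append(target)
        frontier = next_frontier
    return facts

# Facts two hops out from the entity matched in the user's query.
context = "\n".join(expand("Acme Corp"))
```

Unlike nearest-neighbour retrieval, the second hop here surfaces facts (e.g. what the product is made of) that may share no vocabulary with the query at all.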

Method #4: Custom Models

For large datasets (around 1B tokens or more; 750-1000 tokens is roughly one page), models need to understand your business and need to be trained on your data. We believe every big company will eventually shift to a custom LLM, adapted to its own use cases and working only for it. The reason is simple: one unique data point can dramatically alter model behavior, a property common to all ML models. So, to hone your competitive advantage, it's important to make full use of your data and ensure no one else uses it. Building your own models achieves both objectives. We have talked about it here


If you are a large enterprise figuring out its AI strategy, you should check out the deep dive we did last month on Enterprise Deployments.


Adding external knowledge to generative AI models can significantly enhance their performance and produce more accurate results. Methods such as prompting with context, RAG, RAG with a knowledge graph, and custom models with continual pretraining all improve the accuracy and relevance of a model's output. Understanding the role external knowledge plays, and which method fits your dataset's size, is key to getting the most out of these models.
