Why Enterprises should build their own Custom Model

RAG, Finetuning, Prompt Engineering fail to give the right results for large enterprises. Best solution for them is to invest in building their own great Large Language Model trained on their org's knowledge.
Why Enterprises should train on LLAMA-2 or make their custom Gen AI Model
Written by
Clio AI
Published on
December 13, 2023
Takeaway: Custom models solve all the AI specific problems long term - data remains private, Model is internal policy compliant, No dependency on external updates, Responses are better and more relevant, and can be customized according to use cases. Try out ChatGPT/Azure APIs to see if Gen AI is a good fit for your use case, and if the results are encouraging, go for a full fledged Custom Model Implementation.

We have covered this partially in previous posts. You can read them here and here.

This post is based on a series of conversations with large companies and enterprises where people wanted to implement Gen AI models like GPT4 on their data, but were not getting the right outputs consistently. They tried many things out there - like finetuning, Retrieval Augmented Generation (RAG) where you pass context within the prompt, and were mildly successful but eventually struggled to get to a level of consistency and predictability to deploy all this into production.


With the speed and growth of Generative AI as a phenomenon over the last one year or so, every enterprise needs an AI strategy in place. With such a fast changing landscape and the speed of tech progress, that is easier said than done. It becomes even more tricky when you have folks reaching out from everywhere pitching an AI solution for just about everything - better writing, better sales, better coding, better revops, better product management, better CX, and so on. I am one of them, so I empathize.

The capabilities are clear and extremely useful as evident by the viral usage of ChatGPT at nearly 100M Weekly Active Users. For the purpose of this post, it would help to think of LLM as a text compression engine. There is some initial knowledge, compressed and stored in very small space. When asked a question, or to complete a sequence, the model tends to look at the initial knowledge and predict the next word based on what it's supposed to output. If something is part of the training data, it can give accurate answers, and if something is totally tangential it can hallucinate and still produce a sample text that is grammatically correct and sounds about right.

When the domain knowledge is absent from training data, how do you make an LLM respond accurately? That's what AI startups try to do.

Improving Text Generation Quality

Many startups have been doing it in multiple ways.

Simple Prompt Engineering:

Just add all of context to the prompt directly so that it generates an output in line with the context. Of course, some of this is common sense and some of it is based on training data. Eg: For a writing task, you will get a considerably better output if you start the prompt with "You are a top 1% copywriter in the world. Write a simple conversational title for this post" vs a normal "write a title for this post".

Sometimes, it helps just to put the entire context in the prompt and just ask a question based on the context itself. With 100K context windows and file uploads, you can almost pass anything in a simple prompt.

Zero/Few Shot Learning

This is similar to prompt engineering, but you pass examples of expected output in the prompt itself to teach an LLM the result you are expecting. See Open AI Cookbook on how they do it here. It's a variant of prompt engineering but listing it separately just to separate from hackish ways prompt engineering happens.

Retrieval Augmented Generation (RAG)

What happens when your data is large and you dont want to pass everything in a single prompt? RAG involves chunking your data into bits, creating embeddings. When you query, it looks up relevant information, and then pass that along with your question to generate a well informed response. It works for smaller data sets, but will always be a large prompt been passed to an LLM behind the scenes. Call it dynamically generated prompt engineering.


This is used if you want a model to output in a certain pattern like capturing the format or the tone of a specific blog post. You collect 100-10000 examples of how the prompt and expected answer would be and run a finetuning job on a model. Next when you query the finetuned model with a similar prompt, it will generate output similar to your examples at the time of finetuning. Don't worry about the quantity, but make sure every example you use in fine tuning is of high quality.

Important: Unlike the other three techniques, finetuning does not involve adding new knowledge to the model instead saving on context.

You can either one or all of the above together to improve output. This video from Open AI would help clear a lot of questions and help you understand this better.

But...I tried all this and still get crappy and inconsistent output. Why?

Yeah, I know. Hence, this post exists.

At this point, many will tell you different solutions - some will suggest techniques like Hyde, some will ask to embed metadata, or add more Retrieved chunks in prompt, more examples for few shot learning and so on. Most would not work either.

To understand why, let's dive into how models generate output and develop an intuition about its working.

What a Model Needs

To get the desired output, a model needs three things: similar knowledge (in training data), examples of output (Few Shot Learning/Finetuning), and enough context in the prompt (RAG/Prompt Engineering). We're essentially balancing the lack of knowledge with these techniques, but there's a limit. If the model's training data lacks the domain knowledge, the output quality suffers.

Data Science Problem

The listed techniques use engineering skills to address what's fundamentally a data and data science problem. Attempting to augment the model with external knowledge to fill gaps in the training data often leads to struggles, especially when introducing a completely new domain to a model.

If the outputs are not upto the mark, that means your domain knowledge is not in model's training data

If what you are prompting about is in the training data, the model works incredibly well. See how ChatGPT responds to a generic question or any query. If not, the quality decline is apparent.

Teaching a model new knowledge

At this point, the ultimate solution is just simply adding more knowledge to a model. It's not as easy to do, and that is where Clio AI comes in.

If you have a large corpus of private knowledge, about 1M words (1000 pages) or greater, the above tactics won't cut it. The issue is not with engineering, but with knowledge, and should be solved with adding knowledge.

A Custom Model, trained specifically on your company's private knowledge, is the key.

Just like how GPT 4 compresses the world's knowledge and answers about it when asked, a custom model would do so for your company only.

This is also what Open AI is looking to do with large Enterprises. We covered it here in a previous post.

Training Custom Models

As of today, to train a custom model, you basically need three things - a good Foundational Model, Lots of Data, and Expertise. (and compute though that is given when it comes to Data Science).

Open Source Supplies models like Llama-2 which are exceptionally good, you supply the data you need a LLM trained on, and Clio AI team provides the expertise.

Our team works closely with you to help make a great custom model for your company's use cases. Below is a schematic of the training process.

Source: State of GPT by Andrej Karpathy

We modify every step of the model training process:

  • We take a foundational model (like Llama 2, Mistral 7B or other) and do an additional domain specific pretraining to alter model weights.
  • Supervised Finetuning with 100K high quality examples. This is different from usual finetuning as now the model is trained on new knowledge.
  • Reward Modelling process and frameworks for better outputs
  • Custom post training Reinforcement learning process tailored for your company's domain.
  • And everything else to make a great Custom model for you.

This is pushing stuff as far as it can currently go.

Tailored for Your Company:

This isn't a generic model; it's specifically trained to understand and cater to your company's needs. This approach is particularly beneficial for companies with large datasets, as other techniques may not yield satisfactory results.

Please note that this would not be a generic model, but mostly trained and built in a way it understands what the company does to supply output accordingly.

When to opt for a custom model?

Companies should consider a custom AI model for:

Adding Domain Knowledge

Needing to incorporate domain knowledge the model isn't trained on.

Company-Specific Policies:

Needing the model to adhere to company-specific policies regarding compliance, data, and communication.

Privacy Concerns:

Ensuring privacy for business data and trade secrets.


Requiring cross-functional insights spanning multiple apps and databases.

Who should consider a custom model?

If the typical ChatGPT wrappers are not working for you, you should consider a custom model.

  • Large Data Volume: like more than 1M tokens - then you should go for it. Text could include all the reports, memos, proposals, discussions, emails, and docs shared between your team which contains insights about the company and domain.
  • Proprietary Data Sensitivity: If the data involved is sensitive, it makes sense to go for a Custom model than relying on any 3rd party wrapper or even directly Open AI.
  • Compliance Requirements: For compliance heavy industries like Pharmaceuticals, Financial Services etc. where there is a need to preserve most communications or other policies are in place, a custom model is important as you can edit system prompt to give specific instructions.
  • Heavy Customizations and Complex Workflows: When you need a model to work a very specific way (eg: connect to db and pull data for every question) then a custom model is important to edit through system prompt instructions.

I think most companies have to go for open source implementation because of system prompt requirements, which give them more control. You can use RAG, Finetuning etc on a self-deployed Llama-2 as well. Custom Model is useful for a higher degree of customization.

Advantages of a Custom Model

At Clio AI, we estimate that a custom model trained on your org's knowledge could save every one of your employee about 2-3 hours a day. With 5x faster decisions and 10x less time spent on looking for context. That could simply translate to a direct and proportional impact on your top and bottom line.

1. Competitive Edge + Exclusivity

Your custom model is more aware of your org's years of accumulated knowledge, context, and language, thus more adept at accurately answering questions about your business. This is one instance where your competition cannot replicate the results using the same tools. Your data and your knowledge gives you the push to the next level.

2. Enhanced Decision Making

A custom model is an asset when it comes to decisions. It can provide much needed context, help teams by distilling complex information into quick actionable insights, and unblock individuals with getting them information they need instantly.

3. Higher degree of control

This is non trivial and not immediately obvious. Open AI and other providers keep updating their models which may result in a performance change. You would have heard about "ChatGPT getting dumber", it's just newer iterations on the model. While using the APIs, you are likely to encounter changes in the form of output quality degrading suddenly. With a custom model, you decide on the update, and can deploy when you are satisfied with your own benchmarks.

Other minor things that are advantageous compared to using a generic model. Ironing out these minor irritations help a lot with user experience and drive company wide adoption.

  • Lower Latency: You cut down on the API call times. Typically an Open AI call takes about 15-20 seconds to return a response.
  • Private data remains private: Your privileged data is not shared with anyone else, and remains entirely in your own network, on your own servers.

Addressing the pain points

Usual use cases for ChatGPT, like code assist, sales support, and marketing copy generation, remain relevant. However, the key difference lies in the model's in-depth understanding of your business—its knowledge, priorities, and goals. Going beyond standard applications, there are advanced use cases that a typical RAG-based ChatGPT can't handle.

Workplace Search

Search for any document, discussion, and report from the past without having to ping/interrupt the person who may have the right context. This is particularly useful for your best employees who get interrupted and pulled in multiple discussions in addition to their own work.

CxO assist

Insights from Live Data at your Business Leaders' fingertips even for complex questions. Imagine a custom dashboard that responds to dynamic queries like "What's our total CAC if we include the time spent in prep and meetings?"

Insights from Live Data

With a model that understands your schema and business, every employee can get the right insights in time without having to go through the data request process and excessive analysis.

Costs and Time

Open AI's pricing starts at about $2M-$3M minimum with an acknowledgement that they won't be able to accommodate most companies. Our pricing is one-tenth of that. It takes about 2-3 months realistically to have a state of the art model for your company going from Pre-Training to custom Post RLHF process. This is a highly customized process that varies according to the business and industry.

Why go with Clio AI?


You have the data, models come from open source, you still need the expertise. Thats where we come in. There are a handful of people who have trained and deployed a ML model like this from scratch in production. Clio AI's team counts one of them as Cofounder. Abhinav did it for Tokopedia in the past.

Costs Offset

As stated, we will reduce your training cost by 90% while giving a near almost similar performance.

Hosting and Control

You can deploy the model in your cloud, in your private network, thereby controlling the costs yourself, unlike Open AI which would deploy it on Azure servers and charge you for the service.

For more details and questions, you can check this out.


Choosing Clio AI over Open AI brings not only cost savings but also specialized expertise. The model's understanding of your business enhances standard use cases and introduces advanced applications like workplace search and dynamic data insights. The reduced pricing and deployment flexibility further make Clio AI a strategic and cost-effective choice for companies seeking tailored AI solutions.

You can reach out directly with any questions or schedule a conversation to understand more via this page.
Weekly newsletter
No spam. Just the latest releases and tips, interesting articles, and exclusive interviews in your inbox every week.
Read about our privacy policy.
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Spend time thinking not searching. Get a demo today.

By signing up for a demo, you agree to our Privacy Policy.
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.