Today, every enterprise needs a Generative AI strategy in place, because the next decade of its growth depends on it. Unfortunately, most enterprises are struggling to finalize a strategy for a technology that became popular only in the last six months. So popular, in fact, that everyone has been talking about it non-stop, and it's hard to decipher what is signal and what is noise. Everyone has an opinion, but too few people have actual experimental results. Some questions we have gotten from our customers are:
- Shall I go with an open source model or use OpenAI?
- Build vs. buy: shall I build it in-house, or buy it from one of the SaaS startups?
- Which use case should I deploy it in to start with?
- What cost estimates should I plan for?
We will answer those in a series of blog posts. Let me start with the first question, because the answers to all the subsequent questions depend on this one.
In a rapidly evolving technological landscape, enterprises are constantly seeking innovative ways to stay competitive. One such avenue that has gained immense attention is the deployment of open-source Generative AI models. These models, which generate content like text, images, and more, are not only pushing the boundaries of creativity but also transforming the way businesses operate.
Shall I go with ChatGPT Enterprise/Claude 2, or use one of the open source models on the market for our AI workflows?
I have fielded this question multiple times over the last few weeks, admittedly with increased frequency after the launch of Mistral AI.
Frankly, this wasn't even a question about six months back. While open source models were good, GPT-3, ChatGPT, and GPT-4 were exceptionally better on all the benchmarks and quite good at the generic tasks asked of them.
Over the last few months, open source has made rapid progress. Models like Llama 2 and Mistral allow for easy retraining/finetuning and offer performance comparable to OpenAI's on some of the benchmarks. Of course, in many cases, OpenAI still offers the best output.
Let me lay out the market map, available options, and help you decide which option to pick.
Offerings: ChatGPT Enterprise, GPT-4 Chat APIs, Completion APIs.
API available: Yes
OpenAI is by far the most popular provider of LLMs. They have a full stack of Gen AI services: instruction-based models, smaller models for finetuning, embedding models (ada), image models, chat models, and so on. ChatGPT is probably the fastest-growing consumer product, given how quickly it scaled to 100M users.
With GPT-3 and ChatGPT (GPT-3.5), they offered a general-purpose model with a large parameter count (~175B to ~200B), capable of zero-shot learning and able to produce text of comparable quality to humans. With the API as a form factor, these models could be integrated into any other product, and hence all the incumbents launched their Gen AI add-ons. With GPT-4, they used a Mixture of Experts (MoE) architecture, which has been effective and produces better-quality outputs than GPT-3.5. They recently introduced finetuning for GPT-3.5 and GPT-4, though both are very costly to finetune; OpenAI itself recommends zero-shot learning plus RAG instead of finetuning.
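As a sketch of that zero-shot plus RAG pattern, here is how a Chat API request payload might be assembled; the model name, instructions, and retrieved snippets below are illustrative placeholders, not real data:

```python
# Sketch: ground the model in retrieved context instead of finetuning it.
def build_rag_request(question, retrieved_snippets, model="gpt-4"):
    """Assemble a Chat Completions payload whose system message
    carries the retrieved context for the model to answer from."""
    context = "\n\n".join(retrieved_snippets)
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Answer using only the context below.\n\n" + context},
            {"role": "user", "content": question},
        ],
        "temperature": 0,  # keep factual lookups as deterministic as possible
    }

payload = build_rag_request(
    "What is our refund window?",
    ["Policy doc: Refunds are accepted within 30 days of purchase."],
)
print(payload["messages"][0]["role"])  # system
```

The payload dict is what you would pass to the provider's chat endpoint; the retrieval step that produces the snippets is out of scope here.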
Models: 7B, 13B, 34B, and 70B parameter models.
Offerings: Completion models and chat models.
Meta launched Llama in February 2023 and followed it up with Llama 2 in July. An open source model with publicly disclosed weights, Llama 2 comes in four parameter sizes: 7B, 13B, 34B, and 70B (though the 34B weights were not released publicly). For memory estimates, assume each parameter takes about 2 bytes of GPU memory in fp16; that is, the 7B model would need roughly 14 GB to run. Llama 2 is trained on about 2T tokens.
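That rule of thumb can be written down directly, assuming fp16/bf16 weights at 2 bytes per parameter and ignoring the extra headroom needed for the KV cache and activations:

```python
def gpu_memory_gb(n_params, bytes_per_param=2):
    """Rough GPU memory (GB) needed just to hold the weights:
    fp16/bf16 = 2 bytes per parameter; excludes KV cache and activations."""
    return n_params * bytes_per_param / 1e9

print(gpu_memory_gb(7e9))   # Llama 2 7B -> 14.0 GB
print(gpu_memory_gb(70e9))  # Llama 2 70B -> 140.0 GB
```

Quantized formats (8-bit, 4-bit) shrink this further at some quality cost.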
Llama 2 is comparable in output to GPT-3.5 on most benchmarks, and with enough finetuning it can outperform GPT-3.5 on many business-specific use cases. Llama 2 is available both as a chat model and as an instruction-following (completion) model. You can also use LoRA, a finetuning technique that is very cost-effective. For most business use cases, you would not see a noticeable difference between GPT and Llama.
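To see why LoRA is so cost-effective, compare the parameters it trains against a full weight update. The projection size and rank below are illustrative for a 7B-class model, not taken from a specific recipe:

```python
def lora_trainable_params(d_in, d_out, rank):
    """LoRA freezes the full d_in x d_out weight and instead trains
    two low-rank factors: A (d_in x rank) and B (rank x d_out)."""
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return lora, lora / full

# One 4096x4096 attention projection, LoRA rank 8:
lora, frac = lora_trainable_params(4096, 4096, 8)
print(lora, f"{frac:.2%}")  # 65536 trainable params, ~0.39% of the full matrix
```

Training well under 1% of the weights is what makes finetuning a 7B-70B model affordable on commodity GPUs.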
As an aside, Meta also open sourced ImageBind, their model for holistic embeddings across six modalities: text, image, video, audio, depth, and thermal (heatmaps). This may not be directly related to LLMs, but it will be very useful when it comes to running multiple Gen AI models across image/text/video. More on that in a future blog post.
Models: Claude, Claude 2
Offerings: Claude 2, Anthropic Enterprise, Claude 2 Chat API.
API available: Yes
Anthropic is funded by Google, Amazon, and FTX, among others. They have launched two models to date: Claude and Claude 2. Claude 2 has a context window of 100K tokens and allows for full file uploads. Anthropic is far more geared towards Responsible AI than the other providers.
In my testing, the models perform well, but the issues here are twofold. One, the lack of finetuning options. Two, prompts have to be engineered and tested to find what works best, and because the model is not as widely adopted as OpenAI's, there is less community guidance to lean on.
Models: Mistral 7B model
Offerings: Completion model
Recently launched with a 7B model, Mistral is another addition to the open source models. The model claims to outperform Llama 2 and to be comparable to GPT-3.5 on various benchmarks. It's also easier to deploy than larger models given its parameter count, and can run on any cloud server or on-premise.
Mistral launched with sliding window attention, which lets the model attend beyond a fixed context window at manageable cost and supports effectively unbounded generation. They also provide an instruction-finetuned model for chat-based interfaces.
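A minimal sketch of what a sliding window attention mask looks like, with a toy sequence length and window size (real implementations pair this with a rolling KV cache rather than materializing the full mask):

```python
def sliding_window_mask(seq_len, window):
    """Causal mask where token i attends only to the last `window`
    tokens (positions i-window+1 .. i), as in sliding window attention."""
    return [
        [1 if 0 <= i - j < window else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]

for row in sliding_window_mask(5, 3):
    print(row)
# Each row keeps at most 3 ones: the token itself plus its 2 predecessors.
```

Information from tokens outside the window still propagates indirectly, layer by layer, which is how the model reaches beyond the nominal window.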
Models: Google PaLM 2
Offerings: PaLM 2 API on Google Cloud, Embeddings API
API available: Yes.
As is well known by now, Google researchers published the 2017 paper "Attention Is All You Need," introducing transformers. OpenAI's, Anthropic's, and Meta's Llama models are all based on that architecture. Google launched PaLM 2 as their most advanced model. It is being deeply integrated into Google Search and other Google products, but empirically it is not widely adopted outside of Google.
Azure provides a privately hosted version of OpenAI's models, including ChatGPT. They are slightly better in the sense that they can be finetuned more easily and you can choose a specific region to minimize latency. With Microsoft, your data stays within your own Azure tenant. Pricing is the same as OpenAI's, and they generally have better response times (for the same region) than OpenAI too.
Let me give you the answer straight away and then spend the rest of the post explaining it.
For a proof of concept, OpenAI works best. It's a general model, and with an API it can be integrated anywhere. You iron out the kinks, validate your hypotheses, and convince others to use it. At this point, the considerations are at a functional level: "Does it work?", "Does it augment my work and improve my output?", and so on.
You can also try many of the upcoming startups, depending on the use case: some will help you qualify sales leads, some will help you with marketing copy, and so on.
As many startups running production workloads on OpenAI have found out, production is different. There you are bounded not just by functional considerations, but also by company policies and compliance requirements, among other things.
In simple terms, a system prompt is a starting text or instruction provided to a large language model. It helps orient the model's output in a particular direction or context.
E.g., for OpenAI's integration with DALL-E 3, the system prompt is (shoutout to Simon Willison):
With such a detailed prompt, OpenAI prevents a lot of PR nightmares and ensures that users get a good experience.
For your company, the challenges are different. For example (not an exhaustive list):
Net net, a good system prompt ensures that you do not land in hot water while augmenting your customers and employees alike. This is the state today; soon we will have specific guidelines requiring AI systems not to produce harmful text.
Open source models let you set long system prompts based on company-specific policies, in a way that OpenAI or Anthropic cannot enable you to do.
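As a sketch, a policy-driven system prompt can simply be prepended to every request before it reaches your self-hosted model; the company name and rules below are hypothetical:

```python
# Hypothetical company policy baked into every request.
COMPANY_SYSTEM_PROMPT = (
    "You are a support assistant for Acme Corp. "
    "Never discuss competitors or reveal internal pricing. "
    "Answer in a friendly, concise tone."
)

def make_messages(user_input, system_prompt=COMPANY_SYSTEM_PROMPT):
    """The system prompt always goes first, so the model is steered
    before it ever sees the user's text."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
    ]

msgs = make_messages("What does the premium plan cost?")
print([m["role"] for m in msgs])  # ['system', 'user']
```

With a self-hosted model, this prompt can run to thousands of tokens of policy text without per-token API charges.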
A typical ChatGPT API call takes about 10-15 seconds to return output; a typical GPT-4 call takes about 20-25 seconds. This is on top of the time it takes to scan through all the embeddings, retrieve relevant context, search through DBs, and whatnot. Azure GPT is somewhat faster, as you can choose a region closest to your location.
With open source models, this can be considerably faster. Llama 2 on Colab answered in 4 seconds for me without any finetuning.
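If you run Llama 2's chat models yourself, prompts must follow Llama 2's chat template; a small helper like this (a sketch of the documented format) keeps it consistent:

```python
def llama2_chat_prompt(system, user):
    """Format a single-turn prompt in Llama 2's chat template;
    deviating from this format noticeably degrades answers."""
    return (
        "<s>[INST] <<SYS>>\n"
        f"{system}\n"
        "<</SYS>>\n\n"
        f"{user} [/INST]"
    )

prompt = llama2_chat_prompt(
    "You answer in one sentence.",
    "What is retrieval-augmented generation?",
)
print(prompt)
```

The resulting string is what you feed to the model's tokenizer; multi-turn conversations repeat the `[INST] ... [/INST]` blocks with the model's replies in between.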
Open source scales very well with constant finetuning and RLHF.
This takes some effort, but you can run a combination of Stable Diffusion and Llama 2, with the LLM acting as an understanding engine that generates prompts for Stable Diffusion, just like GPT-4 + DALL-E. You can also deploy your own ML models on top of open source LLMs and have them interact seamlessly.
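A minimal orchestration sketch of that combination, with both model backends stubbed out; you would swap the stubs for real Llama 2 and Stable Diffusion calls:

```python
# Sketch: an LLM turns a terse user request into a detailed image prompt,
# which a local Stable Diffusion deployment then renders. Both backends
# are stubs here, standing in for real model calls.

def llm_expand_prompt(user_request):
    """Stub for Llama 2: enrich a terse request into a detailed SD prompt."""
    return f"{user_request}, highly detailed, studio lighting, 4k"

def stable_diffusion_render(prompt):
    """Stub for a Stable Diffusion endpoint: returns a fake image handle."""
    return {"prompt": prompt, "image": "<png bytes>"}

def text_to_image(user_request):
    # The only real logic: chain understanding engine -> image engine.
    return stable_diffusion_render(llm_expand_prompt(user_request))

result = text_to_image("a red bicycle on a beach")
print(result["prompt"])
```

The value of the pattern is that the two models stay decoupled: either stub can be replaced with a different backend without touching the pipeline.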
Your data never leaves your premises, and nothing is shared with OpenAI. That's maximum security.
For limited use cases, costs will be on the higher side. At scale, costs will be lower.
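A back-of-the-envelope break-even calculation makes that tradeoff concrete; both prices below are assumptions for illustration, not actual quotes:

```python
# Assumed prices, for illustration only: check current provider pricing.
API_COST_PER_1K_TOKENS = 0.002   # ChatGPT-class API, USD per 1K tokens
GPU_COST_PER_MONTH = 1200.0      # dedicated GPU server for a 7B model, USD

def breakeven_tokens_per_month():
    """Monthly token volume above which self-hosting is cheaper
    than paying the per-token API price."""
    return GPU_COST_PER_MONTH / API_COST_PER_1K_TOKENS * 1000

print(f"{breakeven_tokens_per_month():,.0f} tokens/month")  # 600,000,000
```

Below that volume the fixed GPU bill dominates; above it, every additional token is effectively free on your own hardware (ignoring engineering and ops cost, which also matter).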
On-premises deployment of open-source Gen AI models grants enterprises a unique edge. It empowers organizations with complete control over their AI infrastructure, ensuring that data remains secure, and privacy is maintained. In addition to these crucial advantages, on-premises solutions also offer reliability by eliminating external dependencies, making businesses less susceptible to service disruptions or downtime. Moreover, by managing their infrastructure, companies can fine-tune their AI models to perform optimally and align with their specific business objectives.
In conclusion, deploying open-source Gen AI models for enterprises holds the promise of transforming the way businesses operate, innovate, and engage with their audience. It opens up a realm of creative possibilities, is cost-effective, scalable, and offers data security. However, for those who prioritize complete control, data privacy, and optimal performance, on-premises deployment stands as the ultimate choice, making it a powerful asset for businesses in the ever-evolving digital landscape.