It has been a year since ChatGPT's launch, and as I write this, OpenAI has just announced it is averaging 100M weekly active users. Suffice it to say that ChatGPT has had a phenomenal year. In enterprises, boards, C-suites, and leadership teams have made it their top priority to figure out how to integrate Generative AI, or Large Language Models (LLMs) as they are called, into their operations and products.
We find ourselves in the AI industrialization era, experiencing a phase of swift growth comparable to the impact of the industrial revolution or the Internet on civilizations. For enterprises, Gen AI holds the power to markedly enhance employee productivity, personalize customer interactions in a profound way, and automate operations even in the absence of a predefined pattern.
In simpler terms, Generative AI acts as an accelerated pathway to discovering order and meaning in interactions that are predominantly unstructured and hard to write rules or code for, much like how humans communicate and understand one another.
Early Adopters of Generative AI
Several companies have already made significant strides early in the market:
Using it to enhance customer experience with always-on customer support. They have used LLM-based classification to combat fraud on community platforms like Discord, which was not possible before. They also use LLMs internally for enterprise search.
2. Morgan Stanley:
Employs AI models trained on their documents to assist wealth managers, who can now pitch better products to their customers while having more context and visibility into Morgan Stanley's knowledge and priorities. Morgan Stanley executives describe it as putting every analyst on call for advisors, instantly.
Explores Generative AI-powered digital art infused with branding elements. They used Generative AI models to create new beverage artwork that was an instant hit with fans and engaged them in new and creative ways.
Other than these, companies like Estee Lauder, Zoom, GE, Rolls Royce, Google, Nordstrom, and Lufthansa have all announced that they are adopting or have launched products based on Generative AI. Look at the AI-powered customer service bots, designers, and workplace assistants/copilots from Microsoft and Google, or the Gen AI apps from Salesforce and Slack, to see how AI is a significant part of every enterprise's future growth strategy.
Though most companies we spoke to have explored integrating Gen AI, only a handful (about 10%) have taken it from the experimentation phase to production, owing to the inherent challenges of deploying a product powered by LLMs.
Challenges in Generative AI Adoption
As enterprises navigate the terrain of Generative AI adoption, several challenges emerge:
- Heavy customization and fine-tuning are imperative for achieving enterprise-level performance and reliability, and the path there is often non-intuitive.
- Concerns about model hallucinations and brand safety.
- Security and privacy of proprietary data and intellectual property.
- Ensuring LLMs adhere to specific company policies.
- A perceived lack of control, since enterprises do not own the underlying model.
Addressing these challenges will be critical to enterprises' ability to scale their experiments and deliver tangible ROI.
Generative AI Use Cases for Enterprises
Gen AI models can be understood as entities that grasp the intent behind a user's words. That has profound implications, whether it's recalling a specific mail thread from two years ago, summarizing long documents (or, better, querying them to extract relevant information), finding and filling gaps, or automating somewhat cerebral tasks. Moreover, LLMs can interact with proprietary data and access enterprise knowledge bases to surface insights the moment your employees and customers need them, saving both time and frustration.
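To make the idea concrete, here is a minimal, runnable sketch of surfacing enterprise knowledge on demand. The document store, the keyword retriever, and the `answer()` helper are all illustrative stand-ins, not a real product API; in production the retrieved context would be passed to an LLM inside a prompt.

```python
# Minimal sketch of surfacing enterprise knowledge with an LLM.
# Everything here is an illustrative stand-in, not a real API.

DOCS = [
    "Q3 revenue reconciliation is owned by the finance ops team.",
    "The 2021 vendor contract renewal thread is archived in legal-inbox.",
    "Onboarding checklists live in the HR knowledge base.",
]

def retrieve(query: str, docs=DOCS, k=1):
    """Rank documents by naive keyword overlap with the query."""
    def score(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(docs, key=score, reverse=True)[:k]

def answer(query: str) -> str:
    # In production, this context would go into an LLM prompt;
    # here we return the retrieved context to keep the sketch runnable.
    context = retrieve(query)
    return f"Context for '{query}': {context[0]}"

print(answer("where is the vendor contract renewal thread"))
```

A real deployment would swap the keyword overlap for embedding-based semantic search, but the shape of the flow (query, retrieve, ground the model's answer) stays the same.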
Financial services
For financial services firms, generative AI can automate various tasks and assist researchers and analysts alike. Financial document search, summarization, and synthesis jump out as the most prominent use cases. Others are emerging as popular too: conversational chatbots for financial queries (we implemented one recently), automating basic accounting functions like revenue reconciliation, and automating data summaries, freeing up analysts' bandwidth.
Retail and e-commerce
For retail and e-commerce, generative AI is emerging as a medium to personalize marketing content at scale. Fraud detection in reviews is another popular use case that can save companies millions in revenue. Companies are also looking at personalized chatbots, long-form product descriptions, and AI-generated product photoshoots to boost sales and after-sales support.
Healthcare
EMR automation is a common low-hanging fruit, though one that is very data dependent. A more powerful use case is medical imaging and diagnostics, where an AI can analyze medical images (CT scans, MRIs, X-rays) and provide initial feedback that detects diseases earlier, reduces human error, and improves patient outcomes. Better analysis of patient history, from past records or intake questions, is another great use case, and personalized treatment plans are very promising too.
Deploying Generative AI in Enterprises
To properly deploy Gen AI applications and make full use of LLMs to boost your topline and bottom line, it's imperative to first understand how any tech using these models is layered:
At the base are foundation models like PaLM 2, Llama 2, Mistral, or Davinci, which are trained on huge corpora of data. They have all the knowledge but are poor at expressing it.
Commercially available models take these base models and fine-tune them on general use cases, giving them high-quality examples of what good answers look like. ChatGPT in its current form, GPT-4, Claude 2, and Llama 2 Chat are all examples of models fine-tuned from a base model. They mostly come with a readily available API, and they are what we refer to as commercially available models.
The data layer serves as the connective tissue between the application layer and the commercial model API, seamlessly integrating user inputs and contextual information to generate model responses. It provides the data customization and fine-tuning required for the base model to use proprietary enterprise data properly.
The UI layer connects with human input and passes it to the data layer to get an output. It could be a WhatsApp message, Slack, or a website, similar to the UI of any other app.
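Put together, the layers above can be sketched as three small functions. Every name here is hypothetical, and `call_model()` is a stub standing in for whichever commercial LLM API you actually use.

```python
# Sketch of the three layers described above. All names are
# illustrative; swap call_model() for your provider's real API.

def ui_layer(user_message: str) -> str:
    """UI layer: receives input from Slack, WhatsApp, or a web form."""
    return data_layer(user_message)

def data_layer(user_message: str) -> str:
    """Data layer: enriches the raw input with proprietary context."""
    context = "Policy: refunds are processed within 5 business days."
    prompt = f"Context:\n{context}\n\nUser: {user_message}\nAnswer:"
    return call_model(prompt)

def call_model(prompt: str) -> str:
    """Model layer: a stub standing in for a commercial LLM API."""
    return f"[model response to {len(prompt)} prompt chars]"

print(ui_layer("How long do refunds take?"))
```

The point of the separation is that each layer can evolve independently: you can swap the model provider or change how context is retrieved without touching the UI.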
Now, some other approaches in the domain that are used when the basic tactics aren't giving the right results.
Many enterprises have unique needs that require extensive fine-tuning and prompt engineering of a foundation model to fulfill a use case. If you work in an industry where most of the knowledge is publicly available, the reasonable assumption is that the model already has that knowledge in its training data (now that training datasets have reached roughly 1,000 GB). You can then fine-tune the model with 1,000-10,000 high-quality example questions and answers to show it what a good answer looks like. These specialized models perform better than generic commercial models.
- Med-PaLM 2 works better on medical records than GPT-4.
- WizardLM fine-tuned Llama on SQL queries, and it generates better, more accurate SQL from raw text than GPT-4.
The open-source community is very active in building and fine-tuning open-source models for different use cases. Hugging Face maintains a leaderboard of the best-performing fine-tuned models; notable names include Vicuna-13B and GOAT.
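As a sketch of what such fine-tuning data looks like, the snippet below builds a one-example training file in the chat-style JSONL format many providers accept. The exact field names vary by provider, so treat this schema as an assumption, and the example content is invented for illustration.

```python
import json

# Illustrative fine-tuning examples in a chat-style JSONL schema.
# Field names vary by provider; treat this layout as an assumption.
examples = [
    {"messages": [
        {"role": "user", "content": "Summarize clause 4 of the NDA."},
        {"role": "assistant",
         "content": "Clause 4 limits disclosure to named affiliates."},
    ]},
]

# Serialize one example per line, as fine-tuning endpoints expect.
jsonl = "\n".join(json.dumps(ex) for ex in examples)
print(jsonl.splitlines()[0][:40])
```

The hard part in practice is not the format but curating thousands of examples that genuinely reflect what a good answer looks like in your domain.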
Every organization has business-critical tasks that require access to proprietary data. Quite a lot can be done with RAG when you need to add knowledge, but if you have a set of tasks that require a specific output, I suggest fine-tuning and then passing context in the prompt.
Both RAG and fine-tuning are hard to scale in performance with engineering tactics alone. If your core requirement is adding substantial new knowledge to a foundation model, you should opt for building your own custom LLM.
Commercially available models are trained on publicly available data. They are great at generalizing and answering about things in their training data, but do not work well when you introduce a new concept. If the context is small, fine-tuning or RAG can solve it, but if the use case needs a large amount of domain knowledge, you should opt for training your own model.
Example: For a law firm, a custom model with knowledge of past contracts added can generate a new contract incredibly fast compared to a generic LLM or even a fine-tuned model. BloombergGPT is a great example of a private model built from scratch and available only to Bloomberg Terminal customers.
Challenges to consider while building your AI strategy:
Security and safety
Third-party apps usually send your data through the provider's cloud. An app built specifically for your business, however, can stay inside your private cloud or even on your own servers, which is crucial when your data must remain secure.
With a tailor-made app, your data stays where you want it, and you control who accesses your Large Language Model (LLM) app using the same roles and permissions you already have in place.
Without robust evaluation and monitoring systems, generative models are susceptible to hallucinations, potentially yielding false, harmful, or unsafe outcomes. The stakes are high for companies, especially when these models are deployed in customer-facing scenarios or handle sensitive information, posing significant risks to their brand.
To mitigate these risks, customizing an enterprise app becomes crucial. This customization empowers your teams to define how they assess the performance of applications and establish appropriate monitoring processes.
Beyond just tracking traffic and latency, it's essential to align monitoring processes with operational priorities. For instance, consider a financial firm that develops an app, fueled by Generative AI, for detecting trading on material information not publicly available. In this scenario, the security and compliance team should be promptly alerted about any instances of insider trading and misclassification. The optimal way to achieve this is by directly integrating real-time monitoring and logging of prompts and model responses into the custom app. This proactive approach enables the security and compliance team to take immediate action based on alerts, preventing potential further damage.
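A minimal sketch of that integration might look like the following. Here `classify()` is a placeholder for a real policy classifier (often itself a model), the `ALERTS` list stands in for a pager or ticketing hook, and `guarded_completion()` is a hypothetical wrapper name.

```python
import logging

# Sketch of inline prompt/response logging with a compliance alert
# hook. classify() and ALERTS are illustrative placeholders.
logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm-audit")

ALERTS = []

def classify(text: str) -> str:
    """Placeholder classifier; in production this is itself a model."""
    return "insider_risk" if "material non-public" in text.lower() else "ok"

def guarded_completion(prompt: str, model_fn) -> str:
    response = model_fn(prompt)
    log.info("prompt=%r response=%r", prompt, response)  # audit trail
    if classify(prompt) != "ok" or classify(response) != "ok":
        ALERTS.append(prompt)  # page security & compliance in real time
    return response

out = guarded_completion(
    "Does this trade rely on material non-public information?",
    lambda p: "Escalating to compliance.",
)
print(len(ALERTS))
```

Because the wrapper sits between the app and the model, every prompt and response is logged and screened before anything reaches the user, which is exactly the leverage a third-party black box does not give you.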
Build vs. Buy Decision for Generative AI
Enterprises face critical decisions when opting for Generative AI solutions. The choice between building and buying depends on the nuances of each component.
We generally recommend building your own applications. If you are looking to buy, opt for products that allow you to use your own model (for reasons that will become clear below) rather than a mere ChatGPT wrapper.
This is a combination of build and buy. Get the core systems from a third party, much as you would install a database. Buying these functionalities frees up resources to focus on the core tasks that work well with AI. The key here is control and abstraction, and you need to avoid platform dependency. With so many new startups cropping up every day, and given the market environment, it's important to have options and to be able to move data from one platform to another without much fuss.
I personally feel this should be outsourced to a competent development agency. It's a balance between sensitivity, privacy, and quality. With a contracted third-party firm, you do not need in-house staff to curate, collect, and annotate data, or to build the pipelines and maintain the engine regularly. With a service contract, you can ensure data security and privacy while keeping the data in your cloud and on your premises.
We have been a leading voice in the industry on enterprises needing to build and deploy their own models, for more control, sensitivity, and privacy. Any publicly available model has guardrails shaped by the political climate, the leanings of its developers, and so on. Those cannot be changed.
E.g.: Anthropic's Claude 2 refuses to shorten a transcript of a YouTube podcast conversation, citing copyright. DALL-E 3 has an extensive list of such restrictions.
Many of these checks are good for an enterprise, but they are neither sufficient nor customizable.
Use case: You want a model to follow your own communication policy, data policy, and any other compliance and regulatory requirements. LLMs are quite poor in this regard and have to be trained hard on it. E.g.: GPT-4, put under pressure, has provided a trader with potentially private material information, which would count as insider trading. Fine for OpenAI, but a disaster for financial firms.
Another reason to build custom models is the ability to add new domain knowledge at the pretraining stage, which can then be fine-tuned further. That's what the aforementioned BloombergGPT did. This way, responses are more grounded in knowledge, unique, and actionable.
Getting Started with Generative AI Deployment
Enterprise deployments are never easy. This one is similar to how SAP is deployed at enterprises. Having been part of two such implementations, I know how grueling they can be. The returns, when done well, are 10x higher than SAP's.
1. Figuring out the requirements and use cases.
A few straightforward use cases, like generating marketing copy or blog posts, can be addressed by tools already on the market. The crux is identifying the use cases that drive revenue and can justify the return on investment. E.g.: enterprise search, which can relieve your best-performing employees from being interrupted again and again for small doubts, or natural-language analytics that democratizes access to data. A cool use case, I think, is enabling CxOs to get answers to queries that span multiple systems (e.g., total cost to land a customer, including salaries, meetings, and demo preps).
Some generic problems we have seen concern operational efficiencies pre- and post-sales, customer experience, new product introduction, generating insights from troves of data, and empowering the workforce to do more in less time. Many companies suffer from high attrition rates, low NPS scores, and dissatisfaction among customers, without ways to fix these quickly and effectively. Add to that complex and unoptimized processes, the number of meetings, silos, and communication gaps, all of which impact the topline and bottom line.
Questions to ask:
- What are the main factors driving our costs, and can automation through data retrieval, summarization, or generation help in reducing any of these costs?
- In which areas of our business do we handle significant volumes of unstructured data that require a lot of manual work?
- What is our current knowledge base stack and how do our employees pass on their know-how to their peers?
- How well is our customer-facing support performing?
- How much time does our sales team spend converting a customer, from the people involved, to meetings, to salaries, to bandwidth spent on demos and calls?
- Are there roles in our organization where the training and onboarding of new hires pose a bottleneck?
- Why are we not growing faster? What are the constraints where technology could accelerate our outputs?
- How much time does my Data Science team spend on building data pipelines and dashboards for different internal teams?
Return on AI Spend (ROAI):
After compiling a list of potential use cases for Generative AI, prioritize them for development by aggregating the total value they bring to your business, their feasibility in deployment (accounting for technical and operational complexity, necessary change management, and cost), and the potential risks they pose, including risks to your brand, customers, or security. Begin by concentrating on a select few high-impact, highly feasible, and relatively low-risk use cases as pilot projects. Once your organization gains insights from these initial pilots, you can then move on to other use cases in a phased approach.
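One simple way to operationalize this prioritization is a rough scoring pass over the candidate list. The use cases, scores, and risk weighting below are illustrative placeholders, not recommendations; the point is only that value, feasibility, and risk each get an explicit number.

```python
# Toy prioritization of use cases by value, feasibility, and risk.
# All scores (1-10) and the risk weight are illustrative placeholders.
use_cases = {
    "enterprise search": {"value": 8, "feasibility": 7, "risk": 2},
    "customer-facing chatbot": {"value": 9, "feasibility": 5, "risk": 7},
    "marketing copy": {"value": 4, "feasibility": 9, "risk": 3},
}

def score(name: str) -> int:
    uc = use_cases[name]
    # Reward value and feasibility; penalize risk twice as heavily.
    return uc["value"] * uc["feasibility"] - 2 * uc["risk"]

ranked = sorted(use_cases, key=score, reverse=True)
print(ranked[0])
```

Even a crude model like this forces the conversation the paragraph describes: a high-value but high-risk customer-facing pilot drops below a safer internal one.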
2. Gathering requirements
This is the same as any software evaluation phase. Common questions include:
- What does "success" look like at the pilot stage?
- What is the overall performance benchmark we would tag as good performance?
- What are the data requirements and how will they be fulfilled?
- Who will be the point person for these?
- What are the potential risks and how do we test against those risks?
- How much customization of a commercially available model is needed?
- What are the acceptable performance benchmarks in production?
- How do we compare multiple models for performance?
3. Baselining the capabilities pre-AI implementation
Consider the nature of your use case and the timelines you aim to achieve. It might be beneficial to invest in external expertise to expedite the readiness of your teams, particularly given the current competitive labor market for skilled professionals in this domain.
For use cases heavily reliant on custom models or finetuning, external workforces and specialized tools can significantly hasten progress, especially for the initial use cases. Many enterprises typically lack readily available data suitable for fine-tuning. Consequently, setting up teams to gather, clean, and annotate such data can pose a substantial challenge. External support can ease this burden considerably.
4. Experimenting with a publicly available tool and model
Once you are clear on requirements, identify an initial team and work with them for a couple of weeks to understand their workflows and how best to augment them. Use a publicly available tool or model, and document the frustrations. This will be super useful in the adoption stage.
5. Identifying the gaps and strategies to fix them.
There will be model gaps you identified in step 4. You need to understand how those can be filled: is it a workflow change, a data issue, or just an engineering issue? Devise a strategy to address these gaps, and understand the potential impact of filling them. In practice, this whole exercise amounts to filling the large knowledge hole a generic model like ChatGPT leaves.
6. Moving from experiment to production
Beyond technical aspects, it's crucial to ponder these additional questions when moving to production:
1. Organizational Process Changes: What modifications to existing organizational processes will be necessary to support these deployed use cases? Consider factors like alterations to approval chains.
2. Team and Business Unit Interaction: In what ways will Generative AI impact the interactions between teams or business units? Are there potential changes needed in the organizational structure to accommodate these shifts?
3. Regulatory and Compliance Readiness: How can we proactively address and stay ahead of regulatory and compliance implications associated with the deployment of Generative AI?
4. Data Collection and Engine Implementation: What strategies will we employ to collect data, and how can we establish a robust data engine for continual model improvement once the system is operational?
Enterprises stand on the cusp of a transformative era, where Generative AI holds the key to unlocking unprecedented possibilities. Strategic planning and investments in the right tools and talent are paramount for harnessing its full potential. Clio AI's Enterprise AI Platform empowers businesses to accelerate Generative AI application development and thrive in this new paradigm.