Generative AI has been a revelation over the past year. We have collected the resources we think are most useful for finding your way through the maze.

Transformers Paper

The research paper from Google researchers that started it all, titled "Attention Is All You Need".
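
As a rough illustration of the paper's core operation, scaled dot-product attention computes softmax(QK^T / sqrt(d_k)) V. The sketch below is a minimal pure-Python version for a single head with no masking and no learned projections; the function names and toy inputs are our own, not the paper's.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention without masking.

    queries: list of n_q vectors of size d_k
    keys:    list of n_k vectors of size d_k
    values:  list of n_k vectors of size d_v
    Returns a list of n_q vectors of size d_v.
    """
    d_k = len(keys[0])
    d_v = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)])
    return outputs

# Two identical keys receive equal weight, so the output is the mean of the values.
out = scaled_dot_product_attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[2.0], [4.0]])
```

A real transformer layer wraps this in multiple heads with learned query, key, and value projections.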

Transformers - Primer

Stephen Wolfram explains how the transformer architecture works. Really important if you want to go deep into Gen AI.

State of GPT

A great talk by Andrej Karpathy at Microsoft Build about how GPT-3, ChatGPT, and GPT-4 came to be.

Enterprise Deep-dives

Clio AI dives deep into use cases, deployment strategies, and more to bring you the best perspectives on implementing your AI strategy.

Chain of Thought Prompting Demystified

CoT prompting is an effective way to get larger models to solve complex tasks beyond the scope of simple instructions. This deep dive helps you develop an intuition, discusses different techniques, and helps figure out the applications for CoT.
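
To make the idea concrete, here is a minimal sketch of assembling a few-shot CoT prompt: each demonstration shows its reasoning before the answer, and the new question ends with a cue to reason step by step. The helper name, the prompt layout, and the example questions are our own illustrative assumptions, not a prescribed format.

```python
def build_cot_prompt(examples, question):
    """Build a few-shot chain-of-thought prompt.

    examples: list of (question, reasoning, answer) tuples used as demonstrations.
    question: the new question the model should answer step by step.
    """
    parts = []
    for q, reasoning, answer in examples:
        # Each demonstration spells out the intermediate reasoning, then the answer.
        parts.append(f"Q: {q}\nA: {reasoning} The answer is {answer}.")
    # The final cue nudges the model to produce its own reasoning chain.
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(parts)

demo = [(
    "A shop has 3 boxes with 4 pens each. It sells 5 pens. How many pens are left?",
    "3 boxes of 4 pens is 12 pens. Selling 5 leaves 12 - 5 = 7.",
    7,
)]
prompt = build_cot_prompt(
    demo,
    "A train has 6 cars with 20 seats each. 15 seats are empty. How many seats are taken?",
)
```

The resulting string can be sent to any text-completion or chat model; no particular provider API is assumed here.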

Generative AI for Enterprises - Use Cases, Experimentation, Iterations, and Deployments

Generative AI is transforming employees' habits and workflows across the board, and changing the way customers engage with enterprises. We do a deep dive on enterprises' AI strategy, help you grasp the essence of Generative AI and leverage these models actively. We end with how to think about deployment, decoding build-vs-buy decisions, and highlight use cases.

Research Insights

Our insights into the latest research and publications, with context on their features, breakthroughs, and business applications.

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

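Megalodon by Meta AI is a new LLM architecture that tackles two problems in transformers, the quadratic cost of attention and weak length extrapolation, and can support unlimited context length using a gated attention mechanism built on a complex exponential moving average.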

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Infini-attention uses a compressive memory and efficient retrieval to enable practically unbounded context for a transformer within a bounded memory footprint.
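
The bounded footprint comes from folding past key-value pairs into a fixed-size matrix instead of keeping them all. Below is a simplified sketch of such a memory using a linear-attention style update (add the outer product of an activated key with its value) and a normalized read. It assumes an ELU + 1 activation and leaves out the local attention, gating, and delta-rule update of the full method; the class and method names are our own.

```python
import math

def elu_plus_one(x):
    """ELU(x) + 1, which keeps activations strictly positive."""
    return x + 1.0 if x > 0 else math.exp(x)

class CompressiveMemory:
    """Fixed-size associative memory: d_key x d_value regardless of sequence length."""

    def __init__(self, d_key, d_value):
        self.M = [[0.0] * d_value for _ in range(d_key)]  # memory matrix
        self.z = [0.0] * d_key                            # normalization vector

    def update(self, keys, values):
        """Fold a segment's key-value pairs into the memory."""
        for k, v in zip(keys, values):
            sk = [elu_plus_one(x) for x in k]
            for i, ski in enumerate(sk):
                self.z[i] += ski
                for j, vj in enumerate(v):
                    self.M[i][j] += ski * vj

    def retrieve(self, query):
        """Read a value estimate for one query vector."""
        sq = [elu_plus_one(x) for x in query]
        norm = sum(a * b for a, b in zip(sq, self.z))
        d_value = len(self.M[0])
        return [sum(sq[i] * self.M[i][j] for i in range(len(sq))) / norm
                for j in range(d_value)]

memory = CompressiveMemory(d_key=2, d_value=2)
memory.update(keys=[[0.5, -1.0]], values=[[3.0, 4.0]])
recalled = memory.retrieve([0.2, 0.7])  # with one stored pair, this recovers its value
```

However many segments are folded in, the memory stays at d_key x d_value numbers plus the normalization vector, which is the point of the technique.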

ReFT: Representation Finetuning for Language Models

ReFT modifies hidden representations at selected layers of an LLM through learned interventions, instead of changing weights/parameters as PEFT methods do. The paper reports better performance on common benchmarks and tasks.

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

MoD (Mixture-of-Depths) is a new technique from Google DeepMind that dynamically allocates compute to input tokens, so that only some tokens go through a given layer's full computation.
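
A simplified sketch of the routing idea: a router scores every token, only the top-scoring tokens (up to a fixed capacity) pass through the expensive block, and the rest skip it via the residual path. Here `block` is a hypothetical per-token function standing in for a transformer block, and the router scores are supplied directly rather than learned.

```python
def mixture_of_depths_layer(tokens, router_scores, capacity, block):
    """Route only the `capacity` highest-scoring tokens through `block`.

    tokens:        list of token vectors (lists of floats)
    router_scores: one scalar score per token
    capacity:      maximum number of tokens processed by the block
    block:         function mapping a token vector to an update vector
    """
    k = min(capacity, len(tokens))
    ranked = sorted(range(len(tokens)), key=lambda i: router_scores[i], reverse=True)
    selected = set(ranked[:k])
    outputs = []
    for i, x in enumerate(tokens):
        if i in selected:
            # Processed tokens: residual plus the block output scaled by the router score.
            update = block(x)
            outputs.append([xi + router_scores[i] * ui for xi, ui in zip(x, update)])
        else:
            # Skipped tokens pass through unchanged and cost no block compute.
            outputs.append(list(x))
    return outputs

routed = mixture_of_depths_layer(
    tokens=[[1.0], [2.0], [3.0]],
    router_scores=[0.1, 0.9, 0.5],
    capacity=1,
    block=lambda x: [10.0],
)
```

With a capacity of 1, only the second token (the highest score) is updated; the other two are copied through untouched.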

RAFT: Adapting Language Model to Domain Specific RAG

With RAFT, an LLM is finetuned to ignore distracting documents and focus on the relevant information for any given task. Combined with CoT, this enables the model to assign the correct weight to relevant information, significantly improving generation quality and the output of downstream tasks.
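
A hedged sketch of how one such training example might be assembled: the context mixes distractor documents with the relevant ("oracle") document, the oracle is left out of some examples, and the target is a chain-of-thought answer. The function name, prompt layout, and default values (`p_oracle`, `num_distractors`) are illustrative assumptions, not the paper's exact recipe.

```python
import random

def make_raft_example(question, oracle_doc, distractor_docs, cot_answer,
                      p_oracle=0.8, num_distractors=3, rng=None):
    """Build one finetuning example with distractors and, usually, the oracle document."""
    rng = rng or random.Random()
    docs = rng.sample(distractor_docs, min(num_distractors, len(distractor_docs)))
    # With probability p_oracle the relevant document is included; otherwise the
    # context holds only distractors.
    if rng.random() < p_oracle:
        docs.append(oracle_doc)
    rng.shuffle(docs)
    context = "\n".join(f"[doc {i + 1}] {d}" for i, d in enumerate(docs))
    return {
        "prompt": f"{context}\n\nQuestion: {question}",
        "completion": cot_answer,  # reasoning that cites the relevant passage, then the answer
    }

example = make_raft_example(
    question="What is the refund window?",
    oracle_doc="Refunds are accepted within 30 days of purchase.",
    distractor_docs=["Shipping takes 5 days.", "Support is open on weekdays."],
    cot_answer="The policy says refunds are accepted within 30 days. Answer: 30 days.",
    p_oracle=1.0,
    rng=random.Random(0),
)
```

Examples built this way would then go through an ordinary supervised finetuning pipeline.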

Gecko - Versatile Text Embeddings Distilled from Large Language Models

Gecko, from Google DeepMind, is a new embedding model that uses a two-step LLM distillation process to create a high-quality training dataset, which leads to better model performance.

Jamba - A hybrid Transformer-Mamba Language Model

Jamba by AI21 Labs interleaves transformer layers with Mamba (SSM) layers and adds MoE layers at intervals to get a compute-efficient model with high throughput.

Evolutionary Optimization of Model Merging Recipes

Evolutionary model merge uses evolutionary algorithms to automatically discover effective ways to combine diverse open-source models. The resulting model harnesses the capabilities of its parent models without requiring extensive additional training data or compute, which makes foundation model development more accessible and efficient.
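
As a toy illustration of searching for a merge recipe, the sketch below uses a simple mutate-and-keep-the-best evolution strategy to find mixing weights for a weighted average of parameter vectors. This is a deliberately simplified stand-in: it is not the optimizer or the merging method used in the paper, and the "models" are just short lists of floats with a made-up fitness function.

```python
import random

def merge(models, weights):
    """Weighted average of parameter vectors (weights are normalized to sum to 1)."""
    total = sum(weights)
    return [sum(w * m[i] for w, m in zip(weights, models)) / total
            for i in range(len(models[0]))]

def evolve_merge_weights(models, fitness, generations=50, population=8, sigma=0.2, seed=0):
    """Search for mixing weights that maximize fitness(merged_parameters)."""
    rng = random.Random(seed)
    best = [1.0] * len(models)              # start from a uniform average
    best_score = fitness(merge(models, best))
    for _ in range(generations):
        for _ in range(population):
            # Mutate the current best recipe with Gaussian noise, keeping weights positive.
            candidate = [max(1e-6, w + rng.gauss(0.0, sigma)) for w in best]
            score = fitness(merge(models, candidate))
            if score > best_score:          # keep only improvements
                best, best_score = candidate, score
    return best, best_score

parents = [[0.0, 0.0], [1.0, 1.0]]
target = [0.8, 0.8]                         # hypothetical "ideal" parameters for the toy task

def toy_fitness(params):
    return -sum((p - t) ** 2 for p, t in zip(params, target))

uniform_score = toy_fitness(merge(parents, [1.0, 1.0]))
weights, score = evolve_merge_weights(parents, toy_fitness)
```

In practice the fitness function would be a benchmark score for the merged model, which is what makes the search expensive to evaluate but cheap compared with training.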

Dense X Retrieval: Proposition based Retrieval for RAG

Proposition-based retrieval performs significantly better than existing techniques such as paragraph-based and sentence-based retrieval in RAG apps. This paper by Tencent investigates and quantifies the improvement on Wikipedia articles.

PERL: Parameter Efficient RLHF

PERL, or Parameter Efficient Reinforcement Learning, could be a groundbreaking technique for reducing the memory and time needed to align a model before releasing it to the world. The paper shows that with LoRA you get close to the benchmark results of standard RLHF techniques, and hence comparable output quality. The business implication is lower cost: alignment can be done efficiently on premises on any open-source model.
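
The savings come from LoRA's low-rank update: rather than training a full d_out x d_in weight matrix W, only two small matrices B (d_out x r) and A (r x d_in) are trained, and the layer uses W + BA. The arithmetic below shows the reduction in trainable parameters for one layer; the layer size and rank are illustrative assumptions, not the paper's settings.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (full, lora) trainable parameter counts for one weight matrix."""
    full = d_in * d_out            # every entry of W is trainable
    lora = rank * (d_in + d_out)   # only A (rank x d_in) and B (d_out x rank) are trainable
    return full, lora

# Hypothetical 4096 x 4096 projection with rank-8 adapters.
full, lora = lora_param_counts(d_in=4096, d_out=4096, rank=8)
fraction = lora / full
print(f"full: {full:,}  lora: {lora:,}  fraction trainable: {fraction:.4%}")
```

Fewer trainable parameters means less optimizer state and gradient memory, which is where most of the reduction in training cost comes from.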

CaLM - Composition of LLMs by augmentation

CaLM provides composition for LLMs, similar to how libraries work in a programming language. It is a powerful way to combine the skills of multiple LLMs depending on the use case.

