March 29, 2024
4 mins

Evolutionary Optimization of Model Merging Recipes

Evolutionary Model Merge uses evolutionary algorithms to automatically discover effective ways to combine diverse open-source models. The resulting model inherits the capabilities of its parent models without requiring extensive additional training data or compute, which makes foundation model development more accessible and efficient.
Paper Link

Key Takeaways:

  • This paper introduces a new paradigm for automated model composition that harnesses the collective intelligence of existing open models to create powerful new models without requiring extensive additional training data or compute. This makes foundation model development more accessible and efficient.
  • The approach enables cross-domain merging, allowing models from disparate domains, such as language and math or language and vision, to be combined in novel ways that may exceed what human experts would design by hand. This expands the capabilities of the resulting models.
  • The evolved models achieve state-of-the-art performance on various benchmarks, even without being explicitly optimized for those tasks. For example, their Japanese Math LLM and Japanese VLM outperform larger models on Japanese language and vision benchmarks. This demonstrates the effectiveness of the evolutionary merging approach.
  • Surprisingly, the evolved 7B parameter Japanese LLM surpasses some 70B parameter Japanese LLMs on benchmarks, showing high efficiency and generalization of the method. The evolved Japanese VLM also exhibits strong culturally-aware understanding.

Introduction

Model merging is a recent development in the LLM community. Multiple LLMs can be strategically combined into a single architecture that retains the capabilities of all the original models, without requiring any additional training or compute. In other words, the capabilities are superposable, which makes merging an extremely cost-effective approach for developing new models. The Open LLM Leaderboard, where merged models occupy many of the top spots, shows how significant this finding is. CaLM, as we discussed in our previous analysis, achieves something similar without merging weights.
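
To make the idea concrete, here is a minimal sketch of the simplest merging recipe: a plain weighted average of two same-architecture checkpoints. The model names are hypothetical placeholders, and this is the naive baseline rather than the paper's method (the paper builds on more sophisticated recipes such as TIES-merging with DARE):

```python
# Minimal sketch of naive parameter-space merging: a weighted average of two
# models' weights. Assumes both checkpoints share the same architecture;
# "org/model-a" and "org/model-b" are hypothetical names.
from transformers import AutoModelForCausalLM

def linear_merge(model_a, model_b, alpha=0.5):
    """Set model_a's weights to alpha * A + (1 - alpha) * B and return it."""
    state_b = model_b.state_dict()
    merged = {name: alpha * t + (1.0 - alpha) * state_b[name]
              for name, t in model_a.state_dict().items()}
    model_a.load_state_dict(merged)
    return model_a

base = AutoModelForCausalLM.from_pretrained("org/model-a")
donor = AutoModelForCausalLM.from_pretrained("org/model-b")
merged = linear_merge(base, donor, alpha=0.6)  # leans toward model A
```

Linear averaging only works when the architectures match tensor-for-tensor; more advanced recipes like TIES-merging and DARE additionally resolve interference between the parents' weight deltas.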

Model merging in enterprise context

However, model merging today is something of an art, relying on a model maker's intuition and instinct to select models and merging recipes that yield a more capable result. Typically, the model maker needs domain knowledge of the candidate models, along with good intuition about open-source models and what their training data may contain. Given the vast diversity of models, benchmarks, and evals in the open-source community, this instinctive approach can only go so far.

Evolutionary approach to model merging

This research paper presents a novel approach called Evolutionary Model Merge that uses evolutionary algorithms to automatically discover optimal ways to combine diverse open-source models to create new foundation models with desired capabilities.

This work makes several key contributions to the field of foundation model development:

  1. Automated Model Composition: This approach harnesses the collective intelligence of existing open models, enabling the creation of powerful models without the need for extensive training data or compute.
  2. Cross-Domain Merging: Models from disparate domains (e.g., language and math, or language and vision) can be merged in ways that potentially exceed the capabilities achievable through conventional human design strategies.
  3. State-of-the-Art Performance: Models achieve state-of-the-art performance on various benchmarks, even without explicit optimization for those tasks.
  4. High Efficiency and Surprising Generalizability: The merged 7B-parameter LLM surpasses the performance of some previous 70B-parameter Japanese LLMs on benchmark datasets, highlighting the method's high efficiency and surprising generalization capability.
  5. Culturally-Aware VLM: The generated Japanese VLM achieves top results when tested on a domestically-sourced dataset of Japanese image-description pairs, demonstrating its ability to handle Japanese culture-specific content.

Approach and Methodology

At a high level, the paper does the following:

  • They propose optimizing model merging in both the parameter space (weights) and data flow space (inference path).
  • For the parameter space, they use evolutionary search to optimize layer-wise merging parameters (see the first sketch after this list).
  • For the data flow space, they evolve the sequence of layers that tokens pass through, allowing cross-model layer transitions (see the second sketch after this list).
  • They demonstrate their approach by evolving a Japanese Math LLM by merging a Japanese LLM with English Math LLMs, and a Japanese VLM by merging a Japanese LLM with an English VLM.
  • Extensive experiments show their evolved models achieve SOTA results on Japanese language and vision benchmarks.
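
To ground the parameter-space search, here is a minimal sketch of how layer-wise mixing coefficients could be evolved with CMA-ES (via the `cma` package), the evolutionary strategy the paper reports using. The helper names, the extra slot for non-layer parameters, and the user-supplied `evaluate` scoring function are assumptions for illustration, not the paper's code; the paper optimizes the parameters of more sophisticated recipes such as TIES-merging with DARE rather than plain interpolation.

```python
# A minimal sketch of parameter-space evolution, assuming both parents share
# one architecture and `evaluate` is any user-supplied function that scores a
# merged state dict on the target task (higher is better). Illustrative only.
import re
import cma  # pip install cma

def layer_of(param_name, num_layers):
    # Map 'model.layers.<i>.*' to layer i; embeddings/head share a final slot.
    m = re.search(r"layers\.(\d+)\.", param_name)
    return int(m.group(1)) if m else num_layers

def merge_layerwise(state_a, state_b, alphas, num_layers):
    # Blend each parameter as alpha_l * A + (1 - alpha_l) * B, per layer l.
    return {
        name: alphas[layer_of(name, num_layers)] * t
        + (1.0 - alphas[layer_of(name, num_layers)]) * state_b[name]
        for name, t in state_a.items()
    }

def evolve_merge(state_a, state_b, num_layers, evaluate, sigma=0.2):
    def fitness(alphas):
        # CMA-ES minimizes, so negate the task score.
        merged = merge_layerwise(state_a, state_b, alphas, num_layers)
        return -evaluate(merged)

    es = cma.CMAEvolutionStrategy((num_layers + 1) * [0.5], sigma)
    while not es.stop():
        candidates = es.ask()
        es.tell(candidates, [fitness(x) for x in candidates])
    return es.result.xbest  # best layer-wise mixing coefficients found
```

The fitness function is simply benchmark performance of the candidate merge, which is why no gradient or additional training is needed.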
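The data-flow-space idea can be sketched the same way: the genome is an ordered sequence of (model_id, layer_index) pairs, and inference routes hidden states through layers drawn from either parent. In the toy version below, `nn.Linear` blocks stand in for real transformer decoder layers (which would also take attention masks and position ids), and the example path is hypothetical; the paper additionally learns a scaling applied to hidden states when they cross between models.

```python
# A toy sketch of data-flow-space merging: the genome is an ordered path of
# (model_id, layer_index) pairs, and inference routes hidden states through
# layers drawn from either parent model. nn.Linear blocks are stand-ins for
# real transformer decoder layers.
import torch
import torch.nn as nn

layers_by_model = {
    "ja":   nn.ModuleList([nn.Linear(16, 16) for _ in range(6)]),
    "math": nn.ModuleList([nn.Linear(16, 16) for _ in range(6)]),
}

def forward_along_path(hidden, path):
    # Route the hidden states through the evolved cross-model layer sequence.
    for model_id, layer_idx in path:
        hidden = layers_by_model[model_id][layer_idx](hidden)
    return hidden

# One candidate genome the evolutionary search might propose (hypothetical):
path = [("ja", 0), ("ja", 1), ("math", 1), ("ja", 2), ("math", 3)]
out = forward_along_path(torch.randn(1, 4, 16), path)
```

Because the search space of paths grows combinatorially with depth, an evolutionary algorithm is a natural fit: it only needs to evaluate candidate paths, not differentiate through them.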

Key Findings

The key findings are that evolutionary model merging is a powerful, automated way to combine the knowledge in diverse open-source models to create new foundation models with expanded capabilities in a compute-efficient manner. The evolved models show surprising generalization and cultural understanding despite not being explicitly trained for the downstream tasks.

Business Implications

In terms of business implications, this work democratizes foundation model development by leveraging the open-source model ecosystem. It provides a path for quickly prototyping capable models by combining existing building blocks rather than training from scratch. Organizations can use this evolutionary approach to develop proof-of-concept models to assess feasibility before investing heavily in custom models. The efficiency and generalization of the evolved models make them compelling for real-world applications.

Overall, this paper presents an exciting new direction for automated foundation model creation that makes it more accessible while pushing the boundaries of model capabilities through cross-domain composition. The evolutionary model merging paradigm has significant implications for accelerating foundation model development and deployment.


Why Clio AI?

Unlock the most obvious-yet-hidden-in-plain-sight growth hack - enable your employees to work on important things, and reduce their cognitive load and time to resolve blockers.

Fast, efficient, and in-context information to make every employee a super performer.

Spend time thinking not searching. Get a demo today.

By signing up for a demo, you agree to our Privacy Policy.