xLAM: A Family of Large Action Models to Empower AI Agents
**Advancements in Training Autonomous Agents with xLAM: A Comprehensive Approach**
---
### **Introduction**
In this article, we explore our approach to training xLAM, a series of open-source agent models designed to enhance the performance of autonomous agents. Our methodology centers on the data pipeline, emphasizing data unification, augmentation, and synthesis to improve model robustness and versatility. By addressing challenges such as limited data diversity and overfitting, we show how these techniques significantly boost model performance across environments and tasks. We also highlight the role of scalable, high-quality data synthesis in achieving state-of-the-art results, including top rankings on the Berkeley Function Calling Leaderboard.
Our work contributes to the open-source community by sharing insights into effective data processing and synthesis, helping to bridge the gap between open-source and proprietary models. We evaluate xLAM models against public benchmarks, showcasing their strong performance and versatility in tasks ranging from function calling to complex multi-turn interactions.
---
### **Data Unification: Standardizing Agent Data Sets**
One of the cornerstone innovations in our approach is the unification of diverse agent data sets. Existing data sets often come from various environments and formats, introducing noise and complicating data augmentation and verification. To address this, we propose a universal data format that standardizes agent data, reducing noise and enhancing model performance.
Our unified format includes several key components (a concrete sketch follows the list):
- **Task Instruction**: Provides clear guidance for the model.
- **Available Tools**: Defines the actions the agent can take.
- **Format Instruction**: Specifies how the agent should present its responses.
- **Few-Shot Examples**: Offers examples to guide the model.
- **Query and Steps**: Organizes the agent's output, environment feedback, and user follow-up into a structured format.
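To make this concrete, here is a minimal sketch of what one record in such a unified format might look like. The field names and structure below are illustrative assumptions for demonstration, not the exact schema used in xLAM:

```python
# Illustrative sketch of a unified agent-data record.
# Field names are assumptions; the actual xLAM schema may differ.
unified_record = {
    "task_instruction": "You are an assistant that can call tools to answer user queries.",
    "available_tools": [
        {
            "name": "get_weather",
            "description": "Return the current weather for a city.",
            "parameters": {"city": {"type": "string", "required": True}},
        }
    ],
    "format_instruction": "Reply with a JSON object: {\"thought\": ..., \"tool_calls\": [...]}.",
    "few_shot_examples": [],  # optional in-context demonstrations
    "query": "What's the weather in Palo Alto?",
    "steps": [
        {
            "thought": "The user wants current weather; call get_weather.",
            "tool_calls": [{"name": "get_weather", "arguments": {"city": "Palo Alto"}}],
            "observation": "{\"temp_f\": 68, \"condition\": \"sunny\"}",
            "user_follow_up": None,
        }
    ],
}
```

Because every environment's data is projected into one shape like this, augmentation and verification routines can be written once and applied everywhere.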
This format is adaptable to various environments and tasks, enabling our data pipeline to handle large datasets effectively. The modular design supports detailed data augmentation and quality verification, which are critical for improving data quality.
For data augmentation, we categorize our strategies into **prompt format augmentation** and **instruction-following augmentation**. These techniques increase data diversity by shuffling the order of available tools and rephrasing instructions while preserving their meaning. We also apply rigorous quality checks to identify and eliminate errors such as undefined function calls, incorrect argument types, and hallucinated arguments; a brief sketch of both ideas follows.
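As a hedged illustration, the sketch below shuffles tool order (a prompt-format augmentation) and applies one rule-based quality check for undefined function calls. Both helpers are hypothetical and operate on the record structure sketched earlier; they are not the actual xLAM pipeline code:

```python
import random

def shuffle_tools(record, seed=None):
    """Prompt-format augmentation: permute the order of available tools.

    The answer is unchanged, so the model learns to be order-invariant.
    (Hypothetical helper, not the exact xLAM pipeline code.)
    """
    rng = random.Random(seed)
    augmented = dict(record)
    augmented["available_tools"] = rng.sample(
        record["available_tools"], k=len(record["available_tools"])
    )
    return augmented

def has_undefined_calls(record):
    """Rule-based quality check: flag tool calls that reference tools
    not listed in `available_tools` (one of the error types removed
    during data cleaning)."""
    defined = {tool["name"] for tool in record["available_tools"]}
    return any(
        call["name"] not in defined
        for step in record["steps"]
        for call in step.get("tool_calls", [])
    )
```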
---
### **Data Synthesis: Overcoming Limitations in Public Data Sets**
Publicly available data sets often suffer from limitations: they are static, generated by weaker models, and focused on single function calls. To address these issues, we introduce **APIGen**, a systematic data synthesis framework that generates high-quality, verifiable data sets using a multi-stage verification process (sketched after the list):
1. **Format Verification**: Ensures data conforms to our unified format.
2. **Execution Verification**: Validates the accuracy of function calls.
3. **Semantic Verification**: Checks for logical consistency and correctness.
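A minimal sketch of how such a multi-stage filter might be chained is shown below. The three stage functions are assumptions standing in for APIGen's actual checks, and `executor` and `judge` are hypothetical callables wrapping a real API backend and an LLM-based verifier:

```python
def format_ok(sample):
    """Stage 1: structural check against the unified format (sketch)."""
    required = {"task_instruction", "available_tools", "query", "steps"}
    return required.issubset(sample)

def executes_ok(sample, executor):
    """Stage 2: actually run each generated call and confirm it succeeds.
    `executor` is a hypothetical callable wrapping the real API backend."""
    try:
        for step in sample["steps"]:
            for call in step.get("tool_calls", []):
                executor(call["name"], **call["arguments"])
        return True
    except Exception:
        return False

def semantically_ok(sample, judge):
    """Stage 3: ask a checker model whether the calls actually answer
    the query. `judge` is a hypothetical verifier returning True/False."""
    return judge(sample["query"], sample["steps"])

def verify(samples, executor, judge):
    """Keep only samples that pass all three stages, in order."""
    return [
        s for s in samples
        if format_ok(s) and executes_ok(s, executor) and semantically_ok(s, judge)
    ]
```

Running the cheap structural check first means the more expensive execution and semantic stages only see plausible candidates.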
Using 3,673 executable APIs across 21 categories collected from ToolBench, we generate 60,000 high-quality samples. These samples are further enhanced by integrating diverse instruction-tuning data sets and applying rule-based techniques to eliminate low-quality data. This approach significantly improves the robustness and applicability of the data set, making it well suited for supervised fine-tuning (SFT) and direct preference optimization (DPO).
---
### **xLAM Model Series: Versatile and Accessible Agent Models**
The xLAM model series is designed to provide balanced performance across a wide range of tasks, from complex multi-turn interactions to function-calling applications. Our main model series, xLAM, is based on the Mixtral-Instruct models and is trained on uniformly sampled data to achieve versatility. We also create two specialized function-calling models, **xLAM-7B-fc-r** and **xLAM-1B-fc-r**, based on DeepSeek-Coder models. These models are designed to be accessible, with smaller variants capable of running on a single GPU; a hedged loading sketch follows.
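As a usage sketch (assuming the checkpoint name published on Hugging Face and standard `transformers` APIs), a small function-calling model can be loaded on a single GPU roughly like this:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes the checkpoint name published on Hugging Face; adjust if it differs.
model_id = "Salesforce/xLAM-1b-fc-r"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the 1B model on one GPU
    device_map="auto",
)

# Real use requires the model's tool-call prompt format; this is a bare sketch.
prompt = "What's the weather in Palo Alto?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```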
We evaluate our models across four rigorous benchmarks:
- **WebShop**: Simulates an e-commerce experience with 250 test cases.
- **ToolQuery**: Assesses information retrieval skills across weather, movie, and academic settings.
- **ToolBench**: Evaluates multi-turn reasoning and interactive capabilities with 1,000 test cases.
- **Berkeley Function Calling Leaderboard**: Provides a comprehensive framework for assessing function calling abilities in various programming languages.
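As a simple illustration of how success rates on such benchmarks are typically computed, here is a generic harness sketch; `agent` and each case's `is_success` checker are hypothetical stand-ins for a real environment such as WebShop or ToolQuery, not the official evaluation code:

```python
def evaluate(agent, test_cases):
    """Run each test case through the agent and report the fraction
    judged successful by the environment's own success checker."""
    successes = 0
    for case in test_cases:
        trajectory = agent.run(case.query, tools=case.tools)
        if case.is_success(trajectory):
            successes += 1
    return successes / len(test_cases)
```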
Our models demonstrate strong performance across these benchmarks, with the xLAM-7B-r model achieving the highest success rate in the WebShop environment and xLAM models consistently outperforming state-of-the-art baselines on ToolQuery and ToolBench.
---
### **ToolBench: Evaluating Model Performance**
In ToolBench, our xLAM models show strong performance, outperforming both ToolLLaMA-V2 and GPT-3.5 Turbo across all test scenarios. The models excel at multi-turn reasoning and complex tool use, handling both familiar and unfamiliar tasks effectively. An ablation study on the 7B models highlights the effectiveness of our data augmentation and cleaning processes: augmented data improves performance by 2.3% on ToolBench, 5.8% on WebShop, and 18.3% on ToolQuery, and adding data cleaning further boosts ToolQuery performance by 23.4%.
---
### **Conclusion**
In this article, we introduced the xLAM model series, a family of open-source agent models designed for a range of applications. By emphasizing data unification, augmentation, and synthesis, we address critical challenges in agent model training and offer competitive alternatives to proprietary models. Our systematic data synthesis framework, APIGen, together with rigorous quality verification, ensures high-quality training data, enabling our models to achieve state-of-the-art performance across multiple benchmarks.
The xLAM series demonstrates strong capabilities in multi-turn reasoning, function calling, and adaptation to diverse tasks, making it a powerful foundation for real-world applications. By open-sourcing these models, we contribute to the development of open-source agent models and share valuable insights into effective data processing and synthesis techniques.