In this project, we build a machine learning workflow using AG2. The workflow covers data analysis, preprocessing, and model training.
Machine learning workflows typically involve several key steps:
- Data Analysis and Exploration: Understanding dataset size, columns, and distributions.
- Data Preprocessing: Cleaning data, handling missing values, and encoding categorical variables.
- Model Training: Training a model, comparing different models, and tuning hyperparameters.
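The first two steps can be illustrated with a minimal, standard-library-only sketch. This is not the project's code: the inline CSV, its column names, and the helper names `explore` and `preprocess` are invented for illustration, and a real workflow would more likely use a data-frame library. Model training is left out to keep the sketch short.

```python
import csv
import io
from statistics import mean

# Toy stand-in for a dataset such as house_prices_train.csv; the columns
# and values below are invented for illustration only.
SAMPLE_CSV = """LotArea,Neighborhood,SalePrice
8450,CollgCr,208500
,Veenker,181500
11250,CollgCr,223500
"""


def explore(rows):
    """Report row count, column names, and missing values per column."""
    columns = list(rows[0].keys())
    missing = {c: sum(1 for r in rows if r[c] == "") for c in columns}
    return {"n_rows": len(rows), "columns": columns, "missing": missing}


def preprocess(rows, numeric, categorical):
    """Impute numeric gaps with the column mean and one-hot encode categoricals."""
    means = {c: mean(float(r[c]) for r in rows if r[c] != "") for c in numeric}
    levels = {c: sorted({r[c] for r in rows}) for c in categorical}
    out = []
    for r in rows:
        # Numeric columns: parse the value, or fall back to the column mean.
        rec = {c: float(r[c]) if r[c] != "" else means[c] for c in numeric}
        # Categorical columns: one 0/1 indicator per observed level.
        for c in categorical:
            for level in levels[c]:
                rec[f"{c}={level}"] = 1.0 if r[c] == level else 0.0
        out.append(rec)
    return out


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = explore(rows)
features = preprocess(rows, numeric=["LotArea"], categorical=["Neighborhood"])
```

Here `report` holds the exploration summary and `features` holds one numeric record per row, ready to be passed to a model.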
🏗️ System Architecture
State Machine Workflow
The system follows a state machine design with intelligent transitions between ML workflow stages:
[Diagram: State Machine Workflow. Only part of the diagram source survives; it defines the following.]
- Node ids: A and F are the start and end nodes; B, C, D, E are states; B1, C1, D1, E1 are agents; B2, C2, D2 are code executors.
- C2 → Train State (D) on "Ready for Training"; C2 → B on "Need More Analysis"; C2 → C1 on "Error"
- Train State (D) → Model Trainer (D1) → Code Executor (D2)
- Code Executor (D2) → Model Trainer (D1) on "< 2 Trials" or "Error"; Code Executor (D2) → Summarize State (E) on "≥ 2 Trials"
- Summarize State (E) → Summarizer (E1) → End State (F)
- Pattern nodes: Custom Speaker Selection Method (G), StateFlow Pattern Transition Logic (H)
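The Train and Summarize transitions from the diagram can be sketched as a plain transition function. This is an illustrative sketch, not the project's implementation: the function names, the `outcome` labels, and the choice that a failed execution does not count as a trial are all assumptions. In AG2, logic of this kind would typically live in the function passed to a group chat as `speaker_selection_method`.

```python
MAX_TRIALS = 2  # threshold taken from the "< 2 Trials" / ">= 2 Trials" edges


def next_node(current: str, outcome: str = "ok", trials: int = 0) -> str:
    """Return the next node for the Train and Summarize part of the diagram.

    `current` is a node label from the diagram; `outcome` ("ok" or "error")
    and `trials` (successful training runs so far) are hypothetical inputs.
    """
    if current == "Train State":
        return "Model Trainer"
    if current == "Model Trainer":
        return "Code Executor"
    if current == "Code Executor":
        if outcome == "error":
            return "Model Trainer"  # retry after a failed execution
        if trials < MAX_TRIALS:
            return "Model Trainer"  # run another training trial
        return "Summarize State"
    if current == "Summarize State":
        return "Summarizer"
    if current == "Summarizer":
        return "End State"
    raise ValueError(f"unknown node: {current!r}")


def run(outcomes):
    """Walk the graph, consuming one executor outcome per Code Executor visit."""
    node, trials, path = "Train State", 0, ["Train State"]
    pending = iter(outcomes)
    while node != "End State":
        outcome = "ok"
        if node == "Code Executor":
            outcome = next(pending)
            if outcome == "ok":
                trials += 1  # assumption: only successful runs count as trials
        node = next_node(node, outcome, trials)
        path.append(node)
    return path
```

Calling `run(["ok", "ok"])` visits the Model Trainer and Code Executor twice before moving on to the Summarize State; an `"error"` outcome adds one more trainer/executor round.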
📋 State Machine Details
Explore State
Analyze the dataset structure, distributions, and characteristics to understand the data landscape.
Preprocess State
Clean and prepare data including handling missing values, encoding categoricals, and feature scaling.
Train State
Train and compare multiple ML models with different algorithms and hyperparameters.
Summarize State
Generate a comprehensive workflow summary and integrate all successful code snippets.
🤖 AG2 Features
🎭 Custom Speaker Transitions
State-driven agent selection using a custom speaker_selection_method for workflow control
🌊 StateFlow Design
Build state-driven workflows with intelligent transitions based on execution results
⚡ Code Execution
Jupyter-based code execution environment for interactive ML development
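Transitions that depend on execution results need a way to tell success from failure. The sketch below assumes the executor's reply contains a line such as `exitcode: 0 (execution succeeded)`, which is the shape AG2's built-in code execution replies commonly take; verify this against the version you use. The function name and the returned labels are hypothetical.

```python
import re

# Matches a line like "exitcode: 0 (execution succeeded)".
_EXITCODE = re.compile(r"^exitcode:\s*(-?\d+)", re.MULTILINE)


def execution_outcome(reply: str) -> str:
    """Classify an executor reply as "ok", "error", or "unknown"."""
    match = _EXITCODE.search(reply)
    if match is None:
        return "unknown"  # no exit code found, e.g. no code block was run
    return "ok" if int(match.group(1)) == 0 else "error"
```

A workflow could route `"error"` back to the agent that wrote the code and treat `"unknown"` conservatively.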
🧠 LLM Decision Making
AI-powered decisions on workflow readiness and state transitions
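One way to turn an LLM's free-text reply into a transition is to look for a fixed decision label. The sketch below uses the two labels that appear in the workflow diagram ("Ready for Training" and "Need More Analysis"); the function name, the last-label-wins rule, and the conservative default are assumptions for illustration, not the project's actual logic.

```python
DECISIONS = ("Ready for Training", "Need More Analysis")


def parse_decision(reply: str, default: str = "Need More Analysis") -> str:
    """Return the decision label that appears last in the reply text.

    Falls back to `default` when no label is present, so an ambiguous
    reply never advances the workflow by accident.
    """
    text = reply.lower()
    best, best_pos = default, -1
    for label in DECISIONS:
        pos = text.rfind(label.lower())
        if pos > best_pos:
            best, best_pos = label, pos
    return best
```

Prompting the model to end its reply with exactly one of the labels makes this kind of parsing more dependable.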
📋 Prerequisites
- Python 3.12 or higher
- OpenAI API key
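A quick way to confirm both prerequisites before running anything is a small check like the one below. It is a sketch: the function name is hypothetical, and it assumes the key is supplied through the conventional `OPENAI_API_KEY` environment variable, which this README does not state explicitly.

```python
import os
import sys


def check_prerequisites(env=None, version=None):
    """Return a list of human-readable problems; an empty list means all good."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version)
    problems = []
    if version < (3, 12):
        problems.append(f"Python 3.12+ required, found {version[0]}.{version[1]}")
    # Assumption: the API key is read from OPENAI_API_KEY.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Prerequisite missing:", problem)
```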
⚙️ Installation
- Clone and navigate to the folder:
- Install dependencies:
- Set up environment variables:
🚀 Usage
Automated Workflow: The system runs the whole workflow automatically on the dataset (house_prices_train.csv). It will:
- Analyze the dataset
- Preprocess the data
- Train and compare multiple models
- Generate performance visualizations
- Output a comprehensive summary
📞 Contact
For more information or any questions, please refer to the documentation or reach out to us!
📄 License
This project is licensed under the Apache License 2.0. See the LICENSE file for details.