An MCP server for data analysis workflows, covering reading, preprocessing, feature engineering, model selection, visualization, and hyperparameter tuning.

AutoML is an intelligent automated machine learning platform that provides comprehensive data analysis, preprocessing, model selection, and hyperparameter tuning capabilities through Model Context Protocol (MCP) tools.
```
AutoML/
├── data/                      # Sample datasets
│   ├── Ai.csv
│   ├── Calories.csv
│   ├── Cost.csv
│   ├── Digital.csv
│   ├── Electricity.csv
│   ├── ford.csv
│   ├── Habits.csv
│   ├── heart.csv
│   ├── Lifestyle.csv
│   ├── Mobiles.csv
│   ├── Personality.csv
│   ├── Salaries.csv
│   ├── Shopper.csv
│   ├── Sleep.csv
│   ├── cat.csv
│   ├── test.csv
│   └── train.csv
├── tools/
│   └── all_tools.py           # MCP tool definitions
├── utils/
│   ├── before_model.py        # Feature preparation
│   ├── details.py             # Data information
│   ├── external_test.py       # External data test with XGBoost
│   ├── feature_importance.py  # Feature importance analysis
│   ├── hyperparameter.py      # Hyperparameter tuning
│   ├── model_selection.py     # Model selection and evaluation
│   ├── prediction.py          # Prediction utilities
│   ├── preprocessing.py       # Data preprocessing
│   ├── read_csv_file.py       # CSV reading utilities
│   └── visualize_data.py      # Visualization functions
├── main.py                    # Application entry point
├── server.py                  # MCP server configuration
├── requirements.txt           # Python dependencies
└── README.md                  # This file
```
Clone the repository:

```bash
git clone https://github.com/emircansoftware/AutoML.git
cd AutoML
```

Install dependencies:

```bash
# Using pip
pip install -r requirements.txt
pip install uv
```
In `utils/read_csv_file.py`, update the `path` variable to match your own project directory on your computer:

```python
# Example (a raw string takes single backslashes):
path = r"C:\YOUR\PROJECT\PATH\AutoML\data"
```
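Instead of editing the hard-coded path by hand, the data directory could also be resolved at runtime. This is a hypothetical alternative, not part of the project; the `AUTOML_DATA_DIR` variable name and `csv_path` helper are illustrative assumptions:

```python
import os
from pathlib import Path

# Read the data directory from an environment variable, falling back
# to a relative "data" folder next to the working directory.
DATA_DIR = Path(os.environ.get("AUTOML_DATA_DIR", "data"))

def csv_path(file_name: str) -> str:
    """Return the full path to a CSV file inside the data directory.

    Accepts names with or without the .csv extension.
    """
    name = file_name if file_name.endswith(".csv") else f"{file_name}.csv"
    return str(DATA_DIR / name)
```

This keeps the same code working on any machine, since nothing Windows-specific is baked in.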
In Claude Desktop, add the following block to your `claude_desktop_config.json` file and adjust the paths to match your own system:

```json
{
  "mcpServers": {
    "AutoML": {
      "command": "uv",
      "args": [
        "--directory",
        "C:\\YOUR\\PROJECT\\PATH\\AutoML",
        "run",
        "main.py"
      ]
    }
  }
}
```
You can now start your project from Claude Desktop.
- `mcp[cli]>=1.9.4` - Model Context Protocol for tool integration
- `pandas>=2.3.0`, `pyarrow>=20.0.0`, `numpy>=2.3.1`
- `scikit-learn>=1.3.0`, `xgboost>=2.0.0`, `lightgbm>=4.3.0`
- `catboost` (for CatBoost models)

```python
from server import mcp

# Run the server
mcp.run()
```
The platform provides the following MCP tools:

- `information_about_data(file_name)`: Give detailed information about the data
- `reading_csv(file_name)`: Read the CSV file
- `visualize_correlation_num(file_name)`: Visualize the correlation matrix for numerical columns
- `visualize_correlation_cat(file_name)`: Visualize the correlation matrix for categorical columns
- `visualize_correlation_final(file_name, target_column)`: Visualize the correlation matrix after preprocessing
- `visualize_outliers(file_name)`: Visualize outliers in the data
- `visualize_outliers_final(file_name, target_column)`: Visualize outliers after preprocessing
- `preprocessing_data(file_name, target_column)`: Preprocess the data (remove outliers, fill nulls, etc.)
- `prepare_data(file_name, target_column, problem_type)`: Prepare the data for models (encoding, scaling, etc.)
- `models(problem_type, file_name, target_column)`: Select and evaluate models based on problem type
- `visualize_accuracy_matrix(file_name, target_column, problem_type)`: Visualize the confusion matrix for predictions
- `best_model_hyperparameter(model_name, file_name, target_column, problem_type, n_trials, scoring, random_state)`: Tune the hyperparameters of the best model
- `test_external_data(main_file_name, target_column, problem_type, test_file_name)`: Test external data with the best model and return predictions
- `predict_value(model_name, file_name, target_column, problem_type, n_trials, scoring, random_state, input)`: Predict the value of the target column for new input
- `feature_importance_analysis(file_name, target_column, problem_type)`: Analyze the feature importance of the data using XGBoost

Example workflow:

```python
# 1. Analyze your data
info = information_about_data("data/heart.csv")

# 2. Preprocess the data
preprocessed = preprocessing_data("data/heart.csv", "target")

# 3. Prepare features for classification
features = prepare_data("data/heart.csv", "target", "classification")

# 4. Train and evaluate models
results = models("classification", "data/heart.csv", "target")

# 5. Visualize results
confusion_matrix = visualize_accuracy_matrix("data/heart.csv", "target", "classification")

# 6. Optimize best model
best_model = best_model_hyperparameter("RandomForestClassifier", "data/heart.csv", "target", "classification", 100, "accuracy", 42)
```
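The preprocessing step above removes outliers and fills nulls. As an illustration of one common approach to that (median fill plus IQR-based clipping), which is a sketch and not necessarily what `utils/preprocessing.py` actually does:

```python
import pandas as pd

def simple_preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative preprocessing: fill nulls, then clip IQR outliers."""
    out = df.copy()
    for col in out.select_dtypes("number").columns:
        # Fill missing values with the column median.
        out[col] = out[col].fillna(out[col].median())
        # Clip values lying outside 1.5 * IQR beyond the quartiles.
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return out
```

Clipping (rather than dropping rows) preserves the dataset size, which matters when the target column must stay aligned with the features.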
The project includes various sample datasets for testing (see the `data/` directory above).

Key files:

- `server.py`
- `utils/model_selection.py`
- `utils/preprocessing.py`
- `utils/visualize_data.py`
We welcome contributions! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

1. Create your feature branch (`git checkout -b feature/AmazingFeature`)
2. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
3. Push to the branch (`git push origin feature/AmazingFeature`)
4. Open a Pull Request

This project is licensed under the MIT License - see the LICENSE file for details.
If you encounter any issues or have questions: