Today I'm going to share a powerful Python library called AutoGluon.
https://github.com/autogluon/autogluon
AutoGluon is an AutoML toolkit for deep learning that automates end-to-end machine learning tasks, letting you achieve strong predictive performance in your applications with just a few lines of code.
First experience
Installing AutoGluon
We can install it directly with pip:
pip install autogluon
Loading datasets
We can load datasets using TabularDataset.
from autogluon.tabular import TabularDataset, TabularPredictor
train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')
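TabularDataset behaves like a pandas DataFrame, so a quick look at the loaded data works the way you would expect. A minimal sketch; the "class" column here is the label used later in this article:
print(train_data.shape)                     # number of rows and columns
print(train_data.head())                    # first few rows
print(train_data['class'].value_counts())   # distribution of the target column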
Model building
Before building the model, we need to specify the evaluation metric, the dependent variable (label), and the directory where results are stored.
In the following example, we use f1 as the evaluation metric, the dependent variable is "class", and the trained models are saved to the "output_models" folder.
evaluation_metric = "f1"
data_label = "class"
save_path = "output_models"
# Create the predictor with the chosen metric, label, and output directory
predictor = TabularPredictor(label=data_label, path=save_path, eval_metric=evaluation_metric)
predictor = predictor.fit(train_data, time_limit=120)  # fit models for up to 120 seconds

# Compare all trained models on the test set
leaderboard = predictor.leaderboard(test_data)
The leaderboard, shown in the image below, lists every model that was tried and the score each one achieved.
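After training, generating predictions and scoring them takes only a couple of calls. A minimal sketch, reusing the predictor and data_label defined above:
# Predict on the test set; dropping the label first mimics real inference
y_pred = predictor.predict(test_data.drop(columns=[data_label]))
print(y_pred.head())

# Score the predictions against the true labels in the test set
print(predictor.evaluate(test_data))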
Now let's take a look at feature importance.
# Compute feature importance (the data passed in must include the label column)
X = train_data
predictor.feature_importance(X)
All of the trained models are stored in the output folder "output_models".
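Because everything is written to that folder, the predictor can be loaded back later without retraining. A minimal sketch, assuming the same save_path defined above:
from autogluon.tabular import TabularPredictor

# Reload the trained predictor from disk and reuse it for inference
loaded_predictor = TabularPredictor.load(save_path)
y_pred = loaded_predictor.predict(test_data.drop(columns=[data_label]))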