Wednesday, 27 May 2026 | Daily update: artificial intelligence for builders

How to Build Your First Machine Learning Model in Python (2026)

The best way to understand machine learning is to build a model yourself. It’s far less intimidating than it sounds — with Python and the right library, your first working model is about 20 lines of code. This tutorial walks through every step, explaining not just what to type but why.

Key takeaways

  • You’ll use Python and scikit-learn — the standard beginner-friendly ML library.
  • The workflow: load data → split it → train a model → evaluate → predict.
  • The golden rule: always test on data the model never saw during training.
  • No advanced math needed — scikit-learn handles the hard parts.

What you’ll build

You’ll build a classifier — a model that sorts things into categories. We’ll use the classic beginner dataset, the Iris dataset: measurements of iris flowers (petal and sepal length and width), where the task is to predict the flower’s species. It’s small, clean, and built into scikit-learn, so it’s perfect for a first model.

The same five-step workflow you learn here (load, split, train, evaluate, predict) applies to almost every machine learning project, no matter how large.

Step 1: Set up your tools

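You need Python and two libraries: scikit-learn and pandas. scikit-learn is the workhorse: it provides datasets, algorithms, and evaluation tools in a consistent, beginner-friendly interface. pandas is not used in the code below, but it is the standard tool for loading and inspecting tabular data once you move on to your own datasets.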

Install them from your terminal:

pip install scikit-learn pandas

You can write the code in a plain .py file, but a Jupyter notebook (or a free cloud notebook like Google Colab) is ideal for learning — you run code in small pieces and see each result immediately.
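To confirm the install worked before going further, a short check like the following can help. It is only a sketch: it prints whatever versions happen to be installed and treats pandas as optional, since the tutorial code does not depend on it.

```python
# Quick sanity check that the libraries from `pip install` are importable.
import importlib.util
import sys

import sklearn

print("Python:", sys.version.split()[0])
print("scikit-learn:", sklearn.__version__)

# pandas is installed alongside scikit-learn but is optional for this tutorial.
has_pandas = importlib.util.find_spec("pandas") is not None
print("pandas installed:", has_pandas)
```

If the import of sklearn fails, the install went to a different Python environment than the one running the code, which is the most common setup problem.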

Step 2: Load the data

Every ML project starts with data. Here we load the built-in Iris dataset:

from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data      # the measurements (the inputs / features)
y = iris.target    # the species (the labels / answers)

print("Shape of X:", X.shape)   # (150, 4) — 150 flowers, 4 measurements each
print("Classes:", iris.target_names)

Two variables matter here, and the naming is a universal convention:

  • X holds the features — the inputs the model learns from (the four measurements).
  • y holds the labels — the correct answers (the species).

Because we have the answers, this is supervised learning.
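Before training anything, it helps to look at what X and y actually contain. The sketch below prints the feature names, the first flower, and the number of flowers per species, using only attributes that load_iris() returns.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# Column names for the four measurements, in the order they appear in X.
print("Features:", iris.feature_names)

# The first flower: its four measurements and its label.
print("First row of X:", X[0])
print("First label:", y[0], "->", iris.target_names[y[0]])

# How many flowers belong to each species (labels are the integers 0, 1, 2).
counts = np.bincount(y)
for name, count in zip(iris.target_names, counts):
    print(f"{name}: {count} flowers")
```

The labels are stored as integers, and iris.target_names maps each integer back to a species name. The dataset is balanced: each of the three species has 50 flowers.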

Step 3: Split the data

This is the most important step for getting an honest result. You must split your data into two parts:

  • A training set the model learns from.
  • A test set the model never sees during training — used only to evaluate it.

If you tested on the same data you trained on, you’d just be measuring memorization, not real learning. (This is how you catch overfitting.)

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

test_size=0.2 keeps 20% of the data for testing and trains on the other 80%. random_state=42 just makes the random split reproducible, so you get the same result every run.
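To see the effect of those two parameters, you can print the shapes of the resulting arrays and repeat the split. This is a small illustrative check, not part of the workflow itself.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 150 flowers split 80/20: 120 rows for training, 30 held back for testing.
print("Training set:", X_train.shape, y_train.shape)
print("Test set:    ", X_test.shape, y_test.shape)

# Same random_state, same split: repeating the call gives identical test rows.
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
print("Reproducible:", (X_test == X_test_again).all())
```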

Step 4: Choose and train a model

Now the machine learning itself. We’ll use a Random Forest — an accurate, reliable, beginner-friendly algorithm (see our algorithms guide).

In scikit-learn, training a model is two lines:

from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

That .fit() call is the training. The model studies the training features and their labels and learns the patterns that connect measurements to species. scikit-learn handles all the math behind that single line.
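To make "learns the patterns" a little more concrete, you can inspect what the fitted model holds afterwards. The sketch below reads three attributes that scikit-learn sets on a RandomForestClassifier during fit(); the numbers it prints depend on your installed version and are not shown here.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# A random forest is a collection of decision trees; fit() builds them all.
print("Trees in the forest:", len(model.estimators_))

# The class labels the model saw during training.
print("Classes seen:", model.classes_)

# How much each measurement contributed to the trees' decisions (sums to 1).
for name, importance in zip(iris.feature_names, model.feature_importances_):
    print(f"{name}: {importance:.3f}")
```

Attributes ending in an underscore, such as estimators_ and feature_importances_, exist only after fit() has run. That naming is a scikit-learn convention for "learned from data".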

Step 5: Evaluate the model

Now check how well it learned — using the test set it has never seen:

from sklearn.metrics import accuracy_score

predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)

print(f"Accuracy: {accuracy:.2%}")

.predict() asks the model to classify the test flowers; accuracy_score compares its guesses to the true answers. On the Iris dataset you’ll typically see accuracy around 95–100%: your model correctly identifies almost every flower it has never seen before. The test set holds only 30 flowers, so a single mistake lowers accuracy by about 3.3 percentage points.
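The accuracy figure is simple arithmetic: correct predictions divided by total predictions. The sketch below recomputes it by hand and also prints a confusion matrix, which shows which species, if any, were mistaken for which.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
predictions = model.predict(X_test)

# Accuracy is the fraction of test flowers whose predicted label is correct.
correct = int(np.sum(predictions == y_test))
manual_accuracy = correct / len(y_test)
print(f"{correct} of {len(y_test)} correct -> {manual_accuracy:.2%}")

# accuracy_score computes the same number.
library_accuracy = accuracy_score(y_test, predictions)
print("Matches accuracy_score:", np.isclose(manual_accuracy, library_accuracy))

# Rows are true species, columns are predicted species;
# off-diagonal entries are the mistakes.
matrix = confusion_matrix(y_test, predictions)
print(matrix)
```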

Step 6: Make a prediction on new data

The real payoff: using the model on brand-new input. Give it a set of measurements and it predicts the species:

new_flower = [[5.1, 3.5, 1.4, 0.2]]   # sepal length, sepal width, petal length, petal width (cm)
prediction = model.predict(new_flower)

print("Predicted species:", iris.target_names[prediction[0]])

That’s a complete machine learning model: trained, tested, and making predictions on data it has never encountered.
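A single label hides how sure the model is. As an optional extra, the sketch below uses predict_proba, which RandomForestClassifier provides, to print a probability for each species for the same flower.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Same order as the training columns:
# sepal length, sepal width, petal length, petal width (cm)
new_flower = [[5.1, 3.5, 1.4, 0.2]]

# One row per input flower, one column per species; each row sums to 1.
probabilities = model.predict_proba(new_flower)[0]
for name, p in zip(iris.target_names, probabilities):
    print(f"{name}: {p:.2f}")

predicted_name = iris.target_names[model.predict(new_flower)[0]]
print("Predicted species:", predicted_name)
```

For a random forest, each probability is an average over the individual trees' estimates, and predict() returns the species with the highest value.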

The complete workflow

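Leaving setup aside, the five steps you just ran (Steps 2 to 6 above, with choosing and training a model counted as one step) are not just an exercise. They’re the skeleton of essentially every supervised ML project: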

Step          | What it does
1. Load data  | Get features (X) and labels (y)
2. Split data | Separate training and test sets
3. Train      | model.fit() learns the pattern
4. Evaluate   | Measure accuracy on unseen test data
5. Predict    | model.predict() on new inputs

Bigger projects add data cleaning, feature preparation, and model tuning — but this core loop stays the same.
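For reference, here is the tutorial's code assembled into one script, with a comment marking each step of the core loop. Nothing new is introduced; it is the same snippets from the steps above in order.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1. Load data: features (X) and labels (y)
iris = load_iris()
X, y = iris.data, iris.target

# 2. Split data: hold back 20% that the model never trains on
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 3. Train
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# 4. Evaluate on the unseen test set
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Accuracy: {accuracy:.2%}")

# 5. Predict on a new flower
new_flower = [[5.1, 3.5, 1.4, 0.2]]
prediction = model.predict(new_flower)
print("Predicted species:", iris.target_names[prediction[0]])
```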

Where to go next

To keep building:

  • Try other algorithms — swap RandomForestClassifier for LogisticRegression or SVC and compare. scikit-learn’s consistent interface makes this trivial.
  • Try other datasets — practice on free datasets that interest you.
  • Learn data preparation — real data is messy; cleaning and preparing it is most of the job.
  • Explore evaluation — accuracy is just one metric; learn precision, recall, and cross-validation.
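As a starting point for the first and last suggestions, the sketch below compares the three classifiers named above using 5-fold cross-validation. The choice of 5 folds and max_iter=1000 are assumptions made for this example, not recommendations from the tutorial.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

models = {
    "RandomForestClassifier": RandomForestClassifier(random_state=42),
    # max_iter raised from the default so the solver has room to converge
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "SVC": SVC(),
}

scores = {}
for name, model in models.items():
    # 5-fold cross-validation: train on 4/5 of the data, test on the
    # remaining 1/5, and repeat so every flower is tested exactly once.
    fold_scores = cross_val_score(model, X, y, cv=5)
    scores[name] = fold_scores.mean()
    print(f"{name}: {scores[name]:.2%} mean accuracy over 5 folds")
```

Because every model exposes the same fit and predict methods, the loop body does not change when you add another algorithm to the dictionary. Cross-validation also gives a steadier estimate than a single 30-flower test set.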

FAQ

How do I build a machine learning model in Python?

Use the scikit-learn library. The workflow is: load your data into features (X) and labels (y), split it into training and test sets, create a model and call .fit() to train it, evaluate it on the test set, and use .predict() for new data. A first model is about 20 lines of code.

What library should beginners use for machine learning?

scikit-learn. It offers a wide range of algorithms, built-in datasets, and evaluation tools through one simple, consistent interface, and it handles the underlying math for you. It’s the standard starting point before moving to deep learning frameworks.

Do I need to be good at math to build an ML model?

No. To build models with scikit-learn you need only basic Python and an understanding of the workflow. The library handles the math. Deeper math becomes useful later if you want to tune models expertly or do research.

Why do I need to split data into training and test sets?

So you can measure real performance. If you test a model on the same data it trained on, you only measure memorization. A separate test set the model never saw shows whether it genuinely learned the pattern and can generalize to new data.
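One way to see memorization in action is to train on data that contains no pattern at all. The sketch below is an illustration built on assumptions of its own: the data is synthetic random noise, and it uses a single DecisionTreeClassifier rather than the tutorial's random forest, because an unrestricted tree memorizes especially cleanly.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with NO real pattern: random features, random labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = rng.integers(0, 3, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# An unrestricted decision tree keeps splitting until it has
# memorized the label of every training row.
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

train_accuracy = model.score(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"Accuracy on training data: {train_accuracy:.2%}")
print(f"Accuracy on unseen data:   {test_accuracy:.2%}")
```

The training score is perfect even though there is nothing to learn, while the score on unseen rows falls to roughly what guessing would give. Only the second number tells you anything about generalization.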

What does model.fit() do?

.fit() is the training step. It feeds the training features and labels to the algorithm, which adjusts its internal parameters to learn the patterns connecting inputs to correct answers. After .fit(), the model is trained and ready to make predictions.
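The difference between an untrained and a trained model is easy to observe: scikit-learn raises NotFittedError if you call predict() too early. In this small sketch the model is fitted on the full dataset for brevity, which you would not do when evaluating.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.exceptions import NotFittedError

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=42)

# Before fit(): the model has learned nothing, so predict() refuses to run.
try:
    model.predict(X[:1])
    predicted_before_fit = True
except NotFittedError as error:
    predicted_before_fit = False
    print("Before fit():", error)

# After fit(): the same call returns a class label.
model.fit(X, y)
label_after_fit = model.predict(X[:1])
print("After fit():", label_after_fit)
```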

Bottom line

Building your first machine learning model is genuinely a short, achievable project: install scikit-learn, then load, split, train, evaluate, and predict. Those five steps are the foundation of nearly every supervised ML project you’ll ever build.

Don’t just read this — open a notebook and run the code. Change the algorithm, try a different dataset, break things and fix them. The concepts in machine learning click far faster once you’ve trained a model with your own hands. When you’re ready for more, grab a free dataset and build something of your own.
