A machine learning model can score 99% accuracy on its training data and then fail badly in the real world. The usual culprit has a name: overfitting. It is one of the most common mistakes in applied machine learning, and understanding it is essential to building models that actually work. This guide explains what overfitting is, how to detect it, and the standard ways to prevent it.
Key takeaways
- Overfitting is when a model memorizes its training data instead of learning the general pattern.
- The sign: excellent performance on training data, poor performance on new data.
- The opposite problem is underfitting — a model too simple to learn the pattern at all.
- Prevent it with: more data, a simpler model, regularization, cross-validation, and early stopping.
- Always test on data the model never saw — that’s the only honest measure of quality.
What is overfitting?
Overfitting happens when a model learns its training data too well — including the noise, quirks, and random accidents that don’t represent the real pattern. Instead of learning the general rule, it memorizes the specific examples.
The goal of machine learning is generalization: performing well on new, unseen data. An overfit model fails at exactly that. It has essentially memorized the answers to the practice exam, so it aces the practice exam — and then collapses on the real one, because the questions are different.
A simple analogy
Picture two students preparing for a math test.
The first understands the concepts — the methods, the reasoning. Give them any problem, even one they’ve never seen, and they can solve it.
The second memorizes the exact practice problems and their answers, word for word. On the practice test, they score perfectly. On the real test, with new numbers, they’re lost — they never learned the method, only the specific answers.
The second student is an overfit model: flawless on training data, helpless on anything new.
How to spot overfitting
Overfitting has one classic, unmistakable signature: a large gap between training performance and test performance.
This is why you always split your data. You train the model on one portion (the training set) and evaluate it on a separate portion it never saw (the test set). Then:
- Small gap, both scores good → the model generalizes well. Healthy.
- Training score high, test score much lower → overfitting. The model memorized.
- Both scores poor → underfitting. The model is too simple (more on this below).
If your model is brilliant on training data and mediocre on test data, you have overfitting — full stop.
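The check above can be sketched in a few lines. This is a minimal illustration using scikit-learn; the synthetic dataset, the 20% label noise, and the unrestricted decision tree are assumptions chosen to make the gap visible, not part of any particular project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with deliberately noisy labels (flip_y), so there is noise to memorize.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)

# Hold out 25% of the rows; the model never sees them during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# An unrestricted tree can keep splitting until it fits every training row.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
gap = train_acc - test_acc
print(f"train={train_acc:.2f}  test={test_acc:.2f}  gap={gap:.2f}")
```

Limiting the tree (for example with `max_depth`) would typically lower the training score and narrow the gap, which is the trade-off the rest of this guide is about.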
The opposite problem: underfitting
Overfitting has a mirror image. Underfitting is when a model is too simple to capture the real pattern, so it performs poorly on both training and test data. It hasn’t memorized — it hasn’t learned at all.
The two define a balance every ML practitioner manages:
| Problem | Training score | Test score | Cause |
|---|---|---|---|
| Underfitting | Poor | Poor | Model too simple |
| Good fit | Good | Good | Right complexity |
| Overfitting | Excellent | Poor | Model too complex / too little data |
The aim is the middle row: a model complex enough to learn the pattern, but not so complex it memorizes the noise.
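To make the three rows concrete, here is a small sketch that fits polynomials of increasing degree to noisy samples of a sine wave. The sine pattern, the noise level, and the degrees 1, 4, and 12 are illustrative assumptions; the point is only how training and test error move as the model gets more flexible.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_sine(x):
    # True pattern (a sine wave) plus random noise the model should NOT learn.
    return np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)

x_train = np.linspace(0.0, 1.0, 20)
y_train = noisy_sine(x_train)
x_test = rng.uniform(0.0, 1.0, 200)
y_test = noisy_sine(x_test)

train_mse, test_mse = {}, {}
for degree in (1, 4, 12):
    # Least-squares polynomial fit; a higher degree means a more flexible model.
    poly = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    train_mse[degree] = float(np.mean((poly(x_train) - y_train) ** 2))
    test_mse[degree] = float(np.mean((poly(x_test) - y_test) ** 2))
    print(f"degree {degree:>2}: train MSE={train_mse[degree]:.3f}  "
          f"test MSE={test_mse[degree]:.3f}")
```

Training error keeps falling as the degree rises, so it cannot tell you when to stop. The test error is what reveals whether the extra flexibility is capturing the pattern or the noise.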
Why overfitting happens
The common causes:
- Too little training data — with few examples, the model can memorize them all instead of generalizing.
- A model that’s too complex — a very flexible model has enough capacity to fit every quirk in the data.
- Training for too long — past a point, extra training just fits noise more tightly.
- Noisy or low-quality data — the more random junk in the data, the more there is to wrongly “learn.”
- Too many features — irrelevant inputs give the model spurious patterns to latch onto.
How to prevent overfitting
There’s no single fix — practitioners combine several techniques.
1. Get more training data
Often the most effective cure. With more examples, memorizing every one becomes much harder, and the model is pushed toward learning the genuine pattern. When you can’t collect more, data augmentation can help: it creates realistic variations of what you have (rotating or cropping images, for instance).
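As a minimal sketch of augmentation, the helper below doubles an image set by adding mirrored copies. The function name `augment_with_flips` and the random stand-in images are hypothetical, and the sketch assumes a left-right flip does not change the label, which is false for things like digits or text.

```python
import numpy as np

def augment_with_flips(images, labels):
    """Return the original images plus horizontally flipped copies.

    images: array of shape (n_images, height, width)
    labels: array of shape (n_images,)
    """
    flipped = images[:, :, ::-1]  # reverse the width axis = mirror left/right
    return np.concatenate([images, flipped]), np.concatenate([labels, labels])

rng = np.random.default_rng(0)
images = rng.random((8, 4, 4))   # stand-in for 8 tiny grayscale images
labels = np.arange(8) % 2

aug_images, aug_labels = augment_with_flips(images, labels)
print(aug_images.shape, aug_labels.shape)
```

Augment only the training set. Augmented copies of test images would make the evaluation less representative of real inputs.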
2. Simplify the model
If the model is too complex, reduce its capacity: fewer parameters, a shallower structure, fewer features. Always try a simpler model first — it’s less prone to overfitting and easier to understand.
3. Use regularization
Regularization adds a penalty for complexity during training, discouraging the model from relying too heavily on any one feature or fitting extreme values. Common forms are L1 (lasso) and L2 (ridge) penalties on the model’s weights. It’s a standard, built-in option in most ML libraries and one of the most effective tools available.
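One common form of this penalty is L2 (ridge) regularization for linear regression. The sketch below uses the closed-form solution in NumPy; the helper `fit_ridge`, the synthetic data, and the penalty strength `lam=10.0` are assumptions for illustration (in practice the strength is usually chosen on validation data, and the intercept is left unpenalized).

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form L2-regularized least squares: (X^T X + lam*I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))        # few rows relative to the number of features
true_w = np.zeros(10)
true_w[:2] = [3.0, -2.0]             # only two features actually matter
y = X @ true_w + rng.normal(scale=1.0, size=30)

w_plain = fit_ridge(X, y, lam=0.0)   # no penalty: ordinary least squares
w_ridge = fit_ridge(X, y, lam=10.0)  # the penalty shrinks the weights

print("weight norm without penalty:", np.linalg.norm(w_plain))
print("weight norm with penalty:   ", np.linalg.norm(w_ridge))
```

The penalized weights are smaller overall, which limits how much the model can lean on the eight irrelevant features.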
4. Use cross-validation
Cross-validation tests the model on several different splits of the data rather than one. It gives a more honest, stable estimate of real-world performance and quickly reveals a model that only looks good on a lucky split.
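In scikit-learn, a basic version of this is one call to `cross_val_score`. The sketch below assumes a synthetic dataset and a logistic regression model purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)

# 5-fold CV: train on 4/5 of the data, score on the remaining 1/5, five times.
scores = cross_val_score(model, X, y, cv=5)
print("per-fold accuracy:", scores.round(3))
print(f"mean={scores.mean():.3f}  std={scores.std():.3f}")
```

A large spread between folds is itself a warning: it suggests that a single split could have flattered or punished the model by chance.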
5. Stop training early
Monitor performance on a validation set during training. When validation performance stops improving and starts to slip, stop — continuing past that point only fits noise. This is early stopping.
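The stopping rule can be sketched with a "patience" counter: stop once validation loss has failed to improve for a set number of epochs. The function name, the patience of 3, and the loss values below are made up for illustration; in real training each loss would come from evaluating the model after an epoch.

```python
def train_with_early_stopping(val_losses, patience=3):
    """Return (best_epoch, stopped_epoch) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = None
    stopped_epoch = None
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        stopped_epoch = epoch
        if loss < best_loss:
            # New best: this is where you would save a checkpoint.
            best_loss, best_epoch = loss, epoch
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # no improvement for `patience` epochs in a row
    return best_epoch, stopped_epoch

# Hypothetical validation losses: improving at first, then slipping.
val_losses = [1.00, 0.80, 0.60, 0.55, 0.57, 0.60, 0.65, 0.70, 0.80]
best_epoch, stopped_epoch = train_with_early_stopping(val_losses, patience=3)
print(f"best epoch: {best_epoch}, stopped at epoch: {stopped_epoch}")
```

The model you keep is the checkpoint from the best epoch, not the one from the epoch where training stopped.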
6. Use dropout (for neural networks)
For neural networks, dropout randomly switches off some neurons during each training step. This stops the network from over-relying on any single path and forces it to learn more robust, general patterns.
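Here is a minimal NumPy sketch of the idea. It uses the "inverted dropout" variant, which rescales the surviving activations during training so nothing needs to change at prediction time; that scaling choice, the 50% rate, and the all-ones activations are assumptions for illustration.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a random fraction `rate` of units during training."""
    if not training or rate == 0.0:
        return activations  # at prediction time every unit stays on
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True = unit stays on
    # Scale the survivors so the expected activation is unchanged.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
hidden = np.ones((100, 100))  # stand-in for one layer's activations

train_out = dropout(hidden, rate=0.5, rng=rng, training=True)
eval_out = dropout(hidden, rate=0.5, rng=rng, training=False)
print("fraction switched off:", float(np.mean(train_out == 0.0)))
```

Deep learning frameworks ship dropout as a ready-made layer, so in practice you would use that rather than writing your own.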
7. Always hold out a real test set
Non-negotiable: keep a portion of data the model never sees during training or tuning, and judge the model only on that. It’s the only honest measure of how the model will perform in the real world.
Data leakage: the hidden cause of fake good results
Most of this guide treats overfitting as a modeling problem — a model too complex for too little data. But there is a quieter cause that produces the same symptom and fools far more practitioners: data leakage. Leakage is when information that would not be available at prediction time sneaks into training. The model looks brilliant in testing, then collapses in production. If your validation scores seem too good to be true, suspect leakage before you suspect luck.
There are two families to watch for:
- Train-test contamination. Test data bleeds into the training process. The classic mistake is preprocessing before splitting: if you scale, normalize, or impute missing values using statistics from the whole dataset, your training set has already “seen” the mean and range of the test set. Always split first, then fit any transformer on the training data alone and apply it to the test set.
- Target leakage. A feature secretly encodes the answer. A model predicting whether a patient has an illness will look near-perfect if one of its inputs is “medication prescribed for that illness” — information that only exists after the diagnosis. The feature is not available when you actually need a prediction, so the score is fiction.
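The "split first, then fit" rule for avoiding train-test contamination looks like this in scikit-learn. The random data is a stand-in, and `StandardScaler` is just one example of a transformer that learns statistics from the data it is fit on.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
y = rng.integers(0, 2, size=200)

# 1. Split first.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# 2. Fit the scaler on the training rows only...
scaler = StandardScaler().fit(X_train)

# 3. ...then apply those training-set statistics to both sets.
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# The leaky version would be StandardScaler().fit(X): its mean and
# standard deviation would include the test rows.
```

The test set is scaled with the training mean and standard deviation, so its own scaled mean will not be exactly zero. That is expected and is what production data will look like too.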
Time-ordered data adds a third trap. Randomly shuffling a time series before splitting lets the model train on the future to predict the past, which violates causality and inflates accuracy. For anything with a timestamp, split chronologically: train on earlier periods, test on later ones.
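A chronological split is simple to sketch: sort by time and cut once. The helper name `chronological_split`, the 80/20 cut, and the integer "day numbers" below are hypothetical.

```python
import numpy as np

def chronological_split(timestamps, train_fraction=0.8):
    """Return (train_idx, test_idx): earliest rows for training, latest for testing."""
    order = np.argsort(timestamps, kind="stable")  # oldest first
    cutoff = int(len(order) * train_fraction)
    return order[:cutoff], order[cutoff:]

rng = np.random.default_rng(0)
timestamps = rng.permutation(np.arange(100))  # day numbers, stored out of order
X = rng.normal(size=(100, 3))
y = rng.normal(size=100)

train_idx, test_idx = chronological_split(timestamps)
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

print("latest training day:", timestamps[train_idx].max())
print("earliest test day:  ", timestamps[test_idx].min())
```

For cross-validation on time-ordered data, scikit-learn's `TimeSeriesSplit` applies the same idea across several folds.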
Leakage is dangerous precisely because none of the fixes elsewhere in this article catch it. More data, regularization, and early stopping all assume your evaluation is honest. If the test set is contaminated, every signal you rely on to detect overfitting is itself corrupted — so the model passes every check and still fails for real users.
Three habits prevent most of it. First, wrap preprocessing and the model in a single pipeline (scikit-learn’s Pipeline does this) so transforms are only ever fit on training folds. Second, audit suspiciously strong features by asking: would I genuinely know this value at the moment of prediction? If not, drop it. Third, when results look spectacular, treat that as a red flag to investigate rather than a victory to celebrate. Genuine generalization rarely looks effortless.
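The first habit can be sketched as follows. The dataset and the choice of scaler and model are illustrative assumptions; the relevant part is that the scaler lives inside the `Pipeline`, so cross-validation refits it on each training fold.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Preprocessing and model travel together as one estimator.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# For each fold, the whole pipeline (scaler included) is fit on the
# training portion only, then scored on the held-out portion.
scores = cross_val_score(pipeline, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.3f}")
```

Scaling `X` once up front and then cross-validating only the model would let every fold's scaler see its own validation rows, which is the contamination described above.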
Frequently asked questions
What is overfitting in machine learning?
Overfitting is when a model learns its training data too well — memorizing the noise and quirks instead of the general pattern. It performs excellently on training data but poorly on new, unseen data, because it never learned to generalize.
How do I know if my model is overfitting?
Compare its performance on training data versus test data (data it never saw). If it scores much higher on training than on testing, it’s overfitting. A healthy model performs similarly well on both.
What is the difference between overfitting and underfitting?
Overfitting is a model too complex that memorizes the training data and fails on new data. Underfitting is a model too simple to learn the pattern at all, so it performs poorly on both training and new data. The goal is the balanced middle.
How do you prevent overfitting?
Use more training data, choose a simpler model, apply regularization, use cross-validation, and stop training early when validation performance stops improving. For neural networks, dropout also helps. Most practitioners combine several of these techniques.
Does more data always fix overfitting?
Not always, but more high-quality data is usually the most reliable cure, because it makes memorization much harder and pushes the model toward genuine learning. It can fall short when the extra data is noisy, and it isn’t always available, which is why simplifying the model, regularization, and early stopping matter as practical alternatives.
What is data leakage, and how is it different from overfitting?
Overfitting is a model memorizing noise in legitimately available training data. Data leakage is information that should not be available, such as test-set statistics or a feature that encodes the answer, contaminating training. Both produce a model that looks strong during development and fails in the real world, but leakage is more insidious: it inflates the test score itself, so the usual train-versus-test comparison fails to catch it. The fix is data hygiene: split before preprocessing and audit any feature that looks too predictive.
Why does my model overfit when I fine-tune an LLM on a small dataset?
Small fine-tuning sets are a textbook overfitting risk: with few examples, the model memorizes them instead of learning the pattern. The tell-tale sign is training loss falling while validation loss climbs. The standard remedies are running fewer epochs (often just a handful) with early stopping, and using a parameter-efficient method like LoRA, which freezes the pretrained weights and trains only small low-rank adapter matrices. That limits how much the model can change, so it acts as a form of built-in regularization that resists memorization.
Is a small gap between training and test accuracy acceptable?
Yes. A small gap is normal and healthy: no model performs identically on data it has seen versus data it has not. Overfitting is signaled by a large or widening gap, where training accuracy keeps climbing while test accuracy stalls or falls. Chasing a zero gap can push you toward a model too simple to learn the pattern, which is underfitting. Judge a model by its test-set performance, and treat the gap as a warning light rather than a number to eliminate.
Conclusion
Overfitting is the gap between looking good and being good. A model that memorizes its training data will dazzle you during development and disappoint you in production: it learned the answers, not the method.
The defense is straightforward: always evaluate on data the model never saw, watch for the train-versus-test gap, and prevent overfitting with more data, simpler models, regularization, cross-validation, and early stopping. Master this balance and you’ll build models that work not just on your desk, but in the real world. For the bigger picture, see our guide to machine learning.
