ML process

How the machine learning system actually works.

This page explains the full machine learning pipeline in plain English: what data we collect, how we prepare it, what models we train, how we test them, and how those predictions finally reach the website.

What machine learning means here

We use past football matches to teach models how combinations of signals usually lead to goals, clean sheets, and results.

What success looks like

A good system predicts future matches honestly, handles uncertainty well, and does not fake confidence by leaking future information.

Simple version

The shortest possible explanation

1. Feed in old matches

We show the system many past matches with all the pre-match clues we knew at the time.

2. Learn patterns

The model learns how different patterns tend to lead to goals, clean sheets, and match outcomes.

3. Apply to new fixtures

For an upcoming match, we build the same type of input row and let the trained model estimate what is likely to happen.

Pipeline walkthrough

Step by step, from raw data to live prediction

1. Collect the raw football information

We start by gathering the facts: match results, expected goals, shots, team form, referee context, weather, and fixture information from the sources that power the project.

Historical match records are stored so the models can learn from several seasons, not just a few recent weeks.
We keep source information separate from calculated information, so we know what came from the outside world and what was generated by our system.
This includes core football data, expected goals data, match context, and manual corrections when needed.

2. Clean and align the data

Raw data is messy. Team names can differ between providers, dates can drift, and some matches can be missing fields. Before machine learning can happen, all of that has to be lined up properly.

The importer normalizes team identities across sources so the same club is treated as one club everywhere.
The pipeline defends against schema drift and missing columns so the system does not silently train on broken tables.
Problem rows are isolated instead of poisoning the full pipeline, which means one bad record should not crash the whole process.
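To give a flavour of what the alignment step does, here is a minimal sketch of team-name normalization. The alias table and provider spellings are illustrative examples, not the project's real mapping.

```python
# Minimal team-name normalization sketch: map every provider spelling onto one
# canonical club name so the same club is treated as one club everywhere.
# The aliases below are examples, not the project's real table.
TEAM_ALIASES = {
    "Man Utd": "Manchester United",
    "Man United": "Manchester United",
    "Spurs": "Tottenham Hotspur",
    "Wolves": "Wolverhampton Wanderers",
}

def normalize_team(name: str) -> str:
    """Return the canonical club name for any known provider spelling."""
    cleaned = name.strip()
    return TEAM_ALIASES.get(cleaned, cleaned)
```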

3. Turn football knowledge into features

The models do not understand football directly. They only understand numbers. So we convert football ideas into measurable signals called features.

Examples include recent form, shot volume, shot quality, xG trend, defensive trend, points per game, possession levels, Elo strength, rest days, and attack-vs-defense interaction signals.
There is a shared feature contract so training and live prediction use the same columns in the same order.
This matters because many ML projects fail not because the model is bad, but because training-time inputs and live-time inputs drift apart.
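To make the idea of a feature contract concrete, here is an illustrative sketch. The column names are placeholders, not the project's real schema.

```python
# Illustrative feature contract: one canonical, ordered column list that both
# training and live prediction must follow. Names here are examples only.
FEATURE_COLUMNS = [
    "home_xg_last5", "away_xg_last5",
    "home_shots_last5", "away_shots_last5",
    "elo_diff",
    "home_rest_days", "away_rest_days",
    "home_points_per_game", "away_points_per_game",
]

def to_feature_row(raw: dict) -> list[float]:
    """Build one model input row in the agreed order; fail loudly on gaps."""
    missing = [col for col in FEATURE_COLUMNS if col not in raw]
    if missing:
        raise ValueError(f"feature contract violated, missing: {missing}")
    return [float(raw[col]) for col in FEATURE_COLUMNS]
```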

4. Train separate models for separate football questions

We do not ask one giant model to predict everything. Instead, we train different models for different jobs, because predicting goals is not the same problem as predicting a clean sheet or a match result.

Goals model: regression models estimate likely home and away xG or goal output.
Clean-sheet model: binary classification estimates the chance that a team concedes zero.
Results model: multi-class classification estimates home win, draw, or away win probabilities.
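A minimal sketch of what "separate models for separate questions" looks like in code, using XGBoost's scikit-learn wrappers on synthetic data so it runs standalone; the real targets and hyperparameters come from the pipeline described above.

```python
# One specialist model per football question (illustrative, synthetic data).
import numpy as np
from xgboost import XGBRegressor, XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 9))                        # 500 past matches, 9 features
y_goals = rng.poisson(1.4, size=500).astype(float)   # goals: a number on a scale
y_clean_sheet = (rng.random(500) < 0.3).astype(int)  # clean sheet: yes or no
y_result = rng.integers(0, 3, size=500)              # 0 = home win, 1 = draw, 2 = away win

goals_model = XGBRegressor(n_estimators=50).fit(X, y_goals)
clean_sheet_model = XGBClassifier(n_estimators=50).fit(X, y_clean_sheet)
results_model = XGBClassifier(objective="multi:softprob", n_estimators=50).fit(X, y_result)
```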

5. Test the models like real future predictions

A machine learning model only matters if it can predict matches it has not seen yet. So we test it in time order, not with random shuffling that would accidentally leak future information.

The validator uses leakage-safe holdout windows where training ends before testing begins.
Rolling-origin benchmarking checks performance across multiple chronological windows instead of one lucky test split.
We track metrics like MAE, accuracy, AUC, calibration quality, and comparison versus baselines.
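Here is a small sketch of rolling-origin windows under assumed window sizes; the project's actual split lengths will differ, but the principle is the same: training always ends before testing begins.

```python
# Rolling-origin evaluation sketch: every test window starts strictly after its
# training window ends, so the model never sees the future. Sizes are examples.
def rolling_origin_windows(n_matches: int, train_size: int, test_size: int):
    """Yield (train_range, test_range) pairs in strict chronological order."""
    start = 0
    while start + train_size + test_size <= n_matches:
        train_end = start + train_size
        yield range(start, train_end), range(train_end, train_end + test_size)
        start += test_size

for train_idx, test_idx in rolling_origin_windows(n_matches=1000,
                                                  train_size=600, test_size=100):
    print(f"train {train_idx.start}-{train_idx.stop - 1} -> "
          f"test {test_idx.start}-{test_idx.stop - 1}")
```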

6. Serve live predictions for upcoming fixtures

When the website asks for an upcoming match prediction, the backend recreates the same feature set for that fixture, loads the trained model artifacts, and returns ML outputs for display.

A live feature extractor builds the pre-match row for the specific fixture.
The prediction API then runs the goals, clean-sheet, and results models on that row.
If a specific artifact is unavailable, the system reports that truthfully instead of pretending the model is ready.
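The serving path can be pictured like this. The artifact file names, joblib persistence, and response shape are assumptions for illustration, not the project's actual API.

```python
# Hypothetical serving sketch: load whichever trained artifacts exist for the
# fixture and report missing ones honestly instead of guessing.
from pathlib import Path
import joblib  # assumption: model artifacts are persisted with joblib

def predict_fixture(feature_row: list[float], model_dir: str = "models") -> dict:
    outputs = {}
    for name in ("goals", "clean_sheet", "results"):
        path = Path(model_dir) / f"{name}.joblib"       # assumed file layout
        if not path.exists():
            outputs[name] = {"status": "model_unavailable"}
            continue
        model = joblib.load(path)
        outputs[name] = {"status": "ok",
                         "prediction": model.predict([feature_row]).tolist()}
    return outputs
```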

The model family

Why we split the work across multiple models

Goals Predictor

Estimate how much attacking output each team is likely to create.

Think of this as a smarter version of asking: how dangerous should each side be in this match?

Uses structured pre-match features such as recent attacking trend, defensive weakness, shot quality, and team strength.
Learns continuous outputs rather than yes/no answers, because goals and xG live on a spectrum.
Feeds later calculations such as clean-sheet fallback logic and match-shape interpretation.

Clean Sheet Predictor

Estimate the chance that a team finishes the match without conceding.

This is the defender and goalkeeper model.

Treats clean sheets as a classification problem: yes or no.
Uses calibrated probabilities so the percentage shown is intended to mean something, not just rank teams roughly.
Falls back sensibly when model artifacts are missing, instead of hiding the state.
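As an illustration of what calibrating those probabilities can look like, here is a sketch using scikit-learn's CalibratedClassifierCV on synthetic data; the project's exact calibration method may differ.

```python
# Calibration sketch: wrap the clean-sheet classifier so its percentages mean
# what they say. Synthetic data; real training uses the engineered features.
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from xgboost import XGBClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 9))
y = (rng.random(400) < 0.3).astype(int)                 # 1 = kept a clean sheet

base = XGBClassifier(max_depth=5, learning_rate=0.05, n_estimators=150)
calibrated = CalibratedClassifierCV(base, method="isotonic", cv=3).fit(X, y)
p_clean_sheet = calibrated.predict_proba(X[:1])[0, 1]   # calibrated probability
```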

Results Predictor

Estimate the probabilities of home win, draw, or away win.

This is the outcome model for who gets the result.

Uses explicit class mapping for H / D / A so output labels are unambiguous.
Has been hardened against draw collapse, class imbalance, and overconfident probabilities.
Is judged not only by accuracy, but by how realistic and calibrated its probability outputs are.
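The explicit class mapping mentioned above can be as simple as the sketch below; the ordering shown is illustrative.

```python
# Illustrative H / D / A mapping so probability columns are never ambiguous.
CLASS_ORDER = ["H", "D", "A"]  # home win, draw, away win

def label_probabilities(proba_row) -> dict:
    """Pair each model probability with its explicit result label."""
    return dict(zip(CLASS_ORDER, map(float, proba_row)))

print(label_probabilities([0.48, 0.27, 0.25]))  # {'H': 0.48, 'D': 0.27, 'A': 0.25}
```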

What model are we actually using?

Gradient-boosted trees, mainly through XGBoost

The main model family: XGBoost

The project mainly uses XGBoost. XGBoost is a gradient-boosted decision tree system. In simple terms, it builds many small rule-based trees, and each new tree tries to fix mistakes made by the earlier ones.

Why trees fit football data

Football prediction has lots of messy, mixed signals: form, Elo, rest, venue, shot quality, defensive trend, and schedule congestion. Tree models are strong at handling those non-linear relationships without needing the data to behave in a perfectly clean straight-line way.

Why not one giant neural network?

For a project like this, tree models are often a better first choice because they work well on structured tabular data, are easier to debug, and give clearer feature importance and SHAP explanations.
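For readers who want to see what feature importance and SHAP explanations look like in practice, here is a small sketch on synthetic data; it assumes the shap package is installed and is not the project's actual explanation code.

```python
# Feature importance and SHAP sketch for a trained tree model (synthetic data).
import numpy as np
import shap
from xgboost import XGBRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
y = 0.8 * X[:, 0] + rng.normal(scale=0.3, size=300)

model = XGBRegressor(n_estimators=80, max_depth=4).fit(X, y)
print(model.feature_importances_)                        # global importance per feature
shap_values = shap.TreeExplainer(model).shap_values(X)   # per-match contributions
```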

Tree depth and size

What those model settings actually mean

Goals predictor

Two XGBoost regressors: one for home output and one for away output.

Typical depth: 5 to 6
Typical learning rate: 0.035 to 0.05
Typical trees: 200 to 320

A depth of 5 or 6 means each tree can make several layers of decisions, but not so many that it memorizes the past. Hundreds of trees means the final prediction is built from many small corrections rather than one giant leap.

Clean-sheet predictor

Two XGBoost binary classifiers: one for home clean-sheet chance and one for away clean-sheet chance.

Typical depth: 5
Typical learning rate: 0.05
Typical trees: around 150

This model answers a yes-or-no style question, so it uses classification rather than regression, then calibrates the probabilities so the percentages are more trustworthy.

Results predictor

A more complex XGBoost setup for home win, draw, and away win probabilities.

Core multi-class depth: around 5
Typical learning rate: 0.05
Typical trees: 200 to 260

This is the hardest problem because draws are tricky. The project uses a stronger setup here, including draw-aware tuning, class-balance handling, and extra probability blending logic so the model does not just collapse into always preferring wins.
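Written out as code, the ranges above correspond to configurations roughly like the following; the exact values are tuned over time and may differ from these placeholders.

```python
# Illustrative XGBoost configurations matching the ranges described above.
from xgboost import XGBRegressor, XGBClassifier

goals_home = XGBRegressor(max_depth=6, learning_rate=0.04, n_estimators=260)
goals_away = XGBRegressor(max_depth=6, learning_rate=0.04, n_estimators=260)
clean_sheet_home = XGBClassifier(max_depth=5, learning_rate=0.05, n_estimators=150)
results = XGBClassifier(objective="multi:softprob", max_depth=5,
                        learning_rate=0.05, n_estimators=240)
```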

Engineering rules

The safeguards that keep the system honest

No future leakage

The model should never get credit for information that would not have existed before kickoff. This is one of the biggest ways ML projects fool themselves.

One feature contract

The same canonical feature set is used across training, validation, and live prediction. That reduces train-serve drift.

Probabilities must be trustworthy

It is not enough to get the winner right sometimes. The percentages also need to be believable and usable.

Why this is hard

What makes football ML difficult

Football is low-scoring, noisy, and heavily influenced by context, so a small number of events can change everything.
Draws are especially hard because they are less common and sit between two more obvious classes.
Data quality matters as much as model choice. A clever model on inconsistent data often performs worse than a simpler model on clean data.
Live predictions are only as good as the feature pipeline. If the current-state inputs are wrong, the model can still be mathematically correct and practically useless.

How to think about the whole project

The full breadth of what this system is doing

Data engineering

Collecting, cleaning, aligning, and storing football information from multiple sources so the rest of the stack has trustworthy inputs.

Feature engineering

Translating football concepts like form, shot quality, rest, tactical context, and team strength into machine-readable numbers.

Model training

Teaching specialist models to answer different football questions using leakage-safe historical training windows.

Validation and benchmarking

Checking whether the models genuinely predict future matches well, not just past matches they accidentally memorized.

Live inference

Generating the same features for upcoming fixtures and turning trained model files into live probabilities for the website.

Product delivery

Exposing all of that through usable screens so the predictions are understandable, inspectable, and easy to challenge.

Glossary

Plain-English definitions for the main ML terms

This section is here so a reader does not need to already understand machine learning before reading the rest of the page.

Feature

A single input the model uses to make a decision.

For example: recent xG, team rest days, or Elo difference.

Feature contract

The agreed list of inputs, their names, their order, and how they are calculated.

This keeps training data and live prediction data speaking the same language.

Train-serve drift

A mismatch between what the model saw during training and what it receives when used live.

For example, a feature might mean “last 5 matches” in training but “last 3 matches” in production.

Regression

A model that predicts a number on a scale rather than a yes/no answer.

Used for outputs like expected goals.

Classification

A model that predicts a category or class.

Used for questions like home win, draw, or away win.

Probability

The model’s estimate of how likely an event is.

A 65% home win probability means the model thinks that outcome happens roughly 65 times in 100 similar situations.

MAE

Mean Absolute Error. The average size of the model’s mistakes.

Lower is better because it means the model is closer to the truth on average.
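A tiny worked example of the arithmetic, with made-up errors:

```python
# MAE example: three predictions miss by 0.3, 0.5, and 0.1 goals.
errors = [0.3, 0.5, 0.1]
mae = sum(errors) / len(errors)   # 0.3 goals on average
```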

Accuracy

How often the model gets the category right.

If it correctly predicts 52 results out of 100, accuracy is 52%.

AUC

A score for how well the model separates likely events from unlikely events.

Often used for things like clean-sheet probability. Higher is better.

Calibration

A check on whether the model’s percentages are honest.

If the model says 70% many times, about 70% of those events should really happen.
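A minimal sketch of that check, with made-up numbers:

```python
# Calibration check sketch: among predictions near 70%, the event should have
# happened roughly 70% of the time. Numbers below are illustrative.
predictions = [0.72, 0.68, 0.71, 0.69, 0.70]   # model probabilities near 70%
outcomes    = [1, 1, 0, 1, 1]                  # 1 = the event actually happened
observed_rate = sum(outcomes) / len(outcomes)  # 0.8 here; compare against ~0.70
```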

Baseline

A simpler method we compare the ML model against.

This tells us whether the extra ML complexity is actually worth it.

Leakage

When a model accidentally learns from information that would not have been available before the match.

This makes performance look better than it really is and is one of the biggest ML mistakes.