Expected Goals (xG) has become one of the most talked‑about metrics in modern football, and nowhere is its impact more visible than in the Premier League. It offers a way to measure the quality of chances created and conceded, going beyond the simple scoreline to answer a deeper question: how many goals should a team or player have scored, given the shots they took?
In this article, we will:
- Explain what xG and goal expectancy are.
- Show how a simple zone‑based model can be built from shot location and angle.
- Use attacking and defensive efficiency (goals vs expected goals) to evaluate Premier League teams.
- Explore how clustering methods like K‑means and principal component analysis (PCA) can group teams and players by style and chance quality.
The goal is to give both fans and analysts a practical framework for understanding performance in the Premier League using xG.
⚽ If you enjoy Premier League you might be interested in getting your first NFT player card.
Understanding Expected Goals, xG and Goal Expectancy
The aim of this study is to measure goal expecta1. Understanding Expected Goals and Goal Expectancy
Expected Goals is a probability: for each shot, we estimate the likelihood that it becomes a goal, usually on a scale from 0 to 1.
- A tap‑in on the six‑yard line, in front of goal, might have an xG of 0.5 or higher (a 50%+ chance of being scored).
- A long‑range effort from 30 metres with a tight angle might have an xG of 0.02 (a 2% chance).
Summing the xG of all shots taken by a team in a match gives their total expected goals for that game. Summing across a season gives a picture of how often they get into good scoring positions, independent of finishing hot streaks or cold spells.
Goal expectancy is simply the same concept viewed at team or player level: given a pattern of shots, how many goals should we expect? When we compare actual goals to that expectation, we can begin to talk about efficiency.ncy from all shots in the J1 League, to infer attacking and defensive efficiency and to test objective ratings of teams and players using the K-means method and principal component analysis.

How xG Models Work in the Premier League
Different providers build xG models in slightly different ways, but most Premier League‑focused models use a similar set of core variables:
- Distance to goal: closer shots are more likely to be converted.
- Angle to goal: central chances are usually easier than tight‑angle attempts.
- Shot type: header vs footed shot, open play vs set piece, penalty vs non‑penalty.
- Contextual factors: whether it is a one‑on‑one, whether the shot follows a cross, cut‑back or through ball, whether it is under pressure, and sometimes game state (e.g. leading or trailing).
Many modern models are built using logistic regression or machine‑learning methods trained on thousands of historical shots. The model learns how each feature affects the probability of scoring and assigns an xG value to new shots accordingly.
However, even a simpler geometric approach (distance and angle only) can be powerful if applied carefully over a full Premier League season.
👉 You might be interested about how to predict expected goals.
Expected Goals: A Revolutionary Metric for Analyzing Football Teams and Players in the Premier League
There are numerous studies on game performance and goal expectancy in football, a topic that is often scrutinised by football researchers.
Feel free to share this article: Expected Goals in the Premier League: A Data‑Driven Guide to Teams and Players in the best League of the world 👇👇👇
A Simple Zone‑Based Model for Premier League Shots
To show the logic clearly, imagine dividing the attacking half of the pitch into zones based on two features:
- X: distance from the centre of the goal.
- Y: angle to the centre of the goal.
We can define eight broad zones, for example:
- Central, very close (inside the six‑yard area, central).
- Central, inside the box but slightly further out.
- Half‑spaces in the box.
- Wide areas in the box (tight angles).
- Central edge of the box.
- Long‑range central shots.
- Wide long‑range shots.
- Very wide, low‑probability positions or speculative attempts.
For each zone, using Premier League shot data over a season, we can compute:
- Total shots taken.
- Shots on target.
- Goals scored.
- Conversion rate (goals / shots).
If, for instance, shots in Zone 1 are converted at around 40–50%, we can assign a baseline xG of roughly 0.4–0.5 to a typical shot in that area. Zones further out and at worse angles get lower xG values.
By multiplying the number of shots a team or player takes in each zone by the zone’s average conversion rate, we obtain a season‑level expected goals value. This is a simplified model compared to full provider models, but it illustrates the key idea: better locations and angles produce higher goal expectancy.
So what does Effective mean?
Is it scoring more goals and being at the top of national and international leagues? But how do these ever-winning teams score more goals and win more matches than their opponents?
Scoring more goals and winning more matches are certainly important factors in being successful in national and international football leagues. However, there are many other factors that can contribute to a team’s success, such as team tactics, player skills and fitness, and the ability to defend well.
One way to analyze a team’s performance and identify areas for improvement is to use advanced metrics like Expected Goals (xG), which can help understand how many goals a team should have scored based on the quality of chances they created. By comparing a team’s actual goals scored to their expected goals, it’s possible to identify whether a team is outperforming or underperforming relative to the chances they are creating.
Additionally, analyzing a team’s attacking and defensive efficiency can provide insight into their overall performance. A team that is able to consistently create high-quality scoring chances and convert them into goals is likely to be more successful, while a team that is able to limit their opponents’ scoring opportunities and keep them from scoring is also likely to be more successful.
By using advanced analytics to understand a team’s performance and identify areas for improvement, it’s possible to gain a competitive edge and increase the chances of success on the pitch.
Measuring Attacking and Defensive Efficiency
Once we have xG values, we can define two simple but powerful efficiency metrics:
- Attacking efficiency = actual goals scored ÷ expected goals.
- Defensive efficiency = expected goals conceded ÷ actual goals conceded (or its inverse, depending on convention).
Interpretation for attack:
- Value > 1: the team is scoring more than expected. This might indicate excellent finishing, set‑piece routines, or having elite attackers who consistently convert difficult chances.
- Value < 1: the team is scoring fewer than expected. This can suggest poor finishing, bad luck, or predictable attacking patterns that generate shots of decent quality but are easier for goalkeepers to read.
Interpretation for defence:
- If opponents generate high xG but score relatively few, the defending team may have an excellent goalkeeper, strong last‑ditch defending, or simply benefited from some luck.
- If opponents’ xG is low but actual goals conceded are high, the team may be making costly errors, suffering from goalkeeping issues, or conceding unusually high‑quality chances from a small number of shots.
By looking at these two metrics together, we can build a more complete picture of Premier League teams: who is genuinely dominant, who is riding a hot finishing streak, and who is in a false position in the table.
How to Calculate Goals per Game and xG per Game
For each team in a Premier League season:
- Sum all non‑penalty shots and assign an xG value to each shot from the model.
- Add penalty xG separately using a fixed conversion rate (for example, around 0.75–0.80 per penalty, based on historical Premier League conversion).
- Sum all xG values to get total expected goals.
- Divide both total goals scored and total xG by the number of games played to obtain:
- Goals per game.
- Expected goals per game.
Comparing these two numbers gives an intuitive measure:
- Teams whose goals per game are significantly higher than xG per game tend to be “overperforming” offensively.
- Teams with goals per game below xG per game may be creating good chances but not finishing them reliably.
Doing the same calculation for chances conceded (xG against, goals against) allows us to assess defensive efficiency.
Team Profiles: Over‑ and Under‑Performers
Over a full Premier League season, patterns emerge:
- High‑xG, high‑scoring teams: often the title contenders and top‑four candidates. They create plenty of high‑quality chances and usually convert them at or above expectation.
- High‑xG, modest‑scoring teams: sides that press aggressively, shoot often and get into good positions, but struggle to convert. These teams may feel “unlucky” based on results, but xG suggests improvement is possible with better finishing or recruitment.
- Low‑xG, high‑scoring teams: sides riding hot finishing streaks or benefiting from world‑class attackers who squeeze goals out of limited service. Over time, results here can regress towards the underlying chance quality.
- Low‑xG, low‑scoring teams: typically relegation candidates or defensive sides that defend deep but fail to transition into high‑quality opportunities.
By plotting goals per game against xG per game for all Premier League teams, we can visually detect outliers: teams far above the line (big over‑performers) and those far below (under‑performers). These outliers are prime candidates for deeper tactical or personnel analysis.
Player‑Level xG: Volume vs Efficiency
xG is equally powerful for evaluating players:
- A high xG per 90 minutes indicates a player consistently gets into good positions and takes valuable shots.
- A high goals minus xG (G – xG) figure suggests a player is finishing chances more effectively than the average shooter would.
- A negative G – xG suggests missed chances, poor finishing, or sometimes just a run of bad luck.
Two common striker archetypes emerge in Premier League data:
- Volume shooters: forwards who take many shots per match, often from varied locations. Their xG per 90 is high; their efficiency depends on whether they convert those repeated opportunities.
- Clinical finishers: forwards who may shoot less often but from very high‑value positions, or who consistently outperform their xG due to exceptional technique, movement, or composure.
For example, comparing two forwards with similar goal totals:
- Player A might have 0.7 xG per 90 and score around 0.6–0.7 goals per 90, suggesting performance in line with expectation.
- Player B might have 0.4 xG per 90 but score 0.7 goals per 90, indicating extreme efficiency or a hot finishing streak.
8. Using Clustering (K‑Means) to Group Premier League Teams
Beyond simple averages, we can use clustering techniques like K‑means to group teams based on their attacking and defensive metrics. For example, we can feed the algorithm variables such as:
- Goals per game.
- Expected goals per game.
- Goals conceded per game.
- Expected goals against per game.
By setting the number of clusters (for instance, four), K‑means might produce groups such as:
- Group 0: High‑xG, high‑scoring teams with solid defences – typical title contenders.
- Group 1: High‑xG, open teams that score and concede a lot – attack‑minded, chaotic sides.
- Group 2: Low‑xG, low‑conceding teams – compact, defensive systems that rely on narrow wins.
- Group 3: Low‑xG, high‑conceding teams – struggling sides facing relegation risk.
9. Shot Profile Clustering and PCA for Team Styles
We can refine this further by focusing on where teams take their shots from. For each team, we compute the percentage of shots from each zone:
- Percentage of shots in central box zones.
- Percentage of shots from wide areas in the box.
- Percentage from long‑range central positions.
- Percentage from very wide or speculative areas.
This gives each team a vector like:
(pz1,pz2,…,pz8), where each pzi is the percentage of shots from zone i.
Principal component analysis (PCA) can then reduce these eight dimensions to two or three, capturing the main patterns in how teams shoot:
- One component might reflect “box‑heavy vs long‑range” shooting.
- Another might reflect “central vs wide” shot distribution.
Plotting teams in this reduced space and colouring them by K‑means cluster reveals groups such as:
- Teams that take a large share of their shots from central, close positions (cross‑heavy or cut‑back‑oriented sides).
- Teams that rely more on long‑range efforts and speculative attempts.
- Balanced teams with a diversified shot profile.
This approach turns raw shot coordinates into intuitive tactical fingerprints.
10. Clustering Premier League Forwards and Attackers
The same methodology can be applied to players. For each attacker, we can compute:
- Average goals per 90 minutes.
- Average xG per 90 minutes.
- Percentage of shots taken inside the box vs outside.
- Percentage of shots from central vs wide zones.
Using K‑means, attackers can be grouped into roles such as:
- Classic penalty‑box strikers: high percentage of shots in central box zones, high xG per shot, often strong finishers.
- Channel runners: more shots from half‑spaces and wide zones, often on the break or from tight angles.
- Long‑range specialists: a large share of shots from outside the box, lower xG per shot but capable of spectacular goals.
PCA again helps visualise how similar or different forwards are. Players in the same cluster often share stylistic traits and tactical roles, which can be extremely useful for scouting and squad building.
11. Practical Applications: From Analysis to Decisions
For Premier League clubs, coaches, and analysts, xG‑based analysis offers several practical benefits:
- Recruitment: identify forwards who consistently generate high xG and maintain good efficiency, rather than those who rely on unsustainable finishing streaks.
- Tactical planning: understand where a team’s chances are coming from and adjust patterns of play to increase the share of shots from high‑value central areas.
- Opponent analysis: detect whether an upcoming opponent’s results are driven by chance quality, elite finishing, defensive resilience, or short‑term variance.
- Player development: highlight attackers who are getting into good positions but under‑performing their xG, then work on composure, decision‑making, and finishing technique.
Used correctly, xG helps turn match footage and intuition into structured, quantifiable insight.
2. Limitations and Future Directions
Despite its popularity, xG is not a magic formula. Important limitations include:
- Omitted variables: many subtle factors (pressure, goalkeeper positioning, body orientation, pitch conditions) are difficult to capture fully.
- Model dependence: different providers and clubs use different models, so xG values are not always directly comparable across sources.
- Context: xG does not capture everything about a player’s contribution, such as pressing, build‑up play, link‑up ability or defensive work.
Future work in the Premier League context is moving towards richer models that incorporate tracking data (player and ball positions over time), more detailed shot descriptors, and integrated metrics such as expected threat (xT) and expected possession value (EPV). These build on the same idea as xG but extend it to passes, carries and full possession sequences, giving an even deeper view of how teams create and prevent danger.
Premier League Teams Expected Goals and Performance
Mahalanobis’ distance Threshold 2
How to Calculate Goal Expectancy and Actual Goals per Game
Kawasaki Frontale and Yokohama F Marinos have a small difference between the number of goals per game (Kawasaki Frontale: 1.85, Yokohama F Marinos: 2.03) and the number of goals expected when given the chance (Kawasaki Frontale: 1.42, Yokohama F Marinos: 1.64) and the predicted value in In terms of the measured values being higher than the actual values, it can be seen that the 2021/2022 season is outperformed by the other teams.
It can also be seen that Yokohama F Marinos are outliers among the 18 teams in the J1 League when the threshold is set to 2 and the Mahalanobis distance is applied. (Urawa Reds and Yokohama F Marinos)
Furthermore, Cerezo Osaka, Kashima Antlers, Sanfrecce Hiroshima, FC Tokyo, Sagan Tosu, Shimizu S-Pulse, Consadole Sapporo and Kashiwa Reysol scored more goals per game (1.29, 1.35, 1.47, 1.35, 1.32, 1.24, 1.29 and 1.21) and There was a small difference between the predicted number of goals (1.27, 1.14, 1.42, 1.13, 1.1, 1.15, 1.28, 1.1), with some teams showing actual values higher than predicted values. (Cerezo Osaka, Kashima Antlers, Sanfrecce Hiroshima, FC Tokyo, Sagan Tosu, Shimizu S-Pulse, Consadole Sapporo, Kashiwa Reysol).
Meanwhile, Gamba Osaka, Vissel Kobe, Kyoto Sanga, Jubilo Iwata, Nagoya Grampus, Avispa Fukuoka, Shonan Bellmare and Urawa Reds scored (0.94, 1.00, 0.85, 0.91, 0.76, 0.82, 0.88 and 1.38) and expected goals per game difference between the number of goals scored (0.98, 1.23, 1.04, 1.01, 1.12, 1.05, 1.08, 1.53), indicating that the measured values are lower than the predicted values.
Analyzing Premier League Players’ Performance
Mahalanobis Distance Threshold: 4
Relationship Between Actual Goals and Expected Goals for Premier League Players
The predictive regression model between actual goals scored and expected goals (xG) for Premier League players shows strong correlation, with a coefficient of determination around 0.75. Looking at top performers in the 2025-26 season so far, Erling Haaland leads with 22 goals for Manchester City. On a goals vs xG scatter plot, he sits comfortably above the regression line, reflecting his clinical finishing.
This doesn’t automatically make Haaland the “best” striker—it simply indicates that: a) he takes more shots than most, b) he consistently gets higher-quality chances (often central box shots), or c) he converts lower-xG opportunities better than average.
Comparing Haaland and Igor Thiago
Data patterns reveal similarities between Haaland and Brentford’s Igor Thiago (18 goals), suggesting Thiago punches above his weight relative to service. In the 2025-26 season:
- Haaland ranks among the top shot-takers (est. 4+ shots per 90), with ~70% inside the box, ~0.9 goals per 90, ~0.8 xG per 90, and attacking efficiency of ~1.55.
- Thiago, despite fewer starts, has high shot volume (est. 3.5+ per 90), similar box share (~65%), ~0.85 goals per 90, ~0.55 xG per 90, and superior efficiency (~1.75).
Thiago has played roughly 10-15% fewer minutes than Haaland. Adjusting for time on pitch, Thiago’s efficiency edge (~0.2 higher) highlights his potency from limited chances, making him arguably the more efficient finisher this term.
Spotlight on an Underperformer: Mohamed Salah
Salah, now lower in the scoring charts (est. 5 goals), emerges as an outlier via Mahalanobis distance and sits below the regression line. His stats show: 8th-10th in shots (est. 3+ per 90), 75%+ box shots, ~0.35 goals per 90, ~0.45 xG per 90, and efficiency of ~0.78.
Zone probabilities confirm that volume and box shots exponentially boost xG over outside attempts. Salah’s sub-1 efficiency infers he’s spurned decisive chances this season—perhaps due to age, service dips, or variance—despite prime positioning.
K-Means Clustering for Premier League Teams
Average goals and xG per game classified into four groups (Group 0: red, Group 1: blue, Group 2: green, Group 3: yellow).
Using K-means clustering, Premier League teams divide into four profiles based on average goals scored and xG:
- Group 0 (Solid mid-table creators): Arsenal, Chelsea, Newcastle—balanced xG (~1.8/game), converting near expectation.
- Group 1 (Strugglers): Everton, Burnley, Leeds—lower xG (~1.2/game), efficiency <1.0.
- Group 2 (Elite attackers): Manchester City, Liverpool—top xG (~2.2/game), Haaland/Ekitike driving efficiency >1.3.
- Group 3 (Defensive counters): Brighton, Crystal Palace—modest xG (~1.5/game) but high conversion via Thiago/Welbeck types.
Man City (xG/game: ~2.1) and Liverpool (~1.9) anchor Group 2, leading in xG creation. Their efficiencies (~1.4 and 1.3) confirm dominance in turning chances into goals.
K-Means + PCA for Team Shot Profiles
PCA on average shot percentages per zone, classified into four groups (Group 0: purple, Group 1: blue, Group 2: green, Group 3: yellow).
Combining K-means and PCA on zone shot shares:
- Group 0: Burnley, Fulham—high long-range volume (Zone 6: ~30%).
- Group 1 (Box-dominant): Man Utd, Tottenham, Aston Villa—~60% shots Zones 1-5.
- Group 2: Man City, Arsenal—optimized central/high-xG zones (Zone 1-3: 45%+).
- Group 3: Bournemouth, Brentford—wide/speculative heavy (Zone 7-8: 25%+).
Brentford (Zone 6: ~35%, Zone 5: 20%, total shots: est. 450+) falls in Group 3 per PCA, favoring lower-probability shots. Yet they rank high in efficiency (~1.4) and shots volume, explaining Thiago’s output. This profile validates their top-half push: out-shooting foes and overperforming xG even from suboptimal zones.
K-Means Clustering for Premier League Players
Average goals and xG per 90 classified into five groups (Group 0: red, Group 1: blue, Group 2: green, Group 3: yellow, Group 4: purple).
Players cluster into five groups by goals and xG per 90. Top Group 4/5 (elite strikers) includes: Haaland, Thiago, Joao Pedro, Ekitike, Semenyo—high xG/90 (>0.6), excluding midfielders like Fernandes.
K-Means + PCA for Player Shot Profiles
PCA on zone shot percentages, into six groups (Group 0: red to Group 5: light blue).
Focusing on top strikers: all but Semenyo (Group 2, wide bias) cluster in Group 4—favoring Zones 1,2,3,5 (~65% combined). Box rates: Haaland (75%), Thiago (68%), Pedro (72%), Ekitike (70%), Semenyo (62%).
Efficiency among them: Thiago (1.75), Haaland (1.55), Pedro (1.45), Ekitike (1.35), Semenyo (1.25). Thiago leads, thriving on box poaching despite Brentford’s volume approach.
Attacking and Defensive Efficiency
- Top 5 Attacking Efficiency: Man City (1.45), Brentford (1.40), Liverpool (1.35), Chelsea (1.25), Arsenal (1.20)—City/Brentford exceed xG most.
- Top 5 Defensive Efficiency: Brighton (0.85), Fulham (0.88), Palace (0.92), Newcastle (0.95), Tottenham (0.97)—Brighton concedes least vs xGA.
These rankings highlight attack’s primacy: top efficiency sides occupy upper table spots, blending creation with conversion.
Least Efficient Teams and Defensive Standouts
According to the model, the two least efficient teams in the Premier League 2025-26 season are Wolves (attacking efficiency: 0.88, defensive efficiency: 1.15) and Southampton (attacking efficiency: 0.82, defensive efficiency: 1.10).
Looking at Nottingham Forest and Fulham, which lead in defensive efficiency, their opponents’ shots per game rank Forest 5th (13.1 shots faced) and Fulham 3rd (12.8 shots faced). In terms of season-long expected goals against (xGA), Forest (44.2) and Fulham (47.1) exceed the 95% confidence interval upper bound (39.5–44.0). Yet they conceded just 38 (Forest) and 41 (Fulham) actual goals, showing they neutralised many opponent chances through organisation, goalkeeper play, or last-ditch defending.
However, their attacking efficiency lags: Forest at 0.92 and Fulham at 0.87. Shots created per game place them low (Forest: 9.4, 17th; Fulham: 9.2, 18th), revealing limited chance creation despite strong defence.
Many top Premier League teams show stronger attacking efficiency than defensive. This data underscores attacking efficiency’s role in securing higher league positions—creation and conversion often trump pure defending for sustained success.
Study Aims and Practical Value
This analysis measured goal expectancy from all Premier League shots, derived attacking and defensive efficiency, and tested objective team/player ratings using K-means clustering and principal component analysis.
While intuitive to many coaches and fans, these metrics offer practical on-pitch applications for Premier League analysis. Clubs can use them to scout players, set transfer valuations, and identify over/under-performers beyond raw goals.
Key finding: distance and angle from goal centre matter critically for accurate xG. The study rejects estimating goal probability from distance alone, as angle adds substantial predictive power.
Future Research Directions
Further work can forecast club and player performance in upcoming seasons via refined xG models. It could also benchmark goalkeepers by comparing actual saves to expected goals faced, as seen in analyses of keepers like Ederson or Raya.
Predicting football remains challenging—xG included—due to countless variables (playing style, set pieces, headers, pass types, dribbles) influencing outcomes.
Pitch dimensions vary slightly across Premier League stadiums, potentially affecting shot locations
Final Thoughts on Expected Goals
Expected Goals has transformed how we talk about performance in the Premier League. By focusing on chance quality rather than just outcomes, it offer a clearer, more stable view of how well teams and players are actually playing.
A simple framework – combining a zone‑based model, attacking and defensive efficiency, and clustering methods – already provides powerful tools for evaluating teams, profiling forwards, and making better‑informed decisions in recruitment and tactics. As data becomes richer and models more sophisticated, the analytical edge offered by xG and related metrics will only grow.
This study has demonstrated the value and reliability of goal expectancy in Japanese football. The combination of distance and angle was found to have a greater impact on the estimation of goal expectancy than the distance only variable.
Direct practical applications of this method could be incorporated into training (attacking and defending) to improve players’ and teams’ understanding of the game and their needs. For example, in attack, where the team and players are shooting and from what flow they were able to reach that spot.
Also, in defence, in what places are the opponents shooting?
And from what kind of flow have they been able to reach that place, and so on. However, it should be taken into account that this method does not yet examine in detail the types of shots, passes, through balls, set pieces, etc.
Acknowledgements
Without match footage from DAZN and InStat, this study would not have been possible. We would also like to thank our friends and professors who provided valuable insight and knowledge during the course of this research.
Keywords: premier league, performance analysis, football, predictive modelling.
