'Rating' English football league teams
Yardstik can be applied to the English football leagues
with interesting results.
Because it imposes no artificial constraints on the
fixture list, but works with whatever data is provided, Yardstik
is not restricted to a single, completed season for one league.
It can simultaneously process as many seasons as required,
completed or not, whilst still giving greater
weight to the more recent results. This makes it possible
to generate a consolidated rating for all four English leagues
plus the Conference league. The games that relegated and promoted
clubs played in their previous seasons provide the bridging
evidence between leagues.
Summary
Charts and results are provided in detail in 'Leagues:
Analysis'.
Yardstik ratings are based on goal differences. A rating difference
between two clubs implies a similar goal difference between those
clubs (see Note 1).
The chart on the left shows the rating difference between
the top and bottom teams of each league and the extent of
the overlap between successive leagues.
The average goal differences between the top and bottom teams are,
as might be expected, quite large: four goals in the Barclays Premiership
and three and a half goals in the Conference league. In the
intermediate leagues the difference is much smaller: two goals
in the Coca-Cola Championship and only one goal in Coca-Cola
Leagues One and Two.
The degree of apparent overlap in the intermediate leagues
is surprisingly high. On the basis of this analysis, clubs
could be promoted from Coca-Cola League Two to the Coca-Cola
Championship, or from the top of the Conference league to
Coca-Cola League One. Certainly there appears to be a case
for much larger promotion/relegation bands, particularly
within the intermediate leagues.
The reason that ratings are compressed in the intermediate leagues
is that footballing talent clusters around a mean, just like
any other skill or aptitude. Indeed, when averaged across a
team of eleven players, the clustering might be
expected to be more pronounced.
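A small simulation illustrates why averaging over eleven players should tighten the clustering. It assumes, purely for illustration, that individual skill is normally distributed (mean 100, standard deviation 10) and that a team's strength is the simple average of its players; neither assumption is part of Yardstik. Under those assumptions the spread of team averages shrinks by roughly a factor of 1/√11 ≈ 0.30.

```python
import random
import statistics


def spread_of_team_means(n_teams: int, squad_size: int, seed: int = 1) -> tuple[float, float]:
    """Return (sd of individual skills, sd of team-average skills).

    Hypothetical model: each player's skill is drawn from N(100, 10**2) and a
    team's strength is the plain mean of its squad_size players.
    """
    rng = random.Random(seed)
    players = [rng.gauss(100.0, 10.0) for _ in range(n_teams * squad_size)]
    teams = [
        statistics.fmean(players[i * squad_size:(i + 1) * squad_size])
        for i in range(n_teams)
    ]
    return statistics.stdev(players), statistics.stdev(teams)


player_sd, team_sd = spread_of_team_means(n_teams=2000, squad_size=11)
print(f"player sd = {player_sd:.2f}, team sd = {team_sd:.2f}")
print(f"ratio = {team_sd / player_sd:.3f} (theory: 1/sqrt(11) = {11 ** -0.5:.3f})")
```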
The histogram of team 'ratings' shown in the chart to the
right exhibits a bell curve very similar to the example shown
in 'Yardstik in Operation'.
Comparisons of the league tables with Yardstik ratings
can be found in 'Leagues: Analysis'.
Interpretation
The Yardstik algorithm rewards consistency in determining
its team ratings. It concerns itself with which teams are
beaten as much as with how many teams are beaten. One reason why
these team ratings are less discriminating than, and sometimes
even inconsistent with, conventional league tables is
that, for many teams, the comparative status of the opposition
seems to have little impact on the outcome of a game. The
Yardstik ratings suggest that there is a comparatively modest
gradient in talent from the bottom of Coca-Cola League Two
to the top of Coca-Cola League One. In these divisions, differences
in league position apparently owe rather more to luck than they do
in lower or higher divisions. If it seems heretical to suggest
that team rankings are as much a matter of chance as they
are of skill, consider for a moment whether the league tables
would be 'flat' even if the teams were actually clones
of one another but still subject to the unpredictabilities
of wind, weather, health, environment, disposition, referee
and, of course, opposition.
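The thought experiment can be simulated. The sketch below plays a full home-and-away season between 24 identical clubs under a deliberately crude, hypothetical outcome model: every match is a home win, a draw or an away win with probability 1/3 each, whoever is playing. Even with no difference in talent at all, the final table is far from flat.

```python
import random


def simulate_clone_league(n_teams: int = 24, seed: int = 7) -> list[int]:
    """Play a double round-robin between identical clubs; return points, best first.

    Hypothetical outcome model: home win, draw and away win are equally likely
    in every match, so any spread in points is due to chance alone.
    """
    rng = random.Random(seed)
    points = [0] * n_teams
    for home in range(n_teams):
        for away in range(n_teams):
            if home == away:
                continue
            outcome = rng.choice(("home", "draw", "away"))
            if outcome == "home":
                points[home] += 3
            elif outcome == "away":
                points[away] += 3
            else:
                points[home] += 1
                points[away] += 1
    return sorted(points, reverse=True)


table = simulate_clone_league()
print("top:", table[:3], "bottom:", table[-3:], "spread:", table[0] - table[-1])
```

Real matches have draws less often than one time in three, and a home advantage, so the exact numbers are not meaningful; the point is only that chance by itself spreads out a table of identical teams.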
Another reason that Yardstik results differ from league rankings
is that Yardstik 'ages' results progressively. A match played later
in the season is considered more relevant than a game played
at the beginning. This means that Yardstik has no difficulty
bridging seasons: the off-season represents nothing
more than three months' additional 'ageing'. If the objective
is to assess comparative football talent, it is perverse to
treat the evidence of the last nine months as all of equal value
whilst treating the evidence of anything older as of zero value,
although, with suitable changes to data and input parameters,
those conditions could easily be applied.
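The contrast between progressive ageing and the all-or-nothing weighting of a league table can be sketched as below. The text does not give Yardstik's ageing formula, so the exponential form exp(−ageing_factor × age in months), the monthly time unit and the season cut-off date are all assumptions chosen for illustration.

```python
import math
from datetime import date

DAYS_PER_MONTH = 30.44  # average month length in days


def ageing_weight(match_date: date, as_of: date, ageing_factor: float) -> float:
    """Hypothetical smooth ageing: weight = exp(-ageing_factor * age in months).

    This exponential form is an assumption, not Yardstik's documented formula.
    """
    age_months = (as_of - match_date).days / DAYS_PER_MONTH
    return math.exp(-ageing_factor * age_months)


def league_table_weight(match_date: date, season_start: date) -> float:
    """All-or-nothing weighting implied by a league table: this season or nothing."""
    return 1.0 if match_date >= season_start else 0.0


as_of = date(2005, 3, 28)
season_start = date(2004, 8, 1)  # illustrative cut-off, not an official date
for d in (date(2005, 3, 1), date(2004, 8, 15), date(2004, 5, 1), date(2003, 8, 15)):
    print(d, f"ageing: {ageing_weight(d, as_of, 0.1):.3f}",
          f"league table: {league_table_weight(d, season_start):.0f}")
```

Under this form a match played just before the summer break is discounted somewhat more than one played just after it, rather than being dropped to zero, and a larger ageing factor discounts older matches faster.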
'Home Advantage'
If the Yardstik algorithm is applied directly to the results
of individual football games, the resultant ratings are more
compressed than those shown here.
Home advantage is an obvious source of possible distortion.
In standard mode, Yardstik assumes that, except for chance,
the outcome of a match between A and B reflects a difference
in football-playing ability and nothing
else. But if the result is biased in A's favour by
the advantage conferred by playing at home, then it is no longer
reasonable to expect an A v B match to have the same outcome
as a B v A match.
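Taking at face value the rule that a rating difference implies a similar goal difference, the distinction can be written down. The symbols are introduced here for illustration only: R_A and R_B are the two clubs' ratings, G_A and G_B the goals they score, and h_A and h_B their respective home advantages.

```latex
\begin{aligned}
\text{standard mode:} \quad & \mathbb{E}[G_A - G_B] = R_A - R_B \\
\text{A at home:}     \quad & \mathbb{E}[G_A - G_B] = R_A - R_B + h_A \\
\text{B at home:}     \quad & \mathbb{E}[G_A - G_B] = R_A - R_B - h_B
\end{aligned}
```

Whenever h_A or h_B is non-zero, the last two lines differ, which is exactly why an A v B result can no longer be treated as interchangeable with a B v A result.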
To obtain the results given here, two approaches were employed.
In the first, home and away games were matched and aggregate
scores alone were used to generate ratings. This is the 'single-pass
variant'. The data fed to the program looked like this:
Crystal_P Arsenal 1:1,1:5 26/12/2004
The first result is as it would normally be declared: Crystal
Palace 1 : Arsenal 1, whilst the score and the team order
of the second have been swapped from the original: Arsenal
5 : Crystal Palace 1. The new date is an 'average' of the
original dates.
The data processed by Yardstik is the 'average' result: Crystal_P
v Arsenal 1:3. Averaging two integer scores yields results in
multiples of half a goal, which is why ratings are delivered in
units of 0.5.
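The averaging step can be sketched as a small parser for the data line format shown above. The function name is hypothetical. Averaging the two fixtures gives a goal difference of (R_A − R_B) + (h_A − h_B)/2, so this variant removes home advantage completely only when both clubs enjoy the same home advantage; the parser also accepts a single unmatched score, which is simply passed through.

```python
def parse_result(line: str) -> tuple[str, str, float, float]:
    """Parse 'TeamA TeamB a1:b1[,a2:b2] dd/mm/yyyy [...]' into an averaged result.

    All scores are assumed to be oriented TeamA:TeamB already (the second
    score of a matched pair was swapped when the data were prepared).
    Any trailing tokens, such as '(matched)', are ignored.
    """
    team_a, team_b, scores, _date, *_rest = line.split()
    pairs = [tuple(int(goals) for goals in score.split(":")) for score in scores.split(",")]
    goals_a = sum(pair[0] for pair in pairs) / len(pairs)
    goals_b = sum(pair[1] for pair in pairs) / len(pairs)
    return team_a, team_b, goals_a, goals_b


print(parse_result("Crystal_P Arsenal 1:1,1:5 26/12/2004"))
# ('Crystal_P', 'Arsenal', 1.0, 3.0)
```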
The disadvantage of this approach is that it discards
all unmatched games. This means that, until comparatively late
in the season, few recent comparisons are available to Yardstik.
To address this problem, a second approach was devised. It
is a two-phase algorithm (the 'double-pass variant') which uses
the ratings that emerge from the single-pass variant to
infer team-specific home advantages. These can then be used
to remove the home-advantage effect from all results, whether
matched or not.
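The first phase of the double-pass idea can be sketched in a few lines: take the first-pass ratings, average the residuals (actual goal difference minus rating difference) per home team, and subtract each home team's estimate from every game it hosts. The function names, the example ratings and the example games are hypothetical, and the significance filter that Yardstik applies before using an estimate is omitted here; the adjusted goal differences would then feed a second rating pass, which is not shown.

```python
from collections import defaultdict
from statistics import fmean

Result = tuple[str, str, int, int]  # (home, away, home_goals, away_goals)


def estimate_home_advantage(results: list[Result], ratings: dict[str, float]) -> dict[str, float]:
    """Average residual per home team, given first-pass ratings.

    A rating difference is read as the predicted goal difference.
    """
    residuals: dict[str, list[float]] = defaultdict(list)
    for home, away, home_goals, away_goals in results:
        predicted = ratings[home] - ratings[away]
        residuals[home].append((home_goals - away_goals) - predicted)
    return {team: fmean(values) for team, values in residuals.items()}


def remove_home_advantage(results: list[Result], home_adv: dict[str, float]) -> list[tuple[str, str, float]]:
    """Goal difference of every game (matched or not) minus the home team's advantage."""
    return [(home, away, (hg - ag) - home_adv.get(home, 0.0)) for home, away, hg, ag in results]


ratings = {"Club_A": 100.0, "Club_B": 99.0}
games = [("Club_A", "Club_B", 3, 0), ("Club_A", "Club_B", 2, 0), ("Club_B", "Club_A", 1, 1)]
home_adv = estimate_home_advantage(games, ratings)
print(home_adv)  # {'Club_A': 1.5, 'Club_B': 1.0}
print(remove_home_advantage(games, home_adv))
# [('Club_A', 'Club_B', 1.5), ('Club_A', 'Club_B', 0.5), ('Club_B', 'Club_A', -1.0)]
```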
The data fed to the program now incorporates all matches.
Thus:
...
Birmingham_C Charlton_Ath 1:2,1:1 25/01/2004 (matched)
Birmingham_C Chelsea 0:1 21/08/2004 (unmatched)
Birmingham_C Crystal_P 0:1,0:2 29/12/2004 (matched)
Birmingham_C Everton 0:1 13/11/2004 (unmatched)
...
The raw output looks like this:
Player, Player-group & Rating listed in descending order of Rating
Group  Player         Discounted Matches  Player Rating  Home Adv.
...
1      Telford_U      25.5                99.8           0.0
1      Torquay_U      59.7                99.8           0.0
1      Brighton_&_HA  58.4                99.7           0.4
1      Plymouth_Arg   58.4                99.7           0.9
1      Rotherham_U    58.3                99.7           0.7
1      Wimbledon      28.2                99.7           0.5
...
League Table Comparisons
Yardstik results can be compared with the League tables.
This is the Premier League table as of 30/3/2005. Yardstik
ratings are based on games up to and including 28 March 2005.

The output from three Yardstik runs is summarised
here. The first two columns use an ageing factor of 0.1, which
places significant weight on the results of the 2003/4 and
2002/3 seasons. The first column (starting with Chelsea at
104.0) has been calculated using the single-pass variant
of the program, whilst the second column has been calculated
with the double-pass variant (i.e. it exploits estimates
of home advantage calculated in the first pass). The third
column is the estimated home advantage (where statistical
analysis suggests it to be significant at the 10% level).
The fourth and fifth columns are based on the single-pass
variant using a faster ageing factor of 0.5, which should
make them more directly comparable with the 2004/5 league
results.
Home Advantage: Interpretation
'Home advantage' is calculated as an average
of the residual errors, i.e. the amounts by which the actual goal
difference differs from the result predicted from rating
differences. It is attributed to 'home advantage' because the
average is tabulated by home team. Viewed
entirely dispassionately, it would have been equally valid
to calculate an 'away advantage', or to analyse the
predictive residuals in terms of closeness to the end of the season,
incidence of rain, or even direction of wind.
To a football agnostic, these estimates of 'home
advantage' do pass at least one test of plausibility: they
are almost all positive, the marked exceptions being those
of the top three teams. Manchester United's home advantage
fails to clear a 10% significance test, but its estimate is
negative. Why top teams should have a negative home advantage
is a mystery, but possible reasons include: support bases
that are national rather than regional, a predisposition
to field weaker teams at home, or confidence that a smaller
goal advantage should suffice for matches played at home.
yardstik.ini
The single- and double-pass variants of the Yardstik algorithm
necessitate some changes and additions to the yardstik.ini
file:
| Parameter | Description | Default |
| Modified parameter |
| Ifirstto | If the match (or game) has a scoring objective, then Ifirstto ('First to') defines that objective. Thus 'Best of 5' implies a value of 3 for Ifirstto, and 'First to 6', 'to 9', 'to 15' or 'to 21' imply values of 6, 9, 15 or 21. Requirements to win by two clear points are treated specially. If the match has no fixed objective, and isUnlimited is set to 1, then Ifirstto is interpreted as a 'comfortable' advantage which the opposition is unlikely to reverse. In the runs described above, a value of 2 has been used for this parameter. Some people might argue that most clubs wouldn't rest until an advantage of 3 had been achieved. | 3 |
| Added parameters |
| isHomeAdv | If home advantage is considered to be a factor in match outcome, then the format of the data must be changed (as previously described) and this parameter should be set to 1 or 2. If isHomeAdv = 1, the single-pass variant is used; if isHomeAdv = 2, the double-pass variant is used. In either case, home advantage estimates are calculated and tested against the value of FsigHomeAdv. | 0 |
| isUnlimited | This parameter must be set to 1 if match scores are unlimited (e.g. if the match is of fixed duration). | 0 |
| FsigHomeAdv | This parameter is used to test whether or not a home advantage estimate is significantly different from zero. If the probability that an estimate at least that far from zero would arise by chance (were the true home advantage zero) is less than FsigHomeAdv, then the home advantage estimate is printed and used in the second pass of the double-pass algorithm. | 0.05 |
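The FsigHomeAdv test can be sketched as follows. The text does not say which significance test Yardstik applies, so this version is an assumption: it treats one home team's residuals as roughly independent and uses a two-sided z-test (a normal approximation). The function name is hypothetical.

```python
import math
from statistics import fmean, stdev
from typing import Optional


def significant_home_advantage(residuals: list[float], fsig: float = 0.05) -> Optional[float]:
    """Return the mean residual if it is significantly non-zero, else None.

    Assumption: a two-sided z-test on the mean residual; the actual test
    used by Yardstik is not specified in the text.
    """
    n = len(residuals)
    if n < 2:
        return None  # too few home games to estimate a spread
    mean = fmean(residuals)
    std_err = stdev(residuals) / math.sqrt(n)
    if std_err == 0.0:
        return mean if mean != 0.0 else None
    z = mean / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # P(|Z| >= |z|) for a standard normal Z
    return mean if p_value < fsig else None
```

The runs described earlier used a 10% level, which corresponds to calling the function with fsig=0.10; an estimate returned as None would be neither printed nor used in the second pass.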
Notes:
1) But not necessarily the same goal
difference. Not only is chance involved, but so is the principle
of maximum efficiency or minimum effort. There is little point
in winning by (say) five goals if two will do: not only is
less effort involved, but the risk of injury is reduced. The
concept is discussed in greater detail here.
2) Except when modified by the maximum
efficiency or minimum effort principle.