The Algorithm
I first began tinkering with algorithms for rating college football
teams in 1994. My first algorithm remained in note form until 1999 when I
finally implemented it in an Excel spreadsheet. Excel's solver capability
was employed to calculate the solution, often requiring several hours. (I
have recently found that original algorithm to be virtually identical to
Colley's system, part of the BCS and published in the Atlanta
Journal-Constitution.) Later in 1999, I implemented the algorithm in
Visual Basic.
Because I was frustrated with the algorithm's inability to accommodate
home field advantage and was uncomfortable with its statistical
foundations, I began to formulate another algorithm, with my goals being a
firm statistical foundation and no subjective inputs. In my opinion, a few
of the algorithms whose results appear on the web have come close to
achieving both of these objectives for "score-based" (points only) ratings,
but none had come very close to doing so for a "win/loss-based" system.
With the BCS prohibiting "margin of victory" from their computer ratings, I
began focusing on a fully self-contained, self-consistent, win/loss-based
rating system.
The product of that effort has been documented in our recent paper. It
contains a detailed description of the mathematics, an abundant analysis
demonstrating its self-consistency, and a brief comparison to the current
BCS computer ratings. We had previously dubbed this method our
"BCS-Compliant Rating" but have since dropped the name because some have
mistakenly inferred from it that our ratings precisely represent how the
BCS computers might rank divisions beyond the FBS. We will henceforth refer
to this system simply as our Win-Loss Based Ratings.
At the same time, we continue to publish our old "Hybrid Rating."
Currently, the Hybrid Rating consists of a simple weighted average of two
fully self-contained, self-consistent rating systems. The first is our
Win-Loss Based Rating. The second is our Score-Based Rating (which we do
not publish but we do use for the score and win predictions).
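To make the "simple weighted average" concrete, here is a minimal sketch in Python. The function name, the example ratings, and the weight of 0.5 are all hypothetical placeholders chosen for illustration; the actual weighting used for the Hybrid Rating is not stated here, and the sketch assumes both component ratings are already on a common scale.

```python
def hybrid_rating(win_loss: float, score_based: float, weight: float = 0.5) -> float:
    """Weighted average of two component ratings.

    `weight` is the share given to the win/loss-based rating; the
    remainder goes to the score-based rating. The default of 0.5 is a
    hypothetical value, not the weighting actually used.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie between 0 and 1")
    return weight * win_loss + (1.0 - weight) * score_based


# Hypothetical component ratings for a single team.
example = hybrid_rating(win_loss=0.80, score_based=0.60, weight=0.5)
print(example)
```

With a weight of 1 the result reduces to the Win-Loss Based Rating alone, and with a weight of 0 to the Score-Based Rating alone.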
As of this writing, time has not permitted us to
document our Score-Based Ratings in detail. Most importantly, like the
Win-Loss Based Rating, it is self-contained and self-consistent. We have
gone to tremendous lengths to minimize any subjective elements. Perhaps in
the future we will have the opportunity to publish the details, as some of
you have expressed interest.
In all of our work, the closest thing we have to a subjective input is
the relative weighting given the two components of our Hybrid Rating. This
single knob has been adjusted to give a decent match to Ken Massey's
Rating Comparison. Some arbitrary combination was unavoidable in order to
reflect a consensus that itself contains similarly arbitrary numbers of
predictive and retrodictive components. What have we proven by this? I
don't know. Is there any value in the "collective wisdom of the masses?"
If you have any ideas, please let me know.
Bear in mind that the two ratings that make up the Hybrid Ratings are,
in and of themselves, completely self-contained -- they have no knobs. This
does not mean that they have knobs that I choose not to adjust; it means
the models simply have no place to logically introduce any adjustable
parameters. I do not give xx% to strength of schedule or yy% to home-field
advantage. The models naturally dictate the influence of these factors.
We would sum up all of this with our "Four Commandments of a Perfect
Rating Algorithm" (and there are probably more I can't think of right
now):
The Four Commandments of a Perfect Rating Algorithm
1. A Perfect Rating Algorithm is self-contained. It should have no "knobs"
or "tuning parameters." Knobs mean an algorithm is incomplete.
2. A Perfect Rating Algorithm has a solid statistical foundation. It
follows accepted practice.
3. A Perfect Rating Algorithm is able to inherently estimate its own
accuracy. A good statistical foundation is normally conducive to this.
4. A Perfect Rating Algorithm is capable of producing either measurable
quantities or quantities from which measurables can be derived. For
example, the probability that one team will win over another.
Have I ever seen a perfect rating algorithm for college football? Not
yet. But we're getting closer.
The Objective
Our objective has been to develop a model-based approach to the problem
known to mathematicians as "ranking by pairwise comparison." What makes
football rather unique among sports, and college football in particular, is
the relatively small number of games. Given also the movement in recent years
to de-emphasize "margin of victory" and limit rankings to using only wins
and losses (a favorable trend, in our opinion), the problem has not
necessarily become easier. Using only win/loss information, each game is
reduced to a single bit of information. Thus, for the combined set of about
716 FBS, FCS, Division II, III, and NAIA teams scheduled to play about 3720
games in 2006, we have 3720 bits of data at the conclusion of the season.
That is the equivalent of 465 bytes of information -- about the same amount
of information as one verse of The Star-Spangled Banner. From this tiny bit
of data, we hope to accurately rank over 700 teams. Is it any wonder that
this is such a controversial problem? To learn more about the problem and
our approach to solving it, check out our paper.
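The arithmetic above can be checked in a few lines of Python. This is only a sketch using the approximate 2006 counts quoted in the text (about 716 teams and about 3720 games); the variable names are illustrative, and the per-team figures are simple averages that are not part of the original argument.

```python
# Approximate 2006 counts quoted in the text.
TEAMS = 716
GAMES = 3720

# With win/loss only, each game contributes one bit (who won).
total_bits = GAMES * 1
total_bytes = total_bits // 8          # 8 bits per byte

# Each game involves two teams, so the average schedule length is:
games_per_team = 2 * GAMES / TEAMS

# Average number of bits of data available per team being ranked.
bits_per_team = total_bits / TEAMS

print(total_bits, "bits =", total_bytes, "bytes")
print(round(games_per_team, 1), "games per team on average")
print(round(bits_per_team, 1), "bits per team on average")
```

Seen this way, the whole season supplies only about five bits per team, which gives a sense of how little data the ranking has to work with.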
The Paper
After several years of development, we have finally produced a paper
describing our Win-Loss Based algorithm. The paper, A Bayesian
Mean-Value Approach with a Self-Consistently Determined Prior Distribution
for the Ranking of College Football Teams, is available in PDF
format for download from arXiv.org. For those who may not know, arXiv.org is an
archive for electronic preprints of scientific papers in the fields of
physics, mathematics, computer science, and biology. It was originally
hosted at the Los Alamos National Laboratory and is now hosted and operated
by Cornell University and mirrored worldwide.