Atomic Football


Atomic Football is proud to be a part of AL.com Sports!


For 2008, we did not officially receive any Prediction Tracker awards since we trailed the "betting line" or computer average in most categories. However, among individuals, we finished FIRST in "mean square error" (accuracy), FIRST in "absolute error," and FIRST in "percent correct." Not a bad year at all.

For 2009, again no awards. Among individuals, we were FIRST in "mean absolute error" for the second half of the season and even beat the "opening line." We also finished second or third among individuals in four other categories. Again, still a good year.

For 2010, again no awards. Among individuals, we were FIRST in "mean absolute error" for the season. We also finished third among individuals in one other category.


Here are some newspaper articles that have been written about AtomicFootball. See what others are saying about our rating system.

  • 22 October 2006 PDF
  • 8 July 2007 PDF
  • 9 July 2007 PDF


Please check out our site and let us know what you think. Comments and suggestions are welcome.

contact us

The Algorithm

I first began tinkering with algorithms for rating college football teams in 1994. My first algorithm remained in note form until 1999, when I finally implemented it in an Excel spreadsheet. Excel's solver capability was employed to calculate the solution, often requiring several hours. (I have recently found that original algorithm to be virtually identical to Colley's system, which is part of the BCS and was published in the Atlanta Journal-Constitution.) Later in 1999, I implemented the algorithm in Visual Basic.
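For readers curious what Colley's system actually computes: it reduces a season to a small linear system and solves it. Below is a minimal pure-Python sketch of the Colley matrix method -- not the author's code, and the three-team schedule is hypothetical.

```python
# Sketch of the Colley matrix method: solve C r = b, where each game adds
# 1 to both teams' diagonal entries, -1 to the off-diagonal pair, and
# shifts b by +/- 1/2 for the winner/loser.

def colley_ratings(teams, games):
    """teams: list of names; games: list of (winner, loser) pairs."""
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    C = [[2.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [1.0] * n
    for w, l in games:
        i, j = idx[w], idx[l]
        C[i][i] += 1.0
        C[j][j] += 1.0
        C[i][j] -= 1.0
        C[j][i] -= 1.0
        b[i] += 0.5          # b_i = 1 + (wins - losses) / 2
        b[j] -= 0.5
    # Solve the system by Gaussian elimination with partial pivoting.
    A = [C[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    out = [0.0] * n
    for i in range(n - 1, -1, -1):
        out[i] = (A[i][n] - sum(A[i][j] * out[j] for j in range(i + 1, n))) / A[i][i]
    return dict(zip(teams, out))

# Hypothetical round-robin: A beat B, B beat C, A beat C.
ratings = colley_ratings(["A", "B", "C"], [("A", "B"), ("B", "C"), ("A", "C")])
print(ratings)  # A ≈ 0.7, B = 0.5, C ≈ 0.3
```

Note that the undefeated team lands above 0.5 and the winless team below it, with ratings always averaging 0.5 -- a hallmark of the Colley method.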

Because I was frustrated with the algorithm's inability to accommodate home field advantage and was uncomfortable with its statistical foundations, I began to formulate another algorithm, with my goals being a firm statistical foundation and no subjective inputs. In my opinion, a few of the algorithms whose results appear on the web have come close to achieving both of these objectives for "score-based" (points-only) ratings, but none had come very close to doing so for a "win/loss-based" system. With the BCS prohibiting "margin of victory" from its computer ratings, I began focusing on a fully self-contained, self-consistent win-loss based rating system.

The product of that effort has been documented in our recent paper. It contains a detailed description of the mathematics, abundant analysis demonstrating its self-consistency, and a brief comparison to the current BCS computer ratings. We had previously dubbed this method our "BCS-Compliant Rating" but have since dropped the name because some mistakenly inferred that our ratings precisely represent how the BCS computers might rank divisions beyond the FBS. We will henceforth simply refer to this system as our Win-Loss Based Ratings.

At the same time, we continue to publish our old "Hybrid Rating." Currently, the Hybrid Rating consists of a simple weighted average of two fully self-contained, self-consistent rating systems. The first is our Win-Loss Based Rating. The second is our Score-Based Rating (which we do not publish but do use for the score and win predictions).

As of this writing, time has not permitted us the opportunity to document our Score-Based Ratings in detail. Most importantly, like the Win-Loss Based Rating, it is self-contained and self-consistent. We have gone to tremendous lengths to minimize any subjective elements. Perhaps in the future we will have the opportunity to publish the details, as some of you have expressed interest.

In all of our work, the closest thing we have to a subjective input is the relative weighting given to the two components of our Hybrid Rating. This single knob has been adjusted to give a decent match to Ken Massey's Rating Comparison. Some arbitrary combination was unavoidable if we were to reflect a consensus that itself mixes similarly arbitrary numbers of predictive and retrodictive components. What have we proven by this? I don't know. Is there any value in the "collective wisdom of the masses?" If you have any ideas, please let me know.

Bear in mind that the two ratings that make up the Hybrid Ratings are, in and of themselves, completely self-contained -- they have no knobs. This does not mean that they have knobs that I choose not to adjust; it means the models simply have no place to logically introduce any adjustable parameters. I do not give xx% to strength of schedule or yy% to home-field advantage. The models naturally dictate the influence of these factors.

We would sum up all of this with our "Four Commandments of a Perfect Rating Algorithm" (and there are probably more I can't think of right now):

The Four Commandments of a Perfect Rating Algorithm

  1. A Perfect Rating Algorithm is self-contained. It should have no "knobs" or "tuning parameters." Knobs mean an algorithm is incomplete.
  2. A Perfect Rating Algorithm has a solid statistical foundation. It follows accepted practice.
  3. A Perfect Rating Algorithm is able to inherently estimate its own accuracy. A good statistical foundation is normally conducive to this.
  4. A Perfect Rating Algorithm is capable of producing either measurable quantities or quantities from which measurables can be derived -- for example, the probability that one team will win over another.
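To make the fourth commandment concrete, here is one generic way a rating difference can be turned into a measurable win probability. The logistic (Bradley-Terry-style) mapping below is our illustrative assumption, not the model used in the paper:

```python
import math

def win_probability(r_a, r_b, scale=1.0):
    """Map a rating difference to P(team A beats team B) via a logistic
    curve. `scale` controls how decisive a given rating gap is; its value
    here is arbitrary, chosen only for illustration."""
    return 1.0 / (1.0 + math.exp(-(r_a - r_b) / scale))

# Equal ratings give a 50/50 game; a higher-rated team is favored.
print(win_probability(0.7, 0.3))  # ≈ 0.599
```

A predicted probability like this is a measurable: over many games between similarly separated teams, the favored side should win at roughly that rate, so the mapping can be checked against actual outcomes.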

Have I ever seen a perfect rating algorithm for college football? Not yet. But we're getting closer.

The Objective

Our objective has been to develop a model-based approach to the problem known to mathematicians as "ranking by pairwise comparison." What makes football unusual among sports, and college football in particular, is the relatively small number of games. Given also the movement in recent years to de-emphasize "margin of victory" and limit rankings to using only wins and losses (a favorable trend, in our opinion), the problem has not necessarily become easier. Using only win/loss information, each game is reduced to a single bit of information. Thus, for the combined set of about 716 FBS, FCS, Division II, Division III, and NAIA teams scheduled to play about 3720 games in 2006, we have 3720 bits of data at the conclusion of the season. That is the equivalent of 465 bytes of information -- about the same amount of information as one verse of The Star-Spangled Banner. From this tiny amount of data, we hope to accurately rank over 700 teams. Is it any wonder that this is such a controversial problem? To learn more about the problem and our approach to solving it, check out our paper.
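The arithmetic above can be checked in a few lines, and put in perspective with a comparison of our own (not from the source): merely writing down one complete ordering of 716 teams takes log2(716!) bits, more than a whole season of win/loss results provides.

```python
import math

teams, games = 716, 3720           # counts quoted for the 2006 season

game_bits = games                  # one win/loss bit per game
game_bytes = game_bits / 8         # 3720 bits = 465 bytes

# Bits needed just to specify one full ordering of all 716 teams:
ranking_bits = math.lgamma(teams + 1) / math.log(2)   # log2(716!)

print(game_bytes)                  # → 465.0
print(round(ranking_bits))         # thousands of bits -- more than the season supplies
```

In other words, the season's data cannot even single out one ranking among all possible orderings without help from a model, which is why model assumptions carry so much weight in this problem.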

The Paper

After several years of development, we have finally produced a paper describing our Win-Loss Based algorithm. The paper, A Bayesian Mean-Value Approach with a Self-Consistently Determined Prior Distribution for the Ranking of College Football Teams, is available in PDF format for download from arXiv.org. For those who may not know, arXiv.org is an archive for electronic preprints of scientific papers in the fields of physics, mathematics, computer science and biology originally hosted at the Los Alamos National Laboratory, now hosted and operated by Cornell University and mirrored worldwide.

PDF Paper

We want to thank the folks at Konquest.org for their comments and review of our paper.


In section five of our paper (Analysis), we speculated that analysis of sets of three teams that each played the other two might be useful as a basis for a second metric by which the "mean schedule variance" (MSV) and "mean team variance" (MTV) can be uniquely determined independently of the model. At that time, we did not pursue this because of concerns that the matchups in these "triplets" might not be representative of the population. Since then, we have become increasingly curious about the potential outcome of such an endeavor -- despite the reservations. Recently (spring 2007), we succumbed to our curiosity, and here is the product of that work.

First of all, we needed an appropriate metric. Guided by our original suggestion that we examine the relative frequency with which the three teams all go 1-1 (what we shall call a "split"), we derived an expression for the expected split frequency in terms of the MSV and MTV.

From this expression, we were able to determine an estimate for the frequency of splits from the MSV and MTV values derived by the ranking algorithm.
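A split-frequency estimate of this kind can also be sanity-checked by simulation. The sketch below uses a simple stand-in model of our own devising -- team strengths drawn from a normal distribution with variance MTV and independent normal per-game noise with variance MSV -- which is an assumption for illustration, not necessarily the model of the paper:

```python
import math
import random

def simulate_split_freq(mtv, msv, trials=100_000, seed=1):
    """Monte Carlo fraction of three-team round-robins in which every
    team goes 1-1 (a "split"), under a probit-style stand-in model."""
    rng = random.Random(seed)
    sd_t = math.sqrt(mtv)            # spread of team strengths
    sd_g = math.sqrt(msv)            # per-game performance noise
    splits = 0
    for _ in range(trials):
        a, b, c = (rng.gauss(0.0, sd_t) for _ in range(3))
        ab = (a + rng.gauss(0.0, sd_g)) > (b + rng.gauss(0.0, sd_g))
        bc = (b + rng.gauss(0.0, sd_g)) > (c + rng.gauss(0.0, sd_g))
        ca = (c + rng.gauss(0.0, sd_g)) > (a + rng.gauss(0.0, sd_g))
        splits += (ab == bc == ca)   # a cycle in either direction = a split
    return splits / trials

# Evenly matched teams (mtv = 0) give 2 cyclic outcomes out of 8, i.e. 25%.
# Any spread in team strength pushes the split frequency below that.
print(simulate_split_freq(mtv=0.0, msv=1.0))
print(simulate_split_freq(mtv=1.0, msv=1.0))
```

The observed rates near 9-10% in the table below thus reflect a population with substantial strength differences among triplet members, exactly what the MSV/MTV pair is meant to quantify.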

Next, we developed the code necessary to calculate the actual frequency of splits for a given season. In running the code for the 2001 to 2005 seasons, we typically found about 6000 triplets within each season. Below is a summary of the results.

  Season   Predicted   Actual   Difference
  2005       9.84%      9.44%      0.40%
  2004       9.51%      9.54%     -0.03%
  2003       8.78%      8.94%     -0.16%
  2002       9.26%      8.93%      0.33%
  2001       9.24%      9.09%      0.15%

We also estimated that a sample size of ~6000 and a frequency of ~10% correspond to a precision of 0.39% (one sigma), consistent in magnitude with the RSS error of 0.25% in our relatively small sample.
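Both numbers in that sentence are easy to reproduce. The one-sigma precision is the binomial standard error at ~6000 triplets and ~10% frequency, and the root-mean-square of the five differences in the table (our own recomputation) lands near the quoted 0.25%:

```python
import math

# One-sigma binomial precision for ~6000 triplets at ~10% split frequency
p, n = 0.10, 6000
sigma = math.sqrt(p * (1 - p) / n)
print(f"{sigma:.2%}")   # → 0.39%

# Root-mean-square of the five season differences from the table above
diffs = [0.40, -0.03, -0.16, 0.33, 0.15]   # in percent
rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
print(f"{rms:.2f}%")    # close to the quoted 0.25%
```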

At this point, we have only just begun to explore the effects of home field advantage on this metric. The effect is complicated by the fact that home field advantage is not necessarily random, but is instead correlated with the relative strength of the teams (better teams are more likely to play at home). Depending upon the magnitude of the correlation, home field advantage appears to have the potential of either increasing or decreasing the frequency of splits.

Nevertheless, neglecting the potential effects of home field advantage for now -- which we estimate to be less than 0.5% on this metric -- the results are very encouraging given that the differences between predicted and actual values are small, consistent with the precision estimates, and apparently free of any significant biases.

Given that these results represent the last missing piece in the puzzle for validating our algorithm, we feel we can say with confidence that our algorithm, and the model upon which it is based, represent the first and only published, self-contained (no knobs), win-loss based college football ranking system.

If you have any information to the contrary, please let us know. We would love to hear from you.

Jim Ashburn
August 23, 2007

