If you can't tell, we love statistics. I'm not just talking about looking at box scores and calculating free throw percentage. I'm talking about the boring kind most people avoid while in college. It's somewhat sad, and not too great with the ladies, but I'd say most nights team Ziguana is up debating the distribution of Kobe's shot attempts. Or better yet, being woken up at 4:00 a.m. to, "How in the world do you divide two Gaussian variables?!" Anyways, for those of you who ask similar, albeit more reasonable questions, let me try and shed some light on what goes on behind the scenes. I have broken it down into the main features we offer.
- Player Consistency Rating
- Lineup Optimization
- Projecting Head-to-Head Matchups
- Free Agent Suggestions
You can find a good deal about our "zRanks" here and here already, so I'll try to be brief on this topic. We set out to create a ranking system that could be customized to your league's categories. This involves both a ranking system for each category and a way to compare one category to another. In statistics, this is known as normalizing or standardizing a variable. Essentially, you remove the unit of measurement by expressing each value as its distance from the mean in standard deviations. This standard score is often referred to as a z-score, thus the birth of zTotals, zRanks, and Ziguana.
This is no miracle formula. From what I understand, most legitimate ranking services attempt to do the same thing. However, as I mentioned before, we won't sleep until we've found the most robust solution to any algorithm we deliver. To ensure the most effective rankings we remove the outliers (i.e. players that "DNQ"). We also need to handle ratio categories (e.g. FG%) differently from the rest. This formula is where we build our competitive advantage, so I can't explain the secret sauce, but we ensure that a player who only shoots two field goals per game isn't given as much credit as Dwight Howard.
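To make the z-score idea concrete, here is a minimal sketch of the textbook version. It is not the actual zRanks formula (that stays secret): the player numbers are made up, the names `z_scores` and `z_totals` are hypothetical, and it skips both the outlier removal and the special handling of ratio categories mentioned above.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize raw values: subtract the mean, divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Everyone is identical in this category, so nobody stands out.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Made-up per-game averages for three players (illustration only).
stats = {
    "PTS": [10.0, 20.0, 30.0],
    "REB": [6.0, 12.0, 3.0],
}

# One z-score per player per category; the units (points, rebounds) are gone.
per_category = {cat: z_scores(vals) for cat, vals in stats.items()}

# Because every category is now on the same scale, they can simply be added.
z_totals = [sum(per_category[cat][i] for cat in stats) for i in range(3)]
print(z_totals)
```

Once each category is unit-free, adding or removing a category for a custom league is just a matter of changing which keys go into the sum.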
Here's where we kick it up a notch. Projections attempt to predict the future given historical data. There are plenty of experts who will try to deliver their predictions, but I don't believe in psychics. I believe in numbers. There are three steps to a good projection algorithm.
First, you need to understand the nature of the world you're trying to model. Any baseball fan will tell you that a player who hits .400 for half a season won't be able to keep it up after the All-Star break. Why? Well, baseball is a game of large numbers. There are so many games that regression to the mean, or the true average, is inevitable. While this phenomenon holds in every sport, basketball has another factor that can push in the opposite direction. The amount of playing time (or minutes per game) is critical to determining a player's final line. If a former bench player becomes a starter, he's bound to produce more, simple as that.
Once you understand the driving forces in your model, you need to find appropriate historical time periods. We've found that limiting our scope to the past two seasons leads to the smallest prediction error. We also front-weight our model, so more recent games carry more weight. Next, we throw in a few trend factors that handle regression to the mean, an increase in playing time, etc. to fit the sport we're dealing with. This builds our foundation for "before season" projections.
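Front-weighting can be done in many ways. The sketch below assumes a simple exponential decay, which is my illustration and not necessarily the weighting Ziguana uses; the function name and the `decay` parameter are hypothetical, and the trend factors are left out.

```python
def recency_weighted_average(values, decay=0.9):
    """Weighted mean of `values` (ordered oldest to newest).

    The newest game gets weight 1, the one before it `decay`,
    the one before that `decay ** 2`, and so on.
    """
    if not values:
        raise ValueError("need at least one game")
    n = len(values)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Made-up scoring log for a player who is trending up.
points = [10, 12, 20, 24]

print(sum(points) / len(points))                    # plain average
print(recency_weighted_average(points, decay=0.5))  # leans toward recent games
```

With `decay=1.0` this collapses to the plain average, so the parameter controls how quickly old games fade.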
The final piece comes into play once the season actually kicks off. As great as "before season" projections are, we don't want to leave out any new data that will help us make more informed decisions. Here I call on one of my favorite algorithms, known as Empirical Bayes estimation. It is an elegant way of taking in information from a group and using it to estimate the performance of each member. The more variance there is in a member's observed performance, the more strongly their estimate is pulled toward the population mean. Of course, when the season starts these variances are quite high, so Empirical Bayes automatically knows that things will "level off." If you want to know more about this method, I suggest you read a paper by Lawrence Brown or Wenhua Jiang on predicting baseball batting averages.
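For a feel of how the shrinkage works, here is a minimal sketch of a textbook normal-normal Empirical Bayes estimator with a method-of-moments prior. It is an assumption on my part that this simple form is close in spirit to what runs on the site; the papers mentioned above use more refined estimators. The name `empirical_bayes_shrink` and the sample numbers are hypothetical.

```python
from statistics import mean, pvariance

def empirical_bayes_shrink(observed, sampling_var):
    """Shrink each player's observed average toward the group mean.

    observed     -- each player's average so far
    sampling_var -- how noisy each average is (per-game variance / games played)
    """
    mu = mean(observed)
    # Spread of true talent, estimated from the group itself:
    # total spread of the averages minus the typical noise in them.
    tau2 = max(0.0, pvariance(observed) - mean(sampling_var))
    shrunk = []
    for x, s2 in zip(observed, sampling_var):
        # Weight on the player's own data: near 1 when the average is
        # precise, near 0 when it is mostly noise.
        weight = tau2 / (tau2 + s2) if (tau2 + s2) > 0 else 1.0
        shrunk.append(mu + weight * (x - mu))
    return shrunk

# Made-up numbers: the third player's average is much noisier.
estimates = empirical_bayes_shrink([10.0, 20.0, 30.0], [4.0, 4.0, 50.0])
print(estimates)
```

Early in the season every `sampling_var` is large, so every weight is small and all players sit close to the group mean. As games pile up the noise shrinks and each player's own numbers take over.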
Finally, to bring it all together we combine "before season" projections with our Empirical Bayes estimations such that once the season is over, the "in season" projections will match that season's performance. In this way we can do simple calculations that offer "rest of season" predictions as well.
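One simple way to get that behavior is a linear blend driven by how much of the season has been played. This is only a sketch under that assumption (the real weighting isn't described here); the function names, the 82-game default, and the numbers are hypothetical.

```python
def in_season_projection(preseason, in_season_estimate, games_played, season_length=82):
    """Linear blend: all preseason at game 0, all in-season estimate at the final game."""
    t = min(max(games_played / season_length, 0.0), 1.0)
    return (1.0 - t) * preseason + t * in_season_estimate

def rest_of_season(full_season_projection, actual_to_date, games_played, season_length=82):
    """Per-game average over the remaining games implied by a full-season projection."""
    remaining = season_length - games_played
    if remaining <= 0:
        raise ValueError("no games remaining")
    total_needed = full_season_projection * season_length
    total_so_far = actual_to_date * games_played
    return (total_needed - total_so_far) / remaining

# Made-up example: projected 20 PPG before the season, running at 24 PPG at the halfway mark.
blended = in_season_projection(20.0, 24.0, games_played=41)
print(blended)                                            # 22.0
print(rest_of_season(blended, 24.0, games_played=41))     # 20.0
```

Here `in_season_estimate` would be something like the Empirical Bayes estimate; by the final game that estimate has very little noise left, so the blend lands on the season's actual performance.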
Note, some of these methods are still in production and not currently live on the site. Stay tuned!
Rankings are great, but most people can tell you LeBron James is better than Derek Fisher. But which is more consistent? Our "Consist" metric tracks a player game by game and gauges how similar these performances are. Ziguana measures the statistical variance between games for each statistic you care about (e.g. PTS and REB for basketball or Wins and HR for baseball).
Now, if you're into statistics, you'll know that the more volume a player produces (e.g. the more points they score), the higher their variance tends to be. A metric that simply tracks each player's volume wouldn't be too helpful (LeBron would always rate as less consistent than Fisher), so our metric adjusts for this. In the end, Consist ranges from 1 to 10; the higher the rating, the more consistent the player.
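A common way to adjust variability for volume is the coefficient of variation (standard deviation divided by the mean). The sketch below uses it purely as an illustration; I'm not claiming it is the Consist formula, and it doesn't attempt the 1 to 10 scaling, which isn't described here. The game logs are made up.

```python
from statistics import mean, pstdev

def coefficient_of_variation(game_log):
    """Game-to-game spread relative to the player's own average (lower = steadier)."""
    mu = mean(game_log)
    if mu == 0:
        return float("inf")  # no production at all, so the ratio is undefined
    return pstdev(game_log) / mu

# Made-up points per game for a high-volume star and a low-volume role player.
star = [30, 20, 25, 35, 15]
role_player = [4, 10, 2, 8, 6]

print(pstdev(star), pstdev(role_player))  # raw spread: the star looks "worse"
print(coefficient_of_variation(star), coefficient_of_variation(role_player))
```

On raw standard deviation the star swings more, but relative to his average he is the steadier of the two, which is the kind of volume adjustment the paragraph above is getting at.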
Fantasy tools are awesome, but actual league analysis is what Ziguana strives for. This begins by understanding who belongs in your starting lineup, because a triple double on your bench means absolutely nothing to your standings. Picking an optimal lineup is a simple task for most managers, but a complex one for computers. Assume you have a 30-man roster with 15 starting spots. Believe it or not, if everyone can play any position, that's over 100 million possible combinations!
But, have no fear, we've got an algorithm to speed up this optimization. Also, for baseball we track probable starters and the probability a reliever is going to pitch each day to give the most desirable lineup possible.
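If you want to check the "over 100 million" figure yourself, it is just the number of ways to choose 15 starters out of 30 when positions don't matter. A quick sketch (the variable names are mine):

```python
from math import comb

roster_size = 30
starting_spots = 15

# Ways to pick 15 starters from 30 players when anyone can fill any spot.
lineups = comb(roster_size, starting_spots)
print(f"{lineups:,}")  # 155,117,520
```

Checking every one of those lineups by brute force is what makes the problem expensive, and real position restrictions change the count but not the flavor of the problem.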
My favorite leagues are head-to-head because you get to rub a victory in your friend's face each week. So, at Ziguana we've spent a ton of time exploring the probabilities behind the matchup. It works like this: take every starter on your team and every starter on the opposing squad and find the expected totals. To do this, take each player's per-game averages and multiply them by the number of games that player is in your starting lineup. Add up these expected results, by category, and you'll get a total for your team.
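The expected-total calculation described above can be sketched in a few lines. The data layout (a list of dicts with `per_game` and `games` keys) and the player numbers are made up for illustration.

```python
def expected_totals(starters):
    """Sum per-game average times games started, category by category."""
    totals = {}
    for player in starters:
        for category, per_game in player["per_game"].items():
            totals[category] = totals.get(category, 0.0) + per_game * player["games"]
    return totals

# Hypothetical starters for one week.
my_starters = [
    {"name": "Player A", "per_game": {"PTS": 20.0, "REB": 5.0}, "games": 4},
    {"name": "Player B", "per_game": {"PTS": 10.0, "REB": 10.0}, "games": 3},
]

my_totals = expected_totals(my_starters)
print(my_totals)  # {'PTS': 110.0, 'REB': 50.0}
```

Run the same function on the opposing starters and you have the two sets of expected totals to compare.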
But wait! Expected totals are cool, but so what if I'm projected to hit 7 HRs and my opponent is projected to hit 9? Does this mean I should focus on this category? To answer this question you need to know the odds of winning each category. We look at every possible outcome, not just the expected one, and assign probabilities to each of these results based on the variance of your players' performance. This allows us to give you that pretty graph on the matchup page that details the probability you win, lose, or tie each category.
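Here is a rough sketch of the "every possible outcome" idea for a counting category like home runs. It assumes each team's weekly total follows a Poisson distribution, which is my stand-in: the site bases its probabilities on the measured variance of your players, while a Poisson simply sets the variance equal to the mean. The function names and the `max_count` cutoff are hypothetical.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k events when lam are expected."""
    return exp(-lam) * lam ** k / factorial(k)

def category_odds(my_expected, opp_expected, max_count=60):
    """Return (win, tie, loss) probabilities by enumerating every score pair."""
    mine = [poisson_pmf(k, my_expected) for k in range(max_count + 1)]
    theirs = [poisson_pmf(k, opp_expected) for k in range(max_count + 1)]
    win = tie = loss = 0.0
    for a, pa in enumerate(mine):
        for b, pb in enumerate(theirs):
            if a > b:
                win += pa * pb
            elif a == b:
                tie += pa * pb
            else:
                loss += pa * pb
    return win, tie, loss

# The example from the text: I'm projected for 7 HRs, my opponent for 9.
win, tie, loss = category_odds(7.0, 9.0)
print(f"win {win:.1%}, tie {tie:.1%}, lose {loss:.1%}")
```

The point is that a two-homer gap in expected totals is far from a sure loss, which is exactly why the odds matter more than the expected totals when deciding where to focus.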
Lastly, we combine these category probabilities to project the most likely outcome for your week, as well as the odds to win any number of categories. Because all you want to do is win, you don't care how.
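Combining per-category win probabilities into "odds to win any number of categories" can be sketched as below. This assumes the categories are independent and counts a tie as a non-win; both are simplifications of mine, and the real calculation may account for ties and for correlated categories. The function name and the probabilities are hypothetical.

```python
def categories_won_distribution(win_probs):
    """dist[k] = probability of winning exactly k categories.

    Builds the distribution one category at a time: after each category,
    every existing count either stays put (not won) or moves up by one (won).
    """
    dist = [1.0]  # before any category, 0 wins with certainty
    for p in win_probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            nxt[k] += prob * (1.0 - p)
            nxt[k + 1] += prob * p
        dist = nxt
    return dist

# Made-up win probabilities for a two-category league.
dist = categories_won_distribution([0.9, 0.2])
print(dist)                                     # roughly [0.08, 0.74, 0.18]
most_likely = max(range(len(dist)), key=dist.__getitem__)
print(most_likely)                              # 1
```

From the same list you can read off the chance of winning the week outright by summing the entries for more than half the categories.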
Currently, we suggest free agents based on your head-to-head matchup. The probability that you win a category, discussed above, goes into calculating "weighted ranks" for free agents. It's a simple algorithm that looks at how important each category is. If you have a 100% chance of winning stolen bases, we shouldn't suggest you add Rajai Davis (a stolen base specialist). Instead, we choose the players that will most help the categories your team will be in a dogfight to win!
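As an illustration of the "weighted ranks" idea, the sketch below weights each category by how close the matchup is and scores free agents by their weighted z-scores. The weighting function `1 - |2p - 1|` is an assumption of mine (it is 1 for a coin flip and 0 for a sure win or sure loss), not the site's formula, and the free agents and their z-scores are invented.

```python
def category_weights(win_probs):
    """Weight is 1 for a 50/50 category and 0 when the outcome is already decided."""
    return {cat: 1.0 - abs(2.0 * p - 1.0) for cat, p in win_probs.items()}

def rank_free_agents(free_agents, win_probs):
    """Sort free agents by the sum of their z-scores, weighted by category importance."""
    weights = category_weights(win_probs)

    def score(agent):
        return sum(weights.get(cat, 0.0) * z for cat, z in agent["z"].items())

    return sorted(free_agents, key=score, reverse=True)

# Stolen bases are locked up; home runs are a toss-up.
win_probs = {"SB": 1.0, "HR": 0.5}

# Hypothetical free agents with made-up z-scores per category.
free_agents = [
    {"name": "Speedster", "z": {"SB": 3.0, "HR": -0.5}},
    {"name": "Slugger", "z": {"SB": -0.5, "HR": 1.5}},
]

ranked = rank_free_agents(free_agents, win_probs)
print([agent["name"] for agent in ranked])  # ['Slugger', 'Speedster']
```

The stolen base specialist drops to the bottom because his best category carries zero weight this week, which mirrors the Rajai Davis example above.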