Monday, December 8, 2014

Relationship between advanced stats and winning

While the growth in advanced metrics provides a more robust picture of a player's value, the true measure of a metric should be its ability to correlate with his team winning. I took six advanced metrics from basketball-reference.com for the 2014-15 season* and focused only on players who had played a minimum of 100 minutes this season through the December 5 games (this resulted in a population of 342 players).

I compared each player's team's winning percentage (win %) with i) usage %, ii) player efficiency rating (PER), iii) true shooting % (TS%), iv) box plus-minus, v) value over replacement player (VORP) and vi) win shares per 48 minutes (WS/ 48). The definitions are pulled mostly from basketball-reference.com's glossary. Below each definition, the specific metric is plotted on the vertical/ y axis and win % is plotted on the horizontal/ x axis.

Each blue dot represents one of the 342 players that qualified. I included a yellow linear regression line to exhibit the direction of the relationship. If there is a very strong relationship between the metric and win %, then the yellow line will begin at the bottom left of the chart and rise steeply to the top right of the chart, which would indicate that as the metric increases so too does win percentage. I include an R-square to indicate how well the model fits the data (or how well the metric explains win %). A lower R-square means that the metric does a poorer job of explaining win %.
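For anyone who wants to reproduce the regression line and R-square, here is a minimal sketch in plain Python. It assumes an ordinary least-squares fit with win % on the x axis and the metric on the y axis. The function name `fit_line` and the sample numbers are made up for illustration and are not the 342-player data set or the tool used for the charts.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept, plus R-squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Made-up values for illustration only (hypothetical players)
win_pct = [0.25, 0.40, 0.50, 0.60, 0.75]  # x axis: team win %
metric = [-1.0, 0.5, 0.0, 1.5, 2.0]       # y axis: e.g. box plus-minus
slope, intercept, r_squared = fit_line(win_pct, metric)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R-square={r_squared:.2f}")
```

A positive slope corresponds to a line rising from bottom left to top right, and an R-square near 0 corresponds to dots scattered far from the line.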

Usage %: Usage Percentage (available since the 1977-78 season in the NBA); the formula is 100 * ((FGA + 0.44 * FTA + TOV) * (Tm MP / 5)) / (MP * (Tm FGA + 0.44 * Tm FTA + Tm TOV)). Usage percentage is an estimate of the percentage of team plays used by a player while he was on the floor.
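The usage formula above translates directly into code. This is only a sketch: the function and argument names are my own, and the example inputs are invented to make the arithmetic easy to check.

```python
def usage_pct(fga, fta, tov, mp, tm_fga, tm_fta, tm_tov, tm_mp):
    """Usage % as written in the glossary formula quoted above."""
    player_plays = fga + 0.44 * fta + tov
    team_plays = tm_fga + 0.44 * tm_fta + tm_tov
    return 100 * (player_plays * (tm_mp / 5)) / (mp * team_plays)


# Hypothetical player: plays all 48 minutes and uses 10 of the team's 50 plays
print(usage_pct(fga=10, fta=0, tov=0, mp=48,
                tm_fga=50, tm_fta=0, tm_tov=0, tm_mp=240))
```

In this invented case the player uses one fifth of the team's plays while on the floor for the whole game, so the result is a usage of 20%.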



PER: A per minute rating that adds accomplishments and subtracts failures (see formula here).



TS%: True Shooting Percentage; the formula is PTS / (2 * TSA). True shooting percentage is a measure of shooting efficiency that takes into account field goals, 3-point field goals, and free throws.
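The TS% formula can be sketched the same way. The definition above does not spell out TSA; the basketball-reference.com glossary defines true shooting attempts as FGA + 0.44 * FTA, and that is what this sketch uses. The function name and sample numbers are made up.

```python
def true_shooting_pct(pts, fga, fta):
    """TS% = PTS / (2 * TSA), with TSA = FGA + 0.44 * FTA."""
    tsa = fga + 0.44 * fta  # true shooting attempts
    return pts / (2 * tsa)


# Hypothetical stat line: 25 points on 15 FGA and 10 FTA
print(round(true_shooting_pct(pts=25, fga=15, fta=10), 3))
```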



Box plus-minus: A box score estimate of the points per 100 possessions a player contributed above a league average player, translated to an average team.



VORP: A box score estimate of the points per 100 team possessions a player contributed above a replacement-level player, translated to an average team.



WS/ 48: An estimate of the number of wins contributed by a player per 48 minutes (league average is approximately .1000).


The above shows that while all six of these advanced metrics have a positive correlation with team win %, usage, PER and TS% are weakly related (they have relatively flat lines). Box plus-minus, VORP and WS/ 48 have a stronger positive relationship, but the R-squares indicate that each explains only ~15-18% of the variation in team win %. However, a player's team's wins are included as a component in the formula for WS/ 48, so the stronger positive correlation and larger R-square with WS/ 48 are expected. As a result, box plus-minus and VORP, which includes box plus-minus in its formula, should be considered the best metrics in terms of correlating with team win % that don't also include a team's wins in the formula (based on this small sample).

*A major limitation of this approach is that it focuses only on one incomplete season (2014/15). I will try to run a similar analysis on these metrics over several completed seasons, which should give a more accurate picture of the correlation with winning. 

Sunday, November 30, 2014

Scoring efficiency (PTS/ FGA)

There are many measures of shooting efficiency; however, one that tends to get overlooked is the simplest--points (PTS) per field goal attempt (FGA). The measure is intuitive and gives some context on players that are effective three point scorers or score from the free throw line. (A limitation of this measure of efficiency is that it disproportionately benefits players that are frequent and efficient free throw shooters because a foul on a non-scoring shot can result in two or three points and zero FGA.)

I pulled data from basketball-reference for all games as of November 26, 2014 and calculated PTS/ FGA. Below is a chart that captures the 367 players with at least 10 FGA this season. Clearly, there is significant variance between the five best players, who average almost two points per FGA, and the five worst players, led by Kyle Anderson, who averages a quarter of a point per FGA. But outside of Brandan Wright, all of the names listed below have relatively few FGA.


So next I filtered the population down to the 24 players with 200 FGAs or more. This second chart shows that among these high frequency shooters, James Harden and Anthony Davis are the most efficient scorers and that Kobe, Serge Ibaka and Kemba Walker are among the least efficient. (The fact that James Harden is ranked the most efficient reflects the deficiency in this measure--he leads the league in FTA and is an exceptionally efficient free throw shooter at ~90%. Kobe, however, is second in FTAs, but his ~79% FT shooting can't compensate for his pedestrian non-FT shooting.)

To give some context to this chart the league average for PTS/ FGA is 1.22 (median is 1.20) so every player from John Wall down is below the league average in terms of efficiency. While below league-average PTS/ FGA efficiency may be a concern for big men, like Aldridge, Griffin and Jefferson, point guards, like Wall and Walker, can point to the fact that they also benefit overall team scoring efficiency through assists.
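The two cuts used in this post (at least 10 FGA, then 200 or more FGA) come down to a filter and a sort. Below is a minimal sketch; the function name, the tuple layout and the three players are hypothetical, not rows from the basketball-reference data.

```python
def pts_per_fga(players, min_fga=10):
    """Rank players by PTS/FGA, keeping only those with at least min_fga attempts.

    players: list of (name, pts, fga) tuples.
    """
    rows = [(name, pts / fga) for name, pts, fga in players if fga >= min_fga]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical season totals: (name, PTS, FGA)
sample = [("Player A", 300, 200), ("Player B", 12, 9), ("Player C", 100, 100)]

print(pts_per_fga(sample))               # everyone with 10+ FGA
print(pts_per_fga(sample, min_fga=200))  # high frequency shooters only
```

Raising `min_fga` is what removes the low-volume names that dominate both ends of the first chart.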

Friday, November 7, 2014

Kobe: Career 50 Game Moving Averages

After the most recent 14 for 37 shoot-first-ask-questions-last loss, I started thinking that 2014 Kobe might simply be a bad player that gets to shoot a lot. In other words, offensively, he produces what any marginal NBA player would given the green light to shoot the ball 30+ times a game. But I wanted to understand if the data supported this perception. There are countless approaches to measuring this, and the greatest challenge of measuring my perception of Kobe's contemporary futility is that he has only played 11 games over the last two seasons so I'm dealing with an admittedly small sample size to account for 'current' performance.

One way to address this is to create a moving average of his past performance. I pulled all of Kobe's regular season game logs from basketball-reference.com and included only games in which he appeared (as of his last game on Tuesday night, Kobe has appeared in 1,250 regular season games). The benefit of a moving average is that it smooths performance (in this case over a 50 game period), so anomalously exceptional or futile performances are averaged together with the other 49 games in the window.

To create the moving average I don't report Kobe's first 49 games because "game 1" is essentially the mean of his first 50 games. ("Game 2" is the mean of games 2 through 51, "game 3" is the mean of games 3 through 52, etc.) I created a moving average for FG%, assists, points, total rebounds, and steals (all plotted below). I also included his career regular season average in each of those categories to provide some context. What does it show? First, it took Kobe a few years to adjust to the NBA game, as his moving average rises sharply in every category from game 0 to about game 400. This is expected, as he was drafted out of high school and received significantly less playing time in his first three seasons.
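The windowing described above (point 1 is the mean of games 1 through 50, point 2 is the mean of games 2 through 51, and so on) can be sketched as follows. The function name and the toy input are my own. How FG% was averaged (per-game percentages versus pooled makes and attempts) isn't stated, so this only shows the generic case of averaging one number per game.

```python
def moving_average(values, window=50):
    """Point i (1-based) is the mean of games i through i + window - 1."""
    if window <= 0 or window > len(values):
        return []
    total = sum(values[:window])
    averages = [total / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest game, drop the oldest one
        total += values[i] - values[i - window]
        averages.append(total / window)
    return averages


# Toy example with a 3 game window
print(moving_average([1, 2, 3, 4, 5], window=3))
```

With 1,250 games and a 50 game window this yields 1,201 plotted points, one fewer than the game count for each of the 49 games that only help seed the first average.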

If Kobe were truly becoming an ineffective player then we would expect each chart to show somewhat of a bell curve where the plot rises from early-career inefficacy to a mid-career peak and then drops to reflect a current state of inefficacy. But that doesn't happen. For the most part Kobe is right around his career average in these conventional statistics, with the exception of assists, where he is becoming much more prolific relative to the rest of his career.





Friday, September 26, 2014

Offensive and Defensive Rating & Win %

Looking at team data, I was interested in understanding how effectively team offensive and defensive ratings serve as measures of success. Offensive rating measures a team's points scored per 100 possessions. Conversely, defensive rating measures a team's points allowed per 100 possessions. To measure success I used regular season win %. I pulled historical team data from the 1950-51 through the 2013-14 regular seasons from basketball-reference.com. Because basketball-reference.com doesn't provide raw data for analysis, I used excellent code I found here*, which allowed me to loop the URLs for all teams for all seasons.

Before I simply plotted individual team offensive and defensive rating, I looked at the league average for each measure over the last 63 seasons to see if there was a trend. The chart below, which plots NBA regular season mean offensive rating on the left side and mean defensive rating on the right side, illustrates two things. First, the ratings mirror each other. This makes sense because for one team to have a more offensively productive season there have to be other teams that give up those points. When aggregated across the league for a given year, the offensive and defensive ratings become quite close (usually the difference is tenths or hundredths of a point).

Second, and more importantly, from the 1950s until the early 1980s NBA offensive production soared from about 85 points per 100 possessions to about 108 points per 100 possessions. (The NBA shot clock was introduced in the 1954-55 season so it's not like a technical rule change was responsible for this increase.) This high offensive output plateaued in the 1980s and 1990s, but dropped to about 100 points per 100 possessions in the late 1990s and early 2000s. By the mid-2000s it picked up slightly, but even last season is about 5 points per 100 possessions off the league mean peak established in the 1980s and 1990s. (So defenses appear to be adjusting to prolific offenses.)

The chart below matters because I cannot simply plot a team's offensive or defensive rating for a given season against win %. There were teams in the 1950s and 1960s that had a high win % but a much lower offensive rating than teams in the 1990s that had a low win % but a higher offensive rating, on account of the offensive zeitgeist of the 1990s. (In other words, I had to take into account the change in league-wide scoring over time.)


To account for the increase in scoring over time, I used relative offensive and defensive rating, which simply subtracts the league mean for that specific season from a team's offensive/ defensive rating. As a result, it would be clear that a team in 1958 with a relative offensive rating of +10 (or about 100 points per 100 possessions) was better offensively than a team in 1988 with a relative offensive rating of -8 (which is also about 100 points per 100 possessions). The two plots below chart relative offensive rating (left side) and relative defensive rating (right side) against win % for all NBA teams that played from the 1950-51 season through the 2013-14 season. I included a blue regression line to help illustrate the relationship. Clearly, relative offensive rating is very positively correlated with win % and relative defensive rating is very negatively correlated with win %. Both make intuitive sense.
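The relative rating adjustment can be sketched as a team's rating minus the league mean for the same season. This sketch assumes the league mean is the simple (unweighted) average of team ratings in that season, which may differ slightly from how the chart was built. The function name, team names and ratings are invented to mirror the +10 and -8 example above.

```python
from collections import defaultdict


def relative_ratings(rows):
    """rows: list of (season, team, rating) tuples.

    Returns {(season, team): rating minus that season's league mean}.
    """
    by_season = defaultdict(list)
    for season, _team, rating in rows:
        by_season[season].append(rating)
    league_mean = {s: sum(v) / len(v) for s, v in by_season.items()}
    return {(s, t): r - league_mean[s] for s, t, r in rows}


# Made-up two-team "leagues": both Team A and Team C score 100 per 100 possessions
rows = [
    ("1957-58", "Team A", 100.0), ("1957-58", "Team B", 80.0),
    ("1987-88", "Team C", 100.0), ("1987-88", "Team D", 116.0),
]
rel = relative_ratings(rows)
print(rel[("1957-58", "Team A")], rel[("1987-88", "Team C")])
```

The same raw rating of 100 comes out as +10 in the low-scoring season and -8 in the high-scoring one, which is the point of the adjustment.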


Looking at the plots, I was interested in the anomalies over the last 63 years. Below is the same plot on relative defensive rating but without the color and regression line. I identified those teams that were exceptionally deficient or successful. In terms of relative defensive rating, some of the worst teams in the history of the NBA were the 1998-99 Denver Nuggets, the 1981-82 Denver Nuggets, and the 2011-12 Charlotte Bobcats (the Bobcats set the NBA record for the worst win % that season). Some of the best ever were the 1995-96 Chicago Bulls, the 2007-08 Boston Celtics, and the mid-1960s Boston Celtics.


The best teams historically in terms of relative offensive rating were the 2003-04 Dallas Mavericks, the 2004-05 Phoenix Suns and the 1995-96 Chicago Bulls. Some of the most offensively deficient ever were the 1998-99 Chicago Bulls, the 1987-88 LA Clippers and the 2002-03 Denver Nuggets. It should come as no surprise that in the 1995-96 season, when the Chicago Bulls set the NBA regular season record for wins (72), they were one of the best offensive and defensive teams, relative to all other teams in that season, in the history of the NBA.


*One of the many amazing things about the internet is sites, like stackoverflow or github, that allow very smart people to share or explain complex coding just because they want to help people or to solve challenging problems (or, presumably--for some--for validation/ recognition).

Wednesday, September 3, 2014

Melo & LeBron: Career 30 game moving averages

Carmelo Anthony is a unique player in the NBA in that he is granted superstar status and money despite only reaching the conference finals once in his 12 year career. His critics have questioned his effort on defense and the adverse effect of his individual offensive play on his team. All this raises the question: Is Carmelo Anthony really a superstar? Considering that a big part of being a superstar in the NBA is fan appeal and scoring--two criteria Carmelo certainly excels in--the answer is 'yes'. However, I'm interested in whether he will at the minimum maintain his offensive output when he begins his first full season as a thirty year old. To get a better picture of his performance I collected data on each of his regular season games over the course of his entire career from basketball-reference.com.

I focused on three conventional statistics--points, field goal attempts (FGA) and plus/ minus. I wanted to see how he progressed in each statistic from his first regular season game to his most recent. I only included games where he played at least one minute, which resulted in a population of 708 games. Simply plotting these three conventional statistics in chronological order from game 1 to game 708 would have created a visual mess, as the variance between individual 'good' and 'bad' games would have had lines darting everywhere.

As a result, I borrowed a technique common in stock charts to smooth the performance. I captured a moving 30 game average of each statistic. This technique allows me to chart his performance over a sufficient amount of play, thus giving a better understanding of sustained performance as opposed to capturing anomalous high/ low output performances. To create the moving average, I did not include Carmelo's first 29 games of his career because those games are necessary (with the 30th game) to create the starting point for the 30 game moving average. As a result, the charts below actually capture 679 games (708 career games - Carmelo's first 29 games).

To give some idea as to how the chart progresses, the point for game 2 on the plot represents the 30 game average from his career game 2 through game 31. Point 3 represents the 30 game average from his career game 3 through game 32. This continues all the way up to the final point, which represents the 30 game average for his career game 679 through game 708. (Obviously, all points are connected with a line.)

The top chart shows Carmelo's 30 game moving average for FGA (black) and points (grey). I included a dashed line signaling his career regular season average in points (25.3) and FGA (19.7). As expected Carmelo started off slow in his career, which makes sense because he was a rookie adjusting to the NBA. He appears to have peaked around his 200th game and outside of his rookie season his output reached its nadir around the 30 game moving average of 550, which coincides with his first season with the Knicks. Throughout his career his FGA and points seem to mirror each other, which also makes sense--you gotta take shots to make shots.

But if you look closely at the chart something else emerges--Carmelo needs a lot of shots to score all his points. Since he began his career with the Knicks (around what would be 550 on the horizontal axis) he appears to score only about five more points than FGA. Considering that he only averages about 3 assists a game, this could challenge his efficacy as an offensive player.


But points and FGA can belie actual performance, especially if a player's teammates are of limited offensive means. The 2013-14 Knicks were certainly one such team as they had to rely disproportionately on Carmelo for offense. Opposing teams knew this and, thus, made it more difficult for him to succeed offensively. 

Plus/ minus is a very rough measure of overall performance but it does give some context as to the effect the player had on the outcome of the game while he was on the court. Carmelo's career plus/ minus average is 2.4 and the chart below shows that his 30 game moving average peaked at 7.9 and bottomed out at -3.5. Interestingly, if you compare the chart below with the chart above, especially early in his career, Carmelo's plus/ minus appears inversely related to his points/ FGA. In other words, the more he shot and scored, the worse his plus/ minus was.

Over the last 150 or so games of his career (almost his entire time with the Knicks), Carmelo seems to have fallen into a zone that hovers right around his career plus/ minus average of 2.4. In light of his prolific scoring, it is fair to question if a player that contributes just less than 2.5 points to the outcome of a game is worthy of the 'superstar' designation. 


Looking at just one player is limiting as it grants no context in comparison to the performance of others. As a result, I created the same charts for The superstar--LeBron. There is no comparing the two players, as by every objective measure LeBron is a superior player. However, if a player, like Carmelo, is deemed a superstar then his performance should be measured against the highest criterion--LeBron.

For LeBron, I pulled the same data from basketball-reference.com and conducted the same analysis as I did for Carmelo. (Because LeBron has played one more season, he's played at least one minute in 842 regular season games.) The chart below shows LeBron's career regular season 30 game moving average for points (grey) and FGA (black). There is clearly a wider margin between the two lines than there is for Carmelo, which indicates that LeBron needs fewer shots to score points. In fact, by the time he reaches Miami (around 500 on the horizontal axis) he appears to score 10 more points than FGA.

The final chart shows LeBron's career 30 game moving average for plus/ minus. LeBron's career regular season average plus/ minus of 5.4 is clearly negatively skewed by his early days in Cleveland. In fact, during his last few seasons in Miami he appears to have a plus/ minus consistently between 5 and 10. (Not to mention an exceptional plus/ minus peak of 14.9 toward the end of his first stint with the Cavaliers.) 

While LeBron was certainly surrounded with better talent in Miami than Carmelo was in New York during the 2013/14 season, one cannot ignore that LeBron's 30 game moving average plus/ minus over the last few seasons hovers around Carmelo's career peak. Again, the point isn't to compare Carmelo and LeBron as individual players. The point is to give some context of the performance of one player perceived as a superstar (Carmelo) with the NBA's greatest performer (LeBron) based on empirical data. Further, all of this ignores marketing factors and fan popularity, which can play an equally critical role in designating a player as a superstar.

Thursday, August 14, 2014

Curreally? Better offense than LeBron

Stephen Curry recently said that he was a better offensive player than LeBron. Here's why he's not:





Based on virtually every measure of offensive performance (i.e., assist %, offensive rating, offensive rebounding %, and offensive win shares), LeBron is better, and by a noticeable amount. (Data pulled from basketball-reference.com.) The one area where Curry might have an argument, when controlling for age, is effective FG %, but if you look at current performance LeBron far exceeds him.


Friday, July 25, 2014

Best Available Free Agents

I pulled current free agents from here and win shares per 48 from basketball-reference.com. Duncan, Bosh and James Jones are the best available. Udoh, Watson and Billups are among the worst available. Interestingly, Jan Vesely is above Rudy Gay, Bargnani, Evan Turner and Ridnour.


Monday, July 21, 2014

Kevin Love, AFGM & 3s

There's an increasing sense of inevitability of success associated with Kevin Love and whatever team he joins. I pulled monthly averages for his career from NBA Stats, and plotted the percent of three point field goal attempts (%FGA 3PT) and the percent of his made field goals off of an assist (FGM %AST). While he puts up phenomenal statistics in Minnesota, his game has progressed to that of a superior rebounder on defense and a set three point shooter on offense. The state of his offensive game suggests he would prove much more valuable in Cleveland, where he could spread the floor for LeBron while also giving them some much needed long distance shooting, than he would in Golden State, where he would add yet another three point shooter without adding to their anemic post offense. Further, his increasing reliance on assists for made field goals indicates he needs to be set up to score, thus making Cleveland and LeBron the preferable fit for Love.


Monday, June 30, 2014

Why LeBron leaves

I pulled monthly split stats from Basketball-Reference.com on LeBron, DWade and Bosh going back to the trio's first season with the Heat. Below I charted their monthly offensive rating (an estimate of the points produced by a player per 100 possessions) and defensive rating (an estimate of the points allowed by a player per 100 possessions). Because these three are all starters and play the bulk of the Heat's minutes together, you would expect their ratings to mirror each other (the calculations take into account several team statistics).

Offensively, over the last two years LeBron has really begun to distance himself from the other two. Further, DWade has exhibited a noticeable decline that began March 2013. Defensively, the three stars follow a much closer path, but again DWade appears to struggle recently more so than LeBron and Bosh (in the bottom chart, as it is preferable to allow fewer points, the lower the line the better). Should LeBron stay in Miami, these charts suggest he will be expected to carry an increasing load for the other Big Two as their performance wanes.


Saturday, June 21, 2014

Spurs, passing & distance

There was much emphasis--and a surprising number of highlights and GIFs--on the Spurs' exceptional passing during this season's championship run. For the first time in playoff history, the NBA used SportVU technology to report spatial statistics that offer insight into things like passes per game, distance traveled, and other possession-based information not easily captured by conventional measures, like points, rebounds, etc.

The chart below breaks out passes per game for the 2013-14 NBA playoffs. The Spurs ranked third, and clearly passes per game doesn't seem to determine success, as the Bobcats are second and more successful playoff teams, like the Heat, Pacers, and Thunder, are toward the bottom.


The following chart compares 2013-14 playoff win percentage with passes per game. The orange trend line actually goes down, which signals a negative correlation between win percentage and passes per game. In other words, more passes per game does not seem to equate with winning. (The R-squared number is a signal of model fit. The closer it is to 1, the better the fit. In this chart the low R-squared signals that the trend/ regression line does a poor job of approximating the data. If all the points hovered on or around the trend/ regression line then the R-squared number would be higher.)


When I looked at other spatial data that NBA Stats provided, there was one measure in which the Spurs proved exceptional--distance traveled per game. The chart below shows the distance (in miles) traveled by the Spurs' players on the court per 48 minutes. The Spurs traveled almost a full mile more per 48 minutes than the next most traveled teams, the Trail Blazers, the Bulls and the Bobcats, and almost two miles per game more than the Pacers.


The chart below illustrates the relationship between distance traveled per 48 minutes and win percentage. The trend line indicates a slightly negative correlation and the R-squared shows even poorer model fit than the prior chart. The Spurs are an anomaly in the upper right hand corner of the chart. This chart simply shows that while distance traveled may have proved beneficial for the Spurs, it had no correlation with success in these playoffs for the other 15 teams.


While I did not find any spatial team measure that explained team success, these basic models are limited by the fact that there is a sample size of only one playoff. As the NBA continues to collect this data in future seasons, there will be more information, so it will become easier to distinguish if meaningful relationships exist between measures of success, like winning, and spatial measures. Further, simple two variable models, like the two above, ignore a range of factors, like opponent, offensive and defensive schemes, etc., that, if accounted for, would help clarify the actual relationship between passing or distance and success.