Our data collection started with the CSV file below, which contains all historical data from the beginning of the NBA through the 2015 regular season. We decided to drop all data from more than 20 years ago, so that we focus on the period in which the current 30 NBA franchises were established and have been a consistent part of the league. We also dropped some of the more inconsequential columns (for example, those tracking season or playoff games) and deleted the _iscopy column, which only marked duplicated games. In short, we removed the data we would not be incorporating into our model.

import pandas as pd
elo = pd.read_csv('elo_1946-2015.csv')
elo.head()
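
The trimming described above can be sketched roughly as follows. This is a minimal illustration, not our exact cleaning script: the helper `trim_elo`, the `first_year` cutoff, and the choice to drop rows where _iscopy is 1 are assumptions, and the sample frame uses only columns visible in the preview plus two placeholder rows.

```python
import pandas as pd

# Small stand-in for elo_1946-2015.csv. The last two rows are placeholders.
elo_sample = pd.DataFrame({
    "game_id": ["194611010TRH", "194611010TRH", "EXAMPLE_GAME", "EXAMPLE_GAME"],
    "_iscopy": [0, 1, 0, 1],
    "year_id": [1947, 1947, 2015, 2015],
    "seasongame": [1, 1, 82, 82],
    "fran_id": ["Huskies", "Knicks", "Bucks", "Bulls"],
})

def trim_elo(df, first_year, drop_cols=("seasongame",)):
    """Keep recent, non-duplicated games and drop columns the model ignores."""
    recent = df[df["year_id"] >= first_year]
    # Assumption: _iscopy == 1 marks the mirrored second row of each game.
    unique_games = recent[recent["_iscopy"] == 0]
    return unique_games.drop(columns=["_iscopy", *drop_cols]).reset_index(drop=True)

# first_year is a hypothetical cutoff chosen only for this example.
trimmed = trim_elo(elo_sample, first_year=1996)
print(trimmed)
```

Keeping the cutoff and the dropped columns as parameters makes it easy to rerun the cleaning if the window of seasons changes.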

The CSV above ends in 2015, which left us about a year and a half short of the data we needed, so we filled in the games played between 2015 and the present day with the data set below. It was obtained by scraping NBA reference. This data set needed to be cleaned so that the franchise names match across all of our data sets. For example, some sources list the Portland Trailblazers as the Portland Trail Blazers or just the Portland Blazers.

recent_data = pd.read_csv('recent_data.csv')
recent_data.head()
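
One way to make the names consistent is an explicit alias table, sketched below. The dictionary `FRANCHISE_ALIASES` and the helper `to_fran_id` are hypothetical, and the canonical label "Trailblazers" is an assumption. The "Philadelphia 76ers" to "Sixers" pairing follows the previews of the scraped and historical data.

```python
import pandas as pd

# Hypothetical alias table: every spelling seen in a source maps to one fran_id.
FRANCHISE_ALIASES = {
    "Portland Trail Blazers": "Trailblazers",
    "Portland Trailblazers": "Trailblazers",
    "Portland Blazers": "Trailblazers",
    "Philadelphia 76ers": "Sixers",
    "Detroit Pistons": "Pistons",
}

def to_fran_id(name):
    """Return the canonical franchise name, or raise if the spelling is unknown."""
    key = " ".join(name.split())  # collapse stray or repeated whitespace
    if key not in FRANCHISE_ALIASES:
        raise KeyError(f"No alias registered for {name!r}")
    return FRANCHISE_ALIASES[key]

scraped = pd.Series(["Portland Trail Blazers", "Portland  Blazers", "Philadelphia 76ers"])
canonical = scraped.map(to_fran_id)
print(canonical.tolist())
```

Raising on an unknown spelling, rather than passing it through, means a new variant in a later scrape is caught immediately instead of silently creating an extra franchise.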

Combining the two previous data sets gave us the historical data set, which contains every game in the years we are considering, up to the present date. This is the data that we use to calculate our Elo ratings.

historical_data = pd.read_csv('historical_data.csv')
historical_data.head()
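
As a rough sketch of the combination step, the scraped rows can be reshaped into the historical layout and appended to the older games. The column names come from the two previews; the helper `to_historical_layout`, the small `NAMES` mapping, and treating the visitor as fran_id with game_location "A" are assumptions inferred from those previews. The single "older" row is only a stand-in for the trimmed Elo data.

```python
import pandas as pd

# Two rows copied from the recent_data preview.
recent_sample = pd.DataFrame({
    "Date": ["Tue, Oct 27, 2015", "Tue, Oct 27, 2015"],
    "Visitor/Neutral": ["Detroit Pistons", "Cleveland Cavaliers"],
    "PTS": [106, 95],
    "Home/Neutral": ["Atlanta Hawks", "Chicago Bulls"],
    "PTS.1": [94, 97],
})
# Hypothetical mapping from scraped names to fran_id values.
NAMES = {"Detroit Pistons": "Pistons", "Atlanta Hawks": "Hawks",
         "Cleveland Cavaliers": "Cavaliers", "Chicago Bulls": "Bulls"}

def to_historical_layout(df, names):
    """Reshape scraped rows into the fran_id / opp_fran layout."""
    return pd.DataFrame({
        "fran_id": df["Visitor/Neutral"].map(names),
        "pts": df["PTS"],
        "opp_fran": df["Home/Neutral"].map(names),
        "opp_pts": df["PTS.1"],
        "game_location": "A",  # the visitor is listed first, so the row is an away game
        "date": pd.to_datetime(df["Date"], format="%a, %b %d, %Y"),
    })

# Stand-in for the older block of games, already in the target layout.
older = pd.DataFrame({
    "fran_id": ["Knicks"], "pts": [68], "opp_fran": ["Huskies"], "opp_pts": [66],
    "game_location": ["A"], "date": pd.to_datetime(["1946-11-01"]),
})

converted = to_historical_layout(recent_sample, NAMES)
combined = pd.concat([older, converted], ignore_index=True)
print(combined)
```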

Our data was also missing the altitude, latitude, and longitude of each team's home location. We use this information as one of the features in our model, and it had to be entered by hand.

teamInfo = pd.read_csv('teamInfo.csv', delimiter='\t')
teamInfo.head()
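
To illustrate how the location data could be attached to games, the sketch below joins the home team's altitude and coordinates onto each game row. The values are taken from the previews; the `home_` column names and the decision to join on opp_fran (the home side when game_location is "A") are assumptions, not necessarily how the feature is built in our model.

```python
import pandas as pd

# Rows taken from the historical_data and teamInfo previews.
games = pd.DataFrame({
    "fran_id": ["Cavaliers", "Sixers"],
    "opp_fran": ["Bulls", "Celtics"],
    "game_location": ["A", "A"],
})
team_info = pd.DataFrame({
    "fran_id": ["Bulls", "Cavaliers", "Celtics"],
    "altitude": [594, 653, 141],
    "lat": [41.881832, 41.505493, 42.361145],
    "lon": [-87.623177, -81.681290, -71.057083],
})

# Rename so the join key matches the home team column and the new
# columns are clearly labeled as belonging to the venue.
home_info = team_info.rename(columns={
    "fran_id": "opp_fran",
    "altitude": "home_altitude",
    "lat": "home_lat",
    "lon": "home_lon",
})

# validate= guards against duplicate franchise rows in the hand-entered file.
with_venue = games.merge(home_info, on="opp_fran", how="left", validate="many_to_one")
print(with_venue)
```

Because the team file is entered by hand, the `validate="many_to_one"` check is a cheap way to catch a franchise that was typed in twice.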

The projected win/loss data set is where we store the projected wins and losses for each team, along with that team's current Elo rating. It will be updated daily and displayed on our website.

ProjectedWL = pd.read_csv('ProjectedWL.csv')
ProjectedWL.head()

The tomorrow data set shows the games that will be played tomorrow and the win/loss probability for each team in the matchup. It will be updated daily and displayed on our website.

tomorrow = pd.read_csv('tomorrow.csv')
tomorrow.head()
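
For readers unfamiliar with Elo, the sketch below shows the standard Elo expected-score formula, which turns two ratings into a pair of probabilities that sum to one, like prob and opp_prob above. It is only the textbook formula: our model also uses other features, so this function (the name `win_probability` is hypothetical) is not expected to reproduce the values in tomorrow.csv. The two ratings are taken from the projected win/loss preview.

```python
def win_probability(elo, opp_elo):
    """Standard Elo expected score for the team rated `elo`."""
    return 1.0 / (1.0 + 10.0 ** ((opp_elo - elo) / 400.0))

# Ratings from the ProjectedWL preview.
warriors, cavaliers = 1770.075218, 1645.724021

prob = win_probability(warriors, cavaliers)
opp_prob = win_probability(cavaliers, warriors)
print(round(prob, 6), round(opp_prob, 6))
```

Two equally rated teams get 0.5 each, and every 400-point gap multiplies the odds in favor of the stronger team by ten.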

This data set holds all the information for the games remaining in the 2017 season.

upcoming_games = pd.read_csv('upcoming_games.csv')
upcoming_games.head()

	gameorder	game_id	lg_id	_iscopy	year_id	date_game	seasongame	team_id	fran_id	...	win_equiv	opp_id	opp_fran	opp_pts	opp_elo_i	opp_elo_n	game_location	game_result	forecast	notes
0	1	194611010TRH	NBA	0	1947	11/1/1946	1	TRH	Huskies	...	40.294830	NYK	Knicks	68	1300.0000	1306.7233	H	L	0.640065	NaN
1	1	194611010TRH	NBA	1	1947	11/1/1946	1	NYK	Knicks	...	41.705170	TRH	Huskies	66	1300.0000	1293.2767	A	W	0.359935	NaN
2	2	194611020CHS	NBA	0	1947	11/2/1946	1	CHS	Stags	...	42.012257	NYK	Knicks	47	1306.7233	1297.0712	H	W	0.631101	NaN
3	2	194611020CHS	NBA	1	1947	11/2/1946	2	NYK	Knicks	...	40.692783	CHS	Stags	63	1300.0000	1309.6521	A	L	0.368899	NaN
4	3	194611020DTF	NBA	0	1947	11/2/1946	1	DTF	Falcons	...	38.864048	WSC	Capitols	50	1300.0000	1320.3811	H	L	0.640065	NaN

	Date	Start (ET)	Visitor/Neutral	PTS	Home/Neutral	PTS.1
0	Tue, Oct 27, 2015	8:00 pm	Detroit Pistons	106	Atlanta Hawks	94
1	Tue, Oct 27, 2015	8:00 pm	Cleveland Cavaliers	95	Chicago Bulls	97
2	Tue, Oct 27, 2015	10:30 pm	New Orleans Pelicans	95	Golden State Warriors	111
3	Wed, Oct 28, 2015	7:30 pm	Philadelphia 76ers	95	Boston Celtics	112
4	Wed, Oct 28, 2015	7:30 pm	Chicago Bulls	115	Brooklyn Nets	100

	fran_id	pts	opp_fran	opp_pts	game_location	date
0	Pistons	106	Hawks	94	A	2015-10-27
1	Cavaliers	95	Bulls	97	A	2015-10-27
2	Pelicans	95	Warriors	111	A	2015-10-27
3	Sixers	95	Celtics	112	A	2015-10-28
4	Bulls	115	Nets	100	A	2015-10-28

	fran_id	altitude	lat	lon
0	Bucks	617	43.038902	-87.906471
1	Bulls	594	41.881832	-87.623177
2	Cavaliers	653	41.505493	-81.681290
3	Celtics	141	42.361145	-71.057083
4	Clippers	233	34.052235	-118.243683

	Projected L	Projected W	elo	fran_id
0	15.0	67.0	1770.075218	Warriors
1	19.0	63.0	1719.168956	Spurs
2	26.0	56.0	1645.724021	Cavaliers
3	26.0	56.0	1618.636284	Rockets
4	31.0	51.0	1598.583197	Wizards

	fran_id	opp_fran	prob	opp_prob
0	Warriors	Hawks	0.647281	0.352719
1	Pacers	Hornets	0.523157	0.476843
2	Heat	Cavaliers	0.148315	0.851685
3	Kings	Nuggets	0.331493	0.668507
4	Bulls	Pistons	0.563439	0.436561

	fran_id	pts	opp_fran	opp_pts	game_location	date
0	Raptors	NaN	Hawks	NaN	A	2017-03-10
1	Rockets	NaN	Bulls	NaN	A	2017-03-10
2	Magic	NaN	Hornets	NaN	A	2017-03-10
3	Nets	NaN	Mavericks	NaN	A	2017-03-10
4	Celtics	NaN	Nuggets	NaN	A	2017-03-10