Use SQLite database and get data from current year #67

nkgilley · 2023-01-26T21:44:55Z

No description provided.

kyleskom · 2023-01-26T22:01:31Z

There's a lot going on here; can you describe the changes in more detail?

nkgilley · 2023-01-27T01:16:23Z

This PR creates a new dataset which includes training data from the current season. I added a script to get the odds data from previous games this season (src/Process-Data/Get_Odds_Data.py). I also ran Get_Data and Create_Games to get a new DataSet file. I then reran the XG_Boost_Model scripts and saved the best runs in Models/XGBoost_Models which I now reference in src/Predict/XGBoost_Runner.py.

nkgilley · 2023-01-28T04:42:43Z

I'm not able to upload the SQLite database because it is too large for GitHub (101 MB). A workaround would be to enable Git LFS (https://git-lfs.com/). It needs to be enabled on the parent repo for me to be able to add the file to my fork.

kyleskom · 2023-01-28T19:58:20Z

Give it a try now

nkgilley · 2023-01-28T20:45:07Z

@kyleskom Thanks... It looks like I was wrong, though. GitHub doesn't let you upload LFS objects to forks. I keep getting this error:

git push --force
batch response: @nkgilley can not upload new objects to public fork nkgilley/NBA-Machine-Learning-Sports-Betting
error: failed to push some refs to 'github.com:nkgilley/NBA-Machine-Learning-Sports-Betting.git'

I was able to upload it to a non-forked repo:
https://github.com/nkgilley/NBA-ML/blob/2022-23/Data/db.sqlite

Processing is definitely faster with the SQLite DB, but this GitHub limitation is frustrating.

kyleskom · 2023-01-28T22:09:28Z

What exactly did you store in the DB? 101 MB seems very high.

nkgilley · 2023-01-28T22:50:11Z

Basically everything that's currently in Excel files. I could split it into multiple database files to lower the size.

kyleskom · 2023-01-28T23:31:21Z

First off, this has been one of the biggest things I've wanted to do with this project, so thank you. I was thinking all the team data in one file, then the full games with everything in another. What do you think? Also, you have been super helpful with all this and are doing amazing work. If you're interested, we should hop on a call; I'd be interested in taking this project a step further with your help.
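A minimal sketch of that split, assuming pandas and the standard-library sqlite3 module: team stats go into one database and the combined games dataset into another. The function name, the `dataset` table name, and one table per date are illustrative assumptions, not what the PR actually writes.

```python
import sqlite3

import pandas as pd


def write_split_databases(team_tables, games_df, teams_con, games_con):
    """Write team stats and the full games dataset to two separate SQLite databases.

    team_tables: dict mapping a (hypothetical) table name, e.g. one per date, to a DataFrame.
    games_df: the combined games DataFrame used for training.
    """
    # Every team-stats table goes into the teams database.
    for name, df in team_tables.items():
        df.to_sql(name, teams_con, if_exists="replace", index=False)
    # The full games dataset lives in its own database.
    games_df.to_sql("dataset", games_con, if_exists="replace", index=False)


if __name__ == "__main__":
    # In-memory databases for a quick check; real use would pass
    # sqlite3.connect(...) on two separate .sqlite files.
    teams_con = sqlite3.connect(":memory:")
    games_con = sqlite3.connect(":memory:")
    write_split_databases(
        {"teams_2023-01-26": pd.DataFrame({"W": [10], "L": [5]})},
        pd.DataFrame({"Home-Team-Win": [1, 0]}),
        teams_con,
        games_con,
    )
    print(pd.read_sql_query("SELECT COUNT(*) AS n FROM dataset", games_con))
    teams_con.close()
    games_con.close()
```

Keeping the training dataset out of the team-stats file means neither file has to hold everything, which is the point of splitting given the 101 MB file GitHub rejected.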

kyleskom

If we are moving to a DB, do we need the Excel files?

kyleskom · 2023-01-29T02:58:20Z

Flask/templates/index.html

+                  </tbody>
+                </table>
+              </td>
+              {% endif %}


Can you explain what you are doing here? I think it might be best to split this PR. I see the major change of moving the data to a database, and I think we should keep it clean by separating those.

EDIT: I meant all the changes to index.html, not just the highlighted lines.

These are just whitespace changes and a class name change that doesn't affect anything.

kyleskom · 2023-01-29T02:58:51Z

src/Process-Data/Create_Games.py

+
+# season_array = ["2007-08", "2008-09", "2009-10", "2010-11", "2011-12", "2012-13", "2013-14", "2014-15", "2015-16",
+#                 "2016-17", "2017-18", "2018-19", "2019-20", "2020-21", "2021-22", "2022-23"]
+season_array = ["2015-16", "2016-17", "2017-18", "2018-19", "2019-20", "2020-21", "2021-22", "2022-23"]


Can you explain why you started with 2015?

I did this because I remembered this comment in your notes:

we achieved the highest levels of validation accuracy when the training dataset started from the 2012 − 2013 season

I'll change it to go from 2012-13 instead of 2015-16.
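Rather than hand-editing the list each time the start season changes, the labels could be generated from a starting year. A small sketch; `season_range` is a hypothetical helper, not something in the repo:

```python
def season_range(first_start_year, last_start_year):
    """Return NBA season labels such as "2012-13", one per starting year (inclusive)."""
    # The suffix is the two-digit year the season ends in, e.g. 1999 -> "1999-00".
    return [f"{y}-{(y + 1) % 100:02d}" for y in range(first_start_year, last_start_year + 1)]


season_array = season_range(2012, 2022)
print(season_array[0], season_array[-1], len(season_array))  # 2012-13 2022-23 11
```

Moving the start from 2015-16 to 2012-13 then becomes a one-argument change.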

kyleskom · 2023-01-29T03:00:39Z

src/Process-Data/Get_Data.py

-                name = directory2 + '/' + '{}-{}-{}'.format(str(int(x[1])), str(int(x[2])), season1) + '.xlsx'
-                general_df.to_excel(name)
-            except:
+            if month1 == 10 and day1 < 19:


Some seasons didn't start in month 10 (October). I think the old way may have been better because it was easier to run, though I still think this could be improved.

I was getting errors with the original code, so I made some changes. I didn't realize that some seasons didn't start in October. I'll take a closer look at this.

As far as I can tell, this code still works fine for seasons that don't start in month 10. Let me know if I'm missing something.
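One way to stop depending on which month a season starts in is to walk the calendar from each season's actual start date to its end date. That also skips impossible dates such as November 31 or February 30, which nested month/day lists (like the ones in Get_Odds_Data.py) generate. A sketch under that assumption; the per-season start and end dates would have to be supplied separately, and `season_days` is a hypothetical name:

```python
from datetime import date, timedelta


def season_days(start, end):
    """Yield each calendar date from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


# A span crossing the new year; only real dates are produced.
for day in season_days(date(2022, 12, 30), date(2023, 1, 2)):
    print(day.year, day.month, day.day)
```

Each yielded date carries the year, month, and day values the scraping loop currently builds by hand.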

kyleskom · 2023-01-29T03:01:49Z

src/Process-Data/Get_Odds_Data.py

+import random
+import time
+import pandas as pd
+import sqlite3
+
+from datetime import datetime
+from tqdm import tqdm
+from sbrscrape import Scoreboard
+
+year = [2022, 2023]
+season = ["2022-23"]
+
+month = [10, 11, 12, 1, 2, 3, 4, 5, 6]
+days = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
+
+begin_year_pointer = year[0]
+end_year_pointer = year[0]
+count = 0
+year_count = 0
+
+sportsbook='fanduel'
+df_data = []
+


Any pros / cons to moving to this for odds? I always grabbed them from here
https://www.sportsbookreviewsonline.com/scoresoddsarchives/nba/nbaoddsarchives.htm

I don't know if it's relevant, but the website you linked has no odds after January 16th, 2023. If you need odds for the 2022-23 season, the sbrscrape library is the only option.

They will update it soon. I think they do it in batches.

Upon further review you are right. I see their note.

kyleskom · 2023-01-29T03:02:25Z

src/Process-Data/Move_Clean_Odds_Data_To_DB.py

+import os
+import sqlite3
+import pandas as pd
+import sys
+from tqdm import tqdm
+sys.path.insert(1, os.path.join(sys.path[0], '..'))
+from Utils.Dictionaries import team_codes
+
+directory = os.fsdecode('../../Odds-Data/Odds-Data-Clean')
+con = sqlite3.connect("../../Data/odds.sqlite")
+
+for file in tqdm(os.listdir(directory)):
+    filename = os.fsdecode(file)
+
+    try:
+        df = pd.read_excel(f"../../Odds-Data/Odds-Data-Clean/{file}") # create DataFrame by reading Excel
+        df.to_sql(f"odds_{file[:-5]}", con, if_exists="replace")
+
+    except Exception as e:
+        print(e)
+
+con.close()


Can we skip this and just go right from the get_data script to DB?

Yeah, I don't see any need for this script in the future. I just used it to move the data that was already scraped so I didn't have to re-gather all of it.
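Going straight from the scraper to the DB could look like the following sketch: the scraped rows (a list of dicts, like the `df_data` list in Get_Odds_Data.py) become a DataFrame that is appended to a SQLite table with pandas' `DataFrame.to_sql`. The helper name and the table name are assumptions:

```python
import sqlite3

import pandas as pd


def save_rows_to_db(rows, table_name, con):
    """Append scraped rows (a list of dicts) to a SQLite table; return the number written."""
    if not rows:
        # Nothing scraped: skip writing so no empty table is created.
        return 0
    df = pd.DataFrame(rows)
    # "append" lets repeated runs add days to the same season table;
    # "replace" would rebuild the table from scratch instead.
    df.to_sql(table_name, con, if_exists="append", index=False)
    return len(df)


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    save_rows_to_db([{"Date": "2023-01-26", "Home": "A", "Away": "B"}], "odds_2022-23", con)
    print(con.execute('SELECT COUNT(*) FROM "odds_2022-23"').fetchone()[0])
    con.close()
```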

nkgilley · 2023-01-29T14:58:23Z

If we are moving to a DB, do we need the Excel files?

Nope, I'll remove those.

nkgilley · 2023-01-29T15:37:12Z

I removed all the Excel files. I'm training the model again using seasons 2012-13 to current. I'll add new JSON files when it's done.

kyleskom · 2023-01-29T19:38:56Z

Gonna need a little time to review and test all this.

kyleskom · 2023-01-29T19:40:08Z

Also, not sure if you saw my previous message. I'd be interested in having a Zoom call to discuss this project further, along with some ideas I think you could help a lot with. Let me know if that interests you.

nkgilley · 2023-01-30T20:33:53Z

Hey, sorry, I've got limited availability to work on this. I don't think I'll have time for a Zoom call, but please post your thoughts here, and if I find time I may be able to help out further.

kyleskom · 2023-02-10T18:05:42Z

Sorry for the delay; I've been super busy, but I'm still interested in this. Will review as soon as I can.

You have been putting in really great work on this, and I have always wanted to take this to the next level, maybe a web app or even a paid product. Along with that, the major thing this is missing is player data. Those are the things I was looking to discuss with you.

kyleskom · 2023-02-11T00:30:33Z

Fixed dropped columns with #85

nkgilley · 2023-02-11T14:29:00Z

Just pushed some new updates that now take into account the days of rest for each team. I'm not sure I did this part right, so please double-check. I've got a good grasp on the Python, but the TensorFlow stuff is all new to me.

Those ideas are definitely interesting, ping me here to discuss more: email/gchat: nkgilley@gmail.com

kyleskom · 2023-02-11T18:01:22Z

Thanks for adding this; it has always been a major want for me. I see one major issue: we have the rest-days columns when training, but we don't have them when using live data for daily predictions.
Example:
Before this, we trained with data abc and predicted using data abc. Now we are training with abcx but still predicting with abc.

Hopefully that made sense. Also, it might be better to separate that from moving the data to a database. That's easier for me to review and test, and it lowers the chance of errors when merging. But again, I really appreciate the work.
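The "abc" vs. "abcx" problem can be caught before predicting by checking the live features against the columns the model was trained on. A sketch with pandas; `align_features` and the column names are hypothetical:

```python
import pandas as pd


def align_features(live_df, training_columns):
    """Return live_df restricted to, and ordered like, the training columns.

    Raises ValueError if a feature used in training is missing at prediction time,
    e.g. training on a, b, c, x but predicting with only a, b, c.
    """
    missing = [col for col in training_columns if col not in live_df.columns]
    if missing:
        raise ValueError(f"Missing features at prediction time: {missing}")
    # Models fed plain arrays depend on column order, so reorder explicitly.
    return live_df[list(training_columns)]


if __name__ == "__main__":
    training_columns = ["a", "b", "c", "x"]
    live = pd.DataFrame({"a": [1], "b": [2], "c": [3]})
    try:
        align_features(live, training_columns)
    except ValueError as err:
        print(err)  # Missing features at prediction time: ['x']
```

Failing loudly here is preferable to letting a model silently receive a shorter or reordered feature vector.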

nkgilley · 2023-02-11T20:16:53Z

I thought I was providing that data during the daily predictions. See lines 59-60 of main.py:

        stats['Games-Rested-Away'] = away_days_off.days
        stats['Games-Rested-Home'] = home_days_off.days

You may not have seen these changes, as GitHub only shows the first 3000 files changed. I could create a new PR that doesn't delete the Excel files and would be easier to review. We could then delete the Excel files later if we decide to proceed with the SQLite files.
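A value like `away_days_off.days` is the `.days` attribute of a `datetime.timedelta` between two dates. What matters for the train/predict mismatch is that the same calculation feeds both the training column and the live prediction. A sketch assuming each team's previous game dates are available; `days_rest` is hypothetical, and whether a back-to-back counts as 1 or 0 days is a convention to settle:

```python
from datetime import date


def days_rest(previous_game_dates, game_date):
    """Days between a team's most recent earlier game and game_date, or None if there is none."""
    earlier = [d for d in previous_game_dates if d < game_date]
    if not earlier:
        return None  # e.g. season opener: no previous game to measure from
    # Subtracting dates gives a timedelta; .days is the whole-day difference.
    return (game_date - max(earlier)).days


print(days_rest([date(2023, 2, 8), date(2023, 2, 10)], date(2023, 2, 11)))  # 1 (back-to-back)
```

Calling one helper like this both when building the dataset and when filling `stats['Games-Rested-Away']` / `stats['Games-Rested-Home']` keeps the two definitions from drifting apart.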

kyleskom · 2023-02-11T20:23:21Z

@nkgilley Yeah, let's do this. First, let's split the SQLite work from the rest-days work. Let's also keep the Excel files for now and remove them afterwards to keep this PR manageable.

nkgilley · 2023-02-11T20:42:14Z

Should be good now. I removed the days-rest columns for now.

nickmalbsn · 2023-02-13T23:28:47Z

I'm new to coding; how can I add this to my fork?

nkgilley · 2023-02-18T13:18:09Z

I'm new to coding; how can I add this to my fork?

git remote add nkgilley git@github.com:nkgilley/NBA-Machine-Learning-Sports-Betting.git
git fetch nkgilley
git checkout -b 2022-23 nkgilley/2022-23

kyleskom · 2023-02-19T17:33:25Z

Really sorry I still haven't gotten to this. I've just been so busy with work, personal life, and the small issues that pop up in this repo. I am still trying to get to this.

kyleskom · 2023-02-19T18:02:26Z

Do you have a recommended tool to view SQLite databases locally? The one I have is terribly slow and just overall horrible.

kyleskom · 2023-02-19T18:08:56Z

src/Process-Data/Get_Odds_Data.py

+                    'Unnamed: 0': 0,
+                    'Date': f"{season1}-{month1:02}{day1:02}",
+                    'Home': game['home_team'],
+                    'Home': game['home_team'],


Is the duplicate 'Home' key here expected?
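On the duplicate key: Python accepts a repeated key in a dict literal without raising and keeps only the last value. Here both values are identical, so the second line is just redundant, but if a different key was intended, that value would be silently lost. A quick demonstration with made-up values:

```python
# Hypothetical scraped game; only the key names mirror the diff above.
game = {"home_team": "Team A", "away_team": "Team B"}

row = {
    "Home": game["home_team"],
    "Home": game["home_team"],  # repeated key: silently overwrites the entry above
}

print(row)  # {'Home': 'Team A'}
print(len(row))  # 1
```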

kyleskom · 2023-02-19T18:32:33Z

Going through part by part right now. Get_Odds_Data will fail if there is no odds data for a given day; we just need to add a catch for that.
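A sketch of that catch, assuming the scraper call is wrapped in a function that takes a date and returns a (possibly empty) list of games. `fetch_games` is a stand-in, not the sbrscrape API, and the broad `except` should be narrowed to whatever the real call raises:

```python
def collect_odds(dates, fetch_games):
    """Collect odds rows for each date, skipping dates that have no data.

    fetch_games is a hypothetical stand-in for the scraper call: it takes a date
    and returns a list of game dicts (possibly empty), or raises on failure.
    Returns (rows, skipped), where skipped holds (date, reason) pairs.
    """
    rows = []
    skipped = []
    for day in dates:
        try:
            games = fetch_games(day)
        except Exception as exc:  # narrow this to the scraper's real exceptions
            skipped.append((day, repr(exc)))
            continue
        if not games:
            # No games, or no odds posted yet, for this date.
            skipped.append((day, "no games"))
            continue
        rows.extend(games)
    return rows, skipped
```

Returning the skipped dates alongside the rows makes it easy to log them and re-run just those days later.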

kyleskom · 2023-02-19T18:38:37Z

Tested everything else up to creating the full dataset, and everything looks awesome. Just those few small fixes. After this, I think we should clean up the old data.

You also had the days-rest work, which I'd love to add.

I'd also still like to reach out about a few other things.

kyleskom · 2023-02-19T18:40:58Z

One more thing: could you also make the NN model training use the DB dataset?
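Reading the dataset from SQLite for the NN training scripts could be as simple as `pandas.read_sql_query` followed by a features/labels split. A sketch; the `dataset` table name, the `Home-Team-Win` label column, and the columns to drop are assumptions that would need to match what Create_Games actually writes:

```python
import sqlite3

import pandas as pd


def load_training_data(con, table="dataset", label_column="Home-Team-Win", drop_columns=()):
    """Load a training table from SQLite and split it into (features, labels) arrays.

    The defaults are assumptions; non-numeric columns such as dates or team
    names must be listed in drop_columns so the float conversion succeeds.
    """
    df = pd.read_sql_query(f'SELECT * FROM "{table}"', con)
    labels = df[label_column].to_numpy()
    features = df.drop(columns=[label_column, *drop_columns]).to_numpy(dtype=float)
    return features, labels


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    pd.DataFrame({"Date": ["2023-01-01", "2023-01-02"], "PTS": [110.0, 98.0],
                  "Home-Team-Win": [1, 0]}).to_sql("dataset", con, index=False)
    x, y = load_training_data(con, drop_columns=["Date"])
    print(x.shape, y.tolist())  # (2, 1) [1, 0]
    con.close()
```

The resulting arrays can be passed to the existing model-fitting code in place of the arrays it currently builds from the Excel dataset.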

nkgilley · 2023-02-19T21:21:32Z

Do you have a recommended tool to view SQLite databases locally? The one I have is terribly slow and just overall horrible.

I've been using DB Browser for SQLite
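For a quick look without a GUI, Python's built-in sqlite3 module can list every table and its row count by querying `sqlite_master`. A small sketch; `describe_database` is a hypothetical helper:

```python
import sqlite3


def describe_database(con):
    """Return {table_name: row_count} for every table in an open SQLite connection."""
    names = [row[0] for row in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    counts = {}
    for name in names:
        # Quote identifiers because names like "odds_2022-23" contain hyphens.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = con.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE "odds_2022-23" (Date TEXT)')
    con.execute('INSERT INTO "odds_2022-23" VALUES (?)', ("2023-01-26",))
    print(describe_database(con))  # {'odds_2022-23': 1}
    con.close()
```

Against a real file, pass a connection such as `sqlite3.connect("../../Data/odds.sqlite")` from the Move_Clean_Odds_Data_To_DB script instead of the in-memory demo.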

nkgilley · 2023-02-20T00:03:46Z

Fixed a few issues and updated the models. I think it should be good, but it should be tested with real games.

I'll create a new PR for the addition of the days rested column. I need to do some more testing on that.

nkgilley marked this pull request as draft January 27, 2023 19:02

nkgilley changed the title ~~get games from current year~~ Use SQLite database and get data from current year Jan 27, 2023

nkgilley force-pushed the 2022-23 branch from 9ee53e3 to e936669 January 28, 2023 23:31

nkgilley marked this pull request as ready for review January 28, 2023 23:33

kyleskom reviewed Jan 29, 2023

nkgilley force-pushed the 2022-23 branch from 59685d0 to 3478bcd January 31, 2023 00:43

nkgilley force-pushed the 2022-23 branch 2 times, most recently from f4b3cb9 to 62c46ed February 11, 2023 20:41

kyleskom reviewed Feb 19, 2023

nkgilley added 15 commits February 19, 2023 19:00

fix odds when a sportsbook doesn't have stats (9e8f86f)

fix teams in wrong row (39ae783)

get current year's games (67b8433)

update readme (5ed6c68)

use sqlite db for data storage (79b0607)

update models (cafce63)

split db into mutliple files (5c5459c)

remove excel files, update dataset (8b3c235)

remove temporary Move_ data scripts (60bc0b1)

update models (00ffcc4)

revert excel file removal (0b9db1d)

fix 2015, remove days rest (04e5f17)

remove dupe Home key (52fa7b3)

improve error handling (6593008)

update odds (e34f208)

nkgilley force-pushed the 2022-23 branch from 2bbd0ab to e34f208 February 20, 2023 00:01

kyleskom merged commit 0339624 into kyleskom:master Feb 21, 2023
