Split the data into training and testing #31

Open
vidhi-mody opened this issue Jul 4, 2020 · 3 comments
Labels
SCI 2020 Student Code-In

Comments

@vidhi-mody
Collaborator

An 80/20 train/test split would be a good ratio.

@ankurbhatia24

Suggestion: when performing the train/test split, set a seed so that rerunning the code produces the same split. Also check whether the function you use to split supports stratifying the data; sklearn's does. Stratification matters in multiclass classification because a purely random split can send most samples of some class to one side (train or test), leaving the other side without enough samples of that class. Stratifying keeps the class distribution roughly the same in both train and test. This blog post goes into more detail: https://towardsdatascience.com/3-things-you-need-to-know-before-you-train-test-split-869dfabb7e50
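
To make the seed and stratification points concrete, here is a minimal standard-library sketch of what a seeded, stratified 80/20 split does. It is only an illustration, not this project's code: the function name `stratified_split`, the seed value 42, and the toy labels are all assumptions made up for the example.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_ratio=0.2, seed=42):
    """Return (train_idx, test_idx), splitting each class at roughly test_ratio.

    Hypothetical helper for illustration; not part of any library.
    """
    rng = random.Random(seed)  # fixed seed: the same split on every rerun

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):  # sorted for a deterministic class order
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_ratio)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Made-up imbalanced labels: 70 of class "a", 20 of "b", 10 of "c".
labels = ["a"] * 70 + ["b"] * 20 + ["c"] * 10
train_idx, test_idx = stratified_split(labels, test_ratio=0.2, seed=42)

train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
print("train:", dict(train_counts))
print("test:", dict(test_counts))
```

With these toy labels, each class contributes 20% of its samples to the test set (14, 4 and 2), so both sides keep the 70/20/10 proportions. If you use sklearn, `train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)` should give the same kind of split without a hand-written helper.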

@vidhi-mody added the SCI 2020 Student Code-In Up-For-Grab labels on Jul 4, 2020
@deepeshgarg09
Contributor

I would like to work on this issue!

@vidhi-mody
Collaborator Author

@deepeshgarg09 sure!


3 participants