Merge pull request #2 from anacmontoya/notebook-links
notebook edits - references and links
ThomasMGeo authored Jul 10, 2024
2 parents 8092987 + 1b54ad0 commit 33d2d05
Showing 1 changed file with 23 additions and 14 deletions.
37 changes: 23 additions & 14 deletions notebooks/ptype_ml.ipynb
@@ -46,14 +46,9 @@
"| [Pyplot tutorial](https://matplotlib.org/stable/tutorials/pyplot.html) | Helpful | Necessary |\n",
"| [Numpy: the absolute basics for beginners](https://numpy.org/doc/stable/user/absolute_beginners.html) | Great to have | arrays are the language of machine learning |\n",
"\n",
"- **Time to learn**: \n",
"- **Time to learn**: 45 minutes\n",
"\n",
"Under an hour. While it can be easy to get started with the scikit learn syntax, it can take a while to fully understand and learn all of the in's and out's of ML systems. This is designed to just be a very quick introduction. \n",
"\n",
"- **System requirements**:\n",
" - Populate with any system, version, or non-Python software requirements if necessary\n",
" - Otherwise use the concepts table above and the Imports section below to describe required packages as necessary\n",
" - If no extra requirements, remove the **System requirements** point altogether"
"While it can be easy to get started with the scikit learn syntax, it can take a while to fully understand and learn all of the in's and out's of ML systems. This is designed to just be a very quick introduction. "
]
},
{
@@ -404,7 +399,9 @@
"source": [
"Notice any trends so far? What input features might be the most important? \n",
"\n",
"Next we can plot the Correlation Matrix. As the name suggests, this will show us the correlation between variables. The closer the absolute value is to 1, the stronger the relationship between these variables is. Notice how all of our diagonal values equal to 1? this is because they represent the correlation between a variable and itself. Can you see which other variables have strong correlations?"
"Next we can plot the Correlation Matrix. As the name suggests, this will show us the correlation between variables. The closer the absolute value is to 1, the stronger the relationship between these variables is. Notice how all of our diagonal values equal to 1? this is because they represent the correlation between a variable and itself. Can you see which other variables have strong correlations?\n",
"\n",
"For further reading, visit [Correlation Matrix, Demystified](https://towardsdatascience.com/correlation-matrix-demystified-3ae3405c86c1)"
]
},
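For readers who want to see this step concretely, here is a minimal sketch of how such a correlation matrix could be computed and plotted with pandas and seaborn. The DataFrame name `df` is an assumption, not necessarily the variable used in the notebook:

```python
# Sketch only: assumes the input features are already in a pandas DataFrame `df`.
import matplotlib.pyplot as plt
import seaborn as sns

corr = df.corr(numeric_only=True)  # pairwise Pearson correlations
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Correlation matrix of input features")
plt.tight_layout()
plt.show()
```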
{
@@ -593,7 +590,7 @@
"id": "42bbaf4e-bf83-4eef-8e0a-b1a164a532ad",
"metadata": {},
"source": [
"This is a simple problem, we can choose logistic regression or support vector machine as our classification model."
"We will use a linear regression model:"
]
},
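A minimal sketch of fitting such a model with scikit-learn; the names `X_train` and `y_train` are illustrative assumptions, not necessarily the variables used in the notebook:

```python
# Sketch only: assumes a train/test split has already produced X_train and y_train.
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)            # learn coefficients from the training data
print(model.coef_, model.intercept_)   # inspect the fitted parameters
```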
{
@@ -1051,7 +1048,7 @@
"id": "dbe5b7ab-6f93-4721-8d0d-9ef075276a0d",
"metadata": {},
"source": [
"Next step, let's use the testing data"
"Next step, let's use the testing data and plot the new predicted values vs true values."
]
},
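One possible sketch of that step, assuming the held-out data are named `X_test` and `y_test` and the fitted estimator is `model` (names carried over from the sketch above, not necessarily the notebook's):

```python
# Sketch only: variable names are assumptions, not the notebook's actual identifiers.
import matplotlib.pyplot as plt

y_pred = model.predict(X_test)

plt.scatter(y_test, y_pred, alpha=0.5)
plt.axline((0, 0), slope=1, color="k", linestyle="--")  # perfect-prediction line
plt.xlabel("True values")
plt.ylabel("Predicted values")
plt.show()
```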
{
@@ -1142,7 +1139,13 @@
"id": "5131ad3f",
"metadata": {},
"source": [
"R-squared (R²) and Root Mean Squared Error (RMSE) are both metrics used to evaluate the performance of regression models, but they convey different types of information. R², also known as the coefficient of determination, measures the proportion of the variance in the dependent variable that is predictable from the independent variables. It ranges from 0 to 1, where a higher value indicates a better fit of the model to the data, with 1 representing a perfect fit. RMSE, on the other hand, quantifies the average magnitude of the prediction errors, providing an absolute measure of fit in the same units as the dependent variable. It calculates the square root of the average squared differences between predicted and observed values, with a lower RMSE indicating a model that predicts more accurately. While R² gives a sense of how well the model explains the variability of the data, RMSE provides a direct measure of the model’s prediction accuracy. This [blog post](https://www.unidata.ucar.edu/blogs/news/entry/r-sup-2-sup-downsides) covers some of the downsides to looking at R2 alone."
"R-squared (R²) and Root Mean Squared Error (RMSE) are both metrics used to evaluate the performance of regression models, but they convey different types of information. \n",
"\n",
"R², also known as the coefficient of determination, measures the proportion of the variance in the dependent variable that is predictable from the independent variables. It ranges from 0 to 1, where a higher value indicates a better fit of the model to the data, with 1 representing a perfect fit. \n",
"\n",
"RMSE, on the other hand, quantifies the average magnitude of the prediction errors, providing an absolute measure of fit in the same units as the dependent variable. It calculates the square root of the average squared differences between predicted and observed values, with a lower RMSE indicating a model that predicts more accurately. While R² gives a sense of how well the model explains the variability of the data, RMSE provides a direct measure of the model’s prediction accuracy. \n",
"\n",
"This [blog post](https://www.unidata.ucar.edu/blogs/news/entry/r-sup-2-sup-downsides) covers some of the downsides to looking at R2 alone."
]
},
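A short sketch of computing both metrics with scikit-learn, assuming `y_test` and `y_pred` exist from the prediction step above:

```python
# Sketch only: y_test and y_pred are assumed to exist from the prediction step.
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

r2 = r2_score(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # RMSE, in the units of the target
print(f"R²: {r2:.3f}, RMSE: {rmse:.3f}")
```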
{
@@ -1201,7 +1204,7 @@
"id": "5a4c963b-c722-4b5c-9a72-8c76d71c8636",
"metadata": {},
"source": [
"Let's look at another dataset. This dataset just has snow and freezing rain as the p-types, so overall it will be colder. Let's see if we "
"Let's look at another dataset. This dataset just has snow and freezing rain as the p-types, so overall it will be colder. Let's see if we get similar results."
]
},
{
@@ -2100,10 +2103,16 @@
"metadata": {},
"source": [
"## Resources and references\n",
"1. [Scikit-learn](https://scikit-learn.org/stable/)\n",
"1. [Correlation Matrix, Demystified](https://towardsdatascience.com/correlation-matrix-demystified-3ae3405c86c1)\n",
"1. [What is the Difference Between Test and Validation Datasets?](https://machinelearningmastery.com/difference-test-validation-datasets/)\n",
"1. [Machine Learning Foundations in the Earth Systems Sciences](https://elearning.unidata.ucar.edu/dataeLearning/Cybertraining/foundations/#/)\n",
"1. [Scikit-learn's StandardScaler Documentation](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler)\n",
"1. [What and why behind fit_transform() and transform() in scikit-learn!](https://towardsdatascience.com/what-and-why-behind-fit-transform-vs-transform-in-scikit-learn-78f915cf96fe)"
"1. [What and why behind fit_transform() and transform() in scikit-learn!](https://towardsdatascience.com/what-and-why-behind-fit-transform-vs-transform-in-scikit-learn-78f915cf96fe)\n",
"1. [is\r\n",
"R2: Downsides and Potential Pitfalls for ESS ML Predic](https://www.unidata.ucar.edu/blogs/news/entry/r-sup-2-sup-downsides)\n",
"1. [Scikit-learn's Decision Trees](https://scikit-learn.org/stable/modules/tree.html)\n",
"1. [StatQuest video: Decision and Classification Trees, Clearly Explained!!!](https://www.youtube.com/watch?v=_L39rN6gz7Y)tion"
]
}
],
@@ -2123,7 +2132,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
"version": "3.11.6"
}
},
"nbformat": 4,
