Python Notebooks
├── Power Method
├── Inverse Power Method
├── Least Squares Regression
├── Multiclass Classification on MNIST
├── PageRank Algorithm
└── Evaluating Robustness of Neural Networks
R Notebooks
├── Confidence Band for Linear Regression
├── Linear Regression Assumptions
├── Polynomial Regression and Piecewise Constant Fit
├── ANOVA, F-Test, Hypothesis Testing and Polynomial Regression
├── Studentized Bootstrap and Student Confidence Intervals
├── Logistic Regression, Forward Selection and Bootstrapping
├── Cross-validation and Sequential Model Selection
└── Analysis of Chemical Plant Data
In this notebook, you'll find an implementation of the Power Method, a numerical algorithm used to find the dominant eigenvalue and its corresponding eigenvector of a square matrix. The Power Method is particularly useful for large matrices and plays a significant role in various applications, such as page ranking and principal component analysis.
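As a rough illustration of the idea (not the notebook's exact code), the Power Method can be sketched in a few lines of NumPy: repeatedly multiply by the matrix, normalize, and read off the eigenvalue from the Rayleigh quotient. The example matrix and starting vector here are illustrative choices.

```python
import numpy as np

def power_method(A, num_iters=1000, tol=1e-10):
    """Estimate the dominant eigenvalue and eigenvector of A.
    Assumes the start vector has a component along the dominant eigenvector."""
    x = np.ones(A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(num_iters):
        y = A @ x
        x_new = y / np.linalg.norm(y)
        if np.linalg.norm(x_new - x) < tol:
            x = x_new
            break
        x = x_new
    # Rayleigh quotient gives the eigenvalue estimate for the current vector
    lam = x @ A @ x / (x @ x)
    return lam, x

A = np.array([[2.0, 1.0], [1.0, 2.0]])  # eigenvalues 3 and 1
lam, v = power_method(A)
```

For this symmetric test matrix the iteration converges to the dominant eigenvalue 3 with eigenvector proportional to (1, 1).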
The Inverse Power Method, presented in this notebook, is an extension of the Power Method used to find the smallest (in magnitude) eigenvalue and corresponding eigenvector of a matrix. It is often employed in solving systems of linear equations and eigenvalue problems.
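A minimal sketch of the Inverse Power Method, assuming the same NumPy setting as above: applying the power iteration to A⁻¹ (via a linear solve each step, rather than forming the inverse) converges to the eigenvalue of A with smallest magnitude.

```python
import numpy as np

def inverse_power_method(A, num_iters=100, tol=1e-10):
    """Find the smallest-magnitude eigenvalue of A by power iteration on A^{-1}."""
    n = A.shape[0]
    x = np.random.default_rng(0).standard_normal(n)
    x /= np.linalg.norm(x)
    for _ in range(num_iters):
        y = np.linalg.solve(A, x)   # one solve per step instead of inverting A
        y /= np.linalg.norm(y)
        # accept convergence up to a sign flip of the eigenvector
        if np.linalg.norm(y - x) < tol or np.linalg.norm(y + x) < tol:
            x = y
            break
        x = y
    lam = x @ A @ x / (x @ x)       # Rayleigh quotient in terms of A itself
    return lam, x

A = np.array([[2.0, 1.0], [1.0, 2.0]])  # eigenvalues 3 and 1
lam, v = inverse_power_method(A)
```

Here the iteration converges to the eigenvalue 1, the smaller of the two in magnitude.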
This notebook delves into the concept of Least Squares Regression, a popular linear regression technique used to model the relationship between variables by minimizing the sum of the squares of the differences between observed and predicted values. It is implemented from scratch for both linear and non-linear estimates.
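The core computation can be sketched as follows (a simplified stand-in for the notebook's from-scratch implementation): build a polynomial design matrix and solve the least-squares problem, with degree 1 giving the linear case.

```python
import numpy as np

def least_squares_fit(x, y, degree=1):
    """Fit polynomial coefficients by least squares on a Vandermonde design matrix."""
    X = np.vander(x, degree + 1, increasing=True)  # columns: 1, x, x^2, ...
    # lstsq is numerically safer than explicitly inverting X^T X
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x                 # exact line y = 2x + 1
coeffs = least_squares_fit(x, y)  # recovers intercept 1 and slope 2
```

Raising `degree` reuses the same solver for the non-linear (polynomial) estimates.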
Here, we explore Multiclass Classification using the MNIST dataset, a classic dataset of handwritten digit images. The notebook implements Multiclass Logistic Regression from scratch, working with both Mean Square Error (L2) loss and Cross-Entropy (CE) loss, optimized with gradient descent (GD) as well as stochastic/mini-batch gradient descent (SGD). Finally, it trains a 2-hidden-layer Neural Network on the image dataset.
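The softmax/CE/GD portion can be sketched in NumPy; this is an illustrative toy version, using three synthetic 2-D Gaussian clusters in place of MNIST so it runs in milliseconds.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)   # stabilize the exponentials
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def train_softmax_gd(X, y, num_classes, lr=0.5, epochs=200):
    """Multiclass logistic regression trained with full-batch GD on CE loss."""
    n, d = X.shape
    W = np.zeros((d, num_classes))
    Y = np.eye(num_classes)[y]             # one-hot labels
    for _ in range(epochs):
        P = softmax(X @ W)
        W -= lr * (X.T @ (P - Y)) / n      # gradient of the mean CE loss
    return W

rng = np.random.default_rng(0)
# Three well-separated 2-D Gaussian clusters stand in for MNIST features
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
X = np.vstack([rng.normal(c, 0.5, size=(30, 2)) for c in centers])
y = np.repeat([0, 1, 2], 30)
Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend a bias column
W = train_softmax_gd(Xb, y, num_classes=3)
acc = (softmax(Xb @ W).argmax(axis=1) == y).mean()
```

Swapping the full-batch gradient for one computed on a random mini-batch each step turns this into SGD.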
The PageRank algorithm is famously known for powering Google's search engine. This notebook provides insight into how the algorithm determines the most important articles in a linked collection.
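The essence of PageRank is a damped power iteration on the link matrix; the sketch below is a minimal NumPy version on a made-up 4-page web, with dangling pages treated as linking everywhere (one common convention, not necessarily the notebook's).

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-10, max_iters=1000):
    """PageRank via power iteration. adj[i, j] = 1 means page i links to page j."""
    n = adj.shape[0]
    out_deg = adj.sum(axis=1)
    # Column-stochastic transition matrix; dangling pages link to every page
    M = np.where(out_deg[:, None] > 0,
                 adj / np.maximum(out_deg[:, None], 1),
                 1.0 / n).T
    r = np.full(n, 1.0 / n)
    for _ in range(max_iters):
        r_new = (1 - damping) / n + damping * (M @ r)
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r

# Tiny illustrative web: page 3 is linked to by every other page
adj = np.array([[0, 1, 0, 1],
                [0, 0, 0, 1],
                [1, 0, 0, 1],
                [0, 0, 0, 0]])
ranks = pagerank(adj)   # ranks sum to 1; page 3 comes out on top
```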
Neural Networks are powerful tools in machine learning, but they are susceptible to adversarial attacks and may not always generalize well. In this notebook, I implement FGSM (Fast Gradient Sign Method) and PGD (Projected Gradient Descent) adversarial attacks on an MNIST neural network model and visualize those attacks. I also verify the model using Interval Bound Propagation (IBP).
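To convey the core of FGSM without the notebook's neural network, here is a hedged NumPy sketch on a binary logistic classifier: perturb the input by ε in the direction of the sign of the loss gradient. The weights and input here are hypothetical stand-ins, not values from the notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_attack(x, y, w, b, eps):
    """FGSM on a logistic classifier with cross-entropy loss.
    The loss gradient w.r.t. the input is (sigmoid(w.x + b) - y) * w."""
    grad = (sigmoid(w @ x + b) - y) * w
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)  # keep "pixels" in [0, 1]

# Hypothetical trained weights for a 4-"pixel" input
w = np.array([2.0, -1.0, 0.5, 3.0])
b = -1.0
x = np.array([0.2, 0.8, 0.5, 0.3])
x_adv = fgsm_attack(x, y=1.0, w=w, b=b, eps=0.1)
```

The perturbation stays within the ε-ball in the max norm, yet pushes the model's confidence in the true label down; PGD iterates this step with a projection back onto the ε-ball.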
This notebook focuses on creating confidence bands for linear regression models, which help visualize the uncertainty around the regression line and predictions. Understanding confidence bands is crucial in drawing accurate conclusions from regression analyses.
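The notebook itself is in R; as a language-neutral sketch of the underlying formula, here is the pointwise band for the mean response of a simple linear fit in NumPy. It uses the normal quantile 1.96 as a large-sample stand-in for the exact Student t quantile the R code would use.

```python
import numpy as np

def confidence_band(x, y, x_grid, z=1.96):
    """Pointwise ~95% confidence band for the regression line E[y|x]
    of a simple linear least-squares fit."""
    n = len(x)
    b1, b0 = np.polyfit(x, y, 1)
    resid = y - (b0 + b1 * x)
    s2 = (resid @ resid) / (n - 2)                  # residual variance estimate
    sxx = ((x - x.mean()) ** 2).sum()
    # Standard error of the fitted mean, widest far from the mean of x
    se = np.sqrt(s2 * (1.0 / n + (x_grid - x.mean()) ** 2 / sxx))
    fit = b0 + b1 * x_grid
    return fit - z * se, fit + z * se

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=50)
x_grid = np.array([0.0, 5.0, 10.0])
lower, upper = confidence_band(x, y, x_grid)
```

The band is narrowest at the mean of the predictors and flares outward, which is exactly the uncertainty pattern the notebook visualizes.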
Linear Regression comes with several assumptions that need to be validated before trusting the model's results. This notebook covers how to check these assumptions and interpret the results.
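Two of those checks (zero-mean residuals and constant residual spread) can be sketched numerically; this Python stand-in for the R diagnostics uses a crude correlation of |residuals| with fitted values as a heteroscedasticity signal, not a formal test.

```python
import numpy as np

def residual_diagnostics(x, y):
    """Basic residual checks for simple linear regression: residuals should
    average to ~0, and |residuals| should be uncorrelated with fitted values
    if the constant-variance assumption holds."""
    b1, b0 = np.polyfit(x, y, 1)
    fitted = b0 + b1 * x
    resid = y - fitted
    spread_corr = np.corrcoef(np.abs(resid), fitted)[0, 1]
    return resid, spread_corr

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, size=200)   # homoscedastic noise
resid, spread_corr = residual_diagnostics(x, y)
```

In R these same checks fall out of `plot(lm(...))`'s residuals-vs-fitted panel.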
In this notebook, we explore Polynomial Regression, a method that extends linear regression to capture nonlinear relationships between variables. Additionally, we discuss Piecewise Constant Fit, an alternative approach for modeling data with abrupt changes.
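Both ideas fit in a few lines; the sketch below (in Python rather than the notebook's R) fits an exact quadratic by polynomial regression and fits a step function by averaging within bins defined by a knot.

```python
import numpy as np

def piecewise_constant_fit(x, y, knots):
    """Piecewise constant fit: the level in each bin is the mean of y there."""
    bins = np.digitize(x, knots)
    levels = np.array([y[bins == b].mean() for b in np.unique(bins)])
    return bins, levels

x = np.linspace(0, 1, 20)

# Polynomial regression: a degree-2 fit recovers the true quadratic exactly
coef = np.polyfit(x, x**2 - x, 2)   # highest power first: ~[1, -1, 0]

# Piecewise constant fit: a step function with a jump at 0.5
y_step = np.where(x < 0.5, 1.0, 3.0)
bins, levels = piecewise_constant_fit(x, y_step, knots=[0.5])
```

Adding knots refines the piecewise fit, just as raising the degree refines the polynomial one; the notebook compares these two ways of capturing nonlinearity.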
The Analysis of Variance (ANOVA) is a statistical method used to compare means across two or more groups. In this notebook, we'll cover ANOVA, the F-test, and hypothesis testing, and incorporate polynomial regression into the ANOVA framework.
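The F statistic itself is just a ratio of between-group to within-group variance; as a quick numerical sketch (in Python, mirroring what R's `aov` computes):

```python
import numpy as np

def one_way_anova_F(*groups):
    """One-way ANOVA F statistic: between-group mean square over
    within-group mean square."""
    all_y = np.concatenate(groups)
    grand = all_y.mean()
    k, n = len(groups), len(all_y)
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

g1 = np.array([1.0, 2.0, 3.0])
g2 = np.array([2.0, 3.0, 4.0])
F = one_way_anova_F(g1, g2)   # works out to exactly 1.5 for these groups
```

Comparing F against the F(k−1, n−k) distribution then gives the hypothesis-test p-value.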
Bootstrap methods are powerful tools for estimating uncertainties and constructing confidence intervals. This notebook explores Studentized Bootstrap, a variant of the bootstrap that provides more accurate confidence intervals for certain statistics.
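A compact sketch of the studentized bootstrap for a mean (in Python rather than the notebook's R): resample, form the pivot t* = (mean* − mean)/se*, and use its empirical quantiles in place of the Student t table.

```python
import numpy as np

def studentized_bootstrap_ci(x, level=0.95, B=2000, seed=0):
    """Studentized bootstrap confidence interval for the mean."""
    rng = np.random.default_rng(seed)
    n = len(x)
    mean, se = x.mean(), x.std(ddof=1) / np.sqrt(n)
    t_stars = np.empty(B)
    for b in range(B):
        xb = rng.choice(x, size=n, replace=True)
        # studentize each resample by its own standard error
        t_stars[b] = (xb.mean() - mean) / (xb.std(ddof=1) / np.sqrt(n))
    alpha = 1 - level
    t_hi, t_lo = np.quantile(t_stars, [1 - alpha / 2, alpha / 2])
    # note the quantiles swap sides when inverting the pivot
    return mean - t_hi * se, mean - t_lo * se

x = np.random.default_rng(1).normal(5.0, 2.0, size=30)
ci_lo, ci_hi = studentized_bootstrap_ci(x)
```

Because the pivot is studentized, the interval adapts to skewness in the data better than the basic percentile bootstrap.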
Logistic Regression is widely used for binary classification problems. In this notebook, we'll learn how to build and interpret logistic regression models. Additionally, we'll cover forward selection, a feature selection technique, and explore how bootstrapping can improve model evaluation.
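The forward-selection loop can be illustrated independently of the model; the sketch below uses a least-squares fit and residual sum of squares as a simple stand-in for the deviance criterion one would use with logistic regression (the data here are synthetic, with only one truly informative feature).

```python
import numpy as np

def forward_selection(X, y, max_features):
    """Greedy forward selection: at each step, add the feature that most
    reduces the residual sum of squares of a least-squares fit."""
    n, d = X.shape
    selected = []
    for _ in range(max_features):
        best_j, best_sse = None, np.inf
        for j in range(d):
            if j in selected:
                continue
            cols = selected + [j]
            Xs = np.hstack([np.ones((n, 1)), X[:, cols]])
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            sse = float(((y - Xs @ beta) ** 2).sum())
            if sse < best_sse:
                best_j, best_sse = j, sse
        selected.append(best_j)
    return selected

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = 3.0 * X[:, 2] + 0.1 * rng.standard_normal(100)   # only feature 2 matters
order = forward_selection(X, y, max_features=2)      # picks feature 2 first
```

Bootstrapping the whole selection procedure, as the notebook does, reveals how stable the chosen feature set actually is.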
Cross-validation is essential for estimating the generalization performance of a model. This notebook explains various cross-validation techniques and demonstrates how to perform sequential model selection to identify the best model among alternatives.
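The k-fold mechanics can be sketched as follows (a Python analogue of the notebook's R workflow, with ordinary least squares as the model being scored):

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k disjoint folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

def cross_val_mse(X, y, k=5, seed=0):
    """k-fold cross-validation estimate of test MSE for least squares."""
    folds = kfold_indices(len(y), k, seed)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        Xtr = np.hstack([np.ones((len(train), 1)), X[train]])
        Xte = np.hstack([np.ones((len(test), 1)), X[test]])
        beta, *_ = np.linalg.lstsq(Xtr, y[train], rcond=None)
        errors.append(float(((y[test] - Xte @ beta) ** 2).mean()))
    return float(np.mean(errors))

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(60, 1))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(0, 0.1, size=60)
cv_mse = cross_val_mse(X, y)
```

Sequential model selection then amounts to computing this CV score for each candidate model and keeping the best.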
In this notebook, we'll analyze data from a chemical plant. The dataset may include various variables related to the plant's operation, and we'll apply the statistical techniques learned in the previous notebooks to draw meaningful insights from the data.
I hope you find these notebooks insightful and beneficial for your learning and data analysis journey. Feel free to explore, experiment, and adapt the code to suit your specific needs. Happy coding!