Regression

This is an implementation of Simple Linear Regression reading an input file (in CSV format) containing samples.

Here we qoute the example of Miu's age and height from 4 to 19 years old: (4,100.1), (5,107.2), (6,114.1), (7,121.7), (8,126.8), (9,130.9), (10,137.5), (11,143.2), (12,149.4), (13,151.6), (14,154.0), (15,154.6), (16,155.0), (17,155.1), (18,155.3), (19,155.7).

First of all all computations are carried out and then a graph is automatically created with a dispersion plot, another with also the regression line, and finally if needed also the 1/X.

Predictions are carried out from the regression equation in the form y=ax+b.

Where the variable y is indipendent and the variable x is dipendent. In this equation the coefficent a is the regression and as usual gives the slope of the line.

The compute() method calculates all variables to describe statistically samples and perform regression analysis.

Age ¹/_age Height

sum 184.0 1.7144 2212.2

avg 11.5 0.1072 138.3

min 4.0 0.0526 100.1

max 19.0 0.25 155.7

med 11.5 0.0871 146.3

ssd 0.0489 5464.4575

usv 10.0 169.9

rss

-15.9563

coefficent

a ∑_xy ∑_xx -326.6

b _avgY-_avg¹/_age*a 173.3

equation

y = -326.4x + 173.3

ŷ

avg 138.3

ssd 5211.7

Σyŷ 5211.7

correlation

R = 0.9766

R² = 0.9537

confidence

Se 252.8

σ 4.2

anova 288.6

In this specific case there is one and only Miu, so with such correlation coefficent we can be confident to make good predictions about her height in the future.

What age? 21
157.7 = -326.6*21.0 +173.3
148.3 <-- 157.7 --> 167.1

What age? 32
163.1 = -326.6*32.0 +173.3
153.4 <-- 163.1 --> 172.7

The chart gets updated for every prediction!

What if the samples re not the whole population? We carry out variance analysis and we calculate the confidence intervals.

Try out Risa's tea house data file. Once all calculations are carried out and the null hypothesis is checked, you are asked to enter your request for prediction.

--teaHouse.csv--
        day     temp    icetea
0:      22.0    29.0    77.0    
1:      23.0    28.0    62.0    
2:      24.0    34.0    93.0
3:      25.0    31.0    84.0
4:      26.0    25.0    59.0    
5:      27.0    29.0    64.0    
6:      28.0    32.0    80.0
7:      29.0    31.0    75.0
8:      30.0    24.0    58.0    
9:      31.0    33.0    91.0
10:     1.0     25.0    51.0    
11:     2.0     31.0    73.0
12:     3.0     26.0    65.0
13:     4.0     30.0    84.0    

        temp    icetea
sum     408.0   1016.0
avg     29.1      72.6
min     24.0      51.0
max     34.0      93.0
med     29.5      74.0
ssd     129.7   2203.4
usv     10.0     169.5

rss             484.9
a                 3.7
b               -36.4

y = 3.7x -36.4

         Hat
avg              72.6
ssd            1812.3
Syŷ            1812.3

      Regression
R              0.9069
R2             0.8225

      Confidence
Se              391.1
sigma             5.7
anova            55.6
F .05          4.7472

What temp? 27
64.6 = 3.7*27.0 -36.4
51.9 <-- 64.6 --> 77.2

What temp? 43
124.4 = 3.7*43.0 -36.4
104.8 <-- 124.4 --> 144.0

What temp? 50
150.5 = 3.7*50.0 -36.4
124.6 <-- 150.5 --> 176.5

What temp? 19.3
35.8 = 3.7*19.3 -36.4
19.3 <-- 35.8 --> 52.2

What temp? stop

Usage

Run like any Java program, specify the data file .csv, X column (0 default), Y column (1 default), true for 1/X (when X is not linear).

java SimpleLinearRegression ageMiu.csv 0 1 true

java SimpleLinearRegression teaHouse.csv 1 2

java SimpleLinearRegression airPreassure.csv

Package

The code that performs simple linear regression is organized into a package containg two classes: DataCSV and Draw.

DataCSV

load() read CSV
show() prints loaded data
compute() execute linear regression
Getters (public)
- x independent variable list
- invX 1/X (when needed)
- y dependent variable list
- sumX, sumInvX, sumY list sum
- avgX, avgInvX, avgY list average
- ssdX, ssdY Sum of Squared Deviations
- rss Residual Sum of Squares
- a, b coefficents of y=ax+b
- R, R² regression coefficent
Methods (private)
- getIndexValue()
- getIndexValueInverted()
- round() round with precision
- sumList() column sum
- avgList() column average
- ssdList() Sum of Squared Deviations
- rssList() Residual Sum of Squares

Draw

scatterPlot()

Credits

Inspiration for this Java coding challenge comes from The Manga Guide to Regression Analysis published in English by NO STARCH PRESS and Ohmsha Ltd.

Like a lot of people, Miu has had trouble learning regression analysis. But with new motivation—in the form of a handsome but shy customer—and the help of her brilliant café coworker Risa, she’s determined to master it.

Expressive charts built with XChart a light-weight and convenient library for plotting data.

Its focus is on simplicity and ease-of-use, requiring only two lines of code to save or display a basic default chart.

Wonderful coding experience on Microsoft Visual Studio Code IDE and Java Extension Pack.

According to a survey done by Stack Overflow in 2018 VSC was ranked the most popular developer environement tool.

F-Distribution tables courtecy of SOCR.

Last but not least the unmistakable AdoptOpenJDK 8.

AdoptOpenJDK provides rock-solid OpenJDK binaries for the Java ecosystem and also provides infrastructure as code, and a Build farm for builders of OpenJDK, on any platform.

Name		Name	Last commit message	Last commit date
Latest commit History 125 Commits
.settings		.settings
.vscode		.vscode
bin		bin
img		img
lib		lib
src		src
.classpath		.classpath
.project		.project
LICENSE		LICENSE
README.md		README.md
ageMiu.csv		ageMiu.csv
airPressure.csv		airPressure.csv
teaHouse.csv		teaHouse.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Regression

Usage

Package

DataCSV

Draw

Credits

About

Releases

Packages

Languages

	Age	¹/_age	Height
sum	184.0	1.7144	2212.2
avg	11.5	0.1072	138.3
min	4.0	0.0526	100.1
max	19.0	0.25	155.7
med	11.5	0.0871	146.3
ssd		0.0489	5464.4575
usv		10.0	169.9

License

YeshiNamkhai/Regression

Folders and files

Latest commit

History

Repository files navigation

Regression

Usage

Package

DataCSV

Draw

Credits

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages