Can you predict upcoming laboratory earthquakes?
Kaggle Competition: Link
Team name: RubenAMtz
Commit | Score (MAE) | Date |
---|---|---|
First | 1.808 | 13/02/19 |
Second | 1.724 | 28/02/19 |
Third | 1.765 | 01/03/19 |
Fourth | 1.787 | 01/03/19 |
Fifth | 1.665 | 02/03/19 |
Training set:
- format: .csv
- columns = 2
- column names: acoustic_data, time_to_failure
- acoustic_data: the seismic signal [int16]
- time_to_failure: the time (in seconds) until the next laboratory earthquake [float64]
- 600M instances

Test set:
- format: .csv
- test segments = 2624 (files)
- 150K instances in every test segment
Step 1:
- Split inputs from outputs
Step 2:
- Split the training set into segments, each with as many instances as a test segment
- Test segments have 150K instances, so 600M / 150K ~ 4K segments out of the training set
- Create as many training segments needed and store in files
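The splitting step above can be sketched with pandas' chunked CSV reader, which streams the large training file without loading it all into memory. File names and the output layout here are illustrative, not the repo's actual paths:

```python
import pandas as pd

SEGMENT_LEN = 150_000  # matches the length of each test file


def split_into_segments(csv_path, out_dir, seg_len=SEGMENT_LEN):
    """Stream the training CSV in segment-sized chunks and write one
    file per full segment. Returns the number of segments written."""
    reader = pd.read_csv(
        csv_path,
        dtype={"acoustic_data": "int16", "time_to_failure": "float64"},
        chunksize=seg_len,
    )
    n = 0
    for i, chunk in enumerate(reader):
        if len(chunk) < seg_len:
            break  # discard the final partial segment
        chunk.to_csv(f"{out_dir}/segment_{i:04d}.csv", index=False)
        n += 1
    return n
```

With a 600M-row training file and 150K-row segments, this yields roughly 4,000 segment files.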
Step 3:
- Feature engineering: create features from the inputs for both TRAINING and TEST data
- Apply the feature-generation methods to every TRAINING segment
- Number of features created: 71131
As our model is trained based on the newly created features, we need to create the same features for our test set, which is the data that will be used to make predictions:
- Apply same methods to generate same features for every TEST segment
- Concatenate generated features in a single TEST set.
- Scale the data (train and test)
- Save files in folders train_features and test_features
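A minimal sketch of steps 3's two halves, assuming simple aggregate statistics as stand-ins for the full feature set: a per-segment feature extractor, and scaling where the scaler is fit on TRAIN only and then applied to TEST (fitting it on test data would leak information):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def segment_features(signal):
    """A few aggregate statistics per segment; the project's real
    feature set is much larger (rolling windows, quantiles, etc.)."""
    s = pd.Series(signal)
    return {
        "mean": s.mean(),
        "std": s.std(),
        "min": s.min(),
        "max": s.max(),
        "q95": s.quantile(0.95),
        "abs_mean": s.abs().mean(),
    }


def scale_features(X_train, X_test):
    """Fit the scaler on TRAIN features only, then apply the same
    transform to TEST so both sets share one feature scale."""
    scaler = StandardScaler().fit(X_train)
    return scaler.transform(X_train), scaler.transform(X_test)
```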
Step 4:
- Configure the model: ensemble { GradientBoostingRegressor }
- As we don't have a VAL set, we split the TRAIN set to create one: 66% train vs. 33% validation:
Set | Inputs | Output |
---|---|---|
TRAIN | Yes | Yes |
VAL | Yes | Yes |
TEST | Yes | No |
- Define the evaluation metric: 'mean absolute error' (MAE), as defined by the competition rules.
- Evaluate model with 5-fold cross-validation
- Train and predict
- Submit file
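Step 4 can be sketched end to end with scikit-learn. The data here is random and stands in for the engineered feature matrix and the time_to_failure targets; the 66/33 split, 5-fold CV, and MAE scoring follow the steps above (scikit-learn reports MAE negated, so the sign is flipped):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split, cross_val_score

# Placeholder data; in the project, X holds the engineered segment
# features and y the time_to_failure of each segment.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 6))
y = rng.uniform(0, 16, size=200)

# 66% train vs. 33% validation split.
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.33, random_state=42
)

model = GradientBoostingRegressor(random_state=42)

# 5-fold cross-validation scored with MAE (the competition metric).
mae = -cross_val_score(
    model, X_tr, y_tr, cv=5, scoring="neg_mean_absolute_error"
).mean()

model.fit(X_tr, y_tr)
preds = model.predict(X_val)
```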
- A CNN will be trained using STFT plots from each training segment.
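One way to produce those STFT inputs is with SciPy's short-time Fourier transform; the log-magnitude array below would serve as one "image" per segment. The sampling rate and window size are assumptions for illustration, not values taken from the project:

```python
import numpy as np
from scipy.signal import stft

# A synthetic 150K-sample segment standing in for one training segment.
segment = np.random.default_rng(0).integers(-100, 100, 150_000).astype(np.float32)

# Short-time Fourier transform: frequencies, time bins, complex coefficients.
f, t, Z = stft(segment, fs=4_000_000, nperseg=1024)

# Log-magnitude spectrogram -- one CNN input per segment.
spectrogram = np.log1p(np.abs(Z))
```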
Features added:
- Peak counting based on the mean and std of each local file (and derived statistics)
- MFCCs for the first and last samples of each local file (and derived statistics)
- Removed some features
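The peak-counting feature can be sketched with `scipy.signal.find_peaks`, thresholding on the segment's mean and standard deviation; the threshold multiplier `k=2` is an assumption, and the MFCC features described above would come from a library such as `librosa.feature.mfcc`, not shown here:

```python
import numpy as np
from scipy.signal import find_peaks


def count_peaks(signal, k=2.0):
    """Count peaks exceeding mean + k * std of the segment.
    The k=2 threshold is illustrative, not the project's value."""
    x = np.asarray(signal, dtype=np.float64)
    threshold = x.mean() + k * x.std()
    peaks, _ = find_peaks(x, height=threshold)
    return len(peaks)
```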
Optimized GradientBoostingRegressor via GridSearchCV:
- Smaller learning rate
- Increased the number of estimators
- More features were added
- GridSearch improved our model in ~0.1%
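A sketch of that tuning with `GridSearchCV`, searching in the direction described above (smaller learning rate, more estimators); the grid values and placeholder data are assumptions, as the exact search space isn't given:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Placeholder data standing in for the engineered features/targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
y = rng.uniform(0, 16, size=120)

# Illustrative grid: trade a smaller learning rate for more estimators.
param_grid = {
    "learning_rate": [0.01, 0.05, 0.1],
    "n_estimators": [100, 300, 500],
}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_absolute_error",
    cv=3,
)
search.fit(X, y)
best = search.best_params_
```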
- A deep dense model was implemented
- The dense model seems to fit the training data better but overfits.
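The project's dense architecture isn't specified; as a stand-in, scikit-learn's `MLPRegressor` shows one common way to curb that overfitting, using `early_stopping`, which holds out part of the training data and stops when the validation score stalls:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Placeholder data standing in for the engineered features/targets.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))
y = rng.uniform(0, 16, size=300)

# A small dense network; hidden sizes are illustrative only.
# early_stopping reserves a validation fraction and halts training
# when validation performance stops improving.
mlp = MLPRegressor(
    hidden_layer_sizes=(64, 64, 32),
    early_stopping=True,
    max_iter=500,
    random_state=1,
)
mlp.fit(X, y)
preds = mlp.predict(X[:5])
```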