Baseline solution for NeurIPS 2022 Ariel Data Challenge

Inside this repo you will find the baseline solution for the Ariel Data Challenge. To run the script you will need access to the training and test data, both of which can be found here. There are two ways to run the baseline:

via command line:

python baseline_MCDropout.py --training PATH/TO/TRAININGDATA/ --test PATH/TO/TESTDATA

via Jupyter notebook: baseline - MCDropout-Public.ipynb

Description

We trained a neural network to perform a supervised multi-target regression task. The architecture is adapted from the CNN described in Yip et al.

Preprocessing Steps

We used the first 5000 data instances to train the model.
We augmented the data with the observation noise.
We used the stellar and planetary radii as additional features.
We standardised both the inputs and the outputs.
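The steps above can be sketched in NumPy as follows. This is a minimal illustration on synthetic data, not the code in preprocessing.py: the function names, the array sizes, the Gaussian noise model and the per-feature standardisation are all assumptions made for the example.

```python
import numpy as np

def augment_with_noise(spectra, noise, n_repeat, rng):
    """Draw n_repeat noisy copies of each spectrum from N(spectrum, noise**2).

    Assumes Gaussian observation noise (hypothetical choice for this sketch).
    """
    # draws has shape (n_repeat, n_samples, n_wavelengths)
    draws = rng.normal(loc=spectra, scale=noise, size=(n_repeat,) + spectra.shape)
    # Keep the copies of one planet adjacent: (n_samples * n_repeat, n_wavelengths)
    return draws.transpose(1, 0, 2).reshape(-1, spectra.shape[1])

def standardise(x, mean=None, std=None):
    """Standardise per column; pass the training mean/std to reuse them at test time."""
    mean = x.mean(axis=0) if mean is None else mean
    std = x.std(axis=0) if std is None else std
    return (x - mean) / std, mean, std

# Synthetic stand-ins for the real data (sizes are arbitrary).
rng = np.random.default_rng(0)
n_samples, n_wavelengths, n_targets, n_repeat = 8, 10, 3, 5
spectra = rng.uniform(0.01, 0.02, size=(n_samples, n_wavelengths))
noise = np.full_like(spectra, 1e-4)
radii = rng.uniform(0.5, 2.0, size=(n_samples, 2))  # stellar and planetary radius
targets = rng.normal(size=(n_samples, n_targets))

aug_spectra = augment_with_noise(spectra, noise, n_repeat, rng)
# np.repeat matches the row order produced by augment_with_noise.
aug_radii = np.repeat(radii, n_repeat, axis=0)
aug_targets = np.repeat(targets, n_repeat, axis=0)

x_spec, spec_mean, spec_std = standardise(aug_spectra)
x_aux, aux_mean, aux_std = standardise(aug_radii)
y, y_mean, y_std = standardise(aug_targets)
```

The returned means and standard deviations would be reused to transform the test inputs and to map network outputs back to physical units.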

At test time we performed Monte Carlo Dropout to obtain a multivariate distribution for each test example. Samples from this multivariate distribution are submitted to the regular track. Quartile estimates are extracted from the same distribution and submitted to the light track.
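To illustrate the idea, here is a toy NumPy sketch of Monte Carlo Dropout: dropout stays active at prediction time, and repeated stochastic forward passes give a set of samples per test example. The two-layer network, its random weights, the dropout rate and the quantile levels are assumptions for this example; the baseline itself uses a trained CNN, and the submission format defines which quantiles the light track expects.

```python
import numpy as np

def mc_dropout_forward(x, w1, b1, w2, b2, drop_rate, rng):
    """One stochastic forward pass of a toy two-layer network with dropout left on."""
    hidden = np.maximum(x @ w1 + b1, 0.0)            # ReLU activation
    keep = rng.random(hidden.shape) >= drop_rate     # Bernoulli keep mask
    hidden = hidden * keep / (1.0 - drop_rate)       # inverted-dropout scaling
    return hidden @ w2 + b2

def mc_dropout_samples(x, params, n_passes, drop_rate, rng):
    """Stack n_passes stochastic predictions: (n_examples, n_passes, n_targets)."""
    passes = [mc_dropout_forward(x, *params, drop_rate, rng) for _ in range(n_passes)]
    return np.stack(passes, axis=1)

rng = np.random.default_rng(0)
n_features, n_hidden, n_targets = 12, 32, 3          # arbitrary toy sizes
params = (
    rng.normal(size=(n_features, n_hidden)), np.zeros(n_hidden),
    rng.normal(size=(n_hidden, n_targets)), np.zeros(n_targets),
)
x_test = rng.normal(size=(4, n_features))

# Regular-track style output: samples from the predictive distribution.
samples = mc_dropout_samples(x_test, params, n_passes=200, drop_rate=0.1, rng=rng)

# Light-track style output: per-target quantiles of the same samples.
# Quantile levels are assumed here; shape is (3, n_examples, n_targets).
quartiles = np.quantile(samples, [0.25, 0.5, 0.75], axis=1)
```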

Metrics

We have included the metrics we used to compute the scores for the light track and the regular track. Please note that the regular track metric can be quite slow. We used the POT Python package to compute the Wasserstein-2 distance.
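For intuition about the Wasserstein-2 distance, the sketch below computes it for the simplest case only: two one-dimensional sample sets of equal size, where the optimal transport plan pairs the sorted samples. It is not the scoring function in metric_regular_track.py, which handles multivariate samples through POT; the function name is hypothetical.

```python
import numpy as np

def wasserstein2_1d(a, b):
    """Wasserstein-2 distance between two equal-sized 1D empirical distributions.

    In one dimension with equal sample counts and uniform weights, the optimal
    coupling matches the i-th smallest value of a to the i-th smallest of b.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("expected two 1D arrays of equal length")
    return float(np.sqrt(np.mean((a - b) ** 2)))

# A pure shift by 3 gives a distance of 3; reordering the samples changes nothing.
shifted = wasserstein2_1d([0.0, 1.0, 2.0], [3.0, 4.0, 5.0])
reordered = wasserstein2_1d([0.0, 1.0, 2.0], [2.0, 0.0, 1.0])
```

Sorting makes this case cheap, while the multivariate case requires solving a full transport problem between every pair of sample sets.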

Things to improve

There are several directions to take from here. Let us summarise the shortcomings of this model:

The data preprocessing is quite simplistic and deserves more effort.
We have only used 5000 data points instead of the full dataset.
We did not train the model with the results from the retrieval (QuartilesTable.csv for the Light Track and Tracedata.hdf5 for the Regular Track), which are the ground truth (GT) for this competition.
The conditional distribution from MCDropout is very restricted and Gaussian-like.
So far we have not considered the atmospheric targets as a joint distribution.
We have only used the stellar radius and planet radius from the auxiliary information.
We have not done any hyperparameter tuning.
The train/validation split here is not clean: we split the data after augmenting it, so noisy copies of the same example can end up on both sides and leak information into the validation data. There is no leakage to the test data, though.
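One way to address the leakage described in the last point is to split the original examples first and augment each subset separately. The NumPy sketch below shows this on synthetic data; the function name, the Gaussian noise model and the choice to augment the validation set as well are assumptions for illustration.

```python
import numpy as np

def split_then_augment(spectra, noise, targets, val_fraction, n_repeat, rng):
    """Split the original examples, then add noisy copies within each subset."""
    n_samples, n_wavelengths = spectra.shape
    order = rng.permutation(n_samples)
    n_val = int(round(val_fraction * n_samples))
    val_idx, train_idx = order[:n_val], order[n_val:]

    def augment(idx):
        # draws has shape (n_repeat, len(idx), n_wavelengths)
        draws = rng.normal(spectra[idx], noise[idx],
                           size=(n_repeat, len(idx), n_wavelengths))
        x = draws.reshape(-1, n_wavelengths)
        # np.tile matches the copy-major row order of the reshape above.
        y = np.tile(targets[idx], (n_repeat, 1))
        return x, y

    return augment(train_idx), augment(val_idx), train_idx, val_idx

# Synthetic stand-ins for the real data (sizes are arbitrary).
rng = np.random.default_rng(0)
spectra = rng.uniform(0.01, 0.02, size=(10, 6))
noise = np.full_like(spectra, 1e-4)
targets = rng.normal(size=(10, 3))

(x_train, y_train), (x_val, y_val), train_idx, val_idx = split_then_augment(
    spectra, noise, targets, val_fraction=0.2, n_repeat=3, rng=rng
)
```

Because the split is done on the original indices, no example contributes noisy copies to both the training and the validation set.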

Repository: ucl-exoplanets/NeurIPS2022_Baseline

Folders and files

Latest commit

History

Repository files navigation

Baseline solution for NeurIPS 2022 Ariel Data Challenge

Description

Preprocessing Steps

Metrics

Things to improve

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages