- Install the environment with
  ```
  conda env create -f env.yml
  ```
  Alternatively, if this fails due to an incompatible `cudatoolkit`, you can edit `env.yml` and specify a compatible `cudatoolkit` version (more info here; an illustrative excerpt is shown below). Utility command to export `env.yml`:
  ```
  conda env export | grep -v "^prefix: " > env.yml
  ```
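  For example, the relevant part of `env.yml` might look like the excerpt below (the versions are illustrative, not taken from this repo; pick a `cudatoolkit` that matches your GPU driver):

  ```yaml
  # excerpt of env.yml (illustrative)
  dependencies:
    - cudatoolkit=11.0   # replace with a version supported by your GPU driver
  ```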
- Activate your environment
  ```
  conda activate price_match_env
  ```
- Install the kernel
  ```
  python -m ipykernel install --user --name price_match_env --display-name "price_match_env"
  ```
- Spin up your Jupyter notebook as usual
  ```
  jupyter notebook
  ```
Normally, only the GPU-dependent modules are problematic; test them as follows.
- Activate the environment
  ```
  conda activate price_match
  ```
- Bring up a Python shell
  ```
  python
  ```
- Import and check
  ```python
  import tensorflow as tf; tf.test.is_gpu_available()  # Should return True
  ```
- If you receive an error similar to `Could not load dynamic library 'libcudart.so.11.0'`, then you need to set the `LD_LIBRARY_PATH` environment variable to point to the folder that contains the library (likely `/home/<your_username>/anaconda3/envs/price_match/lib/`). If you only want this variable to be set when your conda env is active, follow this guide here (a sketch is shown below).
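A minimal sketch of that approach, assuming the missing library lives in the environment's `lib/` folder (paths are illustrative):

```bash
# Run once while the price_match environment is active.
# Creates hooks so LD_LIBRARY_PATH is only modified while this env is active.
mkdir -p "$CONDA_PREFIX/etc/conda/activate.d" "$CONDA_PREFIX/etc/conda/deactivate.d"

cat > "$CONDA_PREFIX/etc/conda/activate.d/env_vars.sh" << 'EOF'
export OLD_LD_LIBRARY_PATH="$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH="$CONDA_PREFIX/lib:$LD_LIBRARY_PATH"
EOF

cat > "$CONDA_PREFIX/etc/conda/deactivate.d/env_vars.sh" << 'EOF'
export LD_LIBRARY_PATH="$OLD_LD_LIBRARY_PATH"
unset OLD_LD_LIBRARY_PATH
EOF
```

Reactivate the environment (`conda deactivate && conda activate price_match`) and rerun the check above.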
- Follow steps 1 and 2 of the `tensorflow` installation test
- Import and check
  ```python
  import torch; torch.cuda.get_device_name()  # Should return your NVIDIA GPU name
  ```
- Follow steps 1 and 2 of the `tensorflow` installation test
- Since `xgboost` has no utility function to check whether a GPU is available, run some sample code to test it:
  ```python
  import numpy as np
  import xgboost as xgb

  n = 10_000
  m = 100
  X = np.random.randn(n, m)
  y = np.random.randn(n)

  exp_models = []
  for i in range(3):
      # As long as this runs with no problem, GPU support should be OK
      clf = xgb.XGBRegressor(tree_method='gpu_hist', eta=0.1, max_depth=6, verbosity=0)
      trained_model = clf.fit(X, y, verbose=False)
  ```
- Sign up for a Neptune account here.
- Get your Neptune API token (on your neptune.ai console, select your profile icon in the top-right corner -> Get Your API Token)
- Create a new project (e.g. `My Shopee Price Match Project`)
- In your local environment's root, create a `.env` file with the following lines (a sketch of loading these variables is shown below):
  ```
  NEPTUNE_TOKEN="<your_api_token>"
  PROJECT_NAME="<your_neptune_username>/<your_neptune_project_name>"
  ```
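A minimal sketch of reading these variables in Python (assuming `python-dotenv` is available; the exact Neptune call depends on your `neptune-client` version):

```python
import os
from dotenv import load_dotenv  # assumes python-dotenv is installed in the env

# Read NEPTUNE_TOKEN and PROJECT_NAME from the .env file in the project root
load_dotenv()
api_token = os.getenv("NEPTUNE_TOKEN")
project_name = os.getenv("PROJECT_NAME")

# Pass these to the Neptune client when starting a run, e.g.
# neptune.init(project=project_name, api_token=api_token) in recent client versions.
```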
The `model` directory should be structured as follows:

```
model
├── efficient_net_b3
│   └── pretrained
│       └── efficientnet_b3.pth
└── indobert_lite_p2
    ├── pretrained
    │   ├── config.json
    │   ├── pytorch_model.bin
    │   ├── README.md
    │   ├── special_tokens_map.json
    │   ├── tf_model.h5
    │   ├── tokenizer_config.json
    │   └── vocab.txt
    └── tokenizer
        ├── special_tokens_map.json
        ├── tokenizer_config.json
        └── vocab.txt
```
The `data` directory should be structured as follows (only a few of the training images are shown):

```
data
└── raw
    ├── train_images
    │   ├── 0a0d257d1127f7d4298a7753875b372a.jpg
    │   ├── 0a1ad1756ba6219eb2359fd3ed2a7082.jpg
    │   └── 0a1c01e1b84cc6c6655dbf886fd72ead.jpg
    └── train_split_v3.csv
```
- Simply run
  ```
  bash train_model.sh
  ```
- The following models will be trained:
  - IndoBERT Lite P2 (NLP) is trained on all data (i.e. no validation)
  - EfficientNet B3 (IMG) is trained with 4-fold validation (Grouped K-Fold: each fold consists of unique label groups; a sketch of the split is shown below)
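A minimal sketch of the grouped split (assuming the training CSV has a `label_group` column, as in the Shopee data; variable names are illustrative):

```python
import pandas as pd
from sklearn.model_selection import GroupKFold

df = pd.read_csv("data/raw/train_split_v3.csv")

# Each label_group lands in exactly one fold, so matching products never
# leak between the training and validation splits.
gkf = GroupKFold(n_splits=4)
df["fold"] = -1
for fold, (_, valid_idx) in enumerate(gkf.split(df, groups=df["label_group"])):
    df.loc[valid_idx, "fold"] = fold
```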
- Both training and inference notebooks are provided in `kaggle_notebooks`
- To train models in the Kaggle environment, you need to provide your GitHub and Neptune tokens using Kaggle Secrets (in a new Kaggle notebook, navigate to Add-ons -> Secrets; make sure internet access is enabled)
- For submission, since internet access is disabled, you need to install the required packages (`Faiss` and `TIMM`) using the uploaded wheels (use the following or upload your own; a sketch of the offline install is shown below)
- Also, upload both the pretrained weights and trained weights of the NLP and IMG models to Kaggle as datasets, and attach them to your submission notebook (check that the paths in the notebook point to the right folders, depending on how you named the uploaded files)
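A minimal sketch of the offline install in a submission notebook cell (prefix with `!` inside the notebook; the dataset name `<your-wheels-dataset>` is a placeholder for whatever you upload):

```bash
# --no-index keeps pip offline; --find-links points at the attached wheel dataset.
pip install --no-index --find-links=/kaggle/input/<your-wheels-dataset> faiss-gpu timm
```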