stephen-huan/conditional-knn

conditional-knn

Source code for the paper Sparse Cholesky factorization by greedy conditional selection.

Installing

Install dependencies from environment.yml with conda or mamba:

conda env create --prefix ./venv --file environment.yml

or from a non-explicit spec file (platform may need to match):

conda create --prefix ./venv --file linux-64-spec-list.txt

or from an explicit spec file (platform must match):

conda create --prefix ./venv --file linux-64-explicit-spec-list.txt
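
To check which kind of spec file you have and which platform it targets, here is a minimal sketch in Python. It assumes the files follow the usual layout written by conda list --export and conda list --explicit (a "# platform:" comment, plus an "@EXPLICIT" line in explicit files); the README does not state how the files were generated, and the function name describe_spec_file is hypothetical.

```python
from typing import Optional, Tuple


def describe_spec_file(text: str) -> Tuple[bool, Optional[str]]:
    """Return (is_explicit, platform) for the contents of a conda spec file.

    Assumption: the file has a "# platform: <name>" comment and, if it is
    an explicit spec, a line consisting only of "@EXPLICIT".
    """
    is_explicit = False
    platform = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line == "@EXPLICIT":
            is_explicit = True
        elif line.startswith("# platform:"):
            # Keep everything after the first colon, e.g. "linux-64".
            platform = line.split(":", 1)[1].strip()
    return is_explicit, platform


# Illustrative input only; not the contents of the repository's files.
example = "# platform: linux-64\n@EXPLICIT\n"
print(describe_spec_file(example))
```

Reading one of the spec files and passing its text to describe_spec_file shows whether the platform comment matches your machine before you run conda create.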

See the conda documentation on managing environments for more information.

Activate conda environment:

conda activate ./venv

Build Cython extensions:

python setup.py build_ext --inplace

Intel oneMKL with conda

We rely on the Intel oneMKL library to provide fast numerical routines.

Make sure that numpy and scipy also use the MKL for BLAS and LAPACK by checking the output of

python -c "import numpy; numpy.__config__.show()"

which should show something like

blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['.../venv/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['.../venv/include']
...
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['.../venv/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['.../venv/include']
...

and similarly for

python -c "import scipy; scipy.__config__.show()"

Installing numpy with conda install numpy from the defaults or anaconda channel (not conda-forge) should work, but it sometimes conflicts with installing mkl-devel. It is easiest to use the intel channel.
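
The manual check of the numpy and scipy configuration can also be scripted. The sketch below captures whatever the show() function prints and looks for the substring "mkl". This is a rough heuristic, not an official API for querying the BLAS backend, and the helper name mentions_mkl is hypothetical.

```python
import contextlib
import io
from typing import Callable


def mentions_mkl(show: Callable[[], None]) -> bool:
    """Run a config-printing function and report whether its output names MKL.

    `show` is any zero-argument callable that prints to standard output,
    such as numpy.__config__.show.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        show()
    return "mkl" in buffer.getvalue().lower()


if __name__ == "__main__":
    try:
        import numpy
    except ImportError:
        print("numpy is not installed in this environment")
    else:
        print("numpy mentions MKL:", mentions_mkl(numpy.__config__.show))
```

The same call with scipy.__config__.show covers scipy. A result of False suggests that the package was built against a different BLAS/LAPACK implementation.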

Downloading datasets

We use datasets from the SuiteSparse Matrix Collection, the UCI Machine Learning Repository, LIBSVM, and the book Gaussian Processes for Machine Learning. Download the datasets with the provided fish script:

chmod +x get_datasets
./get_datasets

OCO-2 data

Downloading the dataset

Navigate to the OCO-2 solar-induced fluorescence (SIF) dataset. The latest version of the dataset at the time of writing is 11r, but this might change in the future. If the link doesn't work, search directly for the OCO2_L2_Lite_SIF dataset.

Click on the blue "Online Archive" button on the right and then on the 2017 folder. Each file is a different day.

Note that in order to download files, an Earthdata account must be created.

Post-processing

First install R and NetCDF using your preferred package manager, for example with pacman:

sudo pacman -S r netcdf

In order to install R packages locally, follow the instructions here to create the default R_LIBS_USER.

mkdir -p ~/R/x86_64-pc-linux-gnu-library/4.2/

Be sure to replace x86_64-pc-linux-gnu and 4.2 with your specific platform and R version, respectively. Running the command R --version should print something like the following.

R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
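
As an illustration of how the directory name follows from this output, here is a minimal Python sketch that builds the path from the text printed by R --version. It assumes the ~/R/<platform>-library/<major>.<minor> pattern used in the mkdir command above; the function name default_r_libs_user is hypothetical.

```python
import re


def default_r_libs_user(version_output: str) -> str:
    """Build the user library path from the text printed by `R --version`.

    Assumption: the path follows ~/R/<platform>-library/<major>.<minor>.
    """
    version = re.search(r"R version (\d+)\.(\d+)", version_output)
    platform = re.search(r"^Platform:\s*(\S+)", version_output, re.MULTILINE)
    if version is None or platform is None:
        raise ValueError("could not find the R version or platform")
    major, minor = version.groups()
    return f"~/R/{platform.group(1)}-library/{major}.{minor}"


sample = """R version 4.2.3 (2023-03-15) -- "Shortstop Beagle"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
"""
print(default_r_libs_user(sample))  # ~/R/x86_64-pc-linux-gnu-library/4.2
```

Only the major and minor version numbers appear in the directory name, so the patch level (the final 3 in 4.2.3) is dropped.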

Next, start R and enter the following commands into the REPL to install the packages.

> install.packages("renv", repos = "https://cloud.r-project.org")
> renv::restore()

The data can now be compiled with

R --file=compile_fluorescence_data.R

The compile_fluorescence_data.R script is due to Joe Guinness.

Running

Files can be run as modules:

python -m experiments.cholesky
python -m figures.factor
python -m tests.cknn_tests
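
To run several modules in sequence, one option is a small driver script. The sketch below is not part of the repository: the helper names are hypothetical, and it assumes the script is started from the repository root with the conda environment active.

```python
import subprocess
import sys
from typing import List

# Module names taken from the commands above.
MODULES = ["experiments.cholesky", "figures.factor", "tests.cknn_tests"]


def module_command(module: str) -> List[str]:
    """Return the command that runs `module` with the current interpreter."""
    return [sys.executable, "-m", module]


def run_modules(modules: List[str]) -> None:
    """Run each module in turn, stopping at the first one that fails."""
    for module in modules:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(module_command(module), check=True)
```

Calling run_modules(MODULES) is then equivalent to issuing the three python -m commands one after another. Using sys.executable keeps the subprocesses on the interpreter from the active environment.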