chenyangkang committed Aug 29, 2023
1 parent b506838 commit 1cec863
Showing 12 changed files with 103 additions and 15 deletions.
1 change: 1 addition & 0 deletions .github/workflows/ci.yml
@@ -19,5 +19,6 @@ jobs:
key: ${{ github.ref }}
path: .cache
- run: pip install mkdocs-material
- run: pip install faraday-retry
- run: pip install pillow cairosvg
- run: mkdocs gh-deploy --force
1 change: 1 addition & 0 deletions .gitignore
@@ -4,3 +4,4 @@
*.pkl
*.gif
*/__pycache__/*
__pycache__/*
Binary file modified BirdSTEM/.DS_Store
Binary file not shown.
Binary file added BirdSTEM/dataset/.DS_Store
Binary file not shown.
Binary file modified BirdSTEM/model/__pycache__/AdaSTEM.cpython-39.pyc
Binary file not shown.
Binary file modified BirdSTEM/model/__pycache__/Hurdle.cpython-39.pyc
Binary file not shown.
Binary file modified BirdSTEM/model/__pycache__/dummy_model.cpython-39.pyc
Binary file not shown.
Binary file modified BirdSTEM/utils/__pycache__/plot_gif.cpython-39.pyc
Binary file not shown.
Binary file modified BirdSTEM/utils/__pycache__/quadtree.cpython-39.pyc
Binary file not shown.
65 changes: 63 additions & 2 deletions README.md
@@ -1,11 +1,72 @@
# BirdSTEM
Daily Abundance & phenology estimation using eBird citizen science data

** Under Construction **
## Brief introduction
BirdSTEM is an AdaSTEM model for daily abundance estimation using eBird citizen science data. It leverages the "adjacency" information of surrounding bird observations in space and time to predict the occurrence and abundance at a target spatial-temporal point. In the demo, we use a two-step hurdle model, with XGBClassifier for occurrence modeling and XGBRegressor for abundance modeling.

Users can define the size of a stixel (spatial-temporal pixel) in terms of space and time. Larger stixels generalize better but lose precision at fine resolution; smaller stixels may predict better locally but extrapolate less well to points outside the stixel.

In the demo, we first split the training data using temporal sliding windows with a size of 50 DOY (day of year) and a step of 20 DOY (`temporal_start = 0`, `temporal_end=366`, `temporal_step=20`, `temporal_bin_interval = 50`). For each temporal slice, a spatial gridding is applied: a stixel is forced to split into four smaller pieces if its edge is longer than 50 units (measured in longitude and latitude, `grid_len_lon_upper_threshold=50`, `grid_len_lat_upper_threshold=50`), and splitting stops when it would shrink the edge length below 10 units (`grid_len_lon_lower_threshold=10`, `grid_len_lat_lower_threshold=10`) or leave a stixel containing fewer than 50 checklists (`points_lower_threshold = 50`).
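
As a rough illustration of the temporal slicing described above, the sketch below lists the sliding windows implied by the demo parameters. It is not taken from the BirdSTEM source: the function name `temporal_windows` is hypothetical, and treating each window as a half-open interval `[start, start + temporal_bin_interval)` that is allowed to run past `temporal_end` is an assumption.

```python
def temporal_windows(temporal_start=0, temporal_end=366,
                     temporal_step=20, temporal_bin_interval=50):
    """Return (lower, upper) DOY bounds of each sliding window.

    Hypothetical helper: windows start every `temporal_step` days and
    span `temporal_bin_interval` days. How BirdSTEM treats windows that
    extend beyond `temporal_end` is not specified here, so they are
    left untruncated (an assumption).
    """
    return [
        (lower, lower + temporal_bin_interval)
        for lower in range(temporal_start, temporal_end, temporal_step)
    ]


windows = temporal_windows()
# Windows that would contain an observation made on day-of-year 100.
covering = [w for w in windows if w[0] <= 100 < w[1]]
print(len(windows), covering)
```

Because the step (20) is smaller than the window size (50), neighbouring windows overlap, so a single observation usually contributes to more than one temporal slice.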

This process is executed 10 times (`ensemble_fold = 10`), each time with random jitter and random rotation of the gridding, generating 10 ensembles. In the prediction phase, only spatial-temporal points with more than 7 usable ensembles (`min_ensemble_required = 7`) are predicted; all other points are set to `np.nan`.
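
The ensemble rule above can be sketched for a single prediction point as follows. This is an illustration only, not BirdSTEM's implementation: the function name `aggregate_ensembles` is hypothetical, the strict "more than" comparison simply follows the wording above (the library's exact boundary may differ), and the population standard deviation is an assumption.

```python
import math
import statistics


def aggregate_ensembles(predictions, min_ensemble_required=7):
    """Combine per-ensemble predictions for one spatial-temporal point.

    `predictions` holds one value per ensemble; NaN marks an ensemble
    that has no usable stixel for this point.
    """
    usable = [p for p in predictions if not math.isnan(p)]
    # Assumption: strictly more than `min_ensemble_required` usable
    # ensembles are needed, mirroring the prose description.
    if len(usable) <= min_ensemble_required:
        return math.nan, math.nan
    # Assumption: population standard deviation across ensembles.
    return statistics.fmean(usable), statistics.pstdev(usable)


nan = math.nan
# 8 of 10 ensembles usable: the point is predicted.
print(aggregate_ensembles([1.0] * 4 + [3.0] * 4 + [nan] * 2))
# Only 7 of 10 usable: the point is left as NaN.
print(aggregate_ensembles([1.0] * 7 + [nan] * 3))
```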

Fitting and prediction follow the conventions of the sklearn estimator class:

```py
## fit
model.fit(X_train,y_train)
## predict
pred_mean, pred_std = model.predict(X_test)
pred_mean = np.where(pred_mean>0, pred_mean, 0)
```

Here `pred_mean` and `pred_std` are the mean and standard deviation of the predicted values across ensembles.


## Full usage:

```py
import numpy as np  # needed for np.where below
from BirdSTEM.model.AdaSTEM import AdaSTEM, AdaSTEMHurdle
from BirdSTEM.model.Hurdle import Hurdle
from xgboost import XGBClassifier, XGBRegressor

SAVE_DIR = './'

base_model = Hurdle(
    classifier=XGBClassifier(tree_method='hist', random_state=42, verbosity=0, n_jobs=1),
    regressor=XGBRegressor(tree_method='hist', random_state=42, verbosity=0, n_jobs=1),
)

model = AdaSTEMHurdle(
    base_model=base_model,
    ensemble_fold=10,
    min_ensemble_required=7,
    grid_len_lon_upper_threshold=50,
    grid_len_lon_lower_threshold=10,
    grid_len_lat_upper_threshold=50,
    grid_len_lat_lower_threshold=10,
    points_lower_threshold=50,
    temporal_start=0, temporal_end=366, temporal_step=20, temporal_bin_interval=50,
    stixel_training_size_threshold=50,  # important: should be consistent with points_lower_threshold
    save_gridding_plot=True,
    save_tmp=True,
    save_dir=SAVE_DIR,
    sample_weights_for_classifier=True,
)

## fit
model.fit(X_train,y_train)

## predict
pred_mean, pred_std = model.predict(X_test)
pred_mean = np.where(pred_mean>0, pred_mean, 0)
eval_metrics = AdaSTEM.eval_STEM_res('hurdle',y_test, pred_mean)
print(eval_metrics)

```


----
# Documentation:
## Documentation:
[BirdSTEM Documentation](https://chenyangkang.github.io/BirdSTEM/)
<!-- BirdSTEM -->

7 changes: 3 additions & 4 deletions docs/QuickStart.md
@@ -1,7 +1,5 @@
# Quick Start


## Fit an AdaSTEM model
```py
from BirdSTEM.model.AdaSTEM import AdaSTEM, AdaSTEMHurdle
from BirdSTEM.model.Hurdle import Hurdle
@@ -13,6 +11,7 @@ base_model = Hurdle(classifier=XGBClassifier(tree_method='hist',random_state=42,
regressor=XGBRegressor(tree_method='hist',random_state=42, verbosity = 0, n_jobs=1))



model = AdaSTEMHurdle(base_model=base_model,
ensemble_fold = 10,
min_ensemble_required= 7,
@@ -21,7 +20,7 @@ model = AdaSTEMHurdle(base_model=base_model,
grid_len_lat_upper_threshold=50,
grid_len_lat_lower_threshold=10,
points_lower_threshold = 50,
temporal_start = 0, temporal_end=1400, temporal_step=100, temporal_bin_interval = 100,
temporal_start = 0, temporal_end=366, temporal_step=20, temporal_bin_interval = 50,
stixel_training_size_threshold = 50, ## important, should be consistent with points_lower_threshold
save_gridding_plot = True,
save_tmp = True,
@@ -37,4 +36,4 @@ pred_mean = np.where(pred_mean>0, pred_mean, 0)
eval_metrics = AdaSTEM.eval_STEM_res('hurdle',y_test, pred_mean)
print(eval_metrics)

```
```
44 changes: 35 additions & 9 deletions docs/index.md
@@ -2,11 +2,34 @@

<!-- For full documentation visit [mkdocs.org](https://www.mkdocs.org). -->

## Commands
# BirdSTEM
Daily Abundance &amp; phenology estimation using eBird citizen science data

## Installation
## Brief introduction
BirdSTEM is an AdaSTEM model for daily abundance estimation using eBird citizen science data. It leverages the "adjacency" information of surrounding bird observations in space and time to predict the occurrence and abundance at a target spatial-temporal point. In the demo, we use a two-step hurdle model, with XGBClassifier for occurrence modeling and XGBRegressor for abundance modeling.
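
The two-step hurdle idea mentioned above can be sketched with toy estimators. Everything here is illustrative: `HurdleSketch`, `ToyClassifier`, and `ToyRegressor` are hypothetical names, not BirdSTEM's `Hurdle` class or the XGBoost models, and fitting the regressor only on non-zero counts is a common hurdle-model convention assumed for this sketch.

```python
class ToyClassifier:
    """Stand-in classifier for one numeric feature (illustration only)."""

    def fit(self, X, y):
        present = [x for x, label in zip(X, y) if label]
        absent = [x for x, label in zip(X, y) if not label]
        # Decision cut: midpoint between the two class means.
        self.cut = (sum(present) / len(present) + sum(absent) / len(absent)) / 2
        return self

    def predict(self, X):
        return [x > self.cut for x in X]


class ToyRegressor:
    """Stand-in regressor that always predicts the training mean."""

    def fit(self, X, y):
        self.mean = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean for _ in X]


class HurdleSketch:
    """Step 1: classify occurrence. Step 2: regress abundance where present."""

    def __init__(self, classifier, regressor):
        self.classifier = classifier
        self.regressor = regressor

    def fit(self, X, y):
        present = [value > 0 for value in y]
        self.classifier.fit(X, present)
        # Assumption: the regressor only sees samples with non-zero counts.
        X_pos = [x for x, p in zip(X, present) if p]
        y_pos = [v for v, p in zip(y, present) if p]
        self.regressor.fit(X_pos, y_pos)
        return self

    def predict(self, X):
        occurrence = self.classifier.predict(X)
        abundance = self.regressor.predict(X)
        # Abundance is reported only where occurrence is predicted.
        return [a if o else 0.0 for o, a in zip(occurrence, abundance)]


X_demo = [1, 2, 3, 8, 9, 10]
y_demo = [0, 0, 0, 4, 6, 8]
hurdle = HurdleSketch(ToyClassifier(), ToyRegressor()).fit(X_demo, y_demo)
print(hurdle.predict([2, 9]))
```

Splitting the task this way lets the many zero counts typical of checklist data be handled by the classifier, so the regressor is not dominated by absences.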

Users can define the size of a stixel (spatial-temporal pixel) in terms of space and time. Larger stixels generalize better but lose precision at fine resolution; smaller stixels may predict better locally but extrapolate less well to points outside the stixel.

In the demo, we first split the training data using temporal sliding windows with a size of 50 DOY (day of year) and a step of 20 DOY (`temporal_start = 0`, `temporal_end=366`, `temporal_step=20`, `temporal_bin_interval = 50`). For each temporal slice, a spatial gridding is applied: a stixel is forced to split into four smaller pieces if its edge is longer than 50 units (measured in longitude and latitude, `grid_len_lon_upper_threshold=50`, `grid_len_lat_upper_threshold=50`), and splitting stops when it would shrink the edge length below 10 units (`grid_len_lon_lower_threshold=10`, `grid_len_lat_lower_threshold=10`) or leave a stixel containing fewer than 50 checklists (`points_lower_threshold = 50`).
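
The spatial splitting rule above resembles a quadtree, and a minimal sketch under simplifying assumptions looks like this. It is not the code in `BirdSTEM/utils/quadtree.py`: the names `points_in` and `split_cell` are hypothetical, cells are square with one shared threshold for longitude and latitude, the random jitter and rotation are omitted, and requiring every child to keep at least `min_points` checklists before a voluntary split is one possible reading of the rule.

```python
def points_in(points, x0, y0, size):
    """Points inside the half-open square [x0, x0+size) x [y0, y0+size)."""
    return [(x, y) for x, y in points
            if x0 <= x < x0 + size and y0 <= y < y0 + size]


def split_cell(points, x0, y0, size, upper=50, lower=10, min_points=50):
    """Recursively split a square cell; return leaves as (x0, y0, size, n)."""
    inside = points_in(points, x0, y0, size)
    half = size / 2
    children = [(x0 + dx, y0 + dy) for dx in (0, half) for dy in (0, half)]
    # Forced split: the edge is longer than the upper threshold.
    forced = size > upper
    # Voluntary split (assumption): children must not fall below the lower
    # edge threshold, and each child must keep at least `min_points` points.
    allowed = half >= lower and all(
        len(points_in(inside, cx, cy, half)) >= min_points
        for cx, cy in children
    )
    if not (forced or allowed):
        return [(x0, y0, size, len(inside))]
    leaves = []
    for cx, cy in children:
        leaves.extend(split_cell(inside, cx, cy, half, upper, lower, min_points))
    return leaves


# 400 synthetic checklists on a regular grid in the lower-left quadrant
# of a 100 x 100 extent.
checklists = [(2.5 * i, 2.5 * j) for i in range(20) for j in range(20)]
for leaf in split_cell(checklists, 0, 0, 100):
    print(leaf)
```

In this toy run the data-rich quadrant is divided into finer cells while the empty quadrants stay at the coarsest allowed size, which is the adaptive behaviour the gridding aims for.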

This process is executed 10 times (`ensemble_fold = 10`), each time with random jitter and random rotation of the gridding, generating 10 ensembles. In the prediction phase, only spatial-temporal points with more than 7 usable ensembles (`min_ensemble_required = 7`) are predicted; all other points are set to `np.nan`.

Fitting and prediction follow the conventions of the sklearn estimator class:

```py
## fit
model.fit(X_train,y_train)
## predict
pred_mean, pred_std = model.predict(X_test)
pred_mean = np.where(pred_mean>0, pred_mean, 0)
```

Here `pred_mean` and `pred_std` are the mean and standard deviation of the predicted values across ensembles.


## Full usage:

## Fit an AdaSTEM model
```py
from BirdSTEM.model.AdaSTEM import AdaSTEM, AdaSTEMHurdle
from BirdSTEM.model.Hurdle import Hurdle
@@ -18,6 +41,7 @@ base_model = Hurdle(classifier=XGBClassifier(tree_method='hist',random_state=42,
regressor=XGBRegressor(tree_method='hist',random_state=42, verbosity = 0, n_jobs=1))



model = AdaSTEMHurdle(base_model=base_model,
ensemble_fold = 10,
min_ensemble_required= 7,
@@ -26,7 +50,7 @@ model = AdaSTEMHurdle(base_model=base_model,
grid_len_lat_upper_threshold=50,
grid_len_lat_lower_threshold=10,
points_lower_threshold = 50,
temporal_start = 0, temporal_end=1400, temporal_step=100, temporal_bin_interval = 100,
temporal_start = 0, temporal_end=366, temporal_step=20, temporal_bin_interval = 50,
stixel_training_size_threshold = 50, ## important, should be consistent with points_lower_threshold
save_gridding_plot = True,
save_tmp = True,
@@ -44,9 +68,11 @@ print(eval_metrics)

```

<!-- ## Project layout

mkdocs.yml # The configuration file.
docs/
index.md # The documentation homepage.
... # Other markdown pages, images and other files. -->
----
## Documentation:
[BirdSTEM Documentation](https://chenyangkang.github.io/BirdSTEM/)
<!-- BirdSTEM -->

----
![QuadTree example](QuadTree.png)
