
Commit

fix paper typo
chenyangkang committed Feb 23, 2024
1 parent 342b42c commit 0274325
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion paper/md_paper/paper.md
@@ -50,7 +50,7 @@ Spatio-temporal big data is an increasingly valuable resource for modern ecologi

Some species distribution modeling (SDM) approaches were brought forward to adjust for bias in citizen science and model on the unobserved components [@bird_statistical_2014]. Still, many failed to account for the autocorrelation of space and time [@f_dormann_methods_2007], which is especially crucial in modeling inherently spatio-temporal biological events with variations at different scales [@chave_problem_2013; @levin_problem_1992], such as seasonal migration. Adaptive Spatio-Temporal Exploratory Model (AdaSTEM) is a semi-parameterized machine learning model that leverages the spatio-temporal adjacency information of sample points to model occurrence or abundance of species [@fink_adaptive_2013]. A QuadTree algorithm [@samet_quadtree_1984] is implemented to split data into smaller spatio-temporal grids (called stixels) conditional on the data abundance, with more abundant data allowing stixels to be divided into finer resolution (up to a maximum). Stixels with sample size less than a certain threshold will not be modeled; instead, these stixels will be labeled as unpredictable. This procedure controls the degree of model extrapolation (known as "long-distance prediction" problem in spatial settings) and reduces overfitting. A base model is trained for each stixel, that is, targets are only modeled on their adjacent information in space and time. Splitting-training is carried out several times to generate multiple ensembles. Finally, prediction results were aggregated across these ensembles.

-AdaSTEM shows the capacity of supporting large scale spatio-temporal ecological data modeling in many studies [@fink_modeling_2020; @fuentes2023birdflow; @la_sorte_seasonal_2022], espetially for modeling animal abundance at different scales [@fink_adaptive_2013]. One well-known application of AdaSTEM is the weekly abundance map of eBird Status and Trend product [@FinkStatusTrend2022], which was widely used as data sources of abundance data of bird populations [@bird_statistical_2014; @jarzyna_decoupled_2023; @lin_using_2022]. The application of AdaSTEM could be extended to other fields with similar data structure and spatio-temporal dependence, for example, epidemiology. Despite the foreseeable significant role of spatio-temporal big data in the coming decades of scientific research, the development of tools has not necessarily kept pace.
+AdaSTEM shows the capacity of supporting large scale spatio-temporal ecological data modeling in many studies [@fink_modeling_2020; @fuentes2023birdflow; @la_sorte_seasonal_2022], especially for modeling animal abundance at different scales [@fink_adaptive_2013]. One well-known application of AdaSTEM is the weekly abundance map of eBird Status and Trend product [@FinkStatusTrend2022], which was widely used as data sources of abundance data of bird populations [@bird_statistical_2014; @jarzyna_decoupled_2023; @lin_using_2022]. The application of AdaSTEM could be extended to other fields with similar data structure and spatio-temporal dependence, for example, epidemiology. Despite the foreseeable significant role of spatio-temporal big data in the coming decades of scientific research, the development of tools has not necessarily kept pace.

Stemflow is positioned as a user-friendly Python package to meet the need of general application of modeling spatio-temporal large datasets. Scikit-learn style object-oriented modeling pipeline enables concise model construction with compact parameterization at the user end, while the rest of the modeling procedures are carried out under the hood. Once the fitting method is called, the model class recursively splits the input training data into smaller spatio-temporal stixels using QuadTree algorithm. For each of the stixels, a base model is trained only using data falls into that stixel. Stixels are then aggregated and constitute an ensemble. In the prediction phase, stemflow queries stixels for the input data according to their spatial and temporal index, followed by corresponding base model prediction. Finally, prediction results are aggregated across ensembles to generate robust estimations (see @fink_adaptive_2013 and stemflow documentation for details).
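
The split, train, and query steps described in the paragraph above can be illustrated with a small, self-contained sketch. This is not stemflow's code or API: the names `Stixel`, `split`, `fit`, and `predict`, the parameters `max_points`, `min_size`, and `min_points`, and the toy data are all hypothetical. It assumes a purely spatial quadtree (the temporal dimension and the aggregation across ensembles are omitted) and uses the mean target of each cell as a stand-in base model.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]  # (x, y, target); hypothetical toy layout


@dataclass
class Stixel:
    """A square cell with lower-left corner (x0, y0) and side length `size`."""
    x0: float
    y0: float
    size: float
    points: List[Point]

    def contains(self, x: float, y: float) -> bool:
        # Half-open bounds so that each point falls into exactly one child cell.
        return (self.x0 <= x < self.x0 + self.size
                and self.y0 <= y < self.y0 + self.size)


def split(cell: Stixel, max_points: int, min_size: float) -> List[Stixel]:
    """Quarter a cell recursively while it is data-rich and above the size floor."""
    half = cell.size / 2
    if len(cell.points) <= max_points or half < min_size:
        return [cell]
    leaves: List[Stixel] = []
    for dx in (0.0, half):
        for dy in (0.0, half):
            child = Stixel(cell.x0 + dx, cell.y0 + dy, half, [])
            child.points = [p for p in cell.points if child.contains(p[0], p[1])]
            leaves.extend(split(child, max_points, min_size))
    return leaves


def fit(leaves: List[Stixel], min_points: int) -> List[Tuple[Stixel, Optional[float]]]:
    """Train one trivial base model per cell; None marks a cell as unpredictable."""
    models = []
    for leaf in leaves:
        if len(leaf.points) >= min_points:
            models.append((leaf, mean(p[2] for p in leaf.points)))
        else:
            models.append((leaf, None))
    return models


def predict(models: List[Tuple[Stixel, Optional[float]]],
            x: float, y: float) -> Optional[float]:
    """Look up the cell that contains (x, y) and return its base-model output."""
    for leaf, value in models:
        if leaf.contains(x, y):
            return value
    return None


# Toy data: a dense cluster near the origin and one isolated point.
points = [(0.5, 0.5, 1.0), (1.5, 0.5, 2.0), (0.5, 1.5, 3.0),
          (1.5, 1.5, 4.0), (2.5, 2.5, 5.0), (6.0, 6.0, 10.0)]
leaves = split(Stixel(0.0, 0.0, 8.0, points), max_points=2, min_size=1.0)
models = fit(leaves, min_points=1)
```

In this toy run the dense corner is divided down to the minimum cell size while the sparse regions stay coarse, and a query that lands in an empty cell returns `None` rather than an extrapolated value. A fuller version would presumably repeat the split with different grid placements and average the per-ensemble predictions, as the paragraph describes.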
