
Releases: barthoekstra/brc-data-preprocessor

Preprocessor for #BRC15 (2023)

03 Aug 06:51
352b58f

Changelog

Changes since 2021.1.

  • An extra copy of the checked data is now saved to an inprogress-backup folder on Dropbox, so data technicians can keep track of which data cleaning steps were taken.
  • The function was previously deployed as a .zip archive; this has been replaced with a Dockerized approach.

Preprocessor for #BRC13 (2021)

03 Jul 17:03

Changelog

Changes since 2019.1.

  • Non-Juv SteppeE and Non-Juv ImperialE will now be flagged as unexpected records.
  • WhitePel and DalPel will be renamed to WhiteP and DalmatianP in line with GBIF dataset and data paper.
  • Extraction of count times from Trektellen is made more robust by removing whitespace in regex search.
  • A ‘single station count’ mode is added for the spring counts, which can be activated by setting environment variable SINGLE_STATION_COUNT=yes.
  • Removed package versions from requirements.txt.
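
As a rough illustration of the ‘single station count’ switch, the sketch below reads SINGLE_STATION_COUNT from the environment. The helper name single_station_mode is hypothetical, and treating the value case-insensitively is an assumption; the release notes only state that the mode is activated by SINGLE_STATION_COUNT=yes.

```python
import os


def single_station_mode(environ=None):
    """Return True when the single station count mode is switched on.

    Hypothetical helper: the real preprocessor may read the variable differently.
    """
    env = os.environ if environ is None else environ
    # Assumption: only "yes" enables the mode; case and surrounding spaces are ignored.
    return env.get("SINGLE_STATION_COUNT", "").strip().lower() == "yes"


if __name__ == "__main__":
    print("single station mode:", single_station_mode())
```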

First preprocessor version for #BRC12 (2019)

24 Jul 19:19

The first version of the preprocessor, prepared for the #BRC12 2019 season. Code leading up to this release has been improved based on feedback by previous coordinators and data technicians. Changes in new releases of the preprocessor will be documented in a CHANGELOG file.

General workflow

The preprocessor runs on AWS Lambda and regularly checks the Trektellen site for newly uploaded BRC counts. Once both stations have uploaded data for the day, the fetcher downloads the data and stores a raw version in Dropbox (in 2019/data/raw). The preprocessor then checks a copy of the raw data for potential errors and flags each one by adding a description of the problem to a check column in the file stored in 2019/data/inprogress. It is then up to coordinators to use their experience and knowledge of that day's migration to judge the validity of the flags and act accordingly. Once they have dealt with these issues and emptied the check column of flags, the file can be moved to 2019/data/clean.
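
The flagging step described above can be sketched roughly as follows. This is a minimal illustration, not the preprocessor's actual code: records are modelled as plain dictionaries, the helper names add_flag and ready_for_clean are hypothetical, and joining several flags with "; " is an assumption. Only the check column itself comes from the workflow description.

```python
def add_flag(record, message):
    """Append a description of a potential problem to the record's check column."""
    existing = record.get("check", "")
    # Assumption: multiple flags on one record are separated by "; ".
    record["check"] = f"{existing}; {message}" if existing else message
    return record


def ready_for_clean(records):
    """A file may move to the clean folder once every check cell is empty."""
    return all(not r.get("check", "").strip() for r in records)


if __name__ == "__main__":
    rows = [{"speciesname": "HB", "count": 3, "check": ""}]
    add_flag(rows[0], "example flag")
    print(ready_for_clean(rows))  # still flagged, so not ready
    rows[0]["check"] = ""         # a coordinator resolves the flag
    print(ready_for_clean(rows))
```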

Flagged records

The following records will be flagged by the preprocessor:

  • Records with invalid doublecount entries (e.g. not within 10 minutes or with the wrong distance code).
  • Records containing >1 bird that is injured and/or killed (rare occurrence).
  • Records lacking critical information in datetime, telpost, speciesname, count or location columns (very unlikely, but the possible result of a bug).
  • Records of birds in >E3 (rare occurrence).
  • Records with a registered morph for any species other than Booted Eagle (and Eleonora's Falcon).
  • Records of HB_NONJUV, HB_JUV, BK_NONJUV and BK_JUV if the number of aged birds is higher than the number of counted birds (HB and BK) within a 10-minute window around the age record.
  • Records of Honey Buzzards that should probably be single-counted (at Station 2 during the HB focus period).
  • Records of aged Honey Buzzards and Black Kites outside of expected distance codes (i.e. outside of W1-O-E1).
  • Records containing unexpected combinations of sex and/or age information.
  • Records with no timestamps, which are set to 00:00:00 during processing.
  • Records containing non-protocol species.
  • Records with age details in W3, E3 and >E3, excluding non-juvenile harriers with a sex, juvenile MonPalHen and juvenile/non-juvenile eagles.
  • Records of female Pallid Harriers with I or A age (legal per protocol, though very difficult to age in the field).
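
Two of the checks listed above lend themselves to a short sketch: records lacking critical information, and age records whose count exceeds the number of counted birds nearby. The code below is an illustration under stated assumptions, not the preprocessor's implementation. The critical column names come from the list above; the function names, the dictionary record layout, and the reading of the 10-minute window as 10 minutes on either side of the age record are assumptions.

```python
from datetime import datetime, timedelta

# Column names as listed in the flagged-records overview.
CRITICAL_COLUMNS = ("datetime", "telpost", "speciesname", "count", "location")


def missing_critical(record):
    """Return the critical columns that are absent or empty (a count of 0 is valid)."""
    return [c for c in CRITICAL_COLUMNS if record.get(c) in (None, "")]


def aged_exceeds_counted(age_record, records, species="HB",
                         window=timedelta(minutes=10)):
    """True if the aged birds outnumber the counted birds of `species` near the age record.

    Assumption: the window spans `window` before and after the age record's timestamp.
    """
    t0 = age_record["datetime"]
    counted = sum(
        r["count"]
        for r in records
        if r["speciesname"] == species and abs(r["datetime"] - t0) <= window
    )
    return age_record["count"] > counted


if __name__ == "__main__":
    t = datetime(2019, 9, 1, 10, 0)
    counts = [{"datetime": t, "speciesname": "HB", "count": 5}]
    aged = {"datetime": t + timedelta(minutes=3), "speciesname": "HB_JUV", "count": 8}
    print(aged_exceeds_counted(aged, counts))
```

In this sketch a positive result would only add a flag; as the workflow section notes, deciding whether the record is actually wrong is left to the coordinators.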