wfc-consistency

This project provides scripts that check the consistency of a WFCatalog database against the node's archive and take action on any inconsistencies found.

Installation

  • Clone this repository.
  • Create a virtual environment and install dependencies.
cd wfc-consistency
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Files

The project consists of the following files:

check_consistency.py

This script finds inconsistencies between the archive files, the FDSN station metadata, and the WFCatalog database.

The script produces one inconsistencies_results.db SQLite database file with the following tables:

  • inconsistent_metadata, which lists the files that are orphaned (i.e. without any metadata).
  • missing_in_wfcatalog, which lists the files that are missing from the WFCatalog database.
  • inconsistent_checksum, which lists the files whose checksum in the WFCatalog database is inconsistent (populated only if the -c option is specified).
  • older_date, which lists the files that were modified after the date they were added to the WFCatalog database.
  • remove_from_wfcatalog, which lists the files that should be removed from WFCatalog (i.e. they are not in the archive or are orphaned).
  • inappropriate_naming, which lists the files whose names do not follow the usual pattern of NET.STA.LOC.CHAN.NEL.YEAR.JDAY.
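
As a rough illustration of the naming check, the sketch below validates a file name against seven dot-separated fields, modelled on the example name HL.ATH.00.HHE.D.2023.156. The regular expression, the field widths, and the function name parse_archive_name are assumptions made for this example; the check in check_consistency.py may differ.

```python
import re

# Assumed layout, inferred from the example HL.ATH.00.HHE.D.2023.156:
# network, station, location (may be empty), channel, one-letter type, year, day.
NAME_RE = re.compile(
    r"^(?P<net>[A-Z0-9]{1,8})\."
    r"(?P<sta>[A-Z0-9]{1,8})\."
    r"(?P<loc>[A-Z0-9]{0,8})\."
    r"(?P<cha>[A-Z0-9]{1,8})\."
    r"(?P<type>[A-Z])\."
    r"(?P<year>\d{4})\."
    r"(?P<jday>\d{3})$"
)


def parse_archive_name(file_name):
    """Return the name's fields as a dict, or None if the name does not match."""
    match = NAME_RE.match(file_name)
    if match is None:
        return None
    year, jday = int(match["year"]), int(match["jday"])
    if not 1 <= jday <= 366:
        return None  # day of year out of range
    return {
        "net": match["net"],
        "sta": match["sta"],
        "loc": match["loc"],
        "cha": match["cha"],
        "year": year,
        "jday": jday,
        "fileName": file_name,
    }
```

A name for which this hypothetical function returns None is the kind of file one would expect to find in inappropriate_naming.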

The schema of all the above tables is the following:

Column    Type     Description    Example
net       text     Network code   HL
sta       text     Station code   ATH
loc       text     Location code  00
cha       text     Channel name   HHE
year      integer  Year           2023
jday      integer  Julian day     156
fileName  text     File name      HL.ATH.00.HHE.D.2023.156

Note: The fileName attribute holds a full path in the missing_in_wfcatalog, inconsistent_checksum and older_date tables, while in the remaining tables it holds only the name of the file. This follows from how the other scripts of this project consume these files.
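
To inspect the results, the database can be queried with Python's standard sqlite3 module. The sketch below counts the rows in each table and lists the entries of one station; the helper names summarize and rows_for_station are hypothetical and are not part of this project.

```python
import sqlite3

# Table names as documented above.
TABLES = (
    "inconsistent_metadata",
    "missing_in_wfcatalog",
    "inconsistent_checksum",
    "older_date",
    "remove_from_wfcatalog",
    "inappropriate_naming",
)


def summarize(conn):
    """Return {table: row count} for every documented table that exists."""
    existing = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    counts = {}
    for table in TABLES:
        if table in existing:
            counts[table] = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    return counts


def rows_for_station(conn, table, net, sta):
    """Return (year, jday, fileName) rows of one table for a single station."""
    if table not in TABLES:
        raise ValueError(f"unknown table: {table}")
    query = (
        f"SELECT year, jday, fileName FROM {table} "
        "WHERE net = ? AND sta = ? ORDER BY year, jday"
    )
    return conn.execute(query, (net, sta)).fetchall()
```

For example, after opening the file with sqlite3.connect("inconsistencies_results.db"), summarize(conn) gives a quick overview of how many files each test flagged.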

The script accepts the following options:

  • -h or --help to print a help message.
  • -s or --start followed by a number for the year to start the test (default = last year).
  • -e or --end followed by a number for the year to end the test (default = last year).
  • -x or --exclude followed by a comma-separated list of networks to be excluded from this test (e.g. XX,YY,ZZ).
  • -c or --checksum to check inconsistency of checksums in WFCatalog. Warning: this test is very time-consuming.
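
The options above map naturally onto Python's argparse module. The following is a minimal sketch of such a command-line interface, not the script's actual code: the function name build_parser and the help texts are assumptions, and the real script may parse its arguments differently.

```python
import argparse
from datetime import date


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    last_year = date.today().year - 1  # documented default for both bounds
    parser = argparse.ArgumentParser(
        description="Check WFCatalog consistency against the archive."
    )
    parser.add_argument("-s", "--start", type=int, default=last_year,
                        help="year to start the test")
    parser.add_argument("-e", "--end", type=int, default=last_year,
                        help="year to end the test")
    parser.add_argument("-x", "--exclude", default=[],
                        type=lambda value: [n for n in value.split(",") if n],
                        help="comma-separated networks to exclude, e.g. XX,YY,ZZ")
    parser.add_argument("-c", "--checksum", action="store_true",
                        help="also compare checksums (slow)")
    return parser
```

With this sketch, build_parser().parse_args(["-s", "2010", "-x", "XX,YY"]) yields a start year of 2010, an end year equal to last year, and the exclusion list ["XX", "YY"].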

Before running the script, either set the appropriate environment variables or edit the paths and URLs just below the import statements in the script to match your system. Then execute the script with the desired options.

For example, the following command runs the script to find inconsistencies from the beginning of 2010 until the end of 2022:

WFCC_MONGO_URI=mongodb://localhost:27017 WFCC_ARCHIVE_PATH=/darrays/archive/ WFCC_FDSN_ENDPOINT=eida.gein.noa.gr ./check_consistency.py -s 2010 -e 2022
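
Reading these settings from the environment with in-script fallbacks could look like the sketch below. The variable names are the ones used in the command above; the function name load_settings and the fallback values (copied from the example command purely as placeholders) are assumptions, so the script's real defaults may differ.

```python
import os


def load_settings(environ=None):
    """Read the WFCC_* settings, falling back to placeholder values."""
    env = os.environ if environ is None else environ
    return {
        # Placeholder fallbacks taken from the example command; adjust to your node.
        "mongo_uri": env.get("WFCC_MONGO_URI", "mongodb://localhost:27017"),
        "archive_path": env.get("WFCC_ARCHIVE_PATH", "/darrays/archive/"),
        "fdsn_endpoint": env.get("WFCC_FDSN_ENDPOINT", "eida.gein.noa.gr"),
    }
```

Passing a plain dictionary as environ makes the lookup easy to exercise without touching the real environment.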

delete_superfluous.py

This script removes WFCatalog entries whose files are missing from the EIDA FDSN station output or from the node's archive.

The script reads these files from the table remove_from_wfcatalog of the inconsistencies_results.db SQLite database file, which is produced by executing the check_consistency.py script.
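
Since deletion is destructive, it can be useful to review what the table contains before running the script. The sketch below groups the file names in remove_from_wfcatalog by network; the helper name removal_preview is hypothetical and the code only reads the SQLite file, it does not touch MongoDB.

```python
import sqlite3
from collections import defaultdict


def removal_preview(conn):
    """Return {network: [file names]} for the entries scheduled for removal."""
    grouped = defaultdict(list)
    query = (
        "SELECT DISTINCT net, fileName FROM remove_from_wfcatalog "
        "ORDER BY net, fileName"
    )
    for net, file_name in conn.execute(query):
        grouped[net].append(file_name)
    return dict(grouped)


# Example usage (assumes the database file is in the current directory):
#   conn = sqlite3.connect("inconsistencies_results.db")
#   for net, names in removal_preview(conn).items():
#       print(net, len(names))
```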

Before running the script, either set the appropriate environment variable or make sure that the Mongo client defined just below the import statements in the script matches your system.

For example, the following command runs the script to remove WFCatalog entries:

WFCC_MONGO_URI=mongodb://localhost:27017 ./delete_superfluous.py

add_missing.py

This script adds entries to WFCatalog for files that are missing from it even though they exist in both the EIDA FDSN station output and the node's archive.

The script reads these files from the table missing_in_wfcatalog of the inconsistencies_results.db SQLite database file, which is produced by executing the check_consistency.py script.

Before running the script, either set the appropriate environment variables or edit the paths just below the import statements in the script to match your system. You may also want to adjust the WFCatalog collector options, which are likewise found just below the import statements.

For example, the following command runs the script to add the missing WFCatalog entries:

WFCC_COLLECTOR_DIR=/home/Programs/wfcatalog/collector ./add_missing.py

update_entries.py

This script updates WFCatalog entries for files that have an inconsistent checksum, or whose WFCatalog creation date is older than their last modification time in the node's archive.

The script reads these files from the tables inconsistent_checksum and older_date of the inconsistencies_results.db SQLite database file, which is produced by executing the check_consistency.py script.
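
The two conditions behind these tables can be illustrated as follows. This is a sketch under assumptions: it uses an MD5 digest (the hash algorithm actually stored in your WFCatalog may differ) and expects the catalog creation date as a timezone-aware datetime; the function names file_md5 and modified_after are hypothetical.

```python
import hashlib
import os
from datetime import datetime, timezone


def file_md5(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archive files are not loaded at once."""
    digest = hashlib.md5()  # assumption: the catalog stores MD5 checksums
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def modified_after(path, catalog_created):
    """True if the file changed on disk after its catalog entry was created.

    catalog_created must be a timezone-aware datetime.
    """
    mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
    return mtime > catalog_created
```

A file for which file_md5 differs from the stored checksum would belong in inconsistent_checksum, and one for which modified_after returns True would belong in older_date.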

Before running the script, either set the appropriate environment variables or edit the paths just below the import statements in the script to match your system. You may also want to adjust the WFCatalog collector options, which are likewise found just below the import statements.

For example, the following command runs the script to update WFCatalog entries:

WFCC_COLLECTOR_DIR=/home/Programs/wfcatalog/collector ./update_entries.py
