Feature internal 56 METdataio validate payload (#336)
* Internal issue #56: XML schema for validating the XML specification file, to prevent DoS via large payloads, recursive payloads, or an ill-formed file

* schema for validating the payload (recursive, excessively large)

* Delete METdbLoad/ush/specification_schema.xsd

* internal issue #56 validating payload using XML schema

* internal issue #56 invalid XML spec files used to test XML validation

* internal issue #56 add the location of the XML schema file to be used in validating the XML specification file

* internal issue #56 tests added to verify validation code is providing expected results

* internal issue #56 added new fixture used in testing XML validation code

* fix import for read_load_xml module

* removed extraneous ',' in import

* Working version but still needs to check for recursive payloads for some elements

* Valid XML specification file that is used for real-world data

* Change the name of the XML schema file

* Use the full_example.xml file instead of the test_load_specification.xml for testing against a valid XML file

* Delete METdbLoad/ush/load_specification.xsd

* Test for recursive payload in load_val fields

* Added test for recursion under the load_val complex type

* Change values to prevent recursive payloads and remove defunct regex

* Remove unused imports, add test for recursion under the load_val fields

* Add some extra elements

* Add more recursive elements to trigger ValueError

* skip testing the recursion in load_val

* Remove limit to number of load_val elements

* Fixed incorrect skip syntax

* Reinstate the maxOccurs and minOccurs for the field

* Update temporary XML spec file to match load_specification_schema.xsd

* Config file for testing recursive payload in the fields element

* Work-in-progress.  Recursive payloads checked for some elements but no checking for large payloads

* Added test for recursive payload for fields (in addition to test for recursive val elements)

* Update tests and test config files

* Additional test configuration files

* updated schema, now working

* modified test configuration file

* Allow '-' in regex for limited string type

* Updated file so it is valid with respect to the schema

* Include testing one of the xml specification files used in testing two databases

* Reformat code for easier reading, update the load_met_gha_new.xml file to be valid

* include testing the load_met_gha_new.xml file

* Updated: reformatted and updated to conform to schema

* Added an extra date_list element

* added testing xml specification file with more than one date_list

* Clean up unnecessary comments

* Update number of date_list items

* Explicitly set minLength and maxLength for hostname, db name, password, etc.

* allow password to be string type and limit length of password

* comment out mysql commands. ci-run-all-cases

* Fix comment

* Remove main function with hard-coded paths. Only useful during development.
bikegeek authored Oct 11, 2024
1 parent 02f3a00 commit b4eb935
Showing 15 changed files with 902 additions and 85 deletions.
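
The commit message describes rejecting XML specification files that are excessively large, recursively nested, or ill-formed. The commit itself does this with an XML schema; the block below is only a rough standard-library sketch of the same idea, not the code from this commit. The function name `check_xml_payload` and the three limits are hypothetical, chosen for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical limits for illustration; the commit's real limits live in its XSD.
MAX_BYTES = 1_000_000
MAX_DEPTH = 10
MAX_ELEMENTS = 10_000


def check_xml_payload(xml_bytes: bytes) -> None:
    """Raise ValueError if the document is too large, too deep, or ill-formed."""
    if len(xml_bytes) > MAX_BYTES:
        raise ValueError("XML payload exceeds size limit")

    depth = 0
    count = 0
    try:
        # iterparse streams start/end events, so nesting depth is tracked
        # as elements open and close rather than after building a full tree.
        events = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
        for event, _elem in events:
            if event == "start":
                depth += 1
                count += 1
                if depth > MAX_DEPTH:
                    raise ValueError("XML payload is nested too deeply")
                if count > MAX_ELEMENTS:
                    raise ValueError("XML payload has too many elements")
            else:
                depth -= 1
    except ET.ParseError as err:
        # Ill-formed XML (for example, mismatched tags) ends up here.
        raise ValueError(f"ill-formed XML: {err}") from err


# Usage: a small, shallow document passes silently.
check_xml_payload(b"<load_spec><verbose>true</verbose></load_spec>")
```

A check like this only covers size, depth, and well-formedness. It does not replace schema validation, which can also constrain element names, ordering, and string lengths, and it makes no attempt to handle entity expansion.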
4 changes: 2 additions & 2 deletions .github/workflows/compare_db.yaml
@@ -85,14 +85,14 @@ jobs:
export PYTHONPATH=${GITHUB_WORKSPACE}/baseold
cd ${GITHUB_WORKSPACE}/baseold/METdbLoad/ush
python met_db_load.py ${GITHUB_WORKSPACE}/headnew/METdbLoad/test/load_met_gha_prod.xml
mysql -e 'SHOW TABLE STATUS WHERE `rows` > 0;' -uroot -proot mv_ci_prod
#mysql -e 'SHOW TABLE STATUS WHERE `rows` > 0;' -uroot -proot mv_ci_prod
- name: run METdbload new
shell: bash
run: |
export PYTHONPATH=${GITHUB_WORKSPACE}/headnew
cd ${GITHUB_WORKSPACE}/headnew/METdbLoad/ush
python met_db_load.py ${GITHUB_WORKSPACE}/headnew/METdbLoad/test/load_met_gha_new.xml
mysql -e 'SHOW TABLE STATUS WHERE `rows` > 0;' -uroot -proot mv_ci_new
#mysql -e 'SHOW TABLE STATUS WHERE `rows` > 0;' -uroot -proot mv_ci_new
- name: run test_tables to compare tables in 2 databases
shell: bash
run: python ${GITHUB_WORKSPACE}/headnew/METdbLoad/test/test_tables.py
26 changes: 23 additions & 3 deletions METdbLoad/conftest.py
@@ -8,9 +8,9 @@
from METdataio.METdbLoad.ush.run_sql import RunSql
from METdataio.METdbLoad.test.utils import (
get_xml_test_file,
POINT_STAT_DATA_DIR,
POINT_STAT_DATA_DIR
)

from METdbLoad.ush.read_load_xml import XmlLoadFile

# add METdataio directory to path so packages can be found
TOP_DIR = str(Path(__file__).parents[1])
@@ -129,4 +129,24 @@ def load_and_read_xml(

@pytest.fixture
def mock_logger():
return MagicMock()
return MagicMock()

@pytest.fixture
def get_specified_xml_loadfile() -> XmlLoadFile:
    """
    Retrieve the specified XML load specification file. This is useful for using different XML
    specification files when validating against recursive payloads, large payloads, etc.

    The fixture returns a function that takes the following arguments.
    Args:
        xml_path: The directory containing the XML file of interest
        xml_filename: The name of the XML file of interest
    Returns:
        XML_LOAD_FILE: The XmlLoadFile instance corresponding to the XML specification file specified by path
        and filename
    """
    def get_xml_spec_file(xml_path: str, xml_filename: str):
        full_xml_filename = os.path.join(xml_path, xml_filename)
        XML_LOAD_FILE = XmlLoadFile(full_xml_filename)

        return XML_LOAD_FILE
    return get_xml_spec_file
67 changes: 67 additions & 0 deletions METdbLoad/test/full_example.xml
@@ -0,0 +1,67 @@
<load_spec>
<connection>
<host>mohawk.rap.ucar.edu:3306</host>
<database>mv_rtps_href_spring_2022</database>
<user>mvuser</user>
<password>mvuser</password>
</connection>

<date_list name="folder_dates">
<start>2022050100</start>
<end>2022051200</end>
<inc>86400</inc>
<format>yyyyMMddHH</format>
</date_list>
<date_list name="valid_dates">
<start>2022050100</start>
<end>2022051200</end>
<inc>0600</inc>
<format>yyyyMMddHH</format>
</date_list>

<verbose>false</verbose>
<insert_size>1</insert_size>
<stat_header_db_check>False</stat_header_db_check>
<mode_header_db_check>false</mode_header_db_check>
<drop_indexes>FALSE</drop_indexes>
<apply_indexes>true</apply_indexes>
<load_stat>True</load_stat>
<load_mode>false</load_mode>
<load_mpr>false</load_mpr>
<group>Regional Ensemble</group>

<folder_tmpl>/var/autofs/mnt/mandan_d2/projects/RRFS/prototype/met_out/{config}/{fcst_init}/{mem}/metprd/{met_out}/</folder_tmpl>

<load_val>
<field name="fcst_init">
<date_list name="folder_dates"/>
</field>
<field name="valid_times">
<date_list name="valid_dates"/>
</field>
<field name="config">
<val>HREF_lag_offset</val>
<val>RTPS</val>
</field>
<field name="mem">
<val>mem01</val>
<val>mem02</val>
<val>mem03</val>
<val>mem04</val>
<val>mem05</val>
<val>mem06</val>
<val>mem07</val>
<val>mem08</val>
<val>mem09</val>
<val>mem10</val>
</field>
<field name="met_out">
<val>grid_stat_cmn</val>
<val>point_stat_cmn</val>
</field>
</load_val>

<load_xml>true</load_xml>
<load_note>Load HREF and RTPS data for Spring 2022.</load_note>

</load_spec>
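
The `date_list` elements in the file above give a start, an end, an increment, and a format. As a rough illustration of how such a block could be expanded into concrete date strings, here is a minimal sketch. It is not METdbLoad's implementation: the helper name `expand_date_list` is hypothetical, it assumes `<inc>` is a number of seconds, and it assumes the `yyyyMMddHH` format corresponds to Python's `%Y%m%d%H`.

```python
from datetime import datetime, timedelta
from typing import List


def expand_date_list(start: str, end: str, inc: str,
                     fmt: str = "%Y%m%d%H") -> List[str]:
    """Return every date from start to end (inclusive), stepping by inc seconds.

    Assumption: inc is in seconds and fmt mirrors the XML's yyyyMMddHH.
    """
    step = timedelta(seconds=int(inc))
    if step <= timedelta(0):
        # A zero or negative step would loop forever, so reject it early.
        raise ValueError("inc must be a positive number of seconds")

    current = datetime.strptime(start, fmt)
    last = datetime.strptime(end, fmt)
    dates = []
    while current <= last:
        dates.append(current.strftime(fmt))
        current += step
    return dates


# Values taken from the "folder_dates" date_list above.
folder_dates = expand_date_list("2022050100", "2022051200", "86400")
print(len(folder_dates), folder_dates[0], folder_dates[-1])
# prints: 12 2022050100 2022051200
```

Rejecting a non-positive increment up front matters for the same reason the commit validates the specification file: an unchecked value could otherwise produce an unbounded expansion.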
79 changes: 40 additions & 39 deletions METdbLoad/test/load_met_gha_new.xml
@@ -1,42 +1,43 @@
<load_spec>
<connection>
<management_system>mysql</management_system>
<host>localhost:3306</host>
<database>mv_ci_new</database>
<user>root</user>
<password>root</password>
</connection>

<folder_tmpl>/home/runner/work/METdataio/METdataio/metdata/met_out/{met_tool}</folder_tmpl>

<!-- <met_version>V9.1</met_version>-->

<verbose>true</verbose>
<insert_size>1</insert_size>
<stat_header_db_check>true</stat_header_db_check>
<mode_header_db_check>true</mode_header_db_check>
<mtd_header_db_check>true</mtd_header_db_check>
<drop_indexes>false</drop_indexes>
<apply_indexes>false</apply_indexes>

<load_stat>true</load_stat>
<load_mode>true</load_mode>
<load_mtd>true</load_mtd>
<load_mpr>false</load_mpr>
<load_orank>true</load_orank>

<load_val>
<field name="met_tool">
<val>ensemble_stat</val>
<val>grid_stat</val>
<val>mode</val>
<val>point_stat</val>
<val>stat_analysis</val>
<val>wavelet_stat</val>
</field>
</load_val>

<group>METplus-Training</group>
<description>MET output generated by make test.</description>
<connection>
<management_system>mysql</management_system>
<host>localhost:3306</host>
<database>mv_ci_new</database>
<user>root</user>
<password>root</password>
</connection>


<!-- <met_version>V9.1</met_version>-->

<verbose>true</verbose>
<insert_size>1</insert_size>
<stat_header_db_check>true</stat_header_db_check>
<mode_header_db_check>true</mode_header_db_check>
<mtd_header_db_check>true</mtd_header_db_check>
<drop_indexes>false</drop_indexes>
<apply_indexes>false</apply_indexes>


<load_stat>true</load_stat>
<load_mode>true</load_mode>
<load_mtd>true</load_mtd>
<load_mpr>false</load_mpr>
<load_orank>true</load_orank>
<group>METplus-Training</group>
<description>MET output generated by make test.</description>

<folder_tmpl>/home/runner/work/METdataio/METdataio/metdata/met_out/{met_tool}</folder_tmpl>
<load_val>
<field name="met_tool">
<val>ensemble_stat</val>
<val>grid_stat</val>
<val>mode</val>
<val>point_stat</val>
<val>stat_analysis</val>
<val>wavelet_stat</val>
</field>
</load_val>


</load_spec>
66 changes: 33 additions & 33 deletions METdbLoad/test/load_met_gha_prod.xml
@@ -1,42 +1,42 @@
<load_spec>
<connection>
<management_system>mysql</management_system>
<host>localhost:3306</host>
<database>mv_ci_prod</database>
<user>root</user>
<password>root</password>
</connection>
<connection>
<management_system>mysql</management_system>
<host>localhost:3306</host>
<database>mv_ci_prod</database>
<user>root</user>
<password>root</password>
</connection>

<folder_tmpl>/home/runner/work/METdataio/METdataio/metdata/met_out/{met_tool}</folder_tmpl>

<!-- <met_version>V9.1</met_version>-->
<!-- <met_version>V9.1</met_version>-->

<verbose>true</verbose>
<insert_size>1</insert_size>
<stat_header_db_check>true</stat_header_db_check>
<mode_header_db_check>true</mode_header_db_check>
<mtd_header_db_check>true</mtd_header_db_check>
<drop_indexes>false</drop_indexes>
<apply_indexes>false</apply_indexes>
<verbose>true</verbose>
<insert_size>1</insert_size>
<stat_header_db_check>true</stat_header_db_check>
<mode_header_db_check>true</mode_header_db_check>
<mtd_header_db_check>true</mtd_header_db_check>
<drop_indexes>false</drop_indexes>
<apply_indexes>false</apply_indexes>

<load_stat>true</load_stat>
<load_mode>true</load_mode>
<load_mtd>true</load_mtd>
<load_mpr>false</load_mpr>
<load_orank>true</load_orank>
<load_stat>true</load_stat>
<load_mode>true</load_mode>
<load_mtd>true</load_mtd>
<load_mpr>false</load_mpr>
<load_orank>true</load_orank>
<group>METplus-Training</group>
<description>MET output generated by make test.</description>

<load_val>
<field name="met_tool">
<val>ensemble_stat</val>
<val>grid_stat</val>
<val>mode</val>
<val>point_stat</val>
<val>stat_analysis</val>
<val>wavelet_stat</val>
</field>
</load_val>
<folder_tmpl>/home/runner/work/METdataio/METdataio/metdata/met_out/{met_tool}</folder_tmpl>
<load_val>
<field name="met_tool">
<val>ensemble_stat</val>
<val>grid_stat</val>
<val>mode</val>
<val>point_stat</val>
<val>stat_analysis</val>
<val>wavelet_stat</val>
</field>
</load_val>

<group>METplus-Training</group>
<description>MET output generated by make test.</description>

</load_spec>
58 changes: 58 additions & 0 deletions METdbLoad/test/modified_example.xml
@@ -0,0 +1,58 @@
<load_spec>
<connection>
<host>mohawk.rap.ucar.edu:3306</host>
<database>mv_rtps_href_spring_2022</database>
<user>mvuser</user>
<password>mvuser</password>
</connection>

<date_list name="folder_dates">
<start>2022050100</start>
<end>2022051200</end>
<inc>86400</inc>
<format>yyyyMMddHH</format>
</date_list>

<verbose>false</verbose>
<insert_size>1</insert_size>
<stat_header_db_check>False</stat_header_db_check>
<mode_header_db_check>false</mode_header_db_check>
<drop_indexes>FALSE</drop_indexes>
<apply_indexes>true</apply_indexes>
<load_stat>True</load_stat>
<load_mode>false</load_mode>
<load_mpr>false</load_mpr>
<group>Regional Ensemble</group>

<folder_tmpl>/var/autofs/mnt/mandan_d2/projects/RRFS/prototype/met_out/{config}/{mem}/{fcst_init}/{met_out}</folder_tmpl>

<load_val>
<field name="config">
<val>HREF_lag_offset</val>
<val>RTPS</val>
</field>
<field name="mem">
<val>mem01</val>
<val>mem02</val>
<val>mem03</val>
<val>mem04</val>
<val>mem05</val>
<val>mem06</val>
<val>mem07</val>
<val>mem08</val>
<val>mem09</val>
<val>mem10</val>
</field>
<field name="met_out">
<val>grid_stat_cmn</val>
<val>point_stat_cmn</val>
</field>
<field name="fcst_init">
<date_list name="folder_dates"/>
</field>
</load_val>

<load_xml>true</load_xml>
<load_note>Load HREF and RTPS data for Spring 2022.</load_note>

</load_spec>
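
The `folder_tmpl` element contains placeholders such as `{config}` and `{mem}`, and each `field` under `load_val` lists the values for one placeholder. The sketch below shows one way such a template could be expanded into concrete folder paths by taking every combination of field values. It is an illustration under assumptions, not the loader's actual code: the helper name `expand_folder_tmpl` is hypothetical, and the shortened template and value lists are made up from the example above.

```python
from itertools import product
from typing import Dict, List


def expand_folder_tmpl(template: str,
                       field_values: Dict[str, List[str]]) -> List[str]:
    """Fill the template once for every combination of field values."""
    names = list(field_values)
    paths = []
    for combo in product(*(field_values[name] for name in names)):
        # Pair each field name with the value chosen for this combination.
        paths.append(template.format(**dict(zip(names, combo))))
    return paths


# Shortened, hypothetical template and values based on the example file.
template = "/met_out/{config}/{mem}/{met_out}"
fields = {
    "config": ["HREF_lag_offset", "RTPS"],
    "mem": ["mem01", "mem02"],
    "met_out": ["grid_stat_cmn", "point_stat_cmn"],
}
paths = expand_folder_tmpl(template, fields)
print(len(paths), paths[0])
# prints: 8 /met_out/HREF_lag_offset/mem01/grid_stat_cmn
```

Because the number of paths is the product of the list lengths, a specification with many fields or many values grows quickly, which is one reason to bound the number of `field` and `val` elements in a schema.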
12 changes: 7 additions & 5 deletions METdbLoad/test/test_load_specification.xml
@@ -7,7 +7,7 @@
<password>root_password</password>
</connection>

<folder_tmpl>/METdataio/METreformat/test/data/point_stat</folder_tmpl>

<verbose>true</verbose>
<insert_size>1</insert_size>
<stat_header_db_check>true</stat_header_db_check>
@@ -20,11 +20,13 @@
<load_mtd>true</load_mtd>
<load_mpr>true</load_mpr>
<load_orank>true</load_orank>
<force_dup_file>false</force_dup_file>
<group>Testing</group>
<description>testing DB load</description>
<folder_tmpl>/METdataio/METreformat/test/data/{met_tool}</folder_tmpl>
<load_val>
<field name="met_tool">
<val>point_stat</val>
<val>mode</val>
</field>
</load_val>
<group>Testing</group>
<description>testing DB load</description>
</load_spec>
</load_spec>