Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Basic script to compare two BP files given a tolerance #4308

Merged
merged 7 commits into from
Aug 31, 2024

Conversation

anagainaru
Copy link
Contributor

@anagainaru anagainaru commented Aug 15, 2024

Based on a request from GX developers, this is the first version of a python script to compare the information in two BP files (given a certain tolerance). The functionality is kept similar to https://gitlab.com/remikz/nccmp

Current differences detected:

  • Metadata (number of steps, number of variables, type and dimension of each variable)
  • Stats (min/max with a tolerance) - only if the --stats option is used
  • Values (values on each position with a tolerance)

The tolerance can be absolute value or percentage.

Example metadata differences:

$ python bpcmp.py BPFileStepsWriteRead.bp BPFileStepsWriteReadKokkos.bp
COMPARE : FILE : BPFileStepsWriteRead.bp : FILE : BPFileStepsWriteReadKokkos.bp
DIFFER : NUM_STEPS : 10 <> 1 : BPFileStepsWriteRead.bp : BPFileStepsWriteReadKokkos.bp
DIFFER : AVAIL_VARS : ['bpStep'] <> [] : BPFileStepsWriteRead.bp : BPFileStepsWriteReadKokkos.bp
DIFFER : VARIABLE : bpFloats : NUM_STEPS : 10 <> 1 : BPFileStepsWriteRead.bp : BPFileStepsWriteReadKokkos.bp
DIFFER : VARIABLE : bpFloats : SHAPE : 60000 <> 4, 3 : BPFileStepsWriteRead.bp : BPFileStepsWriteReadKokkos.bp

DIFFER : VARIABLE : bpType : TYPE :  uint32_t <> float : test1.bp : test2.bp
DIFFER : VARIABLE : bpSV : SINGLE_VALUE :  true <> false : test1.bp : test2.bp

Example min/max differences:

DIFFER : VARIABLE : bpFloats : MIN : 0 <> 1 : test1.bp : test2.bp
DIFFER : VARIABLE : bpFloats : MAX : 90 <> 60089 : BPFileStepsWriteRead.bp : BPFileStepsWriteReadHip.bp

Without verbose the script only plots a summary of the value difference

DIFFER : VARIABLE : bpFloats : VALUES_SUMMARY : 599990 : BPFileStepsWriteRead.bp : BPFileStepsWriteReadHip.bp

Using --verbose the script will plot every value difference

DIFFER : VARIABLE : bpFloats : STEP : 0 : POSITION : (5,0) : VALUE : 0.0 <> 5.0 : test1.bp : test2.bp
DIFFER : VARIABLE : bpFloats : STEP : 1 : POSITION : (6,10) : VALUE : 1.0 <> 6.0 : test1.bp : test2.bp

Things that still need to be added:

  • Parallel execution (add MPI support)
  • Specific variable inclusion or exclusion.
  • Print data difference statistics (count, sum, absolute sum, min, max, range, mean, stdev) for numeric variables and compound fields.
  • Per block comparison of min/max
  • Do not read the entire values for each step to compare them. Comparing per blocks might allow multi-gigabyte files to only consume several megabytes of memory.
  • Statistics about the difference between the values (min/max/avg/std)

@anagainaru
Copy link
Contributor Author

@vicentebolea could you help me with these errors?

On macos-14 where numpy is found:

File "/Users/runner/work/ADIOS2/ADIOS2/macos-14-xcode15_4-static-serial/bin/bpcmp.py", line 1, in <module>
2024-08-20T19:42:21.3772090Z 617:     import numpy as np
2024-08-20T19:42:21.3772620Z 617: ModuleNotFoundError: No module named 'numpy'
2024-08-20T19:42:21.3773080Z 614: Traceback (most recent call last):
2024-08-20T19:42:21.3773760Z 614:   File "/Users/runner/work/ADIOS2/ADIOS2/macos-14-xcode15_4-static-serial/bin/bpcmp.py", line 1, in <module>
2024-08-20T19:42:21.3774350Z 614:     import numpy as np
2024-08-20T19:42:21.3774710Z 614: ModuleNotFoundError: No module named 'numpy'
2024-08-20T19:42:21.3794100Z 617: CMake Error at /Users/runner/work/ADIOS2/ADIOS2/macos-14-xcode15_4-static-serial/Debug/bpcmp.cmake:18 (message):
2024-08-20T19:42:21.3796980Z 617:   result: 1
2024-08-20T19:42:21.3797760Z 617: 
2024-08-20T19:42:21.3798330Z 617: 
2024-08-20T19:42:21.3799920Z  10/640 Test #617: Utils.BPcmp.Verbose.Dump ................................................................................***Failed    0.11 sec
2024-08-20T19:42:21.3803270Z         Start 617: Utils.BPcmp.Verbose.Dump

And on Ubuntu 20-04

2024-08-20T19:45:50.2123317Z 899: Working Directory: /__w/ADIOS2/ADIOS2/ubuntu20.04-clang6-serial/testing/utils/bpcmp/Utils.BPcmp
2024-08-20T19:45:50.2124278Z 899: Test timeout computed to be: 120
2024-08-20T19:45:50.5971137Z 899: Traceback (most recent call last):
2024-08-20T19:45:50.5973048Z 899:   File "/__w/ADIOS2/ADIOS2/ubuntu20.04-clang6-serial/bin/bpcmp.py", line 4, in <module>
2024-08-20T19:45:50.5973989Z 899:     from adios2 import Stream
2024-08-20T19:45:50.5974647Z 899: ModuleNotFoundError: No module named 'adios2'
2024-08-20T19:45:50.6133487Z 899: CMake Error at /__w/ADIOS2/ADIOS2/ubuntu20.04-clang6-serial/Debug/bpcmp.cmake:18 (message):
2024-08-20T19:45:50.6134771Z 899:   result: 1
2024-08-20T19:45:50.6135264Z 899: 
2024-08-20T19:45:50.6135791Z 899: 
2024-08-20T19:45:50.6143107Z  13/933 Test #899: Utils.BPcmp.Stats.Dump ..................................................................................***Failed    0.40 sec
2024-08-20T19:45:50.6144568Z         Start 899: Utils.BPcmp.Stats.Dump

@vicentebolea
Copy link
Collaborator

@anagainaru I recommend you to use this add_test wrapper:

function(python_add_test)

The reason is that it does all the fixes so it uses the correct python version and pythonmodule var

@anagainaru
Copy link
Contributor Author

Thanks ! Can I use python_add_test to store the output of the script into a file to validate it's contents?

@anagainaru
Copy link
Contributor Author

@vicentebolea I don't seem to know how to use the python test you indicated. In my case, my testing is not a python file, it's a pipeline (similar to bpls but it has a python script in the middle).
The pipeline contains a cpp file (that generates the input file for the python script), the bpcmp python script (that generates an output that is redirected to a file) and a diff (between the generated file and a redefined one).

Any advice helps, it's not straight forward to know how to take some parts of the python_add_test and add them to my test.

@vicentebolea
Copy link
Collaborator

I See, you need to extract from the python_add_test the parts that setup the needed env variables for finding adios2 python package in the build tree and also the correct python interpreter (as you already done).

  if(UNIX)
    set_property(TEST ${ARGS_NAME} PROPERTY
      ENVIRONMENT
        "PYTHONPATH=${ADIOS2_BINARY_DIR}/${CMAKE_INSTALL_PYTHONDIR}:$ENV{PYTHONPATH}"
        "PYTHONUSERBASE=${CMAKE_BINARY_DIR}"
    )
  else()
    set_property(TEST ${ARGS_NAME} PROPERTY
      ENVIRONMENT
        "PYTHONPATH=${ADIOS2_BINARY_DIR}/${CMAKE_INSTALL_PYTHONDIR};$<SHELL_PATH:$<TARGET_FILE_DIR:adios2_py>>;$ENV{PYTHONPATH}"
        "PATH=$<SHELL_PATH:$<TARGET_FILE_DIR:adios2_py>>;$<SHELL_PATH:$<TARGET_FILE_DIR:adios2_core>>;$ENV{PATH}"
        "PYTHONUSERBASE=${CMAKE_BINARY_DIR}"
    )
  endif()
  

@anagainaru
Copy link
Contributor Author

Setting the env for python did not work, @vicentebolea any other ideas?

@vicentebolea
Copy link
Collaborator

vicentebolea commented Aug 21, 2024 via email

@anagainaru
Copy link
Contributor Author

Using set() would be enough?

Something like:

set(ENV{PYTHONPATH} "All/the/paths/in/your/wrapper")

I've never set these before.

@vicentebolea
Copy link
Collaborator

vicentebolea commented Aug 21, 2024 via email

@anagainaru
Copy link
Contributor Author

@vicentebolea the tests pass ! Thanks for the help. Could you please see if this looks good to you?

Copy link
Collaborator

@vicentebolea vicentebolea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, a few suggestions. I will review the python script tomorrow. Good work @anagainaru 🚀

source/utils/CMakeLists.txt Outdated Show resolved Hide resolved
source/utils/CMakeLists.txt Show resolved Hide resolved
source/utils/CMakeLists.txt Show resolved Hide resolved
source/utils/bpcmp/CMakeLists.txt Show resolved Hide resolved
source/utils/bpcmp/bpcmp.cmake.gen.in Show resolved Hide resolved
testing/utils/bpcmp/CMakeLists.txt Show resolved Hide resolved
testing/utils/bpcmp/CMakeLists.txt Outdated Show resolved Hide resolved
testing/utils/bpcmp/CMakeLists.txt Outdated Show resolved Hide resolved
testing/utils/bpcmp/CMakeLists.txt Outdated Show resolved Hide resolved
testing/utils/bpcmp/CMakeLists.txt Show resolved Hide resolved
@anagainaru
Copy link
Contributor Author

@vicentebolea thanks for the suggestions. I pushed some of the changes you mentioned.

The other changes that I did not push do make sense, but I would prefer leaving this PR consistent with the other utility scripts (the functionality and build system is mimicking bpls -- namely using PROJECT_BINARY_DIR, not using quotes for paths, not using syntax=yaml, not using the maybe convention). I think all these changes need to be done to all the scripts in source/utils not just in bpcmp.

@vicentebolea
Copy link
Collaborator

@vicentebolea thanks for the suggestions. I pushed some of the changes you mentioned.

The other changes that I did not push do make sense, but I would prefer leaving this PR consistent with the other utility scripts (the functionality and build system is mimicking bpls -- namely using PROJECT_BINARY_DIR, not using quotes for paths, not using syntax=yaml, not using the maybe convention). I think all these changes need to be done to all the scripts in source/utils not just in bpcmp.

You are right that this would break consistency which is very important, if you have extra cycles it would be cool to have the other utils things "Corrected" otherwise I am to strong about it. As for the maybe_ convention is through out the adios2 cmake codebase. It is a good way to point out, hey this is maybe not assigned.

Copy link
Collaborator

@vicentebolea vicentebolea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few code change request in the Python code. Looks goods, feel free to take the suggestions

source/utils/bpcmp/bpcmp.py Show resolved Hide resolved
source/utils/bpcmp/bpcmp.py Outdated Show resolved Hide resolved
source/utils/bpcmp/bpcmp.py Outdated Show resolved Hide resolved
@vicentebolea
Copy link
Collaborator

On the funny side of things I was reading this article https://www.chiark.greenend.org.uk/~sgtatham/quasiblog/code-review-antipatterns/ . I believe that I might have been guilty of some of those antipatterns haha. Ill take those lessons learned in my next reviews 👼

Co-authored-by: Vicente Adolfo Bolea Sanchez <vicente.bolea@kitware.com>
@anagainaru anagainaru merged commit 8dd2aef into ornladios:master Aug 31, 2024
38 checks passed
@anagainaru anagainaru deleted the bpcmp branch August 31, 2024 16:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants