Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

or just a question on difference with invocation of pynwb.validate #1321

Open
yarikoptic opened this issue Nov 20, 2020 · 10 comments
Open

or just a question on difference with invocation of pynwb.validate #1321

yarikoptic opened this issue Nov 20, 2020 · 10 comments
Labels
category: bug errors in the code or code behavior

Comments

@yarikoptic
Copy link
Contributor

yarikoptic commented Nov 20, 2020

@t-b asked and provided a sample file on dandi slack channel, which seems to validate "fine" if

$> python3 -m pynwb.validate 496035.05.02.nwb
Validating 496035.05.02.nwb against cached namespace information using namespace ndx-mies.
 - no errors found.

but errors are returned if

$> python3 -c 'import pynwb; print(pynwb.validate(pynwb.NWBHDF5IO("496035.05.02.nwb", "r")))'                      
[VectorIndex/description (general/intracellular_ephys/sweep_table/series_index): argument missing, VectorIndex (general/intracellular_ephys/sweep_table/series_index): incorrect type - expected 'uint8', got 'int32', VectorIndex/description (general/intracellular_ephys/sweep_table/series_index): argument missing, VectorIndex (general/intracellular_ephys/sweep_table/series_index): incorrect type - expected 'uint8', got 'int32']
and that is the ones we consider to be errors and report by `dandi validate`
$> dandi validate 496035.05.02.nwb           
496035.05.02.nwb: 5 error(s)
  VectorIndex/description (general/intracellular_ephys/sweep_table/series_index): argument missing
  VectorIndex (general/intracellular_ephys/sweep_table/series_index): incorrect type - expected 'uint8', got 'int32'
  VectorIndex/description (general/intracellular_ephys/sweep_table/series_index): argument missing
  VectorIndex (general/intracellular_ephys/sweep_table/series_index): incorrect type - expected 'uint8', got 'int32'
  Required field 'subject_id' has no value
Summary: Validation errors in 1 out of 1 files

the last one is "dandi specific" since we do require subject_id value.

so why is such a difference and either we should somehow become more lenient in dandi?

edit 1: FWIW the same with load_namespaces=True
$> python3 -c 'import pynwb; print(pynwb.validate(pynwb.NWBHDF5IO("496035.05.02.nwb", "r", load_namespaces=True)))'
/home/yoh/proj/dandi/dandi-cli/venvs/dev3/lib/python3.8/site-packages/hdmf/spec/namespace.py:470: UserWarning: ignoring namespace 'hdmf-common' because it already exists
  warn("ignoring namespace '%s' because it already exists" % ns['name'])
/home/yoh/proj/dandi/dandi-cli/venvs/dev3/lib/python3.8/site-packages/hdmf/spec/namespace.py:470: UserWarning: ignoring namespace 'core' because it already exists
  warn("ignoring namespace '%s' because it already exists" % ns['name'])
[VectorIndex/description (general/intracellular_ephys/sweep_table/series_index): argument missing, VectorIndex (general/intracellular_ephys/sweep_table/series_index): incorrect type - expected 'uint8', got 'int32', VectorIndex/description (general/intracellular_ephys/sweep_table/series_index): argument missing, VectorIndex (general/intracellular_ephys/sweep_table/series_index): incorrect type - expected 'uint8', got 'int32']
@yarikoptic yarikoptic added the category: bug errors in the code or code behavior label Nov 20, 2020
@t-b
Copy link
Collaborator

t-b commented Nov 20, 2020

You can also just tell the validation routine to not use the cached spec and the validation error can be reproduced:

$ python -m pynwb.validate --no-cached-namespace 496035.05.02.nwb
Validating 496035.05.02.nwb against pynwb namespace information using namespace core.
 - found the following errors:
VectorIndex/description (general/intracellular_ephys/sweep_table/series_index): argument missing
VectorIndex (general/intracellular_ephys/sweep_table/series_index): incorrect type - expected 'uint8', got 'int32'
VectorIndex/description (general/intracellular_ephys/sweep_table/series_index): argument missing
VectorIndex (general/intracellular_ephys/sweep_table/series_index): incorrect type - expected 'uint8', got 'int32'
(base)
thomas@Win10-Thomas-PC MSYS /e
$ python -m pynwb.validate --cached-namespace 496035.05.02.nwb
Validating 496035.05.02.nwb against cached namespace information using namespace ndx-mies.
 - no errors found.
(base)

@t-b
Copy link
Collaborator

t-b commented Nov 20, 2020

The series_index validation issue was introduced in hdmf-dev/hdmf-common-schema@2851b46. Changing from a signed to unsigned integer is strictly speaking a breaking change.

I would say that the files MIES creates are still valid as they use a cached schema. But of course I can be persuaded otherwise.

@t-b
Copy link
Collaborator

t-b commented Nov 20, 2020

Another odd thing is that the old type (int32) can hold all values of uint8 so even when validating against the stock pynwb schema this should not give a warning. AFAIR we allow bigger types instead of smaller ones.

@t-b
Copy link
Collaborator

t-b commented Nov 21, 2020

@yarikoptic

The validation error

Failed to read metadata: Could not construct IntracellularElectrode object due to: IntracellularElectrode.__init__: incorrect type for 'description' (got 'bytes', expected 'str')

is from dandi or?

Can you explain what is wrong here?

The HDF5 file type of general/intracellular_ephys/electrode_0/description is text according to the schema 1.
In the HDF5 file I'm storing that as

      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLPAD;
         CSET H5T_CSET_UTF8;
         CTYPE H5T_C_S1;
      }

which is supported in all the other places where text is set as type.

@t-b
Copy link
Collaborator

t-b commented Nov 21, 2020

Sample file: HardwareTests-V2-MIES-187d9143.zip

@t-b
Copy link
Collaborator

t-b commented Nov 21, 2020

@yarikoptic I'm doing validation using pynwb with subprocess, see https://github.com/AllenInstitute/MIES/blob/bcb8f4ff185394729f99f5b31b3dfb9f18acbd59/tools/nwb-read-tests/nwbv2-read-test.py#L22. This at least avoids that pynwb loads the stock schema. The code might be brittle as I'm only using that in a docker container without real world users.

@t-b
Copy link
Collaborator

t-b commented Nov 26, 2020

@yarikoptic @rly How do we proceed here? Should the fix be in pynwb or in dandi?

@rly
Copy link
Contributor

rly commented Nov 30, 2020

@t-b @yarikoptic To summarize, if I understand correctly, the file is valid against the schema that is cached, but the schema was updated and the file is not valid against the latest schema. dandi validate uses the pynwb.validate(...) function which loads the schema from the active PyNWB installation, which may or may not be the latest schema or the one cached by the file. python -m pynwb.validate validates differently and provides more options than the pynwb.validate(...) function, including allowing validation against the cached schema (default).

In my opinion, DANDI should set a requirement for the minimum NWB version for validation, and it should validate against the cached schema if the version is >= the minimum requirement. It should not validate against the latest schema because older files are not guaranteed to be valid against the latest schema.

After upload of a given dandiset, DANDI should verify that all files in the dandiset use the same schema version of NWB and then display that version on the dandiset page so that users know whether the files can be read by tools that have a min version requirement.

Separately, PyNWB should include an option in the pynwb.validate(...) function where validation is done against the cached schema as opposed to the schema from the PyNWB installation. For the time being, DANDI should use the command-line interface python -m pynwb.validate to validate a file against the cached schema. If that is unworkable or inconvenient, I can provide sample code to run before pynwb.validate(...) to get it to validate against the cached schema.

@rly
Copy link
Contributor

rly commented Nov 30, 2020

Failed to read metadata: Could not construct IntracellularElectrode object due to: IntracellularElectrode.__init__: incorrect type for 'description' (got 'bytes', expected 'str')

This looks like an issue with compatibility with h5py 3.0 where strings are read as bytes instead of str. Which version of h5py was installed?

@t-b
Copy link
Collaborator

t-b commented Nov 30, 2020

This looks like an issue with compatibility with h5py 3.0 where strings are read as bytes instead of str. Which version of h5py was installed?

Yes this was a h5py issue indeed, going back to 2.10.0 solved it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: bug errors in the code or code behavior
Projects
None yet
Development

No branches or pull requests

3 participants