Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Structured validation results #1104

Merged
merged 124 commits into from
Oct 28, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
124 commits
Select commit Hold shift + click to select a range
696e7fe
Define ValidationResult class
jwodder Aug 18, 2022
037dc5e
RF: Initial steps to start RFing validate-bids to use ValidationResul…
yarikoptic Aug 26, 2022
3bc01ba
Improved class usage and added test suite
TheChymera Sep 1, 2022
43ab1f5
Debugging
TheChymera Sep 2, 2022
1f88e4e
Debugging out of place cli invocation error
TheChymera Sep 2, 2022
8e8e4b7
Starting to deprecate `print_validation_results()`
TheChymera Sep 2, 2022
8561215
comment for review
TheChymera Sep 2, 2022
f082382
Incremental deprecation of direct upstream output format
TheChymera Sep 2, 2022
dd77af2
Review comments
TheChymera Sep 2, 2022
aefb337
Fixed test log path
TheChymera Sep 6, 2022
1c8ddd3
BIDS root detection
TheChymera Sep 7, 2022
cb6d494
Assigning dandiset and (BIDS) dataset path
TheChymera Sep 7, 2022
e2b21ed
removed comment lines
TheChymera Sep 7, 2022
8af720a
Integrated metadata recording
TheChymera Sep 7, 2022
ea4e790
Processing input path type
TheChymera Sep 7, 2022
db175a2
Indentation fix for lint check
TheChymera Sep 7, 2022
a90c99a
Printing output based on severity
TheChymera Sep 8, 2022
3304b45
Message attribute is empty string when nonexistent — not none
TheChymera Sep 8, 2022
2a9377e
Force posix path for windows tests
TheChymera Sep 8, 2022
6a0875b
Debugging windows tests
TheChymera Sep 8, 2022
4f248b9
Typing fixes
TheChymera Sep 8, 2022
a564d0d
Trying more typing fixes
TheChymera Sep 8, 2022
cc57bfc
Debugging typing
TheChymera Sep 9, 2022
15e1cfb
Removed unneeded module
TheChymera Sep 9, 2022
89a070a
Use assert to ensure that Optional becomes non-optional for type chec…
yarikoptic Sep 9, 2022
fe238b6
Further corrected type
TheChymera Sep 12, 2022
25194ee
Filtered download for valid testdata
TheChymera Sep 12, 2022
e630fb8
Type correction
TheChymera Sep 12, 2022
72f47a0
Update error tests
TheChymera Sep 12, 2022
b51e92e
fixing windows fixture removal error
TheChymera Sep 12, 2022
94a1793
Debug assertion error fix
TheChymera Sep 12, 2022
ec59dad
Using error fixture for error tests
TheChymera Sep 12, 2022
43e17e9
Using correct fixture
TheChymera Sep 13, 2022
4f2e4ae
Disabled debugging code
TheChymera Sep 13, 2022
bf5c0f1
Debug assertion error
TheChymera Sep 14, 2022
697b24d
Ensuring that metadata is non-empty
TheChymera Sep 14, 2022
bb52d56
Making example invalid dataset truly invalid
TheChymera Sep 14, 2022
ee9492a
Windows debugging
TheChymera Sep 14, 2022
c828628
Further winerror 5 debugging
TheChymera Sep 15, 2022
2f1c6a7
Using base exception instead of specific one
TheChymera Sep 15, 2022
155e557
Not using auto-cleaning temp dir via while
TheChymera Sep 15, 2022
f30ad99
Removed debugging code
TheChymera Sep 16, 2022
676cbc3
More verbose error printing
TheChymera Sep 16, 2022
9a8861b
Using entire valid dataset whitelist for testing
TheChymera Sep 16, 2022
74cc797
Log creation test using whitelist instead of hard-coded dataset
TheChymera Sep 16, 2022
db6a81f
Fixed cmd_validate tests
TheChymera Sep 16, 2022
715f70e
Fixed report path test
TheChymera Sep 16, 2022
fb5847c
Testing the command line for all error cases
TheChymera Sep 16, 2022
e0027c3
Removed deprecated comments
TheChymera Sep 17, 2022
453cd65
Populating dandi and bids dataset paths
TheChymera Sep 17, 2022
786729a
Simple fixes suggested by jwodder
TheChymera Sep 19, 2022
d0070bd
Fixed typing
TheChymera Sep 20, 2022
b205aaf
Correcting type
TheChymera Sep 21, 2022
d04f4bd
Using pytest temp path generation
TheChymera Sep 20, 2022
7d21f4e
Corrected function parameters
TheChymera Sep 21, 2022
4ba4b7b
More parsimonious error reporting
TheChymera Sep 26, 2022
23e54be
Global import
TheChymera Sep 26, 2022
28e7374
Using built-in find_parent_directory_containing function
TheChymera Sep 26, 2022
3702eac
Dropped unused import
TheChymera Sep 26, 2022
2d9723c
Re-added placeholder metadata to satisfy assert
TheChymera Sep 26, 2022
61e5d3a
No longer recomputing dandi/data-set path for files in same directory
TheChymera Sep 26, 2022
1711354
Trying to make sure typechecking understands logic
TheChymera Sep 26, 2022
122581a
Removed redundant code
TheChymera Sep 27, 2022
e25375b
No placeholder metadata
TheChymera Sep 27, 2022
e292718
Unified error reporting across validation types
TheChymera Sep 28, 2022
c4a36ca
Improve style
TheChymera Sep 28, 2022
cf34de3
Import dandi.validate.Severity in local scope to prevent heavy dep load
TheChymera Sep 28, 2022
e363670
Unified error reporting logic and draft grouping implementation.
TheChymera Sep 29, 2022
6a56acd
Corrected warning color
TheChymera Sep 29, 2022
b4ad1bc
BROKEN — reusage of generic validator output for NWB
TheChymera Sep 29, 2022
de27873
RF+BF: address circular import, unify more of validations into Valida…
yarikoptic Sep 29, 2022
6b7e680
ENH: add within_asset_paths to point within files/assets
yarikoptic Sep 29, 2022
ccf9491
BF: map importance to our severity based on enum"s name
yarikoptic Sep 29, 2022
b3106f4
ENH+RF: make sure we issue a warning about nwbinspector only once
yarikoptic Sep 29, 2022
49c9a0a
Integrating pydantic error into custom object.
TheChymera Sep 30, 2022
9861be1
Corrected function return
TheChymera Sep 30, 2022
f387998
Parsing further errors into the custom dandi validation object
TheChymera Sep 30, 2022
04ee73c
Trying to fix types
TheChymera Sep 30, 2022
d161b46
Type fixing
TheChymera Sep 30, 2022
482246a
New object structure for zarr validation
TheChymera Sep 30, 2022
1691547
Further type fixes
TheChymera Oct 3, 2022
889d2e7
Attempting to fix typing errors
TheChymera Oct 3, 2022
dd03eb5
Fixed more typing errors
TheChymera Oct 3, 2022
b2f4c2e
Fixed list unpacking issue discussed in meeting
TheChymera Oct 3, 2022
0d77a0d
Further typing fixes
TheChymera Oct 3, 2022
e263bf4
Defined missing and removed unused variable
TheChymera Oct 3, 2022
8b58dce
Trying to standardize error output
TheChymera Oct 3, 2022
f2f577d
Fixed syntax
TheChymera Oct 3, 2022
5c1890c
Expanded list comprehension to assert attribute presence
TheChymera Oct 4, 2022
e30809d
No more typing errors
TheChymera Oct 4, 2022
87048c7
Removed unused imports
TheChymera Oct 4, 2022
d4b35b4
Fixed weird typing issues only appearing on GH
TheChymera Oct 4, 2022
acd78b7
Placeholder for undefined variable string
TheChymera Oct 4, 2022
cc2ad26
Fixed assertion in test to match object output
TheChymera Oct 4, 2022
b5e16b1
Integrate OSError into object-based validation reporting
TheChymera Oct 4, 2022
9f49cc9
Standardized dandi import
TheChymera Oct 4, 2022
382c3ae
General-purpose exception catching
TheChymera Oct 4, 2022
8d71483
More standardized error IDs
TheChymera Oct 4, 2022
2d4e284
Typing fix for exception message
TheChymera Oct 5, 2022
2c8d034
Description/testing of command line validation `--grouping` option.
TheChymera Oct 5, 2022
80db60e
Workaround e.message typechecking dillema
yarikoptic Oct 5, 2022
af48872
Removed unneeded imports, added tests for broken nwb dataset
TheChymera Oct 5, 2022
15ce3c7
Removed deprecated comment
TheChymera Oct 5, 2022
8c430f6
Added click decorator for grouping parameter in nwb validator
TheChymera Oct 5, 2022
6348f64
Standardized CLI NWB validator wrt BIDS validator draft and added tests
TheChymera Oct 5, 2022
ed5f353
Reporting lack of errors explicitly in CLI validation
TheChymera Oct 5, 2022
65ae2c9
Simplified git command execution
TheChymera Oct 10, 2022
b4d5dcd
Using click option specification
TheChymera Oct 11, 2022
624e3b0
List comprehension for more compact code.
TheChymera Oct 11, 2022
41df99d
Removed superfluous list bracketing
TheChymera Oct 11, 2022
09e526b
clarified comment, removed commented code
TheChymera Oct 11, 2022
e38577b
Extracted validation variant common code into internal function
TheChymera Oct 11, 2022
456fafd
Importing outside of test functions
TheChymera Oct 11, 2022
857d7b8
Updated grouping for vanilla validator
TheChymera Oct 11, 2022
56973d3
Linting fixes
TheChymera Oct 17, 2022
4eb0c70
Importing cast function from typing
TheChymera Oct 17, 2022
404968b
Type annotations
TheChymera Oct 17, 2022
6ac5b99
Fixed typing
TheChymera Oct 17, 2022
abc8e8d
Removed superfluous print call
TheChymera Oct 19, 2022
f80b5b4
Fixes proposed by Yaro
TheChymera Oct 19, 2022
011b45d
Not duplicating code for error subclass
TheChymera Oct 19, 2022
76ddf06
Orthography
TheChymera Oct 19, 2022
2c7dd6a
We need newer bidsschematools
TheChymera Oct 27, 2022
6205109
Type fix
TheChymera Oct 27, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
77 changes: 0 additions & 77 deletions dandi/bids_utils.py

This file was deleted.

194 changes: 138 additions & 56 deletions dandi/cli/cmd_validate.py
Original file line number Diff line number Diff line change
@@ -1,14 +1,17 @@
import logging
import os
from typing import List, cast

import click

from .base import devel_debug_option, devel_option, lgr, map_to_click_exceptions
from .base import devel_debug_option, devel_option, map_to_click_exceptions
from ..utils import pluralize
from ..validate_types import Severity


@click.command()
@click.option(
"--schema", help="Validate against new BIDS schema version", metavar="VERSION"
"--schema", help="Validate against new BIDS schema version.", metavar="VERSION"
)
@click.option(
"--report-path",
Expand All @@ -20,13 +23,21 @@
is_flag=True,
help="Whether to write a report under a unique path in the DANDI log directory.",
)
@click.option(
"--grouping",
"-g",
yarikoptic marked this conversation as resolved.
Show resolved Hide resolved
help="How to group error/warning reporting.",
type=click.Choice(["none", "path"], case_sensitive=False),
default="none",
)
@click.argument("paths", nargs=-1, type=click.Path(exists=True, dir_okay=True))
@map_to_click_exceptions
def validate_bids(
paths,
schema,
report,
report_path,
grouping="none",
):
"""Validate BIDS paths.

Expand All @@ -36,20 +47,16 @@ def validate_bids(
dandi validate-bids /my/path
"""

from ..bids_utils import is_valid, report_errors
from ..validate import validate_bids as validate_bids_

validator_result = validate_bids_(
validator_result = validate_bids_( # Controller
*paths,
report=report,
report_path=report_path,
schema_version=schema,
)
valid = is_valid(validator_result)
report_errors(validator_result)

if not valid:
raise SystemExit(1)
_process_issues(validator_result, grouping)


@click.command()
Expand All @@ -60,13 +67,32 @@ def validate_bids(
default=False,
is_flag=True,
)
@click.option(
"--grouping",
"-g",
help="How to group error/warning reporting.",
type=click.Choice(["none", "path"], case_sensitive=False),
default="none",
)
@click.argument("paths", nargs=-1, type=click.Path(exists=True, dir_okay=True))
@devel_debug_option()
@map_to_click_exceptions
def validate(paths, schema=None, devel_debug=False, allow_any_path=False):
def validate(
paths,
schema=None,
devel_debug=False,
allow_any_path=False,
grouping="none",
):
"""Validate files for NWB and DANDI compliance.

Exits with non-0 exit code if any file is not compliant.

Parameters
----------

grouping : str, optional
A string which is either "", "error", or "path" — "error" not yet implemented.
"""
from ..pynwb_utils import ignore_benign_pynwb_warnings
from ..validate import validate as validate_
Expand All @@ -84,61 +110,117 @@ def validate(paths, schema=None, devel_debug=False, allow_any_path=False):
# at this point, so we ignore it although ideally there should be a formal
# way to get relevant warnings (not errors) from PyNWB
ignore_benign_pynwb_warnings()
view = "one-at-a-time" # TODO: rename, add groupped

all_files_errors = {}
nfiles = 0
for path, errors in validate_(
validator_result = validate_(
*paths,
schema_version=schema,
devel_debug=devel_debug,
allow_any_path=allow_any_path,
):
nfiles += 1
if view == "one-at-a-time":
display_errors(path, errors)
all_files_errors[path] = errors

if view == "groupped":
# TODO: Most likely we want to summarize errors across files since they
# are likely to be similar
# TODO: add our own criteria for validation (i.e. having needed metadata)

# # can't be done since fails to compare different types of errors
# all_errors = sum(errors.values(), [])
# all_error_types = []
# errors_unique = sorted(set(all_errors))
# from collections import Counter
# # Let's make it
# print(
# "{} unique errors in {} files".format(
# len(errors_unique), len(errors))
# )
raise NotImplementedError("TODO")

files_with_errors = [f for f, errors in all_files_errors.items() if errors]

if files_with_errors:
click.secho(
"Summary: Validation errors in {} out of {} files".format(
len(files_with_errors), nfiles
),
bold=True,
fg="red",
)

_process_issues(validator_result, grouping)


def _process_issues(validator_result, grouping):

issues = [i for i in validator_result if i.severity]

purviews = [
list(filter(bool, [i.path, i.path_regex, i.dataset_path]))[0] for i in issues
]
if grouping == "none":
display_errors(
purviews,
[i.id for i in issues],
[i.severity for i in issues],
[i.message for i in issues],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what meaning of `purview` name here? the ones from dict not sure if match.
❯ dict purview
3 definitions found

From The Collaborative International Dictionary of English v.0.48 [gcide]:

  Purview \Pur"view\, n. [OF. purveu, pourveu, F. pourvu,
     provided, p. p. of OF. porveoir, F. pourvoir. See {Purvey},
     {View}, and cf. {Proviso}.]
     1.
        (a) (Law) The body of a statute, or that part which begins
            with " Be it enacted, " as distinguished from the
            preamble. --Cowell.
        (b) Hence: The limit or scope of a statute; the whole
            extent of its intention or provisions. --Marshall.
            [1913 Webster]
  
                  Profanations within the purview of several
                  statutes.                         --Bacon.
            [1913 Webster]
  
     2. Limit or sphere of authority; scope; extent.
        [1913 Webster]
  
              In determining the extent of information required in
              the exercise of a particular authority, recourse
              must be had to the objects within the purview of
              that authority.                       --Madison.
        [1913 Webster]

From WordNet (r) 3.0 (2006) [wn]:

  purview
      n 1: the range of interest or activity that can be anticipated;
           "It is beyond the horizon of present knowledge" [syn:
           {horizon}, {view}, {purview}]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not to make display_errors just to get a list of them instead of breaking them apart like this just to zip them back together inside that function?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what meaning of purview name here? the ones from dict not sure if match.

https://www.merriam-webster.com/dictionary/purview
1b or 2. it's basically a synonym of scope since scope was taken.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not to make display_errors just to get a list of them instead of breaking them apart like this just to zip them back together inside that function?

Well, the idea was to have a separate function which does as little as possible other than click.secho(). The lists need to be populated differently depending on the grouping anyway, and additional edge cases will pop up as we have e.g. multi-path errors, so we do all that in the parent function and just echo depending on the list lengths. If you want I can roll it all up in a single function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still feel that it is odd design here, not ready ATM to give a complete solution, could at least be display_errors(purviews, issues) or alike, with all needed unzipping/zipping done inside if needed. but ok, we could RF later

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd really suggest leaving it like this for now, and see if moving that code bit to another function is still needed when we merge validation functions.

)
raise SystemExit(1)
elif grouping == "path":
for purview in purviews:
applies_to = [
i for i in issues if purview in [i.path, i.path_regex, i.dataset_path]
]
display_errors(
[purview],
[i.id for i in applies_to],
[i.severity for i in applies_to],
[i.message for i in applies_to],
)
else:
click.secho(
"Summary: No validation errors among {} file(s)".format(nfiles),
bold=True,
fg="green",
raise NotImplementedError(
"The `grouping` parameter values currently supported are " "path or None."
)

validation_errors = [i for i in issues if i.severity == Severity.ERROR]

def display_errors(path, errors):
if not errors:
lgr.info("%s: ok", path)
if validation_errors:
raise SystemExit(1)
else:
click.secho("No errors found.", fg="green")


def _get_severity_color(severities):

if Severity.ERROR in severities:
return "red"
elif Severity.WARNING in severities:
return "yellow"
else:
return "blue"


def display_errors(
purviews: List[str],
errors: List[str],
severities: List[Severity],
messages: List[str],
) -> None:
"""
Unified error display for validation CLI, which auto-resolves grouping logic based on
the length of input lists.


Notes
-----
* There is a bit of roundabout and currently untestable logic to deal with potential cases
where the same error has multiple error message, could be removed in the future and removed
by assert if this won't ever be the case.
"""

if all(len(cast(list, i)) == 1 for i in [purviews, errors, severities, messages]):
fg = _get_severity_color(severities)
error_message = f"[{errors[0]}] {purviews[0]} — {messages[0]}"
click.secho(error_message, fg=fg)
elif len(purviews) == 1:
group_message = f"{purviews[0]}: {pluralize(len(errors), 'issue')} detected."
fg = _get_severity_color(severities)
click.secho(group_message, fg=fg)
for error, severity, message in zip(errors, severities, messages):
error_message = f" [{error}] {message}"
fg = _get_severity_color([severity])
click.secho(error_message, fg=fg)
elif len(errors) == 1:
fg = _get_severity_color(severities)
group_message = (
f"{errors[0]}: detected in {pluralize(len(purviews), 'purviews')}"
)
if len(set(messages)) == 1:
error_message += f" — {messages[0]}."
click.secho(group_message, fg=fg)
for purview, severity in zip(purviews, severities):
error_message = f" {purview}"
fg = _get_severity_color([severity])
click.secho(error_message, fg=fg)
else:
error_message += "."
for purview, severity, message in zip(purviews, severities, messages):
error_message = f" {purview} — {message}"
fg = _get_severity_color([severity])
click.secho(error_message, fg=fg)
else:
lgr.error("%s: %d error(s)", path, len(errors))
for error in errors:
lgr.error(" %s", error)
for purview, error, severity, message in zip(
purviews, errors, severities, messages
):
fg = _get_severity_color([severity])
error_message = f"[{error}] {purview} — {message}"
click.secho(error_message, fg=fg)
Loading