
Enhance error reporting and handling for large JSON validations #443

Open
NiklasA opened this issue Sep 26, 2024 · 2 comments

Comments


NiklasA commented Sep 26, 2024

First of all, thank you very much for the great work and for providing the datacontract-cli!


While using it, we noticed three topics:

  1. We use the datacontract-cli to check the structure and content of large JSON files. Currently, the validator compiled by fastjsonschema.compile stops at the first error, so if a JSON file contains multiple errors, the datacontract-cli has to be run repeatedly until all of them are fixed. We also considered executing data_contract.test() for each object in the JSON file, but the overhead is too large, resulting in poor performance. Would it be possible to adjust the following call:

    validate = fastjsonschema.compile(
        schema,
        formats={"uuid": r"^[0-9a-fA-F]{8}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{12}$"},
    )

    so that it works more like the following approach:

    validate = fastjsonschema.compile(
        schema,
        formats={"uuid": r"^[0-9a-fA-F]{8}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{12}$"},
    )
    
    def validate_array(data):
        errors = []
        for index, item in enumerate(data):
            try:
                validate(item)
            except fastjsonschema.JsonSchemaException as e:
                # collect the error and continue instead of aborting on the first failure
                errors.append({"index": index, "error": str(e)})
        return errors
  2. Additionally, we noticed that in data_contract.py on line 202, the model name is not included in the list of executed checks.

    run.checks.append(
        Check(type=e.type, name=e.name, result=e.result, engine=e.engine, reason=e.reason, model=e.model)
    )
  3. Lastly, it would be really helpful for debugging purposes if we could identify which JSON object in an array caused the error. Could the validate_json_stream method be extended so that it identifies the primary key of the related model in the data contract and returns the value from the found attribute along with the error if needed?
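To illustrate items 1 and 3 together, here is a minimal sketch of per-item error reporting that also returns the primary key value of the failing record. The `validate_items` helper, the `primary_key` parameter, and the toy validator are hypothetical and not part of datacontract-cli; a compiled fastjsonschema validator would take the place of `toy_validate`:

```python
def validate_items(data, validate, primary_key=None, exc_type=Exception):
    """Validate each item, collecting errors instead of stopping at the first."""
    errors = []
    for index, item in enumerate(data):
        try:
            validate(item)
        except exc_type as e:
            entry = {"index": index, "error": str(e)}
            if primary_key is not None:
                # include the primary key value so the offending record is easy to find
                entry["key"] = item.get(primary_key)
            errors.append(entry)
    return errors

# Toy validator standing in for a compiled fastjsonschema validator:
def toy_validate(item):
    if not isinstance(item.get("id"), str):
        raise ValueError("id must be a string")

data = [{"id": "a"}, {"id": 1}, {"id": "c"}]
errors = validate_items(data, toy_validate, primary_key="id", exc_type=ValueError)
# errors -> [{"index": 1, "error": "id must be a string", "key": 1}]
```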


Let me know if I can help to create a pull request for those topics!

Best regards
Niklas

@jochenchrist
Contributor

Hi @NiklasA,

Thanks for your feedback and these improvement ideas.

  1. I understand the idea, and it would certainly be doable. I hesitate a little: what happens when you have millions of JSON records that all have the same schema issue? Would you really want to see millions of errors in the response? We would need to implement a maximum number of errors (100, say) here.

  2. Fixed and committed to main.

  3. Agree. Not sure about the primary key, but we can think of an index here in the error message.

You are welcome to contribute a PR for 1 and 3.
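The cap suggested in point 1 could look roughly like the following sketch. The `validate_array` signature, `max_errors` parameter, and toy validator are hypothetical, not existing datacontract-cli code:

```python
def validate_array(data, validate, max_errors=100, exc_type=Exception):
    """Collect validation errors, stopping once max_errors have been recorded."""
    errors = []
    for index, item in enumerate(data):
        try:
            validate(item)
        except exc_type as e:
            errors.append({"index": index, "error": str(e)})
            if len(errors) >= max_errors:
                # avoid flooding the report when millions of records share one issue
                break
    return errors

# Toy validator standing in for a compiled fastjsonschema validator:
def toy_validate(item):
    if "name" not in item:
        raise ValueError("missing required field: name")

data = [{}] * 5  # five records, all with the same schema issue
errors = validate_array(data, toy_validate, max_errors=2, exc_type=ValueError)
# errors has length 2, not 5
```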


NiklasA commented Oct 12, 2024

@jochenchrist okay, thank you! I will work on the PR for 1 and 3. It will probably take 3 - 4 weeks because I'm on vacation next week.
