Feature: Row Level Results #451
@@ -41,4 +41,6 @@ case class MaxLength(column: String, where: Option[String] = None)
  }

  override def filterCondition: Option[String] = where

  private def criterion: Column = length(conditionalSelection(column, where)).cast(DoubleType)
This is a private method here and a public method in Completeness. If it is required in each analyzer, should it be part of a base class?
The other one is used by a test that constructs an expected object and compares the result of Completeness to that object. It shouldn't be public, because I don't think anything in the code should invoke it. I'll make it package-private inside deequ and mark it as @VisibleForTesting to point out that it's an internal detail of the analyzer.
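As a rough illustration of the package-private approach described in this reply, here is a minimal sketch. It assumes simplified stand-ins: an enclosing object named `deequ` plays the role of the library's root package so the snippet runs as a single file, `criterion` returns a plain `String` rather than a Spark `Column`, and the `tests` object and `expectedCriterion` helper are hypothetical. The `@VisibleForTesting` annotation is mentioned only in a comment because it needs an external dependency.

```scala
// Hypothetical stand-in for the library's root package. An enclosing object
// is used (instead of a real package) so the sketch runs as a single file.
object deequ {
  object analyzers {
    // Simplified stand-in for an analyzer; the real criterion is a Spark Column.
    case class Completeness(column: String) {
      // Qualified private: accessible anywhere inside `deequ`, hidden outside it.
      // In the real code this member would also carry @VisibleForTesting.
      private[deequ] def criterion: String = s"$column IS NOT NULL"
    }
  }

  object tests {
    // Test-like code living inside `deequ` may call the qualified-private member
    // to build the expected value it compares against.
    def expectedCriterion(column: String): String =
      analyzers.Completeness(column).criterion
  }
}

object VisibilityDemo {
  def main(args: Array[String]): Unit = {
    println(deequ.tests.expectedCriterion("item"))
    // deequ.analyzers.Completeness("item").criterion // would not compile from here
  }
}
```

The design choice here is that the member stays reachable from tests in the same package while callers outside the package cannot depend on it.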
Opening this against master instead: #452
982e54d to 4147997
Thanks for updating the branch to master. LGTM.
* Demo implementation of returning row-level results from metrics
* Row-level results from VerificationResult
* Row-level results from VerificationResult
* Fix some tests by expecting a full result column
* Fix Deequ tests to expect full Completeness result
* Checks can return row-level result column names, if any
* Make Analyzer and Constraint classes serializable explicitly
* Refactor tests
* Move row-level management to trait
* MaxLength analyzer returns length of each record
* Refactor VerificationResult to correctly match Metrics to Analyzers
* VerificationResult aggregates all columns for a check
* Return row-level results for two constraints
* Improve naming and comments
---------
Co-authored-by: Yannis Mentekidis <mentekid@amazon.com>
Issue #, if available:
N/A
Description of changes:
This is an early version of the row-level results feature. I have enabled row-level results in three constraints: IsComplete, HasCompleteness, and MaxLength.
This requires defining new classes and changing existing ones in Deequ, both to expose information that was previously unavailable and to distinguish between different analyzers' results.
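To make the idea of a row-level result concrete, here is a minimal sketch in plain Scala. It does not use Spark or the Deequ API: `RowLevelResult`, `completeness`, and `maxLength` are hypothetical names introduced for illustration, and returning `0.0` for empty input is a choice made for this sketch, not the library's behavior. It only shows the difference between a single aggregate metric and a value per input row.

```scala
object RowLevelSketch {
  // Hypothetical container: one aggregate metric plus one value per input row.
  final case class RowLevelResult[A](metric: Double, perRow: Seq[A])

  // Completeness: the per-row flag is true when the value is present;
  // the metric is the fraction of present values (0.0 for empty input, a sketch choice).
  def completeness(values: Seq[Option[String]]): RowLevelResult[Boolean] = {
    val flags = values.map(_.isDefined)
    val metric =
      if (flags.isEmpty) 0.0 else flags.count(identity).toDouble / flags.size
    RowLevelResult(metric, flags)
  }

  // MaxLength: the per-row value is the length of each record (None when missing);
  // the metric is the largest of those lengths (0.0 if there are none, a sketch choice).
  def maxLength(values: Seq[Option[String]]): RowLevelResult[Option[Int]] = {
    val lengths = values.map(_.map(_.length))
    val metric = lengths.flatten.reduceOption(_ max _).map(_.toDouble).getOrElse(0.0)
    RowLevelResult(metric, lengths)
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(Some("a"), None, Some("abcd"))
    println(completeness(data))
    println(maxLength(data))
  }
}
```

With per-row values available, a caller can see which records caused a constraint to fail rather than only the aggregate number.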
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.