Great Expectations - Run Expectation Suite

Description

The GE - Run Expectation Suite custom step enables SAS Studio Flow users to use the Python Great Expectations (GE) library to validate a dataset against an expectation suite created by the GE - Generate Expectation Suite custom step. This custom step requires the output files of the GE - Generate Expectation Suite step (an expectation suite .json file and a data context .yml file) and the location where they are stored. It returns six tables reporting on how the input data performed against the expectation suite.
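An expectation suite saved by GE is a JSON document whose "expectations" list holds entries with an "expectation_type" and its "kwargs" (for column-level expectations, a "column" key). The sketch below assumes that layout and shows how a folder plus a suite name map to the .json file this step reads, and how the suite's expectations can be listed. The helper name `load_suite` is hypothetical; it does not call the Great Expectations API and is not the step's own code.

```python
import json
import os


def load_suite(folder, suite_name):
    """Read <folder>/<suite_name>.json and list its expectations.

    Hypothetical helper. Assumes the GE suite layout: an "expectations" list
    whose entries carry "expectation_type" and "kwargs". Returns
    (suite name, [(expectation_type, column or None), ...]).
    """
    # The step takes the suite name without the .json extension.
    path = os.path.join(folder, suite_name + ".json")
    with open(path, encoding="utf-8") as f:
        suite = json.load(f)
    expectations = [
        # Table-level expectations have no "column" kwarg, so None is returned for them.
        (exp["expectation_type"], exp.get("kwargs", {}).get("column"))
        for exp in suite.get("expectations", [])
    ]
    return suite.get("expectation_suite_name", suite_name), expectations
```

For example, `load_suite("/path/to/folder", "taxi_exp")` would read `/path/to/folder/taxi_exp.json`.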

User Interface

  • Generate Expectations tab

    Screenshots: Standalone mode, Flow mode
  • About tab

Requirements

  • Tested on Viya version Stable 2023.04.
  • Python's great_expectations library, version 0.16.8 or later.
  • Python's pandas library version 1.5.3.
  • Python's json library.
  • Python's os library.
  • Python's numpy library version 1.23.5.
  • Python's datetime library.
  • The expectation suite .json file output by GE - Generate Expectation Suite.
  • The great_expectations.yml file output by GE - Generate Expectation Suite.
  • This custom step requires that Python be deployed and available in your SAS environment. The easiest way to achieve this is to enable and configure the sas-pyconfig job, which also installs the GE package, following the steps indicated in this article.
  • Alternatively, if Python is already available, you can install GE with pip before running this custom step:
import os
import subprocess
import sys

print(os.getcwd())  # --target=. installs GE into this directory
# pip.main() is not part of pip's API since pip 10, so run pip as a module instead
subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'great_expectations', '--target=.'])
sys.path.append('.')  # make packages installed in the working directory importable
sys.path.append('./local/bin')
print(sys.path)
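After installing, it can help to confirm that the installed GE release meets the 0.16.8 minimum listed above. The sketch below is one way to do that with the standard library's `importlib.metadata`; the helper names `version_tuple`, `meets_minimum` and `installed_ge_version` are hypothetical. It compares only the numeric release part of the version string (pre-release tags such as "rc1" are ignored), so it is a simplification of full PEP 440 ordering, which the third-party `packaging` library implements.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Convert the leading numeric part of a version string, e.g. '0.16.8', to (0, 16, 8)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version, minimum="0.16.8"):
    """True if `version` is at least `minimum`, comparing release numbers only."""
    a, b = version_tuple(version), version_tuple(minimum)
    # Pad with zeros so that e.g. '0.16' compares like '0.16.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_ge_version():
    """Return the installed great_expectations version string, or None if it is not installed."""
    try:
        return metadata.version("great_expectations")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_ge_version()
    print(found, found is not None and meets_minimum(found))
```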

Usage

Use the following code to create example data sets for use with the GE - Generate Expectation Suite and GE - Run Expectation Suite custom steps.

/* create example data for rule generation and rule execution */
/* Example data sets will be created from sashelp.cars */

/* set values for training and validation */
%let propTrain = 0.7;         /* proportion of training data */
%let propValid = 0.3;         /* proportion of validation data */

/* create a separate data set for each role */
data Train Validate;
array p[2] _temporary_ (&propTrain, &propValid);
set Sashelp.Cars;
call streaminit(12);         /* set random number seed */
/* RAND("Table") returns 1 or 2 with the specified probabilities */
_k = rand("Table", of p[*]);
if      _k = 1 then output Train;
else if _k = 2 then output Validate;
drop _k;
run;

/* Train should be used with GE - Generate Expectation Suite */
/* Validate should be used with GE - Run Expectation Suite */
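For readers who want the same kind of split outside SAS, here is a rough Python analogue of the DATA step above: each record is independently assigned to Train with probability 0.7 or to Validate with probability 0.3. The function name `split_train_validate` is hypothetical, and because it uses Python's `random` module rather than the SAS RAND function, the same seed (12) will not reproduce the SAS assignments.

```python
import random

PROP_TRAIN = 0.7  # mirrors &propTrain
PROP_VALID = 0.3  # mirrors &propValid


def split_train_validate(records, seed=12, p_train=PROP_TRAIN, p_valid=PROP_VALID):
    """Randomly assign each record to Train or Validate (hypothetical helper).

    Like RAND("Table", of p[*]) in the DATA step, this makes one weighted draw
    per record, so the split sizes are only approximately 70/30.
    """
    rng = random.Random(seed)  # seeded generator, analogous to CALL STREAMINIT
    train, validate = [], []
    for record in records:
        # choices() returns a list with one category drawn using the given weights
        k = rng.choices([0, 1], weights=[p_train, p_valid])[0]
        (train if k == 0 else validate).append(record)
    return train, validate
```

Because the draw is per record, rerunning with the same seed gives the same split, but a different seed changes both membership and group sizes.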
  • Parameters

    • Folder selector: Select the directory where the expectation suite .json and the great_expectations.yml files output by the GE - Generate Expectation Suite custom step are stored.
    • Expectation suite name: The name of the expectation suite to use, without the .json file extension. For example, if the expectation suite was saved as taxi_exp.json, enter taxi_exp.
  • Outputs

    • Suite report: Summary statistics on how the data performed against the expectation suite (number of expectations evaluated, number of expectations passed, number of expectations failed, and percent success).
    • Good expectations: Summary of the expectations (and their corresponding columns) that succeeded.
    • Bad expectations: Summary of the expectations (and their corresponding columns) that failed.
    • Good records: Contains records that meet the expectation suite's criteria.
    • Bad records: Contains records that do not meet the expectation suite's criteria.
    • Exceptions: Detailed statistics on the failed records and which rules they failed.
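The six outputs are related: records are split into good and bad, expectations are split into good and bad, and the suite report summarizes the expectation counts. The sketch below illustrates those relationships with a simplified, row-level model in plain Python. It assumes each expectation is a per-value check `(name, column, predicate)` and counts an expectation as successful only if every row passes it; real GE expectations can be table-level or use a `mostly` threshold and are evaluated by GE itself. The function name `run_suite`, the tuple layout and the dictionary keys are hypothetical and do not reproduce the step's actual table columns.

```python
def run_suite(records, expectations):
    """Return simplified analogues of the step's six output tables.

    records: list of dicts, one per row.
    expectations: list of (name, column, predicate) with predicate(value) -> bool;
    names are assumed to be unique.
    """
    good_records, bad_records, exceptions = [], [], []
    failed_expectations = set()
    for row_number, record in enumerate(records):
        # Names of every expectation this row violates.
        failed = [name for name, column, check in expectations if not check(record[column])]
        if failed:
            bad_records.append(record)
            exceptions.append({"row": row_number, "failed": failed})
            failed_expectations.update(failed)
        else:
            good_records.append(record)

    good_expectations = [(n, c) for n, c, _ in expectations if n not in failed_expectations]
    bad_expectations = [(n, c) for n, c, _ in expectations if n in failed_expectations]
    evaluated = len(expectations)
    report = {
        "evaluated": evaluated,
        "successful": len(good_expectations),
        "unsuccessful": len(bad_expectations),
        "success_percent": 100.0 * len(good_expectations) / evaluated if evaluated else 100.0,
    }
    return report, good_expectations, bad_expectations, good_records, bad_records, exceptions
```

In this model a record lands in the bad records table as soon as it violates any expectation, and the exceptions table records which expectations each bad record violated.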

Change Log

  • Version 1.1 (19OCT2023)
    • Removed unsupported sd2df parameters; added clear statements to keep the user directory clean
  • Version 1.0 (12OCT2023)
    • Initial version