Checkpoints and Actions
Introduction

API note: As part of the new modular expectations API in Great Expectations, Validation Operators are evolving into Checkpoints. At some point in the future, Validation Operators will be fully deprecated.
The `batch.validate()` method evaluates one Batch of data against one Expectation Suite and returns a dictionary of Validation Results. This is sufficient when you are exploring your data and getting to know Great Expectations. When deploying Great Expectations in a real data pipeline, you will typically discover additional needs:

- Validating a group of Batches that are logically related (for example, a Checkpoint for all staging tables).
- Validating a Batch against several Expectation Suites (for example, running three suites, `churn.critical`, `churn.warning`, and `churn.drift`, to protect a machine learning model).
- Doing something with the Validation Results (for example, saving them for later review or sending notifications in case of failures).
Checkpoints provide a convenient abstraction for bundling the validation of a Batch (or Batches) of data against an Expectation Suite (or several), as well as the actions that should be taken after the validation. Like Expectation Suites and Validation Results, Checkpoints are managed using a Data Context and have their own Store, which is used to persist their configurations to YAML files. These configurations can be committed to version control and shared with your team.

The classes that implement Checkpoints are in the `great_expectations.checkpoint` module.
Validation Actions

Actions are Python classes with a `run` method that takes the result of validating a Batch against an Expectation Suite and does something with it (for example, saving Validation Results to disk or sending a Slack notification). Classes that implement this API can be added to the list of actions used by a particular Checkpoint.

Classes that implement Actions can be found in the `great_expectations.checkpoint.actions` module.
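To illustrate this contract only, the sketch below models Actions in plain Python without Great Expectations. The class names `ValidationAction`, `StoreResultAction`, and `NotifyOnFailureAction`, the `run_actions` helper, and the dict shape of the validation result are all hypothetical stand-ins; the library's own Action base class and `run` signature are not reproduced here.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List


class ValidationAction(ABC):
    """Hypothetical Action interface: a single `run` method that consumes a validation result."""

    @abstractmethod
    def run(self, validation_result: Dict[str, Any]) -> Dict[str, Any]:
        """Do something with the result and return a small dict describing what was done."""


class StoreResultAction(ValidationAction):
    """Stand-in for saving results: appends each result to an in-memory list."""

    def __init__(self, store: List[Dict[str, Any]]) -> None:
        self.store = store

    def run(self, validation_result: Dict[str, Any]) -> Dict[str, Any]:
        self.store.append(validation_result)
        return {"class": type(self).__name__}


class NotifyOnFailureAction(ValidationAction):
    """Stand-in for a Slack notification: calls `send` only when validation failed."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self.send = send

    def run(self, validation_result: Dict[str, Any]) -> Dict[str, Any]:
        notified = not validation_result["success"]
        if notified:
            self.send(f"Validation against {validation_result['suite']} failed")
        return {"class": type(self).__name__, "notified": notified}


def run_actions(
    validation_result: Dict[str, Any], actions: Dict[str, ValidationAction]
) -> Dict[str, Dict[str, Any]]:
    """Run each named action in order and collect its output under the action's name."""
    return {name: action.run(validation_result) for name, action in actions.items()}
```

Each action returns a small dictionary describing what it did, keyed by the action's name, so the caller can record what happened after every validation.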
Checkpoint configuration

A Checkpoint uses its configuration to determine what data to validate against which Expectation Suite(s), and what actions to perform on the Validation Results. These validations and actions are executed by calling the Checkpoint's `run` method (analogous to calling `validate` with a single Batch). Checkpoint configurations are very flexible. At one end of the spectrum, you can specify a complete configuration in a Checkpoint's YAML file and simply call `my_checkpoint.run()`. At the other end, you can specify a minimal configuration in the YAML file and provide the missing keys as kwargs when calling `run`.

At runtime, a Checkpoint configuration has three required and three optional keys, and is built using a combination of the YAML configuration and any kwargs passed in at runtime:
Required keys

- `name`: user-selected Checkpoint name (e.g. "staging_tables")
- `config_version`: version number of the Checkpoint configuration
- `validations`: a list of dictionaries that describe each validation that is to be executed, including any actions. Each validation dictionary has three required and three optional keys:
  - Required keys:
    - `batch_request`: a dictionary describing the batch of data to validate (to learn more about specifying Batches, see Dividing data assets into Batches)
    - `expectation_suite_name`: the name of the Expectation Suite to validate the batch of data against
    - `action_list`: a list of actions to perform after each batch is validated
  - Optional keys:
    - `name`: providing a name allows the validation to be referenced by name inside the run (e.g. "user_table_validation")
    - `evaluation_parameters`: used to define named parameters using Great Expectations Evaluation Parameter syntax
    - `runtime_configuration`: provided to the Validator's `runtime_configuration` (e.g. `result_format`)

Optional keys

- `class_name`: the class of the Checkpoint to be instantiated; defaults to `Checkpoint`
- `template_name`: the name of another Checkpoint to use as a base template
- `run_name_template`: a template to create run names, using environment variables and datetime-template syntax (e.g. "%Y-%M-staging-$MY_ENV_VAR")
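A `run_name_template` combines datetime codes with environment variables. Assuming the datetime codes follow Python's `strftime` conventions and `$VAR` references are read from the environment, expansion could look like the hypothetical `expand_run_name_template` below; it is a sketch, not the library's implementation. Note that in `strftime` syntax `%M` means minutes, while `%m` is the month.

```python
import os
from datetime import datetime


def expand_run_name_template(template: str, run_time: datetime) -> str:
    """Fill datetime codes (strftime syntax), then substitute $VAR / ${VAR} from the environment.

    strftime runs first so that '%' characters inside environment variable values
    are not mistaken for datetime codes. os.path.expandvars leaves unknown
    variables unchanged.
    """
    return os.path.expandvars(run_time.strftime(template))
```

For example, with `MY_ENV_VAR=dev` and a run at 2021-03-05 14:07, `"%Y-%m-staging-$MY_ENV_VAR"` expands to `2021-03-staging-dev`, whereas `"%Y-%M-staging-$MY_ENV_VAR"` gives `2021-07-staging-dev`.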
Configuration defaults and parameter override behavior

Checkpoint configurations follow a nested pattern, where more general keys provide defaults for more specific ones. For instance, any required validation dictionary key (e.g. `expectation_suite_name`) can be specified at the top level (i.e. at the same level as the `validations` list), where it serves as a runtime default. Starting at the earliest reference template, when a configuration key is re-specified, its value is replaced, updated, or appended, or an error is raised, depending on the key.
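The top-level-default rule can be sketched with plain dictionaries. The helper `apply_top_level_defaults` below is hypothetical and deliberately narrow: it only copies a required validation key down from the top level when a validation omits it, and raises if a key is still missing. It does not model the replace, update, and append rules listed below.

```python
import copy
from typing import Any, Dict, List

REQUIRED_VALIDATION_KEYS = ("batch_request", "expectation_suite_name", "action_list")


def apply_top_level_defaults(config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the validations list with top-level values filled in where a validation omits them."""
    resolved = []
    for validation in config.get("validations", []):
        merged = copy.deepcopy(validation)  # never mutate the caller's config
        for key in REQUIRED_VALIDATION_KEYS:
            # A key set on the validation itself wins; otherwise fall back to the top level.
            if key not in merged and key in config:
                merged[key] = copy.deepcopy(config[key])
        missing = [key for key in REQUIRED_VALIDATION_KEYS if key not in merged]
        if missing:
            raise ValueError(f"validation is missing required keys: {missing}")
        resolved.append(merged)
    return resolved
```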
Replaced

- `name`
- `module_name`
- `class_name`
- `run_name_template`
- `expectation_suite_name`

Updated

- `batch_request`: at runtime, if a key is redefined, an error will be thrown
- `action_list`: actions that share the same user-defined name will be updated; otherwise, a new action will be appended
- `evaluation_parameters`
- `runtime_configuration`

Appended

- `action_list`: actions that share the same user-defined name will be updated; otherwise, a new action will be appended
- `validations`
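The update-or-append behavior of `action_list` amounts to a merge keyed on each action's `name`. The function `merge_action_lists` below is a hypothetical sketch of that rule, assuming every action is a dict with a unique `name`; it performs a shallow update of same-named entries, which may be coarser than what Great Expectations actually does.

```python
from typing import Any, Dict, List


def merge_action_lists(
    base: List[Dict[str, Any]], override: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Update actions whose `name` already exists; append the rest in order."""
    merged = [dict(action) for action in base]  # shallow copies, so `base` is untouched
    position = {action["name"]: i for i, action in enumerate(merged)}
    for action in override:
        name = action["name"]
        if name in position:
            # Same user-defined name: update the existing entry in place (shallow update).
            merged[position[name]].update(action)
        else:
            # New name: append it and remember its position for later overrides.
            position[name] = len(merged)
            merged.append(dict(action))
    return merged
```

Keeping updated actions at their original position preserves the execution order defined by the base configuration.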
SimpleCheckpoint class

For many use cases, the `SimpleCheckpoint` class can be used to simplify the process of specifying a Checkpoint configuration. `SimpleCheckpoint` provides a basic set of actions (store Validation Result, store evaluation parameters, update Data Docs, and, optionally, send a Slack notification), allowing you to omit an `action_list` from your configuration and at runtime.

Configurations using the `SimpleCheckpoint` class can optionally specify four additional top-level keys that customize and extend the basic set of default actions:

- `site_names`: a list of Data Docs site names to update as part of the update Data Docs action; defaults to "all"
- `slack_webhook`: if provided, an action will be added that sends a Slack notification to the provided webhook
- `notify_on`: used to define when a notification is fired, according to Validation Result outcome: `all`, `failure`, or `success`; defaults to `all`
- `notify_with`: a list of Data Docs site names for which to include a URL in any notifications; defaults to `all`
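To see what the four extra keys stand for, the hypothetical helper below builds an explicit `action_list` from them. It is modeled on the "Equivalent YAML, using Checkpoint" example on this page rather than on the `SimpleCheckpoint` source, and the way it passes a non-default `site_names` to the Data Docs action is an assumption.

```python
from typing import Any, Dict, List, Optional, Union


def expand_simple_checkpoint_actions(
    site_names: Union[str, List[str]] = "all",
    slack_webhook: Optional[str] = None,
    notify_on: str = "all",
    notify_with: Union[str, List[str]] = "all",
) -> List[Dict[str, Any]]:
    """Build an explicit action_list from SimpleCheckpoint-style keys (illustrative only)."""
    update_docs: Dict[str, Any] = {"class_name": "UpdateDataDocsAction"}
    if site_names != "all":
        # Assumption: a specific list of sites is passed through to the Data Docs action.
        update_docs["site_names"] = site_names
    actions: List[Dict[str, Any]] = [
        {"name": "store_validation_result", "action": {"class_name": "StoreValidationResultAction"}},
        {"name": "store_evaluation_params", "action": {"class_name": "StoreEvaluationParametersAction"}},
        {"name": "update_data_docs", "action": update_docs},
    ]
    if slack_webhook:
        # The Slack action is only added when a webhook is given.
        actions.append(
            {
                "name": "send_slack_notification",
                "action": {
                    "class_name": "SlackNotificationAction",
                    "slack_webhook": slack_webhook,
                    "notify_on": notify_on,
                    "notify_with": notify_with,
                    "renderer": {
                        "module_name": "great_expectations.render.renderer.slack_renderer",
                        "class_name": "SlackRenderer",
                    },
                },
            }
        )
    return actions
```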
CheckpointResult

The return object of a Checkpoint run is a `CheckpointResult` object. The `run_results` attribute forms the backbone of this type and defines the basic contract for what a Checkpoint's `run` method returns. It is a dictionary where the top-level keys are the `ValidationResultIdentifier`s of the Validation Results generated in the run. Each value is a dictionary having, at minimum, a `validation_result` key containing an `ExpectationSuiteValidationResult`, and an `actions_results` key containing a dictionary whose top-level keys are the names of the actions performed after that particular validation. The values of that dictionary contain any relevant outputs of each action; at minimum, and in many cases, this is just a dictionary with the action's class name.

The `run_results` dictionary can contain other keys that are relevant for a specific Checkpoint implementation. For example, the `run_results` dictionary from a `WarningAndFailureExpectationSuiteCheckpoint` might have an extra key named "expectation_suite_severity_level" to indicate whether the suite is at a "warning" or a "failure" level.

`CheckpointResult` objects include many convenience methods (e.g. `list_data_asset_names`) that make working with Checkpoint results easier. You can learn more about these methods in the documentation for the class `great_expectations.checkpoint.types.checkpoint_result.CheckpointResult`.

Below is an example of a `CheckpointResult` object, which itself contains `RunIdentifier`, `ValidationResultIdentifier`, `ExpectationSuiteValidationResult`, and `CheckpointConfig` objects.
Example CheckpointResult:

```python
results = {
    "run_id": RunIdentifier,
    "run_results": {
        ValidationResultIdentifier: {
            "validation_result": ExpectationSuiteValidationResult,
            "actions_results": {
                "<ACTION NAME FOR STORING VALIDATION RESULTS>": {
                    "class": "StoreValidationResultAction"
                }
            },
        }
    },
    "checkpoint_config": CheckpointConfig,
    "success": True,
}
```
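Since `run_results` is an ordinary mapping, summarizing a run amounts to iterating over its values. The sketch below uses plain dicts as stand-ins for the real objects, assuming each validation result exposes `success` and `meta` keys (the `meta` lookups mirror the examples on this page); `summarize_run_results` is a hypothetical helper, not one of the `CheckpointResult` convenience methods.

```python
from typing import Any, Dict, List, Tuple


def summarize_run_results(run_results: Dict[Any, Dict[str, Any]]) -> List[Tuple[str, str, bool]]:
    """Return (expectation suite, data asset, success) for each validation in the run."""
    summary = []
    for run_result in run_results.values():
        validation_result = run_result["validation_result"]
        meta = validation_result["meta"]
        summary.append(
            (
                meta["expectation_suite_name"],
                meta["active_batch_definition"]["data_asset_name"],
                validation_result["success"],
            )
        )
    return summary
```

Filtering the summary on its last element then gives the failed validations, for example to decide whether a pipeline step should stop.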
Checkpoint configuration default and override behavior

The examples below illustrate five configuration styles:

- No nesting
- Nesting with defaults
- Keys passed at runtime
- Using template
- Using SimpleCheckpoint
No nesting

YAML:

```yaml
name: my_checkpoint
config_version: 1
class_name: Checkpoint
run_name_template: "%Y-%M-foo-bar-template-$VAR"
validations:
  - batch_request:
      datasource_name: taxi_datasource
      data_connector_name: default_inferred_data_connector_name
      data_asset_name: yellow_tripdata_sample_2019-01
    expectation_suite_name: my_expectation_suite
    action_list:
      - name: store_validation_result
        action:
          class_name: StoreValidationResultAction
      - name: store_evaluation_params
        action:
          class_name: StoreEvaluationParametersAction
      - name: update_data_docs
        action:
          class_name: UpdateDataDocsAction
    evaluation_parameters:
      GT_PARAM: 1000
      LT_PARAM: 50000
    runtime_configuration:
      result_format:
        result_format: BASIC
        partial_unexpected_count: 20
```

Runtime:

```python
# `context` is assumed to be your project's existing DataContext.
results = context.run_checkpoint(checkpoint_name="my_checkpoint")
```
Nesting with defaults

YAML:

```yaml
name: my_checkpoint
config_version: 1
class_name: Checkpoint
run_name_template: "%Y-%M-foo-bar-template-$VAR"
validations:
  - batch_request:
      datasource_name: taxi_datasource
      data_connector_name: default_inferred_data_connector_name
      data_asset_name: yellow_tripdata_sample_2019-01
  - batch_request:
      datasource_name: taxi_datasource
      data_connector_name: default_inferred_data_connector_name
      data_asset_name: yellow_tripdata_sample_2019-02
expectation_suite_name: my_expectation_suite
action_list:
  - name: store_validation_result
    action:
      class_name: StoreValidationResultAction
  - name: store_evaluation_params
    action:
      class_name: StoreEvaluationParametersAction
  - name: update_data_docs
    action:
      class_name: UpdateDataDocsAction
evaluation_parameters:
  GT_PARAM: 1000
  LT_PARAM: 50000
runtime_configuration:
  result_format:
    result_format: BASIC
    partial_unexpected_count: 20
```

Runtime:

```python
results = context.run_checkpoint(checkpoint_name="my_checkpoint")
```

Results:

```python
first_validation_result = list(results.run_results.items())[0][1]["validation_result"]
second_validation_result = list(results.run_results.items())[1][1]["validation_result"]

first_expectation_suite = first_validation_result["meta"]["expectation_suite_name"]
first_data_asset = first_validation_result["meta"]["active_batch_definition"][
    "data_asset_name"
]
second_expectation_suite = second_validation_result["meta"]["expectation_suite_name"]
second_data_asset = second_validation_result["meta"]["active_batch_definition"][
    "data_asset_name"
]

print(first_expectation_suite)  # my_expectation_suite
print(first_data_asset)  # yellow_tripdata_sample_2019-01
print(second_expectation_suite)  # my_expectation_suite
print(second_data_asset)  # yellow_tripdata_sample_2019-02
```
Keys passed at runtime

YAML:

```yaml
name: my_base_checkpoint
config_version: 1
class_name: Checkpoint
run_name_template: "%Y-%M-foo-bar-template-$VAR"
action_list:
  - name: store_validation_result
    action:
      class_name: StoreValidationResultAction
  - name: store_evaluation_params
    action:
      class_name: StoreEvaluationParametersAction
  - name: update_data_docs
    action:
      class_name: UpdateDataDocsAction
evaluation_parameters:
  GT_PARAM: 1000
  LT_PARAM: 50000
runtime_configuration:
  result_format:
    result_format: BASIC
    partial_unexpected_count: 20
```

Runtime:

```python
results = context.run_checkpoint(
    checkpoint_name="my_base_checkpoint",
    validations=[
        {
            "batch_request": {
                "datasource_name": "taxi_datasource",
                "data_connector_name": "default_inferred_data_connector_name",
                "data_asset_name": "yellow_tripdata_sample_2019-01",
            },
            "expectation_suite_name": "my_expectation_suite",
        },
        {
            "batch_request": {
                "datasource_name": "taxi_datasource",
                "data_connector_name": "default_inferred_data_connector_name",
                "data_asset_name": "yellow_tripdata_sample_2019-02",
            },
            "expectation_suite_name": "my_other_expectation_suite",
        },
    ],
)
```

Results:

```python
first_validation_result = list(results.run_results.items())[0][1]["validation_result"]
second_validation_result = list(results.run_results.items())[1][1]["validation_result"]

first_expectation_suite = first_validation_result["meta"]["expectation_suite_name"]
first_data_asset = first_validation_result["meta"]["active_batch_definition"][
    "data_asset_name"
]
second_expectation_suite = second_validation_result["meta"]["expectation_suite_name"]
second_data_asset = second_validation_result["meta"]["active_batch_definition"][
    "data_asset_name"
]

print(first_expectation_suite)  # my_expectation_suite
print(first_data_asset)  # yellow_tripdata_sample_2019-01
print(second_expectation_suite)  # my_other_expectation_suite
print(second_data_asset)  # yellow_tripdata_sample_2019-02
```
Using template

YAML:

```yaml
name: my_checkpoint
config_version: 1
class_name: Checkpoint
template_name: my_base_checkpoint
validations:
  - batch_request:
      datasource_name: taxi_datasource
      data_connector_name: default_inferred_data_connector_name
      data_asset_name: yellow_tripdata_sample_2019-01
    expectation_suite_name: my_expectation_suite
  - batch_request:
      datasource_name: taxi_datasource
      data_connector_name: default_inferred_data_connector_name
      data_asset_name: yellow_tripdata_sample_2019-02
    expectation_suite_name: my_other_expectation_suite
```

Runtime:

```python
results = context.run_checkpoint(checkpoint_name="my_checkpoint")
```

Results:

```python
first_validation_result = list(results.run_results.items())[0][1]["validation_result"]
second_validation_result = list(results.run_results.items())[1][1]["validation_result"]

first_expectation_suite = first_validation_result["meta"]["expectation_suite_name"]
first_data_asset = first_validation_result["meta"]["active_batch_definition"][
    "data_asset_name"
]
second_expectation_suite = second_validation_result["meta"]["expectation_suite_name"]
second_data_asset = second_validation_result["meta"]["active_batch_definition"][
    "data_asset_name"
]

print(first_expectation_suite)  # my_expectation_suite
print(first_data_asset)  # yellow_tripdata_sample_2019-01
print(second_expectation_suite)  # my_other_expectation_suite
print(second_data_asset)  # yellow_tripdata_sample_2019-02
```
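One way to picture template resolution is to walk the `template_name` chain from the earliest template to the Checkpoint being run, letting later levels win. The hypothetical `resolve_template_chain` below is a simplified model of the example above: it appends `validations` and simply replaces every other key, so it does not reproduce the update rules for keys such as `action_list` or `evaluation_parameters`.

```python
from typing import Any, Dict, List, Optional, Set


def resolve_template_chain(name: str, store: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Resolve a Checkpoint config by applying its template chain, earliest template first."""
    chain: List[Dict[str, Any]] = []
    seen: Set[str] = set()
    current: Optional[str] = name
    while current is not None:
        if current in seen:
            raise ValueError(f"template cycle detected at {current!r}")
        seen.add(current)
        config = store[current]
        chain.append(config)
        current = config.get("template_name")

    resolved: Dict[str, Any] = {}
    for config in reversed(chain):  # earliest template first, requested Checkpoint last
        for key, value in config.items():
            if key == "template_name":
                continue
            if key == "validations":
                # Validations accumulate across levels.
                resolved["validations"] = resolved.get("validations", []) + list(value)
            else:
                # Every other key is simply replaced in this simplified model.
                resolved[key] = value
    return resolved
```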
Using SimpleCheckpoint

YAML, using SimpleCheckpoint:

```yaml
name: my_checkpoint
config_version: 1
class_name: SimpleCheckpoint
validations:
  - batch_request:
      datasource_name: taxi_datasource
      data_connector_name: default_inferred_data_connector_name
      data_asset_name: yellow_tripdata_sample_2019-01
    expectation_suite_name: my_expectation_suite
site_names: all
slack_webhook: <YOUR SLACK WEBHOOK URL>
notify_on: failure
notify_with: all
```

Equivalent YAML, using Checkpoint:

```yaml
name: my_checkpoint
config_version: 1
class_name: Checkpoint
validations:
  - batch_request:
      datasource_name: taxi_datasource
      data_connector_name: default_inferred_data_connector_name
      data_asset_name: yellow_tripdata_sample_2019-01
    expectation_suite_name: my_expectation_suite
action_list:
  - name: store_validation_result
    action:
      class_name: StoreValidationResultAction
  - name: store_evaluation_params
    action:
      class_name: StoreEvaluationParametersAction
  - name: update_data_docs
    action:
      class_name: UpdateDataDocsAction
  - name: send_slack_notification
    action:
      class_name: SlackNotificationAction
      slack_webhook: <YOUR SLACK WEBHOOK URL>
      notify_on: failure
      notify_with: all
      renderer:
        module_name: great_expectations.render.renderer.slack_renderer
        class_name: SlackRenderer
```

Runtime:

```python
results = context.run_checkpoint(checkpoint_name="my_checkpoint")
```

Results:

```python
validation_result = list(results.run_results.items())[0][1]["validation_result"]

expectation_suite = validation_result["meta"]["expectation_suite_name"]
data_asset = validation_result["meta"]["active_batch_definition"]["data_asset_name"]

print(expectation_suite)  # my_expectation_suite
print(data_asset)  # yellow_tripdata_sample_2019-01
```
Additional Notes

To view the full script used on this page, see it on GitHub: