How to validate data by running a Checkpoint

This guide will help you validate your data by running a Checkpoint.

As stated in the Getting Started Tutorial step "Validate your data using a Checkpoint", the best way to validate data in production with Great Expectations is to use a Checkpoint. Its main advantage is ease of use: a Checkpoint combines existing configuration (such as an Expectation Suite, the batch of data to validate, and the Actions to take afterwards) in order to set up and perform the validation.

Otherwise, these validation parameters would have to be configured via the API. A Checkpoint encapsulates this "boilerplate" and ensures that all components work together. Finally, running a configured Checkpoint is a one-liner, as described below.

Prerequisites: This how-to guide assumes you have:

Completed the Getting Started Tutorial
A working installation of Great Expectations
Configured a Data Context
Configured an Expectation Suite
Configured a Checkpoint

You can run the Checkpoint from the CLI in a Terminal shell or using Python.

Terminal

Steps

Checkpoints can be run like applications from the command line by running:

great_expectations checkpoint run my_checkpoint
Validation failed!

Next, observe the output, which tells you whether all validations passed or at least one failed.

Additional notes

This command returns POSIX status codes and prints messages as follows:

+--------------------------------+-----------------+-----------------------+
| **Situation**                  | **Return code** | **Message**           |
+--------------------------------+-----------------+-----------------------+
| all validations passed         | 0               | Validation succeeded! |
+--------------------------------+-----------------+-----------------------+
| one or more validations failed | 1               | Validation failed!    |
+--------------------------------+-----------------+-----------------------+
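
Because the outcome is reported through the exit status, a calling program can branch on it without parsing the printed message. The following is a minimal sketch, assuming the `great_expectations` executable is on the PATH. The helper name `checkpoint_passed` and its `runner` parameter are illustrative and not part of Great Expectations.

```python
import subprocess
from typing import Callable


def checkpoint_passed(
    checkpoint_name: str,
    runner: Callable[..., "subprocess.CompletedProcess"] = subprocess.run,
) -> bool:
    """Run a Checkpoint through the CLI and report whether it passed.

    `runner` is a hypothetical hook (defaulting to subprocess.run) that
    lets the helper be exercised without a real project.
    """
    command = ["great_expectations", "checkpoint", "run", checkpoint_name]
    completed = runner(command)
    # Return code 0 means every validation passed. Any non-zero status
    # (including 1, "Validation failed!") is treated as a failure here.
    return completed.returncode == 0
```

A scheduler job could, for instance, call `checkpoint_passed("my_checkpoint")` and skip downstream steps when it returns `False`.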

Python

Steps

First, generate the Python script with the command:

great_expectations checkpoint script my_checkpoint

Next, you will see a message indicating where the Python script was created, such as:

A Python script was created that runs the checkpoint named: `my_checkpoint`
  - The script is located in `great_expectations/uncommitted/run_my_checkpoint.py`
  - The script can be run with `python great_expectations/uncommitted/run_my_checkpoint.py`

Next, open the script. It should look like this:

"""
This is a basic generated Great Expectations script that runs a Checkpoint.

Checkpoints are the primary method for validating batches of data in production and triggering any followup actions.

A Checkpoint facilitates running a validation as well as configurable Actions such as updating Data Docs, sending a
notification to team members about Validation Results, or storing a result in a shared cloud storage.

Checkpoints can be run directly without this script using the `great_expectations checkpoint run` command.  This script
is provided for those who wish to run Checkpoints in Python.

Usage:
- Run this file: `python great_expectations/uncommitted/run_my_checkpoint.py`.
- This can be run manually or via a scheduler such as cron.
- If your pipeline runner supports Python snippets, then you can paste this into your pipeline.
"""
import sys

from great_expectations.checkpoint.types.checkpoint_result import CheckpointResult
from great_expectations.data_context import DataContext

data_context: DataContext = DataContext(
    context_root_dir="/path/to/great_expectations"
)

result: CheckpointResult = data_context.run_checkpoint(
    checkpoint_name="my_checkpoint",
    batch_request=None,
    run_name=None,
)

if not result["success"]:
    print("Validation failed!")
    sys.exit(1)

print("Validation succeeded!")
sys.exit(0)

This Python script can then be invoked directly using Python:

python great_expectations/uncommitted/run_my_checkpoint.py

Alternatively, the above Python code can be embedded in your pipeline.
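
When the code is embedded, calling `sys.exit()` may be undesirable because it terminates the host process. One possible adaptation, sketched here on the assumption that the pipeline prefers exceptions, wraps the call in a function. `CheckpointFailedError` and `run_checkpoint_or_raise` are hypothetical names. The `run_checkpoint(checkpoint_name=...)` call and the `result["success"]` check are taken from the generated script.

```python
from typing import Any


class CheckpointFailedError(RuntimeError):
    """Hypothetical exception signalling that a Checkpoint did not pass."""


def run_checkpoint_or_raise(data_context: Any, checkpoint_name: str) -> Any:
    """Run the named Checkpoint and raise instead of exiting on failure.

    `data_context` is expected to be an already constructed DataContext,
    as in the generated script.
    """
    result = data_context.run_checkpoint(checkpoint_name=checkpoint_name)
    if not result["success"]:
        raise CheckpointFailedError(
            f"Validation failed for checkpoint {checkpoint_name!r}"
        )
    return result
```

Raising leaves the decision about how to react (retry, alert, abort) to the surrounding pipeline code.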

Additional notes

Other arguments to DataContext.run_checkpoint() may be required, depending on how much of the Checkpoint's configuration was already saved in the configuration file of the Checkpoint with that name.
The configuration supplied at runtime as arguments to DataContext.run_checkpoint() must complement the settings in the Checkpoint configuration file, so that together they form a complete and valid configuration for the named Checkpoint.
Please see How to configure a new Checkpoint using test_yaml_config for more Checkpoint configuration examples (including the convenient templating mechanism) and DataContext.run_checkpoint() invocation options.
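
As an illustration of supplying configuration at runtime, the sketch below forwards only the arguments the caller actually sets, leaving everything else to the saved Checkpoint configuration. The helper name `run_with_runtime_args` is hypothetical. `batch_request` and `run_name` are the keyword arguments shown in the generated script, and which of them your Checkpoint needs depends on its configuration file.

```python
from typing import Any, Dict, Optional


def run_with_runtime_args(
    data_context: Any,
    checkpoint_name: str,
    batch_request: Optional[Dict[str, Any]] = None,
    run_name: Optional[str] = None,
) -> bool:
    """Run a Checkpoint, forwarding only the runtime arguments that are set."""
    runtime_args = {"batch_request": batch_request, "run_name": run_name}
    # Drop unset values so the saved Checkpoint configuration supplies them.
    supplied = {key: value for key, value in runtime_args.items() if value is not None}
    result = data_context.run_checkpoint(checkpoint_name=checkpoint_name, **supplied)
    return bool(result["success"])
```

A call might look like `run_with_runtime_args(data_context, "my_checkpoint", run_name="nightly")`, where the run name is a made-up value.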