How to validate data by running a Checkpoint
This guide will help you validate your data by running a Checkpoint.
As stated in the Getting Started Tutorial section Validate your data using a Checkpoint, the best way to validate data in production with Great Expectations is to use a Checkpoint. The advantage of a Checkpoint is ease of use: it combines the existing configuration needed to set up and perform the validation.
Otherwise, these validation parameters would have to be configured via the API. A Checkpoint encapsulates this "boilerplate" and ensures that all components work together. Finally, running a configured Checkpoint is a one-liner, as described below.
Prerequisites: This how-to guide assumes you have:
- Completed the Getting Started Tutorial
- A working installation of Great Expectations
- Configured a Data Context
- Configured an Expectation Suite
- Configured a Checkpoint
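Before running the Checkpoint, you may want to confirm some of these prerequisites programmatically. The sketch below assumes the default project layout, in which the Data Context directory contains `great_expectations.yml` and each Checkpoint is saved as `checkpoints/<checkpoint_name>.yml`; if your Checkpoint store is configured differently, adjust the path. `check_prerequisites` and `great_expectations_installed` are hypothetical helpers, not part of Great Expectations.

```python
import importlib.util
from pathlib import Path
from typing import List


def great_expectations_installed() -> bool:
    """Return True if the great_expectations package can be imported."""
    return importlib.util.find_spec("great_expectations") is not None


def check_prerequisites(project_root: str, checkpoint_name: str) -> List[str]:
    """Return a list of problems; an empty list means the layout looks usable.

    Assumption: default layout, i.e. `great_expectations.yml` at the root of the
    Data Context directory and Checkpoint configs under `checkpoints/<name>.yml`.
    """
    root = Path(project_root)
    problems: List[str] = []
    if not (root / "great_expectations.yml").is_file():
        problems.append(f"no great_expectations.yml found in {root}")
    if not (root / "checkpoints" / f"{checkpoint_name}.yml").is_file():
        problems.append(f"no Checkpoint configuration named {checkpoint_name!r}")
    return problems
```

For example, `check_prerequisites("/path/to/great_expectations", "my_checkpoint")` returns an empty list when both files are in place.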
You can run the Checkpoint from the CLI in a Terminal shell or using Python.
Terminal
Steps
- Checkpoints can be run like applications from the command line by running:
```
great_expectations checkpoint run my_checkpoint
```
- Next, observe the output, which tells you whether all validations passed or one or more failed. For example:
```
Validation failed!
```
Additional notes
This command returns POSIX exit status codes and prints messages as follows:
+--------------------------------+-----------------+-----------------------+
| **Situation**                  | **Return code** | **Message**           |
+--------------------------------+-----------------+-----------------------+
| all validations passed         | 0               | Validation succeeded! |
+--------------------------------+-----------------+-----------------------+
| one or more validations failed | 1               | Validation failed!    |
+--------------------------------+-----------------+-----------------------+
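If another Python-based tool (a scheduler task, for instance) launches the CLI, it can branch on these exit codes. This is a minimal sketch, assuming the `great_expectations` executable is on `PATH` and the process runs from the project directory; `run_checkpoint_cli` and `describe_exit_code` are hypothetical helper names, not part of Great Expectations.

```python
import subprocess

# Exit codes of `great_expectations checkpoint run`, as listed in the table above.
EXIT_CODE_MESSAGES = {
    0: "Validation succeeded!",
    1: "Validation failed!",
}


def describe_exit_code(code: int) -> str:
    """Translate a `checkpoint run` exit code into a human-readable outcome."""
    return EXIT_CODE_MESSAGES.get(code, f"Unexpected exit code: {code}")


def run_checkpoint_cli(checkpoint_name: str) -> int:
    """Run `great_expectations checkpoint run <checkpoint_name>` and return its exit code.

    Assumes the `great_expectations` executable is on PATH and that the working
    directory is the project directory that contains the Data Context.
    """
    completed = subprocess.run(
        ["great_expectations", "checkpoint", "run", checkpoint_name],
        check=False,  # a failed validation shows up as exit code 1, not as an exception
    )
    return completed.returncode
```

For example, `describe_exit_code(run_checkpoint_cli("my_checkpoint"))` yields `"Validation failed!"` when at least one validation fails. Treating codes other than 0 and 1 as unexpected keeps errors such as a missing executable or configuration problem from being mistaken for a plain validation failure.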
Python
Steps
- First, generate the Python script with the command:
```
great_expectations checkpoint script my_checkpoint
```
- Next, you will see a message telling you where the Python script was created, for example:
```
A Python script was created that runs the checkpoint named: `my_checkpoint`
  - The script is located in `great_expectations/uncommitted/run_my_checkpoint.py`
  - The script can be run with `python great_expectations/uncommitted/run_my_checkpoint.py`
```
- Next, open the script. It should look like this:
```python
"""
This is a basic generated Great Expectations script that runs a Checkpoint.

Checkpoints are the primary method for validating batches of data in production and triggering any followup actions.

A Checkpoint facilitates running a validation as well as configurable Actions such as updating Data Docs, sending a
notification to team members about Validation Results, or storing a result in a shared cloud storage.

Checkpoints can be run directly without this script using the `great_expectations checkpoint run` command. This script
is provided for those who wish to run Checkpoints in Python.

Usage:
- Run this file: `python great_expectations/uncommitted/run_my_checkpoint.py`.
- This can be run manually or via a scheduler such as cron.
- If your pipeline runner supports Python snippets, then you can paste this into your pipeline.
"""
import sys

from great_expectations.checkpoint.types.checkpoint_result import CheckpointResult
from great_expectations.data_context import DataContext

data_context: DataContext = DataContext(
    context_root_dir="/path/to/great_expectations"
)

result: CheckpointResult = data_context.run_checkpoint(
    checkpoint_name="my_checkpoint",
    batch_request=None,
    run_name=None,
)

if not result["success"]:
    print("Validation failed!")
    sys.exit(1)

print("Validation succeeded!")
sys.exit(0)
```
- This Python script can then be invoked directly using Python:
```
python great_expectations/uncommitted/run_my_checkpoint.py
```
Alternatively, the above Python code can be embedded in your pipeline.
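When embedding the script's logic in a pipeline, it is often more convenient to get a boolean back than to have the process exit. The sketch below wraps the same `run_checkpoint()` call and `result["success"]` check used by the generated script; `validate_with_checkpoint` is a hypothetical helper, and accepting any object with a compatible `run_checkpoint()` method is a design choice that makes it easy to test without a real project.

```python
from typing import Any, Optional


def validate_with_checkpoint(
    data_context: Any,
    checkpoint_name: str,
    run_name: Optional[str] = None,
) -> bool:
    """Run a saved Checkpoint and report whether all validations passed.

    `data_context` is expected to be a Great Expectations DataContext, or any
    object with a compatible `run_checkpoint()` method. Unlike the generated
    script, this returns a bool instead of calling sys.exit(), so the
    surrounding pipeline decides how to react to a failure.
    """
    result = data_context.run_checkpoint(
        checkpoint_name=checkpoint_name,
        batch_request=None,
        run_name=run_name,
    )
    # Same success check as the generated script.
    return bool(result["success"])


# Example use inside a pipeline task (requires Great Expectations and a configured project):
# from great_expectations.data_context import DataContext
# context = DataContext(context_root_dir="/path/to/great_expectations")
# if not validate_with_checkpoint(context, "my_checkpoint"):
#     raise RuntimeError("Validation failed!")
```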
Additional notes
- Other arguments to the `DataContext.run_checkpoint()` method may be required, depending on how much of the configuration, and which specifics, were previously saved in the configuration file of the Checkpoint with the corresponding `name`.
- The Checkpoint configuration specified dynamically at runtime, as arguments to `DataContext.run_checkpoint()`, must complement the settings in the Checkpoint configuration file so that together they form a properly and sufficiently configured Checkpoint with the given `name`.
- Please see How to configure a new Checkpoint using test_yaml_config for more Checkpoint configuration examples (including the convenient templating mechanism) and `DataContext.run_checkpoint()` invocation options.
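As an illustration of runtime arguments complementing a saved configuration, the sketch below assumes a Checkpoint whose configuration file leaves the batch request and Expectation Suite unset, and assumes your installed version's `run_checkpoint()` accepts `batch_request` (as a dict) and `expectation_suite_name` keyword arguments; check the signature in your version before relying on it. The helper `run_checkpoint_for_asset` and all datasource, connector, asset, and suite names are hypothetical.

```python
from typing import Any, Dict


def run_checkpoint_for_asset(
    data_context: Any,
    checkpoint_name: str,
    datasource_name: str,
    data_connector_name: str,
    data_asset_name: str,
    expectation_suite_name: str,
) -> Any:
    """Run a saved Checkpoint, supplying the batch and the suite at runtime.

    Intended for a Checkpoint whose configuration file leaves the batch request
    and Expectation Suite unset; these runtime arguments fill in those gaps.
    """
    # Dict form of a batch request: these three keys identify the data to validate.
    batch_request: Dict[str, Any] = {
        "datasource_name": datasource_name,
        "data_connector_name": data_connector_name,
        "data_asset_name": data_asset_name,
    }
    return data_context.run_checkpoint(
        checkpoint_name=checkpoint_name,
        batch_request=batch_request,
        expectation_suite_name=expectation_suite_name,
    )


# Example with hypothetical names (requires a real DataContext):
# result = run_checkpoint_for_asset(
#     context, "my_checkpoint",
#     "my_datasource", "my_data_connector", "my_data_asset", "my_suite",
# )
```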