# Contribution and Testing

## Running tests

You can run all unit tests by running `pytest` in the `great_expectations` directory root.
If you did not configure optional backends for testing, tests against these backends will fail. You can suppress these tests by adding the following flags:

- `--no-postgresql` will skip Postgres tests
- `--no-spark` will skip Spark tests
- `--no-sqlalchemy` will skip all tests using SQLAlchemy (i.e. all database backends)

For example, you can run `pytest --no-spark --no-sqlalchemy` to skip all local backend tests (with the exception of the pandas backend). Please note that these tests will still be run by the CI as soon as you open a PR, so some tests might fail if your code changes affected them.
Note: as of early 2020, the tests generate many warnings. Most of these are generated by dependencies (pandas, sqlalchemy, etc.). You can suppress them with pytest's `--disable-pytest-warnings` flag: `pytest --no-spark --no-sqlalchemy --disable-pytest-warnings`.
## BigQuery tests

In order to run BigQuery tests, you first need to go through the following steps:
- Select or create a Cloud Platform project.
- Set up authentication.
- In your project, create a BigQuery dataset (e.g. named `test_ci`) and set the dataset default table expiration to .1 days.
After setting up authentication, you can run the tests against your project using the environment variables `GE_TEST_BIGQUERY_PROJECT` and `GE_TEST_BIGQUERY_DATASET`, e.g.

    GE_TEST_BIGQUERY_PROJECT=<YOUR_GOOGLE_CLOUD_PROJECT> GE_TEST_BIGQUERY_DATASET=test_ci pytest tests/test_definitions/test_expectations_cfe.py --bigquery --no-spark --no-postgresql
## Writing unit and integration tests

Production code in Great Expectations must be thoroughly tested. In general, we insist on unit tests for all branches of every method, including likely error states. Most new feature contributions should include several unit tests. Contributions that modify or extend existing features should include a test of the new behavior.
Experimental code in Great Expectations need only be tested lightly. We are moving to a convention where experimental features are clearly labeled in documentation and the code itself. However, this convention is not uniformly applied today.
Most of Great Expectations' integration testing is in the CLI, which naturally exercises most of the core code paths. Because integration tests require a lot of developer time to maintain, most contributions should not include new integration tests, unless they change the CLI itself.
Note: we do not currently test Great Expectations against all types of SQL database. CI test coverage for SQL is limited to PostgreSQL, SQLite, MSSQL, and BigQuery. We have observed some bugs because of unsupported features or differences in SQL dialects, and we are actively working to improve dialect-specific support and testing.
## Unit tests for Expectations

One of Great Expectations' important promises is that the same Expectation will produce the same result across all supported execution environments: pandas, SQLAlchemy, and Spark.
To accomplish this, Great Expectations encapsulates unit tests for Expectations as JSON files. These files are used as fixtures and executed using a specialized test runner that executes tests against all execution environments.
Test fixture files are structured as follows:

    {
        "expectation_type" : "expect_column_max_to_be_between",
        "datasets" : [{
            "data" : {...},
            "schemas" : {...},
            "tests" : [...]
        }]
    }
Each item under `datasets` includes three entries: `data`, `schemas`, and `tests`.
### data

...defines a dataframe of sample data to apply Expectations against. The dataframe is defined as a dictionary of lists, with keys containing column names and values containing lists of data entries. All lists within a dataset must have the same length.

    "data" : {
        "w" : [1, 2, 3, 4, 5, 5, 4, 3, 2, 1],
        "x" : [2, 3, 4, 5, 6, 7, 8, 9, null, null],
        "y" : [1, 1, 1, 2, 2, 2, 3, 3, 3, 4],
        "z" : ["a", "b", "c", "d", "e", null, null, null, null, null],
        "zz" : ["1/1/2016", "1/2/2016", "2/2/2016", "2/2/2016", "3/1/2016", "2/1/2017", null, null, null, null],
        "a" : [null, 0, null, null, 1, null, null, 2, null, null]
    },
### schemas

...define the types to be used when instantiating tests against different execution environments, including different SQL dialects. Each schema is defined as a dictionary with column names and types as key-value pairs. If the schema isn't specified for a given execution environment, Great Expectations will introspect values and attempt to guess the schema.

    "schemas": {
        "sqlite": {
            "w" : "INTEGER",
            "x" : "INTEGER",
            "y" : "INTEGER",
            "z" : "VARCHAR",
            "zz" : "DATETIME",
            "a" : "INTEGER"
        },
        "postgresql": {
            "w" : "INTEGER",
            "x" : "INTEGER",
            "y" : "INTEGER",
            "z" : "TEXT",
            "zz" : "TIMESTAMP",
            "a" : "INTEGER"
        }
    },
### tests

...define the tests to be executed against the dataframe. Each item in `tests` must have `title`, `exact_match_out`, `in`, and `out`. The test runner will execute the named Expectation once for each item, with the values in `in` supplied as kwargs.
The test passes if the values in the Expectation Validation Result correspond with the values in `out`. If `exact_match_out` is true, then every field in the Expectation output must have a corresponding, matching field in `out`. If it's false, then only the fields specified in `out` need to match. For most use cases, false is a better fit, because it allows narrower targeting of the relevant output.
`suppress_test_for` is an optional parameter to disable an Expectation for a specific list of backends.

See the example below. For other examples, refer to the existing test fixture files in `tests/test_definitions/`.

    "tests" : [{
        "title": "Basic negative test case",
        "exact_match_out" : false,
        "in": {
            "column": "w",
            "result_format": "BASIC",
            "min_value": null,
            "max_value": 4
        },
        "out": {
            "success": false,
            "observed_value": 5
        },
        "suppress_test_for": ["sqlite"]
    },
    ...]
The test fixture files are stored in subdirectories of `tests/test_definitions/` corresponding to the class of Expectation:
- column_map_expectations
- column_aggregate_expectations
- column_pair_map_expectations
- column_distributional_expectations
- multicolumn_map_expectations
- other_expectations
By convention, the name of the file is the name of the Expectation, with a `.json` suffix. Creating a new JSON file will automatically add the new Expectation tests to the test suite.
Note: If you are implementing a new Expectation, but don't plan to immediately implement it for all execution environments, you should add the new test to the appropriate list(s) in the `candidate_test_is_on_temporary_notimplemented_list` method within `tests/test_utils.py`. Often, we see Expectations developed first for pandas, then later extended to SQLAlchemy and Spark.
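The exact contents of that method change between releases, but conceptually it checks whether an Expectation's name appears in a per-backend "not yet implemented" list. A purely hypothetical sketch of what adding an entry might look like (the real lists and context names in `tests/test_utils.py` may be organized differently):

```python
# Hypothetical sketch only -- consult tests/test_utils.py for the real structure.
def candidate_test_is_on_temporary_notimplemented_list(context, expectation_type):
    if context in ["sqlite", "postgresql", "mysql", "mssql"]:
        # Expectations not yet implemented for SQLAlchemy backends:
        return expectation_type in [
            "expect_column_values_to_match_my_new_condition",  # <- add yours here
        ]
    if context == "spark":
        # Expectations not yet implemented for Spark:
        return expectation_type in []
    return False
```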
You can run just the Expectation tests with `pytest tests/test_definitions/test_expectations.py`.
## Performance testing

### Configuring Data Before Running Performance Tests

The performance tests use BigQuery.
Before running a performance test, set up data with `tests/performance/setup_bigquery_tables_for_performance_test.sh`.
For example:

    GE_TEST_BIGQUERY_PEFORMANCE_DATASET=<YOUR_GCP_PROJECT> tests/performance/setup_bigquery_tables_for_performance_test.sh
For more information on getting started with BigQuery, please refer to the above section on BigQuery tests.
### Running the Performance Tests

Run the performance tests with pytest, e.g.

    pytest tests/performance/test_bigquery_benchmarks.py \
        --bigquery --performance-tests \
        -k 'test_taxi_trips_benchmark[1-True-V3]' \
        --benchmark-json=tests/performance/results/`date "+%H%M"`_${USER}.json \
        --no-spark --no-postgresql -rP -vv
Some benchmarks take a long time to complete. In this example, only the relatively fast `test_taxi_trips_benchmark[1-True-V3]` benchmark is run, and the output should include a runtime like the following:

    --------------------------------------------------- benchmark: 1 tests ------------------------------------------------------
    Name (time in s)                          Min     Max    Mean  StdDev  Median     IQR  Outliers     OPS  Rounds  Iterations
    -----------------------------------------------------------------------------------------------------------------------------
    test_taxi_trips_benchmark[1-True-V3]   5.0488  5.0488  5.0488  0.0000  5.0488  0.0000       0;0  0.1981       1           1
    -----------------------------------------------------------------------------------------------------------------------------
The result is saved for comparisons as described below.
### Comparing Performance Results

Compare test results in this directory with `py.test-benchmark compare`, e.g.

    $ py.test-benchmark compare --group-by name tests/performance/results/initial_baseline.json tests/performance/results/*${USER}.json

    ---------------------------------------------------------------------------- benchmark 'test_taxi_trips_benchmark[1-True-V3]': 2 tests ---------------------------------------------------------------------------
    Name (time in s)                                        Min            Max           Mean         StdDev        Median           IQR  Outliers           OPS  Rounds  Iterations
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    test_taxi_trips_benchmark[1-True-V3] (initial_base)  5.0488 (1.0)   5.0488 (1.0)   5.0488 (1.0)   0.0000 (1.0)  5.0488 (1.0)   0.0000 (1.0)       0;0  0.1981 (1.0)        1           1
    test_taxi_trips_benchmark[1-True-V3] (2114_work)     6.4675 (1.28)  6.4675 (1.28)  6.4675 (1.28)  0.0000 (1.0)  6.4675 (1.28)  0.0000 (1.0)       0;0  0.1546 (0.78)       1           1
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Please refer to the pytest-benchmark documentation for more info.
### Checking in new benchmark results

When creating a pull request that is intended to improve performance, please include benchmark results demonstrating the improvement in the pull request. Please use the script `run_benchmark_multiple_times.sh` to run the benchmark multiple times. Name the tests with the first argument provided to that script. For example, the `tests/performance/results/minimal_multithreading_*.json` files were created with the following command:
    $ tests/performance/run_benchmark_multiple_times.sh minimal_multithreading
## Manual testing

We do manual testing (e.g. against various databases and backends) before major releases and in response to specific bugs and issues.