# Contribution and Testing

## Running tests

You can run all unit tests by running `pytest` in the `great_expectations` directory root.
If you did not configure optional backends for testing, tests against these backends will fail. You can suppress these tests by adding the following flags:

- `--no-postgresql` will skip Postgres tests
- `--no-spark` will skip Spark tests
- `--no-sqlalchemy` will skip all tests using SQLAlchemy (i.e. all database backends)

For example, you can run `pytest --no-spark --no-sqlalchemy` to skip all local backend tests (with the exception of the pandas backend). Please note that these tests will still be run by the CI as soon as you open a PR, so some tests might fail if your code changes affected them.
Note: as of early 2020, the tests generate many warnings. Most of these are generated by dependencies (pandas, sqlalchemy, etc.). You can suppress them with pytest's `--disable-pytest-warnings` flag: `pytest --no-spark --no-sqlalchemy --disable-pytest-warnings`.
## BigQuery tests

In order to run BigQuery tests, you first need to go through the following steps:
- Select or create a Cloud Platform project.
- Set up authentication.
- In your project, create a BigQuery dataset (e.g. named `test_ci`) and set the dataset default table expiration to .1 days.
After setting up authentication, you can run the tests against your project using the environment variables `GE_TEST_BIGQUERY_PROJECT` and `GE_TEST_BIGQUERY_DATASET`, e.g.

    GE_TEST_BIGQUERY_PROJECT=<YOUR_GOOGLE_CLOUD_PROJECT> GE_TEST_BIGQUERY_DATASET=test_ci pytest tests/test_definitions/test_expectations_cfe.py --bigquery --no-spark --no-postgresql
## Writing unit and integration tests

Production code in Great Expectations must be thoroughly tested. In general, we insist on unit tests for all branches of every method, including likely error states. Most new feature contributions should include several unit tests. Contributions that modify or extend existing features should include a test of the new behavior.
Experimental code in Great Expectations need only be tested lightly. We are moving to a convention where experimental features are clearly labeled in documentation and the code itself. However, this convention is not uniformly applied today.
Most of Great Expectations' integration testing is in the CLI, which naturally exercises most of the core code paths. Because integration tests require a lot of developer time to maintain, most contributions should not include new integration tests, unless they change the CLI itself.
Note: we do not currently test Great Expectations against all types of SQL database. CI test coverage for SQL is limited to PostgreSQL, SQLite, MSSQL, and BigQuery. We have observed some bugs because of unsupported features or differences in SQL dialects, and we are actively working to improve dialect-specific support and testing.
## Unit tests for Expectations

One of Great Expectations' important promises is that the same Expectation will produce the same result across all supported execution environments: pandas, SQLAlchemy, and Spark.
To accomplish this, Great Expectations encapsulates unit tests for Expectations as JSON files. These files are used as fixtures and executed using a specialized test runner that executes tests against all execution environments.
Test fixture files are structured as follows:

    {
        "expectation_type" : "expect_column_max_to_be_between",
        "datasets" : [{
            "data" : {...},
            "schemas" : {...},
            "tests" : [...]
        }]
    }
Each item under `datasets` includes three entries: `data`, `schemas`, and `tests`.
### data

...defines a dataframe of sample data to apply Expectations against. The dataframe is defined as a dictionary of lists, with keys containing column names and values containing lists of data entries. All lists within a dataset must have the same length.

    "data" : {
        "w" : [1, 2, 3, 4, 5, 5, 4, 3, 2, 1],
        "x" : [2, 3, 4, 5, 6, 7, 8, 9, null, null],
        "y" : [1, 1, 1, 2, 2, 2, 3, 3, 3, 4],
        "z" : ["a", "b", "c", "d", "e", null, null, null, null, null],
        "zz" : ["1/1/2016", "1/2/2016", "2/2/2016", "2/2/2016", "3/1/2016", "2/1/2017", null, null, null, null],
        "a" : [null, 0, null, null, 1, null, null, 2, null, null]
    },
### schemas

...define the types to be used when instantiating tests against different execution environments, including different SQL dialects. Each schema is defined as a dictionary with column names and types as key-value pairs. If the schema isn't specified for a given execution environment, Great Expectations will introspect values and attempt to guess the schema.

    "schemas": {
        "sqlite": {
            "w" : "INTEGER",
            "x" : "INTEGER",
            "y" : "INTEGER",
            "z" : "VARCHAR",
            "zz" : "DATETIME",
            "a" : "INTEGER"
        },
        "postgresql": {
            "w" : "INTEGER",
            "x" : "INTEGER",
            "y" : "INTEGER",
            "z" : "TEXT",
            "zz" : "TIMESTAMP",
            "a" : "INTEGER"
        }
    },
### tests

...define the tests to be executed against the dataframe. Each item in `tests` must have `title`, `exact_match_out`, `in`, and `out`. The test runner will execute the named Expectation once for each item, with the values in `in` supplied as kwargs.
The test passes if the values in the Expectation Validation Result correspond with the values in `out`. If `exact_match_out` is true, then every field in the Expectation output must have a corresponding, matching field in `out`. If it's false, then only the fields specified in `out` need to match. For most use cases, false is a better fit, because it allows narrower targeting of the relevant output.
`suppress_test_for` is an optional parameter to disable an Expectation for a specific list of backends.

See the example below. For other examples, refer to the existing test fixture files in `tests/test_definitions/`.

    "tests" : [{
        "title": "Basic negative test case",
        "exact_match_out" : false,
        "in": {
            "column": "w",
            "result_format": "BASIC",
            "min_value": null,
            "max_value": 4
        },
        "out": {
            "success": false,
            "observed_value": 5
        },
        "suppress_test_for": ["sqlite"]
    },
    ...]
The test fixture files are stored in subdirectories of `tests/test_definitions/` corresponding to the class of Expectation:
- column_map_expectations
- column_aggregate_expectations
- column_pair_map_expectations
- column_distributional_expectations
- multicolumn_map_expectations
- other_expectations
By convention, the name of the file is the name of the Expectation, with a `.json` suffix. Creating a new JSON file will automatically add the new Expectation tests to the test suite.
Note: If you are implementing a new Expectation, but don't plan to immediately implement it for all execution environments, you should add the new test to the appropriate list(s) in the `candidate_test_is_on_temporary_notimplemented_list` method within `tests/test_utils.py`. Often, we see Expectations developed first for pandas, then later extended to SQLAlchemy and Spark.
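The exact contents of that method change between releases, but conceptually it checks whether an Expectation's name appears in a per-backend "not yet implemented" list. A purely hypothetical sketch of what adding an entry might look like (the real lists and context names in `tests/test_utils.py` may be organized differently):

```python
# Hypothetical sketch only -- consult tests/test_utils.py for the real structure.
def candidate_test_is_on_temporary_notimplemented_list(context, expectation_type):
    if context in ["sqlite", "postgresql", "mysql", "mssql"]:
        # Expectations not yet implemented for SQLAlchemy backends:
        return expectation_type in [
            "expect_column_values_to_match_my_new_condition",  # <- add yours here
        ]
    if context == "spark":
        # Expectations not yet implemented for Spark:
        return expectation_type in []
    return False
```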
You can run just the Expectation tests with `pytest tests/test_definitions/test_expectations.py`.
## Performance testing

### Configuring Data Before Running Performance Tests

The performance tests use BigQuery.
Before running a performance test, set up data with `tests/performance/setup_bigquery_tables_for_performance_test.sh`.
For example:

    GE_TEST_BIGQUERY_PEFORMANCE_DATASET=<YOUR_GCP_PROJECT> tests/performance/setup_bigquery_tables_for_performance_test.sh
For more information on getting started with BigQuery, please refer to the above section on BigQuery tests.
### Running the Performance Tests

Run the performance tests with pytest, e.g.

    pytest tests/performance/test_bigquery_benchmarks.py \
        --bigquery --performance-tests \
        -k 'test_taxi_trips_benchmark[1-True-V3]' \
        --benchmark-json=tests/performance/results/`date "+%H%M"`_${USER}.json \
        --no-spark --no-postgresql -rP -vv
Some benchmarks take a long time to complete. In this example, only the relatively fast `test_taxi_trips_benchmark[1-True-V3]` benchmark is run, and the output should include a runtime like the following:

    --------------------------------------------------- benchmark: 1 tests ------------------------------------------------------
    Name (time in s)                          Min     Max    Mean  StdDev  Median     IQR  Outliers     OPS  Rounds  Iterations
    -----------------------------------------------------------------------------------------------------------------------------
    test_taxi_trips_benchmark[1-True-V3]   5.0488  5.0488  5.0488  0.0000  5.0488  0.0000       0;0  0.1981       1           1
    -----------------------------------------------------------------------------------------------------------------------------
The result is saved for comparisons as described below.
### Comparing Performance Results

Compare test results in this directory with `py.test-benchmark compare`, e.g.

    $ py.test-benchmark compare --group-by name tests/performance/results/initial_baseline.json tests/performance/results/*${USER}.json

    ---------------------------------------------------------------------------- benchmark 'test_taxi_trips_benchmark[1-True-V3]': 2 tests ---------------------------------------------------------------------------
    Name (time in s)                                        Min            Max           Mean         StdDev        Median           IQR  Outliers           OPS  Rounds  Iterations
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    test_taxi_trips_benchmark[1-True-V3] (initial_base)  5.0488 (1.0)   5.0488 (1.0)   5.0488 (1.0)   0.0000 (1.0)  5.0488 (1.0)   0.0000 (1.0)       0;0  0.1981 (1.0)        1           1
    test_taxi_trips_benchmark[1-True-V3] (2114_work)     6.4675 (1.28)  6.4675 (1.28)  6.4675 (1.28)  0.0000 (1.0)  6.4675 (1.28)  0.0000 (1.0)       0;0  0.1546 (0.78)       1           1
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Please refer to the pytest-benchmark documentation for more info.
### Checking in new benchmark results

When creating a pull request that is intended to improve performance, please include benchmark results demonstrating the improvement in the pull request. Please use the script `run_benchmark_multiple_times.sh` to run the benchmark multiple times. Name the tests with the first argument provided to that script. For example, the `tests/performance/results/minimal_multithreading_*.json` files were created with the following command:
    $ tests/performance/run_benchmark_multiple_times.sh minimal_multithreading
## Manual testing

We do manual testing (e.g. against various databases and backends) before major releases and in response to specific bugs and issues.