Data applications are notoriously difficult to test and are therefore often un- or under-tested.
Creating testable and verifiable data pipelines is one of the focuses of Dagster. We believe ensuring data quality is critical for managing the complexity of data systems. Here, we'll show how to write unit tests for Dagster jobs and ops.
Let's go back to the diamond job we wrote in the prior section and ensure that it's working as expected by writing some unit tests.
We'll start by writing a test for the find_highest_calorie_cereal op, which takes a list of cereals as input and returns the name of the cereal with the most calories. To run an op, we can invoke it directly, as if it were a regular Python function:
def test_find_highest_calorie_cereal():
    cereals = [
        {"name": "hi-cal cereal", "calories": 400},
        {"name": "lo-cal cereal", "calories": 50},
    ]
    result = find_highest_calorie_cereal(cereals)
    assert result == "hi-cal cereal"
We'll also write a test for the entire job. The JobDefinition.execute_in_process method synchronously executes a job and returns an ExecuteInProcessResult, whose methods let us investigate, in detail, the success or failure of execution, the outputs produced by ops, and (as we'll see later) other events associated with execution.
def test_diamond():
    res = diamond.execute_in_process()
    assert res.success
    assert res.output_for_node("find_highest_protein_cereal") == "Special K"
Now you can use pytest, or your test runner of choice, to run unit tests as you develop your data applications.
pytest test_complex_job.py
In production, jobs are often executed in a parallel, streaming way that this kind of API doesn't support; execute_in_process is intended to enable local tests like this one.
Dagster is written to make testing easy in a domain where it has historically been very difficult. Throughout the rest of this tutorial, we'll explore how to write unit tests for each piece of the framework as we learn about it. You can learn more about testing in Dagster by reading the Testing page.
Congratulations! Having made it this far, you now have a working, testable, and maintainable data pipeline. You should now be able to build your own data applications using Dagster!