Versioning and Memoization

Dagster supports versioning code and memoizing previous outputs based on those versions. The APIs related to versioning and memoization are listed below.

Versioning

class dagster.VersionStrategy

Abstract class for defining a strategy to version solids and resources.

When subclassing, get_solid_version must be implemented, and get_resource_version may optionally be implemented.
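As a rough illustration of that subclassing contract, the sketch below uses a hypothetical stand-in base class built on Python's abc module instead of importing dagster. The context classes and their fields (solid_name, resource_name) are assumptions for illustration and do not mirror dagster's real SolidVersionContext or ResourceVersionContext. Having the optional method return None by default is also an assumption of this sketch.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class HypotheticalSolidVersionContext:
    # Hypothetical stand-in for dagster's SolidVersionContext.
    solid_name: str


@dataclass
class HypotheticalResourceVersionContext:
    # Hypothetical stand-in for dagster's ResourceVersionContext.
    resource_name: str


class VersionStrategySketch(ABC):
    """Stand-in mirroring the documented contract, not dagster's class."""

    @abstractmethod
    def get_solid_version(self, context: HypotheticalSolidVersionContext) -> str:
        """Required: return a version string for a solid."""

    def get_resource_version(
        self, context: HypotheticalResourceVersionContext
    ) -> Optional[str]:
        # Optional: subclasses may override; returning None is an assumed default.
        return None


class FixedReleaseStrategy(VersionStrategySketch):
    """Versions every solid by a release label plus its name (illustrative)."""

    def __init__(self, release: str) -> None:
        self.release = release

    def get_solid_version(self, context: HypotheticalSolidVersionContext) -> str:
        return f"{self.release}:{context.solid_name}"


strategy = FixedReleaseStrategy("v1")
print(strategy.get_solid_version(HypotheticalSolidVersionContext("clean_data")))
# v1:clean_data
print(strategy.get_resource_version(HypotheticalResourceVersionContext("db")))
# None
```

Because get_solid_version is declared abstract, instantiating a subclass that omits it raises TypeError, which is how this sketch enforces the "must be implemented" rule.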

get_solid_version should ingest a SolidVersionContext, and get_resource_version should ingest a ResourceVersionContext. From its context, each method should synthesize a unique string, called a version, which is tagged to the outputs of that solid in the pipeline. Providing a VersionStrategy instance to a job enables memoization on that job: only steps whose outputs do not have an up-to-date version will run.
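The rule that only steps without an up-to-date version run can be sketched as a simple comparison between the version each step would produce now and the version recorded for its previous output. This is a simplified model under stated assumptions: the function steps_to_execute and its dictionary inputs are hypothetical, each step's version is treated as already computed, and how dagster combines upstream, config, and resource versions is not modeled.

```python
from typing import Dict, List, Optional


def steps_to_execute(
    step_versions: Dict[str, str],
    stored_versions: Dict[str, Optional[str]],
) -> List[str]:
    """Return the steps whose outputs lack an up-to-date version.

    step_versions: version each step would produce now (hypothetical input).
    stored_versions: version recorded for each step's previous output, if any.
    """
    to_run = []
    for step, current in step_versions.items():
        # Skip a step only when a stored output exists with the same version.
        if stored_versions.get(step) != current:
            to_run.append(step)
    return to_run


current = {"load": "a1", "clean": "b2", "train": "c3"}
previous = {"load": "a1", "clean": "b1"}  # "clean" changed; "train" never ran
print(steps_to_execute(current, previous))
# ['clean', 'train']
```

A step with no recorded output is treated the same as one with a stale version, since in both cases there is nothing up to date to reuse.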

class dagster.SourceHashVersionStrategy
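No description is given for SourceHashVersionStrategy here. Its name suggests versions derived from a hash of source code; the sketch below illustrates only that general idea by hashing a source string with hashlib, and is not dagster's implementation. The helper name version_from_source is hypothetical. In practice, source text for a Python function can be obtained with inspect.getsource, provided the function is defined in a file.

```python
import hashlib


def version_from_source(source: str) -> str:
    """Hypothetical helper: derive a version string from source text."""
    # Identical source text always yields the same digest; any edit changes it.
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


original = "def add(a, b):\n    return a + b\n"
edited = "def add(a, b):\n    return b + a\n"

v1 = version_from_source(original)
v2 = version_from_source(edited)
print(v1 == version_from_source(original))  # True
print(v1 == v2)  # False
```

One consequence of hashing raw text is that purely cosmetic edits, such as reformatting or changing a comment, also produce a new version and therefore invalidate memoized outputs.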

Memoization

class dagster.MemoizableIOManager

Base class for IO managers that support memoized execution. Users should implement the load_input and handle_output methods described in the IOManager API, as well as the has_output method, which returns a boolean indicating whether a data object can be found.

abstract has_output(context)

The user-defined method that returns whether data exists given the metadata.

Parameters

context (OutputContext) – The context of the step performing this check.

Returns

True if there is data present that matches the provided context. False otherwise.

Return type

bool

See also: dagster.IOManager.
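To show how the three methods fit together, here is a minimal in-memory sketch. It avoids importing dagster, so it does not subclass dagster.MemoizableIOManager as real code presumably would. The context classes are hypothetical stand-ins, and their fields (step_key, name, version, upstream_output) are assumptions chosen for illustration. Keying stored objects by version is one possible design, not a documented requirement.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class HypotheticalOutputContext:
    # Stand-in for dagster's OutputContext; these fields are assumptions.
    step_key: str
    name: str
    version: Optional[str]


@dataclass
class HypotheticalInputContext:
    # Stand-in for dagster's InputContext; points at the upstream output.
    upstream_output: HypotheticalOutputContext


Key = Tuple[str, str, Optional[str]]


class InMemoryMemoizableIOManager:
    """Illustrative IO manager implementing the three documented methods."""

    def __init__(self) -> None:
        self._store: Dict[Key, Any] = {}

    @staticmethod
    def _key(context: HypotheticalOutputContext) -> Key:
        # Including the version means a new version never matches old data.
        return (context.step_key, context.name, context.version)

    def handle_output(self, context: HypotheticalOutputContext, obj: Any) -> None:
        self._store[self._key(context)] = obj

    def load_input(self, context: HypotheticalInputContext) -> Any:
        return self._store[self._key(context.upstream_output)]

    def has_output(self, context: HypotheticalOutputContext) -> bool:
        # True only if data stored under this exact step, output, and version exists.
        return self._key(context) in self._store


manager = InMemoryMemoizableIOManager()
out_v1 = HypotheticalOutputContext("clean", "result", "v1")
manager.handle_output(out_v1, [1, 2, 3])
print(manager.has_output(out_v1))  # True
print(manager.has_output(HypotheticalOutputContext("clean", "result", "v2")))  # False
print(manager.load_input(HypotheticalInputContext(out_v1)))  # [1, 2, 3]
```

Since has_output matches on the version as well as the step and output name, bumping a version makes previously stored data invisible to the check, so the step runs again.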

dagster.MEMOIZED_RUN_TAG

Provide this tag to a run to turn memoization on or off: {MEMOIZED_RUN_TAG: "true"} turns memoization on, while {MEMOIZED_RUN_TAG: "false"} turns it off.
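Resolving the tag might look like the sketch below. It does not import dagster, so the tag key is a hypothetical placeholder; real code would use dagster.MEMOIZED_RUN_TAG itself. Falling back to a default when the tag is absent, and rejecting values other than "true" and "false", are assumptions of this sketch rather than documented behavior.

```python
from typing import Dict

# Hypothetical placeholder key: substitute dagster.MEMOIZED_RUN_TAG in real code.
HYPOTHETICAL_MEMOIZED_RUN_TAG = "example/is_memoized_run"


def memoization_enabled(tags: Dict[str, str], default: bool) -> bool:
    """Resolve whether memoization applies to a run from its tags (illustrative)."""
    value = tags.get(HYPOTHETICAL_MEMOIZED_RUN_TAG)
    if value is None:
        return default  # tag absent: fall back to the job's own setting (assumed)
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"expected 'true' or 'false', got {value!r}")


print(memoization_enabled({HYPOTHETICAL_MEMOIZED_RUN_TAG: "false"}, default=True))  # False
print(memoization_enabled({}, default=True))  # True
```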