Task Graphs#
Taskgraph’s namesake comes from the fact that it produces a graph of tasks as output. Specifically, it is a directed acyclic graph (DAG) whose nodes are tasks, and whose edges are the dependencies between them. The root of the DAG is a special task called the Decision Task.
For example let’s say you had a lint task, a build and a test that depends on the build. Let’s also say that the lint and test tasks depend on an image task (which builds the docker image they run in). The resulting task graph would be:
Graph Generation#
The graph is generated via a series of steps that run one after the other. Broadly, these steps are:
full_task_set: For all kinds, generate all tasks.
full_task_graph: Create dependency links between tasks using kind-specific mechanisms.
target_task_set: Filter the tasks based on a series of filters that are defined for each project. Tasks are typically filtered out based on the Parameters. For example, release oriented tasks would be removed from the graph if we’re generating it for a pull request. The tasks remaining after this step are called the target tasks.
target_task_graph: Based on the full task graph, calculate the transitive closure of the target task set. That is, the target tasks and all dependencies of those tasks.
optimized_task_graph: Optimize the target task graph using task-specific optimization methods. Optimizations are similar to the target filters in that they cause additional tasks to be pruned. Conceptually the difference is that the tasks in this phase are still relevant to the Parameters being used, it’s just we may choose not to run them for other reasons (like cost).
morphed_task_graph: Morphs are like syntactic sugar. They keep the same meaning, but express it in a lower-level way. These generally work around limitations in the TaskCluster platform, such as number of dependencies or routes in a task.
Create tasks for all tasks in the morphed task graph via the Taskcluster API.
Graph generation is handled by the
TaskGraphGenerator
class. Each phase of the graph
has an associated property and can be generated by accessing it:
from taskgraph.generator import TaskGraphGenerator
generator = TaskGraphGenerator(...)
# kicks off graph generation but returns after the full_task_graph
# step is finished
generator.full_task_graph
The taskgraph
shell command similarly has subcommands that can help you generate
each phase locally:
# generates the full_task_graph and exits
$ taskgraph full
# generates the target_task_graph and exits
$ taskgraph target
# etc..
The decision task uses the command taskgraph decision
, which spans the
whole generation process all the way to task creation in the last step. See
Run Taskgraph Locally for more information on running the
taskgraph
command locally.
Transitive Closure#
Transitive closure is a fancy name for this sort of operation:
start with a set of tasks
add all tasks on which any of those tasks depend
repeat until nothing changes
The effect is this: imagine you start with a linux32 test task and a linux64 test task. In the first round, each test task depends on the test docker image task, so add that image task. Each test also depends on a build, so add the linux32 and linux64 build tasks.
Then repeat: the test docker image task is already present, as are the build tasks, but those build tasks depend on the build docker image task. So add that build docker image task. Repeat again: this time, none of the tasks in the set depend on a task not in the set, so nothing changes and the process is complete.
And as you can see, the graph we’ve built now includes everything we wanted (the test tasks) plus everything required to do that (docker images, builds).
Dependencies#
Dependencies between tasks are represented as labeled edges in the task graph.
They are specified via the dependencies
key which is an object of the form
{ "<edge>": "<label>"}
. The edge
is an arbitrary name which can be used
to refer to the dependency later on. The label
is the task label of the
dependency.
Taskgraph is only able to add a dependency to tasks that have already been
generated. The kind-dependencies
key must be used to determine the order in
which different kinds of tasks are generated. It is specified as a list of
kinds that are guaranteed to be generated before the current kind. It is an
error to depend on a task of the same kind, or to create any cycles in the
kind-dependencies
chain.
Note
All examples assume build-linux
and build-windows
tasks have been
defined elsewhere in a build
kind.
For example, a test task might depend on the artifacts a build task creates. This might be expressed as follows:
kind-dependencies:
- build
tasks:
test-linux:
dependencies:
build: build-linux
# .. rest of task definition ..
test-windows:
dependencies:
build: build-windows
# .. rest of task definition ..
First, note the kind-dependencies
key. This ensures the tasks in the
build
kind have already been generated, and are candidates to be added as
dependencies.
Second, notice how both the test-linux
and test-windows
task use the
same edge name to reference their build dependency. This will allow
transforms later on to find their build dependency even
without knowing which specific task it is they depend on.
Other Types of Dependencies#
Dependencies are typically used to ensure that prerequisites to a task, such as creation of binary artifacts, are completed before that task runs. But dependencies can also be used to schedule follow-up work such as summarizing results of dependencies, sending notifications that dependencies are completed or uploading artifacts to a server. In many of these cases, it may not be desired for the dependent task to “pull in” the dependency as would normally be the case.
To help with these uses cases, there are two additional types of dependencies.
If Dependencies#
The if-dependencies
key (list) can be used to denote a task that should
only run if at least one of these specified dependencies are also run.
Dependencies specified by this key will not be “pulled in”. This makes it
suitable for things like signing builds or uploading symbols.
This key is specified as a list of dependency edge names (e.g, build
rather
than the label of the build), which means the original dependency must be specified
as normal. For example:
kind-dependencies:
- build
tasks:
upload-build:
dependencies:
build: build-windows
if-dependencies:
- build
In the above example, the upload-build
task will only run if the build
task would have run anyway. It will not cause the build
task to get “pulled
into” the graph.
Soft Dependencies#
To add a task depending on arbitrary tasks remaining after the optimization
process is complete, you can use soft-dependencies
, as a list of optimized
tasks labels. This is useful for tasks that need to perform some action on N
other tasks and it is not known how many. Unlike if-dependencies
, tasks
that specify soft-dependencies
will still be scheduled, even if none of the
candidate dependencies are.
For example:
tasks:
notify-build:
soft-dependencies:
- build-linux
- build-windows
Note that neither the kind-dependencies
nor dependencies
keys are used
here. This is because the dependency edge is created after the optimization
phase of the task graph, rather than in the full task graph phase. If either or
both of build-linux
/ build-windows
end up being optimized away, the
dependency links will simply not be created.