Task Graphs

Taskgraph gets its name from the fact that it produces a graph of tasks as output. Specifically, it is a directed acyclic graph (DAG) whose nodes are tasks and whose edges are the dependencies between them. The root of the DAG is a special task called the Decision Task.

For example, say you have a lint task, a build task, and a test task that depends on the build. Say also that the lint and test tasks depend on an image task (which builds the Docker image they run in). The resulting task graph would be:

flowchart TB
    D(decision)
    I(image)
    L(lint)
    B(build)
    T(test)
    D --> I
    D --> B
    B --> T
    I --> L
    I --> T
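As a minimal sketch, the same graph can be written down as plain Python data. The edge list mirrors the diagram, where an arrow points from a task to the task that depends on it. The variable names are illustrative only and are not part of Taskgraph's API.

```python
# Edges as drawn in the diagram: (prerequisite, dependent).
EDGES = [
    ("decision", "image"),
    ("decision", "build"),
    ("build", "test"),
    ("image", "lint"),
    ("image", "test"),
]

# Invert the edge list into "task -> set of tasks it depends on".
dependencies = {}
for prerequisite, dependent in EDGES:
    dependencies.setdefault(dependent, set()).add(prerequisite)
    # Make sure tasks with no dependencies also get an entry.
    dependencies.setdefault(prerequisite, set())

# The root is the task that depends on nothing.
roots = [task for task, deps in dependencies.items() if not deps]

for task in sorted(dependencies):
    print(f"{task}: {sorted(dependencies[task])}")
print("root:", roots)
```

Here the only task without dependencies is the decision task, which matches its role as the root of the DAG.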

Graph Generation

The graph is generated via a series of steps that run one after the other. Broadly, these steps are:

  1. full_task_set: For all kinds, generate all tasks.

  2. full_task_graph: Create dependency links between tasks using kind-specific mechanisms.

  3. target_task_set: Filter the tasks based on a series of filters that are defined for each project. Tasks are typically filtered out based on the Parameters. For example, release-oriented tasks would be removed from the graph if we’re generating it for a pull request. The tasks remaining after this step are called the target tasks.

  4. target_task_graph: Based on the full task graph, calculate the transitive closure of the target task set. That is, the target tasks and all dependencies of those tasks.

  5. optimized_task_graph: Optimize the target task graph using task-specific optimization methods. Optimizations are similar to the target filters in that they cause additional tasks to be pruned. Conceptually, the difference is that the tasks in this phase are still relevant to the Parameters being used, but we may choose not to run them for other reasons (like cost).

  6. morphed_task_graph: Morphs are like syntactic sugar. They keep the same meaning, but express it in a lower-level way. These generally work around limitations in the Taskcluster platform, such as the number of dependencies or routes in a task.

  7. Create tasks for all tasks in the morphed task graph via the Taskcluster API.
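The first five phases can be pictured with a toy model. The sketch below is not Taskgraph's implementation: the task flags (`target_on_pr`, `skip_unless_changed`), the parameter names (`is_pull_request`, `changed_files`) and the `generate` function are all invented for illustration. Morphing and task creation (steps 6 and 7) are left out.

```python
# Toy task definitions. "target_on_pr" and "skip_unless_changed" are invented flags.
TASKS = {
    "image": {"deps": [], "target_on_pr": False},
    "build": {"deps": [], "target_on_pr": False},
    "lint": {"deps": ["image"], "target_on_pr": True, "skip_unless_changed": ".py"},
    "test": {"deps": ["build", "image"], "target_on_pr": True},
    "release": {"deps": ["build"], "target_on_pr": False},
}

# Hypothetical parameters: a pull request that only touched the README.
PARAMETERS = {"is_pull_request": True, "changed_files": ["README.md"]}


def generate(tasks, parameters):
    # 1. full_task_set: every task from every kind.
    full_task_set = set(tasks)

    # 2. full_task_graph: link each task to its dependencies.
    full_task_graph = {label: set(tasks[label]["deps"]) for label in full_task_set}

    # 3. target_task_set: filter using the parameters.
    target_task_set = {
        label
        for label in full_task_set
        if tasks[label]["target_on_pr"] or not parameters["is_pull_request"]
    }

    # 4. target_task_graph: the target tasks plus everything they depend on
    #    (kept here as a set of labels).
    target_task_graph = set()
    pending = list(target_task_set)
    while pending:
        label = pending.pop()
        if label not in target_task_graph:
            target_task_graph.add(label)
            pending.extend(full_task_graph[label])

    # 5. optimized_task_graph: prune tasks that are relevant but not worth running.
    def still_needed(label):
        suffix = tasks[label].get("skip_unless_changed")
        return suffix is None or any(
            path.endswith(suffix) for path in parameters["changed_files"]
        )

    optimized_task_graph = {label for label in target_task_graph if still_needed(label)}

    return {
        "full_task_set": full_task_set,
        "target_task_set": target_task_set,
        "target_task_graph": target_task_graph,
        "optimized_task_graph": optimized_task_graph,
    }


phases = generate(TASKS, PARAMETERS)
for name, labels in phases.items():
    print(name, sorted(labels))
```

In this toy run the release task never becomes a target, the build and image tasks are pulled in as dependencies of the test task, and the lint task is pruned in the optimization step because no `.py` file changed.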

Graph generation is handled by the TaskGraphGenerator class. Each phase of the graph has an associated property and can be generated by accessing it:

from taskgraph.generator import TaskGraphGenerator
generator = TaskGraphGenerator(...)
# kicks off graph generation but returns after the full_task_graph
# step is finished
generator.full_task_graph
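One way to picture "generated by accessing it" is a class whose properties advance a shared sequence of steps only as far as needed. The sketch below is a hypothetical stand-in, assuming nothing about how TaskGraphGenerator is actually written; `LazyPhases`, `_run`, `_run_until` and `log` are made-up names.

```python
class LazyPhases:
    """Hypothetical stand-in for a generator object; not the real TaskGraphGenerator."""

    def __init__(self):
        self._steps = self._run()  # a Python generator; nothing runs until next()
        self._results = {}
        self.log = []  # records which steps have actually run

    def _run(self):
        # Each step runs only when the generator is advanced to it.
        for name in ("full_task_set", "full_task_graph", "target_task_set"):
            self.log.append(name)
            yield name, f"<{name} result>"

    def _run_until(self, name):
        # Advance step by step until the requested phase has been produced.
        while name not in self._results:
            step, value = next(self._steps)
            self._results[step] = value
        return self._results[name]

    @property
    def full_task_graph(self):
        return self._run_until("full_task_graph")

    @property
    def target_task_set(self):
        return self._run_until("target_task_set")


phases = LazyPhases()
phases.full_task_graph
print(phases.log)  # only the first two steps have run
phases.target_task_set
print(phases.log)  # the third step ran; earlier results were reused
```

With this design, asking for a later phase never repeats earlier work, and asking for an early phase never pays for the later ones.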

The taskgraph shell command similarly has subcommands that can help you generate each phase locally:

# generates the full_task_graph and exits
$ taskgraph full
# generates the target_task_graph and exits
$ taskgraph target
# etc..

The decision task uses the command taskgraph decision, which spans the whole generation process all the way to task creation in the last step. See Run Taskgraph Locally for more information on running the taskgraph command locally.

Transitive Closure

Transitive closure is a fancy name for this sort of operation:

  • start with a set of tasks

  • add all tasks on which any of those tasks depend

  • repeat until nothing changes

The effect is this: imagine you start with a linux32 test task and a linux64 test task. In the first round, each test task depends on the test docker image task, so add that image task. Each test also depends on a build, so add the linux32 and linux64 build tasks.

Then repeat: the test docker image task is already present, as are the build tasks, but those build tasks depend on the build docker image task. So add that build docker image task. Repeat again: this time, none of the tasks in the set depend on a task not in the set, so nothing changes and the process is complete.

And as you can see, the graph we’ve built now includes everything we wanted (the test tasks) plus everything required to do that (docker images, builds).
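The walkthrough above can be sketched as a small fixed-point loop. The task labels and the `transitive_closure` helper are illustrative assumptions, not Taskgraph's own names or code.

```python
def transitive_closure(targets, dependencies):
    """Return the targets plus everything they depend on, directly or indirectly."""
    closure = set(targets)
    while True:
        # Collect every dependency of the current set that is not in it yet.
        missing = {
            dep for label in closure for dep in dependencies.get(label, ())
        } - closure
        if not missing:
            # Nothing changed in this round, so the process is complete.
            return closure
        closure |= missing


# Hypothetical labels matching the example in the text, plus one unrelated task.
DEPENDENCIES = {
    "test-linux32": ["docker-image-test", "build-linux32"],
    "test-linux64": ["docker-image-test", "build-linux64"],
    "build-linux32": ["docker-image-build"],
    "build-linux64": ["docker-image-build"],
    "docker-image-test": [],
    "docker-image-build": [],
    "docs": [],
}

closure = transitive_closure({"test-linux32", "test-linux64"}, DEPENDENCIES)
print(sorted(closure))
```

The first round adds the test image and both builds, the second adds the build image, and the third finds nothing new. The unrelated `docs` task is never added because no task in the set depends on it.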

Dependencies

Dependencies between tasks are represented as labeled edges in the task graph. They are specified via the dependencies key, which is an object of the form {"<edge>": "<label>"}. The edge is an arbitrary name which can be used to refer to the dependency later on. The label is the task label of the dependency.

Taskgraph is only able to add a dependency to tasks that have already been generated. The kind-dependencies key must be used to determine the order in which different kinds of tasks are generated. It is specified as a list of kinds that are guaranteed to be generated before the current kind. It is an error to depend on a task of the same kind, or to create any cycles in the kind-dependencies chain.
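To illustrate why cycles in kind-dependencies are an error, here is a rough sketch that derives a generation order from a mapping of kinds to their kind-dependencies. It uses the standard library's `graphlib` (Python 3.9+). The kind names and the `kind_generation_order` helper are assumptions for this example, and Taskgraph may order kinds differently internally.

```python
from graphlib import CycleError, TopologicalSorter


def kind_generation_order(kind_dependencies):
    """Return the kinds ordered so that each comes after the kinds it lists."""
    return list(TopologicalSorter(kind_dependencies).static_order())


# Hypothetical kinds: each maps to the kinds that must be generated before it.
KINDS = {
    "docker-image": [],
    "build": ["docker-image"],
    "test": ["build", "docker-image"],
}
order = kind_generation_order(KINDS)
print(order)

# A cycle leaves no valid order, so it is reported as an error.
try:
    kind_generation_order({"build": ["test"], "test": ["build"]})
except CycleError as error:
    print("cycle detected:", error.args[1])
```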

Note

All examples assume build-linux and build-windows tasks have been defined elsewhere in a build kind.

For example, a test task might depend on the artifacts a build task creates. This might be expressed as follows:

kind-dependencies:
  - build

tasks:
  test-linux:
    dependencies:
      build: build-linux
    # .. rest of task definition ..
  test-windows:
    dependencies:
      build: build-windows
    # .. rest of task definition ..

First, note the kind-dependencies key. This ensures the tasks in the build kind have already been generated, and are candidates to be added as dependencies.

Second, notice how both the test-linux and test-windows tasks use the same edge name to reference their build dependency. This will allow transforms later on to find their build dependency even without knowing which specific task they depend on.
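A short sketch shows why the shared edge name helps. The dictionaries below mirror the YAML above, and `build_label` is a hypothetical helper rather than part of Taskgraph's transform API.

```python
# The same two tasks as in the YAML example, reduced to their dependencies.
TASKS = {
    "test-linux": {"dependencies": {"build": "build-linux"}},
    "test-windows": {"dependencies": {"build": "build-windows"}},
}


def build_label(task):
    """Look up the build dependency by edge name, whatever its label is."""
    return task["dependencies"]["build"]


for label, task in TASKS.items():
    print(f"{label} -> {build_label(task)}")
```

The same lookup works for both tasks because the code only relies on the edge name `build`, never on a specific task label.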

Other Types of Dependencies

Dependencies are typically used to ensure that prerequisites to a task, such as creation of binary artifacts, are completed before that task runs. But dependencies can also be used to schedule follow-up work such as summarizing results of dependencies, sending notifications that dependencies are completed or uploading artifacts to a server. In many of these cases, it may not be desired for the dependent task to “pull in” the dependency as would normally be the case.

To help with these use cases, there are two additional types of dependencies.

If Dependencies

The if-dependencies key (a list) can be used to denote a task that should only run if at least one of the specified dependencies is also run. Dependencies specified by this key will not be “pulled in”. This makes it suitable for things like signing builds or uploading symbols.

This key is specified as a list of dependency edge names (e.g., build rather than the label of the build), which means the original dependency must be specified as normal. For example:

kind-dependencies:
  - build

tasks:
  upload-build:
    dependencies:
      build: build-windows
    if-dependencies:
      - build

In the above example, the upload-build task will only run if the build task would have run anyway. It will not cause the build task to get “pulled into” the graph.
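The rule can be modelled in a few lines. This is a toy reading of the behaviour described above, not Taskgraph's implementation. `should_run` and `scheduled_labels` are made-up names, where `scheduled_labels` stands for the tasks that would have run anyway.

```python
def should_run(task, scheduled_labels):
    """Apply the if-dependencies rule described above to a single task."""
    if_edges = task.get("if-dependencies", [])
    if not if_edges:
        # Tasks without if-dependencies are not affected by this rule.
        return True
    # Translate each edge name to its label, then check whether it is scheduled.
    return any(task["dependencies"][edge] in scheduled_labels for edge in if_edges)


upload_build = {
    "dependencies": {"build": "build-windows"},
    "if-dependencies": ["build"],
}

print(should_run(upload_build, {"build-windows", "test-windows"}))  # True
print(should_run(upload_build, {"build-linux"}))  # False
```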

Soft Dependencies

To make a task depend on arbitrary tasks that remain after the optimization process is complete, use soft-dependencies, specified as a list of task labels. This is useful for tasks that need to perform some action on N other tasks when N is not known in advance. Unlike if-dependencies, tasks that specify soft-dependencies will still be scheduled, even if none of the candidate dependencies are.

For example:

tasks:
  notify-build:
    soft-dependencies:
      - build-linux
      - build-windows

Note that neither the kind-dependencies nor dependencies keys are used here. This is because the dependency edge is created after the optimization phase of the task graph, rather than in the full task graph phase. If either or both of build-linux / build-windows end up being optimized away, the dependency links will simply not be created.
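As a sketch of that behaviour, soft dependencies can be resolved by keeping only the labels that survived optimization. The helper name `resolve_soft_dependencies` and the `remaining_labels` argument are assumptions made for this example.

```python
def resolve_soft_dependencies(task, remaining_labels):
    """Keep only the soft dependencies that are still in the optimized graph."""
    return [
        label
        for label in task.get("soft-dependencies", [])
        if label in remaining_labels
    ]


notify_build = {"soft-dependencies": ["build-linux", "build-windows"]}

# build-windows was optimized away, so only one edge is created.
print(resolve_soft_dependencies(notify_build, {"build-linux", "notify-build"}))  # ['build-linux']

# Both builds were optimized away: no edges, but notify-build itself remains.
print(resolve_soft_dependencies(notify_build, {"notify-build"}))  # []
```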