Task Graphs

Taskgraph gets its name from the fact that it produces a graph of tasks as output. Specifically, the output is a directed acyclic graph (DAG) whose nodes are tasks and whose edges are the dependencies between them. The root of the DAG is a special task called the Decision Task.

For example, say you have a lint task, a build task, and a test task that depends on the build. Say also that the lint and test tasks depend on an image task (which builds the Docker image they run in). The resulting task graph would be:

flowchart TB
    D(decision)
    I(image)
    L(lint)
    B(build)
    T(test)
    D --> I
    D --> B
    B --> T
    I --> L
    I --> T
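The same example can be written down as plain data. The following is a minimal sketch, not Taskgraph's internal representation: the `dependencies` mapping and its labels are illustrative assumptions, and the decision task is omitted for brevity. It uses the standard library's `graphlib` (Python 3.9+) to confirm the graph is acyclic and to produce one valid execution order.

```python
from graphlib import TopologicalSorter

# Illustrative mapping: task label -> labels of the tasks it depends on.
dependencies = {
    "image": set(),
    "build": set(),
    "lint": {"image"},
    "test": {"build", "image"},
}

# static_order() raises graphlib.CycleError if the graph is not a DAG;
# otherwise it yields every task after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Several orders are valid for this graph; the only guarantee is that `image` precedes `lint` and `test`, and `build` precedes `test`.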

Graph Generation

The graph is generated via a series of steps that run one after the other. Broadly, these steps are:

  1. full_task_set: For all kinds, generate all tasks.

  2. full_task_graph: Create dependency links between tasks using kind-specific mechanisms.

  3. target_task_set: Filter the tasks based on a series of filters that are defined for each project. Tasks are typically filtered out based on the Parameters. For example, release oriented tasks would be removed from the graph if we’re generating it for a pull request. The tasks remaining after this step are called the target tasks.

  4. target_task_graph: Based on the full task graph, calculate the transitive closure of the target task set. That is, the target tasks and all dependencies of those tasks.

  5. optimized_task_graph: Optimize the target task graph using task-specific optimization methods. Optimizations are similar to the target filters in that they cause additional tasks to be pruned. Conceptually, the difference is that the tasks in this phase are still relevant to the Parameters being used; we may simply choose not to run them for other reasons (like cost).

  6. morphed_task_graph: Morphs are like syntactic sugar. They keep the same meaning, but express it in a lower-level way. These generally work around limitations in the Taskcluster platform, such as limits on the number of dependencies or routes in a task.

  7. Create tasks for all tasks in the morphed task graph via the Taskcluster API.
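To make steps 3 through 5 concrete, here is a toy sketch that works on task labels only. It is not how Taskgraph implements these phases: the `parameters` keys (`is_pull_request`, `skip_for_cost`) and all three helper functions are hypothetical, and the morph and task-creation steps are left out.

```python
# Toy full task graph (steps 1 and 2): label -> labels it depends on.
full_task_graph = {
    "image": set(),
    "build": set(),
    "lint": {"image"},
    "test": {"build", "image"},
    "release": {"build"},
}

# Hypothetical parameters; real Parameters carry many more fields.
parameters = {"is_pull_request": True, "skip_for_cost": {"lint"}}


def select_targets(graph, params):
    """Step 3: drop release tasks when generating for a pull request."""
    if params["is_pull_request"]:
        return {label for label in graph if label != "release"}
    return set(graph)


def close_over_dependencies(graph, targets):
    """Step 4: the targets plus everything they depend on."""
    seen, stack = set(), list(targets)
    while stack:
        label = stack.pop()
        if label not in seen:
            seen.add(label)
            stack.extend(graph[label])
    return seen


def optimize(labels, params):
    """Step 5: prune still-relevant tasks we choose not to run."""
    return labels - params["skip_for_cost"]


target_task_set = select_targets(full_task_graph, parameters)
target_task_graph = close_over_dependencies(full_task_graph, target_task_set)
optimized = optimize(target_task_graph, parameters)
print(sorted(optimized))  # ['build', 'image', 'test']
```

Here `release` is removed by the target filter because it is irrelevant to a pull request, while `lint` is still relevant but pruned by the optimization step.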

Graph generation is handled by the TaskGraphGenerator class. Each phase of the graph has an associated property and can be generated by accessing it:

from taskgraph.generator import TaskGraphGenerator
generator = TaskGraphGenerator(...)
# kicks off graph generation but returns after the full_task_graph
# step is finished
generator.full_task_graph

The taskgraph shell command similarly has subcommands that can help you generate each phase locally:

# generates the full_task_graph and exits
$ taskgraph full
# generates the target_task_graph and exits
$ taskgraph target
# etc.

The decision task uses the command taskgraph decision, which spans the whole generation process all the way to task creation in the last step. See Run Taskgraph Locally for more information on running the taskgraph command locally.

Transitive Closure

Transitive closure is a fancy name for this sort of operation:

  • start with a set of tasks

  • add all tasks on which any of those tasks depend

  • repeat until nothing changes

The effect is this: imagine you start with a linux32 test job and a linux64 test job. In the first round, each test task depends on the test docker image task, so add that image task. Each test also depends on a build, so add the linux32 and linux64 build tasks.

Then repeat: the test docker image task is already present, as are the build tasks, but those build tasks depend on the build docker image task. So add that build docker image task. Repeat again: this time, none of the tasks in the set depend on a task not in the set, so nothing changes and the process is complete.

And as you can see, the graph we’ve built now includes everything we wanted (the test jobs) plus everything required to do that (docker images, builds).
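The procedure above can be sketched in a few lines of Python. This is an illustration of the idea rather than Taskgraph's own code; the function name `transitive_closure` and the task labels are assumptions chosen to mirror the linux32/linux64 example.

```python
def transitive_closure(targets, dependencies):
    """Return the targets plus every task they depend on, directly or not.

    `dependencies` maps a task label to the set of labels it depends on.
    """
    closure = set(targets)
    while True:
        # Gather the dependencies of every task collected so far.
        added = set()
        for label in closure:
            added |= dependencies.get(label, set())
        # Stop once a round adds nothing new.
        if added <= closure:
            return closure
        closure |= added


# Hypothetical labels for the example in the text.
dependencies = {
    "test-linux32": {"image-test", "build-linux32"},
    "test-linux64": {"image-test", "build-linux64"},
    "build-linux32": {"image-build"},
    "build-linux64": {"image-build"},
}

closure = transitive_closure({"test-linux32", "test-linux64"}, dependencies)
print(sorted(closure))
```

The first round adds the test image and both builds, the second adds the build image, and the third finds nothing new, matching the walk-through above.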

Dependencies

Dependencies between tasks are represented as labeled edges in the task graph. For example, a test task must depend on the build task creating the artifact it tests, and this dependency edge is named ‘build’. The task graph generation process later resolves these dependencies to specific task ids.

Dependencies are typically used to ensure that prerequisites to a task, such as creation of binary artifacts, are completed before that task runs. But dependencies can also be used to schedule follow-up work such as summarizing test results. In the latter case, the summarization task will “pull in” all of the tasks it depends on, even if those tasks might otherwise be optimized away.
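As a rough illustration of labeled edges, the sketch below models a task's dependencies as a mapping from edge name to task label, then resolves each label to a task id. The labels, the `label_to_taskid` mapping, and the placeholder ids are all made up for the example.

```python
# Edge name -> label of the task on the other end of the edge.
test_dependencies = {
    "build": "build-linux64",
    "docker-image": "docker-image-test",
}

# Hypothetical result of task creation: label -> task id (placeholder values).
label_to_taskid = {
    "build-linux64": "example-task-id-1",
    "docker-image-test": "example-task-id-2",
}

# Resolution keeps the edge names and swaps labels for task ids.
resolved = {
    name: label_to_taskid[label] for name, label in test_dependencies.items()
}
print(resolved["build"])  # example-task-id-1
```

Keeping the edge name stable means the test task can refer to "the build" without knowing which concrete task id it ends up with.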

If Dependencies

The if-dependencies key (list) can be used to denote a task that should only run if at least one of the specified dependencies also runs. Dependencies specified by this key will not be “pulled in”. This makes it suitable for things like signing builds or uploading symbols.

This key is specified as a list of dependency names (e.g., build rather than the label of the build).
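One way to read this rule is sketched below. It is a simplified model, not Taskgraph's implementation: the `should_run` helper, the dictionary layout of the task, and the `kept_labels` set (labels of tasks that are going to run) are assumptions made for illustration.

```python
def should_run(task, kept_labels):
    """Return False when every if-dependency of `task` has been dropped."""
    if_deps = task.get("if-dependencies", [])
    if not if_deps:
        # Without the key, this rule does not restrict the task.
        return True
    # if-dependencies holds edge names, so map each name to its label first.
    return any(task["dependencies"][name] in kept_labels for name in if_deps)


# Hypothetical signing task that only makes sense if its build runs.
signing = {
    "label": "sign-linux64",
    "dependencies": {"build": "build-linux64"},
    "if-dependencies": ["build"],
}

print(should_run(signing, {"build-linux64"}))  # True
print(should_run(signing, set()))              # False
```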

Soft Dependencies

To add a task that depends on arbitrary tasks remaining after the optimization process is complete, you can use soft-dependencies, given as a list of task labels. This is useful for tasks that need to perform some action on N other tasks when N is not known in advance. Unlike if-dependencies, tasks that specify soft-dependencies will still be scheduled, even if none of the candidate dependencies are.
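The contrast with if-dependencies can be sketched as follows. Again this is a simplified model under stated assumptions: `resolve_soft_dependencies`, the task layout, and the labels are hypothetical and only illustrate the behavior described above.

```python
def resolve_soft_dependencies(task, remaining_labels):
    """Keep only the soft dependencies that survived optimization."""
    return [
        label
        for label in task.get("soft-dependencies", [])
        if label in remaining_labels
    ]


# Hypothetical summary task that acts on whichever tests end up running.
summary = {
    "label": "summarize-tests",
    "soft-dependencies": ["test-linux32", "test-linux64"],
}

# Only test-linux64 survived optimization, so only it becomes a dependency.
print(resolve_soft_dependencies(summary, {"test-linux64", "build-linux64"}))

# Nothing survived: the list is empty, but the task itself is still scheduled.
print(resolve_soft_dependencies(summary, set()))
```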