.. _task graph:

Task Graphs
===========

Taskgraph's namesake comes from the fact that it produces a *graph of tasks* as
output. Specifically, it is a `directed acyclic graph`_ (DAG) whose nodes are
tasks, and whose edges are the dependencies between them. The root of the DAG
is a special task called the :ref:`decision task`.

For example let's say you had a lint task, a build and a test that depends on
the build. Let's also say that the lint and test tasks depend on an image task
(which builds the docker image they run in). The resulting task graph would be:

.. mermaid::
   :align: center
   :alt: Flowchart illustrating a simple task graph.

   flowchart TB
       D(decision)
       I(image)
       L(lint)
       B(build)
       T(test)
       D --> I
       D --> B
       B --> T
       I --> L
       I --> T


.. _directed acyclic graph: https://en.wikipedia.org/wiki/Directed_acyclic_graph

.. _graph generation:

Graph Generation
----------------

The graph is generated via a series of *steps* that run one after the other.
Broadly, these steps are:

#. **full_task_set**: For all kinds, generate all tasks.
#. **full_task_graph**: Create dependency links between tasks using
   kind-specific mechanisms.
#. **target_task_set**: Filter the tasks based on a series of filters that are
   defined for each project. Tasks are typically filtered out based on the
   :term:`Parameters`. For example, release oriented tasks would be removed
   from the graph if we're generating it for a pull request. The tasks remaining
   after this step are called the *target tasks*.
#. **target_task_graph**: Based on the full task graph, calculate the
   transitive closure of the target task set. That is, the target tasks and all
   dependencies of those tasks.
#. **optimized_task_graph**: Optimize the target task graph using task-specific
   optimization methods. Optimizations are similar to the target filters in
   that they cause additional tasks to be pruned. Conceptually the difference
   is that the tasks in this phase are still *relevant* to the
   :term:`Parameters` being used, it's just we may choose not to run them for
   other reasons (like cost).
#. **morphed_task_graph**: Morphs are like syntactic sugar. They keep the same
   meaning, but express it in a lower-level way. These generally work around
   limitations in the TaskCluster platform, such as number of dependencies or
   routes in a task.
#. Create tasks for all tasks in the morphed task graph via the Taskcluster
   API.

Graph generation is handled by the
:class:`~taskgraph.generator.TaskGraphGenerator` class. Each phase of the graph
has an associated property and can be generated by accessing it:

.. code-block:: python

   from taskgraph.generator import TaskGraphGenerator
   generator = TaskGraphGenerator(...)
   # kicks off graph generation but returns after the full_task_graph
   # step is finished
   generator.full_task_graph

The ``taskgraph`` shell command similarly has subcommands that can help you generate
each phase locally:

.. code-block:: shell

   # generates the full_task_graph and exits
   $ taskgraph full
   # generates the target_task_graph and exits
   $ taskgraph target
   # etc..

The decision task uses the command ``taskgraph decision``, which spans the
whole generation process all the way to task creation in the last step. See
:doc:`/howto/run-locally` for more information on running the
``taskgraph`` command locally.

Transitive Closure
..................

Transitive closure is a fancy name for this sort of operation:

 * start with a set of tasks
 * add all tasks on which any of those tasks depend
 * repeat until nothing changes

The effect is this: imagine you start with a linux32 test task and a linux64
test task. In the first round, each test task depends on the test docker image
task, so add that image task. Each test also depends on a build, so add the
linux32 and linux64 build tasks.

Then repeat: the test docker image task is already present, as are the build
tasks, but those build tasks depend on the build docker image task.  So add
that build docker image task.  Repeat again: this time, none of the tasks in
the set depend on a task not in the set, so nothing changes and the process is
complete.

And as you can see, the graph we've built now includes everything we wanted
(the test tasks) plus everything required to do that (docker images, builds).

Dependencies
------------

Dependencies between tasks are represented as labeled edges in the task graph.
They are specified via the ``dependencies`` key which is an object of the form
``{ "<edge>": "<label>"}``. The ``edge`` is an arbitrary name which can be used
to refer to the dependency later on. The ``label`` is the task label of the
dependency.

Taskgraph is only able to add a dependency to tasks that have already been
generated. The ``kind-dependencies`` key must be used to determine the order in
which different kinds of tasks are generated. It is specified as a list of
kinds that are guaranteed to be generated before the current kind. It is an
error to depend on a task of the same kind, or to create any cycles in the
``kind-dependencies`` chain.

.. note::
   All examples assume ``build-linux`` and ``build-windows`` tasks have been
   defined elsewhere in a ``build`` kind.

For example, a test task might depend on the artifacts a build task creates. This
might be expressed as follows:

.. code-block:: yaml

   kind-dependencies:
     - build

   tasks:
     test-linux:
       dependencies:
         build: build-linux
       # .. rest of task definition ..
     test-windows:
       dependencies:
         build: build-windows
       # .. rest of task definition ..

First, note the ``kind-dependencies`` key. This ensures the tasks in the
``build`` kind have already been generated, and are candidates to be added as
dependencies.

Second, notice how both the ``test-linux`` and ``test-windows`` task use the
same edge name to reference their build dependency. This will allow
:term:`transforms <Transform>` later on to find their build dependency even
without knowing which specific task it is they depend on.

Other Types of Dependencies
...........................

Dependencies are typically used to ensure that prerequisites to a task, such as
creation of binary artifacts, are completed before that task runs. But
dependencies can also be used to schedule follow-up work such as summarizing
results of dependencies, sending notifications that dependencies are completed
or uploading artifacts to a server. In many of these cases, it may not be desired
for the dependent task to "pull in" the dependency as would normally be the case.

To help with these uses cases, there are two additional types of dependencies.

If Dependencies
```````````````

The ``if-dependencies`` key (list) can be used to denote a task that should
only run if at least one of these specified dependencies are also run.
Dependencies specified by this key will not be “pulled in”. This makes it
suitable for things like signing builds or uploading symbols.

This key is specified as a list of dependency edge names (e.g, ``build`` rather
than the label of the build), which means the original dependency must be specified
as normal. For example:

.. code-block:: yaml

   kind-dependencies:
     - build

   tasks:
    upload-build:
      dependencies:
        build: build-windows
      if-dependencies:
        - build

In the above example, the ``upload-build`` task will only run if the ``build``
task would have run anyway. It will not cause the ``build`` task to get "pulled
into" the graph.

Soft Dependencies
`````````````````

To add a task depending on arbitrary tasks remaining after the optimization
process is complete, you can use ``soft-dependencies``, as a list of optimized
tasks labels. This is useful for tasks that need to perform some action on N
other tasks and it is not known how many. Unlike ``if-dependencies``, tasks
that specify ``soft-dependencies`` will still be scheduled, even if none of the
candidate dependencies are.

For example:

.. code-block:: yaml

   tasks:
     notify-build:
       soft-dependencies:
         - build-linux
         - build-windows

Note that neither the ``kind-dependencies`` nor ``dependencies`` keys are used
here. This is because the dependency edge is created *after* the optimization
phase of the task graph, rather than in the full task graph phase. If either or
both of ``build-linux`` / ``build-windows`` end up being optimized away, the
dependency links will simply not be created.
