Transforms#
Often tasks grow complex enough that they become awkward or even impossible to specify as plain YAML objects. Taskgraph has the concept of transforms to help deal with this. Transforms allow you to layer programmatic logic on top of your task definitions.
Overview#
To begin, a kind implementation generates a collection of tasks; see Loading Tasks. These are Python dictionaries that describe semantically what the task should do.
The kind also defines a sequence of transformations. These are applied in order to each task. Early transforms might apply default values or break tasks up into smaller tasks (for example, chunking a test suite). Later transforms rewrite the tasks entirely, with the final result being a task definition conforming to the Taskcluster task schema.
Specifying Transforms#
Transforms are specified as a list of strings, where each string references a `taskgraph.transforms.base.TransformSequence`. For example, in a `kind.yml` file, you may add:

```yaml
transforms:
    - project_taskgraph.transforms:transforms
    - taskgraph.transforms.task:transforms
```
The format of the reference is `<module path>:<object>`. So the above example will first load the `transforms` object from the `project_taskgraph.transforms` module, then the `transforms` object from the `taskgraph.transforms.task` module. All referenced modules must be available on the Python path. Note that transforms can be defined both within the project that needs them and in third-party packages (like Taskgraph itself).
The `taskgraph.transforms.task` transforms are a special set of transforms that nearly every task should use. These transforms are responsible for formatting a task into a valid Taskcluster task definition.
Default Object#
Using the name `transforms` for the object is a convention, and Taskgraph will use it by default if no object is specified. So the following example is equivalent to the previous one:

```yaml
transforms:
    - project_taskgraph.transforms
    - taskgraph.transforms.task
```
Transform Functions#
Each transform function looks like:

```python
from typing import Dict, Iterator

from taskgraph.transforms.base import TransformConfig, TransformSequence

transforms = TransformSequence()

@transforms.add
def transform_a_task(config: TransformConfig, tasks: Iterator[Dict]) -> Iterator[Dict]:
    """This transform ..."""  # always include a docstring!
    for task in tasks:
        # do stuff to the task..
        yield task
```
The `config` argument is a Python object containing useful configuration for the kind, and is an instance of `taskgraph.transforms.base.TransformConfig`, which specifies a few of its attributes. Kinds may subclass it and add additional attributes if necessary.
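As a minimal sketch, a transform might read one of `TransformConfig`'s attributes, such as `config.kind` (the name of the kind being processed). A `SimpleNamespace` stands in for the real `TransformConfig` here so the example is self-contained:

```python
from types import SimpleNamespace

def add_kind_prefix(config, tasks):
    """Prefix each task's name with the kind it belongs to."""
    for task in tasks:
        task["name"] = f"{config.kind}-{task['name']}"
        yield task

# A stand-in for TransformConfig, for illustration only.
config = SimpleNamespace(kind="build")
result = list(add_kind_prefix(config, iter([{"name": "linux64"}])))
```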
While most transforms yield one task for each task consumed, this is not always the case. Tasks that are not yielded are effectively filtered out. Yielding multiple tasks for each consumed task is a form of duplication. This is how test chunking is accomplished, for example.
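Both behaviors can be sketched with plain generator functions in the same style as transform functions. The field names (`platform`, `chunks`) are hypothetical, chosen only for illustration:

```python
def filter_windows(config, tasks):
    """Drop Windows tasks: tasks that are not yielded are filtered out."""
    for task in tasks:
        if task.get("platform") != "windows":
            yield task

def split_chunks(config, tasks):
    """Yield one task per chunk, a simple form of chunking."""
    for task in tasks:
        for i in range(task.pop("chunks", 1)):
            chunk = dict(task)
            chunk["name"] = f"{task['name']}-{i + 1}"
            yield chunk

tasks = [
    {"name": "test-linux", "platform": "linux", "chunks": 2},
    {"name": "test-windows", "platform": "windows"},
]
# One input task is filtered out; the other is duplicated into two chunks.
result = list(split_chunks(None, filter_windows(None, iter(tasks))))
```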
The `transforms` object is an instance of `taskgraph.transforms.base.TransformSequence`, which serves as a simple mechanism to combine a sequence of transforms into one.
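Conceptually, a transform sequence just chains generator functions so that each consumes the previous one's output. The following is an illustration of that idea, not Taskgraph's actual implementation:

```python
class MiniTransformSequence:
    """A toy transform sequence that chains registered generators."""

    def __init__(self):
        self._transforms = []

    def add(self, func):
        """Register a transform; usable as a decorator."""
        self._transforms.append(func)
        return func

    def __call__(self, config, tasks):
        for func in self._transforms:
            tasks = func(config, tasks)
        return tasks

transforms = MiniTransformSequence()

@transforms.add
def add_default_tier(config, tasks):
    for task in tasks:
        task.setdefault("tier", 1)
        yield task

@transforms.add
def uppercase_name(config, tasks):
    for task in tasks:
        task["name"] = task["name"].upper()
        yield task

result = list(transforms(None, iter([{"name": "build"}])))
```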
Schemas#
The tasks used in transforms can be validated against some schemas at various points in the transformation process. These schemas accomplish two things: they provide a place to add comments about the meaning of each field, and they enforce that the fields are actually used in the documented fashion.
Using schemas is a best practice as it allows others to more easily reason about the state of the tasks at given points. Here is an example:
```python
from voluptuous import Optional, Required

from taskgraph.transforms.base import TransformSequence
from taskgraph.util.schema import Schema

transforms = TransformSequence()

my_schema = Schema({
    Required("foo"): str,
    Optional("bar"): bool,
})

transforms.add_validate(my_schema)
```
In the above example, we can be sure that every task dict has a string field called `foo`, and may or may not have a boolean field called `bar`.
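To make those guarantees concrete, here is the same checking logic written out in plain Python (voluptuous raises `voluptuous.Invalid` in these situations; this sketch only mirrors what the schema above enforces):

```python
def check_task(task):
    """Mirror the schema: 'foo' required str, 'bar' optional bool."""
    if "foo" not in task or not isinstance(task["foo"], str):
        raise ValueError("'foo' is required and must be a string")
    if "bar" in task and not isinstance(task["bar"], bool):
        raise ValueError("'bar', when present, must be a boolean")
    return task

check_task({"foo": "hello"})               # valid: 'bar' is optional
check_task({"foo": "hello", "bar": True})  # valid

try:
    check_task({"bar": True})  # invalid: 'foo' is missing
    rejected = False
except ValueError:
    rejected = True
```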
Keyed By#
Fields in the input tasks can be "keyed by" another value in the task. For example, a task's `max-runtime` may be keyed by `platform`. In the task, this looks like:
```yaml
max-runtime:
    by-platform:
        android: 7200
        windows: 3600
        default: 1800
```
This is a simple but powerful way to encode business rules in the tasks provided as input to the transforms, rather than expressing those rules in the transforms themselves. The structure is easily resolved to a single value using the `resolve_keyed_by()` utility function:
```python
from taskgraph.util.schema import resolve_keyed_by

@transforms.add
def resolve_max_runtime(config, tasks):
    for task in tasks:
        # Note that task["label"] is not a standard key, use whatever best
        # identifies your task at this stage of the transformation.
        resolve_keyed_by(task, "max-runtime", task["label"])
        yield task
```
Exact matches are used immediately. If no exact match is found, each alternative is treated as a regular expression, matched against the whole value. Thus `android.*` would match `android-arm/debug`. If nothing matches as a regular expression but there is a `default` alternative, it is used. Otherwise, an exception is raised and graph generation stops.
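The matching rules above can be sketched in plain Python. This mirrors the described behavior for a single field (exact match, then regex against the whole value, then `default`); it is not the real `resolve_keyed_by()` implementation:

```python
import re

def resolve(alternatives, key):
    """Pick a value from a by-* mapping using the keyed-by rules."""
    # 1. Exact matches win immediately.
    if key in alternatives:
        return alternatives[key]
    # 2. Otherwise treat each alternative as a regex over the whole value.
    matches = [v for k, v in alternatives.items()
               if k != "default" and re.fullmatch(k, key)]
    if len(matches) > 1:
        raise ValueError(f"multiple alternatives match {key!r}")
    if matches:
        return matches[0]
    # 3. Fall back to 'default', or fail.
    if "default" in alternatives:
        return alternatives["default"]
    raise KeyError(f"no alternative matches {key!r}")

by_platform = {"android.*": 7200, "windows": 3600, "default": 1800}
resolve(by_platform, "windows")            # exact match
resolve(by_platform, "android-arm/debug")  # regex match
resolve(by_platform, "macos")              # falls back to default
```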
Organization#
Task creation operates broadly in a few phases, with the interfaces of those stages defined by schemas. The process begins with the raw data structures parsed from the YAML files in the kind configuration. This data can be processed by kind-specific transforms, resulting in a "kind-specific description".
From there, it's common for tasks to use the run transforms, which provide convenient utilities for things such as cloning repositories, downloading artifacts, caching, and much more! After these transforms, tasks will conform to the "run description". Finally, almost all kinds should use the task transforms. These transforms massage the task into the Taskcluster task schema.
Run Descriptions#
A run description defines what to run in the task. It is a combination of a `run` section and all of the fields from a task description. The run section has a `using` property that defines how this task should be run; for example, `run-task` to run arbitrary commands, or `toolchain-script` to invoke a well-defined script. The remainder of the run section is specific to the run-using implementation.
The effect of a run description is to say "run this thing on this worker". The run description must contain enough information about the worker to identify the workerType and the implementation (docker-worker, generic-worker, etc.). Alternatively, run descriptions can specify the `platforms` field in conjunction with the `by-platform` key to specify multiple workerTypes and implementations. Any other task-description information is passed along verbatim, although it is augmented by the run-using implementation.
The following `run-using` values are supported:

- `run-task`
- `toolchain-script`
- `index-search`
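As a sketch, a task using `run-task` might combine task-description fields with a `run` section like this. The task name, worker type, and command below are purely illustrative:

```yaml
my-task:
    worker-type: b-linux
    worker:
        implementation: docker-worker
    run:
        using: run-task
        command: ./build.sh
```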
Task Descriptions#
Every kind needs to create tasks, and all of those tasks have some things in common. For example, they all run on one of a small set of worker implementations, each with its own idiosyncrasies.
The transforms in `taskgraph.transforms.task` implement this common functionality. They expect a "task description" and produce a task definition. The schema for a task description is defined at the top of `task.py`, with copious comments. Go forth and read it now!
In general, the task-description transforms handle functionality that is common to all tasks. While the schema is the definitive reference, the functionality includes:
- Build index routes
- Information about the projects on which this task should run
- Optimizations
- Defaults for `expires-after` and `deadline-after`, based on project
- Worker configuration
The parts of the task description that are specific to a worker implementation are isolated in a `task_description['worker']` object, which has an `implementation` property naming the worker implementation. Each worker implementation has its own section of the schema describing the fields it expects. Thus the transforms that produce a task description must be aware of the worker implementation to be used, but need not be aware of the details of its payload format.
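The isolation described above can be sketched as follows: a payload builder is selected from the `implementation` name alone, and only that builder interprets the rest of the `worker` object. The field names other than `implementation`, and the builder itself, are illustrative rather than the full schema:

```python
task_description = {
    "label": "build-linux64/opt",
    "worker-type": "b-linux",
    "worker": {
        "implementation": "docker-worker",
        # Fields below are interpreted only by the matching builder.
        "docker-image": "debian12-build",
        "max-run-time": 3600,
    },
}

def build_payload(task):
    """Dispatch on the implementation name to build a worker payload."""
    builders = {
        "docker-worker": lambda w: {"image": w["docker-image"],
                                    "maxRunTime": w["max-run-time"]},
    }
    worker = task["worker"]
    return builders[worker["implementation"]](worker)

payload = build_payload(task_description)
```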
The `task.py` file also contains a dictionary mapping treeherder groups to group names, using an internal list of group names. Feel free to add additional groups to this list as necessary.