Use Keyed By
Often fields in a task can depend on other values in the task. For example, a
task’s max-runtime may depend on the platform. To handle this, you
could re-define max-runtime in each task’s definition like so:
tasks:
  taskA:
    platform: android
    worker:
      max-runtime: 7200
  taskB:
    platform: ios
    worker:
      max-runtime: 7200
  taskC:
    platform: windows
    worker:
      max-runtime: 3600
  taskD:
    platform: mac
    worker:
      max-runtime: 1800
  ...
This is simple, but if you have lots of tasks it’s also tedious and makes updating the configuration a pain. To avoid this duplication you could use a transform:
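from taskgraph.transforms.base import TransformSequence

transforms = TransformSequence()


@transforms.add
def set_max_runtime(config, tasks):
    for task in tasks:
        # The runtime limits are hardcoded here, per platform.
        if task["platform"] in ("android", "ios"):
            task["worker"]["max-runtime"] = 7200
        elif task["platform"] == "windows":
            task["worker"]["max-runtime"] = 3600
        else:
            task["worker"]["max-runtime"] = 1800
        yield task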
This works, but now we’ve hardcoded constants into our code logic, far away from the task’s original definition. Besides, this is pretty verbose, and it can get complicated if you want to be able to change these constants per task.
An Alternative Approach
Another way to accomplish the same thing is to use Taskgraph’s “keyed by”
feature. This can be used in combination with the task-defaults key to
express the same logic directly in the kind.yml file:
task-defaults:
  worker:
    max-runtime:
      by-platform:
        (ios|android): 7200
        windows: 3600
        default: 1800
tasks:
  taskA:
    platform: android
  taskB:
    platform: windows
  taskC:
    platform: mac
  ...
The structure under the by-platform key is resolved to a single value using
the resolve_keyed_by() utility function. When
“keying by” another attribute in the task, you must call this utility later on
in a transform:
from taskgraph.util.schema import resolve_keyed_by


@transforms.add
def resolve_max_runtime(config, tasks):
    for task in tasks:
        # The third argument is only used to describe the task in error messages.
        resolve_keyed_by(task, "worker.max-runtime", f"Task {task['label']}")
        yield task
In this example, resolve_keyed_by() takes the root
container object (i.e. the task), the subkey to operate on, and a descriptor
that will be used in any exceptions that get raised.
Exact matches are used immediately. If no exact matches are found, each
alternative is treated as a regular expression, matched against the whole
value. Thus android.* would match android-arm/debug. If nothing
matches as a regular expression, but there is a default alternative, it is
used. Otherwise, an exception is raised and graph generation stops.
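The resolution order described above can be sketched in plain Python. This is an illustrative stand-in, not Taskgraph's implementation: pick_alternative is a hypothetical helper, and taking the first regular expression that matches is an assumption, since the text above does not say what happens when several patterns match.

```python
import re


def pick_alternative(alternatives, value):
    """Hypothetical helper: choose one entry from a by-<name> mapping."""
    # 1. An exact match is used immediately.
    if value in alternatives:
        return alternatives[value]
    # 2. Otherwise each alternative is treated as a regular expression that
    #    must match the whole value. Assumption: the first match wins.
    for pattern, result in alternatives.items():
        if pattern != "default" and re.fullmatch(pattern, value):
            return result
    # 3. Fall back to the "default" alternative, if there is one.
    if "default" in alternatives:
        return alternatives["default"]
    # 4. Nothing matched and there is no default: raise.
    raise KeyError(f"no alternative matches {value!r}")


max_runtime = {"(ios|android)": 7200, "windows": 3600, "default": 1800}
android_runtime = pick_alternative(max_runtime, "android")
mac_runtime = pick_alternative(max_runtime, "mac")
arm_debug = pick_alternative({"android.*": "matched"}, "android-arm/debug")
```

With the mapping from the earlier example, "android" is picked up by the (ios|android) pattern, while "mac" matches nothing and falls through to default.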
Passing Additional Context
By default when you use the pattern by-<name> and then feed it into
resolve_keyed_by(), <name> is assumed to be a
valid top-level key in the task definition. However, sometimes you want to key
by some other value that is either nested deeper in the task definition, or not
even known ahead of time!
For this reason you can specify additional context via **kwargs. Typically
it will make the most sense to use this following a prior transform that sets
some value that’s not known statically. This comes up frequently when splitting
a task from one definition into several. For example:
tasks:
  task:
    platforms: [android, windows, mac]
    worker:
      max-runtime:
        by-platform:
          (ios|android): 7200
          windows: 3600
          default: 1800
from copy import deepcopy


@transforms.add
def split_platforms(config, tasks):
    for task in tasks:
        for platform in task.pop("platforms"):
            new_task = deepcopy(task)
            # ...
            # "platform" is not a key in the task, so pass it as extra context.
            resolve_keyed_by(
                new_task,
                "worker.max-runtime",
                task["label"],
                platform=platform,
            )
            yield new_task
Here we did not know the value of “platform” ahead of time, but it was still possible to use it in a “keyed by” statement thanks to the ability to pass in extra context.
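To make the effect of the extra context concrete, here is a self-contained sketch that does not import Taskgraph. resolve_by_platform is a hypothetical stand-in for resolve_keyed_by() that only handles exact keys and default (so the mapping uses android rather than the (ios|android) pattern). Merging the keyword arguments over the task's own top-level fields, and the label suffix given to each split task, are assumptions made for illustration.

```python
from copy import deepcopy


def resolve_by_platform(task, **extra):
    """Hypothetical stand-in for resolve_keyed_by(): exact keys and default only."""
    # Assumption: extra context is merged over the task's top-level fields.
    context = {**task, **extra}
    value = task["worker"]["max-runtime"]
    if isinstance(value, dict) and "by-platform" in value:
        alternatives = value["by-platform"]
        task["worker"]["max-runtime"] = alternatives.get(
            context["platform"], alternatives["default"]
        )


def split_platforms(task):
    """Yield one resolved copy of the task per entry in "platforms"."""
    for platform in task.pop("platforms"):
        new_task = deepcopy(task)
        # Hypothetical naming scheme so the split tasks can be told apart.
        new_task["label"] = f"{task['label']}-{platform}"
        resolve_by_platform(new_task, platform=platform)
        yield new_task


task = {
    "label": "task",
    "platforms": ["android", "windows", "mac"],
    "worker": {
        "max-runtime": {
            "by-platform": {"android": 7200, "windows": 3600, "default": 1800}
        }
    },
}
resolved = {t["label"]: t["worker"]["max-runtime"] for t in split_platforms(task)}
```

Each copy ends up with a plain integer under max-runtime, even though no copy ever had a top-level platform key of its own.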
Note
A good rule of thumb is to only consider using “keyed by” in
task-defaults or in a task definition that will be split into many
tasks down the line.
Specifying the Subkey
The subkey in resolve_keyed_by() is expressed in
dot path notation with each part of the path representing a nested dictionary.
If any part of the subkey is a list, you can use [] to operate on each item
in the list. For example, consider this excerpt of a task definition:
worker:
  artifacts:
    - name: foo
      path:
        by-platform:
          windows: foo.zip
          default: foo.tar.gz
    - name: bar
      path:
        by-platform:
          windows: bar.zip
          default: bar.tar.gz
With the associated transform:
@transforms.add
def resolve_artifact_paths(config, tasks):
    for task in tasks:
        resolve_keyed_by(task, "worker.artifacts[].path", task["label"])
        yield task
In this example, Taskgraph resolves by-platform in both the foo and bar
artifacts.
Note
Calling resolve_keyed_by on a subkey that doesn’t contain a by-*
field is a no-op.
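As a rough illustration of how a dot path with [] can be walked, the sketch below applies a function to every value the subkey points at. Both apply_at_subkey and choose_for_windows are hypothetical helpers written for this example. They are not part of Taskgraph, and skipping missing keys silently is an assumption.

```python
def apply_at_subkey(container, subkey, func):
    """Hypothetical helper: replace the value(s) at a dot path with func(value)."""
    head, _, rest = subkey.partition(".")
    if head.endswith("[]"):
        # "name[]" means: operate on every item of the list stored at "name".
        items = container.get(head[:-2], [])
        if rest:
            for item in items:
                apply_at_subkey(item, rest, func)
        else:
            items[:] = [func(item) for item in items]
    elif head in container:
        if rest:
            apply_at_subkey(container[head], rest, func)
        else:
            container[head] = func(container[head])
    # Assumption: a path that does not exist is silently skipped.


def choose_for_windows(value):
    """Hypothetical resolver: pick the "windows" alternative of by-platform."""
    if isinstance(value, dict) and "by-platform" in value:
        return value["by-platform"]["windows"]
    return value  # no by-* field here, so the value is left untouched


task = {
    "worker": {
        "artifacts": [
            {
                "name": "foo",
                "path": {
                    "by-platform": {"windows": "foo.zip", "default": "foo.tar.gz"}
                },
            },
            {"name": "bar", "path": "bar.tar.gz"},
        ]
    }
}
apply_at_subkey(task, "worker.artifacts[].path", choose_for_windows)
paths = [artifact["path"] for artifact in task["worker"]["artifacts"]]
```

The second artifact has no by-* field, so its path passes through unchanged, which mirrors the no-op behaviour described in the note.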
Creating Schemas with Keyed By
Fields of a task that may or may not be keyed by another field can cause
problems for any schemas your transforms define. For that reason, Taskgraph provides
the optionally_keyed_by() utility function.
It generates a valid schema that allows a field to either use “keyed by” or not. For example:
from voluptuous import Optional

from taskgraph.util.schema import Schema, optionally_keyed_by

schema = Schema({
    # ...
    Optional("worker"): {
        # Either a plain int, or a by-platform mapping that resolves to an int.
        Optional("max-run-time"): optionally_keyed_by("platform", int),
    },
})
transforms.add_validate(schema)
The example above allows both of the following task definitions:
taskA:
  worker:
    max-run-time: 3600
taskB:
  worker:
    max-run-time:
      by-platform:
        windows: 7200
        default: 3600
If a field may be keyed by more than one other field, it can be specified like this:
Optional("max-run-time"): optionally_keyed_by("platform", "build-type", int)
In this example either by-platform or by-build-type may be used. You
may specify as many fields as you like this way, as long as the last argument to
optionally_keyed_by() is the type of the field
after resolving is finished (or if keyed by is unused).
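The shape of values such a schema is meant to allow can be approximated with a small standalone check. The function accepts below is a hypothetical model, not how optionally_keyed_by() is implemented. In particular, allowing a keyed value to nest another by-* mapping is an assumption.

```python
def accepts(value, fields, type_):
    """Hypothetical check: is value a plain type_ or a valid by-<field> mapping?"""
    if isinstance(value, type_):
        return True  # "keyed by" is unused: a plain value of the final type
    if isinstance(value, dict) and len(value) == 1:
        key, alternatives = next(iter(value.items()))
        allowed = {f"by-{field}" for field in fields}
        if key in allowed and isinstance(alternatives, dict):
            # Assumption: every alternative may itself be plain or keyed again.
            return all(accepts(v, fields, type_) for v in alternatives.values())
    return False


plain = 3600
by_platform = {"by-platform": {"windows": 7200, "default": 3600}}
by_build_type = {"by-build-type": {"debug": 7200, "default": 3600}}

plain_ok = accepts(plain, ["platform"], int)
keyed_ok = accepts(by_platform, ["platform"], int)
wrong_field = accepts(by_build_type, ["platform"], int)
either_field = accepts(by_build_type, ["platform", "build-type"], int)
```

As in the schema examples, the field names come first and the resolved type comes last. Here they are passed as a list simply to keep the sketch's signature explicit.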