Use Keyed By#

Often fields in a task can depend on other values in the task. For example, a task’s max-runtime may depend on the platform. To handle this, you could re-define max-runtime in each task’s definition like so:

tasks:
  taskA:
    platform: android
    worker:
      max-runtime: 7200

  taskB:
    platform: ios
    worker:
      max-runtime: 7200

  taskC:
    platform: windows
    worker:
      max-runtime: 3600

  taskD:
    platform: mac
    worker:
      max-runtime: 1800

  ...

This is simple, but if you have lots of tasks it’s also tedious and makes updating the configuration a pain. To avoid this duplication you could use a transform:

@transforms.add
def set_max_runtime(config, tasks):
    for task in tasks:
        if task["platform"] in ("android", "ios"):
            task["worker"]["max-runtime"] = 7200
        elif task["platform"] == "windows":
            task["worker"]["max-runtime"] = 3600
        else:
            task["worker"]["max-runtime"] = 1800

        yield task

This works, but now we’ve hardcoded constants into our code, far away from the task’s original definition! It is also fairly verbose, and it gets complicated if you want to be able to change these constants per task.

An Alternative Approach#

Another way to accomplish the same thing is to use Taskgraph’s “keyed by” feature. This can be used in combination with the task-defaults key to express the same logic directly in the kind.yml file:

task-defaults:
  worker:
    max-runtime:
      by-platform:
        (ios|android): 7200
        windows: 3600
        default: 1800

tasks:
  taskA:
    platform: android

  taskB:
    platform: windows

  taskC:
    platform: mac

  ...

The structure under the by-platform key is resolved to a single value using the resolve_keyed_by() utility function. When “keying by” another attribute in the task, you must call this utility later on in a transform:

from taskgraph.util.schema import resolve_keyed_by

@transforms.add
def resolve_max_runtime(config, tasks):
    for task in tasks:
        resolve_keyed_by(task, "worker.max-runtime", f"Task {task['label']}")
        yield task

In this example, resolve_keyed_by() takes the root container object (i.e., the task), the subkey to operate on, and a descriptor that is used in any exceptions that get raised.

Exact matches are used immediately. If no exact matches are found, each alternative is treated as a regular expression, matched against the whole value. Thus android.* would match android-arm/debug. If nothing matches as a regular expression, but there is a default alternative, it is used. Otherwise, an exception is raised and graph generation stops.
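The matching order described above can be illustrated with a small standalone sketch. This is not Taskgraph's implementation: pick_alternative is a hypothetical helper written only for this example, and it simply returns the first regular expression that matches, without handling the case where several patterns match the same value.

```python
import re


def pick_alternative(alternatives, value):
    """Hypothetical helper that mirrors the matching order described above."""
    # 1. An exact match is used immediately.
    if value in alternatives:
        return alternatives[value]

    # 2. Otherwise each alternative is treated as a regular expression
    #    that must match the whole value.
    for pattern, result in alternatives.items():
        if pattern != "default" and re.fullmatch(pattern, value):
            return result

    # 3. Fall back to the "default" alternative when one exists.
    if "default" in alternatives:
        return alternatives["default"]

    # 4. Nothing matched, so resolution fails.
    raise KeyError(f"no alternative matches {value!r}")


alternatives = {"(ios|android)": 7200, "windows": 3600, "default": 1800}
print(pick_alternative(alternatives, "windows"))  # 3600 (exact match)
print(pick_alternative(alternatives, "android"))  # 7200 (regular expression)
print(pick_alternative(alternatives, "mac"))      # 1800 (default)
```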

Passing Additional Context#

By default when you use the pattern by-<name> and then feed it into resolve_keyed_by(), <name> is assumed to be a valid top-level key in the task definition. However, sometimes you want to key by some other value that is either nested deeper in the task definition, or not even known ahead of time!

For this reason you can specify additional context via **kwargs. Typically it will make the most sense to use this following a prior transform that sets some value that’s not known statically. This comes up frequently when splitting a task from one definition into several. For example:

tasks:
  task:
    platforms: [android, windows, mac]
    worker:
      max-runtime:
        by-platform:
          (ios|android): 7200
          windows: 3600
          default: 1800

from copy import deepcopy

from taskgraph.util.schema import resolve_keyed_by

@transforms.add
def split_platforms(config, tasks):
    for task in tasks:
        for platform in task.pop("platforms"):
            new_task = deepcopy(task)
            # ...
            resolve_keyed_by(
                new_task,
                "worker.max-runtime",
                task["label"],
                platform=platform,
            )
            yield new_task

Here we did not know the value of “platform” ahead of time, but it was still possible to use it in a “keyed by” statement thanks to the ability to pass in extra context.
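To make the role of the extra context more concrete, here is a simplified standalone sketch. It assumes that the extra keyword arguments are layered on top of the task's top-level keys before the by-<name> lookup happens; resolve_with_context is a hypothetical stand-in, not a Taskgraph function, and it only supports exact matches and default.

```python
def resolve_with_context(task, alternatives, name, **extra):
    """Hypothetical stand-in: look up by-<name> using task keys plus extra context."""
    # Assumption: extra context is merged over the task's top-level keys.
    context = {**task, **extra}
    value = context.get(name)
    # Simplification: exact match or "default" only, no regular expressions.
    return alternatives.get(value, alternatives["default"])


# The task itself has no "platform" key, only the list it will be split on.
task = {"label": "task", "platforms": ["android", "windows", "mac"]}
alternatives = {"android": 7200, "windows": 3600, "default": 1800}

resolved = {
    platform: resolve_with_context(task, alternatives, "platform", platform=platform)
    for platform in task["platforms"]
}
print(resolved)  # {'android': 7200, 'windows': 3600, 'mac': 1800}
```

Without the platform keyword argument, the lookup in this sketch would find no "platform" value in the task and fall through to default for every copy.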

Note

A good rule of thumb is to only consider using “keyed by” in task-defaults or in a task definition that will be split into many tasks down the line.

Specifying the Subkey#

The subkey in resolve_keyed_by() is expressed in dot path notation with each part of the path representing a nested dictionary. If any part of the subkey is a list, you can use [] to operate on each item in the list. For example, consider this excerpt of a task definition:

worker:
    artifacts:
        - name: foo
          path:
              by-platform:
                  windows: foo.zip
                  default: foo.tar.gz
        - name: bar
          path:
              by-platform:
                  windows: bar.zip
                  default: bar.tar.gz

With the associated transform:

@transforms.add
def resolve_artifact_paths(config, tasks):
    for task in tasks:
        resolve_keyed_by(task, "worker.artifacts[].path", task["label"])
        yield task

In this example, Taskgraph resolves by-platform in both the foo and bar artifacts.

Note

Calling resolve_keyed_by on a subkey that doesn’t contain a by-* field is a no-op.
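The dot path handling can be pictured with the following standalone sketch. It is an illustration only, not Taskgraph's code: find_containers is a hypothetical helper that walks a subkey and yields each dictionary holding the final key, expanding [] over list items and yielding nothing when part of the path is missing (which is why such a call ends up being a no-op).

```python
def find_containers(item, subkey):
    """Hypothetical helper: yield (container, final_key) pairs for a dot path."""
    if not isinstance(item, dict):
        return
    head, _, rest = subkey.partition(".")
    if not rest:
        # Last path segment: report the dictionary that holds it, if present.
        if head in item:
            yield item, head
        return
    if head.endswith("[]"):
        # "name[]" means: apply the rest of the path to every list element.
        for element in item.get(head[:-2], []):
            yield from find_containers(element, rest)
    else:
        yield from find_containers(item.get(head), rest)


task = {
    "worker": {
        "artifacts": [
            {"name": "foo", "path": {"by-platform": {"windows": "foo.zip", "default": "foo.tar.gz"}}},
            {"name": "bar", "path": {"by-platform": {"windows": "bar.zip", "default": "bar.tar.gz"}}},
        ]
    }
}

names = [container["name"] for container, key in find_containers(task, "worker.artifacts[].path")]
print(names)  # ['foo', 'bar']
print(list(find_containers(task, "worker.missing.path")))  # []
```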

Creating Schemas with Keyed By#

Fields that may or may not be keyed by another field can cause problems for any schemas your transforms define. For that reason, Taskgraph provides the optionally_keyed_by() utility function.

It can be used to generate a valid schema that allows a field to either use “keyed by” or not. For example:

from voluptuous import Optional

from taskgraph.util.schema import Schema, optionally_keyed_by


schema = Schema({
    # ...
    Optional("worker"): {
        Optional("max-run-time"): optionally_keyed_by("platform", int),
    },
})

transforms.add_validate(schema)

The example above allows both of the following task definitions:

taskA:
    worker:
        max-run-time: 3600

taskB:
    worker:
        max-run-time:
            by-platform:
                windows: 7200
                default: 3600

If there is more than one field that a field may be keyed by, they can be specified like this:

Optional("max-run-time"): optionally_keyed_by("platform", "build-type", int)

In this example either by-platform or by-build-type may be used. You may specify as many fields as you like this way, as long as the last argument to optionally_keyed_by() is the type of the field after resolving is finished (or if keyed by is unused).
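The rule that such a schema enforces can be sketched in plain Python. The check_optionally_keyed function below is a hypothetical illustration, not the validator Taskgraph generates; it assumes that alternatives may themselves be keyed (nested), and it checks only the final type.

```python
def check_optionally_keyed(value, fields, expected_type):
    """Hypothetical check: a plain value, or a single by-<field> mapping of valid values."""
    if isinstance(value, dict) and len(value) == 1:
        ((key, alternatives),) = value.items()
        if key.startswith("by-") and key[3:] in fields and isinstance(alternatives, dict):
            # Every alternative must itself be valid (assumption: nesting is allowed).
            return all(
                check_optionally_keyed(alt, fields, expected_type)
                for alt in alternatives.values()
            )
    # Not keyed by an allowed field: the value must already have the final type.
    return isinstance(value, expected_type)


fields = ("platform", "build-type")
print(check_optionally_keyed(3600, fields, int))  # True
print(check_optionally_keyed({"by-platform": {"windows": 7200, "default": 3600}}, fields, int))  # True
print(check_optionally_keyed({"by-build-type": {"debug": 7200, "default": 3600}}, fields, int))  # True
print(check_optionally_keyed({"by-project": {"default": 3600}}, fields, int))  # False
```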
