Submitting your Graph to Taskcluster

This tutorial will explain how to connect your repository with Taskcluster and create a Decision Task to generate and submit your graph. This tutorial assumes you have a functional Taskgraph setup. If you don’t already have one, see Creating a Simple Task Graph.

Configuring your Project

Every Taskcluster instance has a set of configured repositories, each with associated scopes, worker pools, a trust domain, and more. So the first phase is to ask your Taskcluster administrator to help get your repository configured.

Note

The configuration is typically managed by the tc-admin tool in its own dedicated repository.

If you use GitHub, you’ll also need to install the Taskcluster GitHub integration. Only organization administrators can enable the integration.

Note

The specific integration app depends on the Taskcluster instance you are using.

Populate the Requirements

First, let’s populate the requirements file. This will be used by the Decision task later on to install any dependencies needed to generate the graph (this will at least include Taskgraph itself).

Follow the Define Requirements instructions to get it set up.

Defining the Decision Task

Next we’ll declare when and how the graph will be generated in response to various repository actions (like pushing to the main branch or opening a pull request). To do this we define a Decision Task in the repository’s .taskcluster.yml file.

Note

The .taskcluster.yml file uses JSON-e. If you are confused about the syntax, see the JSON-e reference or playground to learn more.
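
As a quick orientation, the fragment below is a minimal sketch of the three JSON-e constructs this tutorial relies on: $let to define variables, ${...} to interpolate them into strings, and $if to include a value conditionally. The variable name branch, the key message, and the context values listed in the comments are illustrative assumptions, not part of any real configuration.

```yaml
# Assumed context: tasks_for = "github-push", event.ref = "refs/heads/main"
tasks:
    - $let:
          # [11:] slices off the leading "refs/heads/"
          branch: '${event.ref[11:]}'
      in:
          $if: 'tasks_for == "github-push"'
          then:
              message: 'push to ${branch}'
# With the context above this renders to:
#   tasks:
#       - message: push to main
# For any other tasks_for value the $if has no else branch, so the
# entry is dropped and the list renders as empty.
```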

There are many ways to set up the Decision Task. The recommended method is as follows:

  1. Set up the initial .taskcluster.yml at the root of your repo:

    ---
    version: 1
    reporting: checks-v1
    policy:
        pullRequests: collaborators
    tasks:
        -
    
  2. It’s often useful to define some variables that can be used later on in the file. We’ll start by defining a Trust Domain:

    tasks:
        - $let:
              trustDomain: my-project
    

    If using a Taskcluster instance that doesn’t use trust domains, this part can be skipped.

  3. If you use GitHub, you’ll want to define additional variables based on the GitHub push, pull request or release events. For example:

    tasks:
        - $let:
              trustDomain: my-project
    
              # Normalize some variables that differ across Github events
              ownerEmail:
                  $if: 'tasks_for == "github-push"'
                  then: '${event.pusher.email}'
                  else:
                      $if: 'tasks_for == "github-pull-request"'
                      then: '${event.pull_request.user.login}@users.noreply.github.com'
                      else:
                          $if: 'tasks_for == "github-release"'
                          then: '${event.sender.login}@users.noreply.github.com'
              baseRepoUrl:
                  $if: 'tasks_for == "github-push"'
                  then: '${event.repository.html_url}'
                  else:
                      $if: 'tasks_for == "github-pull-request"'
                      then: '${event.pull_request.base.repo.html_url}'
              repoUrl:
                  $if: 'tasks_for == "github-push"'
                  then: '${event.repository.html_url}'
                  else:
                      $if: 'tasks_for == "github-pull-request"'
                      then: '${event.pull_request.head.repo.html_url}'
              project:
                  $if: 'tasks_for == "github-push"'
                  then: '${event.repository.name}'
                  else:
                      $if: 'tasks_for == "github-pull-request"'
                      then: '${event.pull_request.head.repo.name}'
              headBranch:
                  $if: 'tasks_for == "github-pull-request"'
                  then: '${event.pull_request.head.ref}'
                  else:
                      $if: 'tasks_for == "github-push"'
                      then: '${event.ref}'
              headSha:
                  $if: 'tasks_for == "github-push"'
                  then: '${event.after}'
                  else:
                      $if: 'tasks_for == "github-pull-request"'
                      then: '${event.pull_request.head.sha}'
    

    This isn’t strictly necessary, but the format of GitHub events varies considerably from one event type to another. Normalizing these values into variables early on saves considerable logic later in the file.

    Here’s Fenix’s .taskcluster.yml for an idea of other variables that may be useful to define.
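
    To make the normalization concrete, here is a hedged illustration of how the variables above would resolve for a push to the main branch. The event is trimmed to the fields the expressions read, and every value (organization, repository, email address, commit hash) is a made-up placeholder.

    ```yaml
    # Trimmed github-push event (placeholder values)
    event:
        ref: refs/heads/main
        after: 0123456789abcdef0123456789abcdef01234567
        pusher:
            email: dev@example.com
        repository:
            name: my-project
            html_url: https://github.com/my-org/my-project

    # Values the $let block yields when tasks_for == "github-push"
    resolved:
        ownerEmail: dev@example.com
        baseRepoUrl: https://github.com/my-org/my-project
        repoUrl: https://github.com/my-org/my-project
        project: my-project
        headBranch: refs/heads/main
        headSha: 0123456789abcdef0123456789abcdef01234567
    ```

    Note that headBranch keeps the full refs/heads/ prefix for pushes, whereas for pull requests event.pull_request.head.ref is the bare branch name.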

  4. Next we determine whether or not to generate tasks at all. For example, we may only want to run CI tasks on the main branch or with certain pull request actions. The easiest way to accomplish this is a JSON-e $if statement that has no else clause (i.e., no task definition):

    tasks:
        - $let:
              ...
          in:
              $if: >
                  (tasks_for == "github-push" && headBranch == "refs/heads/main")
                  || (tasks_for == "github-pull-request" && event.action in ["opened", "reopened", "synchronize"])
              then:
                  # Task definition goes here. Since there is no "else" clause, if
                  # the above if statement evaluates to false, there will be no
                  # decision task.
    
  5. Up to this point, we’ve defined some variables and decided when to generate tasks. Now it’s time to create the Decision task definition! Like any task, the Decision task must conform to Taskcluster’s task schema. From here on out each step will highlight important top-level keys in the task definition. Depending on the key you may wish to use static values or JSON-e logic as necessary.

    1. Define taskId and taskGroupId. The id of the task being created is passed into the .taskcluster.yml context as ownTaskId. Decision tasks have taskGroupId set to their own id:

      then:
          taskId: '${ownTaskId}'
          taskGroupId: '${ownTaskId}'
      
    2. Define date fields. JSON-e has a convenient $fromNow operator, which helps populate date fields such as created, deadline and expires:

      then:
          created: {$fromNow: ''}
          deadline: {$fromNow: '1 day'}
          expires: {$fromNow: '1 year 1 second'}  # 1 second so artifacts expire first, despite rounding errors
      
    3. Define metadata:

      then:
          metadata:
              owner: "${ownerEmail}"
              name: "Decision Task"
              description: "Task that generates a taskgraph and submits it to Taskcluster"
              source: '${repoUrl}/raw/${headSha}/.taskcluster.yml'
      
    4. Define the provisionerId and workerType. These values will depend on the Taskcluster configuration created for your repo in the first phase. Talk to an administrator if you are unsure what to use. For now, let’s assume they are set as follows:

      then:
          provisionerId: "${trustDomain}-provisioner"
          workerType: "decision"
      
    5. Define scopes. A Decision task needs every scope required by the tasks it creates in the graph. While you could list them all individually here, a better practice is to create a “role” associated with your repository in the Taskcluster configuration. Then all you need to do in your task definition is “assume” that role:

      then:
          scopes:
              $if: 'tasks_for == "github-push"'
              then:
                  # ${repoUrl[8:]} strips out the leading 'https://'
                  # while ${headBranch[11:]} strips out 'refs/heads/'
                  - 'assume:repo:${repoUrl[8:]}:branch:${headBranch[11:]}'
              else:
                  $if: 'tasks_for == "github-pull-request"'
                  then:
                      - 'assume:repo:github.com/${event.pull_request.base.repo.full_name}:pull-request'
      

      Notice how we assume different roles depending on whether the task is coming from a push or a pull request. This is useful when you have tasks that handle releases or other sensitive operations. You don’t want those accidentally running on a pull request! By using different scopes, you can ensure it won’t ever happen.

      The roles assumed above may vary depending on the Taskcluster configuration.
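
      As a worked example, assuming a hypothetical repository at https://github.com/my-org/my-project, the expressions above would expand to the following role scopes. The organization and repository names are placeholders, and the roles your instance actually defines may differ.

      ```yaml
      # tasks_for == "github-push", headBranch == "refs/heads/main"
      scopes:
          - assume:repo:github.com/my-org/my-project:branch:main
      ---
      # tasks_for == "github-pull-request", base repo full_name == "my-org/my-project"
      scopes:
          - assume:repo:github.com/my-org/my-project:pull-request
      ```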

  6. Last but not least, we define the payload, which controls what the task actually does. The schema for the payload depends on the worker implementation your provisioner uses, typically either docker-worker or generic-worker. For now it’s recommended to use the older docker-worker, as it provides a simpler interface to Docker, although generic-worker will eventually subsume docker-worker as it matures. This tutorial assumes the docker-worker payload.

    1. Define the image. Taskgraph conveniently provides a pre-built image for most Decision task contexts, called taskgraph:decision.

      You may also build your own image if desired, either on top of taskgraph:decision or from scratch. For this tutorial we’ll just use the general purpose image:

      then:
          payload:
              image:
                  mozillareleases/taskgraph:decision-cf4b4b4baff57d84c1f9ec8fcd70c9839b70a7d66e6430a6c41ffe67252faa19@sha256:425e07f6813804483bc5a7258288a7684d182617ceeaa0176901ccc7702dfe28
      

      You should use the latest version of the image. Note that both the image id and sha256 are required (separated by @).

    2. Enable the taskclusterProxy feature.

      then:
          payload:
              features:
                  taskclusterProxy: true
      
    3. Define the environment and command. The Taskgraph docker images have a script called run-task baked in. Using this script is optional, but it provides a few convenient wrappers for things like pulling your repository into the task and installing Taskgraph itself. You can specify repositories to clone via a combination of command-line arguments and environment variables. The final argument to run-task is the command we want to run, which in our case is taskgraph decision. Here’s an example:

      then:
          payload:
              env:
                  $merge:
                      # run-task uses these environment variables to clone your
                      # repo and checkout the proper revision
                      - MYREPO_BASE_REPOSITORY: '${baseRepoUrl}'
                        MYREPO_HEAD_REPOSITORY: '${repoUrl}'
                        MYREPO_HEAD_REF: '${headBranch}'
                        MYREPO_HEAD_REV: '${headSha}'
                        MYREPO_REPOSITORY_TYPE: git
                        # run-task installs this requirements.txt before
                        # running your command
                        MYREPO_PIP_REQUIREMENTS: taskcluster/requirements.txt
                        REPOSITORIES: {$json: {myrepo: "MyRepo"}}
                      - $if: 'tasks_for in ["github-pull-request"]'
                        then:
                            MYREPO_PULL_REQUEST_NUMBER: '${event.pull_request.number}'
              command:
                  - /usr/local/bin/run-task
                  # This 'myrepo' gets uppercased and is how `run-task`
                  # knows to look for 'MYREPO_*' environment variables.
                  - '--myrepo-checkout=/builds/worker/checkouts/myrepo'
                  - '--task-cwd=/builds/worker/checkouts/myrepo'
                  - '--'
                  # Now for the actual command.
                  - bash
                  - -cx
                  - >
                    ~/.local/bin/taskgraph decision
                    --pushlog-id='0'
                    --pushdate='0'
                    --project='${project}'
                    --message=""
                    --owner='${ownerEmail}'
                    --level='1'
                    --base-repository="$MYREPO_BASE_REPOSITORY"
                    --head-repository="$MYREPO_HEAD_REPOSITORY"
                    --head-ref="$MYREPO_HEAD_REF"
                    --head-rev="$MYREPO_HEAD_REV"
                    --repository-type="$MYREPO_REPOSITORY_TYPE"
                    --tasks-for='${tasks_for}'
      

For convenience, the full .taskcluster.yml can be downloaded here.
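
As an orientation aid, the outline below sketches how the fragments from the steps above nest inside a single file: every then: snippet in steps 5 and 6 contributes keys to the same task definition. Values are abbreviated to empty placeholders with comments pointing back to the relevant step, and the $if condition is shortened, so this is a structural sketch under those assumptions rather than a working configuration.

```yaml
---
version: 1
reporting: checks-v1
policy:
    pullRequests: collaborators
tasks:
    - $let:
          trustDomain: my-project
          # ownerEmail, baseRepoUrl, repoUrl, project, headBranch, headSha (step 3)
      in:
          $if: 'tasks_for == "github-push"'  # shortened; full condition in step 4
          then:
              taskId: '${ownTaskId}'
              taskGroupId: '${ownTaskId}'
              created: {$fromNow: ''}
              deadline: {$fromNow: '1 day'}
              expires: {$fromNow: '1 year 1 second'}
              metadata: {}       # placeholder: owner, name, description, source (step 5.3)
              provisionerId: '${trustDomain}-provisioner'
              workerType: decision
              scopes: []         # placeholder: assumed role(s) (step 5.5)
              payload:
                  image: ''      # placeholder: taskgraph:decision image and sha256 (step 6.1)
                  features:
                      taskclusterProxy: true
                  env: {}        # placeholder: MYREPO_* variables (step 6.3)
                  command: []    # placeholder: run-task invocation (step 6.3)
```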

Note

See the Taskcluster documentation and/or GitHub quickstart resources for more information on creating a .taskcluster.yml file.

Testing it Out

From here you should be ready to commit to your repo (directly or via pull request) and start testing things out! It’s very likely that you’ll run into some error or another at first. If you suspect a problem in the task configuration, see Run Taskgraph Locally for tips on how to solve it. Otherwise you might need to tweak the .taskcluster.yml or make changes to your repo’s Taskcluster configuration. If the latter is necessary, reach out to your Taskcluster administrators for assistance.

Phew! While that was a lot, this only scratches the surface. You may also want to incorporate:

  • Dependencies

  • Artifacts

  • Docker images

  • Action / Cron tasks

  • Levels

  • Treeherder support

  • Chain of Trust

  • Release tasks (using scriptworker)

  • ...and much more

But hopefully this tutorial helped provide a solid foundation of knowledge upon which to build.