Using github actions from a github app

Eventually, we want snekomatic to be able to do things like:

- check out a source repository and run `pip-compile` to update deps
- check out a branch and run auto-formatters
- run `setup.py` and upload the resulting binaries

There are two difficult things about running this kind of code from inside a github app:

- You really want to be able to see the logs, ideally through some nice web interface, so you can debug when stuff goes wrong.

- Our github app basically has admin privileges on the whole organization, so we don't want to be running random code like `setup.py` in there; it's a potential security problem. (Generally our threat model is: we are quick to trust people to commit stuff to our repos, because it's easy to revert if they do something nasty; but we don't want to let people escalate that to get longer-term privileges or avoid logging/auditing.) So `setup.py` at least needs some kind of sandboxing.

We could try to implement this stuff ourselves, but I'm thinking: what if we let Github Actions do the work for us? They've already solved both of these problems.

I considered a bunch of different strategies, and I think the one that makes the most sense is:

- We write a script in this repository that knows how to do the various actions we need ("clone repo and run `pip-compile`", "run `setup.py`", etc.).

- We add an `action.yml` and `Dockerfile` to this repository, so that our script can run as a github action.

- We add a `.github/workflows/blah.yml` file to configure that script to run in response to a `repository_dispatch` event. In particular, something like:

```yaml
on: [repository_dispatch]

jobs:
  # No secrets are passed to this job, so it can run untrusted code.
  unprivileged:
    runs-on: ubuntu-latest
    name: "unprivileged-${{ github.event.client_payload.id }}"
    steps:
      - uses: python-trio/snekomatic@master
      - uses: actions/upload-artifact@v1
        with:
          name: artifacts
          path: artifacts/

  # Runs after `unprivileged` and only handles the files it produced.
  privileged:
    runs-on: ubuntu-latest
    name: "privileged-${{ github.event.client_payload.id }}"
    needs: unprivileged
    steps:
      - uses: actions/download-artifact@v1
        with:
          name: artifacts
      - uses: python-trio/snekomatic@master
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          pypi_token: ${{ secrets.PYPI_TOKEN }}
```

There's a lot going on here. Here's how it would work:

When the snekomatic heroku app wants to run one of these operations, it uses the github api to send a [`repository_dispatch` event](https://help.github.com/en/actions/automating-your-workflow-with-github-actions/events-that-trigger-workflows#external-events-repository_dispatch) to the `python-trio/snekomatic` repo. This event can contain an arbitrary JSON payload, which we use to encode info about what operation we want to run. E.g.:

```json
{
  "operation": "pre-merge-formatting",
  "id": "FEMHkBB9cXrw1g",
  "repo": "python-trio/trio",
  "pr": 1234
}
```
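As a rough sketch of what sending that event could look like from the app side, here is one way to build the request with only the standard library. The `build_dispatch_request` helper, the placeholder token, and the `snekomatic-operation` event name are all made up for illustration; the endpoint and the `event_type` / `client_payload` body keys are the ones the `repository_dispatch` API expects.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_dispatch_request(token, repo, event_type, client_payload):
    """Build (but do not send) the POST that creates a repository_dispatch event.

    Hypothetical helper, not part of snekomatic.
    """
    body = json.dumps(
        {"event_type": event_type, "client_payload": client_payload}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{GITHUB_API}/repos/{repo}/dispatches",
        data=body,
        method="POST",
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


request = build_dispatch_request(
    token="PLACEHOLDER-TOKEN",  # assumption: a real installation token goes here
    repo="python-trio/snekomatic",
    event_type="snekomatic-operation",  # assumption: any event name would do
    client_payload={
        "operation": "pre-merge-formatting",
        "id": "FEMHkBB9cXrw1g",
        "repo": "python-trio/trio",
        "pr": 1234,
    },
)
# Sending it would be: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the payload encoding easy to test without touching the network.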

The Github actions machinery sees this event, and spawns a copy of our "workflow". This workflow has two "jobs", which are run sequentially (because of the `needs:` key on the second job).

Each job gets its own virtual machine. The first job doesn't get passed any secrets, so it can run arbitrary untrusted code, and github will take care of sandboxing it. The only way it can affect the outside world is by dropping files into the `artifacts/` directory, which will be zipped up and passed to the second job. The second job then can take those files, and do whatever it wants with them.

For example, the first job might clone the given repo, check out the given revision, do `python setup.py sdist bdist_wheel`, then put the resulting sdist and wheel into the `artifacts/` directory. The second job can then take the sdist and wheel files, and upload them to PyPI. If someone puts some nasty code in `setup.py`, it could create an arbitrarily broken sdist/wheel... but that's about all it could do.

The script that runs inside the jobs can see the information about the `repository_dispatch` event that triggered it. This includes the arbitrary JSON payload, which it can use to decide what to do. **But** before it does that, it should first check that the `sender` field in the event shows it coming from `trio-bot`. Any app with *read-only* access to the repo can create a `repository_dispatch` event, and who knows what kind of shenanigans that could enable, so better to make sure the message really comes from `trio-bot` up front.
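A minimal sketch of that sender check, assuming the script reads the event from the file that Actions points to with `GITHUB_EVENT_PATH`. The function and exception names are hypothetical, and the exact login string is an assumption too: bot accounts for github apps often show up with a `[bot]` suffix, so the expected value might need to be `trio-bot[bot]`.

```python
import json
import os

# Assumption: the exact login string as it appears in the event's `sender` field.
EXPECTED_SENDER = "trio-bot"


class UntrustedSender(Exception):
    """Raised when a repository_dispatch event does not come from our bot."""


def load_event():
    """Read the triggering event from the file Actions provides to the job."""
    with open(os.environ["GITHUB_EVENT_PATH"], encoding="utf-8") as f:
        return json.load(f)


def extract_trusted_payload(event, expected_sender=EXPECTED_SENDER):
    """Return client_payload, but only after checking who sent the event."""
    sender = event.get("sender") or {}
    login = sender.get("login")
    if login != expected_sender:
        raise UntrustedSender(f"refusing repository_dispatch from {login!r}")
    return event.get("client_payload", {})


# Example with an in-memory event instead of load_event():
example_event = {
    "action": "snekomatic-operation",
    "sender": {"login": "trio-bot"},
    "client_payload": {"operation": "pre-merge-formatting", "id": "FEMHkBB9cXrw1g"},
}
payload = extract_trusted_payload(example_event)
```

The point of the shape is that nothing looks at `client_payload` until the sender check has passed.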

When the jobs start, Github sends out a `check_run` event to any webhook listeners, which includes the "name" of the job plus a bunch of metadata about it. By using clever template-y magic, we interpolate the `id` field from the `repository_dispatch` payload into the job `name`, so our Github app can listen for the `check_run` event and match it up with the `repository_dispatch` it sent. That lets it post back to the original user who triggered the operation: "OK, that's started, if you want to watch the progress then click here: [URL]". If it's running multiple operations at the same time, it can keep track of which one is which, and eventually get the exit status to figure out whether each one succeeded or failed.
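The matching step could look something like the sketch below. It assumes the app keeps a set of the ids it has dispatched (`pending` here), and that the job names follow the `unprivileged-<id>` / `privileged-<id>` pattern from the workflow file. `match_check_run` is a hypothetical helper and the URL in the example is a placeholder; `name`, `status`, `conclusion` and `html_url` are fields of the `check_run` object in the webhook payload.

```python
JOB_PREFIXES = ("unprivileged-", "privileged-")


def match_check_run(payload, pending):
    """Map a check_run webhook payload back to an operation we dispatched.

    Returns a small summary dict, or None if the check run isn't one of ours.
    """
    check_run = payload["check_run"]
    name = check_run["name"]
    for prefix in JOB_PREFIXES:
        if name.startswith(prefix):
            op_id = name[len(prefix):]
            if op_id in pending:
                return {
                    "id": op_id,
                    "job": prefix.rstrip("-"),
                    "status": check_run["status"],
                    "conclusion": check_run.get("conclusion"),
                    "url": check_run.get("html_url"),
                }
    return None


# Example: the id we sent earlier, and a check_run event that mentions it.
pending = {"FEMHkBB9cXrw1g"}
example_payload = {
    "action": "created",
    "check_run": {
        "name": "unprivileged-FEMHkBB9cXrw1g",
        "status": "in_progress",
        "conclusion": None,
        "html_url": "https://example.invalid/placeholder-run-url",
    },
}
match = match_check_run(example_payload, pending)
```

Ignoring ids that aren't in `pending` means a check run with a look-alike name, created by something else, doesn't get mistaken for one of our operations.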

So... yeah. It's kind of elaborate, but I think it all does work, gives solid, easy-to-reason-about security guarantees, and is way easier than building our own sandboxing mechanism.
