Add Github Runner with Cloud Run Worker Pools to Cloud Run samples #13481

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Status: Open. Wants to merge 4 commits into base: main.
13 changes: 13 additions & 0 deletions run/github-runner/Dockerfile
@@ -0,0 +1,13 @@
# syntax=docker/dockerfile:1
# Best practices: https://docs.docker.com/build/building/best-practices/

FROM ghcr.io/actions/actions-runner:2.322.0

# Add scripts with right permissions.
USER root
Reviewer comment (Contributor):
Seems overkill to run as root just to set a file as executable, I don't think you need to. Can you ask around how to solve that without root?
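One possible root-free approach, sketched here (not part of the PR), is BuildKit's `COPY --chmod`, which the `# syntax=docker/dockerfile:1` directive above already enables:

```dockerfile
# Copy the script and mark it executable in one step, without switching to root.
# Illustrative sketch; assumes BuildKit. start.sh is the same script added below.
COPY --chmod=0755 start.sh start.sh
```

With this, the `USER root` / `chmod` pair would be unnecessary.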

ADD start.sh start.sh
RUN chmod +x start.sh

# Add start entrypoint with right permissions.
USER runner
ENTRYPOINT ["./start.sh"]
113 changes: 113 additions & 0 deletions run/github-runner/README.md
@@ -0,0 +1,113 @@
# GH Runner Worker Pools Sample

The following example walks through how to host a self-hosted GitHub runner on Cloud Run worker pools, which will execute the workflows defined in your GitHub repository.

## About self-hosted GitHub runners
Runners are the machines that execute jobs in a GitHub Actions workflow. For example, a runner can clone your repository locally, install testing software, and then run commands that evaluate your code.

A self-hosted runner is a system that you deploy and manage to execute jobs from GitHub Actions on GitHub.
Self-hosted runners:
- Give you more control of hardware, operating system, and software tools than GitHub-hosted runners provide.
- Are free to use with GitHub Actions, but you are responsible for the cost of maintaining your runner machines.
- Let you create custom hardware configurations with the processing power or memory needed to run larger jobs, and install software available on your local network.
- Receive automatic updates for the self-hosted runner application only, though you may disable automatic updates of the runner.
- Can use cloud services or local machines that you already pay for.
- Don't need to have a clean instance for every job execution.
- Can be physical, virtual, in a container, on-premises, or in a cloud.

## Benefits of using Cloud Run worker pools for hosting GitHub runners
Cloud Run worker pools let you host runners through the Cloud Run API instead of managing your own VMs or a GKE cluster.
With fast startup and shutdown, you can configure autoscaling through the worker pools API to execute GitHub Actions workflows on demand, using compute resources efficiently in response to webhook events.
With competitive pricing and scale-to-zero, worker pools offer a cost-effective way to run workflow jobs.

### Getting started
In this example, you use the following billable components of Google Cloud:
- [Artifact Registry](https://cloud.google.com/artifact-registry)
- [Cloud Build](https://cloud.google.com/cloud-build)
- [Cloud Run](https://cloud.google.com/run)
- [Secret Manager](https://cloud.google.com/security/products/secret-manager)

### Ensure you have the following IAM roles granted to your account:
- [Cloud Run Admin](https://cloud.google.com/iam/docs/roles-permissions/run#run.admin) (roles/run.admin)
- [Project IAM Admin](https://cloud.google.com/iam/docs/roles-permissions/resourcemanager#resourcemanager.projectIamAdmin) (roles/resourcemanager.projectIamAdmin)
- [Service Usage Consumer](https://cloud.google.com/iam/docs/roles-permissions/serviceusage#serviceusage.serviceUsageConsumer) (roles/serviceusage.serviceUsageConsumer)
- [Secret Manager Secret Accessor](https://cloud.google.com/iam/docs/understanding-roles#secretmanager.secretAccessor) (roles/secretmanager.secretAccessor)
- [Artifact Registry Admin](https://cloud.google.com/iam/docs/roles-permissions/artifactregistry#artifactregistry.admin) (roles/artifactregistry.admin)
- [Cloud Build Editor](https://cloud.google.com/iam/docs/roles-permissions/cloudbuild#cloudbuild.builds.editor) (roles/cloudbuild.builds.editor)
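As a sketch, the roles above can be granted with `gcloud` (the `PROJECT_ID` and `YOUR_EMAIL` values are placeholders to substitute with your own):

```sh
# Placeholder values; replace PROJECT_ID and YOUR_EMAIL with your own.
for role in roles/run.admin \
            roles/resourcemanager.projectIamAdmin \
            roles/serviceusage.serviceUsageConsumer \
            roles/secretmanager.secretAccessor \
            roles/artifactregistry.admin \
            roles/cloudbuild.builds.editor; do
  gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:YOUR_EMAIL" --role="$role"
done
```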

### Deploy the runner as a Cloud Run worker pool

Clone:

```sh
git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git
cd python-docs-samples/run/github-runner
```

Create the secret:

> [!IMPORTANT]
> Change the value of `GITHUB_SECRET_VALUE`.
> See [How to get a Github register token](#how-to-get-a-github-register-token)

```sh
gcloud secrets create GH_TOKEN --replication-policy="automatic"
echo -n "GITHUB_SECRET_VALUE" | gcloud secrets versions add GH_TOKEN --data-file=-
```

Permissions:

> [!NOTE]
> Grant `secretAccessor` to the service account that the worker pool runs as.

```sh
gcloud secrets add-iam-policy-binding GH_TOKEN \
  --member="serviceAccount:XXXX@developer.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor"
```

Deploy:

> [!IMPORTANT]
> Change the values of `GITHUB_USER_OR_ORGANIZATION` and `REPOSITORY_NAME`.

```sh
gcloud beta run worker-pools deploy cloud-run-github-runner \
  --source=. \
  --scaling=1 \
  --set-env-vars GH_OWNER=GITHUB_USER_OR_ORGANIZATION,GH_REPOSITORY=REPOSITORY_NAME \
  --set-secrets GH_TOKEN=GH_TOKEN:latest
```

> [!NOTE]
> In this case `cloud-run-github-runner` is the name of the Cloud Run Worker pool.
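To confirm the deployment and inspect the pool's scaling settings, a describe call can help (the region flag value is a placeholder for wherever you deployed):

```sh
gcloud beta run worker-pools describe cloud-run-github-runner --region REGION
```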

### How to get a Github register token

Go to "Add new self-hosted runner" in the Settings section of your repository.

![example of hosted runner form](docs/assets/add-new-self-hosted-runner.png)

Copy the *register token*.

## GitHub Runner Autoscaler

Once you deploy the worker pool with an active GitHub runner, configure the autoscaler to provision worker instances based on job status in the Actions queue.

You can automatically increase or decrease the number of self-hosted runners in your environment in response to the webhook events you receive with a particular label. For example, you can create automation that adds a new self-hosted runner each time you receive a [workflow_job](https://docs.github.com/en/webhooks/webhook-events-and-payloads#workflow_job) webhook event with the [queued](https://docs.github.com/en/webhooks-and-events/webhooks/webhook-events-and-payloads#workflow_job) activity, which notifies you that a new job is ready for processing. The webhook payload includes label data, so you can identify the type of runner the job is requesting. Once the job has finished, you can then create automation that removes the runner in response to the workflow_job [completed](https://docs.github.com/en/webhooks-and-events/webhooks/webhook-events-and-payloads#workflow_job) activity.
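The scale-up/scale-down rule described above can be sketched as a pure function (a simplification of the handler in `main.py`; the function name is illustrative):

```python
def decide_instance_count(action: str, status: str, current: int, max_runners: int) -> int:
    """Return the desired runner count for a workflow_job webhook event.

    Mirrors the rule described above: a queued job scales up by one
    (capped at max_runners); a completed job scales down by one
    (floored at zero); other activities leave the pool size unchanged.
    """
    if action == 'queued' and status == 'queued':
        return min(current + 1, max_runners)
    if action == 'completed' and status == 'completed':
        return max(current - 1, 0)
    return current
```

The real handler additionally reads the current count from the Cloud Run Admin API before deciding.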

### Deploy the function to receive webhook requests

```sh

cd github-runner-autoscaler

gcloud run deploy github-runner-autoscaler --function github_webhook_handler --region us-central1 --source . --set-env-vars GITHUB_ORG_OR_REPO='OWNER/REPO-NAME',RUNNER_SCOPE='repo',MAX_RUNNERS=5,GCP_PROJECT='PROJECT',CLOUD_RUN_WORKER_POOL_NAME='CLOUD_RUN_WORKER_POOL_NAME'
```
> [!NOTE]
> In this case `CLOUD_RUN_WORKER_POOL_NAME` is the name of the Cloud Run Worker pool you wish to autoscale.

## Configure the webhook

Under your repository, go to Settings -> Webhooks -> Manage webhook to configure the functions endpoint as the payload URL. Select Push events to trigger the webhook
Reviewer comment (Contributor, critical): The instruction to select "Push events" for the webhook is incorrect. The autoscaler function handles workflow_job events. Update the documentation to specify workflow_job events.

Suggested change:
- Under your repository, go to Settings -> Webhooks -> Manage webhook to configure the functions endpoint as the payload URL. Select Push events to trigger the webhook
+ Under your repository, go to Settings -> Webhooks -> Manage webhook to configure the functions endpoint as the payload URL. Select "Workflow job" events to trigger the webhook


![example of configure webhook form](docs/assets/configure-webhook.png)
254 changes: 254 additions & 0 deletions run/github-runner/github-runner-autoscaler/main.py
@@ -0,0 +1,254 @@
# Copyright 2025 Google, LLC.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import hmac
import hashlib
import requests
import json
import logging
from google.oauth2 import service_account
from google.auth.transport.requests import Request as GoogleRequest
import google.auth
from flask import Request
from google.cloud import secretmanager
from github import Github


# --- Configuration ---
PROJECT_ID = os.environ.get('GCP_PROJECT')
LOCATION = 'us-central1' # Or your desired region

CLOUD_RUN_WORKER_POOL_NAME = os.environ.get('CLOUD_RUN_WORKER_POOL_NAME') # Your worker pool name


# GitHub specific config
GITHUB_ORG_OR_REPO = os.environ.get('GITHUB_ORG_OR_REPO', 'YOUR_ORG/YOUR_REPO') # e.g., 'my-org' or 'my-org/my-repo'
RUNNER_SCOPE = os.environ.get('RUNNER_SCOPE', 'repo') # 'org' or 'repo'


# Autoscaling parameters
MAX_RUNNERS = int(os.environ.get('MAX_RUNNERS', 5)) # Max number of concurrent runners
IDLE_TIMEOUT_MINUTES = int(os.environ.get('IDLE_TIMEOUT_MINUTES', 15)) # How long to wait before scaling down idle runners
Reviewer comment (Contributor, medium): The configuration variable IDLE_TIMEOUT_MINUTES is defined but not used. Remove this dead code.


# Initialize GitHub client
github_client = None
github_entity = None
try:
    # Get GH_TOKEN from Secret Manager
    client = secretmanager.SecretManagerServiceClient()
    secret_name = f"projects/{PROJECT_ID}/secrets/GH_TOKEN/versions/latest"
    response = client.access_secret_version(request={"name": secret_name})
    gh_token = response.payload.data.decode("UTF-8")
    github_client = Github(gh_token)

    if RUNNER_SCOPE == 'org':
        github_entity = github_client.get_organization(GITHUB_ORG_OR_REPO)
    else:
        owner, repo_name = GITHUB_ORG_OR_REPO.split('/')
        github_entity = github_client.get_user(owner).get_repo(repo_name)
except Exception as e:
    logging.error(f"Failed to initialize GitHub client or access GH_TOKEN: {e}")
Comment on lines +45 to +63. Reviewer comment (Contributor, high): The block for initializing the GitHub client, along with the global variables github_client and github_entity, is not used. This is dead code that adds unnecessary complexity and dependencies. Remove this block and the corresponding unused imports.



def get_authenticated_request():
    """Returns a (google.auth transport Request, access token) pair for Google Cloud APIs."""
    credentials, project = google.auth.default()
    scoped_credentials = credentials.with_scopes(['https://www.googleapis.com/auth/cloud-platform'])
    auth_req = GoogleRequest()
    scoped_credentials.refresh(auth_req)
    return auth_req, scoped_credentials.token


def get_current_worker_pool_instance_count():
    """
    Retrieves the current manualInstanceCount of the Cloud Run worker pool.
    Returns the instance count as an integer, or -1 if retrieval fails.
    """
    auth_req, access_token = get_authenticated_request()
    if not access_token:
        logging.error("Failed to retrieve Google Cloud access token to get current instance count.")
        return -1

    url = f"https://run.googleapis.com/v2/projects/{PROJECT_ID}/locations/{LOCATION}/workerPools/{CLOUD_RUN_WORKER_POOL_NAME}"
Reviewer comment (Contributor): Have you considered using the Cloud Run Python client library instead? This works, but the client library might be more robust.



    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {access_token}"
    }

    response = None  # ensure the name is defined if the request itself raises
    try:
        response = auth_req.session.get(url, headers=headers)
        response.raise_for_status()
        worker_pool_data = response.json()
        current_instance_count = worker_pool_data.get('scaling', {}).get('manualInstanceCount', 0)
        logging.info(f"Current worker pool instance count: {current_instance_count}")
        return current_instance_count
    except requests.exceptions.RequestException as e:
        logging.error(f"Error getting Cloud Run worker pool details: {e}")
        if response is not None:
            logging.error(f"Response Status Code: {response.status_code}")
            logging.error(f"Response Text: {response.text}")
        return -1


def update_runner_vm_instance_count(instance_count: int):
    """
    Updates a Cloud Run worker pool with the specified instance count.
    """
    auth_req, access_token = get_authenticated_request()
    if not access_token:
        print("Failed to retrieve Google Cloud access token. Exiting.")
Reviewer comment (Contributor, medium): Use the logging module instead of print for error messages, for consistency and better log management. Suggested change: `logging.error("Failed to retrieve Google Cloud access token. Exiting.")`

        return

    url = (f"https://run.googleapis.com/v2/projects/{PROJECT_ID}/locations/{LOCATION}/workerPools/"
           f"{CLOUD_RUN_WORKER_POOL_NAME}?updateMask=scaling.manualInstanceCount")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {access_token}"
    }
    payload = {
        "scaling": {
            "scalingMode": "MANUAL",
            "manualInstanceCount": instance_count
        }
    }

    response = None  # ensure the name is defined if the request itself raises
    try:
        response = auth_req.session.patch(url, headers=headers, json=payload)
        response.raise_for_status()

        print(f"Successfully updated Cloud Run worker pool. Status Code: {response.status_code}")
        print("Response JSON:")
        print(json.dumps(response.json(), indent=2))

    except requests.exceptions.RequestException as e:
        print(f"Error updating Cloud Run worker pool: {e}")
        if response is not None:
            print(f"Response Status Code: {response.status_code}")
            print(f"Response Text: {response.text}")
Comment on lines +141 to +150. Reviewer comment (Contributor, medium): Use the logging module instead of print for all application output to ensure logs are correctly captured, formatted, and routed in a cloud environment. Replace these print calls with logging.info and logging.error.

Suggested change:

        logging.info(f"Successfully updated Cloud Run worker pool. Status Code: {response.status_code}")
        logging.info("Response JSON:")
        logging.info(json.dumps(response.json(), indent=2))

    except requests.exceptions.RequestException as e:
        logging.error(f"Error updating Cloud Run worker pool: {e}")
        if response is not None:
            logging.error(f"Response Status Code: {response.status_code}")
            logging.error(f"Response Text: {response.text}")



def create_runner_vm(count: int):
    """Updates a Cloud Run worker pool to scale up to the specified count."""
    logging.info(f"Attempting to scale up Cloud Run worker pool to {count} instances.")
    update_runner_vm_instance_count(count)


def delete_runner_vm(count: int):
    """Updates a Cloud Run worker pool to scale down to the specified count."""
    logging.info(f"Attempting to scale down Cloud Run worker pool to {count} instances.")
    update_runner_vm_instance_count(count)
# --- Main Webhook Handler ---

def github_webhook_handler(request: Request):
    """
    HTTP Cloud Function that handles GitHub workflow_job events for autoscaling.
    """
    logging.getLogger().setLevel(logging.INFO)  # Set logging level

    # 1. Validate Webhook Signature (IMPORTANT FOR PRODUCTION)
    # Signature validation is omitted here but is critical for security;
    # implement it with your GitHub webhook secret before real-world use.
    # Example (retrieve webhook_secret from Secret Manager too):
    # webhook_secret = get_secret_from_secret_manager("GITHUB_WEBHOOK_SECRET")
    # if not validate_signature(request, webhook_secret):
    #     return ("Invalid signature", 403)
Comment on lines +177 to +183. Reviewer comment (Contributor, critical): Webhook signature validation is commented out, posing a critical security risk. Without validation, anyone can send payloads, triggering the autoscaler. Implement this validation for any real-world usage.
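A minimal validation sketch (illustrative, not part of the PR; assumes access to the raw request body and the repository's webhook secret):

```python
import hashlib
import hmac


def validate_signature(body: bytes, secret: str, signature_header) -> bool:
    """Check a GitHub X-Hub-Signature-256 header against the request body.

    GitHub sends 'sha256=<hex HMAC-SHA256 of the body, keyed by the webhook
    secret>'; compare_digest avoids leaking information via timing.
    """
    if not signature_header:
        return False
    expected = 'sha256=' + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)
```

In the handler this would be called as `validate_signature(request.get_data(), webhook_secret, request.headers.get('X-Hub-Signature-256'))`.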



    # 2. Parse Event
    event_type = request.headers.get('X-GitHub-Event')
    if event_type != 'workflow_job':
        logging.info(f"Received event type '{event_type}', ignoring.")
        return ("OK", 200)

    try:
        payload = request.get_json()
    except Exception as e:
        logging.error(f"Error parsing JSON payload: {e}")
        return ("Bad Request", 400)

    action = payload.get('action')
    job = payload.get('workflow_job')

    if not job:
        logging.warning("No 'workflow_job' found in payload.")
        return ("OK", 200)

    job_id = job.get('id')
    job_name = job.get('name')
    job_status = job.get('status')  # 'queued', 'in_progress', 'completed'
    job_conclusion = job.get('conclusion')  # 'success', 'failure', 'cancelled', 'skipped'

    logging.info(f"Received workflow_job event: Job ID {job_id}, Name '{job_name}', Status '{job_status}', Action '{action}'")

    # 3. Handle Scaling Logic
    current_instance_count = get_current_worker_pool_instance_count()

    if current_instance_count == -1:
        logging.error("Could not retrieve current instance count. Aborting scaling operation.")
        return ("Internal Server Error", 500)

    # Scale Up: If a job is queued and we have available capacity
    if action == 'queued' and job_status == 'queued':
        if current_instance_count < MAX_RUNNERS:
            new_instance_count = current_instance_count + 1
            logging.info(f"Job '{job_name}' is queued. Scaling up from {current_instance_count} to {new_instance_count} runners.")
            create_runner_vm(new_instance_count)
        else:
            logging.info(f"Job '{job_name}' is queued, but max runners ({MAX_RUNNERS}) reached. Current runners: {current_instance_count}.")

    # Scale Down: If a job is completed, find the corresponding runner and consider terminating it
    elif action == 'completed' and job_status == 'completed':
        # You might want more sophisticated logic here to determine which runner to shut down,
        # especially if you have multiple runners and want to only shut down idle ones.
        # For simplicity, this example scales down by one, ensuring it doesn't go below zero.
        if current_instance_count > 0:
            new_instance_count = current_instance_count - 1
            logging.info(f"Job '{job_name}' completed. Scaling down from {current_instance_count} to {new_instance_count} runners.")
            delete_runner_vm(new_instance_count)
        else:
            logging.info(f"Job '{job_name}' completed, but no runners are currently active to scale down.")
    else:
        logging.info(f"Workflow job event for '{job_name}' with action '{action}' and status '{job_status}' did not trigger a scaling action.")

    return ("OK", 200)
7 changes: 7 additions & 0 deletions run/github-runner/github-runner-autoscaler/requirements.txt
@@ -0,0 +1,7 @@
Flask
requests
google-cloud-secret-manager
google-auth
google-auth-oauthlib
google-api-python-client
PyGithub
Comment on lines +1 to +7. Reviewer comment (Contributor, medium): This file contains unused dependencies: google-cloud-secret-manager, google-auth-oauthlib, google-api-python-client, and PyGithub. Removing them will reduce the deployment package size and improve security.

Suggested change:
Flask
requests
google-auth
