# Add Github Runner with Cloud Run Worker Pools to Cloud Run samples #13481
**Dockerfile** (new file)

```Dockerfile
# syntax=docker/dockerfile:1
# Best practices: https://docs.docker.com/build/building/best-practices/

FROM ghcr.io/actions/actions-runner:2.322.0

# Add scripts with right permissions.
USER root
ADD start.sh start.sh
RUN chmod +x start.sh

# Add start entrypoint with right permissions.
USER runner
ENTRYPOINT ["./start.sh"]
```
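`start.sh` itself is not part of this diff. For context, a minimal registration script might look like the sketch below; it is illustrative only (not the file from the PR) and assumes the `GH_OWNER`, `GH_REPOSITORY`, and `GH_TOKEN` values supplied at deploy time plus the `config.sh`/`run.sh` scripts that ship with the `actions-runner` image:

```sh
#!/bin/bash
set -euo pipefail

# Register this runner against the target repository using the
# registration token provided via Secret Manager (GH_TOKEN).
./config.sh \
  --url "https://github.com/${GH_OWNER}/${GH_REPOSITORY}" \
  --token "${GH_TOKEN}" \
  --unattended \
  --ephemeral

# Start listening for jobs.
./run.sh
```

`--ephemeral` is optional; it makes each instance take exactly one job, which pairs naturally with the autoscaler described in the README below.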
**README.md** (new file)
# GH Runner Worker Pools Sample

The following example walks through how to host a self-hosted GitHub runner on Cloud Run worker pools to execute the workflows defined in your GitHub repository.

## About self-hosted GitHub runners
Runners are the machines that execute jobs in a GitHub Actions workflow. For example, a runner can clone your repository locally, install testing software, and then run commands that evaluate your code.

A self-hosted runner is a system that you deploy and manage to execute jobs from GitHub Actions on GitHub.
Self-hosted runners:
- Give you more control of hardware, operating system, and software tools than GitHub-hosted runners provide.
- Are free to use with GitHub Actions, but you are responsible for the cost of maintaining your runner machines.
- Let you create custom hardware configurations with the processing power or memory needed to run larger jobs, and install software available on your local network.
- Receive automatic updates for the self-hosted runner application only, though you may disable automatic updates of the runner.
- Can use cloud services or local machines that you already pay for.
- Don't need to have a clean instance for every job execution.
- Can be physical, virtual, in a container, on-premises, or in a cloud.

## Benefits of using Cloud Run Worker Pools for hosting GitHub runners
Cloud Run Worker Pools offer an easy way to use the Cloud Run API to host runners instead of managing your own VMs or a GKE cluster.
With fast startup and shutdown, you can configure autoscaling through the Worker Pools API to execute GitHub Actions workflows on demand, with effective compute resource utilization in response to webhook events.
With a combination of competitive pricing and scale to zero, Worker Pools offer a cost-effective solution for running workflow jobs.
### Getting started
In this example, you use the following billable components of Google Cloud:
- [Artifact Registry](https://cloud.google.com/artifact-registry)
- [Cloud Build](https://cloud.google.com/cloud-build)
- [Cloud Run](https://cloud.google.com/run)
- [Secret Manager](https://cloud.google.com/security/products/secret-manager)

### Ensure you have the following IAM roles granted to your account:
- [Cloud Run Admin](https://cloud.google.com/iam/docs/roles-permissions/run#run.admin) (roles/run.admin)
- [Project IAM Admin](https://cloud.google.com/iam/docs/roles-permissions/resourcemanager#resourcemanager.projectIamAdmin) (roles/resourcemanager.projectIamAdmin)
- [Service Usage Consumer](https://cloud.google.com/iam/docs/roles-permissions/serviceusage#serviceusage.serviceUsageConsumer) (roles/serviceusage.serviceUsageConsumer)
- [Secret Manager Secret Accessor](https://cloud.google.com/iam/docs/understanding-roles#secretmanager.secretAccessor) (roles/secretmanager.secretAccessor)
- [Artifact Registry Admin](https://cloud.google.com/iam/docs/roles-permissions/artifactregistry#artifactregistry.admin) (roles/artifactregistry.admin)
- [Cloud Build Editor](https://cloud.google.com/iam/docs/roles-permissions/cloudbuild#cloudbuild.builds.editor) (roles/cloudbuild.builds.editor)
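If you still need to grant these roles to your own account, a sketch using `gcloud` (replace `PROJECT_ID` and `YOUR_EMAIL`; adjust to your organization's IAM practices):

```sh
for role in roles/run.admin roles/resourcemanager.projectIamAdmin \
    roles/serviceusage.serviceUsageConsumer roles/secretmanager.secretAccessor \
    roles/artifactregistry.admin roles/cloudbuild.builds.editor; do
  gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:YOUR_EMAIL" --role="${role}"
done
```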
### Deploy the runner as a Cloud Run worker pool

Clone the samples repository:

```sh
git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git/
```

Create the secret:

> [!IMPORTANT]
> Change the value of `GITHUB_SECRET_VALUE`.
> See [How to get a Github register token](#how-to-get-a-github-register-token).

```sh
gcloud secrets create GH_TOKEN --replication-policy="automatic"
echo -n "GITHUB_SECRET_VALUE" | gcloud secrets versions add GH_TOKEN --data-file=-
```

Grant permissions:

> [!NOTE]
> Grant the `secretAccessor` role to the service account the worker pool runs as.

```sh
gcloud secrets add-iam-policy-binding GH_TOKEN \
  --member="serviceAccount:XXXX@developer.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor"
```
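The `XXXX` placeholder above stands for the service account the worker pool runs as. Assuming it uses the default compute service account (the default unless you deploy with `--service-account`), that account is `PROJECT_NUMBER-compute@developer.gserviceaccount.com`; you can look up the project number with:

```sh
gcloud projects describe "$(gcloud config get-value project)" \
  --format="value(projectNumber)"
```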
Deploy:

> [!IMPORTANT]
> Change the values of `GITHUB_USER_OR_ORGANIZATION` and `REPOSITORY_NAME`.

```sh
gcloud beta run worker-pools deploy cloud-run-github-runner \
  --source=. \
  --scaling=1 \
  --set-env-vars GH_OWNER=GITHUB_USER_OR_ORGANIZATION,GH_REPOSITORY=REPOSITORY_NAME \
  --set-secrets GH_TOKEN=GH_TOKEN:latest
```

> [!NOTE]
> In this case `cloud-run-github-runner` is the name of the Cloud Run worker pool.
### How to get a Github register token

Go to "Add new self-hosted runner" in the Settings section of your repository.



Copy the *register token*.
## Github Runner Autoscaler

Once you deploy the worker pool with an active GitHub runner, it's time to configure the autoscaler to provision worker instances based on the job status in the Actions queue.

You can automatically increase or decrease the number of self-hosted runners in your environment in response to the webhook events you receive with a particular label. For example, you can create automation that adds a new self-hosted runner each time you receive a [workflow_job](https://docs.github.com/en/webhooks/webhook-events-and-payloads#workflow_job) webhook event with the [queued](https://docs.github.com/en/webhooks-and-events/webhooks/webhook-events-and-payloads#workflow_job) activity, which notifies you that a new job is ready for processing. The webhook payload includes label data, so you can identify the type of runner the job is requesting. Once the job has finished, you can then create automation that removes the runner in response to the workflow_job [completed](https://docs.github.com/en/webhooks-and-events/webhooks/webhook-events-and-payloads#workflow_job) activity.
### Deploy the function to receive webhook requests

```sh
cd github-runner-autoscaler

gcloud run deploy github-runner-autoscaler \
  --function github_webhook_handler \
  --region us-central1 \
  --source . \
  --set-env-vars GITHUB_ORG_OR_REPO='OWNER/REPO-NAME',RUNNER_SCOPE='repo',MAX_RUNNERS=5,GCP_PROJECT='PROJECT',CLOUD_RUN_WORKER_POOL_NAME='CLOUD_RUN_WORKER_POOL_NAME'
```

> [!NOTE]
> In this case `CLOUD_RUN_WORKER_POOL_NAME` is the name of the Cloud Run worker pool you wish to autoscale.
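After the function is deployed, you can optionally smoke-test it by posting a minimal, hand-crafted `workflow_job` payload to the service URL. This assumes the service allows unauthenticated invocations (which GitHub's webhook delivery also needs); note that a successful request scales the worker pool up by one instance, so scale it back down afterwards.

```sh
FUNCTION_URL=$(gcloud run services describe github-runner-autoscaler \
  --region us-central1 --format='value(status.url)')

curl -X POST "$FUNCTION_URL" \
  -H "Content-Type: application/json" \
  -H "X-GitHub-Event: workflow_job" \
  -d '{"action": "queued", "workflow_job": {"id": 1, "name": "smoke-test", "status": "queued"}}'
```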
## Configure the webhook

Under your repository, go to Settings -> Webhooks -> Manage webhook to configure the function's endpoint as the payload URL. Select Push events to trigger the webhook.

> **Review comment:** The instruction to select "Push events" for the webhook is incorrect. The autoscaler function handles `workflow_job` events, so the webhook should be configured to send "Workflow jobs" events instead of push events.


**Autoscaler function (Python, new file)**
```python
# Copyright 2025 Google, LLC.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import hmac
import hashlib
import requests
import json
import logging
from google.oauth2 import service_account
from google.auth.transport.requests import Request as GoogleRequest
import google.auth
from flask import Request
from google.cloud import secretmanager
from github import Github


# --- Configuration ---
PROJECT_ID = os.environ.get('GCP_PROJECT')
```
> **Review comment:** Same here, retrieve from metadata server: https://github.com/GoogleCloudPlatform/cloud-run-hello/blob/master/hello.go#L105
```python
LOCATION = 'us-central1'  # Or your desired region
```
> **Review comment:** Can you retrieve this from metadata server?
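Regarding the two review comments above: a sketch (not part of the PR) of how both values could be read from the metadata server that Cloud Run exposes, rather than from an environment variable and a hard-coded region. The region entry is returned as `projects/PROJECT_NUMBER/regions/REGION`, so only the last path segment is kept:

```python
import requests

_METADATA_BASE = "http://metadata.google.internal/computeMetadata/v1"
_METADATA_HEADERS = {"Metadata-Flavor": "Google"}


def _metadata(path: str) -> str:
    """Fetch a single value from the Cloud Run metadata server."""
    resp = requests.get(f"{_METADATA_BASE}/{path}", headers=_METADATA_HEADERS, timeout=2)
    resp.raise_for_status()
    return resp.text


PROJECT_ID = _metadata("project/project-id")
# Returned as "projects/PROJECT_NUMBER/regions/REGION"; keep the region only.
LOCATION = _metadata("instance/region").rsplit("/", 1)[-1]
```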
```python
CLOUD_RUN_WORKER_POOL_NAME = os.environ.get('CLOUD_RUN_WORKER_POOL_NAME')  # Your worker pool name


# GitHub specific config
GITHUB_ORG_OR_REPO = os.environ.get('GITHUB_ORG_OR_REPO', 'YOUR_ORG/YOUR_REPO')  # e.g., 'my-org' or 'my-org/my-repo'
RUNNER_SCOPE = os.environ.get('RUNNER_SCOPE', 'repo')  # 'org' or 'repo'


# Autoscaling parameters
MAX_RUNNERS = int(os.environ.get('MAX_RUNNERS', 5))  # Max number of concurrent runners
IDLE_TIMEOUT_MINUTES = int(os.environ.get('IDLE_TIMEOUT_MINUTES', 15))  # How long to wait before scaling down idle runners (not currently used by the handler below)
```
```python
# Initialize GitHub client
github_client = None
github_entity = None
try:
    # Get GH_TOKEN from Secret Manager
    client = secretmanager.SecretManagerServiceClient()
    secret_name = f"projects/{PROJECT_ID}/secrets/GH_TOKEN/versions/latest"
    response = client.access_secret_version(request={"name": secret_name})
    gh_token = response.payload.data.decode("UTF-8")
    github_client = Github(gh_token)

    if RUNNER_SCOPE == 'org':
        github_entity = github_client.get_organization(GITHUB_ORG_OR_REPO)
    else:
        owner, repo_name = GITHUB_ORG_OR_REPO.split('/')
        github_entity = github_client.get_user(owner).get_repo(repo_name)
except Exception as e:
    logging.error(f"Failed to initialize GitHub client or access GH_TOKEN: {e}")
```
```python
def get_authenticated_request():
    """Returns a requests.Session object authenticated for Google Cloud APIs."""
    credentials, project = google.auth.default()
    scoped_credentials = credentials.with_scopes(['https://www.googleapis.com/auth/cloud-platform'])
    auth_req = GoogleRequest()
    scoped_credentials.refresh(auth_req)
    return auth_req, scoped_credentials.token


def get_current_worker_pool_instance_count():
    """
    Retrieves the current manualInstanceCount of the Cloud Run worker pool.
    Returns the instance count as an integer, or -1 if retrieval fails.
    """
    auth_req, access_token = get_authenticated_request()
    if not access_token:
        logging.error("Failed to retrieve Google Cloud access token to get current instance count.")
        return -1

    url = f"https://run.googleapis.com/v2/projects/{PROJECT_ID}/locations/{LOCATION}/workerPools/{CLOUD_RUN_WORKER_POOL_NAME}"
```
> **Review comment:** Have you considered using the Cloud Run Python client library instead?
```python
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {access_token}"
    }

    response = None  # ensure the name is defined if the request itself raises
    try:
        response = auth_req.session.get(url, headers=headers)
        response.raise_for_status()
        worker_pool_data = response.json()
        current_instance_count = worker_pool_data.get('scaling', {}).get('manualInstanceCount', 0)
        logging.info(f"Current worker pool instance count: {current_instance_count}")
        return current_instance_count
    except requests.exceptions.RequestException as e:
        logging.error(f"Error getting Cloud Run worker pool details: {e}")
        if response is not None:
            logging.error(f"Response Status Code: {response.status_code}")
            logging.error(f"Response Text: {response.text}")
        return -1
```
```python
def update_runner_vm_instance_count(instance_count: int):
    """
    Updates a Cloud Run worker pool with the specified instance count.
    """
    auth_req, access_token = get_authenticated_request()
    if not access_token:
        print("Failed to retrieve Google Cloud access token. Exiting.")
        return

    url = (f"https://run.googleapis.com/v2/projects/{PROJECT_ID}/locations/{LOCATION}/workerPools/"
           f"{CLOUD_RUN_WORKER_POOL_NAME}?updateMask=scaling.manualInstanceCount")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {access_token}"
    }
    payload = {
        "scaling": {
            "scalingMode": "MANUAL",
            "manualInstanceCount": instance_count
        }
    }

    response = None  # ensure the name is defined if the request itself raises
    try:
        response = auth_req.session.patch(url, headers=headers, json=payload)
        response.raise_for_status()

        print(f"Successfully updated Cloud Run worker pool. Status Code: {response.status_code}")
        print("Response JSON:")
        print(json.dumps(response.json(), indent=2))

    except requests.exceptions.RequestException as e:
        print(f"Error updating Cloud Run worker pool: {e}")
        if response is not None:
            print(f"Response Status Code: {response.status_code}")
            print(f"Response Text: {response.text}")
```
> **Review comment (on lines +141 to +150):** Use the `logging` module here instead of `print`:
>
> ```python
>         logging.info(f"Successfully updated Cloud Run worker pool. Status Code: {response.status_code}")
>         logging.info("Response JSON:")
>         logging.info(json.dumps(response.json(), indent=2))
>     except requests.exceptions.RequestException as e:
>         logging.error(f"Error updating Cloud Run worker pool: {e}")
>         if response is not None:
>             logging.error(f"Response Status Code: {response.status_code}")
>             logging.error(f"Response Text: {response.text}")
> ```
```python
def create_runner_vm(count: int):
    """Updates a Cloud Run worker pool to scale up to the specified count."""
    logging.info(f"Attempting to scale up Cloud Run worker pool to {count} instances.")
    update_runner_vm_instance_count(count)


def delete_runner_vm(count: int):
    """Updates a Cloud Run worker pool to scale down to the specified count."""
    logging.info(f"Attempting to scale down Cloud Run worker pool to {count} instances.")
    update_runner_vm_instance_count(count)


# --- Main Webhook Handler ---


def github_webhook_handler(request: Request):
    """
    HTTP Cloud Function that handles GitHub workflow_job events for autoscaling.
    """
    logging.getLogger().setLevel(logging.INFO)  # Set logging level

    # 1. Validate Webhook Signature (IMPORTANT FOR PRODUCTION)
    # You need to implement this with your GitHub Webhook Secret.
    # This is commented out in your original code, but critical for security.
    # Example (you need to retrieve webhook_secret from Secret Manager too):
    # webhook_secret = get_secret_from_secret_manager("GITHUB_WEBHOOK_SECRET")
    # if not validate_signature(request, webhook_secret):
    #     return ("Invalid signature", 403)
```
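The signature check sketched in the comments above is left unimplemented in this sample. A minimal version using GitHub's standard `X-Hub-Signature-256` HMAC-SHA256 scheme (and the `hmac`/`hashlib` modules already imported at the top of the file) might look like the following; `validate_signature` and the `GITHUB_WEBHOOK_SECRET` secret are the hypothetical names from the comments, not code that ships with this PR:

```python
def validate_signature(request: Request, webhook_secret: str) -> bool:
    """Verify the HMAC-SHA256 signature GitHub attaches to webhook deliveries."""
    signature = request.headers.get('X-Hub-Signature-256', '')
    expected = 'sha256=' + hmac.new(
        webhook_secret.encode('utf-8'),
        msg=request.get_data(),
        digestmod=hashlib.sha256,
    ).hexdigest()
    # Constant-time comparison to avoid leaking timing information.
    return hmac.compare_digest(expected, signature)
```

Such a helper would live at module level and be called where the commented-out block sits, with the webhook secret read from Secret Manager the same way `GH_TOKEN` is.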
```python
    # 2. Parse Event
    event_type = request.headers.get('X-GitHub-Event')
    if event_type != 'workflow_job':
        logging.info(f"Received event type '{event_type}', ignoring.")
        return ("OK", 200)

    try:
        payload = request.get_json()
    except Exception as e:
        logging.error(f"Error parsing JSON payload: {e}")
        return ("Bad Request", 400)

    action = payload.get('action')
    job = payload.get('workflow_job')

    if not job:
        logging.warning("No 'workflow_job' found in payload.")
        return ("OK", 200)

    job_id = job.get('id')
    job_name = job.get('name')
    job_status = job.get('status')  # 'queued', 'in_progress', 'completed'
    job_conclusion = job.get('conclusion')  # 'success', 'failure', 'cancelled', 'skipped'

    logging.info(f"Received workflow_job event: Job ID {job_id}, Name '{job_name}', Status '{job_status}', Action '{action}'")

    # 3. Handle Scaling Logic

    current_instance_count = get_current_worker_pool_instance_count()

    if current_instance_count == -1:
        logging.error("Could not retrieve current instance count. Aborting scaling operation.")
        return ("Internal Server Error", 500)

    # Scale Up: If a job is queued and we have available capacity
    if action == 'queued' and job_status == 'queued':
        if current_instance_count < MAX_RUNNERS:
            new_instance_count = current_instance_count + 1
            logging.info(f"Job '{job_name}' is queued. Scaling up from {current_instance_count} to {new_instance_count} runners.")
            create_runner_vm(new_instance_count)
        else:
            logging.info(f"Job '{job_name}' is queued, but max runners ({MAX_RUNNERS}) reached. Current runners: {current_instance_count}.")

    # Scale Down: If a job is completed, find the corresponding runner and consider terminating it
    elif action == 'completed' and job_status == 'completed':
        # You might want more sophisticated logic here to determine which runner to shut down,
        # especially if you have multiple runners and want to only shut down idle ones.
        # For simplicity, this example scales down by one, ensuring it doesn't go below zero.
        if current_instance_count > 0:
            new_instance_count = current_instance_count - 1
            logging.info(f"Job '{job_name}' completed. Scaling down from {current_instance_count} to {new_instance_count} runners.")
            delete_runner_vm(new_instance_count)
        else:
            logging.info(f"Job '{job_name}' completed, but no runners are currently active to scale down.")
    else:
        logging.info(f"Workflow job event for '{job_name}' with action '{action}' and status '{job_status}' did not trigger a scaling action.")

    return ("OK", 200)
```
**requirements.txt** (new file)
```
Flask
requests
google-cloud-secret-manager
google-auth
google-auth-oauthlib
google-api-python-client
PyGithub
```
> **Review comment (on the Dockerfile):** Seems overkill to run as root just to set a file as executable, I don't think you need to. Can you ask around how to solve that without root?
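One possible answer to the comment above (a sketch, not part of the PR): the Dockerfile already opts into the `# syntax=docker/dockerfile:1` frontend, so BuildKit's `COPY --chmod` can set the execute bit at copy time and the `USER root` / `chmod` step can be dropped entirely:

```Dockerfile
# syntax=docker/dockerfile:1
FROM ghcr.io/actions/actions-runner:2.322.0

# Copy the start script with the execute bit already set; no root user needed.
COPY --chmod=0755 start.sh start.sh

ENTRYPOINT ["./start.sh"]
```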