diff --git a/run/github-runner/Dockerfile b/run/github-runner/Dockerfile
new file mode 100644
index 0000000000..bebd122ebe
--- /dev/null
+++ b/run/github-runner/Dockerfile
@@ -0,0 +1,13 @@
+# syntax=docker/dockerfile:1
+# Best practices: https://docs.docker.com/build/building/best-practices/
+
+FROM ghcr.io/actions/actions-runner:2.322.0
+
+# Add the start script with the right permissions.
+USER root
+COPY start.sh start.sh
+RUN chmod +x start.sh
+
+# Run the entrypoint as the unprivileged runner user.
+USER runner
+ENTRYPOINT ["./start.sh"]
diff --git a/run/github-runner/README.md b/run/github-runner/README.md
new file mode 100644
index 0000000000..4aea99f412
--- /dev/null
+++ b/run/github-runner/README.md
@@ -0,0 +1,113 @@
+# GitHub Runner Worker Pools Sample
+
+The following example walks through how to host a self-hosted GitHub runner on Cloud Run worker pools, which will execute the workflows defined in your GitHub repository.
+
+## About self-hosted GitHub runners
+Runners are the machines that execute jobs in a GitHub Actions workflow. For example, a runner can clone your repository locally, install testing software, and then run commands that evaluate your code.
+
+A self-hosted runner is a system that you deploy and manage to execute jobs from GitHub Actions on GitHub.
+Self-hosted runners:
+- Give you more control of hardware, operating system, and software tools than GitHub-hosted runners provide.
+- Are free to use with GitHub Actions, but you are responsible for the cost of maintaining your runner machines.
+- Let you create custom hardware configurations that meet your needs, with the processing power or memory to run larger jobs, and can use software installed on your local network.
+- Receive automatic updates for the self-hosted runner application only, though you may disable automatic updates of the runner.
+- Can use cloud services or local machines that you already pay for.
+- Don't need to have a clean instance for every job execution.
+- Can be physical, virtual, in a container, on-premises, or in a cloud.
+
+## Benefits of using Cloud Run Worker Pools for hosting GitHub runners
+Cloud Run Worker Pools offer an easy way to use the Cloud Run API to host runners instead of managing your own VMs or a GKE cluster.
+With fast startup and shutdown, you can configure autoscaling through the Worker Pools API to execute GitHub Actions workflows on demand, with effective compute resource utilization in response to webhook events.
+With a combination of competitive pricing and scale to zero, Worker Pools offer a cost-effective solution for running workflow jobs.
+
+### Getting started
+In this example, you use the following billable components of Google Cloud:
+- [Artifact Registry](https://cloud.google.com/artifact-registry)
+- [Cloud Build](https://cloud.google.com/cloud-build)
+- [Cloud Run](https://cloud.google.com/run)
+- [Secret Manager](https://cloud.google.com/security/products/secret-manager)
+
+### Ensure you have the following IAM roles granted to your account:
+- [Cloud Run Admin](https://cloud.google.com/iam/docs/roles-permissions/run#run.admin) (roles/run.admin)
+- [Project IAM Admin](https://cloud.google.com/iam/docs/roles-permissions/resourcemanager#resourcemanager.projectIamAdmin) (roles/resourcemanager.projectIamAdmin)
+- [Service Usage Consumer](https://cloud.google.com/iam/docs/roles-permissions/serviceusage#serviceusage.serviceUsageConsumer) (roles/serviceusage.serviceUsageConsumer)
+- [Secret Manager Secret Accessor](https://cloud.google.com/iam/docs/understanding-roles#secretmanager.secretAccessor) (roles/secretmanager.secretAccessor)
+- [Artifact Registry Admin](https://cloud.google.com/iam/docs/roles-permissions/artifactregistry#artifactregistry.admin) (roles/artifactregistry.admin)
+- [Cloud Build Editor](https://cloud.google.com/iam/docs/roles-permissions/cloudbuild#cloudbuild.builds.editor) (roles/cloudbuild.builds.editor)
+
+### Deploy the runner as a Cloud Run worker pool
+
+Clone the repository and change into the sample directory:
+
+```sh
+git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git
+cd python-docs-samples/run/github-runner
+```
+
+Create the secret:
+
+> [!IMPORTANT]
+> Change the value of `GITHUB_SECRET_VALUE`.
+> See [How to get a GitHub registration token](#how-to-get-a-github-registration-token).
+
+```sh
+gcloud secrets create GH_TOKEN --replication-policy="automatic"
+echo -n "GITHUB_SECRET_VALUE" | gcloud secrets versions add GH_TOKEN --data-file=-
+```
+
+Grant access to the secret:
+
+> [!NOTE]
+> Grant the `secretAccessor` role to the service account your worker pool runs as.
+
+```sh
+gcloud secrets add-iam-policy-binding GH_TOKEN \
+--member="serviceAccount:XXXX@developer.gserviceaccount.com" \
+--role="roles/secretmanager.secretAccessor"
+```
+
+Deploy:
+
+> [!IMPORTANT]
+> Change the values of `GITHUB_USER_OR_ORGANIZATION` and `REPOSITORY_NAME`.
+
+```sh
+gcloud beta run worker-pools deploy cloud-run-github-runner \
+--source=. \
+--scaling=1 \
+--set-env-vars GH_OWNER=GITHUB_USER_OR_ORGANIZATION,GH_REPOSITORY=REPOSITORY_NAME \
+--set-secrets GH_TOKEN=GH_TOKEN:latest
+```
+
+> [!NOTE]
+> In this case, `cloud-run-github-runner` is the name of the Cloud Run worker pool.
+
+### How to get a GitHub registration token
+
+Go to "Add new self-hosted runner" in the Settings section of your repository.
+
+![example of hosted runner form](docs/assets/add-new-self-hosted-runner.png)
+
+Copy the *registration token*.
+
+## GitHub Runner Autoscaler
+
+Once you deploy the worker pool with an active GitHub runner, it's time to configure the autoscaler to provision worker instances based on the job status in the Actions queue.
+
+You can automatically increase or decrease the number of self-hosted runners in your environment in response to the webhook events you receive with a particular label.
For example, you can create automation that adds a new self-hosted runner each time you receive a [workflow_job](https://docs.github.com/en/webhooks/webhook-events-and-payloads#workflow_job) webhook event with the [queued](https://docs.github.com/en/webhooks-and-events/webhooks/webhook-events-and-payloads#workflow_job) activity, which notifies you that a new job is ready for processing. The webhook payload includes label data, so you can identify the type of runner the job is requesting. Once the job has finished, you can then create automation that removes the runner in response to the workflow_job [completed](https://docs.github.com/en/webhooks-and-events/webhooks/webhook-events-and-payloads#workflow_job) activity.
+
+### Deploy the function to receive webhook requests
+
+```sh
+cd github-runner-autoscaler
+
+gcloud run deploy github-runner-autoscaler \
+--function github_webhook_handler \
+--region us-central1 \
+--source . \
+--set-env-vars GITHUB_ORG_OR_REPO='OWNER/REPO-NAME',RUNNER_SCOPE='repo',MAX_RUNNERS=5,GCP_PROJECT='PROJECT',CLOUD_RUN_WORKER_POOL_NAME='CLOUD_RUN_WORKER_POOL_NAME'
+```
+> [!NOTE]
+> In this case, `CLOUD_RUN_WORKER_POOL_NAME` is the name of the Cloud Run worker pool you wish to autoscale.
+
+## Configure the webhook
+
+Under your repository, go to Settings -> Webhooks -> Add webhook to configure the function's endpoint as the payload URL.
+Select **Workflow jobs** as the event that triggers the webhook, and set the content type to `application/json` (the handler ignores all other event types).
+
+![example of configure webhook form](docs/assets/configure-webhook.png)
diff --git a/run/github-runner/docs/assets/add-new-self-hosted-runner.png b/run/github-runner/docs/assets/add-new-self-hosted-runner.png
new file mode 100644
index 0000000000..df889f05da
Binary files /dev/null and b/run/github-runner/docs/assets/add-new-self-hosted-runner.png differ
diff --git a/run/github-runner/docs/assets/configure-webhook.png b/run/github-runner/docs/assets/configure-webhook.png
new file mode 100644
index 0000000000..70646d863c
Binary files /dev/null and b/run/github-runner/docs/assets/configure-webhook.png differ
diff --git a/run/github-runner/github-runner-autoscaler/main.py b/run/github-runner/github-runner-autoscaler/main.py
new file mode 100644
index 0000000000..c5b5e6c9e7
--- /dev/null
+++ b/run/github-runner/github-runner-autoscaler/main.py
@@ -0,0 +1,254 @@
+# Copyright 2025 Google, LLC.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
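# The webhook handler below notes that validating GitHub's webhook signature
# is critical in production. Below is a minimal sketch of such a check
# (hypothetical helper, not wired into the handler): GitHub sends an
# HMAC-SHA256 of the raw request body, keyed with your webhook secret, in the
# X-Hub-Signature-256 header.
import hashlib
import hmac


def validate_signature(payload_body: bytes, signature_header: str, secret: str) -> bool:
    """Compare GitHub's X-Hub-Signature-256 header against a locally computed HMAC."""
    expected = "sha256=" + hmac.new(secret.encode("utf-8"), payload_body, hashlib.sha256).hexdigest()
    # compare_digest performs a constant-time comparison to avoid timing leaks.
    return hmac.compare_digest(expected, signature_header or "")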
+
+import json
+import hashlib
+import hmac
+import logging
+import os
+
+import google.auth
+import requests
+from flask import Request
+from github import Github
+from google.auth.transport.requests import Request as GoogleRequest
+from google.cloud import secretmanager
+
+
+# --- Configuration ---
+PROJECT_ID = os.environ.get('GCP_PROJECT')
+LOCATION = 'us-central1'  # Or your desired region
+CLOUD_RUN_WORKER_POOL_NAME = os.environ.get('CLOUD_RUN_WORKER_POOL_NAME')  # Your worker pool name
+
+# GitHub specific config
+GITHUB_ORG_OR_REPO = os.environ.get('GITHUB_ORG_OR_REPO', 'YOUR_ORG/YOUR_REPO')  # e.g., 'my-org' or 'my-org/my-repo'
+RUNNER_SCOPE = os.environ.get('RUNNER_SCOPE', 'repo')  # 'org' or 'repo'
+
+# Autoscaling parameters
+MAX_RUNNERS = int(os.environ.get('MAX_RUNNERS', 5))  # Max number of concurrent runners
+IDLE_TIMEOUT_MINUTES = int(os.environ.get('IDLE_TIMEOUT_MINUTES', 15))  # How long to wait before scaling down idle runners
+
+# Initialize GitHub client
+github_client = None
+github_entity = None
+try:
+    # Get GH_TOKEN from Secret Manager
+    client = secretmanager.SecretManagerServiceClient()
+    secret_name = f"projects/{PROJECT_ID}/secrets/GH_TOKEN/versions/latest"
+    response = client.access_secret_version(request={"name": secret_name})
+    gh_token = response.payload.data.decode("UTF-8")
+    github_client = Github(gh_token)
+
+    if RUNNER_SCOPE == 'org':
+        github_entity = github_client.get_organization(GITHUB_ORG_OR_REPO)
+    else:
+        # get_repo accepts the full 'owner/name' form and also works for
+        # organization-owned repositories.
+        github_entity = github_client.get_repo(GITHUB_ORG_OR_REPO)
+except Exception as e:
+    logging.error(f"Failed to initialize GitHub client or access GH_TOKEN: {e}")
+
+
+def get_authenticated_request():
+    """Returns a google-auth transport Request and a fresh access token for Google Cloud APIs."""
+    credentials, _ = google.auth.default()
+    scoped_credentials = credentials.with_scopes(['https://www.googleapis.com/auth/cloud-platform'])
+    auth_req = GoogleRequest()
+    scoped_credentials.refresh(auth_req)
+    return auth_req, scoped_credentials.token
+
+
+def get_current_worker_pool_instance_count():
+    """
+    Retrieves the current manualInstanceCount of the Cloud Run worker pool.
+    Returns the instance count as an integer, or -1 if retrieval fails.
+    """
+    auth_req, access_token = get_authenticated_request()
+    if not access_token:
+        logging.error("Failed to retrieve Google Cloud access token to get current instance count.")
+        return -1
+
+    url = f"https://run.googleapis.com/v2/projects/{PROJECT_ID}/locations/{LOCATION}/workerPools/{CLOUD_RUN_WORKER_POOL_NAME}"
+    headers = {
+        "Content-Type": "application/json",
+        "Authorization": f"Bearer {access_token}"
+    }
+
+    response = None  # Initialized so the except block can inspect it safely.
+    try:
+        response = auth_req.session.get(url, headers=headers)
+        response.raise_for_status()
+        worker_pool_data = response.json()
+        current_instance_count = worker_pool_data.get('scaling', {}).get('manualInstanceCount', 0)
+        logging.info(f"Current worker pool instance count: {current_instance_count}")
+        return current_instance_count
+    except requests.exceptions.RequestException as e:
+        logging.error(f"Error getting Cloud Run worker pool details: {e}")
+        if response is not None:
+            logging.error(f"Response Status Code: {response.status_code}")
+            logging.error(f"Response Text: {response.text}")
+        return -1
+
+
+def update_runner_vm_instance_count(instance_count: int):
+    """
+    Updates the Cloud Run worker pool to the specified manual instance count.
+    """
+    auth_req, access_token = get_authenticated_request()
+    if not access_token:
+        logging.error("Failed to retrieve Google Cloud access token. Exiting.")
+        return
+
+    url = (f"https://run.googleapis.com/v2/projects/{PROJECT_ID}/locations/{LOCATION}/workerPools/"
+           f"{CLOUD_RUN_WORKER_POOL_NAME}?updateMask=scaling.manualInstanceCount")
+    headers = {
+        "Content-Type": "application/json",
+        "Authorization": f"Bearer {access_token}"
+    }
+    payload = {
+        "scaling": {
+            "scalingMode": "MANUAL",
+            "manualInstanceCount": instance_count
+        }
+    }
+
+    response = None  # Initialized so the except block can inspect it safely.
+    try:
+        response = auth_req.session.patch(url, headers=headers, json=payload)
+        response.raise_for_status()
+        logging.info(f"Successfully updated Cloud Run worker pool. Status Code: {response.status_code}")
+        logging.debug(json.dumps(response.json(), indent=2))
+    except requests.exceptions.RequestException as e:
+        logging.error(f"Error updating Cloud Run worker pool: {e}")
+        if response is not None:
+            logging.error(f"Response Status Code: {response.status_code}")
+            logging.error(f"Response Text: {response.text}")
+
+
+def create_runner_vm(count: int):
+    """Updates the Cloud Run worker pool to scale up to the specified count."""
+    logging.info(f"Attempting to scale up Cloud Run worker pool to {count} instances.")
+    update_runner_vm_instance_count(count)
+
+
+def delete_runner_vm(count: int):
+    """Updates the Cloud Run worker pool to scale down to the specified count."""
+    logging.info(f"Attempting to scale down Cloud Run worker pool to {count} instances.")
+    update_runner_vm_instance_count(count)
+
+
+# --- Main Webhook Handler ---
+def github_webhook_handler(request: Request):
+    """
+    HTTP Cloud Function that handles GitHub workflow_job events for autoscaling.
+    """
+    logging.getLogger().setLevel(logging.INFO)
+
+    # 1. Validate the webhook signature (IMPORTANT FOR PRODUCTION).
+    # Signature validation is omitted from this sample for brevity, but it is
+    # critical for security: implement it with your GitHub webhook secret.
+    # Example (you would need to retrieve webhook_secret from Secret Manager too):
+    # webhook_secret = get_secret_from_secret_manager("GITHUB_WEBHOOK_SECRET")
+    # if not validate_signature(request, webhook_secret):
+    #     return ("Invalid signature", 403)
+
+    # 2. Parse the event
+    event_type = request.headers.get('X-GitHub-Event')
+    if event_type != 'workflow_job':
+        logging.info(f"Received event type '{event_type}', ignoring.")
+        return ("OK", 200)
+
+    try:
+        payload = request.get_json()
+    except Exception as e:
+        logging.error(f"Error parsing JSON payload: {e}")
+        return ("Bad Request", 400)
+
+    action = payload.get('action')
+    job = payload.get('workflow_job')
+
+    if not job:
+        logging.warning("No 'workflow_job' found in payload.")
+        return ("OK", 200)
+
+    job_id = job.get('id')
+    job_name = job.get('name')
+    job_status = job.get('status')  # 'queued', 'in_progress', 'completed'
+    job_conclusion = job.get('conclusion')  # 'success', 'failure', 'cancelled', 'skipped'
+
+    logging.info(f"Received workflow_job event: Job ID {job_id}, Name '{job_name}', Status '{job_status}', Action '{action}'")
+
+    # 3. Handle the scaling logic
+    current_instance_count = get_current_worker_pool_instance_count()
+    if current_instance_count == -1:
+        logging.error("Could not retrieve current instance count. Aborting scaling operation.")
+        return ("Internal Server Error", 500)
+
+    # Scale up: a job is queued and we have available capacity.
+    if action == 'queued' and job_status == 'queued':
+        if current_instance_count < MAX_RUNNERS:
+            new_instance_count = current_instance_count + 1
+            logging.info(f"Job '{job_name}' is queued. Scaling up from {current_instance_count} to {new_instance_count} runners.")
+            create_runner_vm(new_instance_count)
+        else:
+            logging.info(f"Job '{job_name}' is queued, but max runners ({MAX_RUNNERS}) reached. Current runners: {current_instance_count}.")
+
+    # Scale down: a job has completed, so release one runner.
+    elif action == 'completed' and job_status == 'completed':
+        # You might want more sophisticated logic here to determine which runner
+        # to shut down, especially if you have multiple runners and want to only
+        # shut down idle ones. For simplicity, this example scales down by one,
+        # ensuring the count doesn't go below zero.
+        if current_instance_count > 0:
+            new_instance_count = current_instance_count - 1
+            logging.info(f"Job '{job_name}' completed. Scaling down from {current_instance_count} to {new_instance_count} runners.")
+            delete_runner_vm(new_instance_count)
+        else:
+            logging.info(f"Job '{job_name}' completed, but no runners are currently active to scale down.")
+    else:
+        logging.info(f"Workflow job event for '{job_name}' with action '{action}' and status '{job_status}' did not trigger a scaling action.")
+
+    return ("OK", 200)
diff --git a/run/github-runner/github-runner-autoscaler/requirements.txt b/run/github-runner/github-runner-autoscaler/requirements.txt
new file mode 100644
index 0000000000..9507db40ac
--- /dev/null
+++ b/run/github-runner/github-runner-autoscaler/requirements.txt
@@ -0,0 +1,7 @@
+Flask
+requests
+google-cloud-secret-manager
+google-auth
+google-auth-oauthlib
+google-api-python-client
+PyGithub
diff --git a/run/github-runner/start.sh b/run/github-runner/start.sh
new file mode 100644
index 0000000000..e6e87f2c2f
--- /dev/null
+++ b/run/github-runner/start.sh
@@ -0,0 +1,39 @@
+#!/bin/bash
+
+set -e
+
+# Required environment variables; fail fast if any is missing.
+: "${GH_OWNER:?GH_OWNER is required}"
+: "${GH_REPOSITORY:?GH_REPOSITORY is required}"
+: "${GH_TOKEN:?GH_TOKEN is required}"
+
+# Prepare internal variables.
+GH_REPOSITORY_URL="https://github.com/${GH_OWNER}/${GH_REPOSITORY}"
+RUNNER_PREFIX="cloud-run-worker"
+RUNNER_SUFFIX=$(tr -dc 'a-z0-9' < /dev/urandom | fold -w 5 | head -n 1)
+RUNNER_NAME="${RUNNER_PREFIX}-${RUNNER_SUFFIX}"
+
+# Configure the current runner instance with URL, token and name.
+# The runner software ships in /home/runner in the base image.
+cd /home/runner
+echo "Registering runner ${RUNNER_NAME} for ${GH_REPOSITORY_URL}"
+./config.sh --unattended --url "${GH_REPOSITORY_URL}" --token "${GH_TOKEN}" --name "${RUNNER_NAME}"
+
+# Function to clean up and remove the runner from GitHub.
+cleanup() {
+    echo "Removing runner..."
+    ./config.sh remove --unattended --token "${GH_TOKEN}"
+}
+
+# Trap signals so the runner deregisters cleanly on shutdown.
+trap 'cleanup; exit 130' INT
+trap 'cleanup; exit 143' TERM
+
+# Run the runner.
+./run.sh & wait $!
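The scaling policy implemented in `github-runner-autoscaler/main.py` reduces to a small pure function, sketched below (the helper name `next_instance_count` is illustrative and not part of the sample): queued jobs add one runner up to the cap, completed jobs remove one down to zero, and everything else leaves the count unchanged.

```python
MAX_RUNNERS = 5  # mirrors the sample's default


def next_instance_count(action: str, status: str, current: int, max_runners: int = MAX_RUNNERS) -> int:
    """Mirror the handler's decision: +1 for queued jobs (capped), -1 for completed jobs (floored at 0)."""
    if action == "queued" and status == "queued" and current < max_runners:
        return current + 1
    if action == "completed" and status == "completed" and current > 0:
        return current - 1
    return current
```

For instance, a queued job arriving while the pool is at the cap leaves the count unchanged, and a completed job never drives the count below zero.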