91 commits
59abe15
Set up weekly/automatic sync from Intel gprofiler-performance-studio …
ashokbytebytego Jul 2, 2025
636b22d
Updated base branch name
ashokbytebytego Jul 2, 2025
34d46ce
Merge remote-tracking branch 'intel/master' into intel-sync
ashokbytebytego Jul 2, 2025
0886d5a
Merge branch 'intel:master' into master
ashokbytebytego Jul 7, 2025
5713c4d
Merge pull request #5 from pinterest/intel-sync
ashokbytebytego Jul 7, 2025
51bd967
gProfiler command control
prashantbytesyntax Jul 7, 2025
981ae45
Add documentation for creating and configuring the development enviro…
artursarlo Jul 22, 2025
6b37f78
Remove vscode folder from git
mouralucas Jul 11, 2025
bef76c7
Adjust command id in profile command create or update
mouralucas Jul 11, 2025
8890e59
Adjust the status update for profiling command and profiling request …
mouralucas Jul 14, 2025
9df4a72
Improve pydantic validations
mouralucas Jul 14, 2025
03a6ff3
Refactor profiling request pydantic models to its correct file
mouralucas Jul 14, 2025
b6064ef
Improved validation and documentation
mouralucas Jul 15, 2025
dcde0c0
code skeleton for gProfiler command control
prashantbytesyntax Jul 7, 2025
a64256f
Removed unused code
mouralucas Jul 15, 2025
9ba117c
Modify python requirements for gprofiler-dev[postgres] to resolve pip…
artursarlo Jul 16, 2025
68001bf
Resolve merge conflicts introduced from latest merge
artursarlo Jul 16, 2025
42a54b8
Fix updates to ProfilingRequests table
artursarlo Jul 16, 2025
fd7a69a
Add test framework skeleton
artursarlo Jul 18, 2025
8089137
Add backend unit tests for POST /api/metrics/heartbeat endpoint
artursarlo Jul 18, 2025
07203cc
Add backend unit tests for POST /api/metrics/command_completion endpoint
artursarlo Jul 18, 2025
3c096f5
Add integration tests for backend and DB: create profile requests
artursarlo Jul 18, 2025
6cf0b37
Add integration tests for backend and DB: create profile requests for…
artursarlo Jul 18, 2025
d79975b
Add integration tests for backend and DB: create profile requests for…
artursarlo Jul 18, 2025
55d416a
Add docker configuration for the test container
artursarlo Jul 18, 2025
ac7b9a7
Add configuration and instructions for test environment: how to run t…
artursarlo Jul 21, 2025
fa88de1
Enhance configuration files for test environment: better profile opti…
artursarlo Jul 22, 2025
7236976
Add better instructions for running tests
artursarlo Jul 22, 2025
b436ce3
update pydantic version
mouralucas Jul 18, 2025
002da24
Add relation between hastname and PID
mouralucas Jul 22, 2025
83013e0
Add validation to not add status failed or completed to commands that…
mouralucas Jul 22, 2025
7433afc
Removed unused code
mouralucas Jul 22, 2025
932c915
Adjust mark_profiling_request_assigned method
mouralucas Jul 22, 2025
9b7ac43
Removed duplicate sql files
mouralucas Jul 22, 2025
3b830f2
Removed duplicate sql files
mouralucas Jul 22, 2025
788696b
Fix rabase problems
mouralucas Jul 23, 2025
54ed839
Merge remote-tracking branch 'origin/create_profile_request_n_test_fr…
mouralucas Jul 23, 2025
4ce542e
Add missing stop_level param and improved combined_config
mouralucas Jul 23, 2025
0c6830a
Resolve conflicts from previous merges on db_manager
artursarlo Jul 23, 2025
4561806
Removed request_type default from class ProfilingRequest
mouralucas Jul 23, 2025
42dd203
Merge remote-tracking branch 'origin/create_profile_request_n_test_fr…
mouralucas Jul 23, 2025
57ee272
Modify tests to conform with most recent payload of backend API
artursarlo Jul 23, 2025
58fd66a
Refactored host/pid relation
mouralucas Jul 23, 2025
bfe939a
Merge remote-tracking branch 'origin/create_profile_request_n_test_fr…
mouralucas Jul 23, 2025
5af2616
Modify tests to conform with most recent payload of backend API
artursarlo Jul 23, 2025
26a7067
Add stop_level check in profile request model
mouralucas Jul 24, 2025
c7f5d11
Merge remote-tracking branch 'origin/create_profile_request_n_test_fr…
mouralucas Jul 24, 2025
af77b63
Fix integration tests for new create profile request payload
artursarlo Jul 24, 2025
ee9e6c4
Fix POST /api/metrics/profile_request for command restarts
artursarlo Jul 24, 2025
ce0ef0c
Update integration tests for creating profiling requests
artursarlo Jul 24, 2025
5b86e19
Add new integration tests for creating profiling requests to cover pa…
artursarlo Jul 24, 2025
dd6c565
Add new integration tests for creating profiling requests to cover re…
artursarlo Jul 24, 2025
d62856c
Handle request_ids list in get_pending_profiling_command
mouralucas Jul 25, 2025
89f4eb5
Merge remote-tracking branch 'origin/create_profile_request_n_test_fr…
mouralucas Jul 25, 2025
c27978a
Fixed update status for request when command is completed
mouralucas Jul 25, 2025
cfd02d7
Rollback pydantic to V1
mouralucas Jul 25, 2025
f1390bf
Add continous profiling to command control logic
artursarlo Jul 29, 2025
081135c
Fix command completion logic to correctly handle command 'restart' an…
artursarlo Jul 30, 2025
e9d9ff7
Remove 'stopped' status from the command control logic
artursarlo Aug 1, 2025
121c6b2
Add better descritption to dev environment setup
artursarlo Aug 1, 2025
cf090f8
Merge pull request #10 from pinterest/add_dev_env_tutorial
artursarlo Aug 4, 2025
a224b67
Merge pull request #12 from pinterest/add_continuous_profiling
artursarlo Aug 4, 2025
f825845
Merge pull request #11 from pinterest/create_profile_request_n_test_f…
artursarlo Aug 4, 2025
955c67e
UI support to start/stop profiling
prashantbytesyntax Aug 6, 2025
ccb26d1
Add status list to get_profiling_host_status() in backend
mouralucas Aug 7, 2025
0865334
Fix failed update after a action is done
mouralucas Aug 8, 2025
85cbd8f
Added header to the page and sidebar icon
mouralucas Aug 8, 2025
53b5f56
Keep status names as is in database for the host_status endpoint
mouralucas Aug 11, 2025
2856b5b
Merge pull request #13 from pinterest/command-control-ui
mouralucas Aug 11, 2025
469033d
Lint fixes
mouralucas Aug 14, 2025
da6652a
Merge branch 'master' into command-control-ui_lint_fixes
mouralucas Aug 14, 2025
fefb7e2
Added continuous true to profiling request payload in the UI
mouralucas Aug 18, 2025
febc8ad
Added continuous param to request profiling payload in the ui
mouralucas Aug 18, 2025
81b46ac
Merge pull request #14 from pinterest/add_new_ui_param
mouralucas Aug 18, 2025
085b009
Added heartbeat_timestamp to host_status reponse
mouralucas Aug 27, 2025
a67d7e7
Merge branch 'master' into command-control-ui_lint_fixes
mouralucas Aug 27, 2025
55607bb
Merge pull request #16 from pinterest/CE3587
mouralucas Aug 27, 2025
3ddbce1
Merge branch 'command-control-ui_lint_fixes' into CE3587
mouralucas Aug 27, 2025
77833c7
Lint fixes
mouralucas Aug 27, 2025
a2d00cc
Merge pull request #17 from pinterest/lint_fixes
mouralucas Aug 27, 2025
ecbaf6c
temp: save untracked optimization files
prashantbytesyntax Aug 20, 2025
cd0bd1b
resolve time issue
prashantbytesyntax Aug 28, 2025
ce39750
Modify backend /metrics/heartbeat endpoint to always send last command
artursarlo Aug 29, 2025
3c742cd
Merge pull request #19 from pinterest/modify_heartbeat_always_send_la…
artursarlo Aug 29, 2025
2887898
Fix heartbeat logic to correclty handle request_ids array
artursarlo Sep 2, 2025
1a5c847
Merge pull request #20 from pinterest/fix_heartbeat_always_send_last_…
artursarlo Sep 2, 2025
425eac1
adding env vars
prashantbytesyntax Aug 28, 2025
fb0372a
Merge pull request #18 from pinterest/hourly_snapshot
prashantbytesyntax Sep 3, 2025
0a39c1e
Added Dynamic Filtering Logic with service name
ashokbytebytego Sep 16, 2025
68cd5d8
Fixed review comments
ashokbytebytego Sep 17, 2025
988f0b7
Fixed formating issue
ashokbytebytego Sep 17, 2025
44 changes: 44 additions & 0 deletions .github/workflows/sync-upstream.yml
@@ -0,0 +1,44 @@
name: Sync Intel gprofiler-performance-studio

on:
schedule:
- cron: '0 5 * * 1' # Mondays at 5am UTC
workflow_dispatch:

jobs:
sync:
runs-on: ubuntu-latest
steps:
- name: Checkout pinterest/gprofiler-performance-studio
uses: actions/checkout@v4
with:
persist-credentials: false
fetch-depth: 0

- name: Add Intel Remote
run: git remote add intel https://github.com/intel/gprofiler-performance-studio.git

- name: Fetch Intel
run: git fetch intel

- name: Create or Update sync branch
run: |
git checkout -B intel-sync
git merge --no-ff intel/master || true

- name: Push sync branch
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: git push -f origin intel-sync

- name: Create or Update PR
uses: peter-evans/create-pull-request@v6
with:
token: ${{ secrets.GITHUB_TOKEN }}
branch: intel-sync
title: 'Sync from upstream Intel gprofiler-performance-studio'
body: |
Automated weekly sync from Intel's gprofiler-performance-studio repository.
Please review, resolve conflicts (if any), and approve to merge into main.
base: master
Comment on lines +10 to +43 — Code scanning / CodeQL warning (Medium): "Workflow does not contain permissions." The Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit `permissions` block, using `{contents: read}` as a minimal starting point.
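A minimal way to address this warning would be to add a top-level `permissions` block to the workflow. The exact permission set is a judgment call, not prescribed by the warning — this sketch assumes the workflow needs to push a branch and open a PR:

```yaml
# Sketch: restrict the default GITHUB_TOKEN. Since this workflow pushes the
# intel-sync branch and opens a PR, it likely needs more than {contents: read}.
permissions:
  contents: write       # push the sync branch
  pull-requests: write  # create or update the sync PR
```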

2 changes: 1 addition & 1 deletion .gitignore
@@ -10,7 +10,7 @@ venv*
*.egg-info/

.idea/

.vscode/

# File-based project format
*.iws
136 changes: 136 additions & 0 deletions DEPLOYMENT_CHECKLIST.md
@@ -0,0 +1,136 @@
# Deployment Checklist for Heartbeat Profiling System

This checklist covers what needs to be done to deploy the heartbeat profiling system.

## ✅ Completed Implementation

### Backend (Performance Studio)
- [x] API endpoints implemented (`/profile_request`, `/heartbeat`, `/command_completion`)
- [x] Database methods implemented in DBManager
- [x] Pydantic models defined for all request/response types
- [x] Error handling and logging implemented
- [x] Router registration in FastAPI app completed

### Database Schema
- [x] SQL migration files created
- [x] Database functions defined
- [x] Indexes and triggers created
- [x] Enum types defined

### Agent (gprofiler)
- [x] Heartbeat loop implementation
- [x] Command processing logic
- [x] Idempotency handling
- [x] Status reporting

## 🔧 Deployment Steps Required

### 1. Database Migration
Run the following SQL files on your PostgreSQL database:
```bash
# Apply the schema migrations
psql -d your_database -f scripts/setup/postgres/add_profiling_tables.sql
psql -d your_database -f scripts/setup/postgres/add_heartbeat_system_tables.sql
```
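After applying the migrations, a quick sanity check is to list the resulting objects — a sketch, since the exact object names live in the SQL files and are assumptions here:

```shell
# List tables and functions after the migration; the new profiling/heartbeat
# tables and helper functions should appear in the output.
psql -d your_database -c '\dt'
psql -d your_database -c '\df'

# Or inspect one specific table (name is an assumption based on this checklist):
psql -d your_database -c '\d ProfilingRequests'
```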

### 2. Backend Deployment
The backend code is ready to deploy. No additional configuration needed.

### 3. Agent Deployment
Deploy the updated gprofiler agent code with heartbeat functionality.

## 🧪 Testing Steps

### 1. Test Backend APIs
Use the test script to validate the backend:
```bash
python test_heartbeat_system.py
```

### 2. Test Agent Integration
Run the agent simulator:
```bash
python run_heartbeat_agent.py
```

### 3. End-to-End Testing
1. Start the backend service
2. Run database migrations
3. Submit a profiling request via `/api/metrics/profile_request`
4. Start an agent (or simulator)
5. Verify agent receives command via heartbeat
6. Verify command completion is reported
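The steps above can be sketched with `curl`. The endpoint paths come from this checklist, but the payload field names are assumptions, not the authoritative schema — check the FastAPI docs at `/api/v1/docs` for the real models:

```shell
# Hypothetical end-to-end smoke test; JSON field names are assumptions.
BASE="https://localhost:4433/api/metrics"

# 1. Submit a profiling request
curl -sk -X POST "$BASE/profile_request" \
  -H 'Content-Type: application/json' \
  -d '{"service_name": "my-service", "hostname": "host-1", "continuous": true}'

# 2. Simulate an agent heartbeat; the response should carry any pending command
curl -sk -X POST "$BASE/heartbeat" \
  -H 'Content-Type: application/json' \
  -d '{"service_name": "my-service", "hostname": "host-1"}'

# 3. Report command completion
curl -sk -X POST "$BASE/command_completion" \
  -H 'Content-Type: application/json' \
  -d '{"hostname": "host-1", "command_id": "<id-from-heartbeat>", "status": "completed"}'
```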

## 📋 Configuration

### Environment Variables
No new environment variables are required. The system uses existing database configurations.

### Database Connection
Ensure PostgreSQL connection is properly configured in your existing DBManager setup.

## 🔍 Monitoring

### Logs to Monitor
- Profiling request submissions
- Heartbeat processing
- Command distribution to agents
- Command completion status
- Database operation errors

### Key Metrics
- Number of pending vs completed profiling requests
- Agent heartbeat frequency and health
- Command execution success rates
- Database query performance

## 🚨 Potential Issues

### 1. Database Schema Conflicts
If you have an existing `ProfilingRequests` table, you may need to modify the migration scripts to update it instead of creating new tables.

### 2. Agent Compatibility
Ensure all agents are updated to support the new heartbeat protocol before submitting profiling requests.

### 3. Database Permissions
Ensure the application database user has permissions to:
- Create/update/delete records in the new tables
- Execute the new functions
- Use array operations and UUID types
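As a sketch, such grants could look like the following — the role name `app_user` and the `public` schema are assumptions; adapt them to your setup:

```shell
# Hypothetical GRANTs for the application role; names are placeholders.
psql -d your_database -c "GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_user;"
psql -d your_database -c "GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA public TO app_user;"
psql -d your_database -c "GRANT USAGE ON ALL SEQUENCES IN SCHEMA public TO app_user;"
```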

## 📖 API Documentation

### Endpoints Available:
- `POST /api/metrics/profile_request` - Submit profiling requests
- `POST /api/metrics/heartbeat` - Agent heartbeat and command polling
- `POST /api/metrics/command_completion` - Report command execution status

### API Documentation:
FastAPI automatic documentation available at `/api/v1/docs` when the service is running.

## 🔄 Rollback Plan

If needed, rollback steps:

1. **Database Rollback**: Remove the new tables and functions
2. **Code Rollback**: Deploy previous version without heartbeat functionality
3. **Agent Rollback**: Deploy agents without heartbeat support

## ✅ Success Criteria

The deployment is successful when:
- [ ] Database migrations complete without errors
- [ ] Backend service starts and health check passes
- [ ] API endpoints respond correctly to test requests
- [ ] Agents can successfully poll for commands via heartbeat
- [ ] End-to-end profiling workflow works (request → command → execution → completion)
- [ ] All logs show expected behavior without errors

## 📞 Support

For issues during deployment:
1. Check application logs for errors
2. Verify database connectivity and permissions
3. Ensure all migration scripts have been executed
4. Validate agent and backend can communicate
5. Check API endpoint accessibility and responses
1 change: 1 addition & 0 deletions README.md
@@ -272,6 +272,7 @@ as ClickHouse needs to rewrite TTL information for all involved partitions.
This operation is synchronous.

## Local running and development
For more detailed instructions on how to set up a development environment (already partially covered in the [usage](#usage) section), please refer to [this guide](deploy/README.md)

To develop the project, it may be useful to run each component locally, see relevant README in each service
- [webapp (backend and frontend)](src/gprofiler/README.md)
149 changes: 149 additions & 0 deletions deploy/README.md
@@ -0,0 +1,149 @@
# Info
This guide aims to help users set up a local development environment for **Gprofiler Performance Studio**.

The end goals are:
* Have all services running locally as containers;
* Have the ability to orchestrate dev containers using **Docker Compose**;
* Have the ability to correctly expose and forward service ports to local development tools (in case the test environment needs to run on a remote host machine);

Some steps present here are already part of the [general guides of the project](../README.md#usage).

## 1. Pre-requisites
Before using the Continuous Profiler, ensure the following:
- You have an AWS account and have configured your credentials, as the project utilizes AWS SQS and S3.
- You have created an SQS queue and an S3 bucket.
- You have Docker and docker-compose installed on your machine.


### 1.1 Security
By default, the system requires a basic-auth username and password;
you can generate the credentials file by running the following command:
```shell
# assuming you are located in the deploy directory
htpasswd -B -C 12 -c .htpasswd <your username>
# the prompt will ask you to set a password
```
This file is required to run the stack.

Also, a TLS certificate is required to run the stack,
see [Securing Connections with SSL/TLS](#securing-connections-with-ssltls) for more details.

### 1.2 Securing Connections with SSL/TLS
When accessing the Continuous Profiler UI through the web,
it is important to set up HTTPS to ensure the communication between Continuous Profiler and the end user is encrypted.
Communication between the webapp and the ch-rest-service is also expected to be encrypted.

Besides the security aspect, this is also required
for the browser to allow the use of some UI features that are blocked by browsers for non-HTTPS connections.


TLS is enabled by default, but it requires you to provide certificates:

Main nginx certificates location:
- `deploy/tls/cert.pem` - TLS certificate
- `deploy/tls/key.pem` - TLS key

CH REST service certificates location:
- `deploy/tls/ch_rest_cert.pem` - TLS certificate
- `deploy/tls/ch_rest_key.pem` - TLS key

_See [Self-signed certificate](#self-signed-certificate) for more details._

#### 1.2.1 Self-signed certificate
If you don't have a certificate, you can generate a self-signed certificate using the following command:
```shell
cd deploy
mkdir -p tls
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout tls/key.pem -out tls/cert.pem
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout tls/ch_rest_key.pem -out tls/ch_rest_cert.pem
```
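The commands above prompt interactively for the certificate subject. A non-interactive variant — a sketch, assuming OpenSSL ≥ 1.1.1 for `-addext` — also sets a subject and SAN so hostname matching works for local testing:

```shell
# Non-interactive self-signed cert with an explicit subject and SAN
# (-addext requires OpenSSL >= 1.1.1); repeat for the ch_rest_* pair as needed
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout tls/key.pem -out tls/cert.pem \
  -subj "/CN=localhost" \
  -addext "subjectAltName=DNS:localhost,DNS:host.docker.internal,IP:127.0.0.1"
```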
Note that self-signed certificates are not trusted by browsers and will require you to add an exception.

:bangbang: IMPORTANT: If you are using a self-signed certificate,
you need the agent to trust it,
or to disable the TLS verification by adding `--no-verify` flag to the agent configuration.

For example,
the following runs a Docker-installed agent with a self-signed certificate
(communicating from the Docker network to the host network):
```shell
docker run --name granulate-gprofiler --restart=always -d --pid=host --userns=host --privileged intel/gprofiler:latest -cu --token="<token from api or ui>" --service-name="my-super-service" --server-host "https://host.docker.internal" --glogger-server "https://host.docker.internal" --no-verify
```

### 1.3 Port exposure for local development tools
Make sure to uncomment the "ports" definition for each service you want to expose on [docker-compose.yml](./docker-compose.yml).

Also, make sure to correctly configure the port on your host machine that you want to bind to the port on the container. Ex:
```yaml
#...
ports:
- "<your-host-machine-port>:<container-port>"
#...
```

### 1.4 [Optional] Port forwarding via SSH for remote dev environment
**This step should only be completed in case your dev environment is running on a remote machine, but your dev tools (e.g. Postman, a DB client, a browser) are running on your local machine.**

Also, this technique is particularly useful for cases where you don't want to (or can't) open ports on your remote machine for debugging.

#### 1.4.1 [Optional] Configure SSH client to forward specific ports for specific host
1- Open the configuration file for your SSH client:
```sh
vim ~/.ssh/config
```
2- Modify or include port forwarding options for the remote host
```configfile
Host <remote-host-alias-for-ssh-client>
User <remote-host-user>
ForwardAgent yes
HostName <remote-host-address>
# This port forwarding config intends to forward the 443 port used by the load balancer service to the local host
LocalForward 443 127.0.0.1:443
# This port forwarding config intends to forward the 5432 port used by the postgres service to the local host
LocalForward 5432 127.0.0.1:5432
```
3- Reconnect to the remote host with the new config
```sh
ssh <remote-host-alias-for-ssh-client>
```

## 2. Running the stack
To run the entire stack built from source, use the docker-compose project located in the `deploy` directory.

The `deploy` directory contains:
- `docker-compose.yml` - The Docker compose file.
- `.env` - The environment file where you set your AWS credentials, SQS/S3 names, and AWS region.
- `https_nginx.conf` - Nginx configuration file used as an entrypoint load balancer.
- `diagnostics.sh`- A script for testing connectivity between services and printing useful information.
- `tls` - A directory for storing TLS certificates (see [Securing Connections with SSL/TLS](#securing-connections-with-ssltls)).
- `.htpasswd` - A file for storing basic auth credentials (see above).

To launch the stack, run the following commands in the `deploy` directory:
```shell
cd deploy
docker-compose --profile with-clickhouse up -d --build
```

Check that all services are running:
```shell
docker-compose ps
```

You should see the following containers with the prefix 'gprofiler-ps*':
* gprofiler-ps-agents-logs-backend
* gprofiler-ps-ch-indexer
* gprofiler-ps-ch-rest-service
* gprofiler-ps-clickhouse
* gprofiler-ps-nginx-load-balancer
* gprofiler-ps-periodic-tasks
* gprofiler-ps-postgres
* gprofiler-ps-webapp

Now you can access the UI by navigating to https://localhost:4433 in your browser
(4433 is the default port, configurable in the docker-compose.yml file).
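A quick way to verify the stack is reachable from the command line — a sketch, where `-k` skips verification of the self-signed certificate and the basic-auth user is the one created earlier:

```shell
# Expect an HTTP status code (e.g. 200 or 401) rather than a connection error
curl -sk -o /dev/null -w '%{http_code}\n' \
  -u '<your username>:<your password>' https://localhost:4433/
```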

## 3. Destroying the stack
```shell
docker-compose --profile with-clickhouse down -v
```
The `-v` option also deletes the volumes, which means all data will be erased.
4 changes: 2 additions & 2 deletions deploy/docker-compose.yml
@@ -49,8 +49,8 @@ services:
- POSTGRES_PASSWORD=$POSTGRES_PASSWORD
- POSTGRES_DB=$POSTGRES_DB
# for debug
# ports:
# - "54321:5432"
ports:
- "5432:5432"
volumes:
- db_postgres:/var/lib/postgresql/data
- ../scripts/setup/postgres/gprofiler_recreate.sql:/docker-entrypoint-initdb.d/create_scheme.sql