Introducing the [perfmon] Extension for Cloudberry Database Monitoring #1087

fanfuxiaoran · 2025-05-06T09:35:37Z

fanfuxiaoran
May 6, 2025
Collaborator

Proposers

Xiaoran Wang

Proposal Status

Under Discussion

Abstract

We're excited to introduce the new perfmon extension, a comprehensive monitoring solution for Cloudberry Database clusters. This discussion covers its capabilities, architecture and potential use cases.
We welcome your feedback and questions!

Motivation

Why perfmon?
Traditional monitoring tools often lack:

Real-time query plan visibility
Segment-level resource attribution
Historical query analysis

perfmon addresses these gaps with deep database-aware instrumentation.

Key Features

Cluster-Wide Metrics
- Hardware metrics (CPU, memory, disk, network) from all nodes
Live Query Insights
- General query information, such as query text, user, and status
- Resource attribution (CPU/memory/spills by query/segment)
- Progress tracking per plan node (like EXPLAIN ANALYZE for live queries)
Historical Analysis
- Query history with execution statistics
- Performance baseline comparison

Implementation

Technical Deep Dive

| Component | Type | Key Responsibility | Config Required |
| --- | --- | --- | --- |
| gpmmon | Shared Lib | Metrics aggregation + storage | shared_preload_libraries |
| gpmon | Shared Lib | Query execution hooks + pg_query_state | shared_preload_libraries |
| gpsmon | Binary | Host-level metric collection | Deployed per segment |

The system persists all monitoring data in the gpperfmon database, which supports standard SQL querying for data retrieval and analysis.

The pg_query_state function is adapted from PostgresPro's pg_query_state https://github.com/postgrespro/pg_query_state to operate on Cloudberry Database (an MPP database system). It provides real-time query execution state monitoring capabilities.

While a query is running, for example:

create table foo(a int, b int);
create table bar(a int, b int) distributed by (b);
insert into foo select generate_series(0,10000000);
insert into foo select generate_series(0,10000000);
insert into bar select generate_series(0,10000000);
select * from foo left join bar on foo.a=bar.a;

you can check its execution state with pg_query_state, passing the process ID (pid) of the backend that runs the query:

select pg_query_state(10226);
                                                     pg_query_state
-------------------------------------------------------------------------------------------------------------------------
 (10226,0,"select * from foo left join bar on foo.a=bar.a;","Query Text: select * from foo left join bar on foo.a=bar.a;+
 Gather Motion 3:1  (slice1; segments: 3) (node status: Executing) (actual rows=11091060, loops=1)+
   ->  Hash Left Join (node status: Executing) (actual rows=3752699, loops=1)+
         Hash Cond: (foo.a = bar.a)+
         ->  Seq Scan on foo (node status: Finished) (actual rows=6669678, loops=1)+
         ->  Hash (node status: Executing) (actual rows=3334839, loops=1)+
               ->  Redistribute Motion 3:3  (slice2; segments: 3) (node status: Finished) (actual rows=3334839, loops=1)+
                     Hash Key: bar.a+
                     ->  Seq Scan on bar (node status: Finished) (actual rows=10000001, loops=1)",)
(1 row)
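
A monitoring script will usually want per-node status and row counts rather than the raw text. The following is a minimal Python sketch of that idea. It assumes every plan-node line has the shape seen in the sample output above, with `(node status: ...)` followed by `(actual rows=..., loops=...)`. The names `NODE_RE`, `parse_plan` and `executing_nodes` are hypothetical and not part of perfmon, and the line format is inferred from this one sample, so other node types may print differently.

```python
import re

# Plan portion of the sample pg_query_state output shown above.
PLAN_TEXT = """\
Gather Motion 3:1  (slice1; segments: 3) (node status: Executing) (actual rows=11091060, loops=1)
  ->  Hash Left Join (node status: Executing) (actual rows=3752699, loops=1)
        Hash Cond: (foo.a = bar.a)
        ->  Seq Scan on foo (node status: Finished) (actual rows=6669678, loops=1)
        ->  Hash (node status: Executing) (actual rows=3334839, loops=1)
              ->  Redistribute Motion 3:3  (slice2; segments: 3) (node status: Finished) (actual rows=3334839, loops=1)
                    Hash Key: bar.a
                    ->  Seq Scan on bar (node status: Finished) (actual rows=10000001, loops=1)
"""

# Assumed line shape (inferred from the sample, not from a format specification):
#   [indent][-> ]<node name> [(sliceN; segments: M)] (node status: S) (actual rows=R, loops=L)
NODE_RE = re.compile(
    r"^(?P<indent>\s*)(?:->\s+)?(?P<node>.+?)\s+"
    r"(?:\(slice\d+; segments: \d+\)\s+)?"
    r"\(node status: (?P<status>\w+)\)\s+"
    r"\(actual rows=(?P<rows>\d+), loops=(?P<loops>\d+)\)\s*$"
)


def parse_plan(text):
    """Return one dict per plan node; detail lines such as 'Hash Cond' are skipped."""
    nodes = []
    for line in text.splitlines():
        match = NODE_RE.match(line)
        if match is None:
            continue  # not a plan-node line
        nodes.append({
            "depth": len(match.group("indent")),
            "node": match.group("node"),
            "status": match.group("status"),
            "rows": int(match.group("rows")),
            "loops": int(match.group("loops")),
        })
    return nodes


def executing_nodes(nodes):
    """Names of the plan nodes that are still running."""
    return [n["node"] for n in nodes if n["status"] == "Executing"]
```

Calling `executing_nodes(parse_plan(PLAN_TEXT))` on the sample returns the three nodes still marked as Executing, which is a quick way to see where a long-running query currently spends its time.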

Rollout/Adoption Plan

No response

Are you willing to submit a PR?

Yes I am willing to submit a PR!

edespino · 2025-05-06T22:10:10Z

edespino
May 6, 2025
Collaborator

@fanfuxiaoran - Thanks for the detailed proposal—this is exciting functionality. That said, the architecture and component naming (gpmon, gpmmon, gpsmon) seem very reminiscent of the original Greenplum gpperfmon system from GPDB 5 and 6.

Could you clarify a few points?

🔄 Is this a revival or a full rewrite?

Is perfmon intended to revive, rebrand, or reimplement the legacy gpperfmon monitoring stack from earlier Greenplum releases?
How much of the original design and codebase is reused or adapted?
Is this based on a fork of the old implementation, or built from scratch?

⚙️ Build & Runtime Dependencies

What are the current build and runtime dependencies?
The original gpsmon used libsigar, which is an old and likely abandoned project. Is it still required?
If libsigar is still in use, are there plans to replace it or mitigate its maintenance risks?

🛠️ `configure` Integration

Previously, gpperfmon was enabled using a configure option like:
```
--enable-gpperfmon      build with gpperfmon
```
Will perfmon provide a similar configure option?

📦 Installation & Deployment

How is the extension intended to be installed and initialized?
Will there be tooling or guidance to create the gpperfmon database, or is manual setup expected?
A step-by-step outline of how users go from source to a fully monitored cluster would be very helpful.

🔤 Naming Consistency

One suggestion: rather than retaining the gpperfmon naming convention, consider renaming the database and components together using a cb prefix (e.g., cbperfmon, cbmon, cbmmon, etc.).

Using a Cloudberry-specific naming convention would:

Better reflect the project's identity and direction
Help distinguish new functionality from legacy Greenplum components
Reduce confusion around tool origin and long-term maintenance expectations

Even if the implementation shares technical roots with Greenplum, a consistent and forward-looking naming strategy would reinforce the project's independence and clarity for new adopters.

🧭 User Interface & Vision

I believe this also raises a key question: what kind of frontend or user-accessible tooling will be provided?

Previously, Greenplum included a commercial, non-open source product known as Greenplum Command Center (GPCC)—developed and maintained by Broadcom—which exposed real-time and historical metrics from the gpperfmon database in a web-based interface. GPCC allowed users to:

Monitor live queries and system usage
Analyze historical performance trends
Cancel runaway queries
Apply workload management policies

If perfmon is collecting similar telemetry, will users be expected to query it directly via SQL, or will there be:

Prebuilt dashboards (e.g., Grafana, Metabase)?
Planned integration with existing admin tools?
Custom visualizations or a future Cloudberry-native UI?

This feels like it should be part of a larger vision for observability in Cloudberry. Is that the intent?

📚 Background for Contributors New to `gpperfmon`

Not everyone in the Apache Cloudberry community may be familiar with Greenplum's gpperfmon, so including some context would help the proposal reach a broader audience.

Historically, gpperfmon provided:

System-level metrics (system_now, system_history): CPU, memory, disk, and network usage across segments
Query-level stats (queries_now, queries_history): execution time, rows processed, query text, spill usage, error codes
Segment insights (segment_history): resource usage by host and segment
Agent infrastructure: gpsmon ran per-segment, with collected metrics stored in the gpperfmon database
Integration with GPCC: GPCC consumed these metrics via SQL and exposed them graphically to end users

Installation required:

Creating the gpperfmon database via gpperfmon_install
Enabling metrics in postgresql.conf (gp_enable_gpperfmon=on, gpperfmon_port=8888)
Creating a gpmon user and .pgpass entry for internal connectivity
Configuring gpperfmon.conf to tune thresholds like min_query_time

In summary, gpperfmon offered a structured, extensible monitoring backend that was tied into a broader tooling ecosystem. If Cloudberry's perfmon is modeled after this, outlining that full vision—including backend schema, operational flow, and potential visualization plans—will help the community evaluate, adopt, and contribute effectively.

1 reply

fanfuxiaoran May 7, 2025
Collaborator Author

Hi @edespino, thanks for your suggestions and questions!

🔄 Is this a revival or a full rewrite?

> Is perfmon intended to revive, rebrand, or reimplement the legacy gpperfmon monitoring stack from earlier Greenplum releases?
> How much of the original design and codebase is reused or adapted?
> Is this based on a fork of the old implementation, or built from scratch?

The perfmon extension is derived from GPDB6's gpperfmon, with several significant modifications:

- perfmon is built as a shared library rather than as a standalone binary as in GPDB6. Postgres starts it as a background worker when the GUC perfmon.enable is on. It is used as an extension: running create extension perfmon creates all the tables used to store the monitoring data.
- The original gpperfmon does not store query plan information in its queries_history table, whereas the perfmon extension does. Each execution plan node includes key metrics such as rows processed and actual time cost, which users need in order to analyze and troubleshoot slow queries.
- Statistics for running queries are collected in a completely different way. In gpperfmon, the postgres process sends statistics to gpsmon while a query is running, which generates a lot of traffic. For performance reasons we dropped this part and introduced pg_query_state instead. To track the performance of an active query, a user can call select pg_query_state(pid).

⚙️ Build & Runtime Dependencies

> What are the current build and runtime dependencies?
> The original gpsmon used libsigar, which is an old and likely abandoned project. Is it still required?
> If libsigar is still in use, are there plans to replace it or mitigate its maintenance risks?

Yes, libsigar is still in use. We currently have no plans to replace it.

🛠️ configure Integration

> Previously, gpperfmon was enabled using a configure option like:
> --enable-gpperfmon      build with gpperfmon
> Will perfmon provide a similar configure option?

Yes, perfmon is built by passing --enable-perfmon to configure.

📦 Installation & Deployment

> How is the extension intended to be installed and initialized?
> Will there be tooling or guidance to create the gpperfmon database, or is manual setup expected?
> A step-by-step outline of how users go from source to a fully monitored cluster would be very helpful.

Similar to Greenplum, we provide a Python script called gpperfmon_install that sets up the gpperfmon database and performs the other preparatory tasks:

gpperfmon_install --port 5432 --enable --password 123456
gpstop -ari

After the restart, perfmon is enabled and the gpmmon background worker is started to monitor the database.

🔤 Naming Consistency

> One suggestion: rather than retaining the gpperfmon naming convention, consider renaming the database and components together using a cb prefix (e.g., cbperfmon, cbmon, cbmmon, etc.).
>
> Using a Cloudberry-specific naming convention would:
>
> Better reflect the project's identity and direction
> Help distinguish new functionality from legacy Greenplum components
> Reduce confusion around tool origin and long-term maintenance expectations
>
> Even if the implementation shares technical roots with Greenplum, a consistent and forward-looking naming strategy would reinforce the project's independence and clarity for new adopters.

Currently, we don't have any plans to rename them. Renaming would take a lot of work and could cause many conflicts. For example, if we renamed the gpperfmon database to cbperfmon, other components would stop working.

🧭 User Interface & Vision

> I believe this also raises a key question: what kind of frontend or user-accessible tooling will be provided?
>
> Previously, Greenplum included a commercial, non-open source product known as Greenplum Command Center (GPCC), developed and maintained by Broadcom, which exposed real-time and historical metrics from the gpperfmon database in a web-based interface. GPCC allowed users to:
>
> Monitor live queries and system usage
> Analyze historical performance trends
> Cancel runaway queries
> Apply workload management policies
>
> If perfmon is collecting similar telemetry, will users be expected to query it directly via SQL, or will there be:
>
> Prebuilt dashboards (e.g., Grafana, Metabase)?
> Planned integration with existing admin tools?
> Custom visualizations or a future Cloudberry-native UI?
>
> This feels like it should be part of a larger vision for observability in Cloudberry. Is that the intent?

perfmon stores its data in the gpperfmon database, in several tables:

- System usage tables: system_*, diskspace_*, network_interface_*
- queries_now, for monitoring live queries
- queries_history, for historical queries

It also provides one function, pg_query_state.
Users can fetch all of this directly with SQL. The other features of GPCC are outside perfmon's scope.


edespino · 2025-05-06T23:24:09Z

edespino
May 6, 2025
Collaborator

I’ve now reviewed the related PR (#1085), and a few important things came up that I want to bring back to this proposal thread:

🧬 Code Provenance

It’s now evident that the perfmon extension is derived from the Greenplum gpperfmon codebase. That’s valuable context, but it needs to be made explicit in this proposal for transparency and clarity—especially for those unfamiliar with Greenplum internals.

We should clarify:

What code is reused or adapted from Greenplum?

What was changed or removed?

How licensing is handled (particularly given Cloudberry’s Apache 2.0 license)

🧭 Proposal → PR Flow

Also, I want to respectfully point out that the PR was submitted before the proposal received any discussion or feedback. This limits the community’s ability to weigh in on design, naming, and direction—especially for inherited or complex features.

Can we encourage a clearer process going forward?

Propose the feature

Solicit feedback and refine

Submit the PR once there’s shared understanding

This would help ensure stronger community alignment and make it easier for new contributors to follow along.

Let me know if you’d like help updating the proposal language to reflect this background. I believe this feature has real potential — but it deserves a more transparent and community-driven rollout.


avamingli · 2025-05-07T08:16:53Z

avamingli
May 7, 2025
Collaborator

+1 for this, very useful for customers, nice work.


edespino · 2025-05-07T09:14:05Z

edespino
May 7, 2025
Collaborator

@fanfuxiaoran

Thanks for confirming that libsigar is still in use. However, I’d strongly encourage us to reconsider that position. Continuing to depend on libsigar without any plan to replace or modernize it introduces significant risks to the Cloudberry project:

❌ Why `libsigar` is problematic:

Unmaintained: No upstream development or security patches in years.
Broken on modern platforms: Build issues are common on RHEL 9, Debian 12, Alpine, macOS, etc.
Difficult to patch: Contributors will be forced to debug outdated C code and platform-specific breakage.
Portability blocker: This limits adoption on modern developer workstations, CI systems, and containers.
ASF sustainability concern: Bringing in obsolete dependencies violates the spirit of maintainability and long-term openness expected in ASF projects.

✅ What we need:

If perfmon is to become a core part of Cloudberry’s monitoring infrastructure, we should at minimum:

Track this as a known technical debt item, with a defined migration path or plan to deprecate libsigar
Abstract libsigar usage behind an interface, so it can be replaced in a modular way
Evaluate modern replacements:
- /proc-based stat readers (Linux-only, but simple)
- Integration with system libraries like libstatgrab, psutil, or systemd APIs
- Shell out to vmstat, iostat, or /sys/fs/cgroup metrics as a fallback

Without a plan, we risk adopting legacy constraints that will slow down contributors and users alike.
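
The "abstract behind an interface" and "/proc-based stat readers" points can be sketched together. The Python snippet below is only an illustration of the shape, since gpsmon itself is C code. The names `MemorySource`, `ProcMemorySource` and `parse_meminfo` are hypothetical and exist in neither perfmon nor SIGAR. It assumes a Linux host whose /proc/meminfo exposes `MemTotal` and `MemAvailable`.

```python
from abc import ABC, abstractmethod


class MemorySource(ABC):
    """Hypothetical interface the collector depends on instead of a concrete library."""

    @abstractmethod
    def memory_kb(self) -> dict:
        """Return {'total': ..., 'available': ...} in kB."""


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style 'Key:   value [kB]' lines into {key: int}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


class ProcMemorySource(MemorySource):
    """Linux-only backend that reads /proc/meminfo directly."""

    def __init__(self, path: str = "/proc/meminfo"):
        self.path = path

    def memory_kb(self) -> dict:
        with open(self.path) as f:
            info = parse_meminfo(f.read())
        # MemAvailable is assumed to be present (it is on reasonably recent kernels).
        return {"total": info["MemTotal"], "available": info["MemAvailable"]}


# Illustrative sample input; the numbers are made up.
SAMPLE = """\
MemTotal:       16316412 kB
MemFree:         1234567 kB
MemAvailable:    8000000 kB
HugePages_Total:       0
"""
```

With this split, a SIGAR-backed class and a /proc-backed class could both implement `MemorySource`, and the collector would not need to change when one replaces the other.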

1 reply

fanfuxiaoran May 7, 2025
Collaborator Author

Agreed. I will try libstatgrab to see whether it can replace libsigar in gpsmon.

edespino · 2025-05-07T09:15:32Z

edespino
May 7, 2025
Collaborator

@fanfuxiaoran

🛠️ Build Environment Support for `perfmon` on Rocky Linux 9?

For this extension to move forward, we’ll need to ensure it can be built and tested reliably in the default Cloudberry build environment.

Question: Has any work been done to support building perfmon and its libsigar dependency on Rocky Linux 9, which is the current baseline used in GitHub Actions?

Some specific considerations:

libsigar hasn’t been maintained in years and has known issues compiling on modern glibc and kernel headers.
Rocky Linux 9 uses glibc 2.34+, which broke certain assumptions in sigar_os_linux.c and related memory layout assumptions.
What’s the expected path for packaging this? Will it be built as part of the Cloudberry core build or treated as an optional/isolated extension?

If build support hasn’t been tested or documented yet, I’d suggest:

Trying to integrate perfmon into the current CI build matrix (even if experimental)
Adding a tracking issue for adapting or replacing libsigar
Clarifying whether this is considered a required or optional component for Cloudberry installations

Happy to help test builds if needed — but this will need to be addressed before broader adoption is feasible.


edespino · 2025-05-07T09:21:58Z

edespino
May 7, 2025
Collaborator

📚 User Documentation Plans?

This is a feature that will directly impact end users — it introduces an extension, SQL tables, background workers, and runtime hooks that people will need to understand to use effectively.

Could you clarify what the plans are for end-user documentation?

Some important areas that would benefit from formal documentation:

How to enable and install perfmon (CREATE EXTENSION, configuration settings, etc.)
How to initialize the gpperfmon database (via gpperfmon_install)
What each system table stores (e.g., queries_now, queries_history, system_*)
How to use pg_query_state(pid) in practice — expected output, limitations, and examples
What performance overhead, if any, is introduced
Any permissions model or user roles needed to query these tables or use the extension

Ideally this would be integrated into the Cloudberry user docs (docs/ directory), or at least as a README in the perfmon source directory for now.

Without documentation, this feature risks being misunderstood or underutilized — and it's a missed opportunity for user engagement.

Happy to help review or contribute to the initial docs if that would help.


edespino · 2025-05-07T21:10:39Z

edespino
May 7, 2025
Collaborator

Building `perfmon` with SIGAR on Rocky Linux 9 – Dependency Discovery, Gaps, and Recommendations

While working to build the perfmon components (gpsmon, gpmmon) in Apache Cloudberry (Incubating) on Rocky Linux 9, I encountered a series of missing dependencies and compatibility issues. This was all triggered by running the top-level configure script with the --enable-perfmon flag:

./configure --enable-perfmon

At that point, everything appeared to proceed normally — but there were no checks in configure for required dependencies like APR-util or SIGAR. The failures only surfaced later during compilation.

🧰 Environment

gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5)

🔍 Dependency Discovery & Remediation

1. APR Header Missing in `gpmmon.c`

gpmmon.c:23:10: fatal error: apr_queue.h: No such file or directory

Resolution:

sudo dnf install apr-util-devel

2. SIGAR Header Missing in `gpsmon.c`

gpsmon.c:20:10: fatal error: sigar.h: No such file or directory

Action:
Cloned SIGAR manually:

git clone https://github.com/hyperic/sigar.git
cd sigar

3. SIGAR’s `autogen.sh` Fails Without `libtool`

libtoolize 1.5.x wasn't found, exiting

Resolution:

sudo dnf install libtool

After that, ./autogen.sh, ./configure, and make ran — until deeper compilation issues arose.

❌ Compilation Errors in SIGAR

Macro conflict: `major` / `minor`

error: called object ‘major’ is not a function or function pointer
note: declared here: unsigned long major, minor;

Caused by local variables shadowing system macros from <sys/sysmacros.h>.

Deprecated API warnings:

warning: ‘readdir_r’ is deprecated

Not currently fatal, but these will likely require cleanup in future glibc versions.

I did not attempt to patch SIGAR — leaving these issues for the developer to evaluate.

🛠️ `configure` Recommendation

The configure script should be updated to:

Check for presence of apr-1 and apr-util headers/libraries
Check for SIGAR headers and libsigar availability
Provide helpful error messages or skip perfmon gracefully if missing

This would improve the experience when building with --enable-perfmon.

⚖️ Considerations for SIGAR Moving Forward

Given that SIGAR is effectively abandoned (last meaningful updates were ~2014), and its build system and source have not kept pace with modern Linux development, we should consider alternatives. Here are a few paths the project could take:

✅ Option 1: Vendor & Patch SIGAR Internally

Copy the minimal SIGAR source tree (include/ + src/os/linux/) into the perfmon/ tree.
Rename conflicting variables (major, minor), and gradually clean up deprecated code.
Maintain this minimal fork within the project under Apache 2.0 license terms.
Pros: Fastest path to stability. Controlled build. No external dependency.

🧹 Option 2: Replace SIGAR with `/proc`-based Native Code

Reimplement the small set of metrics used in perfmon using direct /proc parsing (as many modern tools do).
Drop SIGAR entirely.
Pros: Modern, no legacy baggage, fully under project control.
Cons: Requires modest development effort up front.

🧰 Option 3: Build a Thin Compatibility Shim

Abstract the limited SIGAR API used in gpsmon/gpmmon behind a clean wrapper.
Implement that wrapper internally, preserving the external API for now.
Could later swap in /proc or platform-native libraries underneath.

✅ Summary

The build experience with --enable-perfmon currently lacks dependency checks and leads to trial-and-error.
I resolved the issues up to SIGAR compilation, which still requires patching.
SIGAR is functional but clearly outdated and deserves reevaluation.
I've outlined several options to consider for modernizing or replacing SIGAR in a maintainable way.

2 replies

fanfuxiaoran May 8, 2025
Collaborator Author

@edespino
Among the options for replacing SIGAR, I prefer libstatgrab because it is a cross-platform library, although it collects a more limited set of parameters than SIGAR. I'm evaluating whether it can meet all of perfmon's requirements.

edespino May 8, 2025
Collaborator

Hi @fanfuxiaoran ,

Thanks for pointing out libstatgrab as a potential replacement for libsigar. One important licensing detail to be aware of:

While the library component of libstatgrab is licensed under the GNU LGPL, the tools and example programs are licensed under the GNU GPL.

Even though LGPL is less restrictive than GPL, it is still considered problematic for Apache Software Foundation (ASF) projects, particularly when there’s any risk of static linking or tight integration. ASF policy recommends avoiding both GPL and LGPL dependencies in core project code due to license incompatibility concerns.

Just flagging this early in case it influences your evaluation.

-=e

edespino · 2025-05-08T08:16:32Z

edespino
May 8, 2025
Collaborator

📌 Note: Replacing libsigar in gpsmon

Hi @fanfuxiaoran,

Following up on the effort to replace libsigar within gpsmon: while SIGAR is Apache-licensed and technically ASF-compatible, it's effectively unmaintained and fragile on modern Linux systems. We’ve looked into possible alternatives — specifically C-based, Linux-only libraries with a comparable feature set and compatible licensing.

🔍 Summary of Findings

Despite a broad survey, there does not appear to be a single, actively maintained, C-based library that:

Matches libsigar’s full system and per-process metrics coverage,
Works cleanly across modern Linux distros,
And is ASF-compatible (e.g., not LGPL/GPL).

✅ Recommendation

Given these constraints, the cleanest and most sustainable path forward is to incrementally replace SIGAR usage with direct /proc, /sys, and statvfs() parsing, organized as:

Modular internal helpers: cpu.c, mem.c, proc.c, net.c, fs.c
Simple, reusable get_*_stats() functions
Preserved structure and logic in gpsmon with minimal disruption

This approach ensures:

No licensing risk
Tight control over compatibility and behavior
A path toward long-term maintainability

This would allow you to phase out SIGAR while maintaining equivalent functionality and aligning with ASF licensing policy.

This is simply a proposed direction to consider as you evaluate how best to support the SIGAR functionality in some form going forward.
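
To make the get_*_stats() idea concrete, here is a minimal Python sketch of two such helpers: CPU utilization from two samples of /proc/stat, and filesystem capacity from statvfs(). It only illustrates the parsing logic, and a real replacement inside gpsmon would be written in C. The function names `parse_cpu_times`, `cpu_utilization` and `get_fs_stats` are hypothetical. Counting iowait as idle time is one possible convention, not something gpsmon is known to do.

```python
import os


def parse_cpu_times(stat_text: str) -> tuple:
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            times = [int(x) for x in fields[1:]]
            # Field order: user nice system idle iowait irq softirq steal guest guest_nice
            idle = times[3] + (times[4] if len(times) > 4 else 0)  # idle + iowait
            total = sum(times[:8])  # guest time is already counted in user/nice
            return total - idle, total
    raise ValueError("no aggregate 'cpu' line found")


def cpu_utilization(before: str, after: str) -> float:
    """Fraction of non-idle CPU time between two /proc/stat snapshots."""
    busy0, total0 = parse_cpu_times(before)
    busy1, total1 = parse_cpu_times(after)
    elapsed = total1 - total0
    return 0.0 if elapsed <= 0 else (busy1 - busy0) / elapsed


def get_fs_stats(path: str = "/") -> dict:
    """Total and available bytes of the filesystem containing path, via statvfs()."""
    st = os.statvfs(path)
    return {
        "total_bytes": st.f_blocks * st.f_frsize,
        "available_bytes": st.f_bavail * st.f_frsize,
    }


# Two made-up snapshots of the aggregate line, taken some time apart.
BEFORE = "cpu  100 0 50 800 50 0 0 0 0 0\n"
AFTER = "cpu  200 0 100 1600 100 0 0 0 0 0\n"
```

The CPU helper works on snapshot text rather than opening /proc/stat itself, so the parsing can be unit-tested on any platform and the file access stays in one small place.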

-=e

3 replies

fanfuxiaoran May 8, 2025
Collaborator Author

@edespino
Thanks for your detailed response and summary.

On Linux, SIGAR is itself implemented by reading /proc, so I don't think we need to implement another 'sigar' ourselves.
I prefer to clone SIGAR and maintain it rather than putting it into the perfmon tree, as we'll hardly ever need to touch it.

edespino May 8, 2025
Collaborator

Hi @fanfuxiaoran,

Thanks — I think your plan to clone and maintain a Linux-only SIGAR fork outside the perfmon tree makes a lot of sense. It keeps things modular and avoids entangling legacy build logic unnecessarily.

One thing we’ll need to be mindful of as we move forward is ensuring the fork aligns with Apache Software Foundation (ASF) release and licensing requirements, since it’ll be included as part of the Cloudberry codebase or build. That means we’ll eventually need to:

Clean up unused platforms and bindings (non-Linux, JNI, etc.)
Simplify the build so it works out of the box (no autoconf.sh or autogen steps)
Add basic versioning so we can track changes over time
Think about how we want to include it (e.g., as a submodule or vendored directly)
And make sure attribution and license files are in place according to ASF policy

None of this needs to block the clone effort itself, but we’ll want to tackle these as part of preparing the fork for inclusion in Cloudberry source releases.

Happy to coordinate or help when we get to that point.

-=e

fanfuxiaoran May 12, 2025
Collaborator Author

@edespino I have forked libsigar: https://github.com/cloudberry-contrib/sigar

edespino · 2025-05-08T10:19:42Z

edespino
May 8, 2025
Collaborator

Given the amount of work involved in preparing the SIGAR fork and ensuring it’s ready for inclusion under ASF policy, I suggest we exclude perfmon from the upcoming Apache Cloudberry (Incubating) 2.0 release.

This will give us space to:

Clean and scope the SIGAR fork (Linux-only, minimal build, etc.)
Handle licensing and attribution correctly under ASF guidelines
Decide on integration strategy (e.g., submodule vs vendored)
Set up basic versioning and testing

Also, it might make sense to close the current perfmon PR for now while we align on the overall approach. Once the SIGAR fork is ready and integrated in a maintainable way, we can revisit the PR with a clean path forward.

-=e

3 replies

edespino May 8, 2025
Collaborator

@tuhaihe & @my-ship-it - Given the amount of work needed to get SIGAR into shape to be used with perfmon, can you share your thoughts on excluding perfmon from the upcoming Apache Cloudberry (Incubating) 2.0 release?

tuhaihe May 8, 2025
Collaborator

Given the significant effort required for the perfmon PR, I believe it's reasonable to exclude it from the upcoming 2.0.0 release. We can include this functionality in a later release once it's fully ready.

my-ship-it May 8, 2025
Collaborator

> @tuhaihe & @my-ship-it - Given the amount of work needed to get SIGAR into shape to be used with perfmon, can you share your thoughts on excluding perfmon from the upcoming Apache Cloudberry (Incubating) 2.0 release?

Thank you for the in-depth discussion of product maintainability. Yes, we don't need to rush. If we need some time to prepare the SIGAR fork, it's reasonable to exclude perfmon from the 2.0.0 release and include it in a later release.

leborchuk · 2025-05-14T09:50:52Z

leborchuk
May 14, 2025

Hi @fanfuxiaoran, many thanks for your work!

I have a thought on how to properly review and merge such complex functionality. Since we have agreed not to merge it into 2.0.0, what about creating a separate branch? In this branch, we can merge the initial set of changes, check them, discuss them, create additional PRs, and then merge the final PR into the main branch.

I was reviewing #1085, but realized that if we started discussing the implementation details, the review process would be delayed. Plus, we might postpone some ideas and come back to them later. However, they might get lost in the general discussion flow. It would be better to write them down and return to them at a later date.


leborchuk · 2025-05-14T10:07:11Z

leborchuk
May 14, 2025

We have a similar solution for Open-GPDB: https://github.com/open-gpdb/yagp_hooks_collector/tree/YAGP-0.0.2-WIP. Right now, we have an open-sourced component that is quite similar to GPMON. It would be great if we could reuse solutions.

Is it possible to formalize the segment -> GPSMON and GPSMON -> GPMMON interface? If so, I believe we could merge the components and replace them somehow.

We use protobuf to exchange data: https://github.com/open-gpdb/yagp_hooks_collector/blob/YAGP-0.0.2-WIP/protos/yagpcc_set_service.proto


leborchuk · 2025-05-14T10:52:33Z

leborchuk
May 14, 2025

Hi, @fanfuxiaoran

I have a number of ideas about what can be done to improve diagnosis. May I ask you a few quick questions below? You can answer them in detail or just yes/no. It will allow me to understand what you consider important and what isn't.

If I rewrite your code, can you get rid of SIGAR?

We do not use it in https://github.com/powa-team/pg_stat_kcache and in https://github.com/open-gpdb/yagp_hooks_collector/blob/YAGP-0.0.2-WIP/src/ProcStats.cpp
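To make the "no SIGAR" idea concrete, here is a minimal sketch of reading per-process CPU and memory counters straight from `/proc/<pid>/stat` on Linux. It is written in Python for brevity and is not taken from `pg_stat_kcache` or `ProcStats.cpp`; the names `ProcSample`, `parse_proc_stat` and `cpu_seconds` are hypothetical. The field positions (14 = utime, 15 = stime, 23 = vsize, 24 = rss) follow the proc(5) man page.

```python
import os
from typing import NamedTuple


class ProcSample(NamedTuple):
    """A few counters from /proc/<pid>/stat (hypothetical container type)."""
    pid: int
    comm: str
    utime_ticks: int   # field 14: user-mode CPU time in clock ticks
    stime_ticks: int   # field 15: kernel-mode CPU time in clock ticks
    vsize_bytes: int   # field 23: virtual memory size in bytes
    rss_pages: int     # field 24: resident set size in pages


def parse_proc_stat(text: str) -> ProcSample:
    """Parse the single-line content of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so cut at the first '(' and the last ')'.
    lpar = text.index("(")
    rpar = text.rindex(")")
    pid = int(text[:lpar])
    comm = text[lpar + 1:rpar]
    rest = text[rpar + 1:].split()

    # rest[0] is field 3 (state), so field N is found at rest[N - 3].
    def field(n: int) -> int:
        return int(rest[n - 3])

    return ProcSample(pid, comm, field(14), field(15), field(23), field(24))


def cpu_seconds(sample: ProcSample, ticks_per_second: int) -> float:
    """Total CPU time (user + system) in seconds."""
    return (sample.utime_ticks + sample.stime_ticks) / ticks_per_second


if __name__ == "__main__" and os.path.exists("/proc/self/stat"):
    with open("/proc/self/stat") as f:
        me = parse_proc_stat(f.read())
    print(me.comm, cpu_seconds(me, os.sysconf("SC_CLK_TCK")))
```

A collector would take two such samples per backend and report the difference, which is roughly what a host-level agent needs for per-query CPU attribution.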

Are you going to gather PostgreSQL internal statistics? Here is the list of statistics for PG14: https://github.com/apache/cloudberry/blob/main/contrib/pg_stat_statements/pg_stat_statements--1.8--1.9.sql#L29
How are you going to gather network statistics? We gather motion statistics https://github.com/open-gpdb/yagp_hooks_collector/blob/bb3afb1605aec70a736a0fd21358537a5a50aef8/src/ProtoUtils.cpp#L150 and interconnect statistics (UDPIfc metrics only) https://github.com/open-gpdb/yagp_hooks_collector/blob/bb3afb1605aec70a736a0fd21358537a5a50aef8/src/ProtoUtils.cpp#L186
Are you going to take nested queries and their nesting level into account? Here is how it is done in pg_stat_statements https://github.com/apache/cloudberry/blob/main/contrib/pg_stat_statements/pg_stat_statements.c#L998 and here are the modern tests https://github.com/postgres/postgres/blob/master/contrib/pg_stat_statements/sql/level_tracking.sql
What are you going to do with queries that have no execution plan (usually those executed in utility mode; here are examples: https://github.com/postgres/postgres/blob/master/contrib/pg_stat_statements/sql/utility.sql )?
Are you going to track the resources consumed per session (in order to show not only the top queries but also the top sessions)?
How are you going to store historical data? pg_stat_statements aggregates data, but that should not be done for long-running queries (our users demand the full data).
pg_stat_statements calculates a query identifier. Since we store the execution plan too, are you going to calculate a plan identifier? Here is how I did it for PG: https://github.com/postgredients/pg_stat_query_plans
What type of execution plan are you going to store in the historical data? We could gather EXPLAIN output without statistics in the ExecutorStart hook and EXPLAIN ANALYZE statistics in the ExecutorEnd hook.
How are you going to measure the performance penalty?
How should the system behave if gpsmon/gpmmon is unavailable? How do we limit the memory usage of these processes?
Is it acceptable if some of the processes are written (or replaced) in another programming language, for example Go instead of C?
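
To illustrate the plan identifier question, here is a rough sketch of one possible approach: keep only the structural parts of a plan tree and hash them, so that two executions with the same plan shape but different costs or row counts get the same identifier. This is an assumption for discussion only, not how pg_stat_query_plans or perfmon computes anything. The input is assumed to be a plan node in the shape produced by `EXPLAIN (FORMAT JSON)`; `STRUCTURAL_KEYS`, `strip_plan` and `plan_id` are hypothetical names.

```python
import hashlib
import json

# Keys assumed to describe the plan shape; costs, row estimates and timings are dropped.
STRUCTURAL_KEYS = ("Node Type", "Join Type", "Relation Name", "Index Name")


def strip_plan(node: dict) -> dict:
    """Return a copy of the plan tree that keeps only structural keys."""
    kept = {key: node[key] for key in STRUCTURAL_KEYS if key in node}
    kept["Plans"] = [strip_plan(child) for child in node.get("Plans", [])]
    return kept


def plan_id(plan: dict) -> str:
    """Hash the stripped plan tree into a short hexadecimal identifier."""
    canonical = json.dumps(strip_plan(plan), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


if __name__ == "__main__":
    plan = {
        "Node Type": "Hash Join",
        "Join Type": "Inner",
        "Total Cost": 1234.5,
        "Plans": [
            {"Node Type": "Seq Scan", "Relation Name": "foo", "Total Cost": 10.0},
            {"Node Type": "Hash", "Plans": [
                {"Node Type": "Seq Scan", "Relation Name": "bar", "Total Cost": 20.0},
            ]},
        ],
    }
    print(plan_id(plan))
```

Which keys count as structural is the real design decision. For an MPP plan, Motion nodes would probably also have to be distinguished by their type, which this sketch does not attempt.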

