PostgreSQL repmgr: pg_rewind fails with pg_control error, pg_basebackup overwrites data

### Name and Version

bitnami/postgresql-repmgr 16

---

### What steps will reproduce the bug?

1. Deploy a 3-node PostgreSQL cluster using the Docker Compose config below
2. Wait for the cluster to be fully initialized and synchronized
3. Stop one standby node:

   ```bash
   docker exec -ti <standby_container> pg_ctl stop -D /bitnami/postgresql/data -m fast
   ```
4. Make some changes on the primary node (for example, insert rows) so that its WAL moves ahead of the stopped standby
5. Attempt to run `pg_rewind` manually:

   ```bash
   docker exec -ti <standby_container> pg_rewind \
     --target-pgdata=/bitnami/postgresql/data \
     --source-server="host=postgres-0 port=5432 user=repmgr dbname=repmgr" \
     --progress
   ```
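
For step 4, any write workload on the primary will do. The following is a minimal sketch of one way to generate WAL. The helper name `wal_churn_sql` and the table `rewind_repro` are hypothetical and not part of the original reproduction. Authentication options for `psql` are omitted and depend on the deployment.

```bash
# Sketch: emit SQL that writes rows and forces a WAL segment switch.
# wal_churn_sql and the rewind_repro table are hypothetical names.
wal_churn_sql() {
  local rows="${1:-10000}"
  cat <<SQL
CREATE TABLE IF NOT EXISTS rewind_repro (id bigint, payload text);
INSERT INTO rewind_repro SELECT g, md5(g::text) FROM generate_series(1, ${rows}) AS g;
SELECT pg_switch_wal();
SQL
}

# Usage (credentials omitted, adjust to your deployment):
#   wal_churn_sql 50000 | docker exec -i <primary_container> psql -U postgres
```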

---

### Are you using any custom parameters or values?

**Docker Compose Configuration:**

```yaml
version: '3.8'

x-version-common: &service-common
  image: bitnami/postgresql-repmgr:${POSTGRES_RELEASE:-latest}
  volumes:
    - ${POSTGRES_VOLUME_PATH:?err}/postgres:${POSTGRESQL_DATA_DIR:-/bitnami/postgresql}:Z

x-common-env: &common-env
  BITNAMI_DEBUG: "true"
  POSTGRESQL_FSYNC: "on"
  POSTGRESQL_PASSWORD: ${POSTGRES_PASSWORD:-odoo}
  POSTGRESQL_POSTGRES_PASSWORD: ${ADMIN_POSTGRES_PASSWORD:-postgres}
  POSTGRESQL_USERNAME: ${POSTGRES_USERNAME:-odoo}
  POSTGRESQL_WAL_LEVEL: replica
  POSTGRESQL_SYNCHRONOUS_COMMIT_MODE: "on"
  POSTGRESQL_NUM_SYNCHRONOUS_REPLICAS: 1
  POSTGRESQL_SYNCHRONOUS_REPLICAS_MODE: "FIRST"
  POSTGRESQL_CLUSTER_APP_NAME: "*"
  POSTGRESQL_CONF_DIR: /bitnami/postgresql/data
  REPMGR_DEGRADED_MONITORING_TIMEOUT: 300
  REPMGR_FAILOVER: automatic
  REPMGR_MASTER_RESPONSE_TIMEOUT: 30
  REPMGR_MONITORING_HISTORY: "yes"
  REPMGR_PARTNER_NODES: postgres-0,postgres-1,postgres-2
  REPMGR_PASSWORD: ${REPMGR_PASSWORD:-repmgr}
  REPMGR_PRIMARY_HOST: postgres-0
  REPMGR_PRIMARY_VISIBILITY_CONSENSUS: "true"
  REPMGR_RECONNECT_ATTEMPTS: 10
  REPMGR_RECONNECT_INTERVAL: 10
  REPMGR_USE_PGREWIND: "yes"
  REPMGR_USE_REPLICATION_SLOTS: 1

services:
  postgres-0:
    <<: *service-common
    environment:
      <<: *common-env
      REPMGR_NODE_NAME: postgres-0
      REPMGR_NODE_NETWORK_NAME: postgres-0
      REPMGR_NODE_PRIORITY: 100
    deploy:
      placement:
        constraints: [node.labels.postgres-0 == true]

  postgres-1:
    <<: *service-common
    environment:
      <<: *common-env
      REPMGR_NODE_NAME: postgres-1
      REPMGR_NODE_NETWORK_NAME: postgres-1
      REPMGR_NODE_PRIORITY: 90
    deploy:
      placement:
        constraints: [node.labels.postgres-1 == true]

  postgres-2:
    <<: *service-common
    environment:
      <<: *common-env
      REPMGR_NODE_NAME: postgres-2
      REPMGR_NODE_NETWORK_NAME: postgres-2
      REPMGR_NODE_PRIORITY: 80
    deploy:
      placement:
        constraints: [node.labels.postgres-2 == true]
```

---

### What is the expected behavior?

1. `pg_rewind` should resynchronize a diverged standby node without wiping its existing data
2. When a resync is needed, repmgr should try `pg_rewind` first
3. `pg_basebackup` should be used only as a fallback when the rewind fails, or on explicit user command

---

### What do you see instead?

**Current Error:**

```
pg_rewind: error: could not open file "/bitnami/postgresql/data/global/pg_control" for reading: No such file or directory
```
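
Since the error points at a missing `global/pg_control`, it may help to confirm that the target path is an initialized data directory before running `pg_rewind`. A minimal sketch follows. `check_pgdata` is a hypothetical helper, not something shipped with repmgr or the Bitnami image.

```bash
# Sketch: check that a directory looks like an initialized PostgreSQL data
# directory. check_pgdata is a hypothetical helper name.
check_pgdata() {
  local dir="$1" missing=0 f
  for f in PG_VERSION global/pg_control; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (inside the standby container):
#   check_pgdata /bitnami/postgresql/data && echo "data directory looks initialized"
```

If the check fails, the volume mount or the data directory path is the more likely culprit than `pg_rewind` itself.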

**Previous error (fixed by setting `POSTGRESQL_CONF_DIR`):**

```
postgres: could not access the server configuration file "/bitnami/postgresql/data/postgresql.conf": No such file or directory
```

**Fallback behavior:**

After `pg_rewind` fails, repmgr automatically runs `pg_basebackup`, which **deletes and overwrites** the entire data directory. This risks data loss and forces an unnecessary full reinitialization.
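
Until the fallback can be controlled, one manual precaution is to copy the stopped standby's data directory aside before attempting any resync. The sketch below assumes the server is stopped and that there is enough disk space for a full copy. `preserve_datadir` is a hypothetical helper, not a repmgr or Bitnami feature.

```bash
# Sketch: copy a (stopped) data directory to a timestamped sibling directory
# and print the path of the copy. preserve_datadir is a hypothetical name.
preserve_datadir() {
  local dir="${1%/}"
  local backup="${dir}.pre-resync.$(date +%Y%m%d%H%M%S)"
  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
  # -a preserves ownership, permissions, and timestamps
  cp -a -- "$dir" "$backup" && echo "$backup"
}

# Usage (inside the standby container, with PostgreSQL stopped):
#   preserve_datadir /bitnami/postgresql/data
```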

---

### Additional information

* Environment: Docker Swarm mode, 3 nodes
* Volume mount: `${POSTGRES_VOLUME_PATH}/postgres:${POSTGRESQL_DATA_DIR:-/bitnami/postgresql}:Z`
* PostgreSQL config:
  * `wal_log_hints = on` (required for `pg_rewind` because data checksums are off)
  * `data_checksums = off`
* Replication works normally otherwise
* Manual `pg_rewind` run returns "no rewind required" when nodes are synchronized
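
The `wal_log_hints` and checksum settings listed above can be cross-checked against the control file of the stopped standby. The following sketch parses `pg_controldata` output and assumes its usual field labels (`wal_log_hints setting` and `Data page checksum version`). `rewind_prereq_ok` is a hypothetical helper name.

```bash
# Sketch: read pg_controldata output on stdin and succeed only if the cluster
# was running with wal_log_hints on or with data checksums enabled.
# rewind_prereq_ok is a hypothetical helper name.
rewind_prereq_ok() {
  awk -F': *' '
    /^wal_log_hints setting/      { hints = $2 }
    /^Data page checksum version/ { checksum = $2 }
    END { exit !(hints == "on" || checksum + 0 > 0) }
  '
}

# Usage:
#   docker exec <standby_container> pg_controldata /bitnami/postgresql/data \
#     | rewind_prereq_ok && echo "pg_rewind prerequisite met"
```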

---

### Questions

1. How should the data directories be structured or configured so that `pg_rewind` can locate the `pg_control` file?
2. Can the automatic fallback to `pg_basebackup` be disabled, to avoid data loss after a failed rewind?
3. Is `POSTGRESQL_CONF_DIR=/bitnami/postgresql/data` the correct setting, or should it point elsewhere?

---

### Root Cause Analysis

The core issue stems from a **configuration file location mismatch** between Bitnami's PostgreSQL structure and `pg_rewind`'s expectations.

**The Problem:**
`pg_rewind` is executed with these parameters:
```bash
pg_rewind -D "$POSTGRESQL_DATA_DIR" --source-server "host=${REPMGR_CURRENT_PRIMARY_HOST} port=${REPMGR_CURRENT_PRIMARY_PORT} user=${REPMGR_USERNAME} dbname=${REPMGR_DATABASE}"
```

Where `POSTGRESQL_DATA_DIR` = `/bitnami/postgresql/data`

**Why it fails:**
1. When the target cluster was not shut down cleanly, `pg_rewind` (PostgreSQL 13 and later) launches PostgreSQL in **single-user mode** to complete crash recovery before rewinding
2. PostgreSQL in single-user mode **automatically searches** for `postgresql.conf` in the directory specified by the `-D` parameter
3. In Bitnami's structure:
   - Data directory: `/bitnami/postgresql/data`
   - Config files: `/opt/bitnami/postgresql/conf/postgresql.conf`
4. When PostgreSQL (launched by `pg_rewind`) looks for `/bitnami/postgresql/data/postgresql.conf`, it **doesn't exist**
5. The single-user PostgreSQL process then fails with the `could not access the server configuration file` error, which prevents `pg_rewind` from completing

**The sequence:**
```
pg_rewind -D /bitnami/postgresql/data
  └── Launches PostgreSQL in single-user mode
      └── PostgreSQL looks for /bitnami/postgresql/data/postgresql.conf
          └── File not found → Process fails
              └── pg_rewind fails
                  └── repmgr falls back to pg_basebackup (data loss risk)
```
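
For a manual run, `pg_rewind` in PostgreSQL 15 and later accepts `--config-file` to name a `postgresql.conf` that lives outside the data directory. The sketch below only builds and prints such a command, using the paths from the analysis above. `build_rewind_cmd` is a hypothetical helper, and whether the Bitnami repmgr scripts can be made to pass this option is not confirmed here.

```bash
# Sketch: print a pg_rewind invocation that names the config file explicitly.
# build_rewind_cmd is a hypothetical helper; values must not contain single
# quotes, since they are wrapped in single quotes for copy-pasting.
build_rewind_cmd() {
  local pgdata="$1" conf="$2" source="$3"
  printf "pg_rewind --target-pgdata='%s' --config-file='%s' --source-server='%s' --progress --dry-run\n" \
    "$pgdata" "$conf" "$source"
}

# Usage: print the command, review it, then run it inside the standby container.
#   build_rewind_cmd /bitnami/postgresql/data \
#     /opt/bitnami/postgresql/conf/postgresql.conf \
#     "host=postgres-0 port=5432 user=repmgr dbname=repmgr"
```

The printed command includes `--dry-run`, so it reports what would be done without modifying the target. Drop that flag only once the output looks right.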

