inplace upgrade fails - member is not streaming from primary #2923

Open
@pbagona

Please, answer some short questions which should help us to understand your problem / question better?

  • Which image of the operator are you using? ghcr.io/zalando/postgres-operator:v1.14.0

After updating to postgres-operator v1.14.0, I am trying to upgrade PostgreSQL from version 15 to 17, but the upgrade fails.
I also tried it with a freshly created, empty database to rule out a problem in the database itself, and it still fails.

time="2025-06-11T12:37:26Z" level=info msg="postgresql version increased (15 -> 17), depending on config manual 
...
...
...
time="2025-06-11T12:42:21Z" level=info msg="user id was identified as: 0, using su to reach the postgres user" cluster-name=psql-test/psql-test pkg=cluster worker=1
time="2025-06-11T12:42:23Z" level=error msg="major version upgrade failed: 2025-06-11 12:42:22,737 inplace_upgrade ERROR: Member psql-test-0 is not streaming from the primary\n" cluster-name=psql-test/psql-test pkg=cluster worker=1

The error is clear, and I get the same one when I run inplace_upgrade.py manually:

inplace_upgrade ERROR: Member psql-test-0 is not streaming from the primary

However, this is a fresh database and it is in sync, with no lag and nothing else broken.
The state is streaming both in the patronictl output and in the pg_stat_replication view, which I believe is what the inplace_upgrade.py script checks:

root@psql-test-0:/home/postgres# patronictl topology
+ Cluster: psql-test (7514662593666207815) ----------+----+-----------+
| Member        | Host         | Role    | State     | TL | Lag in MB |
+---------------+--------------+---------+-----------+----+-----------+
| psql-test-1   | XY.XYZ.X.XYZ | Leader  | running   |  2 |           |
| + psql-test-0 | XY.XYZ.X.XY  | Replica | streaming |  2 |         0 |
| + psql-test-2 | XY.XYZ.X.XYZ | Replica | streaming |  2 |         0 |
+---------------+--------------+---------+-----------+----+-----------+

postgres=# select * from pg_catalog.pg_stat_replication;
 pid | usesysid | usename | application_name | client_addr | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |          reply_time           
-----+----------+---------+------------------+-------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------+-----------+------------+---------------+------------+-------------------------------
 524 |    16720 | standby | psql-test-2      | XY.XY.X.X   |                 |       34472 | 2025-06-11 12:39:22.901287+00 |              | streaming | 0/A064A20 | 0/A064A20 | 0/A064A20 | 0/A064A20  |           |           |            |             0 | async      | 2025-06-11 12:57:48.020315+00
 650 |    16720 | standby | psql-test-0      | XY.XY.X.X   |                 |       59692 | 2025-06-11 12:40:07.178861+00 |              | streaming | 0/A064A20 | 0/A064A20 | 0/A064A20 | 0/A064A20  |           |           |            |             0 | async      | 2025-06-11 12:57:48.045346+00
(2 rows)
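
One way the state can read streaming while the check still fails is an address mismatch. The following is a minimal sketch, assuming (not confirmed from the script source) that the check matches each replica's host address as known to Patroni against client_addr in pg_stat_replication. The function name and the IP addresses are hypothetical and only illustrate the comparison; the real addresses are masked in the output above.

```python
from typing import Iterable, List, Mapping


def members_not_streaming(
    member_hosts: Mapping[str, str],
    replication_client_addrs: Iterable[str],
) -> List[str]:
    """Return the names of replicas whose host has no matching client_addr.

    member_hosts maps a member name to the host address reported by Patroni.
    replication_client_addrs holds the client_addr values taken from
    pg_stat_replication on the primary.
    """
    streaming_addrs = set(replication_client_addrs)
    return sorted(
        name
        for name, host in member_hosts.items()
        if host not in streaming_addrs
    )


# Hypothetical addresses, for illustration only.
member_hosts = {"psql-test-0": "10.0.1.10", "psql-test-2": "10.0.1.12"}
# Assumed scenario: psql-test-0 connects to the primary from a different address.
client_addrs = ["10.0.2.1", "10.0.1.12"]

print(members_not_streaming(member_hosts, client_addrs))  # ['psql-test-0']
```

If the unmasked Host column from patronictl and the client_addr column from pg_stat_replication differ for a replica, a comparison of this kind would flag that replica even though it is streaming.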

The database manifest is simple and basic:

apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: psql-test
  namespace: psql-test
spec:
  databases:
    keycloak: postgres_keycloak_owner
  numberOfInstances: 3
  patroni: {}
  podAnnotations:
    prometheus.io/path: /data/metrics
    prometheus.io/port: "9187"
    prometheus.io/scrape: "true"
  postgresql:
    parameters:
      min_wal_size: 80MB
      max_wal_size: 1G
      checkpoint_timeout: 15min
      password_encryption: scram-sha-256
    version: "17"
  resources:
    limits:
      cpu: 1100m
      memory: 2298Mi
    requests:
      cpu: "1"
      memory: 2Gi
  users:
    postgres_keycloak_owner:
    - superuser
    - createdb
  volume:
    size: 15Gi
  env:
    - s3 related variables
    ...
    ...
    ...
    - name: USE_WALG_BACKUP
      value: 'true'
    - name: BACKUP_SCHEDULE
      value: "00 06 * * *"
    - name: BACKUP_NUM_TO_RETAIN
      value: "7"
