Description
Please, answer some short questions which should help us to understand your problem / question better?
- Which image of the operator are you using? ghcr.io/zalando/postgres-operator:v1.14.0
After updating to postgres-operator v1.14.0, I am trying to upgrade PostgreSQL from version 15 to 17, but the upgrade fails.
I also tried it with a freshly created, empty database to rule out a problem in the data itself, and it still fails.
time="2025-06-11T12:37:26Z" level=info msg="postgresql version increased (15 -> 17), depending on config manual
...
...
...
time="2025-06-11T12:42:21Z" level=info msg="user id was identified as: 0, using su to reach the postgres user" cluster-name=psql-test/psql-test pkg=cluster worker=1
time="2025-06-11T12:42:23Z" level=error msg="major version upgrade failed: 2025-06-11 12:42:22,737 inplace_upgrade ERROR: Member psql-test-0 is not streaming from the primary\n" cluster-name=psql-test/psql-test pkg=cluster worker=1
The error message is clear, and I get the same one when I run inplace_upgrade.py manually:
inplace_upgrade ERROR: Member psql-test-0 is not streaming from the primary
However, this is a fresh database and it is in sync: there is no lag and nothing else appears broken.
The state is streaming both in the patronictl output and in the pg_stat_replication view, which I believe is what the inplace_upgrade.py script checks:
root@psql-test-0:/home/postgres# patronictl topology
+ Cluster: psql-test (7514662593666207815) ----------+----+-----------+
| Member        | Host         | Role    | State     | TL | Lag in MB |
+---------------+--------------+---------+-----------+----+-----------+
| psql-test-1   | XY.XYZ.X.XYZ | Leader  | running   |  2 |           |
| + psql-test-0 | XY.XYZ.X.XY  | Replica | streaming |  2 |         0 |
| + psql-test-2 | XY.XYZ.X.XYZ | Replica | streaming |  2 |         0 |
+---------------+--------------+---------+-----------+----+-----------+
postgres=# select * from pg_catalog.pg_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state | reply_time
-----+----------+---------+------------------+-------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------+-----------+------------+---------------+------------+-------------------------------
524 | 16720 | standby | psql-test-2 | XY.XY.X.X | | 34472 | 2025-06-11 12:39:22.901287+00 | | streaming | 0/A064A20 | 0/A064A20 | 0/A064A20 | 0/A064A20 | | | | 0 | async | 2025-06-11 12:57:48.020315+00
650 | 16720 | standby | psql-test-0 | XY.XY.X.X | | 59692 | 2025-06-11 12:40:07.178861+00 | | streaming | 0/A064A20 | 0/A064A20 | 0/A064A20 | 0/A064A20 | | | | 0 | async | 2025-06-11 12:57:48.045346+00
(2 rows)
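For anyone who wants to repeat this comparison by hand, here is a minimal sketch that cross-checks the Patroni member list against pg_stat_replication rows, once by application_name and once by client_addr. Which column inplace_upgrade.py actually matches on is an assumption I have not verified. The function name `members_not_streaming` and the IP addresses are hypothetical placeholders, since the real addresses are masked above.

```python
from typing import Dict, List


def members_not_streaming(members: Dict[str, str], rows: List[dict], key: str) -> List[str]:
    """Return replica names that have no 'streaming' row in pg_stat_replication.

    members: replica name -> host IP as reported by patronictl
    rows:    dicts with application_name, client_addr and state columns
    key:     column to match on, 'application_name' or 'client_addr'
    """
    # Collect the values of the chosen column for all streaming connections.
    streaming = {row[key] for row in rows if row["state"] == "streaming"}
    missing = []
    for name, host in members.items():
        # Match by member name or by host IP, depending on the chosen column.
        wanted = name if key == "application_name" else host
        if wanted not in streaming:
            missing.append(name)
    return sorted(missing)


# Hypothetical data: psql-test-0 connects from an address that differs
# from the host IP Patroni reports for it (an assumed scenario).
members = {"psql-test-0": "10.0.1.10", "psql-test-2": "10.0.2.10"}
rows = [
    {"application_name": "psql-test-2", "client_addr": "10.0.2.10", "state": "streaming"},
    {"application_name": "psql-test-0", "client_addr": "10.0.9.1", "state": "streaming"},
]

print(members_not_streaming(members, rows, "application_name"))
print(members_not_streaming(members, rows, "client_addr"))
```

With this made-up data, matching by name finds every replica streaming, while matching by address flags psql-test-0. So it may be worth comparing the unmasked Host column from patronictl with client_addr in pg_stat_replication.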
The database manifest is simple:
apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: psql-test
  namespace: psql-test
spec:
  databases:
    keycloak: postgres_keycloak_owner
  numberOfInstances: 3
  patroni: {}
  podAnnotations:
    prometheus.io/path: /data/metrics
    prometheus.io/port: "9187"
    prometheus.io/scrape: "true"
  postgresql:
    parameters:
      min_wal_size: 80MB
      max_wal_size: 1G
      checkpoint_timeout: 15min
      password_encryption: scram-sha-256
    version: "17"
  resources:
    limits:
      cpu: 1100m
      memory: 2298Mi
    requests:
      cpu: "1"
      memory: 2Gi
  users:
    postgres_keycloak_owner:
    - superuser
    - createdb
  volume:
    size: 15Gi
  env:
  - s3 related variables
  ...
  ...
  ...
  - name: USE_WALG_BACKUP
    value: 'true'
  - name: BACKUP_SCHEDULE
    value: "00 06 * * *"
  - name: BACKUP_NUM_TO_RETAIN
    value: "7"