Description
We have already observed a few cases where, suddenly (after days of running a GKE cluster), the size of the database starts growing.
As an example, we have a cluster that ran without any issues for ~2 weeks (the database size was ~16MB), and then its database started growing. The growth wasn't immediate - it took ~2 days to reach the 4GB limit, and it happened in steps.
For reference, we have backups (snapshots taken via etcdctl snapshot) that reflect the growth rate; each file name contains the time it was taken (a sketch of such a snapshot job follows the listing):
... // all snapshots are roughly 16MB
2017-05-24T04:57:24-07:00_snapshot.db 16.27 MB
2017-05-24T05:57:26-07:00_snapshot.db 29.06 MB
2017-05-24T06:57:30-07:00_snapshot.db 108.98 MB
2017-05-24T07:57:36-07:00_snapshot.db 177.57 MB
2017-05-24T08:57:51-07:00_snapshot.db 308.4 MB
2017-05-24T09:58:32-07:00_snapshot.db 534.54 MB
2017-05-24T11:00:16-07:00_snapshot.db 655.73 MB
2017-05-24T12:00:55-07:00_snapshot.db 764.22 MB
... // all snapshots of the same size
2017-05-25T15:15:10-07:00_snapshot.db 764.22 MB
2017-05-25T16:16:25-07:00_snapshot.db 818.14 MB
2017-05-25T17:26:35-07:00_snapshot.db 963.93 MB
... // all snapshots of the same size
2017-05-25T22:25:08-07:00_snapshot.db 963.93 MB
2017-05-25T23:27:03-07:00_snapshot.db 1.56 GB
2017-05-26T00:30:13-07:00_snapshot.db 1.56 GB
2017-05-26T01:05:24-07:00_snapshot.db 1.56 GB
2017-05-26T02:24:21-07:00_snapshot.db 2.18 GB
... // all snapshots of the same size
2017-05-26T08:43:07-07:00_snapshot.db 2.18 GB
2017-05-26T09:46:47-07:00_snapshot.db 2.19 GB
... // all snapshots of the same size
2017-05-26T16:11:31-07:00_snapshot.db 2.19 GB
2017-05-26T17:16:47-07:00_snapshot.db 2.65 GB
2017-05-26T18:22:37-07:00_snapshot.db 3.12 GB
2017-05-26T19:29:07-07:00_snapshot.db 3.86 GB
2017-05-26T20:33:24-07:00_snapshot.db 4.6 GB
<boom>
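The snapshots above are taken roughly hourly with etcdctl snapshot. Purely for illustration, a minimal Go sketch of an equivalent loop using the v3 client's maintenance API (the endpoint and the /backups directory are made-up placeholders, not our real configuration) would look like this:

```go
package main

import (
	"context"
	"io"
	"log"
	"os"
	"path/filepath"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://127.0.0.1:2379"}, // placeholder endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	for {
		// Stream a snapshot of the backend database from the server.
		rc, err := cli.Snapshot(context.Background())
		if err != nil {
			log.Printf("snapshot failed: %v", err)
		} else {
			// Name the file after the local time, like the listing above.
			name := filepath.Join("/backups", time.Now().Format(time.RFC3339)+"_snapshot.db")
			f, err := os.Create(name)
			if err != nil {
				log.Printf("creating %s failed: %v", name, err)
			} else {
				if _, err := io.Copy(f, rc); err != nil {
					log.Printf("writing %s failed: %v", name, err)
				}
				f.Close()
			}
			rc.Close()
		}
		time.Sleep(time.Hour) // roughly the cadence seen in the listing
	}
}
```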
We've checked that we were doing compaction regularly, every 5m, for the whole time - so this doesn't seem to be the same as #7944.
I'm attaching the relevant lines from the etcd logs in etcd-compaction.txt.
[Note: times in those logs are in UTC, while times in the snapshot names are Pacific time (UTC-07:00), so there is a 7-hour difference.]
To summarize, each compaction covered at most a few thousand transactions (so it's not that we did a lot during some 5m period), though there were some longer compactions, taking up to ~7s.
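For illustration only (this is not our actual compaction job, just a sketch of the pattern, with a placeholder endpoint and a hypothetical probe key), a 5-minute revision-compaction loop with the v3 client would look roughly like:

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints: []string{"http://127.0.0.1:2379"}, // placeholder endpoint
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	for range time.Tick(5 * time.Minute) {
		ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
		// Read any key just to learn the current store revision...
		resp, err := cli.Get(ctx, "compaction-probe") // hypothetical probe key
		if err != nil {
			log.Printf("get failed: %v", err)
			cancel()
			continue
		}
		// ...and compact all revisions older than it.
		if _, err := cli.Compact(ctx, resp.Header.Revision); err != nil {
			log.Printf("compact to revision %d failed: %v", resp.Header.Revision, err)
		}
		cancel()
	}
}
```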
I started digging into individual snapshots and found something strange (I was using bolt to inspect them).
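The per-snapshot numbers below are the aggregate output of bolt's stats command; roughly the same aggregates can be computed with a short Go program against a copy of a snapshot (the path below is a placeholder):

```go
package main

import (
	"fmt"
	"log"

	"github.com/boltdb/bolt"
)

func main() {
	// Open a copy of the snapshot read-only; the path is a placeholder.
	db, err := bolt.Open("/tmp/snapshot.db", 0600, &bolt.Options{ReadOnly: true})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	err = db.View(func(tx *bolt.Tx) error {
		var agg bolt.BucketStats
		// Sum the per-bucket statistics over all top-level buckets,
		// which is what "Aggregate statistics for N buckets" reports.
		if err := tx.ForEach(func(name []byte, b *bolt.Bucket) error {
			agg.Add(b.Stats())
			return nil
		}); err != nil {
			return err
		}
		fmt.Printf("leaf pages: %d, leaf overflow pages: %d\n", agg.LeafPageN, agg.LeafOverflowN)
		fmt.Printf("leaf bytes allocated: %d, in use: %d\n", agg.LeafAlloc, agg.LeafInuse)
		fmt.Printf("key/value pairs: %d, buckets: %d\n", agg.KeyN, agg.BucketN)
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
}
```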
- 16MB snapshot:
Aggregate statistics for 10 buckets
Page count statistics
Number of logical branch pages: 10
Number of physical branch overflow pages: 0
Number of logical leaf pages: 789
Number of physical leaf overflow pages: 518
Tree statistics
Number of keys/value pairs: 1667
Number of levels in B+tree: 3
Page size utilization
Bytes allocated for physical branch pages: 40960
Bytes actually used for branch data: 26494 (64%)
Bytes allocated for physical leaf pages: 5353472
Bytes actually used for leaf data: 3411680 (63%)
Bucket statistics
Total number of buckets: 10
Total number on inlined buckets: 9 (90%)
Bytes used for inlined buckets: 536 (0%)
- 534MB snapshot (5 hours later):
Aggregate statistics for 10 buckets
Page count statistics
Number of logical branch pages: 65
Number of physical branch overflow pages: 0
Number of logical leaf pages: 5559
Number of physical leaf overflow pages: 107743
Tree statistics
Number of keys/value pairs: 13073
Number of levels in B+tree: 3
Page size utilization
Bytes allocated for physical branch pages: 266240
Bytes actually used for branch data: 186912 (70%)
Bytes allocated for physical leaf pages: 464084992
Bytes actually used for leaf data: 451590110 (97%)
Bucket statistics
Total number of buckets: 10
Total number on inlined buckets: 9 (90%)
Bytes used for inlined buckets: 536 (0%)
- 1.56GB snapshot (another ~36 hours later):
Aggregate statistics for 10 buckets
Page count statistics
Number of logical branch pages: 70
Number of physical branch overflow pages: 0
Number of logical leaf pages: 4525
Number of physical leaf overflow pages: 115179
Tree statistics
Number of keys/value pairs: 10978
Number of levels in B+tree: 3
Page size utilization
Bytes allocated for physical branch pages: 286720
Bytes actually used for branch data: 152723 (53%)
Bytes allocated for physical leaf pages: 490307584
Bytes actually used for leaf data: 478196884 (97%)
Bucket statistics
Total number of buckets: 10
Total number on inlined buckets: 9 (90%)
Bytes used for inlined buckets: 536 (0%)
- 3.86GB snapshot (another ~18 hours later):
Aggregate statistics for 10 buckets
Page count statistics
Number of logical branch pages: 90
Number of physical branch overflow pages: 0
Number of logical leaf pages: 6219
Number of physical leaf overflow pages: 6791
Tree statistics
Number of keys/value pairs: 15478
Number of levels in B+tree: 3
Page size utilization
Bytes allocated for physical branch pages: 368640
Bytes actually used for branch data: 209621 (56%)
Bytes allocated for physical leaf pages: 53288960
Bytes actually used for leaf data: 36704465 (68%)
Bucket statistics
Total number of buckets: 10
Total number on inlined buckets: 9 (90%)
Bytes used for inlined buckets: 536 (0%)
- 4.6GB snapshot (1 hour later, right before exceeding the space quota):
Aggregate statistics for 10 buckets
Page count statistics
Number of logical branch pages: 89
Number of physical branch overflow pages: 0
Number of logical leaf pages: 6074
Number of physical leaf overflow pages: 6713
Tree statistics
Number of keys/value pairs: 15173
Number of levels in B+tree: 3
Page size utilization
Bytes allocated for physical branch pages: 364544
Bytes actually used for branch data: 204788 (56%)
Bytes allocated for physical leaf pages: 52375552
Bytes actually used for leaf data: 36092789 (68%)
Bucket statistics
Total number of buckets: 10
Total number on inlined buckets: 9 (90%)
Bytes used for inlined buckets: 564 (0%)
What is extremely interesting to me is that both:
- Number of physical leaf overflow pages
- Bytes allocated for physical leaf pages
dropped by an order of magnitude between the 1.56GB and 3.86GB snapshots, but the total size of the database didn't drop.
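To put a number on that gap, one simple check (again just a sketch, with a placeholder path to a snapshot copy) is to compare the file size on disk with bolt's free-page statistics, which count pages that sit inside the file but are not allocated to any bucket:

```go
package main

import (
	"fmt"
	"log"
	"os"

	"github.com/boltdb/bolt"
)

func main() {
	path := "/tmp/snapshot.db" // placeholder path to a snapshot copy

	fi, err := os.Stat(path)
	if err != nil {
		log.Fatal(err)
	}

	db, err := bolt.Open(path, 0600, &bolt.Options{ReadOnly: true})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Free and pending pages live inside the file but are not allocated
	// to any bucket, so they account for file size that bucket stats miss.
	st := db.Stats()
	fmt.Printf("file size on disk: %d bytes\n", fi.Size())
	fmt.Printf("free pages: %d, pending pages: %d, bytes in free pages: %d\n",
		st.FreePageN, st.PendingPageN, st.FreeAlloc)
}
```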
Unfortunately, I can't share any of those snapshots for privacy reasons, but maybe you can see something we could investigate (or suggest commands whose output we could share) that would help with debugging?