
Etcd size sometimes starts growing and grows until "mvcc: database space exceeded" #8009

Closed

Description

@wojtek-t

We have already observed a few cases where, suddenly (after days of running a GKE cluster), the size of the database starts growing.

As an example, we have a cluster that was running without any issues for ~2 weeks (the database size was ~16MB), and then its database started growing. The growth wasn't immediate - it took ~2 days to reach the 4GB limit, and it happened in steps.
For reference, we have backups (snapshots, taken via etcdctl snapshot) that reflect the growth rate (each name contains the time when it was taken):

  ... // all snapshots are roughly 16MB
  2017-05-24T04:57:24-07:00_snapshot.db 16,27 MB
  2017-05-24T05:57:26-07:00_snapshot.db 29,06 MB
  2017-05-24T06:57:30-07:00_snapshot.db 108,98 MB
  2017-05-24T07:57:36-07:00_snapshot.db 177,57 MB
  2017-05-24T08:57:51-07:00_snapshot.db 308,4 MB        
  2017-05-24T09:58:32-07:00_snapshot.db 534,54 MB
  2017-05-24T11:00:16-07:00_snapshot.db 655,73 MB
  2017-05-24T12:00:55-07:00_snapshot.db 764,22 MB
  ... // all snapshots of the same size
  2017-05-25T15:15:10-07:00_snapshot.db 764,22 MB
  2017-05-25T16:16:25-07:00_snapshot.db 818,14 MB
  2017-05-25T17:26:35-07:00_snapshot.db 963,93 MB
  ... // all snapshots of the same size
  2017-05-25T22:25:08-07:00_snapshot.db 963,93 MB
  2017-05-25T23:27:03-07:00_snapshot.db 1,56 GB
  2017-05-26T00:30:13-07:00_snapshot.db 1,56 GB
  2017-05-26T01:05:24-07:00_snapshot.db 1,56 GB
  2017-05-26T02:24:21-07:00_snapshot.db 2,18 GB
  ... // all snapshots of the same size
  2017-05-26T08:43:07-07:00_snapshot.db 2,18 GB
  2017-05-26T09:46:47-07:00_snapshot.db 2,19 GB
  ... // all snapshots of the same size
  2017-05-26T16:11:31-07:00_snapshot.db 2,19 GB
  2017-05-26T17:16:47-07:00_snapshot.db 2,65 GB
  2017-05-26T18:22:37-07:00_snapshot.db 3,12 GB
  2017-05-26T19:29:07-07:00_snapshot.db 3,86 GB
  2017-05-26T20:33:24-07:00_snapshot.db 4,6 GB
  <boom>
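
The snapshots above were taken with etcdctl snapshot; for reference, a roughly equivalent sketch using the clientv3 Maintenance API (the endpoint and file naming here are placeholders, not our actual setup):

  // Sketch of taking a timestamped backend snapshot with clientv3, roughly what
  // "etcdctl snapshot save" does.
  package main

  import (
    "context"
    "fmt"
    "io"
    "os"
    "time"

    "github.com/coreos/etcd/clientv3"
  )

  func main() {
    cli, err := clientv3.New(clientv3.Config{
      Endpoints:   []string{"127.0.0.1:2379"}, // placeholder endpoint
      DialTimeout: 5 * time.Second,
    })
    if err != nil {
      panic(err)
    }
    defer cli.Close()

    // Stream a consistent copy of the backend database.
    rc, err := cli.Snapshot(context.Background())
    if err != nil {
      panic(err)
    }
    defer rc.Close()

    // e.g. 2017-05-24T04:57:24-07:00_snapshot.db
    name := time.Now().Format(time.RFC3339) + "_snapshot.db"
    f, err := os.Create(name)
    if err != nil {
      panic(err)
    }
    defer f.Close()

    n, err := io.Copy(f, rc)
    if err != nil {
      panic(err)
    }
    fmt.Printf("wrote %s (%d bytes)\n", name, n)
  }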

We've checked that compaction was running regularly every 5m for the whole time, so it doesn't seem to be the same as #7944.
I'm attaching the relevant lines from the etcd logs in etcd-compaction.txt.

[Note: times in those logs are UTC, while times in the snapshot names are PST, so there is a 7-hour difference.]

To summarize, each compaction covered at most a few thousand transactions (so it's not that we did a lot during any 5m period), though there were some longer compactions, up to ~7s.
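
For clarity, compacting "every 5m" means roughly the following in clientv3 terms; this is a sketch, not the exact code we run, and it assumes a *clientv3.Client like the one in the snapshot sketch above (the probe key is arbitrary):

  // Sketch of a periodic compaction loop.
  package compaction

  import (
    "context"
    "fmt"
    "time"

    "github.com/coreos/etcd/clientv3"
  )

  // compactEvery compacts all revisions older than the current one on every tick.
  func compactEvery(cli *clientv3.Client, interval time.Duration) {
    for range time.Tick(interval) {
      ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
      // Any read returns the current store revision in its response header.
      resp, err := cli.Get(ctx, "compaction-probe") // hypothetical key; only the header is used
      if err == nil {
        // Drop all history older than the current revision.
        _, err = cli.Compact(ctx, resp.Header.Revision)
      }
      cancel()
      if err != nil {
        fmt.Printf("compaction failed: %v\n", err)
      }
    }
  }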

I started digging into individual snapshots and found something strange (I was using bolt to inspect them):

  1. 16MB snapshot:
Aggregate statistics for 10 buckets

Page count statistics
        Number of logical branch pages: 10
        Number of physical branch overflow pages: 0
        Number of logical leaf pages: 789
        Number of physical leaf overflow pages: 518
Tree statistics
        Number of keys/value pairs: 1667
        Number of levels in B+tree: 3
Page size utilization
        Bytes allocated for physical branch pages: 40960
        Bytes actually used for branch data: 26494 (64%)
        Bytes allocated for physical leaf pages: 5353472
        Bytes actually used for leaf data: 3411680 (63%)
Bucket statistics
        Total number of buckets: 10
        Total number on inlined buckets: 9 (90%)
        Bytes used for inlined buckets: 536 (0%)
  2. 534MB snapshot (5 hours later):
Aggregate statistics for 10 buckets

Page count statistics
        Number of logical branch pages: 65
        Number of physical branch overflow pages: 0
        Number of logical leaf pages: 5559
        Number of physical leaf overflow pages: 107743
Tree statistics
        Number of keys/value pairs: 13073
        Number of levels in B+tree: 3
Page size utilization
        Bytes allocated for physical branch pages: 266240
        Bytes actually used for branch data: 186912 (70%)
        Bytes allocated for physical leaf pages: 464084992
        Bytes actually used for leaf data: 451590110 (97%)
Bucket statistics
        Total number of buckets: 10
        Total number on inlined buckets: 9 (90%)
        Bytes used for inlined buckets: 536 (0%)
  3. 1.56GB snapshot (another ~36 hours later):
Aggregate statistics for 10 buckets

Page count statistics
        Number of logical branch pages: 70
        Number of physical branch overflow pages: 0
        Number of logical leaf pages: 4525
        Number of physical leaf overflow pages: 115179
Tree statistics
        Number of keys/value pairs: 10978
        Number of levels in B+tree: 3
Page size utilization
        Bytes allocated for physical branch pages: 286720
        Bytes actually used for branch data: 152723 (53%)
        Bytes allocated for physical leaf pages: 490307584
        Bytes actually used for leaf data: 478196884 (97%)
Bucket statistics
        Total number of buckets: 10
        Total number on inlined buckets: 9 (90%)
        Bytes used for inlined buckets: 536 (0%)
  4. 3.86GB snapshot (another ~18 hours later):
Aggregate statistics for 10 buckets

Page count statistics
        Number of logical branch pages: 90
        Number of physical branch overflow pages: 0
        Number of logical leaf pages: 6219
        Number of physical leaf overflow pages: 6791
Tree statistics
        Number of keys/value pairs: 15478
        Number of levels in B+tree: 3
Page size utilization
        Bytes allocated for physical branch pages: 368640
        Bytes actually used for branch data: 209621 (56%)
        Bytes allocated for physical leaf pages: 53288960
        Bytes actually used for leaf data: 36704465 (68%)
Bucket statistics
        Total number of buckets: 10
        Total number on inlined buckets: 9 (90%)
        Bytes used for inlined buckets: 536 (0%)
  5. 4.6GB snapshot (1 hour later, right before exceeding space):
Aggregate statistics for 10 buckets

Page count statistics
        Number of logical branch pages: 89
        Number of physical branch overflow pages: 0
        Number of logical leaf pages: 6074
        Number of physical leaf overflow pages: 6713
Tree statistics
        Number of keys/value pairs: 15173
        Number of levels in B+tree: 3
Page size utilization
        Bytes allocated for physical branch pages: 364544
        Bytes actually used for branch data: 204788 (56%)
        Bytes allocated for physical leaf pages: 52375552
        Bytes actually used for leaf data: 36092789 (68%)
Bucket statistics
        Total number of buckets: 10
        Total number on inlined buckets: 9 (90%)
        Bytes used for inlined buckets: 564 (0%)
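
The listings above are the aggregate statistics printed by bolt stats; for anyone trying to reproduce the inspection, a rough equivalent using the bolt Go API (the snapshot path is passed as an argument) looks something like this:

  // Sketch: aggregate per-bucket statistics of an etcd snapshot via the bolt Go
  // API, similar to the "bolt stats" output quoted above.
  package main

  import (
    "fmt"
    "os"

    "github.com/boltdb/bolt"
  )

  func main() {
    // Open the snapshot read-only so nothing in it gets modified.
    db, err := bolt.Open(os.Args[1], 0600, &bolt.Options{ReadOnly: true})
    if err != nil {
      panic(err)
    }
    defer db.Close()

    if err := db.View(func(tx *bolt.Tx) error {
      // Sum the stats of every top-level bucket (key, lease, meta, ...).
      var agg bolt.BucketStats
      if err := tx.ForEach(func(name []byte, b *bolt.Bucket) error {
        agg.Add(b.Stats())
        return nil
      }); err != nil {
        return err
      }
      fmt.Printf("leaf overflow pages:  %d\n", agg.LeafOverflowN)
      fmt.Printf("leaf bytes allocated: %d (in use: %d)\n", agg.LeafAlloc, agg.LeafInuse)
      fmt.Printf("db size seen by tx:   %d bytes\n", tx.Size())
      return nil
    }); err != nil {
      panic(err)
    }
  }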

What is extremely interesting to me is that both:

  • Number of physical leaf overflow pages
  • Bytes allocated for physical leaf pages

  dropped by an order of magnitude in the 3.86GB snapshot, yet the total size of the database didn't drop.
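
To make that gap concrete with the numbers above: in the 534MB snapshot the ~464MB allocated for leaf pages roughly accounts for the file size, whereas in the 3.86GB snapshot only ~53MB is allocated for leaf pages, so on the order of 3.8GB of that file is no longer referenced by the B+tree at all - presumably pages that bolt has freed and keeps on its freelist, since bolt never shrinks the file on its own.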

Unfortunately, I can't provide any of those snapshots due to privacy reasons, but maybe you can suggest things for us to investigate (or commands whose output we could share) that would help with debugging?

@xiang90 @hongchaodeng @mml @lavalamp
