3.0 performance degraded due to locking #20286

@richsalz

Using some internal QUIC tests, we see this timing in 1.1:

$ USE_GQUIC_VERSIONS=1 perf record -F 999 -g ./quic_lib_tests  --gtest_repeat=100  --gtest_filter=*ZeroRttDisabled*
$ perf report

+   31.73%    31.55%  quic_lib_tests  quic_lib_tests       [.] bn_sqr8x_internal
+    9.28%     9.28%  quic_lib_tests  quic_lib_tests       [.] mul4x_internal
+    4.91%     4.91%  quic_lib_tests  quic_lib_tests       [.] sha256_block_data_order_avx

In 3.0 we see this:


$ USE_GQUIC_VERSIONS=1 perf record -F 999 -g ./quic_lib_tests  --gtest_repeat=100  --gtest_filter=*ZeroRttDisabled*
$ perf report

+   11.02%    10.99%  quic_lib_tests  quic_lib_tests       [.] bn_sqr8x_internal
+    8.38%     8.08%  quic_lib_tests  libpthread-2.31.so   [.] __pthread_rwlock_rdlock
+    7.65%     7.51%  quic_lib_tests  libpthread-2.31.so   [.] __pthread_rwlock_unlock
+    4.98%     4.78%  quic_lib_tests  quic_lib_tests       [.] getrn
+    4.14%     4.11%  quic_lib_tests  quic_lib_tests       [.] mul4x_internal
+    3.37%     2.57%  quic_lib_tests  quic_lib_tests       [.] ossl_tolower
     3.30%     3.30%  quic_lib_tests  quic_lib_tests       [.] ossl_lh_strcasehash
+    2.72%     2.13%  quic_lib_tests  quic_lib_tests       [.] OPENSSL_strcasecmp
+    2.29%     2.05%  quic_lib_tests  quic_lib_tests       [.] ossl_lib_ctx_get_data
+    1.93%     1.93%  quic_lib_tests  quic_lib_tests       [.] sha256_block_data_order_avx

This seems to be part of OSSL_DECODER_CTX_new_for_pkey. About 16% of the time is spent in locking, in a single-threaded binary, and about 10% is in a string hash table lookup.
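For anyone checking the arithmetic, here is a small sketch that adds up the relevant rows of the 3.0 profile. The percentages are copied from the perf output above. The grouping of symbols into "locking" and "string lookup" is my own assumption for illustration, since the report does not say exactly which rows were counted. Also note that the first perf column (children) includes callees, so summing it across related symbols can double-count.

```python
# Percentages copied from the 3.0 `perf report` output in this issue.
# Each value is (children %, self %).
PROFILE_3_0 = {
    "__pthread_rwlock_rdlock": (8.38, 8.08),
    "__pthread_rwlock_unlock": (7.65, 7.51),
    "getrn": (4.98, 4.78),
    "ossl_tolower": (3.37, 2.57),
    "ossl_lh_strcasehash": (3.30, 3.30),
    "OPENSSL_strcasecmp": (2.72, 2.13),
}

# Assumed groupings (not stated in the issue).
LOCKING = ("__pthread_rwlock_rdlock", "__pthread_rwlock_unlock")
STRING_LOOKUP = (
    "getrn",
    "ossl_tolower",
    "ossl_lh_strcasehash",
    "OPENSSL_strcasecmp",
)

CHILDREN, SELF = 0, 1


def total(symbols, column):
    """Sum one perf column over a group of symbols, rounded to 2 places."""
    return round(sum(PROFILE_3_0[name][column] for name in symbols), 2)


lock_children = total(LOCKING, CHILDREN)
lock_self = total(LOCKING, SELF)
string_self = total(STRING_LOOKUP, SELF)

print(f"locking, children column: {lock_children}%")
print(f"locking, self column:     {lock_self}%")
print(f"string lookup, self:      {string_self}%")
```

The children-column total for the two rwlock symbols is where the roughly 16% locking figure comes from. The string lookup total depends on which symbols are included, so treat it as a ballpark only.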

If anyone on the project is going to look at this, I will try to get a small reproducer. But the run time of our QUIC tests is doubling.
