Severe efficiency degradation of credential loading in comparison to 1.1.1 #18814

Open
DDvO opened this issue Jul 17, 2022 · 16 comments
Labels
triaged: performance The issue/pr reports/fixes a performance concern

Comments

@DDvO
Contributor

DDvO commented Jul 17, 2022

This is a follow-up to #14837, which was closed after a partial solution addressing its heavy use of heap memory allocation.

I've just been using the tool provided by @richsalz in #16540, compiled like this:
cd test/
gcc -g -I../include timing.c ../libcrypto.a -ldl -lpthread -o timing

Update: the PR for the tool is now #18821; the tool is called timing_load_creds.

When run on current 3.x master, I typically get figures like these:

./timing_load_creds -wc certs/timing-cert.pem
user      0 sec 57846 microsec
sys       0 sec 69877 microsec

and

./timing_load_creds -wp certs/timing-key.pem 
user      0 sec 95863 microsec
sys       0 sec 115708 microsec

Whereas with the latest 1.1.1 branch, I typically get, for instance:

./timing_load_creds -wc certs/timing-cert.pem
user      0 sec 17389 microsec
sys       0 sec 17389 microsec

and

./timing_load_creds -wp certs/timing-key.pem
user      0 sec 14719 microsec
sys       0 sec 10223 microsec

BTW, on my Debian system the reported times vary a lot between consecutive runs; I've copied some 'average' outcomes.

So with 1.1.1, cert and key loading appears reasonably efficient,
whereas since 3.0, loading a simple cert (including a public key) has degraded by a factor of around 3.5,
and loading a private key by a factor of more than 6.

@t8m wrote in #14837 (comment):

The obvious reason being the change from legacy to provider based parsing of the public keys in certificates.

Apparently the inefficiencies come from generalizing the loading mechanism via the provider mechanism and perhaps the STORE API,
both of which look quite involved when reading the code and stepping through it in a debugger.

Is this considered bearable?
Hopefully some further bottleneck(s) can be identified and removed.

@DDvO DDvO added the "issue: bug report" and "issue: question" labels on Jul 17, 2022
@beldmit
Member

beldmit commented Jul 17, 2022

See also #16791 and #17627

@paulidale
Contributor

It's mostly due to the generalisation of the loading -- checking many formats takes time.
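(For illustration, not something stated in the thread: the generalized loading path is roughly what applications reach through the OSSL_STORE API, which probes the input against the available decoders until one matches. A minimal sketch of loading a certificate that way, with an illustrative helper name and simplified error handling, could look like this.)

#include <openssl/store.h>
#include <openssl/x509.h>

/* Illustrative sketch: load the first certificate from a file via OSSL_STORE.
 * The store loader probes the input against the available decoders, which is
 * where the format-detection cost mentioned above comes from. */
static X509 *load_cert_via_store(const char *path)
{
    X509 *cert = NULL;
    OSSL_STORE_CTX *store = OSSL_STORE_open(path, NULL, NULL, NULL, NULL);

    if (store == NULL)
        return NULL;
    while (cert == NULL && !OSSL_STORE_eof(store)) {
        OSSL_STORE_INFO *info = OSSL_STORE_load(store);

        if (info == NULL)
            break;
        if (OSSL_STORE_INFO_get_type(info) == OSSL_STORE_INFO_CERT)
            cert = OSSL_STORE_INFO_get1_CERT(info); /* caller owns the result */
        OSSL_STORE_INFO_free(info);
    }
    OSSL_STORE_close(store);
    return cert;
}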

@richsalz
Contributor

Who needed a general mechanism, anyway? And who decided the generalization was a good thing to put into the common path? Perhaps an internal API that allows functions to specify a search path would help reduce the pain of this generalization.

@DDvO
Contributor Author

DDvO commented Jul 18, 2022

Before 3.0 it was already possible to load keys of all known types in reasonable time, even without specifying which key type(s) to expect.
Even if the extra flexibility of handling further (provider-defined) key types is the main reason,
with credential loading now taking 3-6 times longer, something must have gone pretty wrong here.

At least when loading cert/key material with any well-known key type, the generalization needed to potentially handle further key types should add only negligible overhead. And even for new key types there should be rather efficient ways of determining which specific (sub-)parser to use.
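(As a sketch of what telling the parser what to expect can look like with the 3.0 APIs: the decoder API lets a caller state the input format and key type up front, so that only matching decoders are tried. The "PEM"/"RSA" hints and the helper name below are illustrative assumptions, not taken from this thread.)

#include <openssl/bio.h>
#include <openssl/decoder.h>
#include <openssl/evp.h>

/* Illustrative sketch: decode a private key while declaring the input
 * format ("PEM") and key type ("RSA") up front, so the decoder chain does
 * not have to probe every registered format and key type. */
static EVP_PKEY *load_rsa_key_pem(const char *path)
{
    EVP_PKEY *pkey = NULL;
    BIO *in = BIO_new_file(path, "r");
    OSSL_DECODER_CTX *dctx;

    if (in == NULL)
        return NULL;
    dctx = OSSL_DECODER_CTX_new_for_pkey(&pkey, "PEM", NULL, "RSA",
                                         EVP_PKEY_KEYPAIR, NULL, NULL);
    if (dctx == NULL || OSSL_DECODER_from_bio(dctx, in) != 1) {
        EVP_PKEY_free(pkey); /* no-op if nothing was decoded */
        pkey = NULL;
    }
    OSSL_DECODER_CTX_free(dctx);
    BIO_free(in);
    return pkey;
}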

@DDvO DDvO changed the title from "Credentail loading: Severe efficiency degredation since 3.0" to "Credentail loading: Severe efficiency degradation since 3.0" on Jul 18, 2022
@DDvO DDvO changed the title from "Credentail loading: Severe efficiency degradation since 3.0" to "Severe efficiency degradation of credential loading since 3.0" on Jul 18, 2022
@t8m t8m added the "triaged: feature" and "triaged: question" labels and removed the "issue: bug report" and "issue: question" labels on Jul 27, 2022
@t8m t8m changed the title from "Severe efficiency degradation of credential loading since 3.0" to "Severe efficiency degradation of credential loading in comparison to 1.1.1" on Jul 27, 2022
@mohd-akram
Contributor

I came across this issue as well while trying to figure out why curl was spending so much time in the handshake. It seems PR #18341 might improve this, but it doesn't appear to have been released yet. Will it be released as part of the 3.0 series?

@t8m
Member

t8m commented Oct 13, 2022

Will it be released as part of the 3.0 series?

Not likely. However, we are planning a 3.1 release relatively soon; it will undergo FIPS 140-3 validation and might include performance fixes, including this one.

@l1n

l1n commented Jun 16, 2023

Is this going to be resolved by the time that 1.1.1 support is dropped?

@l1n

l1n commented Sep 6, 2023

Checking on this again, since we're getting awfully close to end of life per the email this morning.

@ThomasChr

ThomasChr commented Oct 5, 2023

I'm using Python, and this problem has severe performance consequences for multi-threaded Python code that makes heavy use of SSL requests. We're talking about double the wall-clock time and up to 15 times the CPU usage. See details here: python/cpython#95031 (comment)

@paulidale paulidale added the "triaged: performance" label on Nov 7, 2023
@cyberw

cyberw commented Feb 11, 2024

Is there any workaround for this issue?

On Windows, setup takes about 400-500 ms longer than in previous versions (Python 3.12 vs 3.10), and as we are doing lots of these setups it is a major issue.

@t8m
Member

t8m commented Feb 12, 2024

Which OpenSSL version are you testing with? It should be much improved with 3.2.

@cyberw

cyberw commented Feb 12, 2024

I'm on an "official" Python 3.12.2 build (installed using pyenv) that comes with OpenSSL 3.0.13. I could probably recompile it myself, but I can't make all my users do that, so I'm mostly looking for any kind of workaround that doesn't require a new version.

I guess I could just not support 3.12/3.0.x on Windows, but that would be sad :)

@ThomasChr

For Python, as mentioned here python/cpython#95031 (comment), you could:

  1. Do not verify certificates: verify = False
  2. Use Python 3.11
  3. Use a context manager, i.e., cache the verified certificate (the sketch below illustrates this caching idea at the OpenSSL level)
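(At the OpenSSL level, the caching idea in item 3 amounts to parsing the CA bundle once and sharing the resulting X509_STORE between contexts, instead of re-reading the PEM file for every SSL_CTX. A minimal C sketch, with illustrative names and assuming OpenSSL 3.0+ for X509_STORE_load_file():)

#include <openssl/ssl.h>
#include <openssl/x509_vfy.h>

/* Illustrative sketch: parse the CA bundle once into a shared X509_STORE
 * and attach it (reference-counted) to every new SSL_CTX, so the PEM
 * parsing cost is paid once per process rather than once per context. */
static X509_STORE *shared_roots;

static int init_shared_roots(const char *cafile)
{
    shared_roots = X509_STORE_new();
    if (shared_roots == NULL)
        return 0;
    return X509_STORE_load_file(shared_roots, cafile); /* OpenSSL >= 3.0 */
}

static SSL_CTX *new_client_ctx(void)
{
    SSL_CTX *ctx = SSL_CTX_new(TLS_client_method());

    if (ctx == NULL)
        return NULL;
    SSL_CTX_set1_cert_store(ctx, shared_roots); /* bumps the store refcount */
    SSL_CTX_set_verify(ctx, SSL_VERIFY_PEER, NULL);
    return ctx;
}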

@fireattack

fireattack commented Jun 27, 2024

Since I was also hit by this in Python 3.12, I wanted to test the issue with OpenSSL alone, and also to check whether there is any improvement in 3.2.x or 3.3.x.

TL;DR: I didn't see much improvement in the speed of SSL_CTX_load_verify_locations in 3.3.1. It is still extremely slow, especially on Windows. Performance-wise, 3.0.14 ~= 3.3.1, and both are about 15x slower than 1.1.1.


Details:

I use the following test code in C:

#include <openssl/ssl.h>
#include <openssl/err.h>
#include <stdio.h>

int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <CAfile>\n", argv[0]);
        return 1;
    }
    const char *cafile = argv[1];
    SSL_CTX *ctx;
    OPENSSL_init_ssl(0, NULL);
    ctx = SSL_CTX_new(TLS_method());
    if (!ctx) {
        fprintf(stderr, "Error creating SSL context\n");
        ERR_print_errors_fp(stderr);
        return 1;
    }
    if (!SSL_CTX_load_verify_locations(ctx, cafile, NULL)) {
        fprintf(stderr, "Error loading CA file\n");
        ERR_print_errors_fp(stderr);
        SSL_CTX_free(ctx);
        return 1;
    }
    printf("CA file loaded successfully\n");
    SSL_CTX_free(ctx);
    return 0;
}

I compiled it with 1.1.1w, 3.0.14 LTS, and 3.3.1 x64 respectively, all downloaded from here.

set PATH=C:\Users\ikena\mingw64\bin\;%PATH%
gcc -o openssl-1.1.1w\bin\loadca loadca.c "-Iopenssl-1.1.1w\include" "-Lopenssl-1.1.1w\lib" -lssl -lcrypto
gcc -o openssl-3.0.14\bin\loadca loadca.c "-Iopenssl-3.0.14\include" "-Lopenssl-3.0.14\lib" -lssl -lcrypto
gcc -o openssl-3.3.1\bin\loadca loadca.c "-Iopenssl-3.3.1\include" "-Lopenssl-3.3.1\lib" -lssl -lcrypto

I then tested their speed against the 288 KB cacert.pem from python/certifi (downloadable here: https://github.com/certifi/python-certifi/blob/master/certifi/cacert.pem), running each 50 times and calculating the average:

import subprocess
import time

def measure_command(cmd, iterations):
    start_time = time.time()
    start_time_cpu = time.process_time()
    for _ in range(iterations):
        subprocess.run(cmd, shell=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    end_time = time.time()
    end_time_cpu = time.process_time()
    ave_time = (end_time - start_time) / iterations
    ave_time_cpu = (end_time_cpu - start_time_cpu) / iterations
    print(f"Command: {cmd} Average time over {iterations} iterations: {ave_time:.6f}s, {ave_time_cpu:.6f}s CPU time")

ITER = 50
measure_command('openssl-1.1.1w\\bin\\loadca.exe cacert.pem', ITER)
measure_command('openssl-3.0.14\\bin\\loadca.exe cacert.pem', ITER)
measure_command('openssl-3.3.1\\bin\\loadca.exe cacert.pem ', ITER)

The result:

Command: openssl-1.1.1w\bin\loadca.exe cacert.pem Average time over 50 iterations: 0.036579s, 0.001563s CPU time
Command: openssl-3.0.14\bin\loadca.exe cacert.pem Average time over 50 iterations: 0.488777s, 0.000313s CPU time
Command: openssl-3.3.1\bin\loadca.exe cacert.pem  Average time over 50 iterations: 0.452512s, 0.000625s CPU time

@gsmethells

gsmethells commented Oct 23, 2024

This is still a major problem years later, especially now that 1.1.1 is all but dead. The performance degradation is significant. When will resources be applied to find a solution to this issue?

This comment still stands and is well liked:

#18814 (comment)

@t8m t8m removed the "triaged: feature" and "triaged: question" labels on Oct 24, 2024
@t8m
Member

t8m commented Nov 14, 2024

Please also note that on Windows (i.e., in #18814 (comment)) the biggest part of the performance degradation comes from the use of unbuffered I/O when reading PEM files, which this PR resolves: #25716
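(As an illustrative workaround, not one proposed in the thread: until that fix is available, an application can do the file reading itself with buffered stdio and hand OpenSSL a memory BIO, so the unbuffered file reads never happen. A minimal sketch with an illustrative helper name:)

#include <stdio.h>
#include <stdlib.h>
#include <openssl/pem.h>
#include <openssl/ssl.h>

/* Illustrative sketch: read the CA bundle with buffered stdio, then parse
 * the certificates from a memory BIO and add them to the SSL_CTX trust
 * store, bypassing OpenSSL's (currently unbuffered) file reads. */
static int load_ca_from_memory(SSL_CTX *ctx, const char *cafile)
{
    FILE *fp = fopen(cafile, "rb");
    X509_STORE *store = SSL_CTX_get_cert_store(ctx);
    char *buf = NULL;
    BIO *mem = NULL;
    X509 *cert;
    long len;
    int count = 0;

    if (fp == NULL)
        return 0;
    fseek(fp, 0, SEEK_END);
    len = ftell(fp);
    rewind(fp);
    if (len <= 0 || (buf = malloc(len)) == NULL
            || fread(buf, 1, len, fp) != (size_t)len)
        goto end;
    if ((mem = BIO_new_mem_buf(buf, (int)len)) == NULL)
        goto end;
    while ((cert = PEM_read_bio_X509(mem, NULL, NULL, NULL)) != NULL) {
        if (X509_STORE_add_cert(store, cert))
            count++;                     /* number of CA certs added */
        X509_free(cert);
    }
 end:
    BIO_free(mem);
    free(buf);
    fclose(fp);
    return count;
}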
