Severe efficiency degradation of credential loading in comparison to 1.1.1 #18814
Comments
It's mostly due to the generalisation of the loading: checking many formats takes time.
Who needed a general mechanism, anyway? And who decided the generalization was a good thing to put into the common path? Perhaps an internal API that allows functions to specify a search path would help reduce the pain of this generalization.
Before 3.0 it was already possible to load keys of all known types in reasonable time, even without specifying which key type(s) to expect. At least when loading cert/key material of any well-known key type, the generalization to potentially handle further key types should add only negligible overhead. And even for new key types there should be fairly efficient ways of determining which specific (sub-)parser to use.
I came across this issue as well while trying to figure out why curl was taking so much time in the handshake. It seems PR #18341 might improve this, but it doesn't seem to be released yet. Will it be released as part of the 3.0 series?
Not likely. However, we are planning a 3.1 release relatively soon, which will undergo FIPS 140-3 validation and might include performance fixes, including this one.
Is this going to be resolved by the time that 1.1.1 support is dropped?
Checking on this again, since we're getting awfully close to end of life per the email this morning.
I'm using Python, and this problem has severe performance consequences for multi-threaded Python code that makes heavy use of SSL requests. We're talking about double the wall time and up to 15 times the CPU time. See details here: python/cpython#95031 (comment)
Is there any workaround for this issue? On Windows, setup takes about 400-500 ms longer than in previous versions (Python 3.12 vs 3.10), and as we are doing lots of these it is a major issue.
What OpenSSL version are you testing with? It should be much improved in version 3.2.
I'm on an "official" Python 3.12.2 build (installed using pyenv) that comes with OpenSSL 3.0.13. I could probably recompile it myself, but I can't make all my users do that, so I'm mostly looking for some kind of workaround that doesn't require a new version. I guess I could just not support 3.12/3.0.x on Windows, but that would be sad :)
For Python, as mentioned here python/cpython#95031 (comment), you could do:
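One workaround pattern discussed in the linked CPython thread is to stop repeating the expensive CA-bundle load: build one `ssl.SSLContext` per process and reuse it for every connection. A minimal sketch (the helper name is mine, and whether this is applicable depends on your HTTP library accepting a caller-supplied context):

```python
import ssl
from functools import lru_cache

@lru_cache(maxsize=None)
def shared_ssl_context() -> ssl.SSLContext:
    # The CA bundle is parsed once, on the first call; every later
    # call returns the same context object, so OpenSSL 3.x's slow
    # credential loading is paid only once per process.
    return ssl.create_default_context()
```

Libraries that accept a pre-built context (e.g. `urllib.request.urlopen(..., context=...)`) can then be handed the cached one instead of constructing a fresh context per request.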
So while I was also hit by this in Python 3.12, I would like to test the issue within OpenSSL alone, and also to check whether there is any improvement in 3.2.x or 3.3.x.

TL;DR: I didn't see much improvement in loading speed.

Details: I use the following test code in C:

```c
#include <openssl/ssl.h>
#include <openssl/err.h>
#include <stdio.h>

int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <CAfile>\n", argv[0]);
        return 1;
    }
    const char *cafile = argv[1];
    SSL_CTX *ctx;

    OPENSSL_init_ssl(0, NULL);
    ctx = SSL_CTX_new(TLS_method());
    if (!ctx) {
        fprintf(stderr, "Error creating SSL context\n");
        ERR_print_errors_fp(stderr);
        return 1;
    }
    if (!SSL_CTX_load_verify_locations(ctx, cafile, NULL)) {
        fprintf(stderr, "Error loading CA file\n");
        ERR_print_errors_fp(stderr);
        SSL_CTX_free(ctx);
        return 1;
    }
    printf("CA file loaded successfully\n");
    SSL_CTX_free(ctx);
    return 0;
}
```

I compiled it with 1.1.1w, 3.0.14 LTS, and 3.3.1 x64 respectively, all downloaded from here:

```bat
set PATH=C:\Users\ikena\mingw64\bin\;%PATH%
gcc -o openssl-1.1.1w\bin\loadca loadca.c "-Iopenssl-1.1.1w\include" "-Lopenssl-1.1.1w\lib" -lssl -lcrypto
gcc -o openssl-3.0.14\bin\loadca loadca.c "-Iopenssl-3.0.14\include" "-Lopenssl-3.0.14\lib" -lssl -lcrypto
gcc -o openssl-3.3.1\bin\loadca loadca.c "-Iopenssl-3.3.1\include" "-Lopenssl-3.3.1\lib" -lssl -lcrypto
```

I then test their speed against the 288 KB cacert.pem from python/certifi (download here: https://github.com/certifi/python-certifi/blob/master/certifi/cacert.pem) 50 times and calculate the average:

```python
import subprocess
import time

def measure_command(cmd, iterations):
    start_time = time.time()
    start_time_cpu = time.process_time()
    for _ in range(iterations):
        subprocess.run(cmd, shell=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    end_time = time.time()
    end_time_cpu = time.process_time()
    ave_time = (end_time - start_time) / iterations
    ave_time_cpu = (end_time_cpu - start_time_cpu) / iterations
    print(f"Command: {cmd} Average time over {iterations} iterations: {ave_time:.6f}s, {ave_time_cpu:.6f}s CPU time")

ITER = 50
measure_command('openssl-1.1.1w\\bin\\loadca.exe cacert.pem', ITER)
measure_command('openssl-3.0.14\\bin\\loadca.exe cacert.pem', ITER)
measure_command('openssl-3.3.1\\bin\\loadca.exe cacert.pem', ITER)
```

The result:
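A side note on the measurement harness above (a sketch under the assumption of a POSIX system; the function name is mine): `time.process_time()` only counts CPU consumed by the Python process itself, not by the child processes it spawns, so the reported CPU figure understates what the `loadca` children burn. On Unix, `os.times()` additionally exposes accumulated CPU time of waited-for children:

```python
import os
import subprocess

def measure_children_cpu(cmd: str, iterations: int) -> float:
    """Average CPU seconds consumed per invocation by child processes.

    Relies on os.times() fields children_user/children_system, which
    accumulate CPU of waited-for children (POSIX only).
    """
    before = os.times()
    for _ in range(iterations):
        subprocess.run(cmd, shell=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    after = os.times()
    total = ((after.children_user + after.children_system)
             - (before.children_user + before.children_system))
    return total / iterations
```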
This is still a major problem years later, especially now that 1.1.1 is all but dead. The performance degradation is significant. When will resources be applied to finding a solution to this issue? This comment still stands to reason and is well liked:
Please also note that on Windows (i.e., comment #18814 (comment)) the biggest part of the performance degradation comes from the use of unbuffered IO when reading PEM files, which this PR resolves: #25716
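To illustrate the buffering point in generic terms (a Python sketch of the access pattern, not OpenSSL code): reading a file through many small unbuffered reads issues one system call per chunk, whereas a buffered reader amortizes the syscalls into a few large reads. Both return identical bytes, but over a ~300 KB CA bundle the syscall counts differ by orders of magnitude, which is costly on Windows in particular.

```python
import tempfile

def read_unbuffered(path: str, chunk: int = 16) -> bytes:
    # buffering=0 disables Python's buffer layer: every .read(16)
    # becomes its own OS read call, mimicking the access pattern
    # that made PEM parsing slow before buffering was introduced.
    data = bytearray()
    with open(path, "rb", buffering=0) as f:
        while piece := f.read(chunk):
            data += piece
    return bytes(data)

def read_buffered(path: str) -> bytes:
    # Default buffered IO: a handful of large OS reads.
    with open(path, "rb") as f:
        return f.read()

# Quick self-check on a throwaway file: both readers must agree.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"-----BEGIN CERTIFICATE-----\n" * 100)
    demo_path = tmp.name
assert read_unbuffered(demo_path) == read_buffered(demo_path)
```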
This is a follow-up on #14837, which was closed after a partial solution addressing the heavy use of heap memory allocation.
I've just been using the tool provided by @richsalz in #16540, compiled like this:

```shell
cd test/
gcc -g -I../include timing.c ../libcrypto.a -ldl -lpthread -o timing
```

Update: the PR for the tool is now in #18821; it is called `timing_load_creds`.
When run on current 3.x master, I typically get figures like these:
and
Whereas with the latest 1.1.1 branch, I typically get for instance:
and
BTW, on my Debian system the reported times vary a lot between consecutive runs; I've copied some 'average' outcomes.
So with 1.1.1, cert and key loading appear reasonably efficient,
while since 3.0 loading a simple cert (including a public key) has degraded by a factor of around 3.5,
and loading a private key has degraded by a factor of more than 6.
@t8m wrote in #14837 (comment):
Apparently the inefficiencies are due to generalizing the loading mechanism via the provider mechanism and maybe the STORE API,
which appear quite involved when looking at the code and stepping through it while debugging.
Is this considered bearable?
Hopefully some further bottleneck(s) can be identified and removed.