3.0 Performance problems #17627
Performance measurements #16791
Performance regression for CMAC #14787
Adding an OTC decision label here. Question to the OTC: which of these do we consider important enough to address sooner rather than later?
PEM/DER decoding of PKCS8 RSA private keys is 80 times slower in 3.0 #15199
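For a sense of what that linked issue is measuring, here is a minimal sketch (an illustration written for this summary, not code taken from #15199; the function name and buffer arguments are assumptions) of the operation that got slower: decoding a PKCS#8 private key held in memory. In 3.0 this path runs through the provider/decoder framework rather than the legacy ASN.1 code, which is the usual explanation offered for decode slowdowns reported against 3.0.

```c
#include <openssl/bio.h>
#include <openssl/evp.h>
#include <openssl/pem.h>

/* Illustrative sketch only: decode a PKCS#8 private key from a PEM buffer.
 * "pem" and "pem_len" are placeholders for real key material. */
static EVP_PKEY *decode_pkcs8_pem(const char *pem, int pem_len)
{
    BIO *bio = BIO_new_mem_buf(pem, pem_len);
    EVP_PKEY *pkey = NULL;

    if (bio == NULL)
        return NULL;
    /* In 3.0 this call is routed through the OSSL_DECODER machinery,
     * which is where the reported per-call overhead appears. */
    pkey = PEM_read_bio_PrivateKey(bio, NULL, NULL, NULL);
    BIO_free(bio);
    return pkey;
}
```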
OTC: looking for suggestions as to which, if any, should be in 3.1.
Question for OTC: do any of the releases need to be changed? The question marks remain to be populated.
Another serious performance regression: #17844
Added that to the table.
Dropped it again as it wasn't an openssl problem in the end.
#17950 is encountering very high CPU load using d2i_X509. Likely related to cache conflict accessing ex_data in lib ctx.
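To make that workload pattern concrete, here is a hedged reproducer sketch (the function names, iteration count, and thread setup are assumptions made for illustration, not the actual code from #17950): many threads each repeatedly calling d2i_X509 on the same DER buffer, so any shared library-context or ex_data state touched on every call becomes a contention point.

```c
#include <pthread.h>
#include <openssl/x509.h>

/* Hypothetical reproducer sketch: each thread repeatedly parses the same
 * DER-encoded certificate. der/der_len are placeholders that would need to
 * be filled with real certificate data before running. */
static const unsigned char *der;
static long der_len;

static void *parse_loop(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        const unsigned char *p = der;   /* d2i_X509 advances the pointer */
        X509 *x = d2i_X509(NULL, &p, der_len);
        X509_free(x);
    }
    return NULL;
}

static int run_threads(int nthreads)
{
    pthread_t tids[64];

    if (nthreads > 64)
        nthreads = 64;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, parse_loop, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return 0;
}
```

As noted below, this kind of contention tends to show up as high system CPU only once the thread count is large enough for the shared state to become the bottleneck.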
The OTC has discussed this, there is no longer a need for the hold.
In view of the performance concerns, perhaps #15600 should be re-opened.
Consider a configuration option that disallowed no-store.
#20171 is a performance issue. Is anyone on the team doing periodic sweeps through new issues and referencing them here?
#20286 is a performance issue that should be linked here.
The number of cores on the machine you're testing with likely also matters. Thread contention issues with highly parallel workloads are often only visible on machines that can support that level of parallelism.
There are 20 cores on my test machine (as opposed to 32 on @zh-jq's machine). I would have thought that 20 would be sufficient to demonstrate an effect... but the only way to know for sure would be for me to set up a proper isolated environment, which is quite a lot more work.
There was a dead-loop bug in the G3 OpenSSL wrapper, so I updated the graph today, but the only difference is that the final RPS goes from 5xxx to 6xxx.
I also changed the g3tiles worker number to 20 and re-ran the test. The resulting graph is a bit different and the final RPS is much better (from 6xxx to about 9000), but still far from ideal.
The build logic is here https://github.com/alexcrichton/openssl-src-rs/blob/main/src/lib.rs.
Thanks to @quarckster, I now have a proper test environment for this consisting of 3 VMs - all with 16 cores (less than what @zh-jq used - but this is what we had easily available). I've set things up as I understand @zh-jq has set things up, i.e. one VM for angie, one VM for g3tiles and one VM for g3bench. The g3bench installation is linked against OpenSSL 1.1.1 in all tests. I vary the version of OpenSSL that g3tiles uses. For each concurrency level I repeated the test 5 times and took the average. I restarted the g3tiles server between each test run to ensure we have a consistent starting point. Here is my graph: This shows some interesting properties:
None of these look anything like @zh-jq's graph, and they do not show the drop-off in performance for 3.x that is shown there. So, I also cannot explain this.
I re-ran the test against g3tiles with 16 and 12 workers; the graph: @mattcaswell The 12-worker curve is the same as yours. So the way to reproduce it is to add more cores to the g3tiles host. The architecture of AWS c6gn.8xlarge is arm64, but I don't think this will make too much difference.
On the contrary, AFAIK architecture matters a lot; ARM (including Cortex-A76) is especially bad for RSA-based TLS handshaking. See the linked references for a technical explanation and benchmarking results.
I ran the test on an Intel 8260x2 host today; you can see the result here: The graph indeed is different, but the 3.2 performance is still far from ideal.
How many cores does the g3tiles VM have in this setup?
It runs directly on the physical host with all of the 96 cores (96 workers, each bound to a core).
Are you doing something specific here to configure the number of workers? In my tests I haven't done anything to configure the number of workers - am I supposed to be?
The config "worker: {}" will default to using all available cores as the number of workers, so nothing needs to be done in your test. On physical machines we use the following config:

worker:
  sched_affinity: true

which uses all available cores and also binds each worker thread to a specific CPU.
I also added TLS 1.2 test results here: https://bytedance.larkoffice.com/docx/NEqXdDyq8op8mwx0XTWcwgionnf.
Is the lack of available high-core-count ARM devices impeding the ability to troubleshoot on the OpenSSL end? I'd guess that 3x dedicated ARM servers like the Hetzner RX170 could be used for this (unless of course you reproduce with the exact AWS setup that @zh-jq-b uses). Does the OpenSSL project have the ability to rent those? Otherwise, would some company step up to rent those for the OpenSSL project? It is my understanding that the business impact of the OpenSSL 3.x performance regressions is quite bad, so somebody ought to come up with the < 1000 bucks required to rent those, if that is what's required.
I don't currently have access to this - but I am hoping to have such access soon.
Yes. It's just taken me a while to organise it (partly due to some other competing priorities).
Having read over this: we have a collection of issues here, some of which have been addressed (or at least those issues have been closed); others remain open. Given that I don't see a clear direction forward on this particular ticket, I'd like to propose that we close this issue and prioritize the remaining individual issues for addressing. Are there any objections?
I would like it to remain open until the OMC or OTC decides, by vote, that the concerns are met. Yes, this is a sprawling topic. The fact that team members, or at least one, do not see a way forward to address the overall concern raised by the user community does not justify closing it. If the vote happens, so be it. Thank you for asking first.
@richsalz I'm happy to oblige on that request, but, at least in its current state, it's a request that can't be satisfied, at least not without additional input. This issue has no OTC/OMC hold label on it, nor is there a distinct question for either group to address. As such, we would be in a catch-22 situation. We could of course add that hold if there was a specific question to answer, but as you note, this is a sprawling issue with no clear definition of purpose, so I am concerned that simply asking "Are the concerns met?" is insufficient in terms of driving this to resolution. A better approach may be to enumerate all the issues codified here (we already have at least a partial list) and address them individually. I've added the remaining issues on that list to our refinement board for addressing already. If there are more, and we can enumerate them here, I'm happy to add those as well.
If you close it, make sure to lock the issue and have the final comment be a pointer to the list. Please note that I didn't say "satisfied," I said "concerns are met." Addressed would have been a better word.
Marking as inactive, to be closed at the end of the 3.4 dev cycle, barring further input. Link to the broken-out issue list: #17627 (comment)
Performance problems encountered with 3.0 should be mentioned here.