Valgrind Memory Leak Checking #8954
Conversation
* Free the output buffer on webp encode error
* Free the tiff client state if setimage errors out.
* Return after setting the error for advanced features without libraqm. Not returning here leads to an alloc that's never freed.
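The three fixes above share one pattern: memory allocated before a failing call has to be released on the error path before returning. A minimal sketch of that pattern, with hypothetical function and buffer names (not Pillow's actual encoder code):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical encode step: fails (returns -1) on NULL input. */
static int
encode_step(const char *input, char *output)
{
    if (input == NULL) {
        return -1;
    }
    strcpy(output, input);
    return 0;
}

static char *
encode(const char *input)
{
    char *output = malloc(256);
    if (output == NULL) {
        return NULL;
    }
    if (encode_step(input, output) < 0) {
        /* Without this free, the early return leaks the in-flight
           buffer -- exactly the class of leak valgrind reports as
           "definitely lost". */
        free(output);
        return NULL;
    }
    return output;
}
```

The leaky variant simply omits the `free(output)` before the error return; the happy path hands the buffer to the caller, so the leak only shows up when the error branch is exercised, which is why these tend to hide outside the usual test coverage.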
@wiredfool We can trigger a workflow manually or run daily independent of PRs.
The question then is how do we prevent this from being run as a scheduled action and subsequently ignored? The tension is that this is by far the most valuable when a PR is in flight, because then it's obvious when you've gone from current (xfail) to fail, but the runtime is likely 2-3x worse than our longest other test run, which is already too long. (Even running single tests with valgrind takes ~1 min, because all the pytest setup infra runs under valgrind as well.)
In that case, long PR it is! We typically have PRs queued for days, or longer, before merging.
On My Machine: That's vs 87 minutes for ordinary valgrind on this PR in the GHA testing. Unfortunately, we're single threaded for valgrind tests.
Ok, we'll see how long this takes: https://github.com/python-pillow/Pillow/actions/runs/15054197221/job/42316085246?pr=8954 I'm only running this on the pull request, not the push. I'm expecting 5 hours. This is injecting the test script in the Pillow repo, instead of building a new copy of the valgrind image. At some point we can move that over, but I'd like to be tuning the suppressions without having to build a new valgrind image, so it's probably better to leave it here for now.
* ensure that the env variable is set in the makefile
* Some failing tests are on main but not last released version
Ok, so one more leak found, and a couple of timeouts that weren't xfailed in valgrind. Total test time was 3 hours, which was a little better than I was expecting. Guess core speed has improved a bit over this machine.

So what this whole exercise is showing is that we're reasonably good at dealing with memory lifetime for the happy case, but there are a lot of cases where C-level error handling returns without freeing the in-flight items that would otherwise be passed out or handled properly. I suspect that there are more similar leaks out there -- the ones we're hitting are the ones that are exercised in our test suite.

There are a couple of places here where I've added changes that aren't picked up by the code coverage but are essentially the same pattern as the leak. This indicates that we're going to have other error-level leaks, and we'll probably find them with additional testing.
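The idiomatic C defense against this class of error-path leak is to funnel every exit through a single cleanup label, so in-flight allocations are freed exactly once whether the call succeeds or fails. A hedged sketch of that goto-cleanup idiom (names are illustrative, not from Pillow's code):

```c
#include <stdlib.h>

/* Every exit path runs through `cleanup`, so neither allocation can
   leak on an early error return. The fail_early/fail_late flags stand
   in for real failure conditions. */
static int
process(int fail_early, int fail_late)
{
    int ret = -1;
    char *state = NULL;
    char *buffer = NULL;

    state = malloc(64);
    if (state == NULL || fail_early) {
        goto cleanup;
    }
    buffer = malloc(256);
    if (buffer == NULL || fail_late) {
        goto cleanup;
    }
    ret = 0;  /* success */

cleanup:
    free(buffer);  /* free(NULL) is a no-op, so this is always safe */
    free(state);
    return ret;
}
```

Because the pointers are initialized to NULL and freed unconditionally at the label, adding a new allocation only requires one new `free` line, rather than one per error return.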
pytest-timeout doesn't raise a timeout error.
for more information, see https://pre-commit.ci
Co-authored-by: Andrew Murray <3112309+radarhere@users.noreply.github.com>
## Run this as the test script in the docker valgrind image.
## Note -- can be included directly into the docker image,
Suggested change:
## Run this as the test script in the Docker valgrind image.
## Note -- can be included directly into the Docker image,
Changes proposed in this pull request:
To Do:
Note -- this is built on the Arrow memory leak check PR #8953.