
Valgrind Memory Leak Checking #8954


Open: wiredfool wants to merge 24 commits into main from valgrind-leakcheck

Conversation

@wiredfool (Member) commented May 13, 2025

Changes proposed in this pull request:

  • Add support in the Makefile to use valgrind to check for definite leaks (a rough sketch of such a target follows this list).
  • Add some suppressions for Python object allocations during leak checks; Python itself is fairly noisy under leak checking.
  • Fix a leak in the Arrow schema where child schemas weren't released.
  • Fix a leak in webp encode on error
  • Fix a leak in UnsharpMask on error
  • Fix a leak in TiffEncode on error
  • Fix a leak in Font getmask on error
  • Fix a leak in JpegEncode on error
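
For illustration, a leak-checking Makefile target along these lines could look like the sketch below; the target name, the suppression-file path, and the exact flag set are placeholders, not necessarily what this PR adds:

```make
# Hypothetical leak-check target -- names and paths are placeholders.
# PYTHONMALLOC=malloc makes CPython use the system allocator so valgrind
# sees every allocation; --errors-for-leak-kinds=definite turns only
# definite leaks into errors, and --error-exitcode=1 fails the run on them.
valgrind-leak:
	PYTHONMALLOC=malloc valgrind \
		--tool=memcheck --leak-check=full \
		--show-leak-kinds=definite --errors-for-leak-kinds=definite \
		--suppressions=valgrind-python.supp --error-exitcode=1 \
		python3 -m pytest -vv Tests/
```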

To Do:

  • Consider how to run this periodically in CI. This is significantly slower than plain valgrind -- locally I had a 4 hour run at one point. That's too slow for merges or PRs, but we still need to see the results.
  • Consider setting the leak detection to something other than definite.
  • Review the Python suppressions. It's entirely possible that some of them mask real leaks -- they'd point to issues with our PyObject handling at the C level, such as returning from a function on an error path without decrefing (see the suppression sketch after this list).
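
For context, a valgrind suppression for a Python allocation stack looks roughly like this; the suppression name and the frames shown are illustrative, not copied from this branch:

```
{
   hypothetical_pymalloc_definite_leak
   Memcheck:Leak
   match-leak-kinds: definite
   fun:malloc
   fun:_PyObject_Malloc
   ...
}
```

Any leak whose stack bottoms out in malloc called from _PyObject_Malloc (with any further callers matched by the trailing ...) gets silenced, which is exactly why an overly broad entry could hide a real refcounting bug in our C code.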

Note -- this is built on the Arrow memory leak check PR #8953.

@wiredfool added the Bug and Memory labels May 13, 2025
@aclark4life (Member) commented:

@wiredfool We can trigger a workflow manually or run daily independent of PRs.

@wiredfool (Member, Author) commented:

> We can trigger a workflow manually or run daily independent of PRs.

The question then is how do we prevent this from being run as a scheduled action and subsequently ignored? The tension is that this is by far the most valuable when a PR is in flight, because then it's obvious when you've gone from current (xfail) to fail, but the runtime is likely 2-3x worse than our longest other test run, which is already too long. (Even running a single test with valgrind takes ~1 min, because all the pytest setup infra runs under valgrind as well.)
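
For a sense of scale, checking a single test under valgrind looks something like the line below (the test id is just an example); most of that minute is pytest's own startup running under valgrind:

```sh
PYTHONMALLOC=malloc valgrind --leak-check=full --show-leak-kinds=definite \
    python3 -m pytest -vv Tests/test_image.py::TestImage::test_sanity
```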

@aclark4life (Member) commented:

In that case, long PR it is! We typically have PRs queued for days, or longer, before merging.

@wiredfool (Member, Author) commented:

On my machine:
4711 passed, 345 skipped, 11 xfailed, 8 warnings in 15074.00s (4:11:14)

That's vs 87 minutes for ordinary valgrind on this PR in the GHA testing.

Unfortunately, we're single-threaded for valgrind tests.

@wiredfool (Member, Author) commented:

OK, we'll see how long this takes: https://github.com/python-pillow/Pillow/actions/runs/15054197221/job/42316085246?pr=8954

I'm only running this on the pull request, not the push. I'm expecting 5 hours.

This injects the test script from the Pillow repo instead of building a new copy of the valgrind image. At some point we can move it over, but I'd like to be able to tune the suppressions without having to rebuild the valgrind image, so it's probably better to leave it here for now.
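
Concretely, "injecting" here just means bind-mounting the checkout (and its test script) into the prebuilt valgrind image rather than baking the script into the image; the image name and script path below are placeholders:

```sh
# Sketch only -- image name and script path are placeholders.
docker run --rm -v "$(pwd)":/Pillow -w /Pillow \
    pillow-valgrind-image ./Tests/run_valgrind_leak.sh
```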

wiredfool added 3 commits May 16, 2025 12:08
* ensure that the env variable is set in the makefile
* Some failing tests are on main but not last released version
@wiredfool (Member, Author) commented:

OK, so one more leak found, and a couple of timeouts that weren't xfailed under valgrind. Total test time was 3 hours, which was a little better than I was expecting -- I guess core speeds have gotten a little better since this machine was built.

So what this whole exercise is showing is that we're reasonably good at dealing with memory lifetimes for the happy case, but there are a lot of cases where C-level error handling returns without freeing the in-flight items that would otherwise be passed out or handled properly. I suspect there are more leaks like this out there -- the ones we're hitting are the ones exercised by our test suite. There are a couple of places where I've added changes that aren't picked up by the code coverage but are essentially the same pattern as the leak, which suggests we have other error-path leaks that we'll probably find with additional testing.
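
The shape of the bug being fixed in most of these commits is roughly the following -- not Pillow's actual code, just a schematic of an error path that returns before releasing an earlier allocation:

```c
#include <stdlib.h>

/* Schematic of the error-path leak pattern: an early return on failure
 * skips freeing what was already allocated in this function. */
static int
encode_something(size_t size)
{
    unsigned char *buffer = malloc(size);
    if (!buffer) {
        return -1;
    }

    unsigned char *scratch = malloc(size);
    if (!scratch) {
        /* The leaky version simply did: return -1;  (buffer never freed) */
        free(buffer);   /* the fix: release in-flight allocations on error */
        return -1;
    }

    /* ... real work using buffer and scratch ... */

    free(scratch);
    free(buffer);
    return 0;
}
```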

pytest-timeout doesn't raise a timeout error.
@wiredfool force-pushed the valgrind-leakcheck branch from dfec187 to 20b49a3 on May 17, 2025 08:47
wiredfool and others added 3 commits May 23, 2025 10:57
Comment on lines +3 to +4
## Run this as the test script in the docker valgrind image.
## Note -- can be included directly into the docker image,

Suggested change:
- ## Run this as the test script in the docker valgrind image.
- ## Note -- can be included directly into the docker image,
+ ## Run this as the test script in the Docker valgrind image.
+ ## Note -- can be included directly into the Docker image,
