Improve chunking in VTKHDF writer #4870

pazner · 2025-05-23T16:00:05Z

The data arrays grow only in the first dimension, and their other dimensions are fixed. Therefore, the chunk size in those dimensions should be equal to the dataset size. This can greatly reduce the size of saved datasets. (With compression enabled, it doesn't make a big difference; without compression, the difference is significant.)

PR	Author	Editor	Reviewers	Assignment	Approval	Merge
#4870	@pazner	@tzanio	@helloworld922 + @camierjs	5/25/25	⌛due 6/8/25	⌛due 6/15/25

PR Checklist

Code builds.
Code passes make style.
Update CHANGELOG:
- Is this a new feature users need to be aware of? New or updated example or miniapp?
- Does it make sense to create a new section in the CHANGELOG to group with other related features?
Update INSTALL:
- Has a new optional library been added? If so, what range of versions of this library are required? (Make sure the external library is compatible with our BSD license, e.g. it is not licensed under GPL!)
- Have the version ranges for any required or optional libraries changed?
- Does make or cmake have a new target?
- Did the requirements or the installation process change? (rare)
Update continuous integration server configurations if necessary (e.g. with new version requirements for each of MFEM's dependencies)
- .github
- .appveyor.yml
Update .gitignore:
- Check if make distclean; git status shows any files that were generated from the source by the project (not an IDE) but we don't want to track in the repository.
- Add new patterns (just for the new files above) and re-run the above test.
New examples:
- All sample runs at the top of the example source file work.
- Update examples/makefile:
  - Add the example code to the appropriate SEQ_EXAMPLES and PAR_EXAMPLES variables.
  - Add any files generated by it to the clean target.
  - Add the example binary and any files generated by it to the top-level .gitignore file.
- Update examples/CMakeLists.txt:
  - Add the example code to the ALL_EXE_SRCS variable.
  - Make sure THIS_TEST_OPTIONS is set correctly for the new example.
- List the new example in doc/CodeDocumentation.dox.
- If new examples directory (e.g. examples/pumi), list it in doc/CodeDocumentation.conf.in
- Companion pull request for documentation in mfem/web repo:
  - Update or add example-specific documentation, see e.g. the src/examples.md.
  - Add the description, labels and screenshots in src/examples.md and src/img.
  - In examples.md, list the example under the appropriate categories, add new categories if necessary.
  - Add a short description of the example in the "Extensive Examples" section of features.md.
New miniapps:
- All sample runs at the top of the miniapp source file work.
- Update top-level makefile and makefile in corresponding miniapp directory.
- Add the miniapp binary and any files generated by it to the top-level .gitignore file.
- Update CMake build system:
  - Update the CMakeLists.txt file in the miniapps directory, if the new miniapp is in a new directory.
  - Add/update the CMakeLists.txt file in the new miniapp directory.
  - Consider adding a new test for the new miniapp.
- List the new miniapp in doc/CodeDocumentation.dox
- If new miniapps directory (e.g. miniapps/nurbs), add it to MINIAPP_SUBDIRS in the makefile.
- If new miniapps directory (e.g. miniapps/nurbs), list it in doc/CodeDocumentation.conf.in
- Companion pull request for documentation in mfem/web repo:
  - Update or add miniapp-specific documentation, see e.g. the src/meshing.md and src/electromagnetics.md files.
  - Add the description, labels and screenshots in src/examples.md and src/img.
  - The miniapps go at the end of the page, and are usually listed only under a specific "Application (PDE)" category.
  - Add a short description of the miniapp in the "Extensive Examples" section of features.md.
New capability:
- All new public, protected, and private classes, methods, data members, and functions have full Doxygen-style documentation in source comments. Documentation should include descriptions of member data, function arguments and return values, template parameters, and prerequisites for calling new functions.
- Pointer arguments and return values must specify whether ownership is being transferred or lent with the call.
- Any new functions should include descriptions of their intended use e.g. for internal use only, user-facing, etc., along with references to example code whenever possible/appropriate.
- Consider adding new sample runs in existing examples to highlight the new capability.
- Consider saving cool simulation pictures with the new capability in the Confluence gallery (LLNL only) or submitting them, via pull request, to the gallery section of the mfem/web repo.
- If this is a major new feature, consider mentioning it in the short summary inside README (rare).
- List major new classes in doc/CodeDocumentation.dox (rare).
Update this checklist, if the new pull request affects it.
Run make unittest to make sure all unit tests pass.
Run the tests in tests/scripts.
(LLNL only) After merging:
- Update internal tests to include the new features.

tzanio · 2025-05-25T19:44:28Z

This PR is now under review (see the table in the PR description). To help with the review process, please do not force push to the branch.

pazner · 2025-05-26T22:01:55Z

Also: should compression be enabled by default? VTKHDF compression does not require MFEM to be linked with zlib (since the compression is built into the HDF5 library).

pazner · 2025-05-29T16:02:48Z

@camierjs @helloworld922, any opinions on the default compression? Currently, if MFEM is compiled with zlib, compression will be enabled by default. This makes sense for the standard ParaViewDataCollection (since compression requires zlib), but maybe less so for VTKHDF.

Should we enable compression by default? The trade-off is runtime vs. storage space.

helloworld922 · 2025-05-30T02:52:03Z

I ran a test using ex1p, modifying par_ref_levels=4 to get larger outputs (17,057,025 DOFs, first order).

mpirun -np 16 ./ex1p -m ../data/beam-hex.mesh

No compression: ParMesh writeout took 1.97s, ParGridFunction writeout took 0.98s. Output file was 4.64 GB.
Compression level 6: ParMesh writeout took 1.99s, ParGridFunction writeout took 1.53s. Output file was 0.49 GB, for a compression ratio of 9.47.

The increase in writeout time seems to be relatively minor, and probably should be expected to be negligible for a full simulation anyways. I think based on this it's fine to leave compression on by default. What compression ratios/writeout times have you observed on other problems?

pazner · 2025-05-30T14:21:29Z

It's problem-dependent, but I have seen in the past compression ratios of about 3, and relatively insignificant differences in the runtime, so I think it's worth it to enable by default. This is done in 2421b48.

pazner requested review from helloworld922 and camierjs May 23, 2025 16:00

pazner self-assigned this May 23, 2025

pazner added minor visualization ready-for-review labels May 23, 2025

pazner added 2 commits May 23, 2025 09:12

Improve chunking in VTKHDF writer

0366ad2

The data arrays grow only in the first dimension, and their other dimensions are fixed. Therefore, the chunk size in those dimensions should be equal to the dataset size. This can greatly reduce the size of saved datasets.

Fix shadow warning in VTKHDF

70cb8fc

pazner force-pushed the vtkhdf-chunk-fix branch from fb23468 to 70cb8fc Compare May 23, 2025 16:13

tzanio added in-review and removed ready-for-review labels May 25, 2025

tzanio assigned camierjs, helloworld922 and tzanio May 25, 2025

tzanio added this to Pull Requests May 25, 2025

tzanio added this to the mfem-4.9 milestone May 25, 2025

github-project-automation bot moved this to Review Now in Pull Requests May 25, 2025

camierjs and others added 3 commits May 26, 2025 16:11

Merge branch 'master' into vtkhdf-chunk-fix

e8ed1a4

Merge branch 'master' into vtkhdf-chunk-fix

a7aa6c5

Address some implicit conversion warnings

b0c3078

Co-authored-by: John Camier <camierjs@gmail.com>

pazner force-pushed the vtkhdf-chunk-fix branch from 4d746e1 to b0c3078 Compare May 29, 2025 03:42

camierjs approved these changes May 29, 2025

helloworld922 approved these changes May 29, 2025

pazner added 2 commits May 30, 2025 07:19

In VTKHDF, avoid resizing on initial dataset creation

2cd2d11

Enable compression by default in ParaViewHDFDataCollection

2421b48
