
Improve chunking in VTKHDF writer #4870


Open
wants to merge 7 commits into
base: master

Conversation

pazner
Member

@pazner pazner commented May 23, 2025

The data arrays grow only in the first dimension; their other dimensions are fixed. Therefore, the chunk size in those fixed dimensions should equal the dataset size. This can greatly reduce the size of saved datasets: with compression enabled it doesn't make a big difference, but without compression the difference is significant.
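
As a rough illustration of the chunking rule described above, here is a minimal sketch. It is not the code from this PR: the helper name `ChunkDims` and the `rows_per_chunk` parameter are hypothetical, and the sketch assumes that only dimension 0 of the dataset is extendible.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not the MFEM implementation: pick chunk dimensions for
// a dataset that grows only along dimension 0.
std::vector<std::uint64_t> ChunkDims(const std::vector<std::uint64_t> &dims,
                                     std::uint64_t rows_per_chunk)
{
   assert(!dims.empty() && rows_per_chunk > 0);
   // Fixed dimensions (1, 2, ...): the chunk spans the full dataset extent.
   std::vector<std::uint64_t> chunk(dims);
   // Growing dimension (0): a tunable number of rows per chunk (assumption).
   chunk[0] = rows_per_chunk;
   return chunk;
}
```

For example, a dataset with dimensions {1000, 3} and 256 rows per chunk would get chunk dimensions {256, 3}. In HDF5, chunk dimensions like these are set on a dataset creation property list with `H5Pset_chunk`.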

PR: #4870
Author: @pazner
Editor: @tzanio
Reviewers: @helloworld922 + @camierjs
Assignment: 5/25/25
Approval: ⌛due 6/8/25
Merge: ⌛due 6/15/25
PR Checklist
  • Code builds.
  • Code passes make style.
  • Update CHANGELOG:
    • Is this a new feature users need to be aware of? New or updated example or miniapp?
    • Does it make sense to create a new section in the CHANGELOG to group with other related features?
  • Update INSTALL:
    • Has a new optional library been added? If so, what range of versions of this library are required? (Make sure the external library is compatible with our BSD license, e.g. it is not licensed under GPL!)
    • Have the version ranges for any required or optional libraries changed?
    • Does make or cmake have a new target?
    • Did the requirements or the installation process change? (rare)
  • Update continuous integration server configurations if necessary (e.g. with new version requirements for each of MFEM's dependencies)
    • .github
    • .appveyor.yml
  • Update .gitignore:
    • Check if make distclean; git status shows any files that were generated from the source by the project (not an IDE) but we don't want to track in the repository.
    • Add new patterns (just for the new files above) and re-run the above test.
  • New examples:
    • All sample runs at the top of the example source file work.
    • Update examples/makefile:
      • Add the example code to the appropriate SEQ_EXAMPLES and PAR_EXAMPLES variables.
      • Add any files generated by it to the clean target.
      • Add the example binary and any files generated by it to the top-level .gitignore file.
    • Update examples/CMakeLists.txt:
      • Add the example code to the ALL_EXE_SRCS variable.
      • Make sure THIS_TEST_OPTIONS is set correctly for the new example.
    • List the new example in doc/CodeDocumentation.dox.
    • If new examples directory (e.g. examples/pumi), list it in doc/CodeDocumentation.conf.in
    • Companion pull request for documentation in mfem/web repo:
      • Update or add example-specific documentation, see e.g. the src/examples.md.
      • Add the description, labels and screenshots in src/examples.md and src/img.
      • In examples.md, list the example under the appropriate categories, add new categories if necessary.
      • Add a short description of the example in the "Extensive Examples" section of features.md.
  • New miniapps:
    • All sample runs at the top of the miniapp source file work.
    • Update top-level makefile and makefile in corresponding miniapp directory.
    • Add the miniapp binary and any files generated by it to the top-level .gitignore file.
    • Update CMake build system:
      • Update the CMakeLists.txt file in the miniapps directory, if the new miniapp is in a new directory.
      • Add/update the CMakeLists.txt file in the new miniapp directory.
      • Consider adding a new test for the new miniapp.
    • List the new miniapp in doc/CodeDocumentation.dox
    • If new miniapps directory (e.g. miniapps/nurbs), add it to MINIAPP_SUBDIRS in the makefile.
    • If new miniapps directory (e.g. miniapps/nurbs), list it in doc/CodeDocumentation.conf.in
    • Companion pull request for documentation in mfem/web repo:
      • Update or add miniapp-specific documentation, see e.g. the src/meshing.md and src/electromagnetics.md files.
      • Add the description, labels and screenshots in src/examples.md and src/img.
      • The miniapps go at the end of the page, and are usually listed only under a specific "Application (PDE)" category.
      • Add a short description of the miniapp in the "Extensive Examples" section of features.md.
  • New capability:
    • All new public, protected, and private classes, methods, data members, and functions have full Doxygen-style documentation in source comments. Documentation should include descriptions of member data, function arguments and return values, template parameters, and prerequisites for calling new functions.
    • Pointer arguments and return values must specify whether ownership is being transferred or lent with the call.
    • Any new functions should include descriptions of their intended use e.g. for internal use only, user-facing, etc., along with references to example code whenever possible/appropriate.
    • Consider adding new sample runs in existing examples to highlight the new capability.
    • Consider saving cool simulation pictures with the new capability in the Confluence gallery (LLNL only) or submitting them, via pull request, to the gallery section of the mfem/web repo.
    • If this is a major new feature, consider mentioning it in the short summary inside README (rare).
    • List major new classes in doc/CodeDocumentation.dox (rare).
  • Update this checklist, if the new pull request affects it.
  • Run make unittest to make sure all unit tests pass.
  • Run the tests in tests/scripts.
  • (LLNL only) After merging:
    • Update internal tests to include the new features.

pazner added 2 commits May 23, 2025 09:12
The data arrays grow only in the first dimension, and their other
dimensions are fixed. Therefore, the chunk size in those dimensions
should be equal to the dataset size. This can greatly reduce the
size of saved datasets.
@pazner pazner force-pushed the vtkhdf-chunk-fix branch from fb23468 to 70cb8fc Compare May 23, 2025 16:13
@tzanio
Member

tzanio commented May 25, 2025

This PR is now under review (see the table in the PR description). To help with the review process, please do not force push to the branch.

@pazner
Member Author

pazner commented May 26, 2025

Also: should compression be enabled by default? VTKHDF compression does not require MFEM to be linked with zlib (since the compression is built into the HDF5 library).

@pazner pazner force-pushed the vtkhdf-chunk-fix branch from 4d746e1 to b0c3078 Compare May 29, 2025 03:42
@pazner
Member Author

pazner commented May 29, 2025

@camierjs @helloworld922, any opinions on the default compression? Currently, if MFEM is compiled with zlib, compression will be enabled by default. This makes sense for the standard ParaViewDataCollection (since compression requires zlib), but maybe less so for VTKHDF.

Should we enable compression by default? The trade-off is runtime vs. storage space.

@helloworld922
Contributor

I ran a test using ex1p, setting par_ref_levels=4 to get larger outputs (17,057,025 DOFs, first order).

mpirun -np 16 ./ex1p -m ../data/beam-hex.mesh

No compression: ParMesh writeout took 1.97s, ParGridFunction writeout took 0.98s. Output file was 4.64 GB.
Compression level 6: ParMesh writeout took 1.99s, ParGridFunction writeout took 1.53s. Output file was 0.49 GB, for a compression ratio of 9.47.

The increase in writeout time seems to be relatively minor, and probably should be expected to be negligible for a full simulation anyways. I think based on this it's fine to leave compression on by default. What compression ratios/writeout times have you observed on other problems?
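
To make the trade-off in the numbers above easier to check, here is a small sketch of the arithmetic. The struct and function names are hypothetical and only illustrate the calculation; the inputs are the timings and file sizes reported in this comment.

```cpp
#include <cassert>

// Hypothetical container for one benchmark run (names are illustrative only).
struct WriteStats
{
   double mesh_s;   // ParMesh writeout time, in seconds
   double field_s;  // ParGridFunction writeout time, in seconds
   double size_gb;  // output file size, in GB
};

// Total writeout time for one run.
double TotalTime(const WriteStats &s) { return s.mesh_s + s.field_s; }

// Ratio of the uncompressed file size to the compressed file size.
double CompressionRatio(const WriteStats &raw, const WriteStats &comp)
{
   assert(comp.size_gb > 0.0);
   return raw.size_gb / comp.size_gb;
}

// Fractional increase in total writeout time caused by compression.
double RelativeSlowdown(const WriteStats &raw, const WriteStats &comp)
{
   assert(TotalTime(raw) > 0.0);
   return TotalTime(comp) / TotalTime(raw) - 1.0;
}
```

With the values above (1.97 s + 0.98 s and 4.64 GB uncompressed, 1.99 s + 1.53 s and 0.49 GB at compression level 6), the ratio is about 9.47 and the total writeout time goes from 2.95 s to 3.52 s, roughly 19% longer.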

@pazner
Member Author

pazner commented May 30, 2025

It's problem-dependent, but I have seen in the past compression ratios of about 3, and relatively insignificant differences in the runtime, so I think it's worth it to enable by default. This is done in 2421b48.

Projects
Status: Review Now

4 participants