merging very large datasets #827

a2nalea asked this question in Q&A · Nov 8, 2024 · 1 comment · 2 replies

On smaller arrays I would do:

```python
from rioxarray import merge

arrays = [first_array, second_array]
merged_array = merge.merge_arrays(arrays, nodata=0)
```

However, on my big datasets that doesn't work: I pretty quickly hit a MemoryError.

For now, the only solution I found is:

1. convert the zarr to GeoTIFF files with `first_array.rio.to_raster("some_file.tif")` (see the sketch below the code block)
2. use `rasterio.merge` to merge the files directly on disk:

```python
from pathlib import Path

import rasterio
from rasterio import merge

# `filenames` holds the paths of the GeoTIFFs written in step 1
rasters_to_merge = [rasterio.open(filename) for filename in filenames]
destination = Path("merged_raster.tif")
merge.merge(rasters_to_merge, dst_path=destination)
```
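
Step 1 is just a loop over the arrays. A minimal sketch, assuming each array already carries its CRS and transform; the file names below are placeholders, not from the original post:

```python
# Write each array to its own GeoTIFF on disk (step 1 above).
# File names are placeholders.
filenames = ["first.tif", "second.tif"]
for array, filename in zip([first_array, second_array], filenames):
    array.rio.to_raster(filename)
```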

My question (finally): is there a native way in xarray to perform the same operations? Thanks

Answered by snowman2:

> is there a native way in xarray to perform the same operations?

Not in a memory efficient way presently. The solution of writing to disk is a great way for you to merge larger datasets in a memory efficient way.
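
If the merged result is needed back in xarray afterwards, one common follow-up (not part of the answer above, just a sketch) is to reopen the merged GeoTIFF lazily with dask chunks, so it still never has to fit in memory at once:

```python
import rioxarray

# Reopen the on-disk merge lazily (requires dask);
# the chunk sizes here are arbitrary placeholders.
merged = rioxarray.open_rasterio("merged_raster.tif", chunks={"x": 2048, "y": 2048})
```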

Answer selected by a2nalea