Description:
There is a significant performance difference between 32-bit and 64-bit builds when using libspng. After detailed analysis and discussion in the zlib-ng repository, the 64-bit build was found to be considerably slower than the 32-bit build.
Observed Behavior:
In our tests, encoding a PNG takes ~131 ms with the 32-bit build, while the same image takes ~400 ms with the 64-bit build.
Both tests were conducted on the same machine with the same configuration, using libspng and zlib-ng.
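To make timings like these easier to compare across builds, a single run can be replaced by the median of several runs. The following is a minimal sketch, not part of libspng or zlib-ng; `MedianMillis` is a hypothetical helper name, and it assumes the workload can be wrapped in a callable.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical helper: runs `work` `runs` times (at least once) and
// returns the median wall-clock time in milliseconds.
double MedianMillis(const std::function<void()>& work, int runs)
{
    if (runs < 1) runs = 1;
    std::vector<double> samples;
    samples.reserve(static_cast<size_t>(runs));
    for (int i = 0; i < runs; ++i)
    {
        auto t0 = std::chrono::steady_clock::now();
        work();
        auto t1 = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    std::sort(samples.begin(), samples.end());
    // Upper median for an even number of samples.
    return samples[samples.size() / 2];
}
```

With the full example further down, this could be called as `MedianMillis([&]{ EncodeRawImageToPNG(raw_fname, out_fname, w, h, 300); }, 5)` in both the 32-bit and the 64-bit binary. The median is less sensitive to a cold file cache on the first run than a single measurement.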
Detailed Analysis:
Filter Decision as the Cause:
Based on profiling, it appears the 64-bit build takes significantly longer because of differences in the filter decision logic.
The 32-bit build uses a more optimized path (e.g., SIMD vectorized loops), while the 64-bit build appears to rely on scalar operations.
Forcing the filter choice to SPNG_FILTER_CHOICE_NONE resolves the performance issue and brings the 64-bit performance in line with the 32-bit results. However, this is a manual workaround.
Filter Logic Differences:
libspng dynamically selects filters during the encoding process.
It seems the heuristic for choosing filters differs between 32-bit and 64-bit builds, potentially due to underlying differences in how zlib-ng operates in these environments.
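For context, the sketch below shows the filter heuristic commonly recommended for PNG encoders: apply every filter type to the scanline and keep the one with the smallest sum of absolute values of the filtered bytes. It is an illustration under that assumption and not libspng's actual code; `ChooseFilter`, `FilterByte` and `Paeth` are hypothetical names. It assumes `prev` has the same length as `cur` (all zeros for the first row).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// PNG filter types, numbered as in the PNG specification.
enum FilterType : int { kNone = 0, kSub = 1, kUp = 2, kAverage = 3, kPaeth = 4 };

// Paeth predictor: picks whichever of left (a), above (b), upper-left (c)
// is closest to a + b - c.
static uint8_t Paeth(uint8_t a, uint8_t b, uint8_t c)
{
    int p = int(a) + int(b) - int(c);
    int pa = std::abs(p - a);
    int pb = std::abs(p - b);
    int pc = std::abs(p - c);
    if (pa <= pb && pa <= pc) return a;
    if (pb <= pc) return b;
    return c;
}

// Applies one filter to a single byte x; arithmetic is modulo 256.
static uint8_t FilterByte(FilterType f, uint8_t x, uint8_t a, uint8_t b, uint8_t c)
{
    switch (f)
    {
        case kSub:     return uint8_t(x - a);
        case kUp:      return uint8_t(x - b);
        case kAverage: return uint8_t(x - ((a + b) >> 1));
        case kPaeth:   return uint8_t(x - Paeth(a, b, c));
        default:       return x;
    }
}

// Returns the filter with the smallest sum of absolute values, treating each
// filtered byte as signed. Ties go to the lower-numbered filter.
FilterType ChooseFilter(const std::vector<uint8_t>& cur,
                        const std::vector<uint8_t>& prev,
                        size_t bpp)
{
    FilterType best = kNone;
    unsigned long long bestSum = ~0ULL;
    for (int f = kNone; f <= kPaeth; ++f)
    {
        unsigned long long sum = 0;
        for (size_t i = 0; i < cur.size(); ++i)
        {
            uint8_t a = i >= bpp ? cur[i - bpp] : 0;   // left neighbour
            uint8_t b = prev[i];                       // byte above
            uint8_t c = i >= bpp ? prev[i - bpp] : 0;  // upper-left neighbour
            uint8_t v = FilterByte(FilterType(f), cur[i], a, b, c);
            sum += v < 128 ? v : 256 - v;
        }
        if (sum < bestSum) { bestSum = sum; best = FilterType(f); }
    }
    return best;
}
```

A heuristic of this shape touches every byte of the scanline once per candidate filter, so its cost depends heavily on whether the compiler vectorizes the inner loops. That would be consistent with forcing a single filter removing the slowdown.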
zlib-ng Findings:
The analysis in the zlib-ng repository suggested that the 64-bit build behaves suboptimally in encode_scanline.
Scalar operations and loops dominate the profiling data in the 64-bit build, while the 32-bit build uses SIMD vectorized loops effectively.
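When comparing scalar and vectorized code paths between two builds, it can help to confirm what each binary was actually compiled for. The sketch below reports the pointer width and a few common compiler-defined SIMD macros; `DescribeBuild` is a hypothetical helper name. It only reflects the flags of the translation unit it is compiled in, so it says nothing about how libspng or zlib-ng themselves were built or what they select at runtime.

```cpp
#include <string>

// Hypothetical helper: summarizes pointer width and the SIMD feature macros
// the compiler defined for this translation unit.
std::string DescribeBuild()
{
    std::string s = std::to_string(sizeof(void*) * 8) + "-bit";
#if defined(__AVX2__)
    s += " AVX2";
#endif
#if defined(__SSE4_1__)
    s += " SSE4.1";
#endif
    // GCC/Clang define __SSE2__; MSVC signals SSE2 via _M_X64 or _M_IX86_FP >= 2.
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    s += " SSE2";
#endif
#if defined(__ARM_NEON)
    s += " NEON";
#endif
    return s;
}
```

Printing this string from both test binaries, built with the same flags as the libraries, gives a quick check that the 32-bit and 64-bit builds were given comparable instruction-set options.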
Steps to Reproduce:
Use the provided C++ example to encode a raw image into a PNG with libspng.
Compare the encoding times between 32-bit and 64-bit builds.
Optionally, set the filter choice manually to SPNG_FILTER_CHOICE_NONE to observe how it impacts the 64-bit performance.
resultCode = spng_set_option(ctx, SPNG_FILTER_CHOICE, SPNG_FILTER_CHOICE_NONE);
Expected Behavior:
Both 32-bit and 64-bit builds should perform similarly, with comparable encoding times and efficient use of filters.
Links to Related Issues:
zlib-ng Performance Analysis
Request:
Could you investigate the filter decision logic in libspng? Specifically:
Why the 64-bit build seems to perform worse in selecting filters.
Whether this is related to differences in how zlib-ng interacts with libspng in 32-bit vs. 64-bit environments.
How the default filter heuristic could be improved for 64-bit builds to align with 32-bit behavior.
Full Example:
#include <iostream>
#include <fstream>
#include <vector>
#include <stdexcept>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <string>
#include <cstring>
#include <chrono>
extern "C" {
#include "/home/adam/spng-install/include/spng.h"
}
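// Stream callback: writes each chunk of encoded PNG data to the std::ofstream passed as `user`.
int WritePNGCallback(spng_ctx *ctx, void *user, void *src, size_t length)
{
    (void)ctx; // unused
    std::ofstream* out = static_cast<std::ofstream*>(user);
    if(!out->write(static_cast<const char*>(src), static_cast<std::streamsize>(length)))
    {
        return SPNG_IO_ERROR;
    }
    return SPNG_OK;
}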
void EncodeRawImageToPNG(const std::string& RawFileName,
                         const std::string& PngFileName,
                         uint32_t Width,
                         uint32_t Height,
                         int DPI)
{
    spng_ctx* ctx = nullptr;
    spng_ihdr ihdr;

    // Read the whole raw RGBA file into memory.
    std::ifstream rawFile(RawFileName, std::ios::binary);
    if(!rawFile.is_open()) throw std::runtime_error("Failed to open raw file.");
    rawFile.seekg(0, std::ios::end);
    std::streampos fileSize = rawFile.tellg();
    rawFile.seekg(0, std::ios::beg);
    std::vector<unsigned char> rawBuffer(fileSize);
    if(!rawFile.read(reinterpret_cast<char*>(rawBuffer.data()), fileSize))
        throw std::runtime_error("Failed to read raw file into memory.");
    rawFile.close();

    std::ofstream pngFile(PngFileName, std::ios::binary);
    if(!pngFile.is_open()) throw std::runtime_error("Failed to create/open PNG output file.");

    ctx = spng_ctx_new(SPNG_CTX_ENCODER);
    if(ctx == nullptr) throw std::runtime_error("Failed to create spng context.");
    try
    {
        // 8-bit RGBA, non-interlaced.
        std::memset(&ihdr, 0, sizeof(ihdr));
        ihdr.width = Width;
        ihdr.height = Height;
        ihdr.bit_depth = 8;
        ihdr.color_type = SPNG_COLOR_TYPE_TRUECOLOR_ALPHA;
        ihdr.compression_method = 0;
        ihdr.filter_method = 0;
        ihdr.interlace_method = SPNG_INTERLACE_NONE;
        int resultCode = spng_set_ihdr(ctx, &ihdr);
        if(resultCode != SPNG_OK)
            throw std::runtime_error(std::string("Failed to set IHDR: ") + spng_strerror(resultCode));

        resultCode = spng_set_option(ctx, SPNG_IMG_COMPRESSION_LEVEL, 1);
        if(resultCode != SPNG_OK)
            throw std::runtime_error("Failed to set compression level to 1.");

        // Uncomment the next line to test the workaround (disables filtering).
        //resultCode = spng_set_option(ctx, SPNG_FILTER_CHOICE, SPNG_FILTER_CHOICE_NONE);

        resultCode = spng_set_png_stream(ctx, WritePNGCallback, &pngFile);
        if(resultCode != SPNG_OK)
            throw std::runtime_error("Failed to set PNG stream callback.");

        // pHYs stores pixels per metre: DPI * 39.37 inches per metre.
        double ppm = static_cast<double>(DPI) * 39.37;
        int ippm = static_cast<int>(std::round(ppm));
        spng_phys phys;
        std::memset(&phys, 0, sizeof(phys));
        phys.ppu_x = ippm;
        phys.ppu_y = ippm;
        phys.unit_specifier = 1;
        resultCode = spng_set_phys(ctx, &phys);
        if(resultCode != SPNG_OK)
            throw std::runtime_error("Failed to set pHYs chunk.");

        // 4 bytes per pixel (RGBA, 8 bits per channel).
        size_t imageSize = static_cast<size_t>(Width) * static_cast<size_t>(Height) * 4;
        if(rawBuffer.size() < imageSize)
            throw std::runtime_error("RAW buffer is smaller than the expected image size.");

        resultCode = spng_encode_image(ctx, rawBuffer.data(), imageSize, SPNG_FMT_RAW, SPNG_ENCODE_FINALIZE);
        if(resultCode != SPNG_OK)
            throw std::runtime_error(std::string("Failed to encode image: ") + spng_strerror(resultCode));
    }
    catch(...)
    {
        spng_ctx_free(ctx);
        pngFile.close();
        throw;
    }
    spng_ctx_free(ctx);
    pngFile.close();
}
int main(int argc, char *argv[])
{
    if(argc < 3)
    {
        fprintf(stderr, "usage: %s <raw-rgba-input> <png-output>\n", argv[0]);
        return 1;
    }
    // 2480 x 3508 pixels corresponds to an A4 page at 300 DPI.
    const uint32_t w = 2480;
    const uint32_t h = 3508;
    const std::string raw_fname(argv[1]);
    const std::string out_fname(argv[2]);
    auto t0 = std::chrono::steady_clock::now();
    EncodeRawImageToPNG(raw_fname, out_fname, w, h, 300);
    auto t1 = std::chrono::steady_clock::now();
    auto diff = t1 - t0;
    double total = std::chrono::duration<double>(diff).count();
    printf("img encode took %f ms\n", total * 1e3);
    return 0;
}
Raw RGBA image
raw-rgba.zip