OpenGL renderer
The OpenGL renderer implementation is based around the idea of batching as many drawing operations as possible into a single draw call for the graphics card. This is necessary to minimise the overhead of CPU -> GPU communication and state changes on the graphics card. To that end, the draw calls in a single frame are not executed immediately, but added to a queue together with their parameters. The parameters include details like bounds, texture, and tertiary colour. Once an operation is executed that depends on previously drawn contents in one way or another, like the post-processing pass or presentation to the screen, the draw operations that are still in the queue are flushed. Flushing here means that they are actually sent to the GPU to be processed.
Draw commands are queued in so-called command buffers, named after the constructs in the OpenGL GL_NV_command_list extension and Vulkan API. There are three possible draw calls and their parameters are defined in structs that can be found in the src/drawing/engines/opengl/DrawCommands.h file. The parameters are derived directly from the OpenGL implementation before it was optimized.
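The queue-then-flush idea can be illustrated with a minimal sketch. The names below (`SpriteCommand`, `CommandBuffer`, `Add`, `Flush`) and the struct fields are hypothetical and chosen for illustration only; the real parameter structs are the ones in DrawCommands.h and may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical parameter struct for one sprite draw (not the real definition).
struct SpriteCommand
{
    int32_t left, top, right, bottom; // screen-space bounds
    int32_t atlasIndex;               // which atlas the texture lives in
    uint8_t tertiaryColour;
};

// Collects commands of one type and hands them over in bulk when flushed.
template<typename T>
class CommandBuffer
{
public:
    using SubmitFn = std::function<void(const std::vector<T>&)>;

    // Queues a command; nothing is sent to the GPU at this point.
    void Add(const T& command) { _commands.push_back(command); }

    std::size_t Size() const { return _commands.size(); }

    // Passes every queued command to `submit` in a single call, then empties
    // the queue. Returns the number of commands that were submitted.
    std::size_t Flush(const SubmitFn& submit)
    {
        assert(submit && "a submit callback is required");
        if (_commands.empty())
            return 0;
        const std::size_t count = _commands.size();
        submit(_commands);
        _commands.clear();
        return count;
    }

private:
    std::vector<T> _commands;
};
```

In this sketch, the callback passed to `Flush` stands in for whatever actually uploads the batch and issues the draw call. A renderer would call `Flush` right before any operation that reads back previously drawn contents.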
To be able to execute many draw commands that access textures in one draw call, all of those textures need to be accessible from the fragment shader simultaneously. Most OpenGL implementations only allow up to 32 texture samplers in a single shader, which is far too few. There are two possible approaches to expose many textures to the shader at once: array textures and texture atlases. The OpenGL renderer in OpenRCT2 uses a hybrid approach: an array texture of atlases.
Pure array textures were quickly dismissed, because those are also limited, to only 2048 elements on most implementations. Given that some frames contain over 10,000 sprites with many unique textures, this is simply not scalable enough. It was proposed to use multiple samplers in combination with multiple arrays, but unfortunately GLSL does not allow selecting a sampler using a dynamic index. Iterating over all of them would be bad for performance.
Instead, the implementation uses a collection of texture atlases. Texture atlases are created for a specific power-of-two texture size like 32x32, 64x64 and so on. If a new texture comes in and there is no suitable texture atlas with any free spots left, then a new atlas is created and added to the array texture of atlases. All of the atlases have the same dimensions (required for texture arrays), but the images within do not. Each texture image is assigned a single square region in the appropriate atlas. The square regions are always equal to the power-of-two size that the atlas was created for. The idea of having different atlases for different power-of-two size textures is to ensure tight packing to waste as little VRAM as possible.
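The allocation scheme described above could look roughly like the following sketch. It is not the OpenRCT2 implementation: `AtlasCollection`, `AtlasSlot`, `Allocate` and the assumption that an image's slot size is the next power of two of its larger dimension are all hypothetical, and the actual texture upload is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Smallest power of two that is >= value, e.g. 27 -> 32.
inline uint32_t NextPowerOfTwo(uint32_t value)
{
    uint32_t result = 1;
    while (result < value)
        result <<= 1;
    return result;
}

// Where an image ended up: which atlas (array layer) and which square slot.
struct AtlasSlot
{
    uint32_t atlasIndex;
    uint32_t slotIndex;
};

class AtlasCollection
{
public:
    // atlasSize is the width and height shared by every atlas (assumed square).
    explicit AtlasCollection(uint32_t atlasSize) : _atlasSize(atlasSize) {}

    AtlasSlot Allocate(uint32_t imageWidth, uint32_t imageHeight)
    {
        const uint32_t slotSize = NextPowerOfTwo(std::max(imageWidth, imageHeight));
        assert(slotSize <= _atlasSize && "image does not fit in an atlas");

        // Reuse an existing atlas for this slot size if it has a free slot.
        for (uint32_t i = 0; i < _atlases.size(); i++)
        {
            Atlas& atlas = _atlases[i];
            if (atlas.slotSize == slotSize && atlas.usedSlots < atlas.capacity)
                return { i, atlas.usedSlots++ };
        }

        // Otherwise append a new atlas dedicated to this slot size.
        const uint32_t slotsPerRow = _atlasSize / slotSize;
        _atlases.push_back({ slotSize, slotsPerRow * slotsPerRow, 1 });
        return { static_cast<uint32_t>(_atlases.size() - 1), 0 };
    }

    std::size_t AtlasCount() const { return _atlases.size(); }

private:
    struct Atlas
    {
        uint32_t slotSize;  // power-of-two edge length of each slot
        uint32_t capacity;  // number of slots that fit in the atlas
        uint32_t usedSlots; // slots handed out so far
    };

    uint32_t _atlasSize;
    std::vector<Atlas> _atlases;
};
```

Because every slot in a given atlas has the same size, allocation is a simple counter and no space is lost to fragmentation inside an atlas; the only waste is the padding between an image and its power-of-two slot.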
Here are two examples of what atlases look like in the array texture (from RenderDoc):
Once a texture is in an atlas like that, the draw command only needs 2 parameters to access the texture: the index of the atlas and the top-left to bottom-right coordinates within the atlas. The bottom-right coordinates are based on the actual size of the image and not the entire power-of-two block. As many atlases are created as needed when new textures come in, and the system can scale up to 2048 atlases simultaneously on most implementations.
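As a rough illustration of how the coordinates within an atlas might be derived, the sketch below maps a slot index to a rectangle whose bottom-right corner follows the actual image size. The function name, the row-major slot numbering and the use of normalised 0..1 coordinates are assumptions made for this example, not details taken from the renderer.

```cpp
#include <cassert>
#include <cstdint>

// Normalised texture coordinates (0..1) of an image inside its atlas.
struct AtlasBounds
{
    float left, top, right, bottom;
};

// Assumes slots are numbered row by row, starting at the top-left of the atlas.
inline AtlasBounds GetAtlasBounds(
    uint32_t atlasSize, uint32_t slotSize, uint32_t slotIndex, uint32_t imageWidth, uint32_t imageHeight)
{
    assert(slotSize > 0 && slotSize <= atlasSize);
    assert(imageWidth <= slotSize && imageHeight <= slotSize);

    const uint32_t slotsPerRow = atlasSize / slotSize;
    assert(slotIndex < slotsPerRow * slotsPerRow);

    // Top-left corner of the slot, in texels.
    const uint32_t x = (slotIndex % slotsPerRow) * slotSize;
    const uint32_t y = (slotIndex / slotsPerRow) * slotSize;

    // The bottom-right corner uses the real image size, not the slot size,
    // so the unused padding in the slot is never sampled.
    const float scale = 1.0f / static_cast<float>(atlasSize);
    return {
        static_cast<float>(x) * scale,
        static_cast<float>(y) * scale,
        static_cast<float>(x + imageWidth) * scale,
        static_cast<float>(y + imageHeight) * scale,
    };
}
```

For example, a 16x8 image in slot 3 of a 64x64 atlas with 32x32 slots starts at texel (32, 32) and ends at texel (48, 40).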
Rectangles and lines are added to the command buffers, but are currently still processed as individual draw calls. This is because nearly all rectangles currently depend on previously rendered contents for correct transparency, and lines are so rarely used that it's not worth optimising them.
Sprites make up over 99% of the draw calls per frame, so this is where effort was focused when optimising the OpenGL renderer. Once the sprite command buffer is flushed, the parameters of each individual draw call are turned into a GPU-friendly format and written to a vertex buffer. These parameters include the bounds, tertiary colour, texture atlas index, atlas coordinates, and so on. They may also include an atlas index and coordinates for a mask texture (raw sprites and masked sprites are combined). This data represents instance data for instanced rendering with glDrawArraysInstanced. The old shaders used uniforms to set the data for each sprite, and they were modified to instead load the data from the instance buffer. Instance data is provided as vertex attributes that are loaded per-instance with glVertexAttribDivisor.
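The conversion into instance data might be sketched as follows. The struct layouts and names (`QueuedSprite`, `SpriteInstance`, `BuildInstanceData`) are hypothetical, and the OpenGL calls appear only as comments because they need a live context; the attribute indices and the `GL_TRIANGLES` mode shown there are likewise assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical queued parameters for one sprite.
struct QueuedSprite
{
    int32_t left, top, right, bottom;              // screen-space bounds
    float texLeft, texTop, texRight, texBottom;    // coordinates within the atlas
    int32_t atlasIndex;
    uint8_t tertiaryColour;
};

// Tightly packed per-instance record; one of these is read per sprite.
struct SpriteInstance
{
    int32_t bounds[4];
    float texCoords[4];
    int32_t atlasIndex;
    int32_t tertiaryColour;
};
static_assert(sizeof(SpriteInstance) == 40, "instance layout must match the attribute stride");

// Turns the queued commands into a contiguous array ready for upload.
inline std::vector<SpriteInstance> BuildInstanceData(const std::vector<QueuedSprite>& sprites)
{
    std::vector<SpriteInstance> instances;
    instances.reserve(sprites.size());
    for (const QueuedSprite& s : sprites)
    {
        assert(s.left <= s.right && s.top <= s.bottom);
        instances.push_back({
            { s.left, s.top, s.right, s.bottom },
            { s.texLeft, s.texTop, s.texRight, s.texBottom },
            s.atlasIndex,
            static_cast<int32_t>(s.tertiaryColour),
        });
    }
    return instances;
}

// With a GL context, the upload and draw could then look roughly like:
//   glBufferData(GL_ARRAY_BUFFER, instances.size() * sizeof(SpriteInstance),
//                instances.data(), GL_STREAM_DRAW);
//   glVertexAttribIPointer(1, 4, GL_INT, sizeof(SpriteInstance), nullptr);
//   glVertexAttribDivisor(1, 1); // advance this attribute once per instance
//   ... remaining attributes set up the same way ...
//   glDrawArraysInstanced(GL_TRIANGLES, 0, 6, (GLsizei)instances.size());
```

A divisor of 1 tells OpenGL to step the attribute once per instance rather than once per vertex, which is what lets the shared quad vertices be reused for every sprite.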
When viewing most of the intro parks at a resolution of 2560x1440, up to 30,000 sprites are rendered per frame. The optimised OpenGL renderer can draw all of these with a single call to glDrawArraysInstanced. It was impossible to achieve a stable frame rate by drawing each of these individually, except maybe with an API like Vulkan where draw calls have much lower overhead.
Are there more optimization opportunities left? Definitely. The sprites are currently drawn by instancing over just 6 vertices (a quad), which is not something that drivers and GPUs are very fond of. It may be possible to achieve even higher frame rates by generating all of the vertices and not using instancing, but initial attempts to do this actually yielded worse performance. It is probably wise to explore more obvious opportunities first, like changing the OpenGL renderer to do dirty-only drawing, that is, redrawing only the parts of the screen that have changed. Initial tests have shown that this can improve frame rates significantly.