kvark's picture

KRI features: bounding boxes and hierarchical Z-culling

Hello everyone!

Today I'd like to introduce a freshly implemented set of render passes that operate on bounding boxes.

Bounding box generation on GPU
It works for each mesh as follows:

  1. The whole mesh is sent to the GPU as an array of vertex positions.
  2. A geometry shader emits each position as a color into the first of the object's two target pixels.
  3. The same shader emits a negated (multiplied by -1) copy of the position, also as a color, into the second target pixel.
  4. The target FBO has a 1D RGBA32F color render buffer containing 2 pixels for each object in the scene. It is bound with a viewport selecting only the 2 pixels corresponding to the current object.
  5. The blend equation is set to Minimum, with both blend factors set to 1.
  6. After all objects' BBoxes are updated, the render buffer data is read into a buffer object.

As a result, I have a buffer object containing the minimum vertex position and the negated maximum vertex position (in local space) for each mesh. If we need them on the CPU side, we can read them back at any time.
Note that models can be morphed and skinned as you like and will still produce correct bounding boxes! You just need to update the boxes whenever vertex positions change.
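
To show why a single Minimum blend is enough to get both corners of the box, here is a small CPU-side sketch in Python. It is only an emulation of the idea, not KRI code: the function name is made up, and it assumes both pixels start out at +infinity (the post does not say what the buffer is cleared to).

```python
def bbox_by_min_blending(positions):
    """Emulate Minimum blending into two pixels: [min(p), min(-p)]."""
    inf = float("inf")
    # Assumption: both pixels start at +inf so any real value replaces them.
    pixel_min = [inf, inf, inf]       # first pixel: receives the position
    pixel_neg_max = [inf, inf, inf]   # second pixel: receives the negated position
    for p in positions:
        for axis in range(3):
            pixel_min[axis] = min(pixel_min[axis], p[axis])
            pixel_neg_max[axis] = min(pixel_neg_max[axis], -p[axis])
    return pixel_min, pixel_neg_max


verts = [(1.0, 2.0, -3.0), (-1.0, 0.5, 4.0), (0.0, -2.0, 1.0)]
lo, neg_hi = bbox_by_min_blending(verts)
hi = [-c for c in neg_hi]   # min(-p) == -max(p), so negate to recover the maximum
print(lo, hi)               # [-1.0, -2.0, -3.0] [1.0, 2.0, 4.0]
```

Since min(-p) equals -max(p), one blend mode covers both extremes and the two pixels can be written in the same pass.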

Bounding box drawing
We definitely want to see the result on the screen. Here is how it's done in KRI:

  1. We send the corresponding objects' spatial data into another buffer object. In the case of KRI it's 2 vectors (position+scale and rotation).
  2. We issue a draw call on the global mesh containing bounding boxes and spatial information of the objects.
  3. Geometry shader reconstructs all 8 points of a bounding box and transforms them into camera projection space.
  4. The same shader generates 12 lines based on the bounding box vertices.
  5. Rendering states: DepthTest:ON, DepthWrite:OFF, LineOffset:-1,-1

As a result, we have all bounding boxes drawn to the screen in a single draw call! It's extremely cheap, not to mention that the BBox information never leaves the GPU, so there are no extra bus transfer costs.
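
The corner and line generation from steps 3 and 4 can be sketched on the CPU like this. It is a Python illustration with made-up function names, it stays in local space (the transform into camera projection space is left out), and the corner ordering is my own choice rather than something taken from the KRI shader.

```python
def box_corners(lo, neg_hi):
    """Rebuild the 8 corners from the stored minimum and negated maximum."""
    hi = [-c for c in neg_hi]
    # Bit `axis` of the corner index picks the min (0) or max (1) on that axis.
    return [
        tuple(hi[axis] if (index >> axis) & 1 else lo[axis] for axis in range(3))
        for index in range(8)
    ]


def box_edges():
    """Index pairs of the 12 edges: corners that differ in exactly one bit."""
    return [
        (index, index | (1 << axis))
        for index in range(8)
        for axis in range(3)
        if not index & (1 << axis)
    ]


corners = box_corners([-1.0, -2.0, -3.0], [-1.0, -2.0, -4.0])
edges = box_edges()
print(len(corners), len(edges))   # 8 12
```

Each edge joins two corners that differ along a single axis, which is why 8 corners give exactly 12 lines.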

Constructing Z-buffer mip chain
Before culling anything, we need to fill all mipmap levels of our Z buffer. In KRI this is done by writing a shader and calling a single helper function (kri.gen.Texture.createMipmap). The shader takes the maximum depth of the 4 source texels and writes it into the next mip level as the depth value.
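
As a rough illustration of that reduction, here is a CPU-side sketch in Python. It is not what kri.gen.Texture.createMipmap does internally; the function names are hypothetical, and the code assumes a power-of-two depth buffer stored as a list of rows, with larger values meaning farther away.

```python
def next_depth_level(level):
    """Halve a depth image, keeping the maximum (farthest) of each 2x2 block."""
    h, w = len(level), len(level[0])
    nh, nw = max(1, h // 2), max(1, w // 2)
    out = []
    for y in range(nh):
        ys = (min(2 * y, h - 1), min(2 * y + 1, h - 1))
        row = []
        for x in range(nw):
            xs = (min(2 * x, w - 1), min(2 * x + 1, w - 1))
            row.append(max(level[j][i] for j in ys for i in xs))
        out.append(row)
    return out


def build_depth_mips(depth):
    """Return [level 0, level 1, ...] down to a single texel."""
    chain = [depth]
    while len(chain[-1]) > 1 or len(chain[-1][0]) > 1:
        chain.append(next_depth_level(chain[-1]))
    return chain


depth = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6, 0.7],
]
mips = build_depth_mips(depth)
print(mips[1])   # [[0.6, 0.8], [0.9, 0.7]]
print(mips[2])   # [[0.9]]
```

Taking the maximum keeps the test conservative: a coarse texel never claims an occluder is closer than the farthest depth it actually covers.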

Culling the scene
Here is the most interesting guest of our party. Within a single draw call we create a buffer containing boolean values of visibility, one for every object in a scene:

  1. We render the same mesh we constructed for BBox drawing, binding the Z-buffer as a texture (containing our mip map)
  2. For each BBox the shader computes its camera projection space box (also 3D).
  3. Then it finds the lowest mip level at which 4 neighbouring texels cover the entire box.
  4. By comparing the maximum of these depth values with the near-plane depth of the screen-space box, the shader sets the boolean visibility output.
  5. Transform feedback gathers all visibility values; the rasterizer is disabled for this stage.
  6. Now we need to read them back into system memory in order to discard draw calls in the future.

That's it! Thanks to Transform Feedback and our intelligent BBox handling on the GPU, we can cull the whole scene in a single draw call!
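
The per-box test from steps 3 and 4 might look roughly like the following on the CPU. This is a Python sketch under several assumptions of mine: the box is already reduced to an inclusive pixel rectangle plus its nearest depth, depth grows with distance, and the mip chain is a list of row lists holding the maximum depth per texel. The function name is hypothetical.

```python
def is_box_visible(mips, x0, y0, x1, y1, z_near):
    """Conservative hierarchical-Z test for one screen-space box."""
    # Climb the chain until the rectangle spans at most 2 texels per axis.
    level = 0
    while level < len(mips) - 1 and (
        (x1 >> level) - (x0 >> level) > 1 or (y1 >> level) - (y0 >> level) > 1
    ):
        level += 1
    tex = mips[level]
    h, w = len(tex), len(tex[0])
    xs = {min(x0 >> level, w - 1), min(x1 >> level, w - 1)}
    ys = {min(y0 >> level, h - 1), min(y1 >> level, h - 1)}
    farthest = max(tex[y][x] for y in ys for x in xs)
    # Hidden only if the box's nearest point lies behind everything sampled.
    return z_near <= farthest


# Hand-written max-depth chain for a 4x4 buffer (levels 0, 1, 2).
mips = [
    [
        [0.1, 0.2, 0.3, 0.4],
        [0.5, 0.6, 0.7, 0.8],
        [0.9, 0.1, 0.2, 0.3],
        [0.4, 0.5, 0.6, 0.7],
    ],
    [[0.6, 0.8], [0.9, 0.7]],
    [[0.9]],
]
small_box = is_box_visible(mips, 0, 0, 1, 1, 0.7)   # sampled at level 0
large_box = is_box_visible(mips, 0, 0, 3, 3, 0.7)   # sampled at level 1
print(small_box, large_box)                          # False True
```

The test can only err towards "visible", so a box is never culled while part of it could still be seen.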

Hope it helps and inspires you!

KRI Bounding box drawing
KRI Hierarchical Z culling


DreaminD's picture

That's interesting... But where do you get Z-information to fill depth mipmap in the first place? Do you render all occluders beforehand?

kvark's picture

An EarlyZ pass has always been the standard way of starting frame rendering in KRI, so the Z mipmaps are created from it.

There are a couple of differences with a conventional hier-Z scheme:

  1. My Z mip map is constructed from the full Z buffer, while generally you can start from a 256x256 level and go down from there.
  2. My Z mip map includes all non-transparent objects, while other implementations suggest rendering only the simplest and biggest occluders.
  3. I'm not using bounding spheres, nor do I need to have my bounding volumes on CPU side.