Performance and Profiling

This page describes how to profile and improve the plugin's performance.

General Profiling tips

The plugin is tightly integrated with Unreal's profiling tools.

You can use the following commands to show the various voxel stats categories:

  • stat Voxel

  • stat VoxelMemory

  • stat VoxelCounters

  • stat VoxelProcMeshMemory

All the plugin functions that might affect performance are profiled. Additionally, you can enable the slow stats by adding #define VOXEL_SLOW_STATS 1 to VoxelUserGlobals.h.

Note, however, that this has a significant impact on performance.
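As a minimal sketch, the define mentioned above would look like this inside VoxelUserGlobals.h. Where exactly it sits in that file is an assumption; only the define itself comes from this page.

```cpp
// VoxelUserGlobals.h (excerpt, placement within the file is an assumption)
// Enables the additional, slower stats. Remove this line again once you are
// done profiling, since it has a significant impact on performance.
#define VOXEL_SLOW_STATS 1
```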

Editor performance

A simple way to speed up the plugin during development is by lowering your LOD Range value on the VoxelWorldEditorControls actor in your scene, for instance to 1000. Note that this value is not persisted and will need to be re-set upon reloading the level.

Rendering performance

Rendering performance is going to be limited by either the CPU or the GPU.

Run stat VoxelCounters in the console. You will see stats like these:

The two lines of interest here are:

  • Num Voxel Triangles Drawn: if this is too high, the GPU will spend a long time rendering the scene.

Additionally, a lot of memory will be used to store the vertices. To decrease it, reduce your voxel invokers' LOD range or increase your voxel size.

  • Num Voxel Draw Calls: this is the number of draw calls needed to draw the voxel world. Having too many draw calls will make you CPU bound.

By default, there is one draw call per material per chunk. If you have many different materials displayed in a Single Index or Double Index material config, your draw call count can quickly explode.

To reduce draw calls, you can enable Mesh Merging, simplify your generator so that it produces fewer distinct materials on high LODs (using the LOD node), or switch to a 5-way blend (see Materials).

You can get a better idea of your draw calls using voxel.renderer.ShowMeshSections 1: one color = one draw call.

Mesh Merging

The plugin has a builtin option to merge voxel meshes together if they have the same material.

This can greatly reduce draw calls. The downside is a very slightly slower update time, as the mesh buffers need to be merged together. Culling might also be less efficient.

  • Expand the Voxel - Rendering section and tick Merge Chunks:

  • You can configure the Cluster Size. A higher cluster size means fewer draw calls, but also less culling and bigger meshes, so you might want to experiment with the value to find what works best for you.

Note that due to an artificial limitation in the way clusters are built, only chunks with the same LOD can be merged.

  • Do Not Merge Collisions And Navmesh will create separate (not rendered) meshes just for collisions and navmesh.

This is recommended as cooking collisions for merged meshes gets really expensive.

Merging can lead to huge improvements. Consider this 1024x1024 flat world with a Max LOD of 0:

By default, this requires 1024 draw calls to be drawn. Enabling Merge Chunks with a Cluster Size of 64:

Only 256 draw calls now! And if we crank the Cluster Size up to 512, we only get 4 draw calls!
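The numbers above can be reproduced with some simple arithmetic. The sketch below is not plugin code: it assumes a single material, a single chunk layer, a chunk size of 32 voxels at LOD 0 (which matches the 1024 draw calls quoted above), and a Cluster Size expressed in voxels. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Number of chunks (or clusters) needed to cover one axis: ceil(Extent / Size).
inline std::int64_t CeilDiv(std::int64_t Extent, std::int64_t Size)
{
    assert(Extent >= 0 && Size > 0);
    return (Extent + Size - 1) / Size;
}

// Estimated draw calls for a flat WorldSize x WorldSize world where every
// MergeSize x MergeSize block of voxels ends up in one draw call.
// Pass the chunk size for the unmerged case, or the Cluster Size when merging.
inline std::int64_t EstimateFlatWorldDrawCalls(std::int64_t WorldSize, std::int64_t MergeSize)
{
    const std::int64_t PerAxis = CeilDiv(WorldSize, MergeSize);
    return PerAxis * PerAxis;
}
```

With these assumptions, EstimateFlatWorldDrawCalls(1024, 32) gives the unmerged count, while passing 64 or 512 gives the merged counts quoted above.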

Tips

If you get a lot of draw calls with your generator, it might be because your surface lies both below and above Z = 0, which causes the plugin to create one chunk for Z < 0 and one for Z > 0. Try moving your surface up a little so that it fits into a single chunk layer (e.g. by using Z - 16 instead of Z).

If we move the surface a bit, you can see that it intersects far fewer chunks, halving the draw calls!
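To see why a small vertical shift helps, the following sketch counts how many vertical chunk layers a surface touches. It assumes chunks are aligned to multiples of the chunk size (so that Z = 0 is a chunk boundary, as described above) and uses a chunk size of 32 purely as an example value. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Floor division that also works for negative coordinates
// (plain integer division in C++ truncates toward zero).
inline std::int64_t FloorDiv(std::int64_t A, std::int64_t B)
{
    assert(B > 0);
    return A / B - ((A % B != 0 && A < 0) ? 1 : 0);
}

// Number of vertical chunk layers touched by a surface whose height stays
// within [MinZ, MaxZ], with layer k covering [k * ChunkSize, (k + 1) * ChunkSize).
inline std::int64_t CountChunkLayers(std::int64_t MinZ, std::int64_t MaxZ, std::int64_t ChunkSize)
{
    assert(MinZ <= MaxZ);
    return FloorDiv(MaxZ, ChunkSize) - FloorDiv(MinZ, ChunkSize) + 1;
}
```

A surface oscillating between Z = -8 and Z = 8 touches two layers. Shifted up by 16 it stays between 8 and 24 and touches only one.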

Generation Performance

Generation performance is mainly limited by the world generator. You can confirm that by using voxel.mesher.PrintStats.

This will print generation statistics to the output log.

Stats are recorded per level (UWorld). You can clear the current level stats using voxel.mesher.ClearStats.

For example, with a Flat World, you might get this kind of stats:

We can see that even with a flat world, nearly 50% of the generation time is spent querying the world generator!

To improve generation times, you thus need to improve your world generator speed.

To do so, you can either:

  • Use a simpler world generator with fewer layers/fractals

  • Improve your generator's range analysis

Range analysis

Range analysis is based on a simple principle: only a minor portion of the entire voxel world actually needs to have a mesh; a lot of the chunks end up empty.

Range analysis is a heuristic that tries to predict which chunks are guaranteed to be empty.

It works by querying the world generator with the chunk bounds, asking it which range of densities the chunk can have.

A good range analysis can improve a generator's performance by several orders of magnitude.

Be careful not to underestimate the range, otherwise you will get holes in your world!

Voxel Graphs

If you have a voxel graph, range analysis is done automatically for you: you can see it by pressing the Range Analysis Debug button:

It will show the range of densities the currently previewed bounds can have:

Here, X and Y are between -256 and 256. The IQ Noise node Value output is between 0.72 and 0.79. This is computed by sampling the noise at a large number of points.

Then, using some basic range math, we end up with a value between -400 and 362: the chunk needs to be generated as the range contains 0.

However, if we say that Z is 512 now:

Here the final value is between 112 and 874: the chunk can safely be skipped.
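The "basic range math" can be illustrated with a small interval sketch. This is not the plugin's implementation: DensityRange and the helper functions are hypothetical names. The only inputs taken from this page are the two ranges quoted above, which are consistent with the Z offset of 512 simply shifting the whole range by 512.

```cpp
#include <cassert>

// Closed interval [Min, Max] of density values a chunk can take (hypothetical type).
struct DensityRange
{
    double Min;
    double Max;
};

// Range of (A + B) when A and B vary independently within their ranges.
inline DensityRange Add(const DensityRange& A, const DensityRange& B)
{
    return { A.Min + B.Min, A.Max + B.Max };
}

// Range of (A + Constant): both bounds are shifted by the same amount.
inline DensityRange Add(const DensityRange& A, double Constant)
{
    return { A.Min + Constant, A.Max + Constant };
}

// A chunk can only contain a surface if its density range contains 0.
// If this returns false, the chunk can safely be skipped.
inline bool CanContainSurface(const DensityRange& Density)
{
    assert(Density.Min <= Density.Max);
    return Density.Min <= 0.0 && 0.0 <= Density.Max;
}
```

Starting from the range [-400, 362], CanContainSurface returns true, so the chunk must be generated. Adding 512 yields [112, 874], which no longer contains 0.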

C++ World Generators

In C++, you need to do the range analysis yourself in GetValueRangeImpl. See World Generators.

Debugging

You can debug the results of range analysis using voxel.renderer.ShowChunksEmptyStates 1 and voxel.renderer.ClearChunksEmptyStates.

Green chunks are entirely skipped and are thus very cheap to compute. Red chunks need to have all their values queried, and are very expensive to compute.

For instance this is the result for our voxel graph above:

Sometimes you might want to turn off all range analysis (e.g. if you are getting holes, to check whether range analysis is the cause).

You can do so using voxel.mesher.DoNotSkipEmptyChunks 1.

Multithreading settings

The plugin is heavily multithreaded, with nearly everything happening async. It uses its own thread pool to handle task priorities.

You can configure the number of threads to allocate to your voxel world using Number Of Threads:

If you have multiple voxel worlds, they should share the same pool. See Sharing Pools Between World.

Priority Updates

By default, task priorities are updated quite frequently as they are based on the distance from the voxel invokers: they need to be updated in case the voxel invokers moved.

How often can be configured using the Priority Duration property:

However, if you have a large number of tasks (> 100000), recomputing task priorities is going to get really expensive.

You can check this by watching how long FVoxelQueuedThreadPool::ReturnToPoolOrGetNextJob takes:

If it's significant, try turning Constant Priorities on and see if it generates faster:

Custom Priority Order

You can configure task priorities using the Priority Categories and Priority Offsets properties:

Tasks are first going to be sorted on their priority category. For instance here Render Octree tasks will always be computed first.

Then, some tasks have dynamic priorities: this includes mesh merging tasks, meshing tasks, collision cooking tasks and foliage build tasks.

The tasks will be sorted by how far they are from the voxel invokers.

If you want two tasks in the same category and at the same distance from an invoker to be computed in a specific order, use the Priority Offsets.

For instance here foliage build, collision cooking and the meshing of chunks that are visible and have collision all have the same priority category.

However, they have the following offsets:

Thus, for tasks that are at the same distance from invokers, collision cooking tasks will be computed first, then meshing and finally foliage.

Internally, the following 64 bit priority will be used:

(PriorityCategory << 32) | (TaskPriority + PriorityOffset)

Here, TaskPriority is usually the distance to the invokers.
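A minimal sketch of how such a 64-bit key could be packed is shown below. The function name and the integer types are assumptions (the plugin's actual types are not shown on this page), and the sketch assumes that TaskPriority + PriorityOffset fits in the lower 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Packs the category into the upper 32 bits and the per-task priority
// (plus its offset) into the lower 32 bits.
// Hypothetical helper: the parameter types are assumptions.
inline std::uint64_t MakePriorityKey(std::uint32_t PriorityCategory,
                                     std::uint32_t TaskPriority,
                                     std::int32_t PriorityOffset)
{
    // Unsigned arithmetic keeps the sum within 32 bits, so it can never
    // spill into the category bits.
    const std::uint32_t Low = TaskPriority + static_cast<std::uint32_t>(PriorityOffset);
    return (static_cast<std::uint64_t>(PriorityCategory) << 32) | Low;
}
```

Because the category occupies the upper bits, comparing two keys compares categories first. The task priority and offset only break ties within the same category, which matches the sorting described above.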

Async Edits

Most voxel edit nodes have two versions: a single threaded one that will run on the game thread, and a latent/async one.

For instance:

If your game thread is slowed down by doing the edits directly, try using the async functions!

Thread Safety Note

The plugin has a spatial locking mechanism: you can lock specific bounds of the world either Read-Only or Read-Write.

This works by locking the data octrees overlapping the bounds.

While this adds a significant cost to locking, it's vital in order to be able to edit part of the world while another part is updating.

In C++, use FVoxelReadScopeLock and FVoxelWriteScopeLock.

Sharing Pools Between World

Voxel Thread Pools are designed to be shared between voxel worlds. A voxel world uses the following order to find its pool:

1) the voxel world is created

2) it looks for a pool

3) if it's set up to create its own pool, it creates its own pool for the active UWorld (unless one already exists)

4) else it looks for a pool for the active UWorld

5) else it looks for the global pool

6) else it creates a new pool with default settings
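The lookup order above can be sketched as a small decision function. All names here are hypothetical and do not correspond to plugin types. The sketch also assumes that "unless one already exists" means the existing pool of the active UWorld is reused.

```cpp
#include <cassert>

// Where the voxel world ends up getting its thread pool from (hypothetical enum).
enum class EPoolSource
{
    ExistingWorldPool, // a pool already registered for the active UWorld
    NewOwnPool,        // a pool created by this voxel world for the active UWorld
    GlobalPool,        // the global pool
    NewDefaultPool     // a new pool created with default settings
};

struct FPoolLookupInput
{
    bool bCreatesOwnPool;   // the voxel world is set up to create its own pool
    bool bWorldPoolExists;  // a pool already exists for the active UWorld
    bool bGlobalPoolExists; // a global pool exists
};

inline EPoolSource ResolvePool(const FPoolLookupInput& In)
{
    if (In.bCreatesOwnPool)
    {
        // Assumption: if the UWorld already has a pool, it is reused
        // instead of creating a second one.
        return In.bWorldPoolExists ? EPoolSource::ExistingWorldPool : EPoolSource::NewOwnPool;
    }
    if (In.bWorldPoolExists)
    {
        return EPoolSource::ExistingWorldPool;
    }
    if (In.bGlobalPoolExists)
    {
        return EPoolSource::GlobalPool;
    }
    return EPoolSource::NewDefaultPool;
}
```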

Runtime Performance Information

The plugin supports querying the memory usage of different parts of the plugin at runtime, even in Shipping Builds.

This can be useful to give your users feedback on how much memory their voxel terrain is using.

You can use the following functions:

The following memory categories are available:

Voxel Memory Usage and Save File Size

One concern with voxels is that they can take a lot of memory. This is improved a lot by the plugin only storing edited data in memory.

In addition to that, several additional options are available.

Check for Single Values

This optimization will check if values in a data chunk are all identical. If it's the case, only one will be stored.

To debug, use voxel.data.ShowValuesState 1.
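Conceptually, the check boils down to something like the sketch below. It uses a plain std::vector as a stand-in for a data chunk. The plugin's actual storage layout is not shown on this page, so the function name and container type are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns true if every value in the chunk equals the first one,
// in which case a single value is enough to represent the whole chunk.
template <typename T>
bool IsSingleValue(const std::vector<T>& Values)
{
    return !Values.empty() &&
        std::all_of(Values.begin(), Values.end(),
                    [&Values](const T& Value) { return Value == Values.front(); });
}
```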

Round Voxels

Voxel densities only need to be stored up to 2 voxels away from the surface (2 for good gradient normals).

Beyond that, densities can safely be rounded to -1 or 1, greatly increasing compression efficiency.
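To illustrate the idea (this is not the plugin's implementation), here is a one-dimensional sketch operating on a column of densities. It treats a voxel as close to the surface if a voxel of the opposite sign lies within KeepDistance voxels of it. The function name, the 1D simplification and the sign test are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Rounds densities to -1 or 1 unless a voxel of the opposite sign lies
// within KeepDistance voxels, i.e. unless the surface is nearby.
inline std::vector<float> RoundFarFromSurface(const std::vector<float>& Densities, int KeepDistance = 2)
{
    const int Num = static_cast<int>(Densities.size());
    std::vector<float> Result = Densities;
    for (int Index = 0; Index < Num; ++Index)
    {
        const bool bNegative = Densities[Index] < 0.f;
        const int First = std::max(0, Index - KeepDistance);
        const int Last = std::min(Num - 1, Index + KeepDistance);

        bool bNearSurface = false;
        for (int Other = First; Other <= Last; ++Other)
        {
            if ((Densities[Other] < 0.f) != bNegative)
            {
                bNearSurface = true; // sign change nearby: keep the exact density
                break;
            }
        }
        if (!bNearSurface)
        {
            Result[Index] = bNegative ? -1.f : 1.f;
        }
    }
    return Result;
}
```

After rounding, long runs of identical values appear away from the surface, which is what makes the data compress better.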

You can do this rounding using the Round Voxels function:

Compress Into Heightmap

If your world generator is a heightmap asset, you can write back vertical edits to it, only storing the truly volumetric edits.

Note that this is a lossy operation as you will get some banding.

Use the Compress Into Heightmap function:
