Devlog #2: Light Culling for Procedurally Generated Scenes in Unity
In this post, I discuss how I implemented a custom real-time light culling system in Unity for my game, Doors & Corners. This was critical to my game because, while I wanted a very simple low-poly art style, I had also decided to use a relatively advanced lighting setup. Specifically, I wanted every light to be a shadowcaster. This approach creates dynamic scenes with lots of shadows, but it can also quickly cripple a game's performance.
In Unity, one way to reduce the performance impact of shadowcasters is to use the lightmapper to bake shadow data for static objects in a scene. The lights in the scene can then be configured to use mixed-mode lighting, which allows them to use the baked data for rendering static objects and limits real-time rendering of shadows to dynamic objects only.
Unfortunately, this wasn't an option for Doors & Corners because it generates level data at runtime. Since Unity's built-in lightmapper processes scenes at edit time, this ruled out using it to bake the lighting data. As a result, every light had to be configured to real-time only mode. I could also have chosen to implement a custom lightmapper to bake the lighting data at runtime as part of the level generation, but this wasn’t a technical challenge that I wanted to take on.
A typical level in Doors & Corners uses a couple hundred lights to illuminate the rooms. Additionally, every projectile fired has a light attached to it, so a firefight between characters can add one to two dozen lights to a scene. With each of these configured as a real-time shadowcaster, the average frame time ballooned to 33ms (30 FPS). My development box is far from top end in 2021 (i7-6700K, 32GB RAM, GTX 1080 8GB), but it is also well above my target min-spec machine, so dramatic performance improvements were required.
The easiest solution to this problem would have been Unity's built-in occlusion culling feature, which prevents objects that aren't visible to the camera from being rendered. Unfortunately, just like the lightmapper, it works by baking data for scenes at edit time, so it also can't be used in games whose scenes are generated at runtime. To achieve the required performance, I would need to implement a custom culling system to stop lights that weren't visible from being rendered.
Initial Implementation: Distance-Based Culling
My initial approach to the culling system was the simplest possible implementation. The system looped over every light in the scene: if the distance from the light to the camera was greater than the culling distance, the light was turned off; otherwise, it was turned on. The culling distance was set to the maximum possible size of a room on the level.
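A minimal sketch of that loop, assuming a hypothetical `DistanceLightCulling` component and `cullingDistance` field (the names and values are illustrative, not the actual ones from the game):

```csharp
using UnityEngine;

// Sketch of the distance-based approach: any light further from the camera
// than the culling distance is disabled for that frame.
public class DistanceLightCulling : MonoBehaviour
{
    // Illustrative value; in the game this was the maximum possible room size.
    public float cullingDistance = 30f;

    private Light[] sceneLights;

    void Start()
    {
        // Cache the lights once the level has been generated.
        sceneLights = FindObjectsOfType<Light>();
    }

    void Update()
    {
        Vector3 cameraPosition = Camera.main.transform.position;

        foreach (Light sceneLight in sceneLights)
        {
            float distance = Vector3.Distance(sceneLight.transform.position, cameraPosition);
            sceneLight.enabled = distance <= cullingDistance;
        }
    }
}
```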
This approach delivered reasonably good results. The average frame time improved to around 16ms (60 FPS). It also didn’t take very long to implement – it doesn’t get much easier than looping over all the lights in a scene and checking their distance to the camera.
However, this approach had a couple of downsides. First, the results weren't always correct. There were cases where you could see into a room in which some of the lights were beyond the culling distance, which caused those lights to suddenly pop on as the camera moved closer to them. This didn't happen very often, since most rooms were much smaller than the culling distance and the lights in adjacent rooms would still be on, but if there was a really long room, or if several open doors lined up, you would see it. Not ideal.
The second, more important problem was that the performance still wasn't where it needed to be. If a frame took an average of 16ms on my machine, then (a) there was no headroom left in the frame budget for other features, and (b) since my machine was well above the intended minimum spec, performance on a min-spec machine would fall well below 60 FPS.
The root cause of these problems was that "distance to the camera" wasn’t a good heuristic for whether a light should be rendered in a scene. It allowed lights to be rendered that were near the camera but couldn’t be seen because they were behind a closed door in another room. At the same time, it wouldn't render a light that was located in a room visible through an open door but beyond the culling distance.
I needed a better approach for determining which lights should be rendered and which shouldn't.
Improved Implementation: Room-Based Culling
There are many different techniques for occlusion culling. Some are rather simple, while others are incredibly complex. Some only handle static objects, while others support dynamic ones. Some only work well with confined scenes, while others work well with any type of scene structure. I was looking for a method that would be relatively quick to implement but would still yield solid results.
I knew that portal rendering was a technique that would work well for Doors & Corners. The lights were mostly static and the levels were composed of a series of generally small rooms connected by doors. Additionally, it was a relatively simple technique to implement.
However, as I began to think more about the specific details of Doors & Corners, I realized that it had some constraints which allowed me to implement an even simpler and more performant solution. These constraints were:
- Each scene was composed of a set of rooms.
- Every object in a scene had to exist in a room.
- Rooms were connected to other rooms by doors.
- An output of the level generation process was a graph of rooms connected by doors.
- It was possible to calculate which room an object was in using only its worldspace coordinates.
- Doors were always closed, unless opened by a character.
- An open door would close automatically after a (relatively short) amount of time.
This meant that the algorithm could be as simple as:
- At the start of each frame, build a hashset of currently visible rooms. This could be done by starting with the room the camera was currently in and performing a breadth-first search of the room graph, adding every room connected via an open door to the hashset of visible rooms (see the sketch after this list).
- For each cullable object, determine whether it should be rendered based on whether the room it was in was present in the hashset of visible rooms.
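Here is a minimal sketch of that first step, assuming hypothetical `Room` and `Door` classes produced by the level generator; the actual types and field names in Doors & Corners will differ:

```csharp
using System.Collections.Generic;

// Hypothetical room-graph types produced by level generation.
public class Door
{
    public bool IsOpen;
    public Room RoomA;
    public Room RoomB;
}

public class Room
{
    public List<Door> Doors = new List<Door>();
}

public static class RoomVisibility
{
    // Breadth-first search of the room graph, starting from the camera's room.
    // Only rooms reachable through open doors are added to the visible set.
    public static HashSet<Room> ComputeVisibleRooms(Room cameraRoom)
    {
        var visible = new HashSet<Room> { cameraRoom };
        var queue = new Queue<Room>();
        queue.Enqueue(cameraRoom);

        while (queue.Count > 0)
        {
            Room current = queue.Dequeue();
            foreach (Door door in current.Doors)
            {
                // Closed doors terminate the search early, which keeps traversal cheap.
                if (!door.IsOpen)
                    continue;

                Room neighbour = door.RoomA == current ? door.RoomB : door.RoomA;
                if (visible.Add(neighbour))
                    queue.Enqueue(neighbour);
            }
        }

        return visible;
    }
}
```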
Traversing the graph was fast because of the nature of the data. Most doors were closed most of the time, so the search terminated very quickly. The culling test was fast because it was just checking whether a value was present in a hashset. Furthermore, for objects that were static, it only needed to calculate which room they were in once, when the level was generated. Dynamic objects were supported by this solution because their room could quickly be determined using only their worldspace position, and this calculation was cheap enough to do every frame.
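The per-object test then reduces to a hashset lookup. Here is a sketch of what a cullable-light component might look like, assuming a hypothetical `Level.RoomAt` helper that maps a worldspace position to the room containing it:

```csharp
using System.Collections.Generic;
using UnityEngine;

// Sketch of the per-light culling test. Level.RoomAt is a hypothetical helper
// that maps a worldspace position to the room containing it.
public class CullableLight : MonoBehaviour
{
    public bool isStatic = true;

    private Light cachedLight;
    private Room cachedRoom;

    void Start()
    {
        cachedLight = GetComponent<Light>();
        // Static lights never move, so their room only needs to be resolved once.
        cachedRoom = Level.RoomAt(transform.position);
    }

    // Called once per frame by the culling system after it rebuilds the visible-room set.
    public void ApplyCulling(HashSet<Room> visibleRooms)
    {
        // Dynamic lights (e.g. on projectiles) re-resolve their room from their
        // current position; the lookup is cheap enough to do every frame.
        Room room = isStatic ? cachedRoom : Level.RoomAt(transform.position);
        cachedLight.enabled = visibleRooms.Contains(room);
    }
}
```

A small driver component would then compute the visible-room set once per frame and pass it to each registered cullable light.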
Results
With the new room-graph-based culling system, the average frame time improved to 10ms (100 FPS), a 69.7% reduction over the implementation without culling. This improvement was sufficient for me to be comfortable with the performance implications for lower-spec machines.
The culling solution itself is straightforward to implement, handles both static and dynamic objects, and is always correct (i.e., it will never cull an object that should be visible).
Conclusion
My main takeaway from this effort was the importance of analyzing a game's specific data and constraints when building custom solutions for it, as these may create opportunities to use approaches that are easier to implement or more performant than standard, well-known techniques. While my approach to culling is not particularly sophisticated and is not generally applicable, it leverages the unique constraints of Doors & Corners to strike the right balance between ease of implementation and performance.