Worms: A Space Oddity
Worms, a classic hit computer game originally created by Team 17, was brought to the mobile world by THQ. The beauty of Worms comes not just from the gameplay but from the fun graphics and animations as well as the fully deformable terrain. This means that any portion of the ground the worms are standing (crawling?) on can be destroyed by the various weapons in the game (bazooka, shotgun, holy hand grenade or, my favourite, the sheep).
All Worms games have implemented parallax scrolling and deformable terrain, and the mobile version was no different. THQ Wireless did a great job bringing the fun of Worms to mobile. However, while the implementation they used for the game engine worked on the high-end J2ME phones it was designed for, the BlackBerry presented some challenges. Magmic was (for a time) the exclusive partner with THQ to bring their games to BlackBerry, and we wanted to replicate the great feel of the mobile game on BlackBerry. To accomplish this, the development team working on the game spent a lot of time (and I mean a LOT of time) working through various optimizations in the game, including implementing a fairly complex buffering solution that leverages many of the strategies my previous articles this week have described.
The slides and the animations on them are really important to understanding the various sections of this Case Study. If you do not download them and follow along, you will probably not have an easy time understanding how some of these optimizations affected the game.
Problem: Very Slow Frame Rate
The initial implementation of the game (using the same game engine design as the J2ME version) worked on BlackBerry but only in the most basic sense of the word.
The frame rate was basically unusable. There are three main game situations, and all three need sufficient frame rates to make the game fun. Throughout this article we’ll track each of these to show the improvements that were made.
|Stationary Camera||3-5 fps||Situation occurs when the player is not scrolling the screen|
|Moving Camera||1-2 fps||Looking around the board or switching focus between worms|
|Explosions||0 fps||BOOM goes the dynamite!|
The best of the three frame rates (fps) above is when basically nothing is happening. In an idle state, the game is still only able to max out at 5 fps. When explosions were happening, it was taking so much work to draw the screen that less than 1 frame per second could be shown. Usually explosions were causing the terrain to deform as well.
Why was the frame rate so bad?
There were four main reasons why the performance was so bad:
- Large Level Maps. The levels were on the order of 1024×512 pixels. The BlackBerries of the day used QVGA (320×240) resolution, meaning the levels were almost 7 times the size of the screen.
- Repainting Every Tick. The game was designed to run with a core game loop running at a fixed rate processing the game engine updates and input. At the end of each loop (tick) the screen was being repainted — whether there were any updates needed or not.
- Graphics.drawRGB(). The large images and map data were being drawn as straight up ARGB pixel data (basically an array of colour values) vs. using one of the built-in image format containers (e.g. Bitmap).
- Excessive Map Updates. Every time the terrain needed to be modified, the game engine would loop through the entire set of data for the terrain to check for and make modifications.
What can we do to improve this performance? Can we use buffering to make a difference?
Stage 1: Basic Optimizations & Full Screen Buffer
Buffering paints is not the only strategy for improving performance so the first step was to sweep through the game and see what non-buffering optimizations we could do.
- Look-up Tables for Trigonometric Functions: There is a lot of trigonometry used in Worms to calculate angles and explosion deformation impacts. By using pre-calculated lookup tables of approximate values for the basic trig functions (sin, cos, tan) instead of calculating the exact values, we were able to dramatically cut the time each trig call took (array lookup vs. math calculation). The trade-off was that the resulting values were slightly less accurate, but in practice this did not impact the behaviour in the game.
- Localize Painting to Viewport: By re-arranging some of the painting code and adding in some bounds checking, we eliminated a bunch of work that was done on objects not visible in the viewport (on-screen). By grouping related objects locally (as in their relative locations to each other) you can optimize the bounds checking as well.
- Remove some Detail: There were some minor animations and graphics that were not integral to the gameplay and added simply as finishing touches on platforms that could support them. By removing them, we reduced the required number of re-paints as well as the work to paint the screen. At the expense of a small decrease in production value, we made significant improvements in the performance.
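The lookup-table idea from the first bullet can be sketched roughly as follows. This is plain Java rather than the game's actual code; the class name TrigTable, the one-degree resolution and the use of float values are all assumptions made for illustration. It covers sin and cos only, since tan needs extra care near 90 degrees.

```java
// Hypothetical helper (not from the game): a sine table at one-degree
// resolution, built once at class-load time and shared by sin() and cos().
final class TrigTable {
    private static final int STEPS = 360;
    private static final float[] SIN = new float[STEPS];

    static {
        // Pay the cost of the real math once, up front.
        for (int i = 0; i < STEPS; i++) {
            SIN[i] = (float) Math.sin(Math.toRadians(i));
        }
    }

    private TrigTable() {
    }

    // Normalizes any integer angle, including negatives, into 0..359.
    private static int wrap(int degrees) {
        int d = degrees % STEPS;
        return d < 0 ? d + STEPS : d;
    }

    // Array lookup instead of a math call; accurate to the nearest degree.
    static float sin(int degrees) {
        return SIN[wrap(degrees)];
    }

    // cos(x) equals sin(x + 90), so one table serves both functions.
    static float cos(int degrees) {
        return SIN[wrap(degrees % STEPS + 90)];
    }
}
```

The trade-off described above shows up directly here: angles are rounded to whole degrees, so a finer table costs more memory in exchange for accuracy.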
Of course, the easiest and most basic type of buffering to implement and evaluate is a full-screen image. We created one buffer and rendered all content to it before rendering it on-screen. Drawing to an off-screen buffer is measurably faster than drawing directly to the screen, so the performance improvements here were twofold: first, we buffered static content, and second, we were drawing off-screen for the majority of the paint calls.
The results of these changes were already impressive:
|Stationary Camera||17 fps||Major improvement here to the point where the frame rate was more than acceptable.|
|Moving Camera||8-12 fps||Significant improvements here but the frame rate variance created choppiness while scrolling.|
|Explosions||0-3 fps||At least you got to see a few frames of animation, but still really bad.|
But the explosions were still slow and the camera moving was choppy; we were sure we could do better…
Stage 2: More Optimizations & Segmented Buffer
We knew there was more we could do to manipulate the data for the level as well as take advantage of load times to pre-render as much of the level as possible. The biggest change now was to implement a segmented buffer without having to re-write all the code in the game to deal with it.
A segmented buffer for the entire level will need to cover 1024 x 512 pixels. Since images much larger than the BlackBerry screen are not handled well by the BlackBerry OS (or weren’t in 2008) we broke the buffer up into a grid of 6 buffers: 3 wide by 2 high (about 171 x 256 each). To ease integration with the rendering code that expected to be drawing directly to the screen, we abstracted the buffers into SegmentedImage and SegmentedGraphics classes. These classes mirrored the required APIs of the Bitmap and Graphics classes (respectively) that are in the RIM API but offset the calls appropriately to make sure the rendering happened on the correct buffer segments.
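To illustrate how calls can be offset onto the correct segment, here is a minimal sketch in plain Java. It borrows the SegmentedImage name from the paragraph above, but the fields, methods and the use of int[] arrays in place of RIM Bitmap objects are assumptions for illustration and not the classes we actually shipped. The segment size is simply derived from the level size and the grid dimensions passed in.

```java
// Hypothetical stand-in for a segmented image: callers use level
// coordinates and the class routes each pixel to the right segment.
final class SegmentedImage {
    private final int width;       // full level width in pixels
    private final int height;      // full level height in pixels
    private final int cols;        // number of segments across
    private final int segW;        // width of one segment
    private final int segH;        // height of one segment
    private final int[][] pixels;  // one ARGB array per segment

    SegmentedImage(int width, int height, int cols, int rows) {
        this.width = width;
        this.height = height;
        this.cols = cols;
        // Round up so the grid always covers the whole level.
        this.segW = (width + cols - 1) / cols;
        this.segH = (height + rows - 1) / rows;
        this.pixels = new int[cols * rows][segW * segH];
    }

    // Index of the segment that contains level coordinate (x, y).
    int segmentIndexAt(int x, int y) {
        checkBounds(x, y);
        return (y / segH) * cols + (x / segW);
    }

    void setPixel(int x, int y, int argb) {
        // Translate level coordinates into segment-local coordinates.
        pixels[segmentIndexAt(x, y)][(y % segH) * segW + (x % segW)] = argb;
    }

    int getPixel(int x, int y) {
        return pixels[segmentIndexAt(x, y)][(y % segH) * segW + (x % segW)];
    }

    private void checkBounds(int x, int y) {
        if (x < 0 || y < 0 || x >= width || y >= height) {
            throw new IllegalArgumentException("outside level: " + x + "," + y);
        }
    }
}
```

Because the rendering code only ever sees level coordinates, it does not need to know how many segments exist or where their edges fall.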
We left the background image off this segmented buffer so that we could easily maintain the parallax scrolling effect by drawing the background image (offset appropriately for the viewport) and then the segmented buffer to the screen. The transparent areas on the segmented buffer allowed the background image to show through.
Note: Due to the way parallax scrolling works, objects in the background do not move as much, relative to the objects in the foreground, when the viewport scrolls. The greater the delta between the movement of objects in the foreground and background, the further away the background appears. Due to this behaviour, the background image was not nearly 1024×512 in size and so did not need to be separately segmented or buffered; it could be used directly.
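One common way to compute the background offset for this effect is to scale the viewport position by the ratio of the two scroll ranges. The sketch below is plain Java; the Parallax class, the method name and the linear mapping are assumptions for illustration, not necessarily what the game did.

```java
// Hypothetical helper: maps the viewport's scroll position in the level
// to the (smaller) scroll position of the background image on one axis.
final class Parallax {
    private Parallax() {
    }

    static int backgroundOffset(int viewportPos, int levelSize,
                                int backgroundSize, int screenSize) {
        int levelRange = levelSize - screenSize;           // how far the level can scroll
        int backgroundRange = backgroundSize - screenSize; // how far the background can scroll
        if (levelRange <= 0 || backgroundRange <= 0) {
            return 0; // nothing to scroll on this axis
        }
        // Widen to long so the multiplication cannot overflow an int.
        return (int) ((long) viewportPos * backgroundRange / levelRange);
    }
}
```

For example, with a 1024-pixel-wide level, a 320-pixel-wide screen and a background assumed (purely for illustration) to be 480 pixels wide, scrolling the viewport all the way right to x = 704 moves the background by only 160 pixels.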
The biggest trick in creating a usable segmented buffer for the deformable terrain is in dealing with transparency. When you create a Bitmap on BlackBerry, by default, the image has a white opaque background. In order to change the alpha level on the pixels in the Bitmap, we need to programmatically set the ARGB data on the newly created Bitmap for each pixel (making sure the AA values in the AARRGGBB formatted data are transparent). Unfortunately, even when you do this, you can’t then just get the Bitmap’s Graphics object and start making drawBitmap calls on it. The default behaviour of the drawBitmap method is to use the alpha values set in the destination Bitmap, not the source Bitmap, which leaves all your newly modified pixels still transparent.
To get around this, instead of drawBitmap, we needed to call drawRGB and set the pixel data directly. Normally, this is a major performance hit (drawBitmap is many times faster than drawRGB) but since we did not need to modify large volumes of the pixel data frequently, the performance hit’s impact was mainly limited to the initial buffer rendering.
We did not need to modify a lot of the pixel data regularly because, once the terrain was fully rendered on the buffer, it only changed when there was an explosion (or some other action by the player that destroyed some of the terrain). Originally, every explosion and terrain deformation triggered a sweep through all of the terrain data to update pixels as needed. We decided to keep track of the maximum area that an explosion would impact (calculated dynamically at the moment of impact for whatever was exploding) and limit changes to that area of the buffer. This allowed us to leverage the large segmented buffer, with its transparency performance limitations, and use the drawRGB method for updating the buffer. The performance penalty of drawRGB vs. drawBitmap was mitigated by limiting the region we needed to update to the smallest area possible. We were also able to limit the inspection of the pixel data for the terrain (to determine what actions to take based on whether terrain exists in a pixel) to the minimal area needed for an explosion vs. the entire map.
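A rough sketch of limiting the work to the explosion's bounding box, in plain Java, might look like this. The TerrainCarver class, the circular crater shape and the flat int[] terrain array are assumptions for illustration; the real game calculated the affected area per weapon and wrote the result back with drawRGB.

```java
// Hypothetical helper: clears a circular crater in ARGB terrain data,
// touching only the pixels inside the explosion's bounding box.
final class TerrainCarver {
    static final int TRANSPARENT = 0x00000000; // alpha byte of zero

    private TerrainCarver() {
    }

    // Returns how many pixels were inspected, to show the saving
    // compared with sweeping all width * height pixels.
    static int carveCircle(int[] argb, int width, int height,
                           int cx, int cy, int radius) {
        // Clamp the bounding box to the terrain so edge hits stay in range.
        int left = Math.max(0, cx - radius);
        int top = Math.max(0, cy - radius);
        int right = Math.min(width - 1, cx + radius);
        int bottom = Math.min(height - 1, cy + radius);
        int r2 = radius * radius;
        int inspected = 0;
        for (int y = top; y <= bottom; y++) {
            int dy = y - cy;
            for (int x = left; x <= right; x++) {
                int dx = x - cx;
                inspected++;
                if (dx * dx + dy * dy <= r2) {
                    argb[y * width + x] = TRANSPARENT; // terrain destroyed
                }
            }
        }
        return inspected;
    }
}
```

The same bounding box can then be reused as the only region that needs to be pushed back into the buffer.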
|Stationary Camera||12 fps||We ended up doing more work with a stationary camera so lost some of the performance here.|
|Moving Camera||12 fps||The moving camera normalized out to a constant frame rate which eliminated the choppy frames.|
|Explosions||8-10 fps||A major improvement during explosions to the point where we were satisfied with achieving about 10 fps (our minimum goal for the explosions).|
The game was looking very good, but could we possibly squeeze some extra performance out of our code?
Stage 3: Double Buffering
The performance was looking good but painting a subsection of a 1024 x 512 segmented buffer still takes some time. If the viewport overlaps 4-6 of the segments, then we have the overhead of figuring out which segments to paint and then individually painting them each to screen. In addition, the background image is larger than the screen size so there’s some overhead in painting that as well. If we’re not moving around, why not just cache the current screen with all the layers merged?
An extra layer of buffering was added to implement double buffering. Normally, double buffering would be painting everything to an off-screen buffer and then painting that buffer to screen, but since we already have off-screen buffering, I guess this is triple buffering =).
It’s faster to render to an off-screen buffer than it is directly to the screen so that’s just what we did. We created a full screen image and did all rendering to it. We kept track of its dirty state and re-painted as needed just like the very first example in the first article on buffering.
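The dirty-state tracking can be sketched along these lines. This is plain Java with an int[] standing in for the full-screen image; the CachedFrame class and its Composer callback are hypothetical names for illustration and not the code from the earlier article.

```java
// Hypothetical sketch: keeps a fully composed frame and rebuilds it
// only when something has marked it dirty.
final class CachedFrame {
    // Callback that merges all layers into the frame buffer.
    interface Composer {
        void compose(int[] frame, int width, int height);
    }

    private final int width;
    private final int height;
    private final int[] frame;
    private final Composer composer;
    private boolean dirty = true; // nothing has been composed yet
    private int composeCount = 0;

    CachedFrame(int width, int height, Composer composer) {
        this.width = width;
        this.height = height;
        this.frame = new int[width * height];
        this.composer = composer;
    }

    // Call when the camera moves or anything visible changes.
    void invalidate() {
        dirty = true;
    }

    // Returns the cached frame, recomposing it first only if needed.
    int[] getFrame() {
        if (dirty) {
            composer.compose(frame, width, height);
            composeCount++;
            dirty = false;
        }
        return frame;
    }

    // Exposed only to make the saving visible in this sketch.
    int getComposeCount() {
        return composeCount;
    }
}
```

With a stationary camera, every tick after the first just reuses the cached frame, which is where the gain in that situation comes from.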
|Stationary Camera||14-16 fps||A 16-33% improvement just by adding the extra buffer!|
|Moving Camera||12-14 fps||Even the moving camera benefitted from the extra buffer by up to 16%!|
|Explosions||8-10 fps||Since the explosions have so much other work going on, the benefits of the extra buffer were not apparent.|
It took a lot of work and a lot of experimentation to find the right combination of optimizations to do in this game but the results are very impressive.
|Situation||Before||After|
|Stationary Camera||3-5 fps||14-16 fps|
|Moving Camera||1-2 fps||12-14 fps|
|Explosions||0 fps||8-10 fps|
It wasn’t just buffering that helped make this game faster, but the majority of the performance improvement came from optimizing the painting and adding the buffering.
This was the most complex buffering we did on a game during my tenure leading the BlackBerry development efforts at Magmic. As you can see, squeezing top performance out of graphics rendering on BlackBerry often requires a combination of strategies and, sometimes, thinking outside the box. The benefit of a lot of available memory on BlackBerry really helps open the toolbox for developers looking to buffer graphics operations. Not all devices can support the full complement of buffering strategies (or at least not simultaneously!) but BlackBerry definitely can. This is a great example of why you need to develop your code and even game architecture differently depending on the platform you’re targeting. Simple porting doesn’t always work; often there are significant code changes required.
I hope you have gotten some benefit from this discussion on buffering. I know we at Magmic worked hard to develop many solutions for BlackBerry game development and buffering was just one (and one that I could have spent twice as long discussing!) of the great re-usable technology developments that came out of the many years of work we did.
Newer BlackBerries have mitigated some of these issues, but the platform still has more memory than processor power relative to many of its counterparts, and you can use that to your advantage when developing applications on BlackBerry even if they’re not as complex as this example.
Note: when I originally started writing these articles I did not expect to think of so much useful information that was not covered in the DEVCON presentation. Please feel free to contact me with questions and I’ll answer them or clarify anything that’s not clear.
(Thanks to Simon Dale for working on the original presentation with me as well as reviewing my work here to make sure I didn’t miss anything!)