Tuesday, 6 September 2011

Topia: Optimising for OpenGLES2.0

Note added 26/2/2014: This blog is more than two years old but I'm pretty sure that most of it is still true. I notice that it gets a lot more views than any other blog I've written; I can only assume this is because people get here from googling some of the GLES keywords. I hope it has been useful to a few people.

Back when I started on iOS development I knew nothing about Objective C or OpenGL. I'd been writing 3D stuff for over 20 years, originally hand-made software renderers before moving to D3D on Windows, with a little experience of a few consoles over the years.

I paid my $99 as soon as OS2 was released, downloaded the SDK, upgraded my original iPhone and started playing around with it in my non-existent spare time (I was working for EA at the time). I didn't get very far before Ben Carter (also at EA, and who had worked with me at Lost Toys and Weirdwood) ported Weirdwood's old cross-platform library to the iPhone. When I left EA a few months later I used this as the basis for Ground Effect. It was, as I think Ben would be the first to admit, a pretty quick and dirty port as he'd done it over a weekend!

It actually turned out to be a pretty good basis for Ground Effect. I barely had to touch Objective C and got away with cutting and pasting a few hundred lines of it from sample code to get accelerometer, sound and OpenFeint working. I also learned a little OpenGLES1.1 as I added support for multitexture, compressed textures and stereo 3D but the rest of it was written using Weirdwood's old, familiar cross-platform library. I even had a PC version which I was able to build the level editor inside.

At the time I had a nagging suspicion that it wasn't very optimal because all of the 3D primitives were using standard glDrawArrays, with the vertex data being copied to the graphics hardware each time it was used. I checked in the Apple support forums, where posts seemed to suggest that GLES1.1's vertex buffer objects wouldn't actually buy me a speedup, so I released it as it was, and I think it was actually fairly optimal for a game targeted at first gen hardware. The same library was also used for Andrew Cakebread's Tilestorm games and everything released with it got straight through Apple approval so we must have done something right. Ground Effect ran very well on the 3GS, hitting the 60 FPS frame limiter most of the time, so I didn't seem to have problems with the newer hardware.

But then I got my hands on the first iPad... I quickly got it running in 1024x768 but was slightly depressed to find it ran at only 13 FPS with all those extra pixels. It turned out that the new SGX devices did a pretty good job at 'emulating' GL1.1 but certain features ground it to a halt. Ground Effect made pretty extensive use of fog. Turning this off made the frame rate shoot back up to 40 FPS but I didn't like the way it looked. This is why Ground Effect hasn't yet been updated for Retina or iPad.

I decided to take the plunge and add support for OpenGLES2.0 so I could do the fog myself in the shader.
This proved a much bigger task than I'd thought and didn't fit too well alongside a full time job, and when I got back to iOS dev full time I was focused on other projects besides Ground Effect.

OpenGLES2.0 is easy...

Initial experiments seemed to show that the slow pixel shaders on iPad and Retina were the limiting factor to performance. In fact, I found myself telling people it was the only limiting factor and arrogantly announced to anyone that would listen that iOS devices were piss simple and the only thing that mattered was optimising pixel shaders.

I was wrong.

Well, I was kind of right in certain circumstances... My initial test apps used my old library with shaders hacked in and didn't do much beyond submitting geometry. In that case my arrogant assertion was sort of right, as long as the CPU had nothing better to do than shunt data around for the GPU.

Armed with this 'knowledge' I set about writing the graphics engine that became Topia. As it became a game it was very much focused on iPad, with initial demos running at around 25 FPS, but then I got an iPad2... Shaders got cleverer, there were a few thousand creatures wandering around interacting with each other and a few thousand static trees, all reacting to changes in the landscape. As the CPU was now busy doing all of this simulation stuff the iPad1 speeds got worryingly slow (maybe 10 FPS) but it still ran at a semi-playable speed on iPhone4 (20 FPS or so) as long as I disabled Retina and ran at 480x320...

Meanwhile the iPad2 version ran at a solid 30 FPS and would happily make 60FPS with the frame limiter turned off.

I started getting scared. It was time to really focus on gameplay but it was looking like I had an iPad2 (and presumably iPhone5) only game.

I really had to spend a little time working out what the iPad1 and Retina display could actually handle, so I dropped everything to do a bit of optimisation.


First thing was the textures. My hacked together system had support for PVR compressed textures, 32 bit RGBA and nothing else. The hardware also supports 16 bit RGB, a couple of flavours of 16 bit RGBA, 8 bit monochrome or alpha, and two channel 'IA'. Rather than write my own support for all of these I did what many sensible developers have done and used the texture system from the PowerVR SDK. This saved huge amounts of memory but provided a barely measurable performance increase.

The next step was to include support for partial texture updates. This was a rather pleasant surprise as it turns out to be very efficient, very much not like the old D3D support back in DX7 which sort of pretended to work while actually updating the entire texture. It didn't buy a huge speedup but did smooth things out as the old version had done a full texture update once per second. The frame rate didn't really change noticeably.

It was time to take the plunge and look into using those OpenGL Vertex Buffer Objects. These are designed to allow the graphics data to live on the GPU rather than needing to be copied from the CPU's memory every frame. Unfortunately a hell of a lot of Topia's graphics data is generated every frame, but VBOs should still theoretically help on 3GS or better hardware.

I wrote a whole new primitive system and moved every draw call in the game to use it. The static geometry was now much more efficient and the dynamic stuff was double buffered and running with the GL_DYNAMIC_DRAW hint. It was much faster but unfortunately not fast enough. iPad1 was up to at least 16 FPS, a big improvement but not slick enough for what we wanted. Switching to triple buffering was a slight improvement but not enough.
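The double and triple buffering described above can be sketched as a small ring of buffer handles, so the CPU fills one buffer while the GPU may still be reading another. This is only a minimal sketch, not Topia's actual code: `BufferRing`, `ring_init` and `ring_next` are hypothetical names, and the integer ids stand in for the handles that glGenBuffers would return.

```c
#include <stddef.h>

#define RING_MAX 3  /* assumption: never more than triple buffering */

/* Hypothetical ring of vertex buffer handles. In a real app the ids
   would come from glGenBuffers; here they are plain integers. */
typedef struct {
    unsigned ids[RING_MAX];
    size_t   count;  /* 2 = double buffered, 3 = triple buffered */
    size_t   next;   /* slot to fill on the next frame */
} BufferRing;

void ring_init(BufferRing *r, const unsigned *ids, size_t count)
{
    if (count > RING_MAX) count = RING_MAX;
    for (size_t i = 0; i < count; ++i) r->ids[i] = ids[i];
    r->count = count;
    r->next = 0;
}

/* Returns the buffer to fill this frame and advances the ring, so the
   buffer written last frame is left alone while the GPU may still be
   drawing from it. Returns 0 if the ring is empty. */
unsigned ring_next(BufferRing *r)
{
    if (r->count == 0) return 0;
    unsigned id = r->ids[r->next];
    r->next = (r->next + 1) % r->count;
    return id;
}
```

Each frame the dynamic geometry would be written into the buffer returned by `ring_next` and drawn from that same buffer; going from two buffers to three only changes the `count` passed to `ring_init`.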

Next I decided to attack the shaders and started loading them into PowerVR's PVRShaman shader tool. The very first thing I noticed was that the 'mix' instruction in the water fragment shader was a huge hit. I was using 'mix' to blend between the calculated colour and the global fog in all my fragment shaders. I was basically doing per-pixel fog, which was pretty pointless with the high polygon density. I moved light and fog calculations out to the vertex shader, reducing the pixel shader's fog and lighting to just a multiply and an add from a couple of 'varyings'.
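The arithmetic behind that change can be checked on the CPU. The sketch below is written in C rather than GLSL and works on a single colour channel; it assumes the original shader computed something like mix(texel * light, fog, amount), which is a guess at the shape of the shader rather than the real source. `FogTerms` and the function names are hypothetical.

```c
/* GLSL's mix(x, y, a), written out for one colour channel. */
float mix1(float x, float y, float a)
{
    return x * (1.0f - a) + y * a;
}

/* Assumed per-pixel form: light the texel, then blend towards the fog. */
float shade_per_pixel(float texel, float light, float fog, float amount)
{
    return mix1(texel * light, fog, amount);
}

/* Hypothetical pair of values the vertex shader would pass as varyings. */
typedef struct {
    float mul;
    float add;
} FogTerms;

/* Work done once per vertex: fold light and fog into two terms. */
FogTerms fog_terms_per_vertex(float light, float fog, float amount)
{
    FogTerms t;
    t.mul = light * (1.0f - amount);
    t.add = fog * amount;
    return t;
}

/* Work left per pixel: one multiply and one add. */
float shade_with_terms(float texel, FogTerms t)
{
    return texel * t.mul + t.add;
}
```

At a vertex the two forms give the same value. Across a triangle they differ slightly, because the interpolated product is not the product of the interpolated inputs, which is why this only makes sense when the polygons are small.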

On timing the new shaders I was safely over 20 FPS but by now I was shooting for 30. Just looking at an area of water was the worst case as the translucent water on top of the landscape shader was just too much pixel shader work. I was starting to worry that the water was going to have to become solid and Josh wouldn't be able to see his beloved sea creatures.

The water's fragment shader also had a couple of clamps in it that are vital for the water edge effect and couldn't be moved out to the vertex shader. I decided to try an old trick of using a small clamped 1D texture to achieve the same result. This worked well and got us to around 25FPS. A significant speedup but pretty useless as it wasn't 30. One cool aspect of this is that the texture lookup can do rather more than the clamps did. I was able to generate a wave that gave a much nicer water edge effect. Also, iOS shaders don't actually support 1D textures so I was able to use the other dimension to animate this wave, all while running faster than the older, simpler effect.
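As a rough illustration of the lookup idea, the sketch below builds a small single-channel table on the CPU and samples it the way a clamp-to-edge texture would be sampled with nearest filtering. The table size, the ripple shape and the function names are all made up for illustration; the real texture contents are not described here.

```c
#include <stdint.h>

#define LUT_W 64  /* assumption: samples along the value being clamped */
#define LUT_H 16  /* assumption: animation phases along the other axis */

/* Fill the table. Each row is a 0..255 ramp with a small triangular
   ripple added; the ripple shifts by one texel per row and repeats
   every LUT_H texels, so stepping through rows loops seamlessly. */
void build_edge_lut(uint8_t *lut)
{
    for (int v = 0; v < LUT_H; ++v) {
        for (int u = 0; u < LUT_W; ++u) {
            int ramp  = (u * 255) / (LUT_W - 1);
            int phase = (u + v) % LUT_H;
            int tri   = phase < LUT_H / 2 ? phase : LUT_H - phase;
            int value = ramp + tri * 4 - 16;
            if (value < 0)   value = 0;
            if (value > 255) value = 255;
            lut[v * LUT_W + u] = (uint8_t)value;
        }
    }
}

/* CPU stand-in for the texture fetch: u is clamped to the edges (the
   clamp the shader no longer has to do), v wraps so it can be driven
   by time. Nearest filtering only. */
uint8_t sample_edge_lut(const uint8_t *lut, float u, float v)
{
    if (u < 0.0f) u = 0.0f;
    if (u > 1.0f) u = 1.0f;
    v -= (float)(int)v;          /* keep the fractional part */
    if (v < 0.0f) v += 1.0f;
    int x = (int)(u * (LUT_W - 1) + 0.5f);
    int y = (int)(v * LUT_H);
    if (y >= LUT_H) y = LUT_H - 1;
    return lut[y * LUT_W + x];
}
```

Out-of-range `u` values land on the first or last texel of the row, which is what replaces the clamps, and anything can be painted between the two ends.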

So, after all this work, things were better but still not close enough to 30 FPS. It sort of hit the magic 30 when looking at a bland bit of landscape zoomed in with no water visible, but that isn't enough to claim 30 and it felt jerky when scrolling around.

I then broke out the Xcode Instruments to see what the OpenGL analysis thing thought of my code. It suggested many things: it told me the CPU was still waiting for the GPU at times, asked me why the hell I was using VBOs when I could be using VAOs and whined on about redundant state changes.

I hacked away at most of these issues. The move to VAOs wasn't going to do much for Topia as it uses so few draw calls, but I did it anyway as it'll prove useful in other apps. I kept on checking the frame rate but I seemed to be stuck at 25.
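One common way to deal with the redundant state changes mentioned above is to remember the last value sent to the driver and skip the call when nothing has changed. The sketch below does this for texture binding only. `StateCache` and `cache_bind_texture` are hypothetical names, and the `bind` callback stands in for a wrapper around glBindTexture so the sketch does not need a GL context.

```c
#include <stddef.h>

typedef void (*BindFn)(unsigned texture);

/* Hypothetical cache of the last texture handed to the driver. */
typedef struct {
    unsigned bound_texture;  /* last texture actually bound */
    int      valid;          /* 0 until the first bind */
    unsigned calls_issued;   /* binds that really reached the driver */
    BindFn   bind;           /* may be NULL; would wrap glBindTexture */
} StateCache;

void cache_bind_texture(StateCache *c, unsigned texture)
{
    if (c->valid && c->bound_texture == texture)
        return;              /* redundant: same texture is already bound */
    c->bound_texture = texture;
    c->valid = 1;
    c->calls_issued++;
    if (c->bind != NULL)
        c->bind(texture);
}
```

The cache is only correct if every bind goes through it; any code that binds a texture directly leaves the cached value stale.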

After all of this I was starting to conclude that 30 FPS just wasn't going to happen. I was about to change everything and target 20 FPS instead but then I remembered something. That GL_DYNAMIC_DRAW I was using for all the dynamic geometry... Didn't I read about a third option? Yeah, there in the documentation was GL_STREAM_DRAW. I hadn't used it because the notes were slightly confusing, intending to maybe try it later. I thought 'what the hell', changed one word from DYNAMIC to STREAM and hit run...

The result was a solid 30 FPS on both iPad1 and iPhone4 in Retina.

I slept much better that night.

30 FPS in Retina!
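For reference, OpenGL ES 2.0 has three buffer usage hints: GL_STATIC_DRAW for data written once and drawn many times, GL_DYNAMIC_DRAW for data rewritten repeatedly and drawn many times, and GL_STREAM_DRAW for data written once and drawn at most a few times. A rough way to pick between them is sketched below. `RewriteRate` and `usage_hint_for` are hypothetical names, the mapping is a rule of thumb rather than anything the spec mandates, and drivers are free to treat the hint however they like.

```c
/* Standard OpenGL ES 2.0 enum values, defined here only so the sketch
   compiles without the GL headers. */
#ifndef GL_STREAM_DRAW
#define GL_STREAM_DRAW  0x88E0
#endif
#ifndef GL_STATIC_DRAW
#define GL_STATIC_DRAW  0x88E4
#endif
#ifndef GL_DYNAMIC_DRAW
#define GL_DYNAMIC_DRAW 0x88E8
#endif

/* Hypothetical description of how often a buffer's contents change. */
typedef enum {
    REWRITE_NEVER,         /* filled once at load time */
    REWRITE_OCCASIONALLY,  /* changed now and then, drawn many times between */
    REWRITE_EVERY_FRAME    /* regenerated each frame, drawn once or twice */
} RewriteRate;

/* Returns the value to pass as the usage argument of glBufferData. */
unsigned usage_hint_for(RewriteRate rate)
{
    switch (rate) {
    case REWRITE_NEVER:        return GL_STATIC_DRAW;
    case REWRITE_OCCASIONALLY: return GL_DYNAMIC_DRAW;
    case REWRITE_EVERY_FRAME:  return GL_STREAM_DRAW;
    }
    return GL_STATIC_DRAW;
}
```

Geometry that is regenerated every frame, like most of Topia's, falls into the last case.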


  1. Very interesting blog article. Thanks for the info on the VAOs, I wasn't familiar with them and had a go at implementing them today.
    Do you update your dynamic vertex buffers by initially calling glBufferData to allocate them and then updating them with glBufferSubData, do you call glBufferData each update, or do you use glMapBufferOES and memcpy the data over? I'm currently using the glBufferSubData update method as it feels the best but I haven't done any proper benchmarking as my dataset is currently quite small.

  2. I set them up with glBufferData but with a null pointer for the data as the call is more about allocating the buffer up front. I might start feeding them zeroed memory though as Instruments constantly moans because it knows there is uninitialised data, even though it's the part of the buffer I'm not drawing. I'm using glBufferSubData rather than glMapBufferOES. I guess I should try the latter as it must exist for a reason...