I just released version 0.9.13 of my Stage3D engine. Meanwhile, ND2D is in a very good and stable state. All the features I planned to integrate are implemented and working, so it’s very close to v1.0. It’s about time for a little detailed »best practice and how to« post. This post is meant for the traditional Flash developer who has never touched a GPU (the processor on your graphics card) accelerated environment. There are significant differences in this GPU powered world, and you have to think about and prepare your assets in a different way than you are used to. Let’s start:
What is ND2D?
ND2D is a GPU accelerated 2D game engine that makes use of the new Stage3D features introduced in Flash Player 11 (also known as Molehill). It has nothing to do with the traditional Flash display list and runs on a different “layer”, behind all Flash content. If you want a little low level knowledge, read Thibault’s article here. Using the GPU, the Flash Player is able to render full screen HD content at 60Hz… finally a dream comes true. Of course Stage3D is mainly focused on 3D, but we can make good use of the hardware for a 2D engine as well and speed things up a lot.
A GPU Environment
First of all, let’s try to understand a little how 2D rendering on a GPU works. Actually, the GPU can only deal with 3D data. To render 2D, we just don’t use the third dimension. So you could call ND2D a “planes-in-3D-space engine” if you like.
Unfortunately, the GPU can only deal with triangles (in the 3D world, a triangle is also called a polygon). To render a sprite, we need to construct a quad out of two triangles like this:
Next we have to specify which part of our bitmap is mapped to which corner of our quad. This is called UV mapping. As you can see in the picture above, the top left corner has a UV coordinate of (0, 0), which is the top left pixel of our bitmap. The lower right corner, UV (1, 1), is of course the lower rightmost pixel of our image. The GPU interpolates between these coordinates and knows which pixel to choose for a UV (0.5, 0.5) coordinate (if our image is 128×128 px, it chooses the pixel at 64, 64; this is called sampling). One important thing is that the GPU can only handle texture sizes that are a power of two (32×32, 64×32, 128×128, 256×64, etc.). In the above example, a lot of space and therefore texture memory is wasted, because ND2D has to blow up the 68×68 px PNG of the little bacteria and create a 128×128 texture. So keep the power of two (2^n) in mind when exporting your images. Later we’ll get to know the TextureAtlas and its tools, which will take care of the unused space problem automatically.
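To see how much memory that padding costs, you can compute the next power-of-two size yourself. A minimal ActionScript 3 sketch (the helper function name is mine, not part of ND2D):

```
// Round a dimension up to the next power of two, as the GPU requires.
function nextPowerOfTwo(n:uint):uint {
	var p:uint = 1;
	while (p < n) p <<= 1;
	return p;
}

trace(nextPowerOfTwo(68)); // 128

// A 68x68 px image ends up in a 128x128 texture, so only
// 68*68 / (128*128), roughly 28%, of the texture memory holds actual pixels.
```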
So we need to pass all this information to the GPU: a quad/triangle definition, UV coordinates and the bitmap (on the GPU it’s called a texture). All of this is done internally in ND2D. You only have to deal with these low level details if you want to create your own objects or write your own materials and shaders.
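To make that concrete, here is roughly the data such an engine prepares for a single quad: four vertices carrying position and UV, plus an index list forming the two triangles. This is a simplified sketch of the raw Stage3D calls, not ND2D’s actual code (`context3D` is assumed to be an initialized Context3D):

```
// Four vertices: x, y, u, v (z omitted for brevity)
var vertices:Vector.<Number> = Vector.<Number>([
	-0.5, -0.5, 0.0, 0.0,  // top left     -> UV (0, 0)
	 0.5, -0.5, 1.0, 0.0,  // top right    -> UV (1, 0)
	 0.5,  0.5, 1.0, 1.0,  // bottom right -> UV (1, 1)
	-0.5,  0.5, 0.0, 1.0   // bottom left  -> UV (0, 1)
]);

// Two triangles sharing the diagonal: (0, 1, 2) and (2, 3, 0)
var indices:Vector.<uint> = Vector.<uint>([0, 1, 2, 2, 3, 0]);

var vertexBuffer:VertexBuffer3D = context3D.createVertexBuffer(4, 4);
vertexBuffer.uploadFromVector(vertices, 0, 4);

var indexBuffer:IndexBuffer3D = context3D.createIndexBuffer(6);
indexBuffer.uploadFromVector(indices, 0, 6);
```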
The display hierarchy and its limitations
To mimic the display list, ND2D has a hierarchy very similar to the Flash display list. It feels very familiar, albeit with significant differences we’ll get to know now. Everything in ND2D is a Node2D, which can have a number of children, just like in your normal Flash display list. The drawing is done from back to front, of course. The draw loop starts with the topmost parent and continues with the children. This is no different from Flash’s display list.
One thing that’s very important to know, basically the most important thing when you’re dealing with a GPU environment, is »how« things are sent to the GPU and drawn. Keep this in mind; this is the bottleneck and the reason for low speed in your game: we have to send as little data to the GPU and make as few calls as possible! Unfortunately, ND2D or any other engine can’t automate this process. Let me give you an example:
You’re building a game where you have hundreds or even thousands of fluffy little bunnies on the screen. If you created 1000 Sprite2Ds, ND2D would have to send 2000 triangles and 1000 textures to the GPU, and the GPU would have to draw them one by one, which would be very slow. It might even be slower than a traditional blitting approach. But don’t give up so fast: there is batching. The GPU has methods that allow ND2D to send the data for 1000 sprites as one single data package instead of 1000 little ones. The downside is that all 1000 sprites have to share the same texture. That’s the limitation: batching is only possible if the texture of the batched nodes is the same! Good for us if we want to display 1000 bunnies that all look the same, but what if we have lots of different looking bunnies we want to display? We can’t go back to rendering them all one by one; that would be slow…
TextureAtlases / SpriteSheets
Behold! There’s always a solution, and it’s called a TextureAtlas. If the limitation is that all sprites have to share the same texture, then why not just put all the graphics we have into one bigger texture:
By changing the UV coordinates for each sprite, we can specify which part of the texture should be drawn for our sprite. There are a few good tools that help you generate a TextureAtlas (a bitmap with a size of 2^n). You don’t have to do this by hand. ND2D currently supports these tools:
- TexturePacker (cocos2d + cocos2d-0.99.4 format)
- Zwoptex App (zwoptex-default format)
This is the main difference from traditional Flash. Instead of getting your assets one by one from a library, you “bake” them all into a big PNG. And that’s the way you should go. If, for some reason, you need a dynamic approach and want to generate this atlas on the fly, check out the “nd2d-dynatlas” extension built by wjammal (thanks mate!).
Using a batch
ND2D provides two different kinds of batches: the Sprite2DCloud and the Sprite2DBatch (I’ll explain the differences later). You just create a batch, pass it the TextureAtlas and the Texture2D and start to add children:
var atlasTex:Texture2D = Texture2D.textureFromBitmapData(new textureAtlasBitmap().bitmapData);
var atlas:TextureAtlas = new TextureAtlas(atlasTex.bitmapWidth, atlasTex.bitmapHeight, new XML(new textureAtlasXML()), TextureAtlas.XML_FORMAT_ZWOPTEX, 5, false);
batch = new Sprite2DBatch(atlasTex);
batch.setSpriteSheet(atlas);
s = new Sprite2D();
batch.addChild(s);
As you can see, you add an empty Sprite2D to the batch. After the child is added, the batch passes a copy of the TextureAtlas to the sprite. Then you’re able to set individual frames or animations on that sprite:
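A short sketch of what that looks like; the animation name and the frame names are examples from a hypothetical atlas, and the exact method signatures may differ between ND2D versions, so check the engine’s own examples:

```
// Define an animation from frames packed in the atlas
// ("walk" and the bunny frame names are hypothetical):
atlas.addAnimation("walk", ["bunny_01.png", "bunny_02.png", "bunny_03.png"]);

var s:Sprite2D = new Sprite2D();
batch.addChild(s); // the batch hands the sprite a copy of the atlas

// Either show a single static frame...
s.spriteSheet.setFrameByName("bunny_01.png");
// ...or play a named animation:
s.spriteSheet.playAnimation("walk");
```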
To avoid any confusion: a TextureAtlas is sometimes called a SpriteSheet and vice versa. In ND2D, a TextureAtlas means a bitmap containing packed images like in the screenshot above, plus an XML definition that defines the UV coordinates for each sprite. The simpler version is a SpriteSheet, which just contains images of equal sizes and doesn’t need an XML. You can create SpriteSheets with tools like SWFSheet by Keith Peters.
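For completeness, a SpriteSheet setup looks roughly like this. The 24×24 frame size, the fps value and the `SheetBitmap` asset class are assumptions for illustration; the SpriteSheet constructor parameters may vary between ND2D versions:

```
// A SpriteSheet only needs the frame dimensions, no XML:
var tex:Texture2D = Texture2D.textureFromBitmapData(new SheetBitmap().bitmapData);
var sheet:SpriteSheet = new SpriteSheet(tex.bitmapWidth, tex.bitmapHeight, 24, 24, 10);

var s:Sprite2D = new Sprite2D(tex);
s.setSpriteSheet(sheet);
s.spriteSheet.frame = 3; // jump to the fourth frame
```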
In an ideal world, you would place all your graphics in one big TextureAtlas and work with just one batch. In reality that’s not always possible. The size of a texture is limited (2048×2048), and you sometimes can’t squeeze all your graphics and animations into it. You might need a second batch with a second texture. You can’t nest batches, and since we live in a hierarchical world, you have to keep in mind that one batch and all of its children will be drawn before the other! So one batch could deal with all background and level assets, while the upper batch renders the characters and other foreground graphics.
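The layering follows plain display list order, so two batches can be stacked like this (the two texture variables are assumed to be Texture2Ds built from your two atlases):

```
// Draw order follows the hierarchy: whatever is added first renders behind.
var backgroundBatch:Sprite2DBatch = new Sprite2DBatch(levelAtlasTexture);
var characterBatch:Sprite2DBatch = new Sprite2DBatch(characterAtlasTexture);

addChild(backgroundBatch); // drawn first: level and background assets
addChild(characterBatch);  // drawn second: characters and foreground
```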
I said I’d explain the difference between a Sprite2DCloud and a Sprite2DBatch, so here we go. I won’t get into technical details here, but there are basically two different methods for batching data. For those who are interested: ND2D – speeding up the engine.
The Sprite2DCloud does more computation on the CPU and delivers a complete package to the GPU, while the Sprite2DBatch receives “chunks” of data and processes it on the GPU:
Sprite2DCloud: Higher CPU load, lower GPU usage
Sprite2DBatch: Lower CPU load, higher GPU usage
On a desktop machine with a decent CPU, the cloud will be faster. On machines with a slower CPU or on mobile systems, the batch could be faster. So I’m afraid it’s up to you to choose which batching method you’d like to use. One more important thing about the differences: due to technical limitations (and speed optimizations), the cloud can only render its own children and won’t render the children’s children, while the batch renders the full display list tree, with no limitations there. I’d always vote for the batch; even though it’s a bit slower on a desktop machine, it’s still powerful enough for our fluffy bunny horde.
There are other objects in ND2D that are fully calculated on the GPU, for example the ParticleSystem2D. Get into the details here.
I’ve mentioned the word »mobile« quite a few times, and you might ask when Stage3D for mobile will be available. I can’t say when it will be public, but as you know, Adobe is working hard on it. All I can say is that ND2D is already prepared for mobile: multitouch events are integrated, as is the new compressed texture format (ATF), which will hopefully be released along with Stage3D for mobile.
I hope this post was useful to you and helps you get started in this new accelerated world. If you have any questions, don’t hesitate to ask them. ND2D also has a forum where a lot of questions have already been answered.