
I just released version 0.9.13 of my Stage3D engine. ND2D is now in a very good and stable state. All the features I planned to integrate are implemented and working, and it’s very close to v1.0. So it’s about time for a more detailed »best practice and how to« post. This post is meant for the traditional Flash developer who has never touched a GPU-accelerated environment (the GPU is the processor on your graphics card). There are significant differences in this GPU-powered world, and you have to think about and prepare your assets differently than you are used to. Let’s start:

What is ND2D?

ND2D is a GPU-accelerated 2D game engine that makes use of the new Stage3D features introduced in Flash Player 11 (also known as Molehill). It has nothing to do with the traditional Flash display list and runs on a different “layer”, behind all Flash content. If you want some low-level background, read Thibault’s article here. Using the GPU, the Flash Player is able to render full-screen HD content at 60 Hz… Finally, a dream comes true. Of course Stage3D is mainly focused on 3D, but we can make good use of the hardware for a 2D engine as well and speed things up a lot.

A GPU Environment

First of all, let’s try to understand a little about how 2D rendering on a GPU works. The GPU can actually only deal with 3D data. To render 2D, we simply don’t use the third dimension. So you could call ND2D a “planes-in-3D-space-engine” if you like.

Unfortunately, the GPU can only deal with triangles (a triangle is also called a polygon in the 3D world). To render a sprite, we need to construct a quad out of two triangles like this:

Next we have to specify which part of our bitmap is mapped to which corner of our quad. This is called UV mapping. As you see in the picture above, the top left corner has a UV coordinate of (0, 0), which is the top left pixel of our bitmap. The lower right corner, UV(1, 1), is of course the lower rightmost pixel of our image. The GPU interpolates between these coordinates and knows which pixel to choose for a UV(0.5, 0.5) coordinate (if our image is 128×128 px, it chooses the pixel 64,64; this is called sampling). One important thing is that the GPU can only handle texture sizes that are a power of 2 (32×32, 64×32, 128×128, 256×64, etc.). In the above example, a lot of space, and therefore texture memory, is wasted, because ND2D has to blow up the 68×68 PNG of the little bacteria and create a 128×128 texture. So keep the power of two (2^n) in mind when exporting your images. Later we’ll get to know the TextureAtlas and its tools, which will take care of the unused space problem automatically.
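To make the numbers above concrete, here is a minimal sketch of UV sampling and power-of-two padding. It is written in Python so it can be run standalone; the function names are made up for illustration and are not part of ND2D, and the lookup is a simple nearest-pixel pick rather than what any particular GPU does.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    p = 1
    while p < n:
        p *= 2
    return p


def uv_to_pixel(u, v, width, height):
    """Map a UV coordinate in [0, 1] to a pixel index (nearest-style lookup)."""
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return x, y


def wasted_fraction(width, height):
    """Share of the padded power-of-two texture that holds no image data."""
    tex_w, tex_h = next_power_of_two(width), next_power_of_two(height)
    return 1.0 - (width * height) / (tex_w * tex_h)


print(uv_to_pixel(0.5, 0.5, 128, 128))    # (64, 64)
print(next_power_of_two(68))              # 128
print(round(wasted_fraction(68, 68), 2))  # 0.72
```

So for the 68×68 image, roughly 72% of the 128×128 texture is padding.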

So we need to pass all this information to the GPU: a quad/triangle definition, UV coordinates and the bitmap (on the GPU it’s called a texture). All of this is done internally in ND2D. You only have to deal with these low-level details if you want to create your own objects or write your own materials and shaders.
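As a rough illustration of what such a package of data can look like, the following Python sketch builds a centred quad from four vertices and two triangles. The layout (x, y, u, v per vertex, negative y at the top) and the name make_quad are assumptions for this example and not ND2D’s internal format.

```python
def make_quad(width, height):
    """Return (vertices, indices) for a centred, textured quad."""
    hw, hh = width / 2.0, height / 2.0
    # One tuple per corner: x, y, u, v
    vertices = [
        (-hw, -hh, 0.0, 0.0),  # top left
        (hw, -hh, 1.0, 0.0),   # top right
        (hw, hh, 1.0, 1.0),    # bottom right
        (-hw, hh, 0.0, 1.0),   # bottom left
    ]
    # Two triangles that share the diagonal from vertex 0 to vertex 2
    indices = [0, 1, 2, 0, 2, 3]
    return vertices, indices


vertices, indices = make_quad(128, 128)
print(len(vertices), "vertices,", len(indices) // 3, "triangles")
```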

The display hierarchy and its limitations

To mimic the display list, ND2D has a hierarchy similar to the Flash display list. It feels very familiar, although there are significant differences that we’ll get to know now. Everything in ND2D is a Node2D, which can have a number of children, just like in your normal Flash display list. The drawing is done from back to front, of course. The draw loop starts with the topmost parent and continues with the children. This is no different from Flash’s display list.

One thing that’s very important to know, basically the most important thing when you’re dealing with a GPU environment, is »how« things are sent to the GPU and drawn. Keep this in mind, because this is the bottleneck and the reason for low speed in your game: we have to try to send as little data to the GPU and call as few methods as possible! Unfortunately, neither ND2D nor any other engine can automate this process. Let me give you an example:

You’re building a game where you have hundreds or even thousands of fluffy little bunnies on the screen. If you now created 1000 Sprite2Ds, ND2D would have to send 2000 triangles and 1000 textures to the GPU, and the GPU would have to draw them one by one, which would be very slow. This might even be slower than a traditional blitting approach. But don’t give up so fast: there is batching. The GPU has methods that allow ND2D to send the data for 1000 sprites as one single data package instead of 1000 little ones. The downside is that the texture of all these 1000 sprites has to be the same. That’s the limitation: batching is only possible if the texture of the batched nodes is the same! Good for us if we want to display 1000 bunnies that all look the same, but what if we have lots of different-looking bunnies we want to display? We can’t go back to rendering them all one by one; this would be slow…
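A simplified way to picture the cost is to count draw calls. The Python sketch below assumes a toy model in which consecutive sprites that share a texture can be merged into one call; real engines have further limits (for example a maximum number of sprites per call), so treat the numbers as an illustration only. The function name is hypothetical.

```python
from itertools import groupby


def count_draw_calls(sprite_textures, batched):
    """Count draw calls for a list of texture names in draw order."""
    if not batched:
        # One call (and one texture upload/bind) per sprite
        return len(sprite_textures)
    # One call per run of consecutive sprites sharing the same texture
    return sum(1 for _ in groupby(sprite_textures))


bunnies = ["bunny"] * 1000
print(count_draw_calls(bunnies, batched=False))  # 1000
print(count_draw_calls(bunnies, batched=True))   # 1
```

As soon as the textures differ from sprite to sprite, the batched count climbs back towards the unbatched one, which is exactly the limitation described above.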

TextureAtlases / SpriteSheets

Behold! There’s always a solution and this is called a TextureAtlas. When the limitation is, that all sprites have to have the same texture, then why not just put all graphics we have in one bigger texture:

By changing the UV coordinates for each sprite, we can specify which part of the texture should be drawn for our sprite. There are a few good tools, that help you to generate a TextureAtlas (A bitmap that has a size of 2^n). You don’t have to do this by hand. ND2D currently supports these tools:

- TexturePacker (cocos2d + cocos2d-0.99.4 format)
- Zwoptex App (zwoptex-default format)

This is the main difference to traditional flash. Instead of getting your assets one by one from a library, you “bake” them all in a big PNG. And that’s the way you should go. If, for some reason, you need a dynamic approach and generate this atlas on the fly, you can check out the “nd2d-dynatlas” extension built by wjammal (thanks mate!).
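To illustrate how one big texture can serve many sprites, here is a small Python sketch that converts a frame’s pixel rectangle inside the atlas into UV coordinates. The frame names, the (x, y, width, height) layout and the function name are invented for this example; the real TexturePacker and Zwoptex files carry more information.

```python
def frame_to_uv(frame, atlas_width, atlas_height):
    """Convert a pixel rect (x, y, w, h) into (u0, v0, u1, v1)."""
    x, y, w, h = frame
    return (x / atlas_width, y / atlas_height,
            (x + w) / atlas_width, (y + h) / atlas_height)


# Hypothetical atlas content: name -> pixel rect inside a 256x256 texture
atlas = {
    "bunny_walk_01": (0, 0, 68, 68),
    "carrot": (68, 0, 32, 64),
}

for name, frame in atlas.items():
    print(name, frame_to_uv(frame, 256, 256))
```

Every sprite in the batch then uses the same texture and only differs in these four numbers.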

Using a batch

ND2D provides two different kinds of batches: the Sprite2DCloud and the Sprite2DBatch (I’ll explain the differences later). You just create a batch, pass it the TextureAtlas and the Texture2D and start adding children:

var atlasTex:Texture2D = Texture2D.textureFromBitmapData(new textureAtlasBitmap().bitmapData);
var atlas:TextureAtlas = new TextureAtlas(atlasTex.bitmapWidth, atlasTex.bitmapHeight, new XML(new textureAtlasXML()), TextureAtlas.XML_FORMAT_ZWOPTEX, 5, false);
 
batch = new Sprite2DBatch(atlasTex);
batch.setSpriteSheet(atlas);
 
s = new Sprite2D();
batch.addChild(s);

As you can see, you have to add an empty Sprite2D to the batch. After adding the child to the batch, the batch passes a copy of the TextureAtlas to the sprite. Then you’re able to set individual frames or animations on that sprite:

s.spriteSheet.playAnimation("walkingBunny");

To avoid any confusion: a TextureAtlas is sometimes called a SpriteSheet and vice versa. In ND2D, a TextureAtlas means a bitmap containing packed images like in the screenshot above, plus an XML definition that defines the UV coordinates for each sprite. The simpler version is a SpriteSheet, which just contains images of equal size and doesn’t need an XML. You can create SpriteSheets with tools like SWFSheet by Keith Peters.
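The reason a SpriteSheet needs no XML is that each frame’s position follows directly from its index. A minimal Python sketch, assuming frames are laid out row by row from the top left (the function name is hypothetical):

```python
def sheet_frame_rect(index, frame_w, frame_h, sheet_w):
    """Pixel rect (x, y, w, h) of frame `index` in a grid-based sprite sheet."""
    columns = sheet_w // frame_w
    col = index % columns
    row = index // columns
    return (col * frame_w, row * frame_h, frame_w, frame_h)


# A 256 px wide sheet with 64x64 frames has 4 frames per row
print(sheet_frame_rect(0, 64, 64, 256))  # (0, 0, 64, 64)
print(sheet_frame_rect(5, 64, 64, 256))  # (64, 64, 64, 64)
```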

Performance

In an ideal world, you would place all your graphics in one big TextureAtlas and work with just one batch. In reality that’s not always possible. The size of a texture is limited (4096 x 4096 on desktop, 2048 x 2048 on mobile) and sometimes you can’t squeeze all your graphics and animations into it. You might need a second batch with a second texture. You can’t nest batches, and since we live in a hierarchical world, you have to keep in mind that one batch and all of its children will be drawn before the other! So one batch could deal with all background and level assets, while the upper batch renders the characters and other foreground graphics.

I said I’d explain the difference between a Sprite2DCloud and a Sprite2DBatch, so here we go. I won’t get into technical details here, but there are basically two different methods for batching data. For those who are interested: ND2D – speeding up the engine.

The Sprite2DCloud does more computation on the CPU and delivers a complete package to the GPU, while the Sprite2DBatch receives “chunks” of data and processes it on the GPU:

Sprite2DCloud: Higher CPU load, lower GPU usage
Sprite2DBatch: Lower CPU load, higher GPU usage

On a desktop machine with a decent CPU, the cloud will be faster. On machines with a slower CPU or on mobile systems, the batch could be faster. So I’m afraid it’s up to you to choose which batching method you’d like to use. One more important difference: due to technical limitations (and speed optimizations), the cloud can only render its own children and won’t render the children’s children, while the batch will render the full display list tree. No limitations there. I’d always vote for the batch: it’s a bit slower on a desktop machine, but still powerful enough for our fluffy bunny horde.

There are other objects in ND2D that are fully calculated on the GPU. For example the ParticleSystem2D. Get into detail here.

Outlook

I mentioned the word »mobile« quite a few times and you might ask when Stage3D for mobile will be available. I can’t say when it will be public, but as you know, Adobe is working hard on it. All I can say is that ND2D is already prepared for mobile. MultiTouchEvents are integrated, and so is a new compressed texture format (ATF), which will (hopefully) be released with Stage3D for mobile as well.

I hope this post was useful to you and helps you get started in this new accelerated world. If you have any questions, don’t hesitate to ask them. ND2D also has a forum where a lot of questions have already been answered.

Resources

In my current client project, we’re developing an AIR application targeted at iOS (Android will follow) and we wanted to make use of some iOS SDK features, so I had to write my first NativeExtension. Developing the Objective-C part is pretty straightforward (if you know C++ and Objective-C) and so is the ActionScript part. There are some good examples and tutorials on the Adobe site about all kinds of extensions.

The hard part was getting this thing to work, so I just wanted to share my settings here. This might be useful if you’re starting to develop your first ANE. I had strange crashes when I packaged the app with my ANE and I couldn’t figure out what was wrong. The app just crashed every time I launched it on the device. The crash log wasn’t very helpful. After quite a search, I found out that I hadn’t set an apparently important compiler flag for the LLVM compiler in my Xcode project. So, be sure to set:

Enable Linking With Shared Libraries: No

And if you want to get rid of the warnings:

Warnings: Missing Function Prototypes: No

The second part was packaging the ANE correctly. The working command for my case is:

adt -package -target ane MyExtension.ane extension.xml -swc MyExtension.swc -platform iPhone-ARM library.swf libMyNativeExtensionIOS.a

The annoying thing about packaging the ANE is that after you have built your SWC, you have to extract the library.swf from it (by renaming it to .zip and extracting the SWF). So you need both the SWC AND the SWF. I haven’t written an ANT task to automate the process yet, and I don’t know the reason for this strange step, since ADT has everything it needs within the SWC. Only Adobe knows ;)
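Since a SWC is just a zip archive, the extraction step can be scripted without renaming anything. Here is a minimal sketch in Python (instead of the ANT task mentioned above); the function name and the file name in the usage comment are placeholders.

```python
import zipfile
from pathlib import Path


def extract_library_swf(swc_path, out_dir="."):
    """Extract library.swf from a SWC (a zip archive) and return its path."""
    with zipfile.ZipFile(swc_path) as swc:
        # extract() raises KeyError if the archive has no library.swf
        return Path(swc.extract("library.swf", path=out_dir))


# Usage, assuming MyExtension.swc sits in the current directory:
#   extract_library_swf("MyExtension.swc")
```

The resulting library.swf can then be passed to the adt command together with the SWC.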

Obviously you cannot test on the device every time, because the deployment process to iOS is more or less manual and just takes too long at the moment. I found out that I could link the ANE as a regular library (SWC) in my Flash Builder project and launch the app on my desktop machine. When the native extension tries to create the context on the desktop machine, it fails and returns null, because it was only built for the iOS platform:

context = ExtensionContext.createExtensionContext(EXTENSION_ID, null);

So I could implement a fallback for the extension that mocks the behaviour in AS3 when running on the desktop. To package the application for iOS, I wrote a small ANT task. This way we can easily test on the device and have a fallback when testing on the desktop, without writing desktop extensions as well.
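The fallback boils down to one decision: if no native context could be created, hand out a mock with the same interface. Here is a language-neutral sketch of that pattern in Python; the class names, the vibrate method and the context’s call method are all invented for illustration and do not mirror the AIR API.

```python
class NativeBackend:
    """Forwards calls to the native extension context."""

    def __init__(self, context):
        self.context = context

    def vibrate(self):
        return self.context.call("vibrate")


class MockBackend:
    """Desktop stand-in that only pretends to do the work."""

    def vibrate(self):
        return "mock: vibrate"


def create_backend(create_context):
    """Use the native backend if a context exists, otherwise the mock."""
    context = create_context()
    if context is None:
        return MockBackend()
    return NativeBackend(context)


# On the desktop the context factory returns None, so we get the mock
backend = create_backend(lambda: None)
print(backend.vibrate())  # mock: vibrate
```

The rest of the application only talks to the backend object and never needs to know which of the two it received.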

So, maybe someone will find this useful…

ND2D – Stage3D Masks

September 2nd, 2011 | Posted by lars in Actionscript | Molehill / Stage3D | ND2D | Source - (6 Comments)

Another feature I really wanted to implement in ND2D was masks, just like the setMask() method in Flash. In Stage3D (OpenGL), there is no such thing as a mask. You can display textured triangles and that’s it, but you know that nearly everything is possible with a pixel shader. So let’s start:

The idea of masking in a fragment shader is to grab the pixel color of your texture, then grab the pixel color of your mask, multiply the two colors and display the result. But how do we find the correct pixel in the mask? Our task is to find the right UV coordinates for the mask texture.

If you look at the above image, the mask is rotated and overlaps the sprite we want to mask. How do we find the correct pixel (UV coordinate) of the mask that overlaps this orange pixel in the sprite? Somehow we have to map the position of the pixel in the sprite to the pixel in the mask, and we can do that by transforming it between the different coordinate systems. In a vertex shader we calculate the final pixel position from local space to world space. The idea is to map this pixel in world space back to the local coordinate system of the mask. This way it’s pretty easy to find the correct UV coordinates. Let’s do a simple ActionScript test:

// this is the top right corner of our sprite quad.
var v:Vector3D = new Vector3D(128, -128, 0, 1);
 
// this is the sprites matrix, translated a bit
var clipSpaceMatrix:Matrix3D = new Matrix3D();
clipSpaceMatrix.appendTranslation(100, 0, 0);
// this is the masks matrix, it's in the same position as the sprite
var maskClipSpaceMatrix:Matrix3D = new Matrix3D();
maskClipSpaceMatrix.appendTranslation(100, 0, 0);
// this is the masks size
var maskBitmap:Rectangle = new Rectangle(0, 0, 256, 256);
 
// invert the matrix, because we want to map back from world space to local mask space
maskClipSpaceMatrix.invert();
 
// transform our vertex from local sprite space to world space
v = clipSpaceMatrix.transformVector(v);
trace("moved to clipspace: " + v); // Vector3D(228, -128, 0)
 
// transform world space vertex back to local mask space
// the result is the same vector of course, because the positions of mask and sprite are equal
v = maskClipSpaceMatrix.transformVector(v);
trace("moved to local mask space: " + v); // Vector3D(128, -128, 0)
 
// calculate the uv coordinates from the local pixel position
v = new Vector3D((v.x + (maskBitmap.width * 0.5)) / maskBitmap.width,
                 (v.y + (maskBitmap.height * 0.5)) / maskBitmap.height,
                  0.0, 1.0);
 
// the result is what we expect, the top right uv coordinate:
trace("local mask uv: " + v); // Vector3D(1, 0, 0)
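The test above uses the same position for sprite and mask, so the inverse transform is trivial. To see that the idea also holds for a rotated mask, here is a standalone sketch of the same mapping in Python with plain 2D math. The helper names are hypothetical, and the transform is reduced to a rotation followed by a translation instead of a full 4×4 matrix.

```python
import math


def make_transform(tx, ty, rotation_deg=0.0):
    """Rotation followed by translation, stored as (cos, sin, tx, ty)."""
    r = math.radians(rotation_deg)
    return (math.cos(r), math.sin(r), tx, ty)


def to_world(m, x, y):
    """Local space -> world space."""
    c, s, tx, ty = m
    return (c * x - s * y + tx, s * x + c * y + ty)


def to_local(m, x, y):
    """World space -> local space: undo the translation, then rotate back."""
    c, s, tx, ty = m
    dx, dy = x - tx, y - ty
    return (c * dx + s * dy, -s * dx + c * dy)


def mask_uv(sprite_m, mask_m, mask_w, mask_h, x, y):
    """UV in the mask texture for a point given in local sprite space."""
    wx, wy = to_world(sprite_m, x, y)
    lx, ly = to_local(mask_m, wx, wy)
    return ((lx + mask_w * 0.5) / mask_w, (ly + mask_h * 0.5) / mask_h)


sprite = make_transform(100, 0)
mask = make_transform(100, 0)
# Same numbers as the ActionScript test: top right corner of the sprite
print(mask_uv(sprite, mask, 256, 256, 128, -128))  # (1.0, 0.0)
```

With a rotated mask at the same position, the sprite’s centre still lands on UV (0.5, 0.5), while the corners move around it.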

Porting this idea to a shader is pretty straightforward. Let’s code a PB3D material shader:

void evaluateVertex()
{
     interpolatedUV = float4(uvCoord.x + uvOffset.x, uvCoord.y + uvOffset.y, 0.0, 0.0);
 
     float4 worldSpacePos = float4(vertexPos.x, vertexPos.y, 0.0, 1.0) * objectToClipSpaceTransform;
     // maskObjectToClipSpaceTransform is the inverted clipspace matrix of the mask
     float4 localMaskSpacePos = worldSpacePos * maskObjectToClipSpaceTransform;
 
     // halfMaskSize.xy is maskBitmap.width/height * 0.5 passed as a parameter
     // invertedMaskSize.xy = 1.0 / maskBitmap.width/height passed as a parameter, because divisions are not properly working in the current pb3d release
     interpolatedMaskUV = float4((localMaskSpacePos.x + halfMaskSize.x) * invertedMaskSize.x,
                                 (localMaskSpacePos.y + halfMaskSize.y) * invertedMaskSize.y,
                                  0.0, 0.0);
}
 
void evaluateFragment()
{
    float4 texel = sample(textureImage, float2(interpolatedUV.x, interpolatedUV.y), PB3D_2D | PB3D_MIPNEAREST | PB3D_CLAMP);
    float4 texel2 = sample(textureMaskImage, float2(interpolatedMaskUV.x, interpolatedMaskUV.y), PB3D_2D | PB3D_MIPNEAREST | PB3D_CLAMP);
 
    result = float4(texel.r * color.r * texel2.r,
                    texel.g * color.g * texel2.g,
                    texel.b * color.b * texel2.b,
                    texel.a * color.a * texel2.a);
}

If you don’t want to use PixelBender3D and like to ‘torture’ yourself with AGAL, you can write the same shader this way:

/*
vertex shader:
 
vc0-vc3 = clipspace matrix of sprite
vc4-vc7 = inverted clipspace matrix of mask
vc8.xy = half mask width / height
vc8.zw = mask width / height
va0 = vertex
va1 = uv
*/
 
m44 vt0, va0, vc0           // vertex * clipspace
m44 vt1, vt0, vc4           // clipspace to local pos in mask
add vt1.xy, vt1.xy, vc8.xy  // add half masksize to local pos
div vt1.xy, vt1.xy, vc8.zw  // local pos / masksize
mov v0, va1                 // copy uv
mov v1, vt1                 // copy mask uv
mov op, vt0                 // output position
 
/*
fragment shader:
*/
 
mov ft0, v0                                // get interpolated uv coords
tex ft1, ft0, fs0 <2d,clamp,linear,nomip>  // sample texture
mov ft2, v1                                // get interpolated uv coords for mask
tex ft3, ft2, fs1 <2d,clamp,linear,nomip>  // sample mask
mul ft1, ft1, ft3                          // mult mask color with tex color
mov oc, ft1                                // output color

The result is visible here: ND2D – alpha masks (move your mouse over the crates). I added one more feature: you can set the alpha of a mask, which means that you can specify how much the mask affects the sprite. In the demo above the alpha fades from 0.0 to 1.0. Since we’re using all four color components in our calculations (r, g, b, a), we can mask not only the alpha, but all color channels. I don’t know if this is a “nice thing to have” or if it will get annoying when you use sprites as masks in your game and need to provide an extra image for that. Just let me know :) Here is the example: ND2D – disco color masks.
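To show the per-channel multiplication and the effect of a mask alpha in isolation, here is a small Python sketch. The multiplication matches the fragment shaders above; the way the mask alpha is blended in (fading the mask value towards 1.0) is only an assumption about how such a parameter could work, not a description of ND2D’s shader.

```python
def apply_mask(texel, color, mask, mask_alpha=1.0):
    """Multiply texel, tint color and mask per channel (r, g, b, a)."""
    out = []
    for t, c, m in zip(texel, color, mask):
        # Assumed blend: alpha 0.0 ignores the mask, 1.0 applies it fully
        m_eff = 1.0 + (m - 1.0) * mask_alpha
        out.append(t * c * m_eff)
    return tuple(out)


texel = (1.0, 0.5, 0.25, 1.0)
white = (1.0, 1.0, 1.0, 1.0)
black_mask = (0.0, 0.0, 0.0, 0.0)

print(apply_mask(texel, white, black_mask, 1.0))  # fully masked out
print(apply_mask(texel, white, black_mask, 0.0))  # mask has no effect
```

A colored mask such as (1, 0, 0, 1) would keep only the red channel, which is the effect used in the disco example.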

F10 Astro Blackhole

September 30th, 2008 | Posted by lars in 3D | Actionscript | Experiments | Flash 10 | Particles | Source - (4 Comments)

The Flash 10 beta player has been out for a while and I found a few minutes to try out the new native 3D effects. You can get quite nice and fast results out of the new API if you only want to display flat 2D planes in 3D space:


(Space = fullscreen, Download source, Flash Player 10 needed)

Good news everyone! I packed everything together, cleaned up a bit and created a Google Code project for my 3D engine. Since it became popular under the name ‘nulldesign’s 3d engine’, I’ll call it ND3D from now on ;). I will post additional information, more examples and future developments here in my blog and on the project page. So long…

I just found a nice way to automatically truncate text in datagrid columns and show tooltips for truncated elements, that I wanted to share:

package
{
	import mx.controls.dataGridClasses.DataGridItemRenderer;
 
	public class TruncateToolTipRenderer extends DataGridItemRenderer
	{
		private var textTruncated:Boolean = false;
		private var originalText:String;
 
		public function TruncateToolTipRenderer()
		{
			super();
		}
 
		override public function set text(value:String):void
		{
			super.text = value;
			originalText = value;
			textTruncated = truncateToFit();
		}
 
		override public function validateProperties():void
		{
			super.validateProperties();
			toolTip = textTruncated ? originalText : null;
		}
	}
}

Just set the itemRenderer property of your column to the new TruncateToolTipRenderer. Neat!

Update: … and it’s a lot easier if you just set the itemRenderer to mx.controls.Label, because a Label already has truncate and tooltip functionality. As so often in Flex, everything is built in. Lesson learned!

Some time ago I started to code my own 3D engine in Flash. Starting from a small AS2 project, I challenged myself to build my own Flash 3D engine. So I took out my good old ActionScript Animation book and opened the 3D chapter. Very soon I could move a cube around. A few pages later I learned how to implement simple dynamic lighting. The next challenge was to get texture mapping to work. Since Flash still can’t distort images, you need a workaround. After I found these great examples, Seb Lee-Delisle’s flash texture maps and Andre Michelle’s texture examples, it was done. Somewhere in between I switched to AS3, which was quickly done. Meanwhile Papervision3D became very popular and I thought it didn’t make sense to continue evolving my engine. But since I had come so far, I needed to find out how to implement a few effects like depth of field or additive rendering ;)

My engine shouldn’t and doesn’t compete with Papervision3D or Sandy3D. It has no particularly user-friendly API, no stunning effects and no animation support, but I learned a lot while building it: understanding 3D to 2D rendering, optimizing the code for a few ms of extra speed (AS3 rocks!) and tackling problems with 3D rotations like gimbal lock and their solution: quaternions. It’s just another 3D Flash engine, but at least I can say: I made it! ;)

It’s undocumented, there’s still a lot of work to do and it doesn’t have a cool name, but if you want to play around with it or just take a look at how I set up this and that, feel free to download the sources (yes, the 3D ribbon example is included). And I’m always interested in what you think about it, so drop a comment or mail.

Collision Tests

December 19th, 2007 | Posted by lars in Actionscript | Experiments | Flash 9 - (4 Comments)

As you noticed, I’m building a small arcade shooter in AS3 besides my daily work. It started as a small test and got me hooked ;). In my previous post I used getObjectsUnderPoint for a hit test between the shot and the enemy clips. This works fine if you don’t need a collision reaction and the clips travel at a moderate speed. Now I wanted to integrate a collision detection and reaction for the enemies and made use of some nice scripts I found out there:

- A shape-based collision detection for bitmaps by Grant Skinner, since the third parameter in hitTestPoint(x:Number, y:Number, shapeFlag:Boolean) only works for vector images and I’m using bitmaps as graphics.
- A Proximity Manager. This small class comes in really handy and the principle is so simple. The proximity manager splits the stage into a grid and stores, for every sprite, which grid cell it is in. So you don’t need to test every sprite against all other sprites; instead you just say: “Give me all neighbours of that sprite”, which saves a lot of loops.
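The grid principle behind such a proximity manager can be sketched in a few lines. The following Python version is not the class linked above, just a minimal illustration with hypothetical names: objects are stored per grid cell, and a neighbour query only looks at the cell in question and the eight cells around it.

```python
from collections import defaultdict


class ProximityGrid:
    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (column, row) -> objects

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, obj, x, y):
        self.cells[self._cell(x, y)].append(obj)

    def neighbours(self, x, y):
        """Objects in the cell containing (x, y) and its 8 adjacent cells."""
        cx, cy = self._cell(x, y)
        found = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                found.extend(self.cells.get((cx + dx, cy + dy), []))
        return found


grid = ProximityGrid(cell_size=50)
grid.insert("enemy_a", 10, 10)
grid.insert("enemy_b", 60, 10)
grid.insert("enemy_c", 400, 400)
print(sorted(grid.neighbours(10, 10)))  # ['enemy_a', 'enemy_b']
```

Note that the query also returns the object sitting at the queried position itself, so a caller would usually skip that one. For moving sprites the grid has to be rebuilt or updated every frame.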

So I had the tools for a nice collision detection, for the collision reaction I converted a simple snooker algorithm to AS3: Snooker Balls. Watch the result:
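For readers who want to see what such a collision reaction involves, here is a sketch in Python of the textbook elastic collision between two balls of equal mass. It is not the converted snooker script itself: the function name is hypothetical, and friction, spin and overlap correction are ignored.

```python
import math


def collide(p1, v1, p2, v2):
    """Return new velocities for two equal-mass balls touching at p1 and p2."""
    nx, ny = p2[0] - p1[0], p2[1] - p1[1]
    dist = math.hypot(nx, ny)
    if dist == 0:
        return v1, v2  # same position, no usable collision normal
    nx, ny = nx / dist, ny / dist
    # Relative velocity along the collision normal
    rel = (v1[0] - v2[0]) * nx + (v1[1] - v2[1]) * ny
    if rel <= 0:
        return v1, v2  # already moving apart
    # Equal masses: the balls exchange their velocity along the normal
    return ((v1[0] - rel * nx, v1[1] - rel * ny),
            (v2[0] + rel * nx, v2[1] + rel * ny))


# Head-on hit: the moving ball stops, the resting ball takes over its speed
print(collide((0, 0), (1, 0), (1, 0), (0, 0)))
```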

Move your mouse over me!

Ok, AS3 can render thousands of particles per frame… but what if we want more than a few single moving pixels and hittests? So I started to write a small test, instancing bitmaps on random locations. And hey: These objects look like bombs, now I need something to destroy them … and there was the battleship ;). For the hittest I used getObjectsUnderPoint. It seems like this method is a bit slower than hitTest or hitTestObject, but you can actually save a lot of loops because you don’t need to test each bomb with each shot. Instead of:

// pseudo code
for each(bombs) {
    for each(shots) {
        if(bomb.hitTestObject(shot)) {
            // shot hit bomb...
        }
    }
}

you go:

// pseudo code
for each(bombs) {
    var objects:Array = shotClip.getObjectsUnderPoint(new Point(bomb.x, bomb.y)); // expects a flash.geom.Point
    if(objects.length) {
          // shot hit bomb...
    }
}

This only works if you have a separate clip for the shots. Enough talk: Launch!


(Move using the mouse and rotate the gun with cursor keys)

AS3 Preloading continued

November 30th, 2007 | Posted by lars in Actionscript - (9 Comments)

I wasn’t really satisfied with the solution I used for my AS3 preloader (see post: Preloader in AS3 projects). Now Sven has found THE solution. It’s in German, so I’ll try to explain how it works:

The class “Test” acts as the factory class that’s responsible for the preloading. The Class “Demo” is the main application class. Using the compiler option -frame frameLabel Demo, the class Demo is compiled into frame 2.

Just like in Keith Peters’ example and my attempt using an exclude XML, you can now preload a pure AS3 application, but in this case no Flex framework classes are compiled into the resulting SWF and there is no need for an exclude XML. Finally!

AS3 Singletons, the other way

November 20th, 2007 | Posted by lars in Actionscript - (0 Comments)

Since you can’t declare a constructor private in AS3, there are a few methods out there to enforce a Singleton. A common way is to declare an internal SingletonEnforcer class outside the package: SingletonEnforcer, but the declaration of the Enforcer outside a package looks a bit “dirty” to me. Another way is to declare the singleton as a static variable, but this way you can’t decide when your singleton is instantiated.

So why not try nearly the same thing with a private static constant:

package {
    public class Singleton {
        private static var instance:Singleton;
        private static const checker:Object = {};
 
        public function Singleton(initObj:Object) {
            if(initObj != checker) {
                throw new Error("Private constructor!");
            }
        }
 
        public static function getInstance():Singleton {
            if(instance == null) {
                instance = new Singleton(checker);
            }
            return instance;
        }
    }
}

Update: I found another nice solution by GSkinner: SingletonDemo.

Preloader in AS3 projects (factoryClass)

November 20th, 2007 | Posted by lars in Actionscript - (3 Comments)

Over the last few days I was playing around with the Flex metadata tag “Frame” to implement a preloader in an AS3 project. You might have read Keith Peters’ post concerning this issue. This method works fine, except for the fact that I can’t get rid of the Flex framework classes in my compiled SWF, which blows up the size of a nearly empty SWF to 120kb.

If you check out the comments in his post, you’ll find a few solutions, but none of them worked for me. So I generated an XML exclude file that excludes the whole Flex framework. You might want to give it a try: flex_framework_exclude.xml (compile with the option -load-externs+=flex_framework_exclude.xml). It’s still not satisfying, but it’s working…

Custom right click menus in AIR

October 31st, 2007 | Posted by lars in Actionscript | AIR - (0 Comments)

I was wondering, why DisplayObjects in AIR don’t support the MouseEvent.RIGHT_CLICK Event. The only way to implement a right-click-menu seemed to be a NativeMenu. But I needed only the event, not the ugly native-os-menu.

Here is the workaround: just create a blank NativeMenu and listen for its displaying event.

var menu:NativeMenu = new NativeMenu();
menu.addEventListener(Event.DISPLAYING, onRightClick, false, 0, true);
someSprite.contextMenu = menu;

New site, new blog

October 21st, 2007 | Posted by lars in Actionscript - (2 Comments)

Good news everyone!
I removed my three year old flash site and managed to set up this blog instead. I think this is a much better place to showcase my experiments. It’s not finished yet, but I’m working hard on it to get everything done.

I will post news, talk about my flash experiments, actionscript in general and whatever pops into my mind here.
So, let’s get started!