ND2D – Blur

December 7th, 2011 | Posted by lars in Molehill / Stage3D | ND2D | Pixelshader

Good news, everyone! I found a little time to implement a blur shader for ND2D, and I’ll try to explain how to implement a shader like this:

First of all: how does a blur work? To blur an image, you sample the neighbouring pixels of each pixel in the image and compute their average color. For example, take a 3×3 image where the pixel in the middle is black and the rest are white. You sample the eight neighbours of the middle pixel (r: 1.0 g: 1.0 b: 1.0) plus the pixel itself (r: 0.0 g: 0.0 b: 0.0), add them up and divide by 9, so the resulting pixel is (r: 0.89 g: 0.89 b: 0.89), i.e. 8/9. Do that for every pixel in the image and you’ll have a blur.
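The 3×3 example can be checked with a small CPU-side sketch. It is written in TypeScript purely for illustration (the post's own shaders are AGAL/PB3D). It treats the image as grayscale values between 0.0 and 1.0 and assumes clamp-to-edge behaviour at the borders. `boxBlur3x3` is an illustrative name, not part of ND2D.

```typescript
// Grayscale image, row-major: 0.0 = black, 1.0 = white.
type Gray = number[][];

// Clamp an index into [0, max], like "clamp" texture addressing.
const clamp = (v: number, max: number): number => Math.min(Math.max(v, 0), max);

// 3x3 box blur: average each pixel with its 8 neighbours, all weighted 1/9.
function boxBlur3x3(img: Gray): Gray {
  const h = img.length;
  const w = img[0].length;
  const out: Gray = [];
  for (let y = 0; y < h; y++) {
    const row: number[] = [];
    for (let x = 0; x < w; x++) {
      let sum = 0;
      for (let dy = -1; dy <= 1; dy++) {
        for (let dx = -1; dx <= 1; dx++) {
          sum += img[clamp(y + dy, h - 1)][clamp(x + dx, w - 1)];
        }
      }
      row.push(sum / 9);
    }
    out.push(row);
  }
  return out;
}

// The example from the text: white image with a black pixel in the middle.
const image: Gray = [
  [1, 1, 1],
  [1, 0, 1],
  [1, 1, 1],
];
const blurred = boxBlur3x3(image);
console.log(blurred[1][1].toFixed(2)); // 0.89
```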

To implement this in a shader, we have to consider a few things. First, you want to save as many texture sampling calls as possible. For example, if you want to blur your image by 4 pixels horizontally and vertically, you would have to take 9 × 9 = 81 samples (a 9 × 9 neighbourhood: 4 pixels in each direction plus the pixel being blurred itself). This is way too much, and you could never squeeze it into a fragment shader with AGAL. But there is a trick: first blur your image horizontally, then take the result and blur it vertically. This way, you only have to take 9 + 9 = 18 samples (see Article: Gaussian Blur Shader). Implementing it this way means we do a horizontal blur, write the output to a texture and then do a vertical blur on the already horizontally blurred texture. In other words: two-pass rendering. A nice side effect of this approach is that we can blur not only in x AND y, but also in x OR y individually.
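A sketch of the two-pass idea, written as a TypeScript CPU reference rather than shader code. It compares a 2D box blur over a 9 × 9 neighbourhood with a horizontal pass followed by a vertical pass. With clamp-to-edge borders (an assumption here), both give the same result, but the separable version needs 18 instead of 81 samples per pixel. The intermediate array stands in for the texture that the first pass renders into. `blurPass`, `blur2D` and `separableBlur` are illustrative names.

```typescript
type Gray = number[][];

const clampIdx = (v: number, max: number): number => Math.min(Math.max(v, 0), max);

// One 1D pass: average 2*radius+1 samples along x (horizontal) or along y.
function blurPass(img: Gray, radius: number, horizontal: boolean): Gray {
  const h = img.length;
  const w = img[0].length;
  const taps = 2 * radius + 1;
  return img.map((row, y) =>
    row.map((_, x) => {
      let sum = 0;
      for (let i = -radius; i <= radius; i++) {
        sum += horizontal ? img[y][clampIdx(x + i, w - 1)] : img[clampIdx(y + i, h - 1)][x];
      }
      return sum / taps;
    })
  );
}

// Reference: full 2D box blur, (2*radius+1)^2 samples per pixel.
function blur2D(img: Gray, radius: number): Gray {
  const h = img.length;
  const w = img[0].length;
  const taps = (2 * radius + 1) * (2 * radius + 1);
  return img.map((row, y) =>
    row.map((_, x) => {
      let sum = 0;
      for (let dy = -radius; dy <= radius; dy++) {
        for (let dx = -radius; dx <= radius; dx++) {
          sum += img[clampIdx(y + dy, h - 1)][clampIdx(x + dx, w - 1)];
        }
      }
      return sum / taps;
    })
  );
}

// Two passes: horizontal into an intermediate image (the "render to texture"
// step), then vertical on that intermediate result.
function separableBlur(img: Gray, radius: number): Gray {
  const intermediate = blurPass(img, radius, true);
  return blurPass(intermediate, radius, false);
}

const radius = 4;
const samples2D = (2 * radius + 1) * (2 * radius + 1);
const samplesSeparable = 2 * (2 * radius + 1);
console.log(samples2D, samplesSeparable); // 81 18
```

Because the horizontal pass already stores averaged rows, the vertical pass only has to average 9 of those values to get the same 81-sample result.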

So we’ve implemented our blur and are happy that everything is blurry with a 4×4 blur, but how do we animate it? We could generate the shader dynamically, with a different shader for each blur value, but space in a fragment shader is limited: a program can’t exceed a certain size. What if we want a blur of 50 × 50? We can’t write a shader that does this. Since AGAL has no loops, every sample has to be written out, and the program would simply be too big.

One part of the answer is good old Carl Friedrich Gauß. Some two hundred years ago, he came up with a function that lets us weight the sampled pixels (see Article: Gaussian Blur and an Implementation). So our shader can remain static and always sample 9 pixels, while the Gaussian function tells us how much each sample contributes: instead of dividing all samples by 9, each sample gets its own factor. Since these factors are just numbers fed to an otherwise unchanged shader, the blur becomes dynamic, and it even looks a lot better with the Gaussian weights than with our simple “divide by 9” approach. Neat! Now we can animate a blur from 0 to 4 pixels. That’s OK, but we wanted 50 or more, remember?
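The weights themselves are cheap to compute on the CPU and could then be passed to the shader. This is a minimal sketch that assumes a fixed 9-tap kernel and a standard deviation `sigma` supplied by the caller. The post doesn't say how ND2D maps a blur amount to σ, so that mapping is left open. It samples G(x) = exp(-x² / (2σ²)) at offsets -4..4 and normalises the result so the weights sum to 1. `gaussianWeights` and `weightedRow` are hypothetical names.

```typescript
// Normalised Gaussian weights for a fixed kernel of 2*radius+1 taps
// (radius 4 -> the 9 samples from the text), G(x) = exp(-x^2 / (2*sigma^2)).
function gaussianWeights(sigma: number, radius: number = 4): number[] {
  const weights: number[] = [];
  if (sigma <= 0) {
    // No blur: all weight on the centre sample.
    for (let i = -radius; i <= radius; i++) weights.push(i === 0 ? 1 : 0);
    return weights;
  }
  for (let i = -radius; i <= radius; i++) {
    weights.push(Math.exp(-(i * i) / (2 * sigma * sigma)));
  }
  // Normalise so the weights sum to 1 and the image keeps its brightness.
  const total = weights.reduce((a, b) => a + b, 0);
  return weights.map((w) => w / total);
}

// One weighted 1D pass over a row: each sample gets its own factor
// instead of a flat 1/9. Edge samples are clamped.
function weightedRow(row: number[], weights: number[]): number[] {
  const radius = (weights.length - 1) / 2;
  return row.map((_, x) => {
    let sum = 0;
    for (let i = -radius; i <= radius; i++) {
      const sx = Math.min(Math.max(x + i, 0), row.length - 1);
      sum += row[sx] * weights[i + radius];
    }
    return sum;
  });
}

// A larger sigma spreads the weight further out; the sampling code stays the same.
console.log(gaussianWeights(1).map((w) => w.toFixed(3)).join(" "));
console.log(gaussianWeights(3).map((w) => w.toFixed(3)).join(" "));
```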

The last part of our fully dynamic blur shader is: just repeat what we’ve already done! If you want a blur of 10, blur twice by 4 pixels, followed by a 2-pixel blur. Implementing this is also straightforward: setRenderToTexture(), renderBlur(), switchTextures(), all done in a loop.
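The repeat-and-swap loop can be sketched as follows. `planBlurPasses` splits the requested amount into passes of at most 4 pixels, and `runBlurPasses` ping-pongs between two render targets. The `renderBlur` callback is a hypothetical stand-in for setRenderToTexture() plus the actual draw. None of these names are ND2D's API.

```typescript
// Split a requested blur into passes of at most `maxPerPass` pixels,
// e.g. 10 -> [4, 4, 2] for a shader that samples 4 pixels per side.
function planBlurPasses(amount: number, maxPerPass: number = 4): number[] {
  if (maxPerPass <= 0) throw new Error("maxPerPass must be positive");
  const passes: number[] = [];
  let remaining = amount;
  while (remaining > 0) {
    const step = Math.min(remaining, maxPerPass);
    passes.push(step);
    remaining -= step;
  }
  return passes;
}

// Ping-pong between two render targets: each pass reads `src` and renders
// into `dst`, then the two swap roles. Returns the target with the final image.
function runBlurPasses<T>(
  a: T,
  b: T,
  passes: number[],
  renderBlur: (src: T, dst: T, amount: number) => void // hypothetical draw callback
): T {
  let src = a;
  let dst = b;
  for (const amount of passes) {
    renderBlur(src, dst, amount); // setRenderToTexture(dst) + renderBlur()
    const tmp = src;              // switchTextures()
    src = dst;
    dst = tmp;
  }
  return src;
}

// Usage with strings standing in for textures:
const log: string[] = [];
const result = runBlurPasses("texA", "texB", planBlurPasses(10), (src, dst, amount) => {
  log.push(`${src} -> ${dst} (${amount}px)`);
});
console.log(log.join(", ")); // texA -> texB (4px), texB -> texA (4px), texA -> texB (2px)
console.log(result);         // texB
```

One caveat applies if each pass is a true Gaussian. Repeated Gaussian blurs combine their standard deviations as √(σ₁² + σ₂² + …), so 4 + 4 + 2 does not produce exactly the same kernel as a single blur of 10. Treat the linear split as an approximation when tuning values.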

Enough of the tech talk, here’s the result (move your mouse to blur the sprites in x and/or y):

You’ll notice the ugly edges in the middle image. This happens if the blur is larger than the transparent space available in the texture, so the blur gets “cut off”. I haven’t found a good solution for this, except for one: leave enough transparent space in your textures if you want to blur them ;)
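The workaround of leaving enough space can at least be automated: pad the texture by the total blur reach on each side. This is a minimal sketch under two assumptions. First, the reach of the blur equals the sum of all pass amounts (10 for the 4 + 4 + 2 example). Second, the texture must have power-of-two dimensions, as Stage3D's regular textures require. `paddedTextureSize` is a hypothetical helper.

```typescript
// Round up to the next power of two.
function nextPowerOfTwo(n: number): number {
  let p = 1;
  while (p < n) p *= 2;
  return p;
}

// Size a texture so a blur reaching blurX / blurY pixels (summed over all
// passes) has enough transparent space on every side and is not cut off.
function paddedTextureSize(width: number, height: number, blurX: number, blurY: number) {
  const paddedWidth = width + 2 * Math.ceil(blurX);
  const paddedHeight = height + 2 * Math.ceil(blurY);
  return {
    paddedWidth,
    paddedHeight,
    textureWidth: nextPowerOfTwo(paddedWidth),
    textureHeight: nextPowerOfTwo(paddedHeight),
  };
}

// A 100x60 image blurred by 4 + 4 + 2 = 10 pixels in x only:
const size = paddedTextureSize(100, 60, 10, 0);
console.log(size.paddedWidth, size.paddedHeight);   // 120 60
console.log(size.textureWidth, size.textureHeight); // 128 64
```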

I never really introduced ND2D’s TextureRenderer and the possibilities it opens up. The TextureRenderer does what the name suggests: it renders a display object (Sprite2D, etc.) and all of its children onto a Context3D texture. The cool thing is that you can draw your entire scene to a (fullscreen) texture and add post-processing effects by writing a new material/shader and displaying the result via a standard Sprite2D.
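The post doesn't show the post-processing shader itself. As a hypothetical stand-in, here is a CPU reference of the kind of per-pixel work a simple full-screen effect does: it samples the rendered scene texture at a UV coordinate distorted by a sine wave that moves over time. `wobbleUV`, `postProcess` and the amplitude/frequency/speed parameters are assumptions for illustration only. They are not ND2D's API, and they are not the author's shader.

```typescript
// Hypothetical "wobble" distortion: shift the lookup position by a sine wave
// that moves over time. In a real post-process this runs per pixel in the
// fragment shader; here it is plain TypeScript for illustration.
function wobbleUV(
  u: number,
  v: number,
  time: number,
  amplitude: number = 0.01,
  frequency: number = 10,
  speed: number = 2
): [number, number] {
  const du = amplitude * Math.sin(v * frequency + time * speed);
  const dv = amplitude * Math.cos(u * frequency + time * speed);
  return [u + du, v + dv];
}

// Apply the effect to a rendered scene (grayscale, row-major), the way a
// full-screen sprite would sample the scene texture: nearest-neighbour
// lookup at the distorted UV, clamped to the edges.
function postProcess(scene: number[][], time: number, amplitude?: number): number[][] {
  const h = scene.length;
  const w = scene[0].length;
  return scene.map((row, y) =>
    row.map((_, x) => {
      const [u, v] = wobbleUV((x + 0.5) / w, (y + 0.5) / h, time, amplitude);
      const sx = Math.min(Math.max(Math.floor(u * w), 0), w - 1);
      const sy = Math.min(Math.max(Math.floor(v * h), 0), h - 1);
      return scene[sy][sx];
    })
  );
}
```

The scene itself is untouched. The effect lives entirely in how the final sprite looks up its texture, which is why a new material is enough to add it.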

Here’s the plain scene without post processing:

… and here with a small “dizziness” post-processing shader:

I’ve added this test to the examples included in the ND2D sources. You can see the live example here (test #18).

One of the biggest challenges in modern computer graphics is still the high cost of rendering thousands of different objects, no matter how simple they are. While developing ND2D, I’m experimenting with different techniques to get good performance.

To optimize rendering, you have to know its weaknesses. As a simple rule: every state change on the graphics context (Context3D), and especially every drawTriangles() call, costs a lot of processing power. You’ll notice pretty fast that if you try to render 2000 sprites (a sprite is just two textured triangles, so 4000 triangles in total) with a separate draw call for every single sprite, the overhead will be so high that the output looks more like a slideshow than a smooth animation. The solution is simple in principle: make as few state changes and draw calls as possible. The implementation is a bit more work…

So how do you save draw calls? The answer is geometry batching. Instead of drawing one sprite per draw call you just draw multiple sprites in a single call. To get it to work, you have to dig a bit deeper into pixel shader programming and the graphics hardware:

Single sprite per draw call:
A sprite consists of two triangles, each triangle consists of three vertices, and each vertex has the following attributes: x, y, z, u, v. This will be the format for our vertex buffer. The shader input parameters (constants) will be the MVP matrix, a color (to tint the sprite and to enable transparency) and, of course, the texture image (image4). This way you’re able to draw one sprite per call: pretty easy and straightforward… but slow.
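This is what the per-sprite data for this format might look like. The sketch assumes an index buffer, which Stage3D's drawTriangles() always takes, so the two triangles share two corners. That gives 4 vertices of 5 floats each plus 6 indices. The corner layout and the name `spriteQuad` are illustrative, not ND2D's actual code.

```typescript
// Vertex format from the text: x, y, z, u, v  ->  5 floats per vertex.
const FLOATS_PER_VERTEX = 5;

// One sprite = one quad = two triangles. With an index buffer the two
// triangles share two corners, so 4 vertices and 6 indices are enough.
function spriteQuad(width: number, height: number): { vertices: number[]; indices: number[] } {
  const hw = width / 2;
  const hh = height / 2;
  const vertices = [
    // x    y   z  u  v
    -hw, -hh, 0, 0, 0,
     hw, -hh, 0, 1, 0,
     hw,  hh, 0, 1, 1,
    -hw,  hh, 0, 0, 1,
  ];
  const indices = [0, 1, 2, 0, 2, 3]; // triangle 1: 0-1-2, triangle 2: 0-2-3
  return { vertices, indices };
}

const quad = spriteQuad(64, 32);
console.log(quad.vertices.length / FLOATS_PER_VERTEX, quad.indices.length); // 4 6
```

In the unbatched approach, each of these quads still needs its own matrix and color constants, its texture and one drawTriangles() call.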

Improvement, batching calls:
You can only batch calls if all the sprites you want to draw use the same texture (setting a texture is also pretty expensive). The main idea is to pass multiple MVP matrices and multiple colors to the shader instead of just one. Within the shader, a different MVP matrix is used depending on which sprite is being drawn. But how many values can you pass to the shader? Today’s graphics hardware has at least 128 constant registers available on the GPU, so, to stay compatible with all the different graphics cards out there, the Molehill API limits them to 128. In the following picture you can see the different inputs that are available to the vertex shader. We won’t bother with temp registers and input vectors for now, because it’s unlikely that we’ll run out of those while drawing sprites. Just keep in mind that the vertex shader has limited storage space; in our case, we’re limited to 128 constants.

(Image taken from the DX8 SDK documentation)
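The batching rule (same texture, limited batch size) can be sketched as a simple pass over the sprites in draw order. Preserving draw order matters in 2D, where sprites are usually alpha-blended back to front. `SpriteRef` and `buildBatches` are hypothetical. A real renderer would also keep each sprite's matrix and color alongside it.

```typescript
// Only what batching needs to know about a sprite (hypothetical type).
interface SpriteRef {
  texture: string;
}

// Walk the sprites in draw order and start a new batch whenever the texture
// changes or the batch is full. Each batch costs at most one texture change
// plus one drawTriangles() call.
function buildBatches(sprites: SpriteRef[], maxPerBatch: number): SpriteRef[][] {
  const batches: SpriteRef[][] = [];
  let current: SpriteRef[] = [];
  for (const sprite of sprites) {
    const full = current.length === maxPerBatch;
    const textureChanged = current.length > 0 && current[0].texture !== sprite.texture;
    if (full || textureChanged) {
      batches.push(current);
      current = [];
    }
    current.push(sprite);
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// 2000 sprites sharing one texture, 25 sprites per batch:
const sprites: SpriteRef[] = [];
for (let i = 0; i < 2000; i++) sprites.push({ texture: "ship" });
console.log(buildBatches(sprites, 25).length); // 80 draw calls instead of 2000
```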

A single register holds a float4. So let’s do some simple math: the matrix uses 4 registers (4 × float4) and the color just one, so 128 / 5 = 25.6, which rounds down to 25. We should be able to batch 25 draw calls into a single call. But how does the shader know which matrix to use? To provide this information, we simply add a batch identifier to the vertex buffer: x, y, z, u, v, batchID. The vertex shader could then look like this:

...
parameter float4x4 clipSpaceMatrix[25];
 
void evaluateVertex()
{
    vertexClipPosition = vertexPosition * clipSpaceMatrix[batchID];
}

Yay! We just batched our draw calls and the engine will run a lot faster for sprites with the same texture.
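To make the register arithmetic and the extra vertex attribute concrete, here is a CPU-side sketch. The constant layout (all 25 matrices in vc0 to vc99, followed by the colors in vc100 to vc124) and every function name are assumptions for illustration. The post doesn't say how ND2D arranges its constants.

```typescript
// Register budget from the text: 128 float4 vertex constants.
const MAX_VERTEX_CONSTANTS = 128;
const REGISTERS_PER_MATRIX = 4; // float4x4 = 4 x float4
const REGISTERS_PER_COLOR = 1;  // one float4

// 128 / 5 = 25.6 -> 25 sprites per draw call
const SPRITES_PER_BATCH = Math.floor(MAX_VERTEX_CONSTANTS / (REGISTERS_PER_MATRIX + REGISTERS_PER_COLOR));

// Hypothetical layout: all matrices first, then all colors.
function matrixRegister(batchID: number): number {
  return batchID * REGISTERS_PER_MATRIX;
}
function colorRegister(batchID: number): number {
  return SPRITES_PER_BATCH * REGISTERS_PER_MATRIX + batchID * REGISTERS_PER_COLOR;
}

// Build the batched vertex buffer (x, y, z, u, v, batchID) from per-sprite
// quads given as 4 vertices x 5 floats (x, y, z, u, v).
function batchedVertices(quads: number[][]): number[] {
  const out: number[] = [];
  quads.forEach((quad, batchID) => {
    for (let v = 0; v < 4; v++) {
      for (let f = 0; f < 5; f++) out.push(quad[v * 5 + f]);
      out.push(batchID); // tells the shader which matrix/color to use
    }
  });
  return out;
}

// Indices for n quads: quad i's 4 vertices start at 4 * i.
function batchedIndices(n: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < n; i++) {
    const base = 4 * i;
    out.push(base, base + 1, base + 2, base, base + 2, base + 3);
  }
  return out;
}

console.log(SPRITES_PER_BATCH, matrixRegister(24), colorRegister(24)); // 25 96 124
```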

But there is more… Right now, we can only batch sprites that share the same texture. Wouldn’t it be great if we could batch just everything? This is where a texture atlas comes in. The idea is simple as well: instead of using different textures, you “bake” every texture used in your game into a single big texture, like this: Pocket God Texture Atlas. Then all you have to do is adjust the UV coordinates of your sprites to match the position of the original texture inside the big one. Generating a texture atlas at runtime and adjusting the UV coordinates is, in fact, a bit more work…
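Adjusting the UVs boils down to a linear remap. This is a minimal sketch that assumes the atlas stores each original texture as a pixel rectangle and that UVs run from 0 to 1 across the whole atlas. `AtlasRegion` and `atlasUV` are hypothetical names.

```typescript
// Where an original texture ended up inside the atlas, in pixels.
interface AtlasRegion {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Map a sprite's local UV (0..1 over its original texture) into the atlas.
function atlasUV(
  u: number,
  v: number,
  region: AtlasRegion,
  atlasWidth: number,
  atlasHeight: number
): [number, number] {
  return [
    (region.x + u * region.width) / atlasWidth,
    (region.y + v * region.height) / atlasHeight,
  ];
}

// A 64x64 texture baked into a 512x512 atlas at (128, 0):
const region: AtlasRegion = { x: 128, y: 0, width: 64, height: 64 };
console.log(atlasUV(0, 0, region, 512, 512)); // [ 0.25, 0 ]
console.log(atlasUV(1, 1, region, 512, 512)); // [ 0.375, 0.125 ]
```

Once every sprite's UVs point into the same atlas, sprites that originally used different textures can share one texture binding and therefore one batch.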

Have fun exploring the GPU ;)

In this demo, you can see a GPU-accelerated particle system I built for my upcoming ND2D engine. The particles are processed entirely on the GPU using pixelshaders; Flash does basically nothing :) There is still plenty of room for improvement, but the performance is already really good! I’ll post more details about the engine in the next few days…


(Note: The demo is broken in the latest Flash Player 11 release due to API changes.)
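The post doesn't describe how the particle system works internally. One common way to keep Flash out of the per-frame work is to upload each particle's start values once. The vertex shader then computes the current state from a single time constant each frame. The sketch below is a CPU reference of that idea with a hypothetical `Particle` layout. It is not necessarily how ND2D does it.

```typescript
// Per-particle start values, uploaded once into the vertex buffer.
interface Particle {
  startX: number;
  startY: number;
  velX: number;
  velY: number;
  spawnTime: number; // also acts as a phase offset so particles don't all loop together
  lifespan: number;
}

// What a vertex shader could evaluate per vertex each frame. Only `time`
// (and a global gravity constant) change, so the only per-frame upload is
// a couple of shader constants.
function particleState(p: Particle, time: number, gravityY: number) {
  const phase = (time - p.spawnTime) / p.lifespan;
  const life = phase - Math.floor(phase); // fractional part, like AGAL's frc
  const t = life * p.lifespan;            // age within the current cycle
  return {
    x: p.startX + p.velX * t,
    y: p.startY + p.velY * t + 0.5 * gravityY * t * t,
    life, // 0..1, e.g. to fade the particle out
  };
}

const p: Particle = { startX: 0, startY: 0, velX: 10, velY: 0, spawnTime: 1, lifespan: 5 };
const s = particleState(p, 3, 2);
console.log(s.x, s.y, s.life); // 20 4 0.4
```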

Btw.: Who’s attending FITC Amsterdam next week? See you there ;)

Good news, everyone! A few hours ago, a version of the Pixel Bender 3D compiler entered public beta. You can download it here.
Now you’re able to write vertex and fragment shaders not only with assembler-like opcodes, but in a much more readable language.

Instead of:

mov ft0, v0
tex ft1, ft0, fs1 <2d,clamp,linear>
mov oc, ft1

You can write much more readable code like this:

void evaluateFragment()
{
    float4 color = sample(textureImage, float2(uv.x, uv.y));
    result = color;
}

To use a PB3D shader in Molehill, you have to write three shaders. The first one is the vertex shader, which calculates the vertex position. The second one is the material vertex shader, which (optionally) prepares and calculates data for the fragment shader. Finally, there's the fragment shader, which is responsible for setting the pixel color. Here is an example of the simplest shader you can imagine:

Vertex Shader:

vertex kernel default
<namespace : "AIF Test";
vendor : "Adobe";
version : 1;>
{
    parameter float4x4 objectToClipSpaceTransform;
 
    input vertex float4 vertexPosition
    <id : "PB3D_POSITION";>
 
    output float4 vertexClipPosition;
 
    void evaluateVertex()
    {
        vertexClipPosition = vertexPosition * objectToClipSpaceTransform;
    }
}

Material Vertex Shader & Fragment Shader:

material kernel color
<namespace : "AIF Test";
vendor : "Adobe";
version : 1;>
{
    input vertex float4 vertexColor
    <id : "PB3D_COLOR";>
 
    interpolated float4 color;
 
    output float4 result;
 
    void evaluateVertex()
    {
        color = vertexColor;
    }
 
    void evaluateFragment()
    {
        result = color;
    }
}

To compile a shader program from these three shaders you can use the included PBASMCompiler:

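import flash.display3D.Program3D;
import flash.utils.ByteArray;
// PBASMProgram, PBASMCompiler and AGALProgramPair come with the PB3D release.

// Wrap the three compiled programs: vertex, material vertex and material fragment
var inputVertexProgram:PBASMProgram = new PBASMProgram(compiledVertexProgramBinary);
var inputMaterialVertexProgram:PBASMProgram = new PBASMProgram(compiledMaterialVertexProgramBinary);
var inputFragmentProgram:PBASMProgram = new PBASMProgram(compiledMaterialFragmentProgramBinary);
 
// Link them into one AGAL vertex/fragment program pair
var programs:AGALProgramPair = PBASMCompiler.compile(inputVertexProgram, inputMaterialVertexProgram, inputFragmentProgram);
var agalVertexBinary:ByteArray = programs.vertexProgram.byteCode;
var agalFragmentBinary:ByteArray = programs.fragmentProgram.byteCode;
 
// Upload the AGAL bytecode (context3D is an existing Context3D instance)
var program:Program3D = context3D.createProgram();
program.upload(agalVertexBinary, agalFragmentBinary);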

For a description of how to connect a shader to data streams, see the PDF and the examples included in the PB3D release. The documentation is already pretty good ;)

If you have no idea what I’m talking about and have just heard the word pixelshader for the first time, take a look at Thibault’s intro to Molehill and Michael’s Simple 2D Molehill example.