MZR: Music synchronisation with a band-pass filter


This is how MZR came to be. I always wanted to have visuals synched with music. That was such an integral part of the game idea that, after some initial failures to get it going, I gave up on the whole game for a couple of months.

Band-pass filter graph. If you end up using band-pass filters you’ll probably need to read the page the image comes from too.

Part of my inspiration came from an early encounter in the XNA scene with ColdBeamGames’ game Beat Hazard: an excellent example of how music synchronisation of visuals and gameplay can work really well. Definitely a direction I wanted to go towards, although ColdBeamGames’ stuff in that area is just on another level.

Again this is a vaguely technical post. It’s going to be fairly simple stuff though.

In order to synchronise visuals with music you want to be able to turn a waveform signal (audio signal) into a signal that drives your graphics. This can be done in a couple of ways (that I know of):

  • processing the raw audio input and extracting the amplitude of different frequencies (drums would be a fairly low frequency – for example 100Hz, voice is in the middle ones, etc). Then using that result signal to drive graphics.
  • tagging – visualising the waveform and using a tool to place various events on the track, matching the beats and various other music facets. During the game you can then synchronise the tag stream with the music stream and have the tags drive the visuals or the game. I imagine that’s how most “guitar hero” games are done.

Both approaches have advantages and disadvantages. Automatic processing can handle all sorts of music and produce fidelity you can’t ever achieve with tagging. On the other hand, automatic processing detects frequencies – it’s as simple as that. If you want anything more complex, matched to something like a song chorus or a specific musical phrase – you want tagging. Also nothing stops you from using both approaches together.

In MZR I use automatic processing. In this post I’ll describe how I got there.

I tried FFT first

Fast Fourier Transform is (better explanation from Wikipedia) an algorithm that computes the discrete Fourier transform. A Fourier transform takes a signal from the time domain (amplitude over time, the waveform) to the frequency domain (amplitude of frequencies). In essence you provide an array of values which is the sound (at 44.1kHz you get 44,100 values per second per channel) and you get back an array of frequencies. In the frequencies array each item is the amplitude of that frequency (strictly, the FFT returns complex numbers and the amplitude is the magnitude of each one).

You can find FFT implementations on the internet – there are ones in almost every programming language I can imagine. It’s a known numerical recipe [points for those who got this pun ;)].

So, what do you do with this array of frequencies?

  • find the dominant frequency – this is as simple as finding the array item with the largest value
  • lookup a frequency range that you are interested in
  • visualise it – in a traditional graphic equaliser the array items would each be a bar and the item values would be how high those bars are lit up
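To make the "array of frequencies" idea concrete, here is a minimal sketch that uses a naive discrete Fourier transform rather than a real FFT (same result, just much slower) and then picks the dominant frequency. The function names are made up for this illustration and are not from the MZR code.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Naive O(N^2) discrete Fourier transform: returns the magnitude of each
// frequency bin from 0 up to N/2. An FFT computes the same values faster.
std::vector<float> MagnitudeSpectrum(const std::vector<float>& samples)
{
    const double kPi = 3.14159265358979323846;
    const std::size_t n = samples.size();
    std::vector<float> magnitudes(n / 2 + 1, 0.0f);

    for (std::size_t bin = 0; bin < magnitudes.size(); ++bin)
    {
        double re = 0.0;
        double im = 0.0;
        for (std::size_t i = 0; i < n; ++i)
        {
            const double angle = 2.0 * kPi * double(bin) * double(i) / double(n);
            re += samples[i] * std::cos(angle);
            im -= samples[i] * std::sin(angle);
        }
        magnitudes[bin] = float(std::sqrt(re * re + im * im));
    }
    return magnitudes;
}

// Frequency in Hz of the strongest bin (bin 0, the DC offset, is skipped).
float DominantFrequency(const std::vector<float>& magnitudes,
                        std::size_t sampleCount, float sampleRate)
{
    if (magnitudes.size() < 2 || sampleCount == 0)
        return 0.0f;

    std::size_t best = 1;
    for (std::size_t bin = 2; bin < magnitudes.size(); ++bin)
    {
        if (magnitudes[bin] > magnitudes[best])
            best = bin;
    }
    // bin k covers the frequency k * sampleRate / N
    return float(best) * sampleRate / float(sampleCount);
}
```

Each bin k corresponds to the frequency k * sampleRate / N, so a longer input window gives finer frequency resolution.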

So why didn’t I use this? Where’s the gotcha with using FFT?

FFT is successfully used for this exact purpose. However I had a couple of issues with it, some of them unrelated to the algorithm.

First and foremost I had made a mistake in my waveform array calculations. Because of that I was feeding corrupt data into the FFT and was getting results that were really wrong. As there are complex numbers involved and I don’t fully understand them, I assumed I had implemented it wrongly or was using it incorrectly. The result, after hours of experiments and visualisation, was to abandon this solution and look for another one.

Another, not so valid reason is that FFT can be costly to calculate. I grew up in the ’90s when we were counting every unorthodox operation, and if a simpler solution was available that could do the job, that was the preferred solution. Today’s computers, even mobile devices, are pretty powerful and wouldn’t bat an eyelid at this algorithm.

Not long after, I had a chat with and sought advice from a friend of mine – Neil Baldwin. He is far better informed in all matters audio, audio programming, audio electronics, music, etc… besides being an all-round fantastic fellow. Check out his site, especially if you like chiptunes and NES stuff – you won’t be disappointed.

Anyway, that’s when I learned about band pass filters.

Band-pass filter

You have probably heard of low-pass and high-pass filters, right? If not, a low-pass filter is one that, once applied, leaves only the low frequencies in the signal. I’ll leave you guessing what frequencies a “high-pass” filter leaves.

A band-pass filter is similar to the low and high pass ones but can be applied to an arbitrary frequency band. So you can say stuff like: apply a band-pass for a band that is centred at 440Hz and covers 200Hz each side. See the graph image at the top of the article and the link under it – that has a good and more in-depth explanation of what a band-pass filter is.
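A minimal sketch of such a filter, assuming the widely published "Audio EQ Cookbook" biquad band-pass formulas rather than the actual MZR code. The class name and the choice of Q = centre / bandwidth (an approximation for the width of the band) are assumptions for illustration.

```cpp
#include <cmath>

// Second-order (biquad) band-pass filter, constant 0 dB peak gain variant.
class BandPassFilter
{
public:
    BandPassFilter(float centreHz, float bandwidthHz, float sampleRate)
    {
        const double kPi = 3.14159265358979323846;
        const double w0 = 2.0 * kPi * centreHz / sampleRate;
        const double q = centreHz / bandwidthHz;      // narrower band = higher Q
        const double alpha = std::sin(w0) / (2.0 * q);
        const double a0 = 1.0 + alpha;

        // normalise by a0 so Process() needs no division
        m_b0 = alpha / a0;
        m_b2 = -alpha / a0;
        m_a1 = -2.0 * std::cos(w0) / a0;
        m_a2 = (1.0 - alpha) / a0;
    }

    // Feed one input sample, get one filtered sample back.
    float Process(float input)
    {
        // the b1 coefficient is zero for this filter, so x[n-1] drops out
        const double output = m_b0 * input + m_b2 * m_x2
                            - m_a1 * m_y1 - m_a2 * m_y2;
        m_x2 = m_x1;
        m_x1 = input;
        m_y2 = m_y1;
        m_y1 = output;
        return float(output);
    }

private:
    double m_b0, m_b2, m_a1, m_a2;
    double m_x1 = 0.0, m_x2 = 0.0;   // previous two inputs
    double m_y1 = 0.0, m_y2 = 0.0;   // previous two outputs
};
```

For the example above you would construct `BandPassFilter(440.0f, 400.0f, 44100.0f)` and push the mono samples through `Process` one at a time. Taking the absolute value (or a running average of it) of the output gives a usable amplitude for that band.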

I believe my implementation was based on this internet post  and code sample on the topic.

I thought the band-pass filter was a lot simpler to implement than an FFT and a lot easier to understand. I used it instead of my FFT implementation. No joy! That’s when I left this venture and took a break.

This is how my visualisation looks. Absolutely basic – it took minutes to get going and visualise the processing. The middle bit is the waveform; the bars at the bottom are the amplitude for the selected band at 60 fps.

Couple of months later…

… I suddenly realised that the problem wasn’t with the FFT or the band-pass implementation. I had made a mistake in how I calculated the wave data size and how I converted the stereo channels into a mono stream to process. Once that was fixed, everything fell into place.
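The post doesn’t show the original code or the exact bug, so the following is only a sketch of the two calculations involved, assuming interleaved 16-bit stereo PCM. All names here are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of sample frames in a block of raw PCM data. One frame holds one
// sample for every channel, so a 16-bit stereo frame is 4 bytes.
std::size_t FrameCount(std::size_t dataBytes, std::size_t channels,
                       std::size_t bytesPerSample)
{
    return dataBytes / (channels * bytesPerSample);
}

// Averages interleaved 16-bit stereo samples (L, R, L, R, ...) into a mono
// stream of floats in the range [-1, 1).
std::vector<float> StereoToMono(const std::vector<std::int16_t>& interleaved)
{
    const std::size_t frames = interleaved.size() / 2;
    std::vector<float> mono(frames);
    for (std::size_t i = 0; i < frames; ++i)
    {
        const float left = interleaved[2 * i] / 32768.0f;
        const float right = interleaved[2 * i + 1] / 32768.0f;
        mono[i] = 0.5f * (left + right);
    }
    return mono;
}

// Track length implied by the frame count. If this does not match the real
// length of the music, the size calculation is wrong somewhere.
float DurationSeconds(std::size_t frames, float sampleRate)
{
    return float(frames) / sampleRate;
}
```

Comparing `DurationSeconds` against the known track length is a cheap sanity check that catches mistakes such as forgetting to divide by the channel count or the bytes per sample.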

It’s good to mention a couple of things about debugging this stuff, even though I may not be the best person to advise on that given the above story. At least I knew I was doing it wrong, and that was because I did this:

  • visualise your input stream and output stream – I wrote a quick app in C# that drew the wave data and then overlaid the processed data on top to see if they matched. That’s how you know the algorithm is working.
  • play the music on top visualising the playback time to see if the processed data matches the music playback in any usable form. That’s how you know you can use the output you are getting.

How did I end up not spotting the problem then, if I had such a visualisation? My mistake was such that the calculated input signal data didn’t match the length of the music. I was visualising only the start of the music track (a few seconds) and things looked ok-ish… however once used in game things quickly went wrong and I couldn’t figure out why. Once I visualised the whole music track I could clearly see how the stream was ending well before the end of the music track. I guess the moral of the story is – if I have debugging tools I should use them in every possible way, not just in the one narrow-minded approach I started with.

Actual game data

Once I got this working, I could see two ways to use these filters:

  • pre-process the data offline and bake it into a file, then load in a game and use it synchronised
  • use the live audio stream as sent to the audio output of the device and process that

I chose the first method. My decision was mainly driven by having to implement the audio output capture on multiple platforms. The iOS capture looked complicated enough. Plus, I wasn’t planning on changing music tracks or using the user’s music library (like Beat Hazard mentioned above).

I run four band-pass filters offline on all music tracks found in the game. I store the result at 60fps – my target frame rate. Each item is a vector4 with x, y, z and w being the output for each band – so I get 60 vector4 values per second of music.

At game run-time I load the pre-processed data and track where the music playback has reached – then I sample the vector4 value and attenuate it gradually in game.

That way in-game I always have a vector4 value representing 4 frequency bands that I can synchronise my visuals with.
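Here is a rough sketch of that run-time lookup. The baked data layout, the class name and the decay rule (let the value fall off over time unless a louder sample arrives) are assumptions for illustration, and the band values are assumed to be non-negative amplitudes.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct Vector4 { float x, y, z, w; };

class BandTrack
{
public:
    // frames: one Vector4 per baked frame (e.g. 60 per second of music)
    // decayPerSecond: hypothetical tuning value for the attenuation
    BandTrack(std::vector<Vector4> frames, float framesPerSecond, float decayPerSecond)
        : m_frames(std::move(frames))
        , m_framesPerSecond(framesPerSecond)
        , m_decayPerSecond(decayPerSecond)
        , m_current{0.0f, 0.0f, 0.0f, 0.0f}
    {
    }

    // Call once per game frame with the current music playback position.
    const Vector4& Update(float playbackSeconds, float deltaTime)
    {
        if (m_frames.empty())
            return m_current;

        // find the baked frame that matches the playback position
        const float frame = std::max(0.0f, playbackSeconds) * m_framesPerSecond;
        const std::size_t index = std::min(std::size_t(frame), m_frames.size() - 1);
        const Vector4& sample = m_frames[index];

        // attenuate the previous value, then let a louder sample override it
        const float keep = std::max(0.0f, 1.0f - m_decayPerSecond * deltaTime);
        m_current.x = std::max(m_current.x * keep, sample.x);
        m_current.y = std::max(m_current.y * keep, sample.y);
        m_current.z = std::max(m_current.z * keep, sample.z);
        m_current.w = std::max(m_current.w * keep, sample.w);
        return m_current;
    }

private:
    std::vector<Vector4> m_frames;
    float m_framesPerSecond;
    float m_decayPerSecond;
    Vector4 m_current;
};
```

Driving the lookup from the music playback position, rather than from accumulated game time, keeps the visuals in step with the audio even if the game drops frames.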


Most of the time you don’t have to deal with this. I mean signal processing is fascinating but it’s hard to get right. Most game engines these days would have built-in functionality to give the FFT of an audio stream. That’s right. For example Unity provides something called AudioSource.GetSpectrumData. Check it out.

That’s it!

See you next time.



MZR leadergrid programming tricks


The MZR leadergrid.

In MZR (now on the App Store) there is an alternative approach to the concept of leaderboards. As the height of the player’s tower is the score, we decided that it might be a good idea to have everyone else’s tower around. See your friends and their scores at a glance. As they are all represented as square columns on a grid, I call this the MZR leadergrid.

The user can browse around, zoom in or out and see the global picture – scores aggregated by country as well as their friends.

In this post I’ll talk about the leaderboard implementation. In general it’s a collection of programming tricks and shortcuts and I wanted to describe some of those.

Most of the techniques are aimed at providing visual complexity while reducing the number of draw calls. A high number of draw calls is possibly the main thing that would hurt your performance – any opportunity to reduce that number without impact on quality should be considered.

The implementation of the MZR leadergrid score representation is split in two parts.

  • The local grid where every tower is an accurate representation of a real active player.
  • The global grid where a representation of the global leadergrid is rendered based on aggregated data from our servers.

Local grid

Local in this case refers to the locality of the other users’ towers to the centre player tower.  There is a grid of 9×9 towers around the main one.

This local grid, contains a few types of entries:

  • friend entries – these are the towers of friends logged in Game Center or Facebook. Every friend gets their avatar picture on top of their tower, as well as their score
  • active player entries – these are towers of players who have played the game recently. The idea is to get a selection of players who are currently playing the game.
  • global grid entries – these are entries that could not be filled by friends or active players and are sampled in a similar way to the global grid ones (described below)

Local leadergrid – 9×9 towers of friends and active players. Animation showing each separate render call: towers, frames, friend avatars and scores.

In a way this local bit is the “friends” section of a normal leaderboard but extended so it can always fill a 9×9 matrix.

This local grid is the immediate vicinity of the player’s playing field and as such is visible during gameplay. I also wanted to indicate on the nearby columns (towers) how the current player’s score compares to them.

The local grid is accurate. Every column (friends and active players) is an actual player with their score as reported by our servers. Every time a player submits a high-score to the servers, it is stored in our database and then reported to the player’s friends when they play.

Rendering of the local grid

All of the columns in the local grid are a single rendering mesh. That allows efficient draw call submission – 1 call is better than 9×9 calls. The writing and avatars on top are additional draw calls as they are translucently blended on top. That also allows me to dynamically alter/grow columns as players upload higher scores. Each column is always in the same place in the vertex array so I can calculate what part of the vertex array I’m changing when the column grows.

A similar approach is taken with player scores. A single draw call renders all player scores. They use the same font and can change dynamically. That’s possible because I always leave space in the vertex array for 4 digits. By using degenerate (zero area) triangles I can display scores with fewer than 4 digits while maintaining the vertex count for the full 4 digits.

Finally the avatars. The avatars can change at any time as the player logs in to Game Center or Facebook and gets lists of friends… and their avatar pictures are downloaded from the respective network. Furthermore, to optimise the draw call for those avatar images they need to be in the same texture – not separate textures. Again a single draw call is a lot better than 81 (9×9) ones.

To achieve that I pack all images into a dynamic texture atlas. I use a render-to-texture technique to pack each avatar picture into a separate place on the same texture. Then I just need to update the display avatar mesh with the right texture coordinates of where that avatar image finds itself in the final atlas texture. A bonus of this atlas rendering is that I can apply a custom shader effect to the avatar images achieving the specific MZR look they have.

Finally I have the height indicators rendered on the closest friend columns, indicating how high the current player’s run has reached. At the same time, if a local grid column is overtaken, it flashes briefly by changing the colours of the appropriate vertices in the grid vertex array.

Global grid

The global grid is everything outside the 9×9 local grid. It repeats infinitely across the space so the user can scroll around in any direction when they browse. By repeats infinitely, I mean that the camera is teleported back once it reaches a certain limit – wrapping back to the opposite side of the area – which gives the impression of endless scrolling.

The global grid is not a 1-to-1 representation of all the players in the world. It is a statistical representation. This is done for a couple of reasons:
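The wrap itself can be sketched in a few lines, applied to each camera axis separately. This assumes the grid content repeats with the same period as the wrap, so the jump is invisible. The function name is made up for this example.

```cpp
#include <cmath>

// Wraps a camera coordinate into the range [-halfExtent, halfExtent).
// Leaving the area on one side re-enters it on the opposite side.
float WrapCoordinate(float value, float halfExtent)
{
    const float extent = 2.0f * halfExtent;
    float wrapped = std::fmod(value + halfExtent, extent);
    if (wrapped < 0.0f)
        wrapped += extent;   // fmod keeps the sign of its first argument
    return wrapped - halfExtent;
}
```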

  1. to reduce the data transfer between the app and our servers
  2. to show the player a bigger picture about the scores of the world.

    Scores are aggregated by country on the server. This table shows the top 10 countries (by max score) at the time of writing of this article. (score = average score, score_count = number of players who submitted a score from that country, score_max = the top score from that country)

To achieve that I do a certain aggregation on the server with respect to players scores. At regular intervals a routine runs that calculates stats for every country in the world. The stats calculated are number of players from that country, the average score for that country and the maximum score.

The game client downloads the country score stats from the server. The global leadergrid is then rendered using that information. You can look at this from a data compression point of view: it’s a lossy compression method where the data aggregation is the compression and the visualisation is the decompression part.

In terms of visualisation each country looks like a small mountain of columns. The more people play in that country the larger space it occupies. The top score for that country is the highest point of the mountain. The average score of the country dictates how “sharp” the mountain is. If the average score is low (a lot lower than the max one) then we get a mountain that’s fairly pointy – naturally the people with high scores are not many. If the average score is high, then we get a meatier mountain as there are more people with relatively high scores.

Using average and max scores to control the curve of the country statistical representation.

I achieve that by using a power function to approximate the curve slope. I take the average score (av) and the max score (max) for a given country and calculate the ratio av/(max*0.5). That way an average score that is half the max score for that country means a power of 1. Say we have an average score of 50 and a max score of 100 – given the formula we get 50/(100*0.5)=1. We then use the function f(x) = 1-pow(x, av/(max*0.5)) to get the height at various points x, where x is the distance of the sample point from the centre of the country region.
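Written as code, the formula looks like this. The sketch assumes x is normalised so that 0 is the centre of the country and 1 is its edge; the clamping and the function name are additions for illustration.

```cpp
#include <algorithm>
#include <cmath>

// f(x) = 1 - pow(x, av / (max * 0.5))
// Returns 1 at the centre of the country and 0 at its edge.
float CountryProfile(float x, float averageScore, float maxScore)
{
    const float exponent = averageScore / (maxScore * 0.5f);
    const float clamped = std::min(std::max(x, 0.0f), 1.0f);
    return 1.0f - std::pow(clamped, exponent);
}
```

With av = 50 and max = 100 the exponent is 1 and the profile is a straight slope: halfway out, `CountryProfile(0.5f, 50.0f, 100.0f)` gives 0.5. A lower average makes the exponent smaller than 1, so the profile drops quickly near the centre (a pointy mountain). A higher average makes it larger than 1, which keeps the profile high for longer (a meatier one). Multiplying the result by the max score would presumably give the column height, since the top score is the highest point of the mountain.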

Note: the mathematical ground behind this was that if we imagine these mountains as cones (or a swept/revolved curve that makes a kind of strange cone) then the volume of that cone would be the total added scores of all players playing. That can be found by calculating number of players x average score. We know the height of the cone (max score) and the radius of the cone (number of players playing). Having the radius and the height of the cone, the unknown would be the curve which, when swept, dictates the volume of the “cone”. The connection between the curve and the cone volume would be an integral of that curve over a circle.

Solving this  however goes beyond my mathematical skills and seemed extremely OTT when I had a sensible approximation already.

Global grid country field

Global country leadergrid. Canada has the lead. Notice the relatively pointy score mountain due to the relatively low average score of 213 (see the table above) for CA compared to the top score of 735.

Once the country scores have been downloaded from the server, they are inserted into a CountryField object that places them on a plane and uses a relaxation algorithm to make sure they are spaced evenly according to their respective radii. After that I can query the CountryField at any point in the world and it will return a height for that point based on what countries overlap and influence that point. The relaxation is seeded randomly so that every time a user gets the countries from the servers the global grid looks different.

When sampling at a certain point I consider all countries that the sampling point overlaps with. Then I apply the calculation described above for each of them and take the highest resulting sample. Layered on that is a deterministic noise function so that the score mountains don’t look uniform.
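A sketch of what such a sampling query could look like. The Country fields, the integer hash used as noise and the way the noise is applied (scaling the height down a little) are all assumptions for illustration, not the actual CountryField code.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

struct Country { float centreX, centreY, radius, averageScore, maxScore; };

// Deterministic noise in [0, 1): an integer hash of the grid cell, so the
// same cell always produces the same value.
float CellNoise(int cellX, int cellY)
{
    std::uint32_t h = std::uint32_t(cellX) * 374761393u
                    + std::uint32_t(cellY) * 668265263u;
    h = (h ^ (h >> 13)) * 1274126177u;
    h ^= h >> 16;
    return float(h & 0xFFFFFFu) / 16777216.0f;
}

// Height at a grid cell: the tallest contribution of all overlapping
// countries, reduced by up to noiseAmount (0..1) of its value.
float SampleHeight(const std::vector<Country>& countries,
                   int cellX, int cellY, float noiseAmount)
{
    float height = 0.0f;
    for (const Country& c : countries)
    {
        const float dx = float(cellX) - c.centreX;
        const float dy = float(cellY) - c.centreY;
        const float x = std::sqrt(dx * dx + dy * dy) / c.radius;
        if (x >= 1.0f)
            continue;   // the cell is outside this country

        // the power-function profile: 1 - pow(x, av / (max * 0.5))
        const float exponent = c.averageScore / (c.maxScore * 0.5f);
        const float sample = c.maxScore * (1.0f - std::pow(x, exponent));
        height = std::max(height, sample);
    }
    return height * (1.0f - noiseAmount * CellNoise(cellX, cellY));
}
```

Using a hash of the cell coordinates instead of a random number generator means a column keeps its height when it scrolls out of view and back in again.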

The rendering routine then uses this CountryField object to sample the heights for each of the global grid columns.

Adaptive subdivision 

Rendering all global grid towers in one go proved slow – too much geometry to update at the same time. I opted for an adaptive subdivision technique: columns close to the camera are rendered individually, while columns further away get converted into a bigger column that’s the average of their heights (it’s actually biased above the average as that proved better looking). This is done on a 3×3 column basis. As the camera moves around some of these combined columns split up dynamically and others coalesce. This, combined with some non-linear interpolation, gives the unique MZR look when the user is browsing around.

This adaptive technique also means that the geometry rendered is kept under control. It also allows the adaptive algorithm to work on only a fraction of the geometry when interpolating and updating the vertex array. The entire global grid is a single mesh and therefore a single draw call.
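The two decisions involved can be sketched as below. How the height is biased above the average isn’t stated in the post; blending from the average towards the tallest column is one assumed way to do it, and the distance test is likewise only an illustration.

```cpp
#include <algorithm>

// Height of the single big column that replaces a 3x3 block.
// bias = 0 gives the plain average, bias = 1 gives the tallest column.
float CombinedHeight(const float (&heights)[9], float bias)
{
    float sum = 0.0f;
    float tallest = heights[0];
    for (float h : heights)
    {
        sum += h;
        tallest = std::max(tallest, h);
    }
    const float average = sum / 9.0f;
    return average + (tallest - average) * bias;
}

// A block is drawn as one combined column once its centre is further from
// the camera than splitDistance; closer blocks keep their nine columns.
bool ShouldCombine(float blockX, float blockY,
                   float cameraX, float cameraY, float splitDistance)
{
    const float dx = blockX - cameraX;
    const float dy = blockY - cameraY;
    return dx * dx + dy * dy > splitDistance * splitDistance;
}
```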

Country names

On top, like clouds, are rendered the country initials (alpha-2 country code) and the country top score. This is again a single mesh so I can render that in a single draw call.



MZR has a unique look when it comes to leaderboards. In some ways however it is a bit “style before function”. That’s ok. MZR has always been about trying something new, being willing to go where games have traditionally done things another way.

With that in mind here’s what I think could be better:

  • No easy way to compare. In a traditional leaderboard user gets a list of entries with them inserted in the middle. They immediately know who is ahead of them and behind them. This information is more difficult to read in MZR. User gets a bigger picture view but details remain a bit hazy – the details are still there but not so easy to compare.
  • No easy way to compare country scores. The user has to browse around. It’s been fun to watch different countries take the lead (as of writing of this post Canada has the lead) but it would have been better to have a clear indication of who is in front.
  • In retrospect just using alpha-2 country codes makes for a more alienating experience. Using something like full names or flags would have worked better.


That’s it. If you are interested in reading more about any of the areas I describe, do let me know and I can write that up in more detail in a separate post.

See you next time.

Blending and Transitioning Camera Behaviours


This time I’m going to talk about an approach I’ve used successfully on multiple occasions in several companies and home projects. I use it to manage multiple behaviours, blend and transition between them.

I’ve used this system for animation blending and camera control in the past. However, if I can represent something as a state structure and want to blend/transition between different behaviours that operate on that state then this is my “go to” solution.

In MZR I use this approach for my camera system as well as a system that adds additional camera effects – shakes, wobbles, etc. They are both independent: the camera may have a “focus on point of interest” behaviour to control it while the effects can change independently of that giving me a wider variety of visual experience.

The main motivation behind a system like that is that we want to be mostly concerned with directing the system behaviour rather than be busy with the small detail of how that happens every time. I’m interested in when the camera transition happens, how long it lasts, what the new camera frame is, etc. I’m less concerned with what happens with the old camera setup and I’m definitely not keen on doing the same work every time a camera transition needs to happen, in terms of maintaining entities and writing code to manage their life time.

Ideally we want to write only the code that introduces changes in the system and have the system sort itself out afterwards.

First things first.

A couple of words about some code choices.

In order to get automation of allocation and deallocation of objects I make use of a smart pointer system. If you are working in a managed language environment (like C# for example) you don’t have to worry about that. However, for the purpose of this article if you see an object that is derived from BaseObject then it supports intrusive reference counting. And if you see SmartPtr<MyClass> then that adds value semantics to the pointer – incrementing, decrementing the ref count of the object to manage its life time.

In short it’s an automatic lifetime management system. If no one is pointing to an object it will get deleted.
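For readers who haven’t used intrusive reference counting, here is a minimal, single-threaded sketch of what BaseObject and SmartPtr might look like. Only the two class names come from this article; the members and method names are assumptions.

```cpp
// The reference count lives inside the object itself ("intrusive").
class BaseObject
{
public:
    BaseObject() : m_refCount(0) {}
    virtual ~BaseObject() {}

    void AddRef() { ++m_refCount; }
    void Release() { if (--m_refCount == 0) delete this; }
    int RefCount() const { return m_refCount; }

private:
    int m_refCount;
};

// Pointer wrapper with value semantics: copying adds a reference,
// destroying or reassigning releases one.
template <typename T>
class SmartPtr
{
public:
    SmartPtr(T* ptr = nullptr) : m_ptr(ptr) { if (m_ptr) m_ptr->AddRef(); }
    SmartPtr(const SmartPtr& other) : m_ptr(other.m_ptr) { if (m_ptr) m_ptr->AddRef(); }
    ~SmartPtr() { if (m_ptr) m_ptr->Release(); }

    SmartPtr& operator=(const SmartPtr& other)
    {
        // take the new reference before dropping the old one so that
        // self-assignment and parent-to-child hand-overs are safe
        if (other.m_ptr) other.m_ptr->AddRef();
        T* old = m_ptr;
        m_ptr = other.m_ptr;
        if (old) old->Release();
        return *this;
    }

    T* operator->() const { return m_ptr; }
    T* Get() const { return m_ptr; }

private:
    T* m_ptr;
};
```

The constructor from a raw pointer is deliberately not explicit, which is what lets a function with a SmartPtr return type simply `return this;`.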

The State

I use a POD type of structure to represent the state of an item in the system. In this example we are doing a camera system, so let’s represent the state as two points: camera position and camera target. You can use a position and orientation or any other combination of properties.

The state is important because this is the result of our system. It is also the data that we would blend. Any behaviours we have will aim to produce one of these states as result of their execution.

struct CameraState
{
    Vector3 position;
    Vector3 target;
};

// Linear blend between two states: fraction 0 gives lhs, fraction 1 gives rhs.
CameraState BlendCameraState(const CameraState& lhs, const CameraState& rhs, float fraction)
{
    CameraState result;
    result.position = lhs.position*(1.0f - fraction) + rhs.position*fraction;
    result.target = lhs.target*(1.0f - fraction) + rhs.target*fraction;

    return result;
}

The state can be anything. In case of an animation system the state can be an array of skeletal bone transforms, movement vector extracted from the animation and so on.

The Base Controller

Next, the basic building block of my system – the CameraControllerBase. It contains a state that we can get access to in order to find out the current state of the system.

class CameraControllerBase: public BaseObject
{
public:
    virtual SmartPtr<CameraControllerBase> Update(float fDeltaTime)
    {
        return this;
    }

    const CameraState& GetState() const { return m_state; }

protected:
    CameraState m_state;
};

The most important part of this class is the Update method. The update is where a derived behaviour would do the work by overriding that method.

The Update returns the current controller the parent entity would have after this update step. At the top level, let’s say the entity that owns the system, we have a pointer to the current “top” controller.

SmartPtr<CameraControllerBase> m_topController;

In the update part of this top level entity we want to update the current top controller and assign to it whatever it returns.

void Game::Update(float fDeltaTime)
{
    m_topController = m_topController->Update(fDeltaTime);
}

By doing this we make sure that whatever behaviour is currently the top controller will be updated and can delegate its position of “top controller” to one of the child behaviours it aggregates.

This is the driving idea behind this approach. Controllers can “suicide” themselves and pass the responsibility of top controller to another controller they hold a pointer to.

In this article I use the terms controller and behaviour interchangeably. My base building block is the controller – but some controllers have more complex functionality that is beyond the simple control/blend functionality of the system. In other words they have some game or domain specific function that is used to generate or process a state. I call such controllers “behaviours” to indicate their higher function.

The Blend Controller

The blend controller is a class that takes two other controllers (of unknown type but derived from the base one) and blends between them over time. Then it replaces itself with the second controller – the one it interpolates to. As it replaces itself with the “to” controller the “from” and the “blend” controllers are automatically disposed of.

It’s a transition.

class CameraControllerBlend: public CameraControllerBase
{
public:
    CameraControllerBlend(SmartPtr<CameraControllerBase> From, SmartPtr<CameraControllerBase> To, float BlendTime)
    {
        m_blendTimeMax = BlendTime;
        m_blendTime = 0.0f;

        m_from = From;
        m_to = To;
    }

    SmartPtr<CameraControllerBase> Update(float fDeltaTime)
    {
        // accumulate the time
        m_blendTime += fDeltaTime;

        // update the two controllers; assign the result of each update back so the take-over logic works
        m_from = m_from->Update(fDeltaTime);
        m_to = m_to->Update(fDeltaTime);

        if (m_blendTime < m_blendTimeMax)
        {
            float fraction = m_blendTime/m_blendTimeMax;

            // use fraction to blend between the states of m_from and m_to
            // store the resulting blended state in m_state of the base class
            m_state = BlendCameraState(m_from->GetState(), m_to->GetState(), fraction);

            // return this one as the current top controller
            return this;
        }

        // the blending has finished - return the m_to controller as the one that will take over
        return m_to;
    }

private:
    float m_blendTime;
    float m_blendTimeMax;

    SmartPtr<CameraControllerBase> m_from;
    SmartPtr<CameraControllerBase> m_to;
};

To trigger this transition we have to replace the top controller with a newly created blend one that blends between the old top controller and a new behaviour.

void Game::BlendToController(SmartPtr<CameraControllerBase> ToController, float BlendTime)
{
    m_topController = new CameraControllerBlend(m_topController, ToController, BlendTime);
}

Easy. With just one line we can introduce new controller/behaviour in the system and have it blend in gracefully and clean up after itself.

A diagram showing how new controllers are blended in.
A diagram showing how new controllers transition in to take over the top controller role.

A really nice property of this system is that if blend transitions come in close succession (before the previous blend has finished) everything works exactly as expected. By doing that we are essentially growing a tree of objects, with every branch being a blend controller pointing to either other blend controllers or behaviours. In the end, once all blend times have expired, we will be left once again with one top controller.

Sometimes we want to blend to a behaviour, stay at that behaviour for a while and then return to the previous one. I call that an “attack-sustain-release” blend controller. The controller blends to the “to” behaviour, stays there for the “sustain” time and in the end returns the “from” controller and disposes of the “to” one.

Here’s how this “attack-sustain-release” (ASR) controller Update function might look.

SmartPtr<CameraControllerBase> CameraControllerBlendASR::Update(float fDeltaTime)
{
    // accumulate the time
    m_blendTime += fDeltaTime;

    // update the two controllers; assign the result of each update back so the take-over logic works
    m_from = m_from->Update(fDeltaTime);
    m_to = m_to->Update(fDeltaTime);

    if (m_blendTime < (m_attackTime + m_sustainTime + m_releaseTime))
    {
        // calculate fraction as a function of the current blend time, attack, sustain and release times
        // fraction will stay in the range [0:1]
        float fraction = CalcAttackSustainReleaseFrac(m_blendTime, m_attackTime, m_sustainTime, m_releaseTime);

        // use fraction to blend between the states of m_from and m_to
        // store the resulting blended state in m_state of the base class
        m_state = BlendCameraState(m_from->GetState(), m_to->GetState(), fraction);

        // return this one as the current top controller
        return this;
    }

    // the blending has finished - return the m_from controller as the one that will take over
    return m_from;
}

The fraction function returns a value between 0 and 1 depending which phase of the controller we are in. During “sustain” fraction will always be 1 for example.
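One possible implementation of that fraction function, assuming straight linear ramps for the attack and release phases (an easing curve would work just as well):

```cpp
// Returns 0..1: ramps up during attack, holds at 1 during sustain,
// ramps back down during release and stays at 0 afterwards.
float CalcAttackSustainReleaseFrac(float blendTime, float attackTime,
                                   float sustainTime, float releaseTime)
{
    if (blendTime <= 0.0f)
        return 0.0f;

    if (blendTime < attackTime)
        return blendTime / attackTime;             // attack: 0 -> 1

    blendTime -= attackTime;
    if (blendTime < sustainTime)
        return 1.0f;                               // sustain: hold at 1

    blendTime -= sustainTime;
    if (blendTime < releaseTime)
        return 1.0f - blendTime / releaseTime;     // release: 1 -> 0

    return 0.0f;
}
```

Each division only happens when blendTime is strictly less than the phase length, so a phase with zero duration is simply skipped.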

Behaviour Controllers

We had a look at the blend controllers but what about the actual behaviours in the system? Well, that’s down to the specific system. That’s why our Update function is virtual, so that any derived classes can calculate the state in many ways not imagined by us at the point of writing the system.

For a camera system here are some that I’ve used in the past (Note: names are something I’ve just come up with):

  • CameraControllerSnapshot – take a snapshot of a current CameraState and keep it still – in many ways that’s the CameraControllerBase with ability to expose the m_state for writing.
  • CameraControllerFixedPointLookAtPlayer – one that keeps the camera in the same position but makes it look at the player and track them
  • CameraControllerFixedDirectionLookAtPlayer – camera looks at the player from certain directions and moves position to maintain that direction as that player moves. Often such camera would be constrained by a box or geometry.
  • CameraControllerRailsLookAtPlayer – this is the sort of cinematic camera 3rd person action games would employ. It would constrain its position to a pre-defined spline (on-rails) and follow the player.
  • CameraControllerRailsFixedLookedAtPlayer – this is a variant of the on-rails camera where there are two splines. One that defines the camera position and another one that defines the camera look-at point. This is used so that at any point the artist (camera man) knows what will be in the frame. We would then take the player position, find the closest point on the target spline and calculate the position spline accordingly.

I’m sure you can come up with a lot more camera behaviours. This is just a taste. As long as your Update function uses some logic to fill the m_state CameraState you will have a working system.

Non-transition blending

Not all blends are transitional. They don’t have to be timed and always expire.

We could have a behaviour that has two child behaviours – very much like our blend controller. However, instead of time controlling our fraction, we can control it from another parameter in code or data setup. That way we can dynamically control the degree to which each of the child behaviours contributes to the final state.

You can take this notion a step further and introduce several child behaviours that are all associated with a value on a line – for example one sits at 0, one at 0.5 and finally one at 1.0. Then the blend behaviour would be given a parameter “depth” and it would evaluate which child behaviours contribute to the final output. I’ve heard this called a “depth blend”.

There could be other blend examples where the parameters are not linear. Any parameter set can be used as long as it can be evaluated into a weight for each child behaviour’s contribution.

I’ve mostly used these in animation blending. For example, the depth blend could be used in character animation where we want to blend between two animation loops: running and walking. Based on the desired speed of the character we can derive a parameter that is the fraction between walking and running speed and pass it in as the depth blend factor. The result would be an animation that is part walk and part run, in proportions driven by the parameter we just passed in.

Another animation example would be the multi-parameter blend. Let’s say we have several animations of a character that is pointing [a weapon?] in different directions. Each of these animations is associated with its corresponding aim direction. Given a desired aim direction the system evaluates a function that results in an array of weights – one for every animation. Using those weights we can then calculate a weighted blend of those animations to get to a state where we have a character pointing in the direction we need.

Note: there is a lot more going on in character animation systems and I’m simplifying here to illustrate this method. A good animation system would need to compensate for different character speeds, foot planting, make additional corrections using IK solutions and so on.

In practice…

… I have a generic template implementation that I specialise every time I have to write one of these systems.  The blend controllers are the same. The state, the state interpolation function and the behaviours are what differs between systems.

Sometimes you will need interpolation methods other than linear in transitions. You can add that to the blend controller and control it with a parameter.

Depending on the game that you are making you may even want to make the blend controllers more context aware. Maybe you want your camera to always track the player no matter what. Maybe sometimes, blending between two perfectly good behaviours, you end up with a frame or two when the camera isn’t looking at the player. To fix that you could make a “clever” blend controller that blends between two controllers but keeps the player in view.

You can trigger camera blend transitions in code – I do that in MZR. However, quite often camera transitions and blends are the result of a complex setup in a level: when the player passes this trigger, transition to this camera, and if they get into this area, switch to that one, etc. You can even have an editor that allows you to lay down those triggers and position the camera behaviours around the level… but that’s another story.


That’s it for now. I hope you enjoyed reading about this system. It has served me well and I like how it liberates me from the tedious book-keeping of the blending transition tasks and allows me to focus on the top line “what I want to happen” bit of development.

See you next time.

MZR: Gradient Based Shader Effect

Today I’ll talk about a shader idea I’ve always wanted to use but never got to release in a game until now. It has its roots in the old retro palette scrolling technique – or at least was inspired by it.

Palette scrolling was the thing when images had 8-bit pixels, with each pixel being an index into a palette table of 256 RGB entries. That way, using the same image and just changing the palette, one could change the look of the image without actually altering any pixels. Artists would do wonderful animations just by changing palettes. One of the cheapest ways to do that would be to just shift the palette by one entry (scroll it) and then see the colours shift – I called that palette scrolling.

These days one can still do palette scrolling but on current GPU hardware that involves using two textures: one index texture and one 1D palette texture. Animation is achieved by dynamically changing the palette texture. While that’s entirely fine on desktop GPU hardware, on current mobile device GPUs dependent texture fetches are not very performance friendly.

I wanted to use a similar concept of having a static texture that would change appearance when “something like a palette” would change.

I do that by exposing a range from a gradient texture using a step function. For a quick refresher on the topic, have a look at this excellent post on step and pulse functions:

By using a gradient texture and a step function, y = sat(ax + b), I can vary the parameters a and b and reveal/animate different parts of the said texture. I also introduce two colours and interpolate between them based on the y value.

Here is the shader code:

uniform mediump vec2 GradientParams;   // x = a, y = b
uniform lowp vec4 GradientColour0;     // colour used at y = 0
uniform lowp vec4 GradientColour1;     // colour used at y = 1


// sample the gradient texture
mediump vec4 col = texture2D(Texture, texVar);

// calculate a*x + b
mediump float y = col.x*GradientParams.x + GradientParams.y;

// calculate sat(a*x + b) by clamping
y = clamp(y, 0.0, 1.0);

// interpolate the two colours based on the resulting y value
lowp vec4 rcol = GradientColour0*(1.0 - y) + GradientColour1*y;

// factor in the original texture alpha
rcol.a *= col.a;

// apply the variant colour to the interpolated colour
gl_FragColor = rcol*colorVar;

The a and b parameters go in the GradientParams x and y components, and the two colours at the extremes are GradientColour0 (for y=0) and GradientColour1 (for y=1) respectively.

Let’s take a simple gradient texture:

A horizontal gradient texture.

And then apply our shader to it. We are using the function y=sat(ax+b). We use a=1 and b=0 thus giving us a gradient of 0 to 1 in the range of the texture. Then we are going to assign a colour at y=0 to be white (255,255,255) and at y=1 we’ll assign it to be black (0,0,0). Here’s how that would look.

Reverse: Using the gradient but replacing colour 0 to be white and colour1 to be black.


Next let’s try to use a part of the range. We’ll use the same function but with a=3.3 and b=-0.9. That way y will be 0 until x reaches roughly 0.27 and then grow linearly, reaching 1 when x is roughly 0.58. To illustrate that I’ve assigned the colours to be red for y=0 and blue for y=1.

Note how the gradient transition between the two colours happens in the range of roughly 0.3 to 0.6 that we have isolated using our two parameters.


Here’s one based on the same gradient texture that illustrates the way I use this effect. I assign a=2 and b=0 and that gives me a gradient between 0 and 0.5. I also assign the y=1 colour to be translucent – alpha=0.0. That way by varying the parameter a with some dynamic game value, I can get the bar to move with that value.

Gradient based on parameter going into translucency.

In MZR I link a lot of effects to the music EQ so that visuals appear to bounce with the music.

The above examples are using our simple horizontal gradient texture. Things get a bit more interesting as we start using more complicated textures. For example here’s the actual texture I use for my effect in MZR and the final result next to it. The green MZR logo in the texture is to indicate where the start of the gradient is – it’s a grey gradient that fills up a maze.

Gradient texture for a maze that the shader works with.
The final result in MZR. The parameter a is controlled by the bass frequency in the music, animating the maze on the screen. Note: this also has the MZR logo rendered on top as well as the FUNKY CIRCUIT sign.












And here’s the intro sequence to MZR where this effect is used:


The parameters are linked to the music EQ. The video shows the effect which is a single render call as well as the logo rendered on top and a FUNKY CIRCUIT sign underneath.


That’s it for now. See you next time.


The 1D Radial Height Field

It is common in games to use a generic, complex and powerful system to solve a small simple problem. There are many reasons for that:

  • developer is familiar with the system/tool – familiarity – we work faster with the tools we know how to use!
  • it’s available and needs no additional work – reuse – saving effort and money!
  • seems like the right “realistic” way to solve the problem – stay grounded
  • will solve the problem in all possible scenarios that would happen – be robust

All good reasons and I’m sure there are more. To illustrate this let’s take game physics and collision detection – often just such a system in games. It gets used to:

  • check for “line-of-sight”
  • solve game play problems with custom colliders (invisible walls)
  • ensure the camera doesn’t get in a place where it won’t provide a useful view to the player

These sound so common and natural. They are now standard uses of these systems in games and many people wouldn’t think twice if they had to solve one of the above problems. I’m sure you can find countless other examples.


… I want to talk about using a custom solution to solve a problem that would otherwise be solved by a generic system. In Mazer that would be the camera control and not letting it get in a place where random geometry would obscure the player’s view.

Using custom solution for camera control in Mazer. The line describes the area of safe camera movement.

In Mazer the camera always looks at the top of the maze tower the player is constructing and is surrounded by other “leaderboard” towers. The camera position however is procedurally and programmatically controlled to be “interesting” and “cool” with the music and gameplay. It can happen that the camera would intersect with surrounding towers, obscuring the player’s view and rendering the game unplayable.

A simple brute-force solution to this problem would be to construct a camera frustum and check all surrounding towers for intersection. Although easy enough to implement, that solution has one major problem – it only tells us that something bad has happened but doesn’t give us the information needed to resolve the situation.

A more elaborate approach would be to make the objects physical entities (rigid bodies) and just let them collide and slide against each other, never intersecting and never getting the player in trouble. That would solve the problem and perhaps look good too – one can never be sure before it’s implemented. However, it would be like using a massive hammer to hit a tiny nail.

The solution I use in Mazer takes a simpler yet robust approach. It is based on the knowledge that the camera is not completely free and there are various constraints that allow us to solve the problem in a different, targeted way.

Height Field (1D)

First a small detour into height fields. A height field is a scalar field of heights. Quite often you’d see that as a terrain representation where given 2D coordinates you can get the height of the terrain at that point.

Fig 1. Height field function using data control points.

A simpler form of this is a 1D height field. In essence that is just a function where given one coordinate “x” and evaluating the height field function h(x) you can get the height (or “y” coordinate).

This can be achieved with a polynomial of some fashion. For example f(x) = 2*x – x*x. That way we have a continuous solution and a known function that gives us a value at every (or almost every) value of x.

However, if we have some data and we want our height function to match that data we can use an array of control points that specify the height at various values of “x” and an interpolation policy allowing the function to give a continuous result for any value of “x” (Figure 1). I find this to be a common scenario throughout development.

I’m using linear interpolation here but more complex methods of interpolation can provide better (higher order) continuity.

Height Field (1D) on a Circle

Note: if anyone knows a better name for this method (there must be one), please let me know.

Now imagine we wrap the 1D height field around a circle. Now x is the angle in radians, so any value outside the range of [0; 2*π] is equivalent to a value inside that range. So we need to define our h(x) just for that range.

If we have h(x) = 0, our Circle Height Field will be just a dot. If we have h(x) = 1, our field will be a circle with radius 1.

Fig 2. Data points plotted on a circle. Note: in a proper implementation the straight lines would be drawn curved, following the circle’s curvature, as the 1D line is wrapped around the circle.

Figure 2 illustrates how our 1D data points height field above translates to this idea. It is the same field with the same data points but now plotted on a circle.

In code, I often need to plot the circle height field so I can visualise it and troubleshoot problems with my code. To do so we need to get a 2D point for every value of x. Here’s how we do this:

Vector2 GetCircleHeightFieldPoint(float x)
{
   //calculate the unit vector at X angle (cosf/sinf come from <cmath>)
   Vector2 unit = Vector2(cosf(x), sinf(x));

   //get the height at X angle
   float h = HeightFieldFunction(x);

   //combine them to get the point on our circle height field at that angle
   return unit*h;
}

By iterating over a number of values of “x” in the range of [0; 2*π] we can get a full representation of the radial height field. All images below (screens from Mazer) use this method to visualise the field.
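To make the iteration concrete, here is a self-contained sketch that stores the radial field as evenly spaced control points, looks it up with wrap-around interpolation and builds a closed outline for debug drawing. Vec2, RadialHeight and BuildOutline are names invented for this example, and the even spacing of the control points is an assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 { float x; float y; };

const float kTwoPi = 6.28318530718f;

// Control points are evenly spaced around the circle: sample i sits at
// angle i * 2*pi / N. The lookup wraps, so the last control point
// interpolates back towards the first. 'samples' must not be empty.
float RadialHeight(const std::vector<float>& samples, float angle)
{
    const std::size_t n = samples.size();

    // Bring the angle into [0, 2*pi).
    angle = std::fmod(angle, kTwoPi);
    if (angle < 0.0f)
        angle += kTwoPi;

    const float pos = angle / kTwoPi * static_cast<float>(n);
    const std::size_t i0 = static_cast<std::size_t>(pos) % n;
    const std::size_t i1 = (i0 + 1) % n;
    const float t = pos - std::floor(pos);
    return samples[i0] * (1.0f - t) + samples[i1] * t;
}

// Build a closed outline of the field, one point per step.
std::vector<Vec2> BuildOutline(const std::vector<float>& samples, std::size_t steps)
{
    std::vector<Vec2> points;
    points.reserve(steps);
    for (std::size_t s = 0; s < steps; ++s)
    {
        const float angle = kTwoPi * static_cast<float>(s) / static_cast<float>(steps);
        const float h = RadialHeight(samples, angle);
        points.push_back({ std::cos(angle) * h, std::sin(angle) * h });
    }
    return points;
}
```

Drawing line segments between consecutive points (and from the last back to the first) gives the kind of outline shown in the screenshots.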

Camera Control

How is that relevant to camera control?

Mazer screen with a radial height field used for camera control.

I calculate the area the camera can move in without getting the user in trouble. I start with an empty height field function h(x) and then I plot, in the height field, all the obstacles that can obscure the camera view. In my case each obstacle is a box. Once projected on a circle it becomes a curve segment that I have to draw (project) on my h(x). There is more information about this process below.

Once I complete that I have an h(x) that gives me the maximum distance (height) the camera can be at for every angle. If I keep the camera within that limit I know I won’t get any visual problems. Even more, if I find that the camera has strayed into an area that is “not good”, I can immediately take remedial action in the right direction, or very cheaply evaluate many other candidate points the camera could move to.

For now I’ll just show a simple method that limits the camera within that height-field – if it goes beyond it, it limits the camera to the maximum distance.

Vector2 LimitCameraPosition(Vector2 cameraPosition)
{
   //get the "height" the current camera position is at
   float height = length(cameraPosition);

   //normalise to get the unit vector
   //(assumes the camera is not exactly at the origin)
   Vector2 unit = normalise(cameraPosition);

   //convert to angle representation
   float angle = UnitToAngle(unit);

   //look up the height function to get the max height at that angle
   float maxHeight = HeightFieldFunction(angle);

   //limit the distance to that max height
   if (height > maxHeight)
      return unit*maxHeight;

   return cameraPosition;
}

In the code above there is a function to convert from a unit vector to an angle scalar (in radians). While that’s trivial using the atan2 function from the standard library, atan2 returns values in the range [-π; π], so you have to remember to wrap that around and keep the angle in the range of [0; 2*π].

float UnitToAngle(const Vector2& unit)
{
   //get angle, atan2 returns a value in [-PI; PI]
   float ang = atan2(unit.y, unit.x);
   //keep it in the range [0; 2*PI]
   if (ang < 0.0f)
      ang += PI*2.0f;
   return ang;
}

Obstacle Drawing: Preparing a Useful Height Field

Mazer screen illustrating the “safe” camera field. Note how each box/column makes a dent into what is otherwise a perfect circle.

Now that we have the basics of our radial height field for camera movement we need a method to insert useful data in it. The method I use is to initialise the field to some default maximum value – at that point it’s just a circle with radius of “maximum value”.

Then I “draw” various obstacles that in turn “dent” the circle and make the function non-uniform. In the Mazer case the obstacles are the “score” boxes.

For each box I find the bounding circle of that box in 2D and then, from the centre and radius of that bounding circle, I find two points determining the extents (in the height field circle) of the arc that it would cover. Then I step along the line segment those two points define, intersecting a ray (starting at the origin and pointing towards the step point on the segment) with the box obstacle itself.

Then I check if the distance to the intersection point is less than the distance stored in the closest control point in the field function and, if so, write it in. In a way this acts as a 1D circular depth-buffer.

Finally I run a low-pass filter over the data (field) so that all edges are smooth and the field gives a more gradual response when I use it for camera control.


I look up the constructed field multiple times a frame. However that would be only for certain values of “x”.

Most of the performance cost is incurred when I’m constructing the field and rasterising the obstacles into it. Right now I do that every frame and it has a resolution of 128 control points for the whole range of [0; 2*π]. That gives me a good resolution for quality camera control and I haven’t seen any visible performance impact.

Other Applications

I’m sure this method has its applications in game AI. I know I’ve seen this visualised before in games (or debug information in games) and that’s how I got to know about it. I can think of AI agents using it to plot a 360-degree map of threats so the AI can reason about the best course of action for that agent. Or maybe representing the audio “hearing” of the AI agent so it can reason about where the sounds are coming from and how to react.

This approach also shares similar ground with the “context maps” described in this post by @tenpn

That’s it for now. See you next time.

Mazer: Wavefront

I first learned about this algorithm when I was at school. I needed something to fill a maze with certain values and I was going for a brute-force flood-fill recursion when my teacher suggested I should use a wavefront.

Note: what I call wavefront propagation is a simplified version of Dijkstra’s algorithm. (Thanks to @tenpn for pointing this out)

So how does wavefront propagation work?

Suppose you want to fill a maze (or any connected graph) starting at a certain point in the maze. With a wavefront you:

Wavefront Propagation
  1. Mark the start node in the maze – mark it with 1 as this is the first step in the wavefront propagation – then insert it in the wavefront queue.
  2. Pick up a node from the “wavefront” queue (FIFO). Examine its wavefront value and make that the current wavefront value.
  3. Examine all neighbours of the node in the graph. For any neighbours that are unobstructed and have not been visited yet: mark them with wavefront value + 1 and insert them in the wavefront queue.
  4. Repeat steps 2 and 3 until the wavefront queue is empty – i.e. no work left to do. All nodes that can be reached from the start have been visited.


At this point you should have a maze that has been marked with wavefront values that represent the distance to travel to the start point. A distance field of sorts.
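The steps above can be sketched for a simple grid maze as follows. The grid representation, the 4-way neighbourhood and the WavefrontFill name are assumptions for this example. Any connected graph would work the same way.

```cpp
#include <queue>
#include <utility>
#include <vector>

// walls[y][x] == true means the cell is obstructed.
// Returns wavefront values: 0 = not reached, 1 = start, 2 = one step away, ...
std::vector<std::vector<int>> WavefrontFill(const std::vector<std::vector<bool>>& walls,
                                            int startX, int startY)
{
    const int height = static_cast<int>(walls.size());
    const int width  = height > 0 ? static_cast<int>(walls[0].size()) : 0;
    std::vector<std::vector<int>> values(height, std::vector<int>(width, 0));

    if (startX < 0 || startY < 0 || startX >= width || startY >= height ||
        walls[startY][startX])
        return values;

    // Step 1: mark the start with 1 and queue it.
    std::queue<std::pair<int, int>> wavefront;
    values[startY][startX] = 1;
    wavefront.push({startX, startY});

    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};

    while (!wavefront.empty())
    {
        // Step 2: take the oldest node (FIFO) and read its wavefront value.
        const std::pair<int, int> node = wavefront.front();
        wavefront.pop();
        const int current = values[node.second][node.first];

        // Step 3: mark and queue unobstructed, unvisited neighbours.
        for (int i = 0; i < 4; ++i)
        {
            const int nx = node.first + dx[i];
            const int ny = node.second + dy[i];
            if (nx < 0 || ny < 0 || nx >= width || ny >= height)
                continue;
            if (walls[ny][nx] || values[ny][nx] != 0)
                continue;
            values[ny][nx] = current + 1;
            wavefront.push({nx, ny});
        }
    }
    return values;
}
```

Because every step costs the same here, the FIFO queue is enough to guarantee that each cell gets the smallest possible value.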

How is this useful?

1. Path-finding. Let’s say that we have two points in a maze – a start point and a destination. You can start a wavefront at the destination (yeah, it’s a bit backwards) and propagate it until the wavefront reaches the start point. After that “the path” is just following neighbours with decreasing wavefront values.
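Following the decreasing values could look like this minimal sketch. It assumes a grid of wavefront values where 0 means wall or unreached and 1 marks the cell the wavefront started from (the destination). FollowWavefront is a name made up for this example.

```cpp
#include <utility>
#include <vector>

// Walks from (x, y) towards the cell marked 1 by always stepping to a
// neighbour whose wavefront value is exactly one less than the current one.
// Returns the visited cells as (x, y) pairs, or an empty path if the
// starting cell was never reached by the wavefront.
std::vector<std::pair<int, int>> FollowWavefront(const std::vector<std::vector<int>>& values,
                                                 int x, int y)
{
    std::vector<std::pair<int, int>> path;
    const int height = static_cast<int>(values.size());
    const int width  = height > 0 ? static_cast<int>(values[0].size()) : 0;
    if (x < 0 || y < 0 || x >= width || y >= height || values[y][x] == 0)
        return path;

    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};

    path.push_back({x, y});
    while (values[y][x] > 1)
    {
        const int current = values[y][x];
        bool moved = false;
        for (int i = 0; i < 4 && !moved; ++i)
        {
            const int nx = x + dx[i];
            const int ny = y + dy[i];
            if (nx < 0 || ny < 0 || nx >= width || ny >= height)
                continue;
            if (values[ny][nx] == current - 1)
            {
                x = nx;
                y = ny;
                path.push_back({x, y});
                moved = true;
            }
        }
        if (!moved)
            break; // inconsistent field, give up rather than loop forever
    }
    return path;
}
```

When several neighbours hold the same lower value, any of them lies on a shortest path, so an agent is free to pick among them.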

2. Distance map. Path finding is usually about finding the shortest path… using wavefront we can build a map of how far it is to travel to the destination at every node. That allows not only multiple agents to look-up and path find at zero cost (to the same destination) but also make decisions if they want to take a different path at any moment.

3. Multiple destinations. By slightly modifying the storage at each node we could have multiple destination wavefront values stored at each node – by storing an array of values instead of just a single value. Then we can have hundreds or thousands of agents all efficiently path-finding towards their chosen destinations while incurring next to no cost at run time, because it has all been encoded in the maze.

But what about the costs?

Wavefront has certain resource overheads. It has a queue that would grow up to the length of the current wavefront. And it does more work than a heuristic-based path-finding algorithm such as A*: wavefront fills the whole maze indiscriminately – it doesn’t target a certain destination.

These resource concerns can be addressed in various ways, and the right one quite often depends on the actual application. The two approaches I go for most often are:

  • pre-computation – using wavefront offline in a tool to compute the distance map and then saving the data to use at run time. This can also be done in-game at level initialisation time. I used that in Carcophony
  • time-distributed iteration – instead of propagating the whole wavefront in one go and filling the maze at once, I can do one node (or one wavefront value) at a time… spreading the cost over multiple game frames. In Mazer I do exactly that.
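One way to spread the propagation over frames is to keep the queue alive between calls and process only a fixed budget of nodes each time. The sketch below is an assumption about how this could be structured, not the Mazer code. The class name, the grid layout and the per-call node budget are all made up for this example.

```cpp
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

class IncrementalWavefront
{
public:
    // walls[y][x] == true means obstructed. Assumes a non-empty grid and a
    // start cell that is inside the grid and not a wall.
    IncrementalWavefront(const std::vector<std::vector<bool>>& walls, int startX, int startY)
        : m_walls(walls),
          m_values(walls.size(), std::vector<int>(walls[0].size(), 0))
    {
        m_values[startY][startX] = 1;
        m_queue.push({startX, startY});
    }

    // Processes at most maxNodes queued nodes (call once per frame).
    // Returns true while there is still work left.
    bool Step(std::size_t maxNodes)
    {
        const int height = static_cast<int>(m_walls.size());
        const int width  = static_cast<int>(m_walls[0].size());
        const int dx[4] = {1, -1, 0, 0};
        const int dy[4] = {0, 0, 1, -1};

        for (std::size_t n = 0; n < maxNodes && !m_queue.empty(); ++n)
        {
            const std::pair<int, int> node = m_queue.front();
            m_queue.pop();
            const int current = m_values[node.second][node.first];
            for (int i = 0; i < 4; ++i)
            {
                const int nx = node.first + dx[i];
                const int ny = node.second + dy[i];
                if (nx < 0 || ny < 0 || nx >= width || ny >= height)
                    continue;
                if (m_walls[ny][nx] || m_values[ny][nx] != 0)
                    continue;
                m_values[ny][nx] = current + 1;
                m_queue.push({nx, ny});
            }
        }
        return !m_queue.empty();
    }

    const std::vector<std::vector<int>>& Values() const { return m_values; }

private:
    std::vector<std::vector<bool>> m_walls;
    std::vector<std::vector<int>> m_values;
    std::queue<std::pair<int, int>> m_queue;
};
```

Calling Step with a small budget every frame lets the wavefront visibly advance through the maze over time, and the values written so far are already valid distances.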

How is this NOT useful?

Even though I like this algorithm, no doubt due to my early encounter with it, it can be very limited and grow complicated as we try to overcome its limitations. Arbitrary costs to traverse a node or node links, for example… Carcophony had to deal with that as roads were of different lengths and not all neighbours were the same distance away.


Wavefront takes centre stage in my new game (codenamed Mazer for now). In that I try to make a game mechanic out of my nostalgia for a programming algorithm I learned very early on. I hope you grow to like it too.