How Koalatro works

There were some interesting responses to Koalatro over at the CSDb when it was released, with a few people expressing surprise that the code only takes 16K, which has left me giggling like a schoolgirl because, apart from some zero page use which doesn’t count towards the total according to the competition rules, the entire thing runs from 15K, loading and executing between $0400 and $3fff. A Koala-format bitmap is 10,001 bytes (8,000 of bitmap, 2,000 of colour and a byte for the background colour register), aNdy’s music takes around 3K and another half a K is taken up by unrolled code to open the borders and split sprite colours. So that’s about 13.5K in use and the rest of the code, some tables, sprite definitions and a little scroll text therefore have to fit in the remaining 1.5K… right?

Well not exactly and it’s probably best to look at the memory map in order to explain further; before the code starts, the layout looks like this:

$0400 - $07e7	First block of colour RAM for the bitmap
$07e8 - $09db	Second block of colour RAM for the bitmap (packed)
$09dc		Background colour register
$0a00 - $13ff	Code, data and scroll text
$1400 - $1fff	Music
$2000 - $3f3f	Bitmap data
$3f40 - $3fff	Colour tables and sprite positioning/set-up data
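
As a sanity check, the map above can be totalled with a few lines of Python. This is only a sketch and not part of the Koalatro source; the addresses are copied from the map and the region names are just labels for illustration.

```python
# Inclusive address ranges copied from the memory map above.
# The names are illustrative labels, not identifiers from the source code.
REGIONS = [
    ("first colour block", 0x0400, 0x07E7),
    ("packed colour block", 0x07E8, 0x09DB),
    ("background colour byte", 0x09DC, 0x09DC),
    ("code, data and scroll text", 0x0A00, 0x13FF),
    ("music", 0x1400, 0x1FFF),
    ("bitmap", 0x2000, 0x3F3F),
    ("tables and sprite set-up", 0x3F40, 0x3FFF),
]


def region_size(start, end):
    """Size in bytes of an inclusive address range."""
    return end - start + 1


def total_span(regions):
    """Bytes from the first address used to the last one, gaps included."""
    return regions[-1][2] - regions[0][1] + 1


for name, start, end in REGIONS:
    print(f"${start:04x}-${end:04x} {region_size(start, end):5d} bytes  {name}")
print("total span:", total_span(REGIONS), "bytes")
```

The span from $0400 to $3fff comes out at 15,360 bytes, which is exactly 15K, and the block for code, data and scroll text is 2,560 bytes.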

The first two “tricks” are how colour for the bitmap has been stored; the first block at $0400 is already where it needs to be (which is why only the crunched file from the Github repository can be dragged and dropped into an emulator) so there’s no memory lost elsewhere or code required to move it into place. The second block of colour is a bit more involved since it’s been packed down into 50% of the RAM; this relies on the fact that only the lower nybble is used (a value from $0 to $f) so two of those can be stored in one byte. The background colour byte from the Koalapainter format file is included for completeness and the code will actually deal with other background colours cleanly despite that not being required in the final release.
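
The packing itself can be modelled in a few lines of Python. This is a sketch of the general idea rather than the routine from the release: the function names are made up, and putting the first value of each pair in the low nybble is an assumption, since the order used by the real code isn't stated here.

```python
def pack_nybbles(colours):
    """Pack 4-bit colour values two to a byte.

    Assumption: the first value of each pair goes in the low nybble.
    """
    if len(colours) % 2:
        raise ValueError("need an even number of colour values")
    return bytes(
        (colours[i] & 0x0F) | ((colours[i + 1] & 0x0F) << 4)
        for i in range(0, len(colours), 2)
    )


def unpack_nybbles(packed):
    """Reverse pack_nybbles, giving one colour value per byte again."""
    out = bytearray()
    for byte in packed:
        out.append(byte & 0x0F)  # low nybble first
        out.append(byte >> 4)    # then the high nybble
    return bytes(out)


# 1,000 cells of colour RAM shrink to 500 bytes and survive a round trip.
colour_ram = bytes(i % 16 for i in range(1000))
packed = pack_nybbles(colour_ram)
print(len(packed), unpack_nybbles(packed) == colour_ram)
```

With 1,000 colour cells going in, 500 bytes come out, which matches the $07e8 to $09db range in the memory map.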

One thing that’s missing from the memory map above is where the sprite definitions for the scroller are stored, but there’s a small hole between the end of the packed colour data and where the code starts for a reason. One of the first things the code does is unpack the colour data to $d800 and, once that’s done, the space from $0800 to $09ff is cleared and used for the sprites. That’s 512 bytes at 64 bytes per definition so there are only eight definitions available but, since both scrollers use the same eight sprites, that’s not an issue.

And speaking of the scrollers, there’s that block of unrolled code I mentioned previously to split the sprite colours and yes, it’s literally doing that; rather than changing the background colour and having black sprites over the top it’s actually writing one of two values to every sprite colour register on each scanline whilst juggling $d016 to open the side borders. The same block of unrolled code is recycled for the two scrollers, starting one scanline further up the sprites for the upper area so the reversed version of the scrolling message gets the static colours. Finally there’s the scroll text, around eight hundred bytes from $10de to $13fe, which leaves one byte free before the music starts.

I think that’s everything of note covered, the source code is, I hope, reasonably well documented.

How our CD5 part works

So… erm yes, I said over at the Plus/4 World forums that I’d write a “how it works” for the Cosine contribution to Crackers’ Demo 5 and here it is girls and boys, only six months late! Generally speaking there are two actual effects in play, the forty by five byte luminance scroller running through the middle of the bitmapped logo and a sixteen by ten pixel DYCP which works in the regular 39 by 6 character workspace but splits it into two blocks which are four characters high at the top and bottom of the screen.

The DYCP isn’t doing anything majorly different to the other single character routines I’d released around the same time – the loops are all unrolled and there’s specially formatted character data for speed which was originally drawn with ProMotion (I’m actually using an older version) before being converted using my cheap and cheerless bitmap to raw data converter – except that it has two distinct versions of the clear and draw code; one starts from the left hand half of the first character and proceeds to draw across to the right whilst the other begins from the right hand half; the code then flips back and forth whenever the hardware scroll finishes a cycle. During what I’ll refer to as the “design phase” for want of a better term, I settled on wanting the four character high areas so the redraw has to happen during the logo and there was only enough time to render ten pixel high characters.

And since I mentioned the logo, it was originally drawn using C64-specific tool Project One (again, I’m using an older version… one of these days I’ll update all my tools) with the colours used representing luminances; that data was then converted for the Plus/4 with a small assembly language routine on the C64 (all it actually did was translate the C64 colour data and dump it into memory so I could save the results out with a virtual Action Replay cartridge) and all of the colour data was manually created as an included source file. Here’s the logo’s “before” picture when it was still on the C64:

Finally there’s the large luminance scroller; the TED keeps luminance and colour data separately for bitmap-based displays like the logo so one eight by eight pixel attribute cell has two bytes of information, one containing two nybbles of colour (values from $0 to $f for sixteen possible colours) and the other holding two nybbles of luminance data (this time values from $0 to $7 for eight possible brightnesses). The “trick” here is that the luminance has been limited to a maximum of $5 for every cell in the picture where the scroller can pass over it, so when the scroller’s buffer, which is either $0 or $2 for each nybble, is added to it there’s a noticeable hike in brightness and the result never goes past the maximum of $7. To keep things simple there’s a second copy of that luminance data used for reference.
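
The arithmetic is simple enough to model in Python. This is a sketch with made-up names and not the routine from the source; it assumes a whole luminance byte is added to a whole scroller byte in one go, which is safe because neither nybble can carry into its neighbour.

```python
MAX_PICTURE_LUMA = 0x5  # cap on picture luminance where the scroller passes
SCROLLER_BOOST = 0x2    # a "lit" nybble in the scroller buffer


def brighten(reference, scroller):
    """Add a scroller byte to a byte from the reference luminance copy.

    Both bytes hold two nybbles. The checks mirror the limits described
    in the text; with those in place no nybble can exceed $7.
    """
    for shift in (0, 4):
        luma = (reference >> shift) & 0x0F
        boost = (scroller >> shift) & 0x0F
        if luma > MAX_PICTURE_LUMA:
            raise ValueError("picture luminance above $5")
        if boost not in (0x0, SCROLLER_BOOST):
            raise ValueError("scroller nybble must be $0 or $2")
    return reference + scroller


print(hex(brighten(0x55, 0x22)))
print(hex(brighten(0x31, 0x02)))
```

Working from the untouched reference copy each frame means the boost never accumulates, so the picture returns to normal as soon as the scroller has moved on.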

And, apart from mentioning that the music was created by aNdy using Knaecketraecker, that pretty much covers everything I think. The source code is available online for those brave enough to go prodding around it and I’ll have a go at answering questions if any arise.

How MD201602 works

The majority of this post will be on MD201602’s DYCP scrollers since the logo movement relies on a technique which requires very little processing time; there are multiple copies of the bitmap in memory, generated when the code starts, with each stored at a different character offset horizontally so the routine swinging the logos can take a value from the curve, strip the lower three bits off for the hardware scroll register and use what’s left to select which version of the logo to use. No data is moved in realtime so this takes a couple of scanlines at most; have a look at lines 379 to 398 of the source code since that sets up the top logo, writing directly to the registers.
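
In Python terms the split of a curve value looks something like the sketch below. The function name is made up and this isn't lifted from the source; it assumes the usual arrangement where the hardware scroll covers eight pixel positions (three bits) and everything above that picks a pre-shifted copy of the logo.

```python
def split_curve_value(value):
    """Split a curve value into (logo copy, hardware scroll).

    Assumption: the low three bits are the 0-7 pixel scroll and the
    remaining bits select which character-offset copy to display.
    """
    fine = value & 0x07   # pixel offset for the hardware scroll register
    coarse = value >> 3   # which pre-generated copy of the logo to show
    return coarse, fine


for value in (0, 5, 13, 39):
    coarse, fine = split_curve_value(value)
    print(f"curve {value:2d} -> copy {coarse}, scroll {fine}")
```

Since the two parts always recombine as copy times eight plus scroll, the logo can sit at any pixel position without a single byte of bitmap being moved.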

The DYCP scrollers themselves are… well, just traditional DYCPs really and I’ve tried to optimise things as much as possible without losing that “spirit” too, which is why the routine isn’t going to set a record! The character set is five pixels high and, similarly to the logos, converted on start up by a routine called font_xvert so the top pixel row of each character is in a contiguous table, then the second row and so on; this format allows any of the characters to be accessed quickly, with the renderer for the first scroll drawn looking like this:

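ldy dycp_cosinus,x          ; Y = height for this column, read from the curve
ldx dycp_buffer_1+$00       ; X = character to draw, from the scroll buffer
beq dd1_char_01             ; zero is an empty character so skip the drawing
lda dycp_xfont+$000,x       ; first pixel row of the character
sta dycp_workspace+$008,y
lda dycp_xfont+$100,x       ; second row, one page further on
sta dycp_workspace+$009,y
lda dycp_xfont+$200,x       ; third row
sta dycp_workspace+$00a,y
lda dycp_xfont+$300,x       ; fourth row
sta dycp_workspace+$00b,y
lda dycp_xfont+$400,x       ; fifth and final row
sta dycp_workspace+$00c,y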

The Y register is, appropriately enough, being used to designate the height that the character will be written into dycp_workspace (a standard character set arranged in columns of six characters on screen) whilst X is selecting which character to draw – the first line of every character is stored at dycp_xfont, the second line kept one 256 byte page on from that original position and the third, fourth and fifth are each a page further into memory than the one before them. The BEQ in the source fragment above means that if the content of the scrolling message in dycp_buffer_1 at that point is zero, the character is empty and the rendering can be skipped entirely. (And with that in mind, it probably isn’t difficult to understand why the greetings scroller is spaced out!)
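
The same layout can be modelled in Python to show why it’s quick to index. This is only a sketch: convert_font and draw_char are made-up names standing in for the conversion and the unrolled drawing, and the input format (a list of five-byte characters) is an assumption about what the font looks like before conversion.

```python
ROWS = 5      # the character set is five pixels high
PAGE = 0x100  # each pixel row gets its own 256 byte page


def convert_font(font):
    """Rearrange a list of five-byte characters into one page per row."""
    xfont = bytearray(ROWS * PAGE)
    for index, glyph in enumerate(font):
        for row in range(ROWS):
            xfont[row * PAGE + index] = glyph[row]
    return xfont


def draw_char(workspace, xfont, column_base, height, char):
    """Model of the fragment above: char plays X, height plays Y."""
    if char == 0:
        return  # the BEQ: an empty character is skipped entirely
    for row in range(ROWS):
        workspace[column_base + height + row] = xfont[row * PAGE + char]


font = [[0, 0, 0, 0, 0], [0x10, 0x28, 0x44, 0x7C, 0x44]]  # blank, then a test shape
xfont = convert_font(font)
workspace = bytearray(64)
draw_char(workspace, xfont, 0x08, 3, 1)
print(workspace[11:16].hex())
```

Because each row lives exactly one page apart, the character number alone is enough of an index and the 6502 never has to multiply by the character height.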

Finally, there’s the colour bar in the middle of the screen using the same kind of vertical splitting that MD201509 employs and, to make the scrollers look like they’re passing behind it, the screen RAM on that central line of the five being used by the DYCPs is constantly being manipulated; if the character in the second scroller is using the first half of the cosine curve then the value written to that column on the central line of the screen is taken from line_cache (which is a copy of what’s initially generated at that point on the screen) but if the position is 128 to 255 then the routine writes a zero there instead, which is always a blank character. Filling the first eight bytes of dycp_workspace, which starts at $5000, with $FF from the VICE monitor or using an Action Replay cartridge once the code is executing will make that blank space visible and shows how this works more clearly; the solid blocks move across the screen and it looks like this:

So… that’s the basics at least and the source code is available to prod around too, but if anyone has a question please get in touch through the “usual channels”. I might also be persuaded to post source code for a more generic DYCP routine if enough people want it?