How MD201510 works

The idea behind the UOCC was to find ways to use pseudo/illegal/undocumented/whatever opcodes in order to gain an advantage; these are commands that aren’t officially there but work because they accidentally perform the operation of two other commands at once. MD201510 relies on them for the main effect in the upper and lower borders and I’ll get to those in a moment, but the logo movement routine is also using a pseudo opcode, specifically LAX which simultaneously loads the A and X registers with the same value. In this particular case the loop starting at the label logo_copy_1a in the source code does the following…

lax logo_data_ln1,y
sta screen_ram+$000,y
lda logo_colour_dcd,x
sta $d800,y

…which basically reads from somewhere within a table called logo_data_ln1 (the data for the first line of the Cosine logo; logo_data_ln2 is the second and so on – all of these reads are self modifying code so the logo can move) and stores the contents of A to the screen memory. It then uses that same value, now also in X, as an offset to fetch a byte from the table logo_colour_dcd which gets pushed to the colour RAM. Each character in the set has a unique colour value in that table and changing one byte makes every instance of that character update when the screen is refreshed. Doing this in a loop using the regular commands would look something like…

lda logo_data_ln1,y
tax                      ; copy A to X, the step that LAX makes unnecessary
sta screen_ram+$000,y
lda logo_colour_dcd,x
sta $d800,y

…and for every character plotted the LAX shaves two cycles off by removing the TAX; that doesn’t sound like much but it claws back over four hundred cycles when everything is added up and, to paraphrase a certain supermarket advertising campaign, every little helps. That said, this is jumping through hoops to make the pseudo opcode useful in this context, and simply throwing some RAM at unrolling the entire loop would both solve the issue just as cleanly with regular commands and reduce the cycle count even further – that’s basically what Hammer Down is doing for its colour scrolling.
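To put a number on the two-cycle saving, here is a rough tally written in Python rather than 6502 assembly. The cycle counts are the standard documented ones and assume no page-crossing penalty on the indexed reads; the figure of 200 characters is purely an assumption for illustration and not the real size of the logo.

```python
# Standard 6502 cycle counts, assuming the indexed reads never cross a page.
CYCLES = {
    "lax abs,y": 4,
    "lda abs,y": 4,
    "lda abs,x": 4,
    "tax": 2,
    "sta abs,y": 5,  # indexed stores always take the extra cycle
}


def cycles(instructions):
    """Total cycles for a list of instruction names from CYCLES."""
    return sum(CYCLES[name] for name in instructions)


# The loop body for one character, with and without the pseudo opcode.
with_lax = ["lax abs,y", "sta abs,y", "lda abs,x", "sta abs,y"]
without_lax = ["lda abs,y", "tax", "sta abs,y", "lda abs,x", "sta abs,y"]

saving_per_char = cycles(without_lax) - cycles(with_lax)


def total_saving(chars_plotted):
    """Cycles saved per refresh for a given number of plotted characters."""
    return saving_per_char * chars_plotted


# 200 characters is a hypothetical count, not taken from the source code.
print(cycles(with_lax), cycles(without_lax), total_saving(200))
```

Working backwards, a total of "over four hundred" implies a little over two hundred characters being plotted on each pass.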

The main effect, on the other hand, is legitimately gaining an advantage but is more cycle conscious too, producing a series of patterns across the upper and lower border areas by writing multiple values to the ghostbyte (a single byte that gets repeated throughout those spaces, which is usually the last byte of the current video bank) on each scanline. I’ve used this before in Spotified but that was a lower “resolution” because it uses LDA #value / STA ghostbyte for each split, six cycles in total, and one cycle is a character wide. MD201510’s splits are only four cycles wide, which has to be done by loading all three registers at the start of each scanline and then spending the visible area doing something like STA ghostbyte / STX ghostbyte / STY ghostbyte – there can be ten or eleven writes per line but, if the splits are to be the same width, the registers can’t be given new values mid line.
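The difference in "resolution" is just division, sketched here in Python as a sanity check. It assumes only the standard 40-column area is being counted (one cycle per character, as above), so it gives a lower bound; the real routine evidently squeezes in slightly more than that on some lines.

```python
CHAR_COLUMNS = 40  # assumption: count only the standard 40-column area

LDA_IMMEDIATE = 2  # lda #value
STORE_ABSOLUTE = 4  # sta/stx/sty to an absolute address


def splits_per_line(cycles_per_split, columns=CHAR_COLUMNS):
    """Whole splits that fit when one cycle is one character wide."""
    return columns // cycles_per_split


# Reloading A before every write costs six cycles per split...
reload_each_split = splits_per_line(LDA_IMMEDIATE + STORE_ABSOLUTE)

# ...whereas back-to-back stores from preloaded registers cost four.
stores_only = splits_per_line(STORE_ABSOLUTE)

print(reload_each_split, stores_only)
```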

That’s where the pseudo opcode SAX comes in because it takes the same four cycles as a regular STA or STX when writing to memory, but the value it pushes out is the contents of registers A and X with a logical AND applied. Set bits in the ghostbyte are always black (the colour is all provided by the background) and two mostly set bytes will AND together quite well in this situation. There was also meant to be some use of SBX in the routine which renders the moving “blob” texture but I’ve just noticed it was removed whilst trying to fix something else and not restored in the final source despite the comment still being there – whoops!
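Here is a small Python model of what SAX adds: with three registers loaded there are four different bytes available to write, the fourth being A AND X. The register values are made up for illustration and aren't taken from the demo's source; render() just draws set bits as black, matching the ghostbyte behaviour described above.

```python
def sax(a, x):
    """The byte SAX stores: A AND X. Neither register is changed."""
    return a & x & 0xFF


def render(byte):
    """Show a ghostbyte as 8 pixels: '#' is black, '.' is background."""
    return "".join("#" if byte & (0x80 >> bit) else "." for bit in range(8))


# Hypothetical register contents for one scanline.
a, x, y = 0xF7, 0xBD, 0xEE

available = {
    "sta": a,
    "stx": x,
    "sty": y,
    "sax": sax(a, x),  # a fourth pattern without reloading anything
}

for opcode, value in available.items():
    print(opcode, render(value))
```

Because the AND can only clear bits, the SAX pattern always has at least as much background showing through as either of the bytes it was made from.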

The competition was a really fun distraction and I’m interested to see what everyone else comes up with to use some of the other pseudo opcodes.

2 thoughts on “How MD201510 works”

  1. Thanks for the info.

    Pardon my ignorance because I don’t know about such things, but what are the limitations of the opcodes?

    I mean, are there certain circumstances where they don’t work – models of C64, PAL vs NTSC, etc? I’ve run the demo on older versions of VICE out of interest and things don’t look too hot, I’m guessing because older versions don’t emulate them ‘correctly’ and aren’t cycle exact etc.

  2. It depends on the opcode; all the ones I’ve used are classed as “stable” so should work on any C64 or 128 regardless of video standard, but there are others that require a bit of hoop jumping and a few where the results can’t be predicted since things like system temperature will have an effect on the outcome.

    A lot of the research into what these commands do has only happened relatively recently too, so the emulators had to catch up, and much of the use “in the wild” has been for protection schemes so the focus was probably there to begin with.
