The idea behind the UOCC was to find ways to use pseudo/illegal/undocumented/whatever opcodes in order to gain an advantage; these are commands that aren’t officially there but work because they accidentally perform the operation of two other commands at once. MD201510 relies on them for the main effect in the upper and lower borders and I’ll get to those in a moment, but the logo movement routine is also using a pseudo opcode, specifically LAX which simultaneously loads the A and X registers with the same value. In this particular case the loop starting at the label logo_copy_1a in the source code does the following…
…which basically reads from somewhere within a table called logo_data_ln1 (which is data for the first line of the Cosine logo, logo_data_ln2 is the second and so on – all of these reads are self modifying code so the logo can move) and stores the contents of A to the screen memory. It then uses that same value but in X as an offset to fetch a byte from the table logo_colour_dcd which gets pushed to the colour RAM. Each character in the set has a unique colour value in that table and changing one byte makes every instance of that character update when the screen is refreshed. Doing this in a loop using the regular commands would look something like…
…and for every character plotted the LAX shaves two cycles off by removing the TAX; that doesn’t sound like much but it claws back over four hundred cycles when everything is added up and, to paraphrase a certain supermarket advertising campaign, every little helps. That said, this is jumping through hoops to make the pseudo opcode useful in this context, and simply throwing some RAM at unrolling the entire loop will both solve the issue just as cleanly with regular commands and reduce the cycle count even further – that’s basically what Hammer Down is doing for its colour scrolling.
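To put numbers on the LAX saving, here is a rough Python model of the per-character cost rather than the actual source. It assumes the loop is indexed with Y (LAX has no absolute,X mode), that no page boundaries are crossed, and that 240 characters are plotted; that last figure is a made-up count picked only because it lands above the "over four hundred cycles" mark, and the operand names are placeholders.

```python
# Standard 6502 cycle counts, assuming no page crossings.
# Operand names are placeholders, not the labels from the real source.
REGULAR_LOOP = [
    ("LDA table,Y", 4),       # fetch the character
    ("STA screen,Y", 5),      # plot it
    ("TAX", 2),               # copy it to X for the colour lookup
    ("LDA colours,X", 4),     # fetch that character's colour
    ("STA colour_ram,Y", 5),  # write it to colour RAM
]

LAX_LOOP = [
    ("LAX table,Y", 4),       # fetch the character into A and X at once
    ("STA screen,Y", 5),
    ("LDA colours,X", 4),
    ("STA colour_ram,Y", 5),
]


def cycles_per_char(body):
    """Sum the cycle counts of one pass through a loop body."""
    return sum(cycles for _, cycles in body)


def cycles_saved(chars):
    """Total cycles saved by the LAX version over `chars` characters."""
    return chars * (cycles_per_char(REGULAR_LOOP) - cycles_per_char(LAX_LOOP))


HYPOTHETICAL_CHARS = 240  # assumption, not taken from the demo

if __name__ == "__main__":
    print(cycles_per_char(REGULAR_LOOP), cycles_per_char(LAX_LOOP))
    print(cycles_saved(HYPOTHETICAL_CHARS))
```

The loop counter and branch are left out because they cost the same in both versions, so the difference per character is just the two cycles of the TAX.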
The main effect, on the other hand, is legitimately gaining an advantage but is more cycle conscious too, producing a series of patterns across the upper and lower border areas by writing multiple values to the ghostbyte (a single byte that gets repeated throughout those spaces, usually the last byte of the current video bank) on each scanline. I’ve used this before in Spotified but that was a lower “resolution” because it’s using LDA #value / STA ghostbyte for each split, six cycles in total, and one cycle is a character wide. MD201510’s splits are only four cycles wide, which has to be done by loading all three registers at the start of each scanline and then spending the visible area doing something like STA ghostbyte / STX ghostbyte / STY ghostbyte – there can be ten or eleven writes per line but, if the splits are to be the same width, the registers can’t be given new values mid line.
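The difference between the two split widths can be worked through with a quick Python sketch. It assumes a 40 character wide visible area and uses the usual figure of eight pixels per character; the dictionary keys are just descriptive labels.

```python
PIXELS_PER_CHAR = 8   # one CPU cycle spans one character, i.e. eight pixels
VISIBLE_CHARS = 40    # assumption: the standard 40 column wide area

# Cycles spent per split by each approach.
SPLIT_CYCLES = {
    "LDA #value / STA ghostbyte": 2 + 4,  # immediate load plus absolute store
    "STA/STX/STY ghostbyte": 4,           # absolute store only
}


def split_width_pixels(cycles):
    """Width of one split on screen, in pixels."""
    return cycles * PIXELS_PER_CHAR


def whole_splits(cycles, visible_chars=VISIBLE_CHARS):
    """How many complete splits fit across the visible area."""
    return visible_chars // cycles


if __name__ == "__main__":
    for name, cycles in SPLIT_CYCLES.items():
        print(name, split_width_pixels(cycles), whole_splits(cycles))
```

Under those assumptions the six cycle approach gives six whole splits of 48 pixels, while the four cycle stores give ten of 32 pixels; an eleventh write would then be a partial split at one edge, depending on where the first write lands.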
That’s where the pseudo opcode SAX comes in because it takes the same four cycles as a regular STA or STX when writing to memory, but the value it pushes out is the contents of registers A and X with a logical AND applied, which gives a fourth pattern per scanline without reloading anything. Set bits in the ghostbyte are always black (the colour is all provided by the background) and two mostly set bytes will AND together quite well in this situation. There was also meant to be some use of SBX in the routine which renders the moving “blob” texture but I’ve just noticed it was removed whilst trying to fix something else and not restored in the final source despite the comment still being there – whoops!
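As a small illustration of what SAX adds, this Python sketch lists the values available to a scanline once the three registers are loaded. The register contents are made-up examples of "mostly set" bytes, not values from the demo.

```python
def sax(a, x):
    """Value a SAX store writes: register A ANDed with register X."""
    return a & x & 0xFF


def patterns_for_line(a, x, y):
    """The four ghostbyte values reachable without reloading a register."""
    return {
        "STA": a,
        "STX": x,
        "STY": y,
        "SAX": sax(a, x),
    }


# Hypothetical register contents: set bits show as black, clear bits let
# the background colour through.
A, X, Y = 0b11110111, 0b11011111, 0b10101010

if __name__ == "__main__":
    for opcode, value in patterns_for_line(A, X, Y).items():
        print(opcode, format(value, "08b"))
```

With these example values the SAX write comes out as 11010111, so the gaps from both A and X show through in a single four cycle store.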
The competition was a fun distraction and I’m interested to see what everyone else comes up with to use some of the other pseudo opcodes.