Duke Nukem 1’s sprite rendering

Last time, we looked at how the game renders its world – background and tiles. What really brings the game to life though are the sprites drawn on top. Almost everything interactive in the game is represented using sprites: Duke himself, enemies, pickups, doors, force fields, laser blasts, floating score numbers, explosions, etc.

In this post, we’ll have a closer look at how these things are rendered.

Like the map and background, sprites are made up of 16×16 pixel tiles. Unlike the former, sprite tiles have transparency masks – parts of the image can be skipped during drawing, making it possible to have (e.g.) rounded shapes and not just rectangles. Some objects in the game only use a single tile, but many are composited out of several ones. Unlike the backgrounds and map tiles, sprite data is held in system memory and needs to be copied to video memory each time a sprite is to be drawn.

Some examples for different sprite composition sizes (number of tiles making up a sprite)

Overall, the system works very similarly to Cosmo and Duke Nukem II, except that those two use 8×8 pixel tiles. But there is another big difference: The later games have a more structured approach for constructing sprites out of multiple tiles.

Both Cosmo and Duke 2 use a combination of files to store sprite data. A graphics file stores the tiles themselves, basically an unstructured soup of small images. A separate metadata file holds the starting address for each sprite within the graphics file, along with the width and height in tiles. Thanks to this metadata, the unstructured graphics can automatically be arranged into larger sprites1. A numerical ID identifies the combined sprites within the code, so the graphics can then be changed (including their dimensions) without needing to update the code – a data-driven approach.

Duke Nukem 1 is more primitive in this regard. There is no metadata stored in files, only the raw graphical data. It’s essentially just the soup of tiles, and the code is responsible for arranging it into something meaningful. Let’s have a closer look at the raw data.

Sprite data

The data for sprites is organized into a couple of separate groups: One for Duke, one for “objects”, one for “animations”, and one for “numbers” (all groups feature animation, so the naming is somewhat arbitrary). The images for each group are stored in multiple smaller files. During game startup, the content of all files is loaded into memory buffers. There’s one buffer per group, so the individual files making up each group are combined during loading. The data remains resident in memory until the game quits.

Sprite tiles in the “animations” group
Group# files# tilesMemory needed
Duke519230 kB
“Objects”315023.4 kB
“Animations”628845 kB
“Numbers”1446.9 kB
Total15674105.3 kB

Most likely, the separation into individual groups/buffers was done to avoid blocks of memory larger than 64 kB, which are difficult to handle in 16-bit real-mode DOS code. Why each of the groups was further split up into multiple files isn’t clear to me though, as each group is definitely smaller than 64 kB. Most likely this was a limitation of the tools used to create and package the graphics – the tiles used for the map (see previous post) are also split up into multiple files.

Overall, sprite graphics occupy more than 100 kB of memory. This may seem insignificant today, but considering that real-mode DOS applications only have a maximum of 640 kB available to them (usually less in practice)2, it’s not nothing. The load image (code and data + space for global variables) of the executable itself is already 156.5 kB.

Each buffer holds a sequence of masked 16×16 pixel tiles, packed tightly. Each of these tiles occupies 160 bytes of memory: 128 bytes of color data, and 32 bytes of mask data. This means that we can seek to the beginning of any tile within a buffer by multiplying its index by 160.

Tile indices and corresponding addresses/offsets

Since there’s no metadata describing which tiles belong to which sprite, these buffer offsets have to be hardcoded into the executable. We don’t know how the original developers organized this, perhaps there was a header file with a bunch of defines for the various offsets. When adding new graphics or reorganizing existing ones, these defines would then have to be updated.

High-level drawing code

Each actor/game object has dedicated drawing code as part of its update logic. The low-level foundation for sprite drawing is a function written in Assembly: BlitMaskedTile_16x16. It takes a pointer to the sprite tile data and the target position on screen, with the X coordinate given in bytes (1 byte of EGA memory address space represents 8 pixels) and Y given in pixels.

This means that sprite positions are limited to multiples of 8 pixels horizontally, but can be positioned freely on the vertical axis. This is due to the complexity inherent in the EGA’s planar video mode3. Another important consideration is that sprites mustn’t be drawn outside of the visible screen – the low-level drawing function doesn’t perform any clipping, so drawing outside the screen’s bounds would result in overwriting and corrupting unrelated memory.

With all this in mind, let’s have a look at an example: The Energizer Bunny enemy, aka “Rabbitoid”.

Some of the bunny’s animation frames

The sprite for this enemy consists of two tiles, stacked vertically. There are 8 animation frames in total: 3 for walking to the left, 3 for walking to the right, and 2 for “spinning in place”. The latter is also used when the bunny reverses direction. Each frame consists of two tiles. Curiously, not all of the tiles for this sprite are adjacent in memory – the last 6 frames are at a different place than the first 3. Pure speculation, but this could be an indication that the bunny enemy only used 3 animation frames at some point during development, and was later changed to have more frames after more graphics had already been added. Or perhaps the space occupied by the first 3 frames was originally used by another enemy which was later scrapped and replaced with the bunny?

The update function for the bunny actor calls a helper function to draw the sprite, which takes the index of the frame to draw and the base position, which refers to the bottom tile. The helper function then calls BlitMaskedTile_16x16 twice with the correct arguments.

Let’s have a look at the (decompiled) code first, and then go through how it works.

void pascal DrawRabbitoidSprite(int frame, int x, int y)
{
  word offset;

  if (frame <= 2)
  {
    offset = frame * 320 + 18240;
  }
  else
  {
    offset = (frame - 3) * 320 + 36480;
  }

  if (!IsOffScreen(x, y - 128))
  {
    BlitMaskedTile_16x16(
      animSpritesData + offset,
      WORLD_2_SCREEN_X(x),
      WORLD_2_SCREEN_Y(y));
  }

  if (!IsOffScreen(x, y))
  {
    BlitMaskedTile_16x16(
      animSpritesData + offset + 160,
      WORLD_2_SCREEN_X(x),
      WORLD_2_SCREEN_Y(y) + 16);
  }
}

The first step is to determine the starting address within the sprite data buffer (animSpritesData). The two tiles for each animation frame are always right next to each other, so to draw the second tile, we simply need to add 160 bytes to the address of the first one. This means that an offset of 320 bytes (two tiles) gets us to the next animation frame’s address. To get the full address, all we need to do then is to add the hardcoded starting offset of the first bunny sprite tile (18240) to the animation frame multiplied by 320.

Due to the discontinuity in how the bunny sprite tiles are laid out in memory (see above), we need a separate case for animation frames 3 and up. In the else-block, we first subtract 3 from the animation frame to get a 0-based index relative to the 2nd group of tiles, and then operate just as before, using starting offset 36480. Now we have the correct starting address for our sprite data, and we can proceed with drawing the two tiles.

Note how each call to BlitMaskedTile_16x16 is preceded by an on-screen check. This is to make sure that we don’t draw outside the bounds of the screen, as mentioned above (more on why it’s y - 128 for the first check in a moment).

Now all that’s left to do is to convert the actor’s position into screen coordinates, and then call the blit function. We use the data address as is for the first call, and then add 160 for the 2nd one to draw the 2nd tile for the current animation frame.

Sprites can be positioned in a way that makes them appear half-way off the screen. This is handled exactly the same way as for map tiles4: Sprite tiles are drawn over the edges of the HUD, and the HUD border is then redrawn on top. The IsOffScreen function takes this into account, and will consider a position to still be “on screen” if it falls within one of the HUD border regions.

The camera sprite on the left and the football on the right appear half-way on screen. The sprites are actually drawn fully, drawing over parts of the HUD, but this gets covered up afterwards by redrawing the HUD borders on top.

Simple actors often don’t use a helper function and just call BlitMaskedTile_16x16 directly. More complex actors work fundamentally the same way as the bunny, but will have larger and more complex helper functions.

We can also see some interesting examples here of optimizing memory and disk space usage by only storing animation frames for the parts of a sprite that change, like for Dr. Proton:

Dr. Proton’s sprite consists of 4 tiles. Only the bottom two are animated, the top two are shared between both animation frames.

Now let’s have a closer look at coordinate conversion.

Coordinate systems

There are three coordinate systems in the game:

  • The map tile grid, where each tile occupies 16×16 pixels and has an associated tile value in the current level’s map data
  • The world object coordinate system, used to place actors/objects on the map
  • Screen coordinates, which are like world coordinates but relative to the screen, not the world map

For both world and screen coordinates, X positions use an 8-pixel grid while the Y position has no restrictions. Converting an X coordinate between map and world coordinates is easy: Multiply by 2 to go from map to world, or divide by 2 to go the other way. To then go to screen coordinates, we just need to subtract the camera position from the world coordinate. The horizontal camera position is also in world units, since the camera scrolls in 8-pixel increments horizontally. The full conversion logic then becomes:

#define WORLD_2_SCREEN_X(a) ((a) - cameraPosX)

With Y coordinates, things get a bit weird. As we’ve established, the Y coordinate is more fine-grained – objects can be drawn at any pixel location. So it may seem like a logical choice to use world pixel coordinates for vertical object positions. Converting a map coordinate would then require a multiplication by 16, and world object positions could span any value between 0 and 14245. However, that’s not what the game’s developers did. Instead, when converting a map coordinate to world space, values are multiplied by 128. This essentially means that Y positions are stored in subpixel units – a delta of 1 represents one 8th of a pixel. 128 represents a full tile’s height (16 px), 64 is a half-tile, and 8 is a single pixel. This explains why the on-screen check in the previous section’s code uses y - 128 to check the upper half of the bunny sprite.

To convert to screen coordinates then, we need to first subtract the camera position (which again uses the same units as objects) and then divide by 8 (which is achieved via a right shift by 3), as follows:

#define WORLD_2_SCREEN_Y(a) (((a) - cameraPosY) >> 3)

But there’s one more step we need. When drawing the map, the game takes the HUD into account, and starts rendering at screen coordinates 16,16 – which is where the game viewport starts. We need to apply the same logic to sprite drawing – otherwise sprites would appear offset from their corresponding map coordinate, and a sprite at position 0,0 would draw over the top-left corner of the HUD. This is where things get even weirder. For Y coordinates, the offset is applied when converting from world to screen coordinates. That’s why the second call to BlitMaskedTile_16x16 in the function above adds 16 to the converted Y coordinate6. But there’s no such adjustment for X coordinates. For some reason, it’s handled differently, and in a fairly strange way:

When loading a level, the positions for all actors are converted from map coordinates into world coordinates. The Y coordinate is simply multiplied by 128, but the X coordinate is shifted to the right by one tile. That’s right: To ensure that sprite drawing takes the HUD offset into account, the X coordinate of all actors in the game is offset by 2 world units (1 tile, or 16 pixels)! As a result of this decision, X coordinates have to be adjusted back whenever a world position needs to be converted to the map coordinate system, for example to check for collision with walls and floors. Whereas for Y positions, a simple division is enough to go from world space to map coordinates.

Suffice to say, this system is quite confusing and unintuitive. It would seem much easier to handle the draw offset for X coordinates the same way as for Y coordinates, by applying the offset when drawing, not in the world coordinates. This is how Cosmo and Duke 2 handle it. Having two different units for X and Y is also strange. It does make some sense at least, given the difference in how objects can be positioned on each axis (8-pixel grid restriction for X). But consistently using pixels or subpixels for both axes would make the whole thing much more intuitive and easier to think about. Again, that’s what Cosmo and Duke 2 are doing (they limit the vertical axis to 8-pixel steps too, though).

Well, enough of this, let’s have a look at the low-level drawing function now.

Low-level drawing code

Although the low-level code is conceptually the same as in Cosmo and Duke 2, the actual implementation is quite different. The code is based on ProGraphx Toolbox, a graphics library supplying various EGA/VGA drawing routines written in Assembly. It also provides its own file formats, and came with an image editor. For Cosmo and Duke 2, the sprite drawing functions were completely rewritten alongside the introduction of a new (custom) file format7.

The job of BlitMaskedTile_16x16 is to copy 128 bytes of data from system memory to EGA video memory, while applying the transparency mask. The data is laid out as a sequence of 16 lines. Each line consists of two groups of 8 pixels. Each of those groups consists of 5 bytes of data. The first byte is the mask: It has a bit set for each pixel that should be visible8. Pixels for which the corresponding mask bit is not set will be skipped. The remaining 4 bytes contain the bits for each of the EGA’s color planes9, in order blue, green, red, and intensity. Combining these 4 bytes makes up the 4-bit EGA palette indices for 8 pixels (this happens in hardware).

Data layout of a 16×16 masked tile. Reading four binary numbers vertically from bottom to top gives the color index for each pixel.

To copy this data to video memory, we need to select each of the color planes one by one, and write the corresponding byte at the right location. For applying the mask, the routine makes use of the EGA’s bit mask feature. A hardware register can be set to the desired bit mask, and during subsequent writes to video memory, the EGA will then ignore any bit positions that have a 0-bit in the bit mask10.

The overall process then is as follows:

  • Read one byte from the source data, write it into the EGA’s bit mask register
  • Select plane 0
  • Read another byte, write it to video memory
  • Select next plane, read another byte, write it, etc., until all 4 planes have been written
  • Repeat all of the previous steps a 2nd time for the remaining 8 pixels in the current line
  • Repeat all previous steps 15 more times

In C code, this would look as follows:

void BlitMaskedTile_16x16_C(byte far* src, word x, word y)
{
  word line;
  word group;
  word plane;

  /*
     Pointer to EGA memory matching the given target coordinates.
     Draw page logic omitted for brevity.
  */
  byte far* dest = MK_FP(0xA000, x + y * 40);

  /*
     Select Map Mask register in EGA's Sequencer for subsequent
     writes to port 0x3C5
  */
  outportb(0x3C4, 2);

  for (line = 0; line < 16; line++)
  {
    for (group = 0; group < 2; group++)
    {
      /*
        Read bit mask and send to EGA (Bit Mask register in the
        Graphics Controller)
      */
      outport(0x3CE, 8 | (*src++ << 8));

      for (plane = 0; plane < 4; plane++)
      {
        /* Select current plane in EGA's Map Mask (Sequencer) */
        outportb(0x3C5, (1 << plane));

        /* Read data for current plane and send to EGA */
        *dest = *src++;
      }

      /* Advance to next 8 pixels */
      dest++;
    }

    /*
       Advance to next line on screen. 40 bytes is one full screen
       row. The loop above already advanced by 2, so add the
       remaining 38 here.
    */
    dest += 38;
  }

  /* Reset Map Mask for writing to all planes simultaneously */
  outportb(0x3C5, 0xF);

  /* Reset Bit Mask to allow writing all bit positions */
  outport(0x3CE, 0xFF08);
}

However, writing this in C at the time would produce code that’s a lot slower than handwritten Assembly. The Assembly code implements the algorithm using a single loop over 16 lines, all the inner steps are unrolled for performance. It also makes use of optimized instructions for reading the data. The core logic for handling one 8-pixel span of data is 23 instructions. See the Assembly code. For comparison, this is Cosmo’s equivalent function, and this is Duke 2’s.

Summary

This concludes our look at Duke Nukem 1’s sprite drawing. Although it has a lot in common with later incarnations in Cosmo’s Cosmic Adventure and Duke Nukem II, it’s also quite different. It’s based on the ProGraphx Toolbox whereas the later games have their own routines. The file formats are quite different, and the high-level code has much more work to do in order to define how larger sprites are created out of several smaller tiles. This aspect was moved into a generalized system driven by data files in the later games. But even though the first game’s system is more primitive, it also offered a bit more flexibility, by allowing free positioning at any pixel location on the Y axis. Then again, this capability also led to some strange and confusing design choices. Perhaps that’s why the later games restrict vertical movement to an 8-pixel grid as well?

For the next article in this series, I’m planning to look at the game’s collision detection, which has its own share of quirks and odd choices.

  1. Here’s an in-depth explanation of Cosmo’s system, which is nearly identical to Duke 2’s ↩︎
  2. There were ways to access more memory, namely EMS and XMS. But Duke Nukem 1 doesn’t use either of these ↩︎
  3. I describe this in detail in my article on Duke Nukem II’s parallax scrolling ↩︎
  4. As explained in the previous post in the series ↩︎
  5. Levels are 90 tiles high, so the bottom-most pixel location is 89*16 = 1424 ↩︎
  6. I first tried including the +16 offset in the WORLD_2_SCREEN_Y macro, which would make the client code more logical: The 1st tile’s draw call would then subtract 16 from the result of the macro, while the 2nd one would use the result unchanged. This would match the actual visual positioning of the two tiles better. Unfortunately, the original compiler used for this game, Borland Turbo C 2.0, isn’t smart enough to collapse a sequence of + 16 - 16 into nothing – it still generates an instruction (that does nothing). In order to make the decompiled code match the original Assembly, I therefore had to pull the addition of 16 out of the macro. ↩︎
  7. The latch-copy based map tile drawing routine described in the previous article was kept, and it doesn’t seem to be part of the ProGraphx Toolbox library either. It looks like it was added specifically for Duke 1 as an optimization. ↩︎
  8. Cosmo and Duke 2 do it the other way around, with 1-bits indicating pixels that should be skipped ↩︎
  9. More on planar EGA memory here and here ↩︎
  10. Interestingly, this approach isn’t used anymore in Cosmo and Duke 2. They do the bit masking on the CPU instead ↩︎

Leave a comment