Chip's Challenge (DOS / Spectrum / CPC) File Format

By John Elliott. Based mainly on reading the source of c4 by Brian Raiter, and peering at various versions of CC with a hex editor.

Introduction

Ports of Chip's Challenge prior to the Microsoft version tend to use the same format to describe levels. This format also appears to be used by the original Lynx ROM.

Unless otherwise stated, all multi-byte values are little-endian.

DOS

Level list

In CC for DOS, each level is stored in a separate .PAK file in the 'levels' directory. The names of the .PAK files are hardcoded in the file BF1.EXE, at offset 0x94B9. Each level filename is a fixed 13-character ASCIIZ string; the name element is packed to 8 characters with spaces. For example, 'DIGGER.PAK' is stored as

000095A3            64  69 67 67 65  72 20 20 2E  70 61 6B 00     digger  .pak.

Passwords

There is no master table of passwords. Instead, 256 passwords are generated by the game engine at startup using a pseudo-random number generator, although only the first 149 are used:

1BDHP33BQSN65UPUN97IOCS129HEAN161JDFE193EFMV225WCSO
2JXMJ34NQFI66ZIKZ98TKWD130XHIZ162OGFP194ICNP226HNAE
3ECBQ35VDTM67GGJA99XUVU131FIRD163YJPA195ZUNX227SGNH
4YMCJ36NXIS68RTDI100QJXR132ZYFA164KBIV196FHYK228TIOC
5TQKB37VQNK69NLLY101RPIR133TIGG165ZDTQ197FEOO229ZUKF
6WNLP38BIFA70GCCG102VDDU134XPPH166LQSN198TPLM230BPIK
7FXQO39ICXY71LAJM103PTAC135LYWO167WKWD199PLYR231LEBH
8NHAG40YWFH72EKFT104KWNL136LUZL168TQJM200DEJD232RDXH
9KCRE41GKWD73QCCR105YNEG137HPPX169RABT201EKFD233CFLY
10VUWS42LMFU74MKNH106NXYB138LUJT170URPI202IGWB234PHDA
11CNPE43UJDP75MJDV107ECRE139VLHH171ARLT203LXHT235KJXP
12WVHI44TXHL76NMRH108LIOC140SJUK172NMSB204DVMV236UNXY
13OCKS45OVPZ77FHIC109KZQR141MCJE173VLNA205XLUZ237RUSJ
14BTDY46HDQJ78GRMO110XBAO142UCRY174MNAN206DHPP238HQSS
15COZQ47LXPP79JINU111KRQJ143OKOR175JNTT207IVIR239VHIC
16SKKK48JYSF80EVUG112NJLA144GVXQ176FHYK208DQKB240BAOG
17AJMG49PPXI81SCWF113PTAS145YBLI177XTIW209XPPH241ICNM
18HMJL50QBDH82LLIO114JWNL146JHEN178EJTX210HAGO242PYCX
19MRHR51IGGJ83OVPJ115EGRW147COZA179QRLT211SJUN243SFER
20KGFP52PPHT84UVEO116HXMF148RGSK180CSOZ212USJE244ICXI
21UGRW53CGNX85LEBX117FPZT149DIGW181CZMG213WCSR245DASX
22WZIN54ZMGC86FLHH118OSCW150GNLP182KBYM214LDFU246BDHH
23HUVE55SJES87YJYS119PHTY151ACKS183MJLA215DMYB247UGZQ
24UNIZ56FCJE88WZYV120FLXP152BQZP184MIZX216ZXMJ248PXMF
25PQGV57UBXU89VCZO121BPYS153XUFS185KWTT217BIVY249GNXP
26YVYJ58YBLT90OLLM122SJUM154EBHR186UGRW218TLAK250OCKS
27IGGZ59BLDM91JPQG123YKZE155YWNL187BIFQ219GZIF251PHTY
28UJDD60ZYVI92DTMI124TASX156CFPD188URPI220RLYG252ZTHP
29QGOL61RMOW93REKF125MYRT157HHDA189DEZT221HYRA253EBHM
30BQZP62TIGW94EWCS126QRLD158REZT190GONT222PQGF254PPXL
31RYMS63GOHX95BIFQ127JMWZ159ZMWO191ASHA223KRQZ255CFLX
32PEFS64IJPQ96WVHY128FTLA160GBUS192LHXD224ZQRA256PHDA

After the 256 passwords have been generated, the program then generates three more characters and uses them to overwrite the passwords of levels 98, 107 and 114 (which would otherwise be duplicates):

LevelBeforeAfter
98GKWDTKWD
107KCREECRE
114KWNLJWNL

Compression

Each level (.pak) consists of one or more compressed blocks. A block begins with a 4-byte header:
ByteLength of compression dictionary, 0-255. If this is zero the level is not compressed.
ByteNonzero if another compressed block follows this one; zero if this is the last compressed block in the level. In the levels supplied with CC for DOS, this byte is always zero, and I don't know how well various implementations support nonzero values for it.
WordNumber of bytes following the dictionary.

The dictionary then follows. Each entry is three bytes:

ByteToken to define.
ByteToken definition: first byte.
ByteToken definition: second byte.

Each 'definition' byte will be treated either as a reference to another token (if it matches a token already in the dictionary) or a literal byte (otherwise). Note that tokens can be redefined. For example, level 106 (KABLAM) starts with:

	1E		; Dictionary has 30 entries
	00		; This is the last block
	70 00		; 112 bytes follow the dictionary
	02 2C 2C	; Define dictionary entry 02 as { 2C 2C }. Dictionary
			; entry 2C is not defined, so both '2C' are treated
			; as literal values.
	03 00 00	; Define dictionary entry 03 as { 00 00 }.
	04 02 02	; Define dictionary entry 04 as { 02 02 }. Dictionary
			; entry 02 is defined, so both are expanded; entry
			; 4 becomes { 2C 2C 2C 2C }
	02 03 03	; Redefine dictionary entry 02 as { 03 03 }. Dictionary
			; entry 03 is defined, so both are expanded; entry
			; 2 becomes { 00 00 00 00 }

Here are some C code fragments for reading the dictionary, and expanding an entry.

typedef struct dictent
{
	struct dictent *pLeft;
	struct dictent *pRight;
	unsigned char chLeft;
	unsigned char chRight;
} DICTENT;

DICTENT	dict[256] = { 0 };
DICTENT *lookup[256] = { 0 };

	/* Load dictionary */
	int dentry = 0;
	for (n = 0; n < entries; n++) 
	{
		unsigned char key   = read_byte();
		unsigned char left  = read_byte();
		unsigned char right = read_byte();

		dict[dentry].pLeft  = lookup[left];
		dict[dentry].pRight = lookup[right];
		dict[dentry].chLeft = left;
		dict[dentry].chRight = right;
		lookup[key] = &dict[dentry];
		++dentry;
	}

/* Expand an entry */
int expand(DICTENT *de, unsigned char *dest)
{
	int len = 0;
	
	if (de->pLeft) 
	{
		len = expand(de->pLeft, dest);
		dest += len;
	}
	else
	{
		dest[0] = de->chLeft;
		len = 1;
	}
	if (de->pRight) 
	{
		len += expand(de->pRight, dest);
	}
	else
	{
		dest[0] = de->chRight;
		len++; 
	}
	return len;
}

Following the dictionary is a sequence of bytes, which are expanded to the level data. Again, these are treated as dictionary tokens (if defined in the dictionary) or literal values (otherwise).

unsigned char *dest = buffer;
{
	unsigned char c = read_byte();

	if (lookup[c])
	{
		dest += expand(lookup[c], dest);
	}
	else	
	{
		*dest++ = c;
	}
}

Expanded data

Map

Once the level has been expanded, the first 1024 bytes will be the map (32 rows of 32 tiles). Tiles are:

00Empty 10Fire boots 20Yellow key 30Red button
01Wall 11Force boots 21Green key 31Chip (uncounted)
02Ice 12Water boots 22Blue button 32Blue block wall
03Dirt 13Fire 23Chip (counted) 33Hint button
04Blue block floor 14Water 24Socket
05Force floor north 15Thief 25Exit
06Force floor east 16Popup wall 26Invisible wall temporary
07Force floor south 17Toggle open 27Invisible wall permanent
08Force floor west 18Toggle closed 28Gravel
09Force floor random 19Green button 29Wall east
0AIce corner SE 1ARed door 2AWall south
0BIce corner SW 1BBlue door 2BWall southeast
0CIce corner NW 1CYellow door 2CBomb
0DIce corner NE 1DGreen door 2DBear trap
0ETeleport 1ERed key 2EBrown button
0FIce boots 1FBlue key 2FClone machine

Other tile types tend to crash the game engine.

Creature list

Following the map is a 384-byte creature list.

000h:	DB	creature0, creature1, ...	; Up to 128 creature IDs
080h:	DB	x0, x1, ...			; X-coordinates
100h:	DB	y0, y1, ...			; Y-coordinates

Creature types are:

04Chip north 05Chip east 06Chip south 07Chip west
08Bug north 09Bug east 0ABug south 0BBug west
0CCentipede north 0DCentipede east 0ECentipede south 0FCentipede west
0CFireball north 0DFireball east 0EFireball south 0FFireball west
10Glider north 11Glider east 12Glider south 13Glider west
14Ball north 15Ball east 16Ball south 17Ball west
18Block north 19Block east 1ABlock south 1BBlock west
1CTank north 1DTank east 1ETank south 1FTank west
20Walker north 21Walker east 22Walker south 23Walker west
24Blob north 25Blob east 26Blob south 27Blob west
28Teeth north 29Teeth east 2ATeeth south 2BTeeth west

Other creature types tend to crash the game engine.

Level time, title and hint

The level time is a little-endian word:

	DW	time

The title is a 0-terminated string, with this encoding:

x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF
0x Newline Space 0 1 2 3 4 5 6 7 8 9 A B C D
1x E F G H I J K L M N O P Q R S T
2x U V W X Y Z ! " ' ( ) , - . : ;
3x ?

(Note: The Spectrum version port only supports characters 00h-25h, plus 2Dh).

The hint, if present, uses the same encoding as the title except it is terminated with two 00 bytes in succession (a single 00 byte is treated as a newline but does not end the hint).

Spectrum

On the Spectrum version, levels are stored on cassette tape. World of Spectrum has three tape images:

In all versions, each file on the tape contains eight consecutive levels. The files are distinguished by their level set ID: ID zero holds levels 1-8, ID 1 holds levels 9-16, etc.

TAP format

The TAP format is vastly easier to understand: both as a container, and the way the levels are stored in it. A .TAP file consists of one or more blocks:

	DW	length	;Number of bytes in the following block
	DB	block	;Data block. The first byte is the type, usually
			;00h for a file header, 0FFh for file data.

Each level file is stored in the .TAP as two consecutive blocks, header followed by data:

	DW	13h		;Number of bytes in the following block
	DB	00h		;Block type 0 : header
	DB	03h		;Spectrum file type (CODE)
	DB	id		;Level set ID, 0-17.
	DB	"*CHIP'S* "	;Remainder of Spectrum filename
	DW	datalen		;Length of level data
	DW	8000h		;Notional load address (ignored)
	DW	0		;Unused field
	DB	check		;XOR of all bytes from block type to end.

	DW	blklen		;Number of bytes in the following block,
				;should be datalen + 2
	DB	0FFh		;Block type 0FFh : data
	DB	... level data ...
	DB	check		;XOR of all bytes from block type to end.

For example, this is how the beginning of the first level file looks. I have highlighted the two separate TAP blocks:

00011938                             13 00 00 03  00 2A 43 48          .....*CH
00011940   49 50 27 53  2A 20 97 08  00 80 00 00  5A 99 08 FF  IP'S* ......Z...
00011950   10 A8 12 A9  F4 A9 27 AB  7E AC 83 AD  7B AE B2 AF  ......'.~...{...
00011960   11 00 CB 00  02 00 00 04  02 02 05 04  04 07 05 05  ................

The level file begins with a 16-byte header, giving the addresses of the eight levels it contains (on the assumption that the file has loaded at 0A800h). In the above example, the first level pointer is 0A810h, so Lesson 1 can be found 16 bytes after the start of the file.

The actual levels themelves, once you reach them, are in the same compressed format as the DOS .PAK files.

TZX format

TZX, as a container format, is described at World of Spectrum. It's a nuisance to work with, because there's no consistent way of specifying the length of a chunk.

Of the two TZX releases, the Erbe rerelease contains the same data as the .TAP download. The original (turbloader) TZX stores the levels in an obfuscated form, which the turboloader decrypts on read. For example, the file containing the first levels has the header

E9 CB 5C F7 98 41 83 C3 25 FE CB 76 2B

After decryption by the loader, this becomes

00 89 08 FF 5F 3D 3D C0 97 08

Of these bytes: 00 is the level ID.
89 08 is the number (0889h) of bytes that will be read from the file.
97 08 is the number (0897h) of bytes that will end up in memory.

The other bytes are used in decoding the following data block.

Amstrad CPC

The version of the game downloadable from ntnu.no is in the form of a disk image (in the CPC data format). The filesystem is CP/M, with groups of eight levels held in CP/M files:

CHIPS.CHA: Levels 1-8
CHIPS.CHB: Levels 9-16
CHIPS.CHC: Levels 17-24
etc.

Each file begins with a 128-byte AMSDOS header. This is followed by the same data as in the Spectrum tape file: eight words giving the level addresses (assuming a load at 0A800h) and then the eight levels.