Wednesday, 7 March 2018

Exploring C compilers for the 6809

Ever since I fired up my first 6809-based computer, I have wondered about the possibility of running programs written in something other than assembly language on it. Not that I don't love writing code in assembly. But for interest's sake, and because it could be a different kind of fun, I've been curious about running code written in higher level (relative to assembly) languages, especially programs written in C.

To make external code - that is, code not stored in the EEPROM - quicker to try out and run than when it is held on the Compact Flash, I have implemented an improvement over the existing method for transferring runnable files (essentially machine code subroutines) between my Linux box and the MAXI09 board. The old method involved transfers with XMODEM. Whilst this largely worked, it was fiddly because XMODEM transfers need to be initiated on the sending side - the Linux box - which is not very convenient, especially if the MAXI09 console is being used.

What I have implemented in its place is a simple file transfer service whereby the name of the file to be transferred is given to the receiver, the MAXI09 board.

This screenshot shows the "getrun" command being used to download and run various binaries:
This means I can rebuild code then switch to minicom and immediately try it out, without having to go through the hassle of popping the Compact Flash out of the MAXI09 board, copying the programs across, then putting the Compact Flash back in the MAXI09 board. A big improvement!

The protocol is mind-bogglingly simple. In this description the client is the MAXI09 board and the server is the Linux box:
  1. Client sends a command byte 0x01, the "get" operation
    1. The rest of the sequence is command-specific, though in any case only "get" has been implemented
  2. In the case of 0x01 the filename, with a trailing null byte, is sent next
  3. The server sends a response code: 0x01 means "data follows", 0x02 means "file not found", 0x03 means "bad command", and 0x04 means the file is larger than 64KB
  4. Assuming the file exists, 0x01 is followed by the file size as a 16-bit value, in big-endian order
  5. Finally the file data itself is sent
    1. There are no acknowledgements sent from the receiving end as the file is sent; it is essentially sent blind
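To make the exchange concrete, here is a minimal sketch of the server side in C. This is not the actual fileserver code - the file descriptor handling and names are illustrative - but it follows the protocol as described:

/* Hypothetical sketch of the server's "get" handling; fd is an
   already-configured serial port file descriptor. */
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>

#define CMD_GET        0x01
#define RSP_DATA       0x01
#define RSP_NOT_FOUND  0x02
#define RSP_BAD_CMD    0x03
#define RSP_TOO_BIG    0x04

static void serve_get(int fd)
{
    char name[256];
    uint8_t cmd, rsp;
    size_t i = 0, n;

    read(fd, &cmd, 1);
    if (cmd != CMD_GET) {
        rsp = RSP_BAD_CMD;
        write(fd, &rsp, 1);
        return;
    }

    /* The filename arrives as a null-terminated string. */
    do {
        read(fd, &name[i], 1);
    } while (name[i++] != '\0' && i < sizeof(name));

    FILE *f = fopen(name, "rb");
    if (!f) {
        rsp = RSP_NOT_FOUND;
        write(fd, &rsp, 1);
        return;
    }

    fseek(f, 0, SEEK_END);
    long len = ftell(f);
    rewind(f);
    if (len > 0xffff) {
        rsp = RSP_TOO_BIG;
        write(fd, &rsp, 1);
        fclose(f);
        return;
    }

    /* "Data follows", then the 16-bit length, big-endian. */
    uint8_t hdr[3] = { RSP_DATA, (uint8_t)(len >> 8), (uint8_t)(len & 0xff) };
    write(fd, hdr, 3);

    /* The file itself is sent blind: no per-block acknowledgements. */
    uint8_t buf[64];
    while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
        write(fd, buf, n);
    fclose(f);
}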
This blind-send behaviour initially caused problems with data loss. It seems that with task switching going on during a transfer, the circular buffer between the ISR and the sysread system call will sometimes overflow. This could be cured by any of the following:
  • Breaking the transfers up into blocks (say 64 bytes) and acknowledging each block transferred
  • Introducing hardware flow control
  • Disabling task switching - but leaving interrupts enabled - for the duration of the file transfer
Disabling task switching is the solution I have currently employed. The first method would also likely work, but would slow down the transfers.

Figuring out the cause of the data loss took some time. Initially I suspected the 64 byte FIFO in the UART port in the SC16C654 (PDF) quad UART was overflowing due to interrupt starvation. But after checking the Line Status Register this turned out not to be the case. And it's obvious really: the baud rates used are slow compared to the speed of the CPU and the size of the UART's FIFO.

Instead the problem was an overflow of another FIFO: the circular buffer filled by the ISR and emptied by the sysread call in the UART driver. While other tasks are running, nothing is emptying this buffer so eventually it will overflow (the write pointer moves past the read pointer) causing data loss. The current solution is to insert a forbid at the start of the file transfer and a permit at the end. During the transfer interrupts will be serviced - needed so that the UART's FIFO is still emptied - but only the shell task, where the file transfer code runs, is scheduled. Thus the circular buffer is always emptied fast enough and no data loss occurs. The downside is that no other tasks are running for the duration of the transfer.
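For illustration, the shape of the problem in C terms looks something like the following. The real driver is 6809 assembly and these names are invented, but the failure mode is the same: the ISR advances the write index unconditionally, so if no task drains the buffer in time the write index laps the read index and data is lost:

#include <stdint.h>

#define BUF_SIZE 256   /* a power of two, so an 8-bit index wraps for free */

static volatile uint8_t buf[BUF_SIZE];
static volatile uint8_t wr, rd;   /* write and read indices */

/* Called from the UART ISR for each byte taken out of the UART FIFO. */
void isr_put(uint8_t byte)
{
    buf[wr++] = byte;
    /* No overflow check: if wr laps rd because no task has been
       scheduled to empty the buffer, unread data is overwritten. */
}

/* Called from sysread in task context; returns -1 when empty. */
int task_get(void)
{
    if (rd == wr)
        return -1;    /* nothing buffered: the caller should wait */
    return buf[rd++];
}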

The fileserver software itself is based on the old flasher program; the one I use to update the EEPROM on the MAXI09 board. To avoid code duplication, the common routines (serial IO) have been broken out into their own module, which is linked by both the flasher program and the fileserver program. Anyone interested in this code can view it here. The code is not particularly pretty, but it works.

While working with external commands, I thought about tackling a little project I've been wanting to complete for a while: running my old Snake game from within MAXI09OS. Not only would the game be started from the Shell (and read either from the Compact Flash or sent over the UART), but other tasks would still be scheduled whilst the game is running. Further, it would be terrific if it were possible to switch between the running game and the virtual consoles running other tasks. Finally, exiting the game should free up all resources and return the user to the virtual console which was used to start the game.

In summary, all this has been achieved. The game code was "ported" from its existing, bare metal, environment to use the facilities from MAXI09OS. One of the things I wanted to avoid was having the game code busy-wait - the previous iteration of the game spends most of its time in a delay loop implemented by decrementing a register until it reaches zero. This has been replaced with the use of a MAXI09OS timer. The joystick driver has also been rewritten to be event-based. Instead of returning the current state of the stick, sysread now returns only changes in position, which are themselves determined by hooking the main system tick. Changes in position generate a signal, which user code can wait on. The main loop for the Snake game thus waits on both timer and joystick events. As before, the game speeds up as the Snake eats the food. This is accomplished by shortening the interval between timer events each time the food is eaten. The game interacts with the V9958 using the same APIs as the console driver, which happens to be roughly the same mechanism used in the original Snake game. For performance reasons it's not really possible to place the V9958 behind a proper driver in the same way as, say, the UART is.

Implementing console switching while the game is running was interesting. This was accomplished by first shuffling the console video memory space around a bit. Previously the console font data and the screen data were spread across the 64KB video RAM memory space. Now the console fonts and screen data are within the first 32KB; the second 32KB is free for other purposes, for example the Snake game's own font and screen data. Actual switching was achieved by extending the console switching routine to adjust all video registers, including the video mode registers. This means that if a graphical mode is active and the user switches to a text-based virtual console, then it will be replaced with text mode. Previously only the screen memory address register was updated to switch the displayed text to the new virtual console.

Because of possible conflicts between tasks wanting to use non text modes on the V9958, it is only possible to have one non text view active at a time. To claim ownership of the upper half of the video memory for use with alternative video modes, a task registers a video switch callback using the setgraphicsub subroutine. This callback is run, inside the console's ISR, to switch to an alternative video mode when a special "graphical console" function key is pressed, currently F10. In the case of the Snake game, this callback switches the video mode registers to the 32 column tile mode used by the game. When the user presses, say, F1, the display switches back to 80 column text mode.

This all works rather well: it is possible to leave Snake running between games (or even during a game, if you are quick) and switch back to text mode to use the Shell, etc. All the while, the system's performance is not too adversely impacted.

I'm pleased enough with this little milestone for MAXI09OS that I've made a brief video. It shows some other things covered later in this blog post:

So, to C compilers.

I've pondered running programs written in something other than assembly language for a while now, and I've finally had some success. Whilst a number of C compilers were produced for the 6809 over the years, there seem to be only two which are, more or less, currently being worked on:
  1. GCC, in the form of some unofficial patches
  2. CMOC, which is a compiler written specifically for the 6809
I've spent a fair amount of time looking at both compilers. They each have their pros and cons:

GCC pros:
  • Although they are close to 7 years old, the patches are against GCC 4.3.4 which, though somewhat old, supports the full C and C++ standards including C99 and C++98, with support for some of the features available in the later C11 and C++03 standards.
  • The generated code appears to perform better than CMOC generated code.
  • Not to diss CMOC, but GCC is a fully featured compiler.
  • C++ is available as well as C, though I have yet to use it. The STL is not available.
GCC cons:
  • It requires the creation of a linker script in order to produce output in a form which can run on a "bare" system like MAXI09OS. Not a big issue.
  • It might be my build, but the shipped libgcc contains integer division routines which crash the system. I had to use an alternative implementation and manually link the module containing these routines in, or else the code could not contain expressions with division/modulus.
  • Position Independent Code support is patchy. It mostly works, but I had problems getting the compiler to generate sensible code when globals (which were arrays) were involved. I have not spent much time looking at the generated assembly, but it does not look complete.
  • There is no C library. Apparently it is possible to use the newlib C library, but I have not had any success getting it to build.
  • The 6809 patches are not maintained anymore, it seems.
CMOC pros:
  • Very easy to use: no linker scripts, just point it at the code. The compiler mostly targets the CoCo, but generating "bare" machine-code files is easy with the Motorola SREC output format, which can be easily converted to a raw binary format with objcopy, a part of binutils.
  • Comes with a fairly minimal C library.
  • Being actively worked on; the last release (as of this writing) was on 26th February. The author happily answered my queries about the software.
  • When dumping out the generated assembly files, the code is beautifully annotated with remarks relating to the original C code.
  • The inline assembly (required for calls to MAXI09OS routines) is a lot easier to use than GCC's.
CMOC cons:
  • Supports only a subset of C. Some missing features: bitfields (I have never liked these anyway), and "const".
  • It seems to produce slower code than GCC.
On the subject of the speed of the generated code, my performance measurement was extremely crude. I tested a prime number solver and observed it ran at about half the speed of the GCC build. I have yet to look into why this is the case, but one factor against CMOC is that it passes all function parameters on the stack instead of through registers. I'm unsure if this can account for all the speed difference though.
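For the curious, the benchmark was along these lines - a hedged reconstruction of the trial-division prime finder rather than the exact test program:

#include <stdint.h>

/* Trial-division primality test. The divisions are the interesting
   part on a 6809: both compilers call a library routine for 16-bit
   divide, so this exercises the runtime as much as code generation. */
uint8_t is_prime(uint16_t n)
{
    uint16_t d;

    if (n < 2)
        return 0;
    for (d = 2; d <= n / 2; d++)
        if (n % d == 0)
            return 0;
    return 1;
}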

Because I am not entirely sure which compiler I will end up using for any larger projects, the build system I have implemented for building programs written in C supports both compilers. Getting a build from "the other" compiler is a simple matter of switching a Makefile variable.

A "c" directory in the MAXI09OS repository has been created which contains the relevant files for working on MAXI09OS code written in C. This is broken down into the following files and directories:

Makefile: This is a simple top level Makefile which runs make in the subdirectories. A shared include file holds the Makefile commands used by the subdirectories; the compiler to use (GCC or CMOC) is defined there.
libc/: My very, very trivial C library is here. I have only implemented a few functions: atoi, memcpy, memset, strlen. An snprintf implementation has also been included, which is based on mini-printf.
libmaxi09os/: This directory contains the interface between programs written in C and the MAXI09OS assembly routines. Only a few MAXI09OS routines are currently exposed: putstrdefio, sysopen, sysread, syswrite and a couple of others. This linkage uses inline assembly: the C function wrapper extracts the parameters passed in (in CMOC's case via the stack) and then uses jsr to call the MAXI09OS routine via the jump table.
libgcc/: Contains the integer division routines required when GCC is the compiler being used.
examples/: Various test programs live here, including the prime number finder.

One of my current concerns lies in the overhead of calling MAXI09OS routines from C code. In a function like sysopen, the steps involved are roughly, assuming CMOC is used:
  1. The calling code pushes the 3 parameters onto the stack. These parameters are the device name pointer (a word), and the two parameter bytes (eg the UART port number and the Baud rate).
  2. The C wrapper then loads the correct registers by reading values off the stack. This is done with inline assembly, as is the next step.
  3. The inline assembly then calls through the MAXI09OS routine via an indirect jsr.
  4. The return value, if any, is stored into a variable held on the stack.
Since the inline assembly syntax is different between GCC and CMOC, libmaxi09os is pretty much implemented twice, once for each compiler.

Here is the code for the sysopen wrapper, assuming CMOC is being used:

DEVICE m_sysopen(char *name, uint8_t param2, uint8_t param1)
{
        DEVICE device;

        asm {
        pshs a,b
        ldx name
        lda param1
        ldb param2
        jsr [0xc008]
        stx device
        puls a,b
        }

        return device;
}
I don't believe anything can really be done about this overhead, short of using inline assembly at the call site. But if that were done, the whole program might as well be written in assembly.

All told, I'm pretty happy about being able to write user code in C. I'm not sure where I'll use it though, yet.

So far I've written (or converted) a bunch of programs and have run them on the board:
  • io.bin: A simple demo: print a prompt, get a string, output the string.
  • gettime.bin: A rewrite of the SPI RTC time-display program.
  • prime.bin: Asks for a number, and then prints all the prime numbers between 2 and that number.
  • easter.bin: Prints the dates for Easter for the years between 1980 and 2020. (A sketch of the calculation follows this list.)
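Since the Easter calculation is pure 16-bit-friendly integer arithmetic, it makes a nice compact example. Here is a sketch of the widely published anonymous Gregorian computus; I am not claiming this is the exact code in easter.bin, but it computes the same dates (for instance, it gives 6th April for 1980):

#include <stdint.h>

/* Anonymous Gregorian computus: fills in month (3 or 4) and day for
   Easter Sunday of the given year. All of the intermediate values
   fit comfortably in 16 bits for the years of interest. */
void easter(uint16_t year, uint8_t *month, uint8_t *day)
{
    uint16_t a = year % 19;
    uint16_t b = year / 100, c = year % 100;
    uint16_t d = b / 4, e = b % 4;
    uint16_t f = (b + 8) / 25;
    uint16_t g = (b - f + 1) / 3;
    uint16_t h = (19 * a + b - d - g + 15) % 30;
    uint16_t i = c / 4, k = c % 4;
    uint16_t l = (32 + 2 * e + 2 * i - h - k) % 7;
    uint16_t m = (a + 11 * h + 22 * l) / 451;

    *month = (uint8_t)((h + l - 7 * m + 114) / 31);
    *day   = (uint8_t)((h + l - 7 * m + 114) % 31 + 1);
}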
As usual, I have a bunch of ideas for what to work on next, including:
  • Tidy up keyboard support: need to add key debouncing, utilities for setting the key repeat parameters, and fix some other little issues.
  • Get my Citizen 120D+ 9 pin dot-matrix printer working. This needs a driver for the VIA parallel port, and some utilities to print files. I really want to do this!
  • Write a driver or other routines for the OPL2 sound IC. Not really sure of the best way to approach this though!
  • See about porting a more complex C program to MAXI09OS. A text adventure game might be a nice target.
Lots of ideas, and only so much time...

Sunday, 21 January 2018

Another MAXI09 is born and some OS progress

There's not been as much progress over the last few, actually six, months as I would have liked. The good news is that I now have a whole load more time available to work on my projects, including MAXI09-related things. So expect many more blog posts in the coming months!

There has been some progress on the OS which I will talk about later, but first up some news about the creation of another MAXI09 board.

After posting about the MAXI09 project on the retrobrewcomputing forum I initially attracted very little interest. Which is fair enough; MAXI09 is not for everyone. But a couple of months ago I received a very interesting response from someone who, like me, is a huge fan of the 6809. After sending a few emails back and forth he expressed an interest in building his own MAXI09 and, sure enough, he is now in possession of a running board. There's still a few things to get working, especially around the screen and keyboard, but his board is very nearly complete and he is able to interact with the system via its UART ports. Here's a picture of his MAXI09:

He goes by the handle ComputerDoc and you can read more about him here. He's very active in the retro computing scene and has built more than a few retro machines over the years.

Most of ComputerDoc's troubles with getting the board running were down to him using Windows as his development machine. I haven't ever tried using asxxxx under Windows and, while there's a build which works great, working with MAXI09 also involves being able to run the "flasher" program which, whilst it builds fine under Cygwin, doesn't seem to run properly. So for now 'Doc is using a Linux box with his board. It would be great to get the system useable from a Windows environment, so at some point I will tackle that extra little project.

Work on MAXI09OS is continuing slowly.

I have been working on introducing a "clean" interface between user code (that is, utilities and such like) and the main system code. I did briefly consider using Software Interrupts, which are the most elegant and recommended solution, but the overhead of stacking and unstacking all registers on every syscall does not make this a high performance solution.

Instead I am using a simple vector table approach. Unlike in my previous "mini OS" made out of my old machine code monitor, the vector table is generated using a script. This script parses the symbol map which aslink produces and generates an array of symbol references. Thus each syscall is presented at a constant offset into the vector table (referred to as the subtable in the code) and a rebuild of the EEPROM which moves the actual routine address around will not break callers of the code - user code - which are calling through the vector instead of directly. This table resides at the very start of ROM, 0xc000 in the current memory map. Currently system variables, like the current task pointer, are also exported to user code, but without the indirection of the vector table. This is a reasonable approach since the table of variables has changed very little during MAXI09OS's development. I might yet change this around and either hide some of these variables behind a syscall subroutine or vector them so they can be moved around if desired. Actually a third option exists: see if it's possible to live without exporting them, since most of them are fairly core to the system and do not hold things useful to "end user" code.

Because some globals need to be global across the EEPROM image (ie. the MAXI09OS) but not exported to user code, a convention has been introduced: if the subroutine starts with an underscore it will not be added to the subtable at build time. This is used, for example, by startup code to locate the address of the idler task code, which needs to be a global because it resides in a different file to the startup code. The nice thing is that the generation of these tables is completely automatic; they are produced by small perl scripts which are invoked as part of the build process by make.

To facilitate the vector table being at the start of ROM I had to make an improvement to the 256 byte boot loader I've previously written about. Before, the loader always jumped to 0xc000 after either reprogramming the EEPROM or not; the normal boot case. This was a simple approach but it meant it was not possible to chain-boot ROM images which had a regular reset vector stored in ROM at 0xfffe and 0xffff. The loader will now chain the real ROM by jumping through this vector. It achieves this by always copying itself into RAM when it starts. This is necessary because the loader program also resides at the top of the address space (it has to, since the 6809 needs to boot it at start-up). Previously this copy to RAM operation - and running from it - was only done when rewriting the EEPROM. It's now performed on each start up since the loader always has to be able to access all of the real ROM, including the top page where the reset vector is found. This approach for chaining up the OS feels nicer and is also more flexible, at the expense of a few milliseconds start up time.

The system routine table has been proven out by implementing some test "user programs", which are executed by the shell. At present the current directory of the shell task is searched, but obviously this is a temporary hack: in the future external commands will reside in a specifically named directory. Obviously this code path is only entered if the command name entered is not one of the built-ins (cd, type, list, etc).

The rough sequence of steps to run an external command is as follows:
  1. Try to open a file in the current directory with the name obtained from the shell input stream, eg "test.bin". If this fails, bail with "file not found".
  2. If the file opened is not actually a plain file (maybe it's a directory) then bail with "not a file".
  3. Read the file into RAM (readfile sub in drivers/fs/minix.asm):
    1. Obtain the open file's length.
    2. Allocate that many bytes.
    3. Read from the file handle, that many bytes, into the new memory block, using the device agnostic getchars in lib/io.asm.
  4. Jump to the start of the allocated memory block as a subroutine.
  5. On return, free the memory allocated by readfile.
On return from the subroutine, ie. the user program, the shell could do something with the register state, like treat the contents of the a register as an error value, or something similar. Currently it just loops back to the top of the shell input loop. Also, on entry to the subroutine the registers are not set up in any particular way; it would be nice if the x register contained the calling task's IO channel, and y contained the rest (sans command name) of the shell's command buffer. That way the command could process the rest of the command-line and extract arguments. However, none of the (three) external commands currently take arguments so that is not yet needed.
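MAXI09OS is written in assembly, but the sequence is compact enough to express as a C-flavoured sketch. Every name below is illustrative rather than the real API; the steps simply mirror the list above:

/* Hypothetical C rendering of the shell's external-command path. */
#include <stdint.h>

extern void *sysopen_cwd(const char *name);   /* step 1 */
extern int   is_plain_file(void *fh);         /* step 2 */
extern uint16_t file_length(void *fh);
extern void *sysalloc(uint16_t len);
extern void  getchars(void *fh, void *dst, uint16_t len);
extern void  sysclose(void *fh);
extern void  sysfree(void *block);

int run_external(const char *name)
{
    void *fh = sysopen_cwd(name);
    if (!fh)
        return -1;                      /* "file not found" */
    if (!is_plain_file(fh)) {
        sysclose(fh);
        return -2;                      /* "not a file" */
    }

    uint16_t len = file_length(fh);     /* 3.1: obtain the length */
    uint8_t *block = sysalloc(len);     /* 3.2: allocate that many bytes */
    getchars(fh, block, len);           /* 3.3: device-agnostic read */
    sysclose(fh);

    ((void (*)(void))block)();          /* 4: call the loaded code */

    sysfree(block);                     /* 5: free what readfile took */
    return 0;
}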

The following screenshot shows the shell being used to run the three external commands:
This screenshot also shows the operation of a new driver for the SPI "controller" within DISCo. It is still bit-banged, with DISCo acting as GPIO pins to the various SPI peripheral ICs. One small improvement I've implemented in DISCo's VHDL is that the SPI select pins are now configured via an address register, instead of individual pins being controlled by register bits written by 6809 code. This means the details of how a peripheral IC is selected is abstracted away in the VHDL SPI address decode logic. The address arrangement is as follows:

SPICLOCK        .equ 2
SPIEEPROM       .equ 3

All other values mean no peripheral IC is selected, with 0xff being the nominal address to use for this state. The peripheral IC address is set in the a register when the SPI device is opened via sysopen. Unlike other drivers with units (like the Quad UART) it is only possible to open a single instance of the SPI driver at a time. This is because the SPI data effectively travels on a shared bus. While SPI peripheral ICs might remember their state when deselected (allowing flipping between devices during multi-byte reads and writes, which could happen if two tasks open different SPI peripheral ICs) I'm not sure if this is the case or not and, for now at least, this restriction simplifies things. The SPI device works well, though regular readers will know that I fully intend to implement an "intelligent" SPI host controller within DISCo at some point.

I have also been working on the debug monitor.

Previously the monitor was a task that sat idle until wanted, at which point it disabled task switching and entered its interactive mode until it was exited. This worked well enough, but it was tied to a particular IO device (ie. a UART port or a virtual console). This made it inflexible.

Now the monitor is entered, with task switching being disabled, by sending the break signal if using a UART port, or by hitting a magic key combination if using a virtual console. This is much more useful as the monitor can be entered regardless of the IO device used.

The implementation of this mechanism is kind of interesting. The OS subroutine sysread has been extended to report error values. Obviously not all devices will return these error states. Errors are indicated by setting Not Zero (ie. Zero means no error). The a register will contain one from a list of possible errors:

IO_ERR_OK       .equ 0                  ; io operation completed ok
IO_ERR_WAIT     .equ 1                  ; task should wait, no data yet
IO_ERR_EOF      .equ 2                  ; end of file reached
IO_ERR_BREAK    .equ 0xff               ; got a break signal

(Prior to this change, sysread returned Not Zero from the UART driver if no data was available, indicating the caller should wait. This has now been extended with these new error states.)

Inside the UART driver, the break condition is detected by examining the Line Status Register. If a break is indicated, then the sysread action returns with the appropriate error. Similarly inside the console driver, IO_ERR_BREAK is set when the keyboard translation routine detects that the break combination has been pressed. Break is represented by a virtual ASCII sequence which is on the Help key when shifted. Thus pressing this key combination results in ASC_BREAK being entered into the buffer, which is then turned into the IO_ERR_BREAK condition within the console driver's sysread subroutine.

This system is not yet flawless. Phantom characters appear to be entered into the buffer when the break sequence is sent over the UART from the terminal program (minicom). Thus the input buffer needs to be cleared with a dummy Return key press before entering a monitor command. It's not clear if this is a coding issue, or if the break sequence itself is generating an additional real character, which is swallowed into the buffer. It's not a big problem, just slightly annoying. Here's a screenshot of the monitor being entered, two commands being run, and then exited:
The "BREAK!!!" message is printed as a reply to the break signal, inside the generic IO error handler in lib/io.asm. This routine deals with generating wait calls if a IO_ERR_WAIT is indicated, as well as dealing with break. This error handler in turn is called by the getchar routine, which is the generic wrapper for sysread. getchar is in turn used by all IO routines in the IO library, including getstr, which is the main routine for getting an input string from the IO device. It's thus possible to bypass the break detection, and implement it directly (or ignore it), by calling sysread directly and dealing with errors. Most user code will not use sysread directly however.

I have also added a debugmemory command to help find problems with the memory allocator. This spits out the list of memory blocks in the system; the addresses of each block as well as the next pointer, length and free flag for each block. This was done for a very practical reason: at one point in the implementation of external commands, my memory allocator had a corruption problem.

As is the case with most bugs, the issue is obvious in hindsight.

As described above, the external command run routine reads the entire file (of raw machine code) into a newly allocated memory block, sized exactly as big as the file. The problem was the file did not include the buffers used for input data. For example, gettime.asm requires a buffer to hold the SPI bytes received from the DS1305 (PDF) Real Time Clock. But this data is not included in the resultant gettime.bin file, since the memory buffer used to hold the SPI data is only "reserved" using the .rmb assembly instruction, which only creates labels unless there is trailing data - in which case real bytes are used up inside the compiled machine-code file. The solution, albeit temporary, is to add a literal 0 byte value to the end of the gettime.bin file, via the .byte assembly pseudo instruction. This causes the file to contain space for the SPI data buffer.

The proper solution to this problem is to introduce a header to the .bin external command files. This header would contain things like what additional space is needed to hold these unset buffers. This is the purpose of the .bss segment found in proper executable and object files. This is data which has no initial value (usually zeroed out), but which nonetheless needs to exist within a running process's address space.

It is not immediately clear what I will be working on next. In any case, real life has gotten in the way; my first priority at the moment is moving house. Once that is done I can turn my attention back to MAXI09. For a change of pace, I am thinking of working on a joystick. I want to build a nice digital joystick out of an arcade stick and some buttons I bought from eBay years ago. As a side project I want to make this joystick work with modern Macs/PCs/Linux using a USB adapter. I also want to see if I can get the 9 pin dot-matrix printer working. I bought it more than 2 years ago and it has so far sat gathering dust.

After that I want to work on a game. It will run within the OS, so should be a nice demonstration of the multitasking capabilities of the system. It is a lofty goal: after proving the ideas by porting my previously written Snake game to MAXI09OS, I am thinking about tackling a much more sophisticated arcade game...

Saturday, 22 April 2017

Code reorganisation, debug monitor, IDE, MinixFS

I have restructured the source code tree for MAXI09OS. Source files are now grouped in directories, with a directory for drivers, another one for the core of the OS, etc. The assembly source files are no longer stuck in a single directory. The makefiles are, by necessity, a little more complex as a result. After toying with using make in a recursive (PDF) fashion, I've instead settled on using include files which enumerate the build files.

One other improvement is that the source for the 256 byte boot loader, described previously, has been folded into the main MAXI09OS repo, along with the flasher tool for transferring new ROM images to the MAXI09 board.

All in all, it's a neater setup even if it is a little over engineered. I learned a bit more about make writing it as well. There's a few further improvements I could make if I wanted to, like a better header dependency system than simply having a full rebuild whenever one of the headers is changed, but since a full rebuild takes only seconds it's probably pointless.

Back in the OS code itself, I have been improving my monitor by folding it into the OS as a kind of debug task. Commands are now words instead of a single letter. So "dump" instead of "d". Much nicer. The "command table" is handled through a generic subroutine, so other environments, like the Shell, can use the same mechanism.

While in use, the monitor task is the only one which will be scheduled. It is entered by hitting return on whatever IO device it is running on, which is usually a UART port. Interrupts are still enabled, but a flag variable is incremented that causes the scheduler, when it runs under the main ticker interrupt, to always return, causing the same task to run each time. In this state the monitor will see a mostly static system. This is the same mechanism (forbid()/permit()) used when a task needs to be the only task running in the system.

I've added a command that shows memory stats, and the tasks in the system, or just the information about a specific task as this screenshot shows:
The "Largest" value from the memory command warrants some elaboration. This is the size of the largest free memory block. Since the simple linked list approach to the memory management lacks a mechanism to coalesce free blocks, it is very prone to memory fragmentation. The Largest value is the size of the biggest free memory block, which might well be considerably less then the total free memory. Actually after coalescing adjacent free blocks, free memory could still be fragmented.

I've also been working on making the monitor useful for exercising device drivers directly, without the need to write a specific test program.

With the sysopen command, it is possible to set the A register (which is usually the unit number) as well as the B register, which is used to set the baud rate in the UART driver but is otherwise not specifically assigned to a particular purpose.

The main motivation for doing this was to make it easier to write a driver for the IDE interface.

The IDE driver is for sector level access to the attached disk; the filesystem layer, described later, sits on top of it.

The same driver mechanism, and subroutines (sysopen, sysread, etc) are used for the IDE driver, except that in the case of sysread additional registers are used since the read is a sector read and not a byte read.

The following registers are used, in both sysread and syswrite:
  • X: the device handle, as usual
  • Y: the memory address to write to (sysread) or read from (syswrite)
  • U: the sector number (LBA) to start from
  • A: the count of sectors to read or write
Currently no interrupts are used so the IO operations busy-wait until the device is ready to send (or receive) the data. There are obstacles in MAXI09OS to doing this which I'll write about later. In reality this would only really matter if MAXI09 was ever attached to a very slow, old hard disk. Whilst a CompactFlash is used the interrupt would fire, most likely, only a few iterations into the polling loop. Such is the speed of the MPU in MAXI09 relative to what a CF would more usually be attached to. All that said, getting interrupts working with the IDE interface would be nice.

I'm also using syscontrol for a few things (more will come later):
  • IDECTRL_IDENTIFY - 0: perform an identify command on the drive and fill out the 512 byte sector of information referenced by the Y register
  • IDECTRL_READ_MBR - 1: read sector 0 into device handle memory and copy partition information into a system table
The partition table, which is a section of 4 lots of 16 bytes within the MBR sector, contains start and length sector information about each partition, as well as other non-pertinent data. The syscontrol action reads this in and uses it as sector offsets when doing IO on a partition. Currently no info about the partition table is returned with the syscontrol call. I will probably change this at some point so the user can view the table etc.
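The on-disk layout being parsed here is standard MBR fare: four 16 byte entries starting at offset 0x1be of sector 0. Expressed as a C structure:

#include <stdint.h>

/* One MBR partition table entry (16 bytes, little-endian fields). */
struct mbr_partition {
    uint8_t  status;         /* 0x80 = bootable, 0x00 = inactive */
    uint8_t  chs_first[3];   /* CHS of first sector (not used here) */
    uint8_t  type;           /* partition type, eg 0x81 for Minix */
    uint8_t  chs_last[3];    /* CHS of last sector (not used here) */
    uint32_t lba_start;      /* first sector: the per-unit offset */
    uint32_t sector_count;   /* partition length in sectors */
};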

Inside the sysopen call, partitions map nicely to units, with unit 0 being used to access the entire disk. The partition table information is used to calculate an offset for each partition / unit. At present the lengths of each partition are not enforced, so accesses for the first partition could overlap into the second, etc. This would be trivial to honour if I wanted to write some extra code.

This screenshot shows the monitor being used to open the IDE device and issue the IDENTITY command. The first 128 bytes of the resultant sector are dumped out, showing the vendor and model:
Here's a look at the monitor being used to:
  1. Open the entire disk
  2. Read the MBR
  3. Close the entire disk
  4. Open partition 1
  5. Read in the Minix superblock (2 sectors) into memory location 0x4000
  6. Dump out the first 128 bytes of the superblock
(The way I worked out that this was a MinixFS superblock was to spot the 0x8f13 sequence at offset 0x10. This is the magic number sequence for a Minix filesystem with 30 character file names, byte swapped since the format on disk is little-endian.)

After implementing the low level IDE layer (and MBR reading) the next task was to write routines for dealing with the filesystem layer.

For anyone interested in the guts of this ancient filesystem I suggest you read this overview, as well as this look at some of the data structures. Needless to say, MinixFS versions 1 and 2 are about the simplest form of UNIX filesystem, all the more interesting (for me and this project) because the important structure fields are 16 bits wide.

The functionality splits nicely into two modules.


The first module wraps a mounted Minix filesystem handle.

It contains code to parse the superblock structure (including lots of little to big endian swaps), and a routine to read an arbitrary 1KB FS block, which calls into the underlying device, ie. the IDE layer, to do the reading at the calculated physical offset. This routine uses a block offset calculated when the filesystem is mounted (of course, the underlying IDE layer will apply its own partition-based offset). The filesystem-layer offset calculation uses fields from the superblock, which include fields indicating the number of blocks used for the inode and data zone bitmaps.
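For reference, the Minix version 1 superblock is a small little-endian structure - its 0x138f magic lives at offset 16, which matches the byte-swapped 0x8f13 spotted in the monitor dump above. A sketch of the structure, along with the kind of offset calculation the bitmap fields enable:

#include <stdint.h>

/* Minix v1 superblock, as found on disk (little-endian). */
struct minix_super_block {
    uint16_t s_ninodes;        /* number of inodes */
    uint16_t s_nzones;         /* number of data zones */
    uint16_t s_imap_blocks;    /* blocks used by the inode bitmap */
    uint16_t s_zmap_blocks;    /* blocks used by the zone bitmap */
    uint16_t s_firstdatazone;  /* first block holding file data */
    uint16_t s_log_zone_size;  /* log2(blocks per zone) */
    uint32_t s_max_size;       /* maximum file size in bytes */
    uint16_t s_magic;          /* 0x138f for 30-character names */
};

/* The inode table sits after the boot block, the superblock and the
   two bitmaps, so block n of the inode table lives at: */
uint16_t inode_table_block(const struct minix_super_block *sb, uint16_t n)
{
    return 2 + sb->s_imap_blocks + sb->s_zmap_blocks + n;
}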

There's also a routine to read in an arbitrary inode. This uses a simple cache; the last disk block of inodes read in. If the requested inode has already been read in then there won't be any disk access. 

An interesting aspect of this module is that it is possible to have multiple mounted file systems in the system at the same time. However to keep things simple the init task is responsible for mounting a system-wide root filesystem.

Also since an open "low level" device handle is the thing which is mounted, in theory the code written supports the adding of other block devices sitting under the filesystem code, say a SCSI disk attached to a controller.

Those good things said, I have not attempted to implement a VFS-type layer in MAXI09OS. Only Minix FS volumes can be mounted and the subroutine is called "mountminix". I did toy with writing more abstractions that would allow, hypothetically, the writing of a FAT16 layer without changing any user level code but in the end I concluded it would add yet more complexity and time and not really teach me anything useful.


This wraps an open "file", and presents the same entry points for an open file as the other drivers in the system, with the addition of a new sysseek call to move the file pointer around the file. It also contains other routines, for things like stat-ing a file by its filename.

The basic "thing which is opened" is a file referenced by its inode number. sysopen is called with X pointing to "minix", Y pointed at a mounted filesystem superblock handle obtained with mountminix, and U with the inode number. As usual the device/file handle is returned in X.

These handles are used for all types of "files". Opening one principally consists of locating and reading in its inode. This inode includes the "data zones" (block numbers) for the entry's data. For directories this data consists of an array of 32 byte directory entries AKA dirents. The structure of a dirent is trivial:
  1. 2 byte inode number
  2. 30 byte (max) null-padded filename
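In C terms, a dirent for this 30 character filename variant is just:

#include <stdint.h>

/* A Minix directory entry (32 bytes on disk). */
struct minix_dirent {
    uint16_t inode;    /* inode number, little-endian; 0 = unused slot */
    char     name[30]; /* filename, null-padded to 30 bytes */
};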
Thus to open a file by its filename, the first thing to do is to open the directory the file is in. We have to start somewhere, and what we start with is the root directory inode, which in Minix is always inode 1. (Inode 0 is reserved.)

The dirent array is iterated through, possibly by reading in additional data zones (filesystem blocks) if the directory contains more entries than would fit in a single filesystem block.

If a match of filename is found, the inode number for the file becomes known and its inode structure can be read from disk. The content - data zones - of the file can then be read in using the data zone pointers.

The following screenshot shows the monitor being used to:
  1. Open the IDE disk.
  2. Read the partition table.
  3. Close the IDE disk.
  4. Open partition 1.
  5. Mount the MinixFS.
  6. Open inode 1, the root directory.
  7. Read in the first few dirents and dump them out.
As well as simple linear reading through a file it is also possible to seek to an arbitrary byte position, and continue reading from there.

One of the massive tasks I've not yet even started is writing to the filesystem. This is a pretty big job and opens up all sorts of interesting scenarios that need dealing with. For instance, how to deal with two tasks, one which is writing to a file, whilst another has it open for reading? Things get very challenging, very fast.

The last thing I have been working on is the command-line Shell. Currently it is very, very basic. You can perform the following operations:
  1. Change directory to a named directory (cd)
  2. List the current directory (list)
  3. List in long form the current directory (list -l)
  4. Output files by name (type)
These are internal commands, effectively subroutines in the ROM. But external commands should be perfectly possible too. Since I can now read from a filesystem, if the user enters the filename for a command that filename will be opened, read into RAM, and jumped to as a subroutine of the Shell process.

The list functionality warrants a little discussion. In a simple listing, it is only necessary to iterate the dirents in a directory. The user cannot even tell whether the entries being listed are files or directories, since those attributes (the file "mode") are stored in the inode and not with the filenames. List in long form opens each file's inode and displays the type of the "file" (regular, directory, pipe, device node, etc), along with its file size, user and group etc. Thus list long, on MAXI09 as on a real UNIX/Linux, is more expensive than simply listing the names of the files.

I've yet to write a routine to output decimal numbers, so all the values for things like file size are still in hex.

The following screenshot shows the Shell being used to wander around the Minix filesystem, listing directories etc:
The Shell is coming along quite nicely. Of course, what's not shown here is that it is possible to have multiple Shells running at the same time, on the virtual consoles and on the UART ports. There's a few limitations not shown in the screenshot, like the fact that you can only change directory into a directory in the current directory, not to an arbitrary path.

So MAXI09OS is coming along quite nicely, though you still can't really "do" anything useful with it yet.

I think I'll take a break from working on the software side for a while now and switch to working on the hardware:
  • There's the SPI controller VHDL code in DISCo. I can then write a driver for it, and finally some utilities for setting and getting the date and time from the Real Time Clock.
  • The Citizen 120D+ printer I bought about a year ago has yet to even be powered on. I could have a go at writing a driver so I can print to it, much like how files can be output to the console. This might also prove that I need to implement command redirection.
  • I could have a go at finishing the building of an analogue joystick I started a year or so ago, and experimenting with that
  • The keyboard controller can generate a reset signal on a particular key combination, but I've not even experimented with getting that working yet
I've had MAXI09 up and running for more then a year now, and it continues to keep on giving with no end in sight...

Thursday, 19 January 2017

More progress.... Lots of different areas

Looking again at the VHDL for my interrupt router, I realised that adding a priority encoder was fairly simple:

Instead of presenting the masked, active interrupt lines directly, ROUTED is now the number (bit position plus one) of the highest-priority active line. 0 is used as a special case: no interrupt lines which are exposed by the mask are active. This should never happen inside the ISR, because when there are no active interrupts the MPU's interrupt lines should stay high, but it is catered for "just in case". The only downside with this method is that interrupt priorities are fixed in the VHDL. In other words the fact that a QUART interrupt overrides (and hides) an RTC interrupt can't be changed by MPU code.

This all means my main interrupt service routine is reduced to something approximating:

As each driver is opened, it will set up its interrupt handler routine pointer in the appropriate table, and set DISCo's interrupt mask bit to allow the incoming interrupt through to the MPU. For example, here is the relevant code in the UART driver:

This is much more efficient than before, when the main ISR had to manipulate mask bits, looping over a list of possible interrupt lines. The top level (IC level) interrupt processing is now nice and fast, and most of the time spent in handling an interrupt is now used dealing with the actual interrupt-generating peripheral IC's registers and not "housekeeping".
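Expressed as a C sketch rather than the actual assembly, the dispatch now amounts to a single table lookup (the register address and table size here are illustrative):

/* Hypothetical C rendering of the top-level ISR dispatch. */
#include <stdint.h>

#define DISCO_ROUTED (*(volatile uint8_t *)0xbf00)  /* illustrative address */

typedef void (*handler_t)(void);
static handler_t handlers[8];      /* one slot per routable interrupt line */

void main_isr(void)
{
    uint8_t line = DISCO_ROUTED;   /* bit position + 1 of the active line */

    if (line != 0)                 /* 0: nothing active, "just in case" */
        handlers[line - 1]();      /* straight into the driver's handler */
}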

I've also been working on the keyboard microcontroller code and have implemented, at last, key repeat and support for caps lock.

Key repeat was easier than I expected. Key repeat in any keyboard generally consists of two, separately configured, delay values:
  • Typematic delay: the pause between a key being pressed and the first repeat press being generated.
  • Typematic rate: the pause between subsequent repeated key press events being generated.

Both values have reasonable default values set in the MCU code. I have also introduced a command code so the 6809 can change these values via the UART channel.

Because of wanting to keep all commands in a single byte, I have had to be a bit frugal. The command byte is arranged as follows:
  • 0b00000000 - red LED off
  • 0b00000001 - red LED on
  • 0b00000010 - green LED off
  • 0b00000011 - green LED on
  • 0b00000100 - blue LED off
  • 0b00000101 - blue LED on
  • 0b00000110 - re-initialize the controller state
  • 0b01xxxxxx - set typematic delay to xxxxxx
  • 0b10xxxxxx - set typematic rate to xxxxxx

The delay values are scaled such that 63 (decimal) is a suitable - slow enough - value. There is currently no acknowledging of the byte sent to the keyboard controller. I will look at doing that when I have figured out transmission interrupts in the main computer.
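Decoding on the MCU side is then just a couple of bit tests. A hedged sketch, with the LED and reinit helpers standing in for the real controller code (which is on github):

#include <stdint.h>

extern void red_led(uint8_t on);
extern void green_led(uint8_t on);
extern void blue_led(uint8_t on);
extern void reinit(void);

static uint8_t typematic_delay = 32;   /* illustrative defaults */
static uint8_t typematic_rate  = 8;

/* Interpret one command byte received from the 6809. */
void handle_command(uint8_t cmd)
{
    switch (cmd & 0xc0) {
    case 0x40: typematic_delay = cmd & 0x3f; break;  /* 0b01xxxxxx */
    case 0x80: typematic_rate  = cmd & 0x3f; break;  /* 0b10xxxxxx */
    default:                                         /* LEDs and reset */
        switch (cmd) {
        case 0x00: red_led(0);   break;
        case 0x01: red_led(1);   break;
        case 0x02: green_led(0); break;
        case 0x03: green_led(1); break;
        case 0x04: blue_led(0);  break;
        case 0x05: blue_led(1);  break;
        case 0x06: reinit();     break;
        }
    }
}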

From the 6809 user-level code point of view, sending a command to the controller is accomplished via a syscontrol call on any open console. However, the delay values impact all consoles. Eventually I will write a system utility to set these values when the system starts up, as well as interactively.

I had a couple of ideas for how to program in the operation of the caps lock key and its LED. Ideally I wanted the LED to be turned on and off by the 6809 sending a control byte to the MCU when the caps lock key was pressed. This would allow the caps lock LED to be used, in a crude way, to determine if the system was responsive since the LED would only toggle if the 6809 was running and able to pick up the keypresses, generating a command to turn the LED on or off in reply.

This would have required a byte to be sent in response to one received, in the console driver's sysread function. The problem with this is that task switching would have to be disabled while the byte was sent because another task might also be wanting to send a byte to the keyboard MCU, for the same or a different reason. This is a clear millisecond at 9600 baud and whilst I could increase the rate to sidestep this problem, I instead decided on a simpler approach.

The caps lock LED is handled entirely in the keyboard MCU code, which tracks whether caps lock is on or not. Communicating caps lock changes with the main 6809 MPU is done by sending a key down scancode byte (high bit clear) when caps lock turns on, and a key up sequence (high bit set) when caps lock is turned off. Cunningly, this mirrors what an ordinary shift key does. This means that the actual event of the caps lock key being released is not sent, but there seems to be no reason why the 6809 would ever need to know about this occurrence.

If anyone is interested, the code for my keyboard controller is available on github as normal. I'm pleased with how relatively simple the controller code has remained with these two bits of extra functionality.

One small "annoyance" that I noticed, after making the initial changes to scancode translation in the console driver: caps lock does not behave exactly like shift in "real" keyboards. Indeed, some early computer keyboards had two "lock" keys, a shift lock and a caps lock. If caps lock is on, numbers should still be available instead of punctuation. This has complicated the 6809 scancode translation a little. Instead of yet another translation table, a call to a toupper subroutine is made to convert a byte, if it's a lowercase letter, to uppercase. This occurs - only if caps lock is on - after the shift and control key checks have selected an alternative scancode translation table.

I've also been busy improving the core OS itself. Tasks now have a default IO device. IO routines - character based like sysread, and the string based wrappers like getstr - have been wrapped in *defio variants which pull the default IO channel from a variable in RAM. For efficiency, this variable is updated from the copy held in the task structure each time a task is scheduled. It is also possible to set this IO device when a task is created.

With this work it is possible to use the same task code in multiple tasks. The "echo test" task, as I now call it, allocates the memory to hold the inputted string and uses the default IO channel to get and put the string. Multiple tasks all use the same code, without making a copy, and operate on two virtual consoles as well as a UART port. In other words this task code is reentrant. This method will eventually be used when the echo test task is replaced with something resembling a Shell and should be a significant memory saver.

One issue to address in most multitasking systems is how a task should exit. This is a surprisingly complex operation for any multitasking OS. I have borrowed some ideas from UNIX (massively simplified) and hold an exit code in the task's structure. The exiting child signals its parent which can, in turn, extract the task structure pointer (now free memory) and exit code, returned in x and a respectively.

I am still not sure if I'll use the above described mechanism for the running of all sub-tasks, since it is quite a bit of setup. After assuming for years AmigaDOS used this kind of mechanism in its CLI, I have since learned that instead external commands are simply read off disk and run as subroutines of the CLI task. The reason is obvious: efficiency. However, this simple technique makes pipelining commands more awkward, since the whole output has to be buffered for the next task (this probably explains why AmigaDOS lacked pipes). I will have to experiment when I get closer to having a useable Shell to determine which approach to use.

Finally, I've been working on improving the debug output which is presented, if enabled at assembly time, on UART port 2 - the TTL header. There are code, and text output, improvements.

Each debug message now has a "section type"; for example messages related to the memory allocator are marked against DEBUG_MEMORY. At assembly time, it is possible to select which debug messages should be generated for that particular build. So if I have a problem with the interrupt processing I can choose to only output those messages, etc. Debug messages were previously always included in the generated binary; now they are only included if debug mode is enabled.

The messages also look neater because each one is prefixed by a section label. To print out that task address with the task name, the following code is used:

debugxtaskname ^'Scheduling task: ',DEBUG_TASK

This required some new uses of macros in ASxxxx. The macro has to allocate space for the message to print in the generated code, but it can't go directly into the stream of code since otherwise the MPU would try to run the message as if it were code. Instead a new "area" was created, called DEBUGMSG. This is towards the end of ROM and contains all the debug messages. This particular message only shows if DEBUG_TASK is enabled for the build.

Typical output from the debug serial line looks like the following:
All told, I'm pretty pleased with how my weird little OS is coming along. There's still lots of interesting things to work on:
  • I'm keen to try organising the code a little better. There are now about 30 files, and it would be nice to have a directory layout with drivers in their own directory etc.
  • More drivers: joystick, printer port - I have yet to try hooking my Citizen 120D+ printer up. The SPI bus needs an interface too.
  • Improvements to the console driver: I could look at some of the VT100 codes, and try to make the serial terminal more useable with them
  • Transmission interrupts on the UART. I still have yet to crack them.
  • Extend my, very preliminary, debug monitor. I have started "porting" my old monitor to the MAXI09OS so I can use it for debugging by dumping out things like task structures, the allocated memory blocks etc.
I could also have a crack at extending the VHDL in MuDdy and DISCo:
  • The IDE port needs a high byte latch implemented in DISCo
  • DISCo also needs a proper SPI controller, instead of the basic SPI pin interface which is currently written
Or I could have a think about how I will implement a storage interface in the OS. I have a few ideas, but more thinking and planning is needed...

Saturday, 15 October 2016

Console driver and other OS progress

The last few months' free time has been spent knee deep in the Operating System for my 8 bit micro. The main area of attack has been on implementing a console (screen and keyboard) driver, but I've been working on a few other things as well.

Up until recently I have not tried linking the code that makes up the monitor, or MAXI09OS. Instead the assembler source code files are concatenated together via include statements. While this works, it means all labels and other symbols are global across the code. It's generally not nice structuring things this way, especially as the project grows to be quite large.

MAXI09OS is now assembled into object files and linked using aslink, a component of asxxxx. This means that only symbols that need to be global are marked as such, and other named symbols can be file-local. In all, it is a much tidier arrangement. There are still some further improvements I'd like to make to how MAXI09OS is built, especially around the idea of producing header (include) files which the "user" programs will need. I also want to structure the source code tree so that drivers have their own subdirectory etc.

I have made a fair amount of progress on the console driver. It's far from the VT102-emulated terminal I have in mind for the "finished" version, but it is useable.

The first thing I did was split out the low level hardware aspects of the UART driver into a new source file. The reason for doing this is because the console driver shares much of the UART code for interfacing with the keyboard controller. Once this was done the UART driver itself was modified to use the low level functions and re-tested. As a consequence of the keyboard controller using the slower 9600 baud rate, the low level UART code also contains a routine for setting the rate. This is also useable by the higher level UART driver's open routine through the b register.

The first thing to get working, within the console driver, was keyboard input. This operates in a similar manner to the UART driver's reading operation except that within the interrupt handler the scan codes, obtained from the keyboard MCU, are fed through a translation routine to turn them into ASCII. This routine also deals with shift key and control key handling. As before, no attempt is currently made to deal with key repeat. The resultant ASCII characters are fed into the user side circular buffer, to be picked up by the sysread subroutine when the task is woken up by a signal.

One additional complication revolves around "virtual" console support. Like with UART ports, each console is a "unit". But obviously the input side (keyboard) and output side (screen) are shared. Since they require very little MPU resources, and only consume the relatively bountiful video memory in the V9958 VDC, there are six virtual consoles. In the keyboard interrupt handler the function keys F1 to F6 are checked and if one is pushed the appropriate console is displayed by writing to the "pattern name" base register. Video memory is arranged as follows:
  • 0x0000 - patterns AKA fonts
  • 0x8000 - console unit 0
  • 0x9000 - console unit 1
This continues up to console unit 5. Note a few details: I am only using the lowest 64KB of video RAM out of a possible 192KB, and although the consoles only consume 24 rows of 80 columns (1,920 bytes), it is not possible to pack them closer together because of limitations in the pattern name base register. Anyway, this all means that console switching is instantaneous, something I'm quite pleased with.
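
The switch itself is tiny. A sketch, assuming the usual V99x8 write sequence to a control port (VDCCTRL is a placeholder address) and TEXT2's rule that the low two bits of register 2 stay set; both details are assumptions here:

        switchcon:
                ; entry: a = console unit to display (0 to 5)
                ldb #4              ; units are 0x1000 apart = 4 steps of 0x400
                mul                 ; d = unit * 4; result fits in b
                addb #0x20          ; unit 0 lives at 0x8000, ie. 0x8000 / 0x400
                orb #0x03           ; TEXT2 keeps the low register bits set (assumed)
                stb VDCCTRL         ; V99x8 register write: data byte first...
                lda #0x82           ; ...then register number 2, with bit 7 set
                sta VDCCTRL
                rts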

Actually getting the console to display something was accomplished by "porting" the previously written V9958 monitor VDC code into the console's write function. The code has been tidied up quite a lot but still only supports the very basic control codes: carriage return, new line, backspace, tab, and formfeed - which, per the norm, clears the console. I have also implemented vertical scrolling.

One quite serious restriction exists. Since the VDC manages its video memory via address pointer registers, it is necessary to prevent any other task from manipulating the VDC's registers mid-operation. The first, brute force, approach was to disable interrupts while the VDC is being used, to stop other tasks from being scheduled. This certainly worked, but it prevents all interrupts from being serviced, even interrupts that have nothing to do with a task using the VDC.

The solution I'm currently using is to leave interrupts enabled, but disable task switching. This is accomplished by having the scheduler return into the current task if task switching is disabled.
This solution remains suboptimal: tasks which are not using the VDC may be waiting to run, and they are needlessly blocked. The better way is to use a mutex to pause tasks that want access to the VDC once one task has obtained it; other tasks would not be affected. I might try implementing some mutex routines, though I am concerned about making them efficient enough.
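
In outline the guard is just a pair of calls around the VDC access; forbid and permit here are hypothetical names for routines that bump a counter the scheduler checks:

                jsr forbid          ; scheduler now returns into this task
                ; ... program the VDC address pointers and move the data ...
                jsr permit          ; normal task switching resumes

        forbid: inc noswitch        ; a counter, so the calls can nest
                rts
        permit: dec noswitch
                rts

        ; the timer ISR tests noswitch and, when it is non-zero, simply
        ; resumes the interrupted task instead of switching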

Putting this console driver with the work described previously, I now have a test setup for my little OS which includes the following test tasks, each running on their own virtual console:
  1. Firstly, the timer-based task as previously described. It opens the console and a timer, starting a repeating one second timer, until the space bar is pressed, at which point a 5 second non-repeating timer starts. It also shows the hex values for what is read from the console driver, ie. the ASCII values for key down events.
  2. Two copies of the "enter a string, print the string" looping task.
  3. A serial terminal. This ties together a virtual console and a UART port. Both are opened and then a loop is entered. In the loop both devices are waited on. The device that caused the task to wake is read, and the other device is written to. UART port 1 is used here.
  4. I also have a task which performs the "enter a string, print the string" operation on UART port 0.
In general, things are looking pretty good. The timer task updates its console even while another task's console is displayed, and the UART tasks carry on at the same time.

When it comes to the serial terminal task, there are a few issues, beyond the fact that the serial terminal is only honouring the very simplest control codes.

The big issue is with scrolling. The V9958 is a great video controller but it has no special hardware support for scrolling the screen in text mode. Instead the video RAM data has to be read into MPU RAM and then written out again, shifted up a line. For most uses of the console this is ok; the task just has some extra work to do to scroll the display. But for the serial terminal task this extra work needs to be done before the serial port's FIFO fills up.

This problem is, perhaps, unfixable without hardware assistance. The problem lies in the time it takes the MPU to do the line scroll versus the time taken for the UART to receive a line of text. At 9600 baud each character takes 1 / (9600 / 10) seconds, so a full 80 character line takes roughly 83ms. I have cycle counted the MPU scroll code and it takes - ignoring setup code - 43ms. This means everything is fine if the lines are long, but if a short (say 8 character) line is received there is only about 8.3ms to scroll the screen, which isn't long enough. Even with the UART FIFO and MPU circular buffer, data loss will eventually occur as short text lines are received.

The "solution" I have come up with is to scroll the screen in half-screen chunks instead of every line. This reduces both the frequency of scrolling that's required, and it also reduces the work required to do the scrolling since more time is spent filling the video memory with zeros instead of writing out previously read in data. The result of this is not very pretty - using the console feels a bit strange - but it mostly cures the data loss problem. Only mostly though; on very short lines - the yes command is a nice way to show the problem - occasionally characters are still lost.

There is, perhaps, a better solution involving the UART's hardware flow control lines. Hardware flow control can be used to pause data transmission when the receiver is not ready to receive it. In theory this could be used to pause the transfer while the MPU is scrolling the display. Unfortunately, despite playing with this for a few hours, I cannot get it to work.

So far I'm very happy with my OS's progress. The scrolling problem is annoying but I'm going to move on - serial terminal support was always going to be used as a way to test the driver model instead of being anything actually "practical". The serial terminal is a nice way to improve, and learn about, terminal emulation though, when I get around to that.

There's a few different things I could work on next:
  • The previously mentioned VT102 improvements.
  • Some ability to measure how busy the system is would be nice. Currently I use the idle LED to get a rough idea for how idle the system is, but it would be nice to know which tasks are using the MPU the most.
  • Some means for open device inheritance when a task creates another task. Currently each task is responsible for opening a UART port, or a console. It would be better if the parent task did it, as this would mean the same task code could be used regardless of the IO device. It would then even be possible to have multiple instances of the same task's code running, provided the task either used only the stack or kept its variables in allocated memory, with the pointer to that memory stored on the stack (or perhaps in the u register)
  • Another massive chunk of the Operating System revolves around providing disk access to tasks. I have thus far given this problem very little thought.
I feel what I have now is at a nice point to go back and fill in some smallish missing parts before I start thinking about the disk routines, which will be a large chunk of work. Some smaller areas to tackle include:
  • Tidying up the source tree into different subsystem directories.
  • Improving the keyboard behaviour. Key repeat is something I've put off doing for months. And then there's support for the caps lock key.
  • Reworking the debug console to make the output cleaner and easier to read.
  • UART transmit interrupts.
  • Improving interrupt processing speed. This also involves some work on DISCo's VHDL. I have a few ideas for how to simplify, and thus speed up, interrupt processing.
  • More device drivers, for example a joystick driver would be interesting to do. It would need polled and event (changes) modes.
I've also made a short video showing the test tasks, including the serial terminal:

So many ideas, but so little time...

Friday, 17 June 2016

Prototyping parts of an OS for MAXI09

The past few months have been spent thinking about, and prototyping, ideas for the Operating System I want to write for the computer. But before talking about that, a summary of the other little pieces of hardware progress. Anyone who has viewed the video will be familiar with most of this.

First up, SPI. I have implemented a trivial SPI controller for DISCo. It is really nothing more than an MPU interface to the SPI pins: SCLK, MISO and MOSI, as well as the Chip Selects SS0 through SS3. Bit shifting is done in the MPU so it is far from a proper controller, but it is adequate to verify that the SPI peripherals have been wired up correctly. So far I have verified the function of the DS1305 (PDF) Real Time Clock, and the CAT25256 (PDF) 32KByte EEPROM. When I have less interesting things to do I will implement a proper SPI controller in VHDL, which will move the bit shifting from MPU code into logic in DISCo and speed up SPI data transfers immensely.
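
The bit shifting itself only takes a dozen or so instructions on the 6809. A sketch of one byte transferred MSB first, assuming an output register SPIOUT (bit 0 = SCLK, bit 1 = MOSI) and an input register SPIIN (bit 0 = MISO); the real DISCo register layout may well differ:

        SCLK = 0x01             ; assumed bit positions
        MOSI = 0x02
        MISO = 0x01

        spibyte:
                ; entry: a = byte to send; exit: a = byte received
                pshs b
                ldb #8              ; eight bits, MSB first
        sbit:   lsla                ; TX bit into carry; LSB freed for RX
                pshs a              ; save the shift register
                lda #0
                bcc drive
                lda #MOSI           ; mirror the TX bit onto MOSI
        drive:  sta SPIOUT          ; present data with SCLK low
                ora #SCLK
                sta SPIOUT          ; rising edge: the slave samples MOSI
                lda SPIIN
                anda #MISO          ; sample MISO
                ora ,s              ; merge the RX bit into the shift register
                sta ,s
                clr SPIOUT          ; falling edge completes the bit
                puls a
                decb
                bne sbit
                puls b
                rts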

The behaviour of the OPL2 sound interface has also been tested more thoroughly than before. I did this by writing a 6809 routine to read in and play back DOSBox DRO format files, which had previously been written out by my laptop. I was surprised by the quality of the playback from the OPL2 on the MAXI09 board: it is indistinguishable from the DOSBox playback. But when recording it for YouTube the sound capture was poor, so I thought I would try again to do it justice. The following recording was made using a USB microphone:

So, to Operating Systems. Here is a brief summary of what I think should be achievable on my MAXI09 micro:

User's point of view
  • Multiple, at least 4, VDC+keyboard text consoles switchable via function keys
  • Serial console active concurrently
  • "DOS" like command shell: List directory, copy file, type file, delete file, create directory, run external command, etc
  • Date command; set and get time from the RTC
  • Possibly very simple batch files
  • If I really went crazy, I could think about writing an editor and assembler
  • The ability to act as a serial terminal for a Linux box, with basic VT102 terminal emulation
  • Fall back machine code monitor, entered via special key combination, based on already written one but with improvements such as command names instead of letters
Low level
  • Pre-emptive multitasking with at least 8 tasks
  • Signal based events and messages
  • No direct hardware access for tasks, with the exception of the VDC (necessary for performance reasons)
  • Byte-wise driver abstraction layer for at least: VDC text mode, keyboard, UARTs, parallel port, SPI, joysticks
  • Block abstraction layer for IDE
  • "Special" drivers for OPL2
  • Timer driver for repeat and non repeat delays, with an interface into the RTC via SPI abstraction
  • Polled or interrupt/signal IO
  • MinixFS support, read and write; likely whole files not proper byte-streams
  • Dynamic memory allocation; used by the system and user tasks
  • Runtime loadable modules for drivers and libraries
This feels very ambitious, but so was the MAXI09 hardware. I have no intention of writing an "8 bit UNIX"; others have already done this. Some elements, eg. the multiple console idea, are borrowed from Linux. But the bigger influence is my usage of AmigaOS with its light-weight multitasking.

I have a few bits of hardware in the MAXI09 to help with things which, on the surface, seem complex. The V9958 should make multiple consoles fairly trivial to do because the screen text for all the consoles can be held in video RAM at all times. Switching consoles should largely be a matter of writing to a couple of registers.

Lots of questions remain. One of the interesting ones is what the interface to the Operating System calls should look like. The "pure" approach is to use software interrupts. This is the preferred way because the service routine is isolated from the caller with a new stack frame, and hardware interrupts are automatically disabled within the service routine. It is also more robust in the face of future change than a jump table, because the number of the service routine to execute can be expressed as a simple register value. Nonetheless software interrupts carry a significant performance penalty compared to a simple jump table. For now I have not dealt with this problem; system calls are just like any other subroutine call.
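
For illustration only, since this isn't what MAXI09OS does yet: a software interrupt dispatcher can be very small. The service numbers, the table, and the use of the b register are all hypothetical here:

                ; caller side
                ldb #SYS_WRITE      ; hypothetical service number
                swi                 ; trap into the OS

                ; handler: the caller's registers are already stacked and
                ; further interrupts are masked; b still holds the number
        swihandler:
                aslb                ; two bytes per table entry
                ldx #systable
                jsr [b,x]           ; call the service routine
                rti                 ; restore the caller's state

        systable:
                .word syswrite      ; hypothetical table ordering
                .word sysread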

Another big question mark hangs over my ideas for using MuDdy to implement an MMU. This will be useful for isolating tasks and should enable a single task to access the full 64KByte memory space. But for now I will ignore this idea.

I have begun by prototyping up some of the facilities which the OS (unimaginatively named MAXI09OS) will need: a dynamic memory system, linked lists, task switching, and the driver model. I've also written some test tasks to help me verify that everything is working.

Dynamic memory management is required principally because the number and size of the tasks, and other objects, in the system are variable. Tasks themselves will need to allocate memory as they need it, and the memory required to hold the code for a task itself will need to be allocated, by the OS, from the free pool. Without a multitasking environment most 8 bit systems get by without a dynamic memory manager.

My prototype is simple. A memory block has the following form:
  1. Pointer to the next block, or 0 for the end of the list
  2. The size of the block (including this header)
  3. A byte for flags, which is currently used to indicate whether the block is free or not
  4. The data block bytes themselves
In this trivial design these blocks are arranged across the dynamic memory area, known as the heap, which is initialised as a single block marked as being free. Allocation of a portion of memory is achieved by scanning the list of memory blocks until a large enough free one is found. If one is found, the block is split in two: the first half is the claimed space, and the second half is the remainder of the original free block. The next pointers and the block sizes need to be adjusted appropriately to keep the list of blocks intact.
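
A sketch of the scan and split follows. This is not the actual MAXI09OS code: the header offsets, the free flag encoding and the heaphead variable are assumptions, and the exact fit and tiny remainder cases are glossed over:

        HDRSIZE = 5             ; next (2) + size (2) + flags (1)
        FFREE = 0x01            ; flags bit: block is free

        memalloc:
                ; entry: d = block size wanted, including the header
                ; exit:  x -> data bytes, or zero if nothing fits
                ldx heaphead        ; first block on the heap
        tryblk: tst 4,x             ; free?
                beq next            ; 0 = in use
                cmpd 2,x
                bls found           ; wanted <= this block's size
        next:   ldx ,x              ; follow the next pointer
                bne tryblk
                rts                 ; x = 0: no fit found

        found:  pshs d              ; remember the requested size
                leay d,x            ; y -> remainder after the split
                ldu ,x
                stu ,y              ; remainder inherits our next pointer
                ldd 2,x
                subd ,s
                std 2,y             ; remainder size = old size - wanted
                lda #FFREE
                sta 4,y             ; remainder stays free
                sty ,x              ; claimed block now links to remainder
                puls d
                std 2,x             ; claimed block shrinks to the request
                clr 4,x             ; and is marked in use
                leax HDRSIZE,x      ; hand back a pointer to the data bytes
                rts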

As an illustration of how this works, here is what the heap looks like after a sequence of operations is performed. The size of the heap is 100 bytes.

For ease of illustration all values are in decimal and the heap begins at location 0. First the freshly initialised heap:

Next, 10 bytes are allocated:

Only 5 bytes are useable, since the 5 byte header is included in the block size. The 10 byte block is freed, and a 50 byte block is allocated:

Note the fragmentation, and the fact that the 50 byte block does not use any space from the freed up 10 byte block. This can be cured by coalescing adjacent free blocks, something I will get round to doing soon.
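
Written out as a table, with addresses in decimal and sizes including the 5 byte header, the three states are:

        addr  next  size  state
        -- freshly initialised --
        0     0     100   free
        -- after the 10 byte allocation --
        0     10    10    in use (5 data bytes)
        10    0     90    free
        -- after the free and the 50 byte allocation --
        0     10    10    free (the fragment)
        10    60    50    in use
        60    0     40    free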

Anyway despite, or perhaps because of, its small code size - only about 80 lines of 6809 assembly - my dynamic memory allocator seems to work quite well. To actually get this implemented I first wrote the routines, then constructed test routines around them using the existing monitor program.

Next, linked lists. The requirements for the structure of memory areas in the heap are limited; the list will only ever need to be scanned one way. Therefore singly linked lists are appropriate: there is no benefit in maintaining links backwards through the chain. But for other OS “objects” this is not good enough - backwards scanning will be needed, so doubly linked lists would be beneficial.

Being familiar with AmigaOS I have decided to implement the same optimisation as used in AmigaOS's Exec linked lists: a hidden node in the list header. This avoids some special casing in list operations and generally makes for more efficient operation than a traditional list, which uses nulls in the header to indicate an empty list.

The following list operations are available:
  • Initialise the list (since it is not possible to simply fill the header with nulls) (initlist)
  • Append a node to the tail (addtail)
  • Prepend a node to the head (addhead)
  • Remove the node from the tail, returning it in a register (remtail)
  • Remove the node from the head, returning it in a register (remhead)
  • Remove a node from anywhere in the list (remove)
The lists are similar to "minimal lists" in AmigaOS terms. There is not, yet, a priority field and thus no means to insert a node at its priority position. Nor is it possible to insert a node after a given node. None the less, the operations available should be sufficient for my little OS. For anyone interested in this code, you can see it here.
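
The trick is worth a sketch. Assuming a 6 byte header (head, tail, tailpred) and nodes with a next pointer at offset 0 and a previous pointer at offset 2, which may not match the real field layout:

        initlist:
                ; x -> 6 byte list header
                leay 2,x            ; address of the overlapping "tail" field
                sty ,x              ; head = &tail
                clr 2,x
                clr 3,x             ; tail is always zero
                stx 4,x             ; tailpred = &head
                rts

        addtail:
                ; x -> list header, y -> node to append
                leau 2,x
                stu ,y              ; node.next = &header.tail
                ldu 4,x
                stu 2,y             ; node.prev = old tailpred
                sty ,u              ; old tailpred.next = node
                sty 4,x             ; header.tailpred = node
                rts

Because the header's tail field doubles as the terminator for both directions, addtail needs no empty list check: on an empty list tailpred points at the header itself, so the sty ,u line simply fills in the head field.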

I have also made a start on a device abstraction layer. In the case of the UART driver this includes interrupts for reception.

A note on the terminology I'm trying to stick to. A driver is a set of routines for dealing with a type of device. Devices are opened for a driver type, perhaps with additional parameters passed in for things like the port to open.

Abstraction layers appear even in very simple operating environments. They are useful because they allow user programs to be written without regard for how input and output reach them. For now, at least, the abstraction will be of an IO device, and not of a file on disk. It's not yet clear how I will implement file IO. Another advantage of a driver abstraction is that the same generic routines can be used regardless of the underlying hardware.

The starting point for this is an "open device handle". This is an object handle which can be passed to a "write" system function to output a byte on a particular device, be it the screen or a particular UART channel. Its nature as a handle means that the user task need not be concerned with what it contains.

Opening a device first requires a driver object. The structure representing a driver has the following properties:
  1. Pointer to the driver prepare routine
  2. Pointer to the device open subroutine
  3. A device name (up to 8 characters plus a null)
The driver prepare routine allows global initialisation of all possible devices of that type to occur. These routines are run before the OS itself is fully running. This usually includes setting up the device list, but it might also include setting up interrupt routing, if that is better done prior to a device being opened.

Drivers are currently held in a table in ROM. A routine, sysopen, takes a device name (null terminated) in the x register, and any other applicable arguments in other registers. For instance, with the UART driver the channel number (0 for PORT A, etc) is held in the a register. Once sysopen has located the address of the particular driver's open routine, that routine is then run. This performs the necessary functions for the driver. For instance, in the UART case it ensures the channel is not already in use, allocates memory for the "open device" handle, and then initialises the hardware in the UART IC. Finally it configures interrupts in the interrupt routing register inside DISCo. There will be more about interrupts later.
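
A call then looks something like the following; the register the handle comes back in is my assumption:

                ldx #uartname       ; device name, null terminated
                lda #0              ; channel 0, ie. PORT A
                jsr sysopen         ; on success, returns the device handle
                                    ; (assumed here to come back in x)

        uartname:
                .asciz "uart"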

Upon a successful call to the "open device" subroutine, a device handle is returned. This handle is driver dependent but contains at least:
  1. A list node
  2. A pointer back to the driver table entry of which this device is an instance
  3. A pointer to the device's close routine
  4. Same for the read routine
  5. And the write routine
  6. A pointer to the generic "control" routine
The reason the close, read, write and control routine pointers are held in the open device structure is speed: since write, for example, takes the device structure pointer as an argument, if the write routine pointer were in the driver structure an additional indirection would be needed to locate it on each syswrite call. Nonetheless I may re-evaluate this decision.
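
Concretely, with hypothetical offset names, the saving is one load on every call:

                ; write pointer held in the open device structure (x):
                jsr [DEVWRITE,x]    ; a single indirect call
                ; write pointer held in the driver structure instead:
                ldy DEVDRIVER,x     ; extra load to reach the driver entry
                jsr [DRVWRITE,y]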

The purpose of the control routine is to provide a mechanism to perform arbitrary operations on the device which are not data reads or writes. For example, the timer driver uses this to start and stop a timer. The UART driver will use this, some day, for allowing the sending of a break signal, changing baud rates etc.

Open devices are held in a per driver-type list, the purpose of which is to facilitate interrupt processing.

In the case of the UART driver, a whole load of additional information is needed:
  1. RX circular buffer (32 bytes)
  2. TX circular buffer (32 bytes, currently unused)
  3. ISR and "user" counters for the two buffers
  4. A pointer to the task that opened the UART channel (this is used for task switching)
  5. The allocated signal mask (signals are explained later)
  6. The base address for the hardware IO port for this port
  7. The port number
There's also some global state which is not part of the open device structure. This includes things like whether a port is in use. In theory this is redundant, because the port number is in the device record and the device record is pushed onto a list, but the list node was added relatively recently.

At the moment writes are not interrupt based, but reads are. In fact, on a relatively slow machine like the MAXI09, interrupt-driven writes for the serial port are not all that useful, since even with the action of the UART FIFO, the UART would need its write-based ISR servicing rather frequently, unless the baud rate was quite slow. Nonetheless I will implement interrupt-driven writes at some point. Interrupt based reads are, on the other hand, very useful, since the task handling the UART channel can be put to sleep whilst it waits for data, most likely from the remote end's keyboard. This allows the computer to run other tasks while it waits for UART activity, without the performance hit you'd see polling the port. This leads us on nicely to the next topic.

Multitasking. This is a massive area. Modern OSes on PCs are huge beasties and, ignoring fluff like the User Interface, much of this complexity revolves around the multitasking environment presented to the user. But multitasking is possible on nearly anything that might be classed as a computer, from microcontrollers up. I have decided to attempt to implement a fairly simple, but effective, form of multitasking; it is loosely based, once again, on AmigaOS ideas, though obviously much simpler since it has to work on the 8 bit 6809. Whilst it is preemptive - tasks and interrupt handlers can force other tasks to be scheduled - there is no priority value, so all tasks, except the special case idle task, are treated the same. The first thing to do was figure out how task switching should actually work from an MPU standpoint.

On a simple machine like the 6809, without any way to implement memory protection, task switching can be achieved by setting up a periodic interrupt and having the Stack Pointer adjusted so that when the RTI (Return from Interrupt) is executed at the bottom of the interrupt handler, a different path of execution is taken than the one the system was on when the interrupt was entered. If all of the machine's registers are also saved via the stack, then the system should behave just as if the task being switched into had never been paused while another task ran.

This setup requires some kind of record to hold basic information on each task, including the stack pointer at the time the task was last running, as well as space set aside for the task's stack. The memory allocation system, just described, is used to create space for these structures, which currently look like this:
  1. List node
  2. Saved Stack Pointer
  3. Initial Program Counter (this is only used to identify tasks when debugging, in lieu of a task name field)
Preceding this record is the task's stack, currently fixed at 100 bytes. Eventually tasks will get names, and possibly an ID number.

The location of the first byte in a task structure is used as a handle to a task. Creating a task consists of the following steps. Currently the only parameter to this subroutine is the initial Program Counter, ie. the start of the task's code.
  1. Allocate 100 + the size of the task record bytes
  2. Calculate what the SP will need to be at task start
  3. Write a value into the stack for the Condition Code register which has the E (entire register set) bit set
  4. Write the initial PC into the stack
  5. Write the initial PC and SP into the task record
The stack for a full register set, as generated by an active signal on the IRQ line, looks like this:

From the 6809 Programming Manual

Thus the SP for a task, prior to starting it with an RTI, points to the address of the first register stacked (the Condition Code register). Upon the RTI instruction the machine state, including the Program Counter, will be pulled off and the task will be started. The MPU pulls off the entire machine state because the E bit in the Condition Code register is set.
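
Putting the steps together, creation looks roughly like this sketch; the record offsets, the stack frame offsets and the memalloc convention (as in the allocator sketch earlier) are assumptions:

        STACKSZ = 100
        TASKSIZE = 8            ; node (4) + saved SP (2) + initial PC (2)
        CC_E = 0x80             ; E bit: entire register set stacked

        createtask:
                ; entry: y = start address of the new task's code
                pshs y
                ldd #STACKSZ+TASKSIZE+HDRSIZE
                jsr memalloc        ; x -> stack space, record above it
                puls y
                leax STACKSZ,x      ; x -> task record; stack grows down
                sty 6,x             ; initial PC, kept as a debugging aid
                leau -12,x          ; carve out a full 12 byte IRQ frame
                lda #CC_E
                sta ,u              ; stacked CC: RTI will restore everything
                sty 10,u            ; stacked PC = the task's entry point
                stu 4,x             ; saved SP, ready for the scheduler
                ; finally addtail the record onto the ready queue
                rts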

The timer interrupt handler, which is where task scheduling usually occurs, is roughly described by the following steps:
  1. First the current task pointer is retrieved from the global used to hold it
  2. The Stack Pointer register is saved into the current task structure
  3. If the ready queue is empty then no tasks are in their runnable state, so schedule the idle task and proceed to step 5
  4. Otherwise, rotate the ready queue; the previous head ready task becomes the tail task
  5. In any case, update the current task pointer and pull off that task's stack pointer
Now we can RTI to run that task, which might be the idle task, or the same task as when this ISR was entered, or a completely different task.
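
A sketch of those steps, assuming remhead returns the node in y (zero, with the Z flag set, for an empty list) and preserves x:

                ; reached via a jump at the end of the timer ISR, with the
                ; outgoing task's registers already stacked
        schedule:
                ldx currenttask
                sts 4,x             ; save the outgoing stack pointer
                ldx #readylist
                jsr remhead         ; take the head task, if any
                beq useidle         ; empty ready queue: nobody is runnable
                jsr addtail         ; rotate: old head becomes the tail
                ldy readylist       ; the new head is the task to run
        run:    sty currenttask
                lds 4,y             ; adopt the incoming stack pointer
                rti                 ; resume it as if nothing happened
        useidle:
                ldy #idletask
                bra run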

One obvious optimisation to this sequence is to skip the ready list (queue) if there is only a single task in the ready set - if that's the case we can just leave the ready list alone.

Tasks can be in one of two lists: the ready list or the waiting list. Currently the running task is also in the ready queue; this simplifies some of the logic. Tasks are normally in the ready queue unless they are blocked waiting for an event; a signal. Signals are generated either by another task, or by an interrupt handler. Signal semantics are loosely copied from AmigaOS signals:
  • One difference is that only 8 signals are available instead of 32
    • At a future time I will utilise one special bit for forced task termination
  • Tasks, and drivers, obtain a signal (actually a mask with a single bit set) via allocsignal
  • Waiting, via wait, takes a bit mask. If any signal bit becomes set the task wakes up. Also, if a signal is pending before wait is called then the bit is cleared and the task skips the wait. In any case the bits that woke the task (or were already pending) are returned
  • Signalling, via signal, takes a task structure pointer and a set of bits. There's also a variant of this routine, intsignal, for use by interrupt handlers
So far the only use of signals is so that tasks can sleep (block) while they wait for serial input. This is really nice as it means that tasks that are just sat there waiting for button presses don't chew up MPU time. A further usage of signals is so that tasks can be informed of events, such as tasks being told to exit - eg. by a keystroke like Ctrl-C. AmigaOS built its messaging system (pending messages attached to either an anonymous or named list) with signals being the main mechanism used to inform the recipient of a new message. This was used for everything from GUI events to a high level application automation mechanism - via Arexx. I'm not yet sure if I'll need a message system for MAXI09OS. It would be trivial to implement on top of the pieces I have; I'm just unsure if I will need such a system yet. In addition, a concern I have about this kind of system is that each message will require a memory allocation, which will invariably lead to problems with memory fragmentation.

Actual interrupt handling is an involved process. I must say I am not especially happy with the sheer number of steps required to process, say, a UART interrupt. DISCo's interrupt routing makes things marginally more efficient, but the very rough order of processing is still something like:
  1. Read the interrupt status register from the DISCo FPGA
  2. Loop through the claimed interrupt signals, in priority order
  3. Jump to the handler when a match is found
  4. In the device handler (UART in this case) loop through the active channel list
  5. From the device pointer, obtain the start address for the UART channel
  6. Get the UART port IRQ status (on the 6809 this can be done with: lda ISR16C654,x)
  7. Test it to check the event type
  8. For reads, jump to the UART read handler, passing the device handle and base address for the channel
  9. Get the line state to see if there are bytes in the FIFO (lda LSR16C654,x)
  10. Loop testing for RX data, writing it into the CPU's 32 byte circular buffer
  11. When the buffer is full, or there is no more data, exit that loop
  12. Schedule the task which owns the port by moving it to the top of the ready list
  13. Enter the scheduler, which will save the current task's state (which probably isn't the task that owns the UART channel)
  14. Make the top of the ready list task the active one by setting the Stack Pointer up for it
  15. Finally we can RTI
In the case of the UART the FIFO helps a lot. Even with it I have still had to reduce the console baud rate to 19,200 for fear of losing data. A viable alternative to lowering the data rate would be to introduce hardware flow control, but I'm not that keen on that. I'm sure further optimisations in interrupt processing are possible anyway.
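
Steps 9 to 11 are the interesting inner loop. A sketch, in which the RHR16C654 register name, the RXRDY bit and the buffer offsets within the device structure are assumptions:

        RXRDY = 0x01            ; LSR bit 0: at least one byte in the RX FIFO

        uartrx:
                ; entry: y -> open device structure, x -> UART channel base
        drain:  lda LSR16C654,x
                bita #RXRDY
                beq done            ; FIFO empty: stop
                lda RHR16C654,x     ; pull one byte from the FIFO
                ldb RXWR,y          ; ISR-side write index
                leau RXBUF,y
                sta b,u             ; store it in the 32 byte circular buffer
                incb
                andb #31            ; wrap the index
                stb RXWR,y          ; note: no overflow check against the
                bra drain           ; read index, hence the data loss risk
        done:   rts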

I have implemented two additional drivers just to prove the model: one for the LED attached to DISCo, and one for timer events.

The LED driver is just pointless fun. Ordinarily I don't want tasks banging on hardware directly; eventually an MMU implementation should prevent it. To show how hardware would always be accessed through a driver layer I have written a driver for the simplest hardware on the board: an LED. Writing a 1 turns the LED on; you can probably guess what writing a 0 does. To show the utility of a driver even for this trivial hardware, only one task can open the LED at a time. And the task which owns the LED is currently the idle task, which strobes the LED as it runs so you can tell how busy (or idle) the system is.

The timer driver is more useful. The system itself uses a 40Hz interrupt to schedule tasks, and this driver hangs off of it. Each instance of an open timer device allows for a single timer action, which is configured via syscontrol. Timers can be stopped or started. When started, the interval and whether the timer is a one-off or repeating can be set via a simple command block passed via y.
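
Starting a one second repeating timer might then look like this; the command block layout and names are hypothetical, though the block really is passed via y:

        timercmd:
                .byte TIMSTART      ; hypothetical command opcode
                .byte TIMREPEAT     ; repeat rather than one-off
                .word 40            ; interval in ticks: one second at 40Hz

                ldx timerdev        ; handle from sysopen (register assumed)
                ldy #timercmd
                jsr syscontrol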

The timer device interrupt handler is fairly straightforward. It has to scan the open timer devices, updating counts and dealing with repeating timers. If a timer has reached its interval value the task that owns the timer is signalled.

So far I have written two test tasks.

The first task is the more interesting of the two. I wrote it so I could test timers and waiting for multiple signals in one task.

The task starts by opening UART port A, and a timer. The timer is initially configured to repeat once a second. The main loop of the task waits on both devices' signal bits, and reports which of the two signals was received via the UART port. In the case of the UART, outstanding bytes are pulled down via sysread until no more bytes are outstanding. To make things more interesting, if a space character is read the timer is reconfigured, via syscontrol, to be a non repeating 5 second one.
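
The shape of that loop, with the allocated signal masks kept in variables and the wait register conventions assumed:

        mainloop:
                lda uartbit         ; mask allocated for the UART device
                ora timerbit        ; plus the timer's: wait on either
                jsr wait            ; sleeps; fired bits assumed back in a
                pshs a
                anda timerbit
                beq notimer
                ; report the timer tick here
        notimer:
                puls a
                anda uartbit
                beq mainloop
                ; sysread bytes until none are outstanding, checking
                ; each one for the space character
                bra mainloop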

This all works quite well, as the following screenshot tries to show:

The top of the screenshot shows the periodic (one second) timer expiring a couple of times. Then I typed h, e, l, l, o and finally space (20 hex), which turns the timer into a 5 second non repeating timer.

The second one is a version of one of the first test programs I wrote in 6809 assembly: a loop that prompts for a string, gets the string, and then outputs the string. This task makes use of two wrapper functions, getstr and putstr, which wrap the device routines for byte-based IO. The getstr routine uses wait when sysread indicates there is no data available yet. It is not possible for these routines to read from multiple devices at the same time, or to do anything else at all whilst getting the string. The source code for these routines is available here.
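
The blocking part of getstr boils down to a few lines; how sysread reports an empty buffer (the carry flag here) is an assumption:

        getchar:
                ; entry: x -> open device handle (assumed)
                jsr sysread         ; try to pull one byte into a
                bcc gotone          ; assumed: carry clear = got a byte
                jsr wait            ; nothing yet: sleep on the device signal
                bra getchar
        gotone: rts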

Here's a not very interesting screenshot of this task running:

Of course this is all the more interesting because both tasks were running at the same time. I have even had two computers (the regular Linux box and the MacBook) attached at the same time, one for each task.

I must have a bug somewhere though, as very occasionally I have seen system lock ups. To try to diagnose this, and some earlier problems which have since been corrected, I wrote a simple debug handler using the spare UART port C (the TTL level channel). So far I've had no luck figuring out this particular problem, but I'm sure I will work it out soon.

All in all I am pleased with how my little OS is coming along. There's obviously a great deal more still to do, but I'm very happy that large pieces of the OS core are starting to take shape. The next thing I will start on, after fixing the lock up bug, is the VDC console driver...