Friday, 17 June 2016

Prototyping parts of an OS for MAXI09

The past few months have been spent thinking about, and prototyping, ideas for the Operating System I want to write for the computer. But before talking about that, a summary of the other little pieces of hardware progress. Anyone who has viewed the video will be familiar with most of this.

First up, SPI. I have implemented a trivial SPI controller for DISCo. It is really nothing more than an MPU interface to the SPI pins: SCLK, MISO and MOSI, as well as the Chip Selects SS0 through SS3. Bit shifting is done in the MPU, so it is far from a proper controller, but it is adequate to verify that the SPI peripherals have been wired up correctly. So far I have verified the function of the DS1305 (PDF) Real Time Clock and the CAT25256 (PDF) 32KByte EEPROM. When I have less interesting things to do I will implement a proper SPI controller in VHDL, which will move the bit shifting from MPU code into logic in DISCo and speed up SPI data transfers immensely.
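For illustration, here is roughly what the MPU-side bit shifting amounts to, sketched in C rather than 6809 assembly. The pin variables stand in for bits in a DISCo register, and the loopback "wiring" (MISO follows MOSI) exists only so the sketch does something observable:

```c
#include <stdint.h>

/* Simulated SPI pin state; on MAXI09 these would be bits in a DISCo register. */
static uint8_t mosi, sclk, miso;

/* Loopback wiring for the sketch: MISO simply follows MOSI. */
static void clock_pulse(void)
{
    sclk = 1;          /* rising edge: slave would sample MOSI and drive MISO */
    miso = mosi;
    sclk = 0;          /* falling edge */
}

/* Shift one byte out on MOSI, MSB first, sampling MISO on each clock. */
uint8_t spi_transfer(uint8_t out)
{
    uint8_t in = 0;
    for (int bit = 0; bit < 8; bit++) {
        mosi = (out >> 7) & 1;            /* present the next bit */
        out <<= 1;
        clock_pulse();
        in = (uint8_t)((in << 1) | miso); /* collect the returned bit */
    }
    return in;
}
```

A proper VHDL controller would do this whole loop in one register write, which is where the speed-up comes from.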

The behaviour of the OPL2 sound interface has also been tested more thoroughly than before. I did this by writing a 6809 routine to read in and play back DOSBox DRO format files, which had previously been written out by my laptop. I was surprised by the quality of the playback from the OPL2 on the MAXI09 board: it is indistinguishable from the DOSBox playback. But when recording it for YouTube the sound capture was poor, so I thought I would try again to do it justice. The following recording was made using a USB microphone:

So, to Operating Systems. Here is a brief summary of what I think should be achievable on my MAXI09 micro:

User's point of view
  • Multiple, at least 4, VDC+keyboard text consoles switchable via function keys
  • Serial console active concurrently
  • "DOS" like command shell: List directory, copy file, type file, delete file, create directory, run external command, etc
  • Date command; set and get time from the RTC
  • Possibly very simple batch files
  • If I really went crazy, I could think about writing an editor and assembler
  • The ability to act as a serial terminal for a Linux box, with basic VT102 terminal emulation
  • Fall back machine code monitor, entered via special key combination, based on already written one but with improvements such as command names instead of letters
Low level
  • Pre-emptive multitasking with at least 8 tasks
  • Signal based events and messages
  • No direct hardware access for tasks, with the exception of the VDC (necessary for performance reasons)
  • Byte-wise driver abstraction layer for at least: VDC text mode, keyboard, UARTs, parallel port, SPI, joysticks
  • Block abstraction layer for IDE
  • "Special" drivers for OPL2
  • Timer driver for repeating and non-repeating delays, with an interface into the RTC via the SPI abstraction
  • Polled or interrupt/signal IO
  • MinixFS support, read and write; likely whole files not proper byte-streams
  • Dynamic memory allocation; used by the system and user tasks
  • Runtime loadable modules for drivers and libraries
This feels very ambitious, but so was the MAXI09 hardware. I have no intention of writing an "8 bit UNIX"; others have already done this. Some elements, e.g. the multiple console idea, are borrowed from Linux. But the bigger influence is my usage of AmigaOS, with its lightweight multitasking.

I have a few bits of hardware in the MAXI09 to help with things which, on the surface, seem complex. The V9958 should make multiple consoles fairly trivial to do because the screen text for all the consoles can be held in video RAM at all times. Switching consoles should largely be a matter of writing to a couple of registers.

Lots of questions remain. One of the interesting ones is what the interface to the Operating System calls should look like. The "pure" approach is to use software interrupts. This is the preferred way because the service routine is isolated from the caller with a new stack frame, and hardware interrupts are automatically disabled within the service routine. It is also more robust in the face of future change than a jump table, because the number of the service routine to execute can be expressed as a simple register value. Nonetheless, software interrupts carry a significant performance penalty compared to a simple jump table. For now I have not dealt with this problem; system calls are just like any other subroutine call.

Another big question mark hangs over my ideas for using MuDdy to implement an MMU. This will be useful for isolating tasks and should enable a single task to access the full 64KByte memory space. But for now I will ignore this idea.

I have begun by prototyping up some of the facilities which the OS (unimaginatively named MAXI09OS) will need: a dynamic memory system, linked lists, task switching, and the driver model. I've also written some test tasks to help me verify that everything is working.

Dynamic memory management is required principally because the number and size of the tasks, and other objects, in the system is a variable quantity. Tasks themselves will need to allocate memory as they need it and the memory required to hold the code for the task itself will need to be allocated, by the OS, from the free pool. Without a multitasking environment most 8 bit systems get by without a dynamic memory manager.

My prototype is simple. A memory block has the following form:
  1. Pointer to the next block, or 0 for the end of the list
  2. The size of the block (including this header)
  3. A byte for flags, which is currently used to indicate whether the block is free or not
  4. The data block bytes themselves
In this trivial design these blocks are arranged across the dynamic memory area, known as the heap, which is initialised as a single block marked as free. Allocating a portion of memory is achieved by scanning the list of memory blocks until a large enough free one is found. That block is then split in two: the first half is the claimed space, and the second half is the remainder of the original free block. The next pointers and the block sizes are adjusted appropriately to keep the list of blocks intact.
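The scheme above can be sketched in C. The real implementation is about 80 lines of 6809 assembly; the names, the alignment rounding, and the host-sized header here are artefacts of the sketch (on the 6809 the header is just 5 bytes and needs no alignment):

```c
#include <stdint.h>
#include <stddef.h>

/* Header layout mirrors the text: next pointer, total size (header
   included), and a free flag. */
typedef struct block {
    struct block *next;   /* next block, or NULL at the end of the list */
    uint16_t size;        /* total size of this block, header included */
    uint8_t  free;        /* 1 = free, 0 = claimed */
} block_t;

#define HDR sizeof(block_t)

static uint8_t heap[256];
static block_t *head;

void heap_init(void)
{
    head = (block_t *)heap;        /* one big free block to start with */
    head->next = NULL;
    head->size = sizeof(heap);
    head->free = 1;
}

/* First fit: scan for a free block big enough, then split it in two. */
void *heap_alloc(uint16_t want)
{
    uint16_t need = (uint16_t)(want + HDR);
    need = (uint16_t)((need + 7) & ~7);  /* host alignment; not needed on 6809 */
    for (block_t *b = head; b; b = b->next) {
        /* Simplification: always keep room for the remainder's header. */
        if (b->free && b->size >= need + HDR) {
            block_t *rest = (block_t *)((uint8_t *)b + need);
            rest->next = b->next;        /* remainder inherits the old link */
            rest->size = (uint16_t)(b->size - need);
            rest->free = 1;
            b->next = rest;
            b->size = need;
            b->free = 0;
            return (uint8_t *)b + HDR;   /* data bytes follow the header */
        }
    }
    return NULL;                         /* no block large enough */
}

void heap_free(void *p)
{
    block_t *b = (block_t *)((uint8_t *)p - HDR);
    b->free = 1;                         /* no coalescing yet, as in the text */
}
```

Note how freeing then allocating a larger block leaves the freed block unused, which is exactly the fragmentation discussed below the diagrams.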

As an illustration of how this works, here is what the heap looks like after a sequence of operations is performed. The size of the heap is 100 bytes.

For ease of illustration all values are in decimal and the heap begins at location 0. First the freshly initialised heap:

Next, 10 bytes are allocated:

Only 5 bytes are usable, since the 5 byte header consumes the rest of the allocation. The 10 byte block is freed, and a 50 byte block is allocated:

Note the fragmentation, and the fact that the 50 byte block does not use any space from the freed up 10 byte block. This can be cured by coalescing adjacent free blocks, something I will get round to doing soon.

Anyway, despite - or perhaps because of - its small code size, only about 80 lines of 6809 assembly, my dynamic memory allocator seems to work quite well. To implement it I first wrote the routines, then constructed test routines around them using the existing monitor program.

Next, linked lists. The requirements for the structure of memory areas in the heap are limited; the list will only ever need to be scanned one way. Therefore singly linked lists are appropriate: there is no benefit in maintaining links backwards through the chain. But for other OS “objects” this is not good enough - backwards scanning will be needed, so doubly linked lists would be beneficial.

Being familiar with AmigaOS, I have decided to implement the same optimisation as used in AmigaOS's Exec linked lists: a hidden node in the list header. This avoids some special casing in list operations and generally makes for more efficient operation than a traditional list, which uses nulls in the header to indicate an empty list.

The following list operations are available:
  • Initialise the list (since it is not possible to simply fill the header with nulls) (initlist)
  • Append a node to the tail (addtail)
  • Prepend a node to the head (addhead)
  • Remove the node from the tail, returning it in a register (remtail)
  • Remove the node from the head, returning it in a register (remhead)
  • Remove a node from anywhere in the list (remove)
The lists are similar to “minimal lists” in AmigaOS terms. There is not, yet, a priority field and thus no means to insert a node at its priority position. Nor is it possible to insert a node after a given node. Nonetheless, the operations available should be sufficient for my little OS. For anyone interested in this code, you can see it here.
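For anyone unfamiliar with the Exec trick, here is a C sketch of the hidden-node header. The routine names mirror the operations listed above (removenode stands in for remove, which clashes with a C library name); the real code is 6809 assembly:

```c
#include <stddef.h>

/* Exec-style list: the header overlaps two hidden nodes, so the add and
   remove operations never need to special-case an empty list. */
typedef struct node {
    struct node *next;
    struct node *prev;
} node_t;

typedef struct {
    node_t *head;      /* first real node, or the hidden tail node if empty */
    node_t *tail;      /* always NULL; shared by both hidden nodes */
    node_t *tailpred;  /* last real node, or the hidden head node if empty */
} list_t;

void initlist(list_t *l)
{
    l->head = (node_t *)&l->tail;      /* hidden head node's next */
    l->tail = NULL;
    l->tailpred = (node_t *)&l->head;  /* hidden tail node's prev */
}

int listempty(list_t *l) { return l->head == (node_t *)&l->tail; }

void addtail(list_t *l, node_t *n)
{
    n->next = (node_t *)&l->tail;
    n->prev = l->tailpred;
    l->tailpred->next = n;             /* works even when the list is empty */
    l->tailpred = n;
}

void addhead(list_t *l, node_t *n)
{
    n->prev = (node_t *)&l->head;
    n->next = l->head;
    l->head->prev = n;
    l->head = n;
}

void removenode(node_t *n)             /* no list pointer needed */
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

node_t *remhead(list_t *l)
{
    if (listempty(l)) return NULL;
    node_t *n = l->head;
    removenode(n);
    return n;
}
```

Because the hidden nodes overlap the header fields, addtail and removenode contain no branches at all, which is precisely why Exec used this layout on slow machines.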

I have also made a start on a device abstraction layer. In the case of the UART driver this includes interrupts for reception.

A note on the terminology I'm trying to stick to. A driver is a set of routines for dealing with a type of device. Devices are opened for a driver type, with perhaps additional parameters passed in for things like the port to open.

Abstraction layers appear even in very simple operating environments. They are useful because they allow user programs to be written without regard for how input and output reach them. For now, at least, the abstraction will be of an IO device, and not of a file on disk. It’s not yet clear how I will implement file IO. Another advantage of a driver abstraction is that the same generic routines can be used regardless of the underlying hardware.

The starting point for this is an “open device handle”. This is an object handle which can be passed to a “write” system function to output a byte on a particular device, be it the screen or a particular UART channel. Its nature as a handle means that the external user task need not be concerned with what it contains.

Opening a device first requires a driver object. The structure representing a driver has the following properties:
  1. Pointer to the driver prepare routine
  2. Pointer to the device open subroutine
  3. A device name (up to 8 characters plus a null)
The driver prepare routine allows global initialisation of all possible devices of that type to occur. These routines are run before the OS itself is fully running. This usually includes setting up the device list, but it might also include setting up interrupt routing if that is better done prior to a device being opened.

Drivers are currently held in a table in ROM. A routine, sysopen, takes a device name (null terminated) in the x register, and any other applicable arguments in other registers. For instance, with the UART driver the channel number (0 for PORT A, etc) is held in the a register. Once sysopen has located the address of the particular driver's open routine, that routine is then run. This performs the necessary functions for the driver. For instance, in the UART case it ensures the channel is not already in use, allocates memory for the "open device" handle, and then initialises the hardware in the UART IC. Finally it configures interrupts in the interrupt routing register inside DISCo. There will be more about interrupts later.
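A C sketch of the sysopen lookup might look like the following. The dummy LED driver, the signatures, and all the names here are illustrative, not the actual MAXI09OS routines (which take the device name in x and the unit in a):

```c
#include <string.h>
#include <stddef.h>

/* One entry in the ROM driver table. */
typedef struct driver {
    void (*prepare)(void);                      /* global init, run before the OS is up */
    void *(*open)(struct driver *d, int unit);  /* returns a device handle, or NULL */
    char name[9];                               /* up to 8 characters plus a null */
} driver_t;

/* A dummy LED driver for the sketch: single opener only. */
static int led_in_use;
static void led_prepare(void) { led_in_use = 0; }
static void *led_open(struct driver *d, int unit)
{
    (void)unit;
    if (led_in_use) return NULL;   /* already open: refuse */
    led_in_use = 1;
    return d;                      /* stand-in for an allocated device handle */
}

/* The driver table; in MAXI09OS this lives in ROM. */
static driver_t drivers[] = {
    { led_prepare, led_open, "led" },
};

/* Find the named driver and run its open routine. */
void *sysopen(const char *name, int unit)
{
    for (size_t i = 0; i < sizeof(drivers) / sizeof(drivers[0]); i++)
        if (strcmp(drivers[i].name, name) == 0)
            return drivers[i].open(&drivers[i], unit);
    return NULL;                   /* no such driver */
}
```

The important structural point is that sysopen knows nothing about UARTs or LEDs; all the device-specific work (checking the port is free, allocating the handle, touching the hardware) happens inside the driver's own open routine.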

Upon a successful call to the “open device” subroutine, a device handle is returned. This handle is driver dependent but contains at least:
  1. A list node
  2. A pointer back to the driver table entry of which this device is an instance
  3. A pointer to the device's close routine
  4. Same for the read routine
  5. And the write routine
  6. A pointer to the generic "control" routine
The reason the close, read, write and control routine pointers are held in the open device structure is speed: since write, for example, takes the device structure pointer as an argument, if the write routine pointer were held in the driver structure an additional indirection would be needed to locate it on each syswrite call. Nonetheless I may re-evaluate this decision.

The purpose of the control routine is to provide a mechanism for performing arbitrary operations on the device which are not data reads or writes. For example, the timer driver uses this to start and stop a timer. The UART driver will use this, some day, to allow sending a break signal, changing baud rates, etc.

Open devices are held in a per driver-type list, the purpose of which is to facilitate interrupt processing.

In the case of the UART driver, a whole load of additional information is needed:
  1. RX circular buffer (32 bytes)
  2. TX circular buffer (32 bytes, currently unused)
  3. ISR and "user" counters for the two buffers
  4. A pointer to the task that opened the UART channel (this is used for task switching)
  5. The allocated signal mask (signals are explained later)
  6. The base address for the hardware IO port for this port
  7. The port number
There's also some global state which is not part of the open device structure. This includes things like whether a port is in use. In theory this is redundant, because the port number is in the device record and the device record is pushed onto a list, but the list node was added relatively recently.

At the moment writes are not interrupt based, but reads are. In fact, on a relatively slow machine like the MAXI09, interrupt-driven writes for the serial port are not all that useful: even with the UART FIFO, the write ISR would need servicing rather frequently unless the baud rate was quite slow. Nonetheless I will implement interrupt-driven writes at some point. Interrupt-based reads are, on the other hand, very useful, since the task handling the UART channel can be put to sleep whilst it waits for data, most likely from the remote end's keyboard. This allows the computer to run other tasks while it waits for UART activity, without the performance hit of polling the port. This leads us nicely on to the next topic.

Multitasking. This is a massive area. Modern OSes on the PC are huge beasts and, ignoring fluff like the User Interface, much of their complexity revolves around the multitasking environment presented to the user. But multitasking is possible on nearly anything that might be classed as a computer, from microcontrollers up. I have decided to attempt a fairly simple, but effective, form of multitasking; it is loosely based, once again, on AmigaOS ideas, but obviously much simpler, to suit the 8 bit 6809. Whilst it is preemptive - tasks and interrupt handlers can force other tasks to be scheduled - there is no priority value, so all tasks, except the special-case idle task, are treated the same. The first thing to do was figure out how task switching should actually work from an MPU standpoint.

On a simple machine like the 6809, without any way to implement memory protection, task switching can be achieved by setting up a periodic interrupt and adjusting the Stack Pointer so that when the RTI (Return from Interrupt) is executed at the bottom of the interrupt handler, a different path of execution is taken than the one the system was on when the interrupt was entered. If all of the machine's registers are also saved via the stack, then the system should behave just as if the task being switched into had never been paused while another task ran.

This setup requires some kind of record to hold basic information on each task, including the stack pointer at the time the task was last running, as well as space set aside for the task's stack. The memory allocation system, just described, is used to create space for these structures, which currently look like this:
  1. List node
  2. Saved Stack Pointer
  3. Initial Program Counter (this is only used to identify tasks when debugging, in lieu of a task name field)
Preceding this record is the task's stack, currently fixed at 100 bytes. Eventually tasks will get names, and possibly an ID number.

The location of the first byte in a task structure is used as a handle to a task. Creating a task consists of the following steps. Currently the only parameter to this subroutine is the initial Program Counter, ie. the start of the task's code.
  1. Allocate 100 bytes, plus the size of the task record
  2. Calculate what the SP will need to be at task start
  3. Write a value into the stack for the Condition Code register which has the E (entire register set) bit set
  4. Write the initial PC into the stack
  5. Write the initial PC and SP into the task record
The stack for a full register set, as generated by an active signal on the IRQ line, looks like this:

From the 6809 Programming Manual

Thus the SP for a task, prior to starting it with an RTI, points at the first register stacked (the Condition Code register). Upon the RTI instruction the machine state, including the Program Counter, will be pulled off and the task will be started. The MPU pulls off the entire machine state because the E bit in the Condition Code register is set.
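The frame construction can be sketched in C. The 12 byte entire-state frame (CC, A, B, DP, X, Y, U, PC) and the E bit are as per the 6809 manual; the structure layout and names are simplifications of mine:

```c
#include <stdint.h>
#include <string.h>

/* Build a fake IRQ stack frame so that the first RTI "returns" into the
   task's entry point. */
#define STACK_SIZE 100
#define FRAME_SIZE 12     /* CC, A, B, DP, X(2), Y(2), U(2), PC(2) */
#define CC_E       0x80   /* E bit: entire register set was stacked */

typedef struct {
    uint16_t sp;          /* saved stack pointer (an offset, in this sketch) */
    uint16_t init_pc;     /* kept for identifying tasks when debugging */
    uint8_t  stack[STACK_SIZE];
} task_t;

void task_create(task_t *t, uint16_t pc)
{
    memset(t->stack, 0, STACK_SIZE);          /* A, B, DP, X, Y, U start as 0 */
    uint16_t sp = STACK_SIZE - FRAME_SIZE;    /* frame sits at the stack top */
    t->stack[sp] = CC_E;                      /* CC is the byte SP points at */
    t->stack[STACK_SIZE - 2] = pc >> 8;       /* PC high byte, stacked last */
    t->stack[STACK_SIZE - 1] = pc & 0xff;     /* PC low byte */
    t->sp = sp;
    t->init_pc = pc;
}
```

When the scheduler loads this SP and executes RTI, the MPU sees the E bit in CC and restores all twelve bytes, ending with the PC, so execution continues at the task's entry point exactly as if it had been interrupted there.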

The timer interrupt handler, which is where task scheduling usually occurs, is roughly described by the following steps:
  1. First the current task pointer is retrieved from the global used to hold it
  2. The Stack Pointer register is saved into the current task structure
  3. If the ready queue is empty then no tasks are in their runnable state, so schedule the idle task and proceed to step 5
  4. Otherwise, rotate the ready queue; the previous head ready task becomes the tail task
  5. In any case, update the current task pointer and pull off that task's stack pointer
Now we can RTI to run that task, which might be the idle task, or the same task as when this ISR was entered, or a completely different task.

One obvious optimisation to this sequence is to skip the ready list (queue) if there is only a single task in the ready set - if that's the case we can just leave the ready list alone.
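Those scheduling steps, including the single-task optimisation, can be sketched like so, using a flat array of task ids instead of the real linked lists (names and the idle-task convention here are illustrative):

```c
/* Sketch of the timer-interrupt scheduling steps. Task 0 stands in for
   the idle task and is never kept in the ready queue. */
#define MAXREADY 8

static int ready[MAXREADY];
static int nready;
static int current = 0;        /* 0 = idle task */

void make_ready(int task)
{
    ready[nready++] = task;    /* append to the tail of the ready queue */
}

/* Called from the timer ISR: pick the task whose stack we will RTI into. */
int schedule(void)
{
    if (nready == 0)
        return current = 0;    /* nothing runnable: schedule the idle task */
    if (nready > 1) {          /* rotate: the head ready task becomes the tail */
        int head = ready[0];
        for (int i = 1; i < nready; i++)
            ready[i - 1] = ready[i];
        ready[nready - 1] = head;
    }                          /* single task: leave the queue alone */
    return current = ready[0];
}
```

With one runnable task the rotation is skipped entirely, and with none the idle task is returned, which matches steps 3 to 5 above.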

Tasks can be in one of two lists: the ready list or the waiting list. Currently the running task is also in the ready queue; this simplifies some of the logic. Tasks are normally in the ready queue unless they are blocked waiting for an event; a signal. Signals are generated either by another task, or by an interrupt handler. Signal semantics are loosely copied from AmigaOS signals:
  • One difference is that only 8 signals are available instead of 32
    • At a future time I will utilise one special bit for forced task termination
  • Tasks, and drivers, obtain a signal (actually a mask with a single bit set) via allocsignal
  • Waiting, via wait, takes a bit mask. If any signal bit becomes set, the task wakes up. Also, if a signal is pending before wait is called then the bit is cleared and the task skips the wait. In either case the bits that woke (or would have woken) the task are returned
  • Signalling, via signal, takes a task structure pointer and a set of bits. There's also a variant of this routine, intsignal, for use by interrupt handlers
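A rough C model of these semantics follows. Blocking itself is elided: where the real wait would put the task to sleep, the sketch just returns 0. The names are changed from wait and signal to avoid clashing with C library functions:

```c
#include <stdint.h>

/* Per-task signal state: 8 signal bits, as described above. */
typedef struct {
    uint8_t allocated;   /* bits handed out by allocsignal */
    uint8_t pending;     /* bits signalled but not yet consumed by a wait */
} sigstate_t;

/* Allocate the lowest free signal bit, returned as a one-bit mask,
   or 0 if all eight are taken. */
uint8_t allocsignal(sigstate_t *t)
{
    for (int i = 0; i < 8; i++) {
        uint8_t bit = (uint8_t)(1 << i);
        if (!(t->allocated & bit)) {
            t->allocated |= bit;
            return bit;
        }
    }
    return 0;
}

/* Another task (or an interrupt handler, via intsignal) sets pending bits. */
void signaltask(sigstate_t *t, uint8_t bits)
{
    t->pending |= bits;
}

/* If any wanted bit is already pending, clear and return those bits;
   otherwise the real wait would block here instead of returning 0. */
uint8_t waitsignal(sigstate_t *t, uint8_t mask)
{
    uint8_t got = t->pending & mask;
    t->pending &= (uint8_t)~got;
    return got;
}
```

The returned mask is what lets one wait cover several devices at once: the caller tests each bit to see which device woke it.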
So far the only use of signals is so that tasks can sleep (block) while they wait for serial input. This is really nice as it means that tasks that are just sat there waiting for button presses don't chew up MPU time. A further usage of signals is so that tasks can be informed of events, such as tasks being told to exit - e.g. by a keystroke like Ctrl-C. AmigaOS built its messaging system (pending messages attached to either an anonymous or named list) with signals being the main mechanism used to inform the recipient of a new message. This was used for everything from GUI events to a high level application automation mechanism - via ARexx. I'm not yet sure if I'll need a message system for MAXI09OS. It would be trivial to implement on top of the pieces I have; I'm just unsure if I will need such a system yet. In addition, a concern I have about this kind of system is that each message will require a memory allocation, which will invariably lead to problems with memory fragmentation.

Actual interrupt handling is an involved process. I must say I am not especially happy with the sheer number of steps required to process, say, a UART interrupt. DISCo's interrupt routing makes things marginally more efficient, but the very rough order of processing is still something like:
  1. Read the interrupt status register from the DISCo FPGA
  2. Loop through the claimed interrupt signals, in priority order
  3. Jump to the handler when a match is found
  4. In the device handler (UART in this case) loop through the active channel list
  5. From the device pointer, obtain the start address for the UART channel
  6. Get the UART port IRQ status (on the 6809 this can be done with: lda ISR16C654,x)
  7. Test it to check the event type
  8. For reads, jump to the UART read handler, passing the device handle and base address for the channel
  9. Get the line state to see if there are bytes in the FIFO (lda LSR16C654,x)
  10. Loop testing for RX data, writing each byte into the 32 byte circular buffer
  11. When we have filled the buffer, or there is no more data, exit that loop
  12. Schedule the task which owns the port by moving it to the top of the ready list
  13. Enter the scheduler, which will save the current task's state (which probably isn't the task that owns the UART channel)
  14. Make the top of the ready list task the active one by setting the Stack Pointer up for it
  15. Finally we can RTI
In the case of the UART the FIFO helps a lot. Even with it I have still had to reduce the console baud rate to 19,200 for fear of losing data. A viable alternative to lowering the data rate would be to introduce hardware flow control, but I'm not that keen on that. I'm sure further optimisations in interrupt processing are possible anyway.
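Steps 10 and 11, the circular buffer handling, can be modelled in C like this. The two free-running counters correspond to the ISR and "user" counters in the device structure; the sketch is mine, not the real assembly:

```c
#include <stdint.h>

/* RX circular buffer from the open device structure: the ISR counter
   advances as bytes arrive, the user counter as sysread consumes them.
   Only the low 5 bits index the 32 byte buffer. */
#define RXSIZE 32

typedef struct {
    uint8_t buf[RXSIZE];
    uint8_t isr_count;    /* total bytes written by the interrupt handler */
    uint8_t user_count;   /* total bytes consumed by the owning task */
} rxbuf_t;

static int rx_full(rxbuf_t *r)
{
    return (uint8_t)(r->isr_count - r->user_count) >= RXSIZE;
}

static int rx_empty(rxbuf_t *r)
{
    return r->isr_count == r->user_count;
}

/* Called from the ISR for each byte pulled out of the UART FIFO. */
int rx_put(rxbuf_t *r, uint8_t byte)
{
    if (rx_full(r)) return 0;                 /* buffer full: stop draining */
    r->buf[r->isr_count++ % RXSIZE] = byte;
    return 1;
}

/* Called from sysread, in task context. */
int rx_get(rxbuf_t *r, uint8_t *byte)
{
    if (rx_empty(r)) return 0;                /* no data: caller should wait */
    *byte = r->buf[r->user_count++ % RXSIZE];
    return 1;
}
```

Because the counters are free-running and only differenced, no locking is needed between the ISR producer and the task consumer, which suits an 8 bit machine well.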

I have implemented two additional drivers just to prove the model: one for the LED attached to DISCo, and one for timer events.

The LED driver is just pointless fun. Ordinarily I don't want tasks banging on hardware directly; eventually an MMU implementation should prevent it. To show how hardware would always be accessed through a driver layer, I have written a driver for the simplest hardware on the board: an LED. Writing a 1 turns the LED on; you can probably guess what writing a 0 does. To show the utility of a driver even for this trivial hardware, only one task can open the LED at a time. And the task which owns the LED is currently the idle task, which strobes the LED as it runs so you can tell how busy (or idle) the system is.

The timer driver is more useful. The system itself uses a 40Hz interrupt to schedule tasks, and this driver hangs off of it. Each instance of an open timer device allows for a single timer action, which is configured via syscontrol. Timers can be stopped or started. When started, the interval, and whether the timer is one-off or in repeat mode, can be set via a simple command block passed in y.

The timer device interrupt handler is fairly straightforward. It has to scan the open timer devices, updating counts and dealing with repeating timers. If a timer has reached its interval value the task that owns the timer is signalled.
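A sketch of that scan, with illustrative field names (in the real driver "signalling the owner" means intsignal on the owning task):

```c
#include <stdint.h>

/* One open timer device: an interval in 40Hz ticks, a countdown, and a
   repeat flag. The expired field stands in for signalling the owner. */
typedef struct {
    uint16_t interval;   /* ticks between expiries */
    uint16_t count;      /* ticks remaining until expiry */
    uint8_t  repeat;     /* 1 = restart after expiry, 0 = one-shot */
    uint8_t  running;
    uint8_t  expired;    /* real code: signal the owning task instead */
} tdev_t;

/* Called from the 40Hz interrupt: walk the open timer devices. */
void timer_tick(tdev_t *timers, int n)
{
    for (int i = 0; i < n; i++) {
        tdev_t *t = &timers[i];
        if (!t->running) continue;
        if (--t->count == 0) {
            t->expired = 1;               /* signal the owner */
            if (t->repeat)
                t->count = t->interval;   /* rearm a repeating timer */
            else
                t->running = 0;           /* one-shot timers stop */
        }
    }
}
```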

So far I have written two test tasks.

The first task is the more interesting of the two. I wrote it so I could test timers and waiting for multiple signals in one task.

The task starts by opening UART port A and a timer. The timer is initially configured to repeat once a second. The main loop of the task waits on both devices' signal bits, the returned mask indicating which of the two signals fired. In the case of the UART, outstanding bytes are pulled down via sysread until no more bytes are outstanding. To make things more interesting, if a space character is read the timer is reconfigured, via syscontrol, into a non-repeating 5 second one.

This all works quite well, as the following screenshot tries to show:

The top of the screenshot shows the periodic (one second) timer expiring a couple of times. Then I typed h, e, l, l, o and finally space (20 hex), which turns the timer into a 5 second non repeating timer.

The second task is a version of one of the first test programs I wrote in 6809 assembly: a loop that prompts for a string, gets the string, and then outputs it. This task makes use of two wrapper functions, getstr and putstr, that wrap the device routines for byte-based IO. The getstr routine uses wait when sysread indicates there is no data available yet. It is not possible for these routines to read from multiple devices at the same time, or to do anything else at all whilst getting the string. The source code for these routines is available here.

Here's a not very interesting screenshot of this task running:

Of course this is all the more interesting because both tasks were running at the same time. I have even had two computers (the regular Linux box and the MacBook) attached at the same time, one for each task.

I must have a bug somewhere though, as very occasionally I have seen system lock-ups. To try to diagnose this, and some earlier problems since corrected, I wrote a simple debug handler using the spare UART port C (the TTL level channel). So far no luck figuring out this particular problem, but I'm sure I will work it out soon.

All in all I am pleased with how my little OS is coming along. There's obviously a great deal more still to do, but I'm very happy that large pieces of the OS core are starting to take shape. The next thing I will start on, after fixing the lock-up bug, is the VDC console driver...