Saturday, 15 October 2016

Console driver and other OS progress

The last few month's free time has been spent knee deep in the Operating System for my 8 bit micro. The main area of attack has been on implementing a console (screen and keyboard) driver, but I've been working on a few other things as well.

Up until recently I have not tried linking the code that makes up the monitor, or MAXI09OS. Instead the assembler source code files are concatenated together via include statements. While this works, it means all labels and other symbols are global across the code. It's generally not nice structuring things this way, especially as the project grows to be quite large.

MAXI09OS is now assembled into object files and linked using aslink, a component of asxxxx. This means that only symbols that need to be global are marked as such, and other named symbols can be file-local. In all, it is a much tidier arrangement. There are still some further improvements I'd like to make to how MAXI09OS is built, especially around the idea of producing header (include) files which the "user" programs will need. I also want to structure the source code tree so that drivers have their own subdirectory etc.

I have made a fair amount of progress on the console driver. It's far from the VT102-emulated terminal I have in mind for the "finished" version, but it is useable.

The first thing I did was split out the low level hardware aspects of the UART driver into a new source file. The reason for doing this is because the console driver shares much of the UART code for interfacing with the keyboard controller. Once this was done the UART driver itself was modified to use the low level functions and re-tested. As a consequence of the keyboard controller using the slower 9600 baud rate, the low level UART code also contains a routine for setting the rate. This is also useable by the higher level UART driver's open routine through the b register.

The first thing to get working, within the console driver, was keyboard input. This operates in a similar manner to the UART driver's reading operation except that within the interrupt handler the scan codes, obtained from the keyboard MCU, are fed through a translation routine to turn them into ASCII. This routine also deals with shift key and control key handling. As before, no attempt is currently made to deal with key repeat. The resultant ASCII characters are fed into the user side circular buffer, to be picked up by the sysread subroutine when the task is woken up by a signal.

One additional complication revolves around "virtual" console support. Like with UART ports, each console is a "unit". But obviously the input side (keyboard) and output side (screen) are shared. Since they require very little MPU resources, and only consume the relatively bountiful video memory in the V9958 VDC, there are six virtual consoles. In the keyboard interrupt handler the function keys F1 to F6 are checked and if one is pushed the appropriate console is displayed by writing to the "pattern name" base register. Video memory is arranged as follows:
  • 0x0000 - patterns AKA fonts
  • 0x8000 - console unit 0
  • 0x9000 - console unit 1
This continues up to console unit 5. Note a few details: I am only using the lowest 64KB of video RAM out of a possible 192KB, and also that although the consoles only consume 24 rows of 80 columns (1960) bytes, it is not possible to pack the consoles closer together because of limitations in the pattern name base register. Anyway, this all means that console switching is instantaneous, something I'm quite pleased with.

Actually getting the console to display something was accomplished by "porting" the previously written V9958 monitor VDC code into the console's write function. The code has been tidied up quite a lot but still only supports the very basic control codes: carriage return, new line, backspace, tab, and formfeed - which, per the norm, clears the console. I have also implemented vertical scrolling.

One quite serious restriction exists. Since the VDC manages its video memory via address pointer registers it is necessary to prevent any other task from manipulating the VDC's registers. The first, brute force, approach to this was to disable interrupts while the VDC is being used to stop other tasks from being scheduled. This certainly worked but it prevents all interrupts from being serviced, even if that interrupt had nothing to do with a task which might be using the VDC.

The solution I'm currently using is to leave interrupts enabled, but disable task switching. This is accomplished by having the scheduler return into the current task if task switching is disabled.
This solution remains suboptimal; there might be tasks waiting to run which are not using the VDC; they are needlessly blocked from running. The better way is to use a mutex to pause tasks that want access to the VDC once one task has obtained access to it. Other tasks would not be affected. I might try implementing some mutex routines, though I am concerned about them being efficient enough.

Putting this console driver with the work described previously, I now have a test setup for my little OS which includes the following test tasks, each running on their own virtual console:
  1. Firstly the timer-based task as previously described. It opens the console and a timer, starting a repeating second based timer, until the space bar is pressed. At which point a 5 second non repeating timer starts. It also shows the hex values for what is read from the console driver, ie. the ASCII values for key down events.
  2. Two copies of the "enter a string, print the string" looping task.
  3. A serial terminal. This ties together a virtual console and a UART port. Both are opened and then a loop entered. In the loop both devices are waited on. The device that caused the task to wake is read, and the opposite task is written to. UART port 1 is used here.
  4. I also have a task which performs the "enter a string, print the string" operation on UART port 0.
In general, things are looking pretty good. The timer task updates its console while another tasks's console is displayed, and the UART port presents its task simultaneously.

When it comes to the serial terminal task, there are a few issues, beyond the fact that the serial terminal is only honouring the very simplest control codes.

The big issue is with scrolling. The V9958 is a great video controller but it has no special support hardware for scrolling the screen in text mode. Instead the video RAM data has to be read into MPU RAM and then written out again, but shifted up a line. For most uses of the console this is ok; the task just has some extra work to do to scroll the display. But for the serial terminal task this extra work needs to be done before the serial ports FIFO fills up.

This problem is, perhaps, unfixable at least without hardware assistance. The problem lies in the time it takes the MPU to do the line scroll vs the time taken for the UART to receive a line of text. At 9600 baud a full line of text will take roughly 1 / (9600 / 10) by 80 characters which is 83ms. I have cycle counted the MPU scroll code and it takes - ignoring setup code - 43ms. This means everything is fine if the lines are long, but if a (say) 8 char line is received there is only 8.3ms to scroll the screen, which isn't long enough. Even with the UART FIFO and MPU circular buffer, data loss will eventually occur as "short" text lines are received.

The "solution" I have come up with is to scroll the screen in half-screen chunks instead of every line. This reduces both the frequency of scrolling that's required, and it also reduces the work required to do the scrolling since more time is spent filling the video memory with zeros instead of writing out previously read in data. The result of this is not very pretty - using the console feels a bit strange - but it mostly cures the data loss problem. Only mostly though; on very short lines - the yes command is a nice way to show the problem - occasionally characters are still lost.

There is, perhaps, a better solution involving the usage of the UARTs hardware flow control lines. Hardware flow control can be used to pause data transmission when the receiver is not ready to receive it. In theory this could be used to pause the transfer while the MPU is scrolling the display. Unfortunately despite playing with this for a few hours I cannot get it to work.

So far I'm very happy with my OS's progress. The scrolling problem is annoying but I'm going to move on - serial terminal support was always going to be used as a way to test the driver model instead of being anything actually "practical". The serial terminal is a nice way to improve, and learn about, terminal emulation though, when I get around to that.

There's a few different things I could work on next:
  • The previously mentioned VT102 improvements.
  • Some ability to measure how busy the system is would be nice. Currently I use the idle LED to get a rough idea for how idle the system is, but it would be nice to know which tasks are using the MPU the most.
  • Some means for open device inheritance when a task creates another task. Currently each task is responsible for opening a UART port, or a console. It would be better if the parent task did it as this would mean the same task code could be used regardless of the IO device. It could then even be possible to have multiple instances of the same task's code running multiple times if a task either used only the stack or kept its variables in allocated memory, with the pointer to that memory stored in the stack (or perhaps the u register)
  • Another massive chunk of the Operating System revolves around proving disk access to tasks. I have thus far given this problem very little thought.
I feel what I have now is at a nice point to go back and fill in some smallish missing parts before I start thinking about the disk routines, which will be a large chunk of work. Some smaller areas to tackle include:
  • Tidying up the source tree into different subsystem directories.
  • Improve the keyboard behaviour. Key repeat is something I've put off doing for months. And then there's support for the caps lock key.
  • Reworking the debug console to make the output cleaner and easier to read.
  • UART transmit interrupts.
  • Improving interrupt processing speed. This also involves some work on DISCo's VHDL. I have a few ideas for how to simplify, and thus speed up, interrupt processing.
  • More device drivers, for example a joystick driver would be interesting to do. It would need polled and event (changes) modes.
I've also made a short video showing the test tasks, including the serial terminal:

So many ideas, but so little time...

Friday, 17 June 2016

Prototyping parts of an OS for MAXI09

This past few months has been spent thinking about, and prototyping, ideas for the Operating System I want to write for the computer. But before talking about that, a summary of the other little pieces of hardware progress. Anyone who has viewed the video will be familiar with most of this.

First up SPI. I have implemented a trivial SPI controller for DISCo. It is really nothing more then a MPU interface to the SPI pins: SCLK, MISO and MOSI, as well as the Chip Selects SS0 through SS3. Bit shifting is done in the MPU so it is far from a proper controller, but it is adequate to verify that the SPI peripherals have been wired up correctly. So far I have verified the function of the DS1305 (PDF) Real Time Clock, and the CAT25256 (PDF) 32KByte EEPROM. When I have less interesting things to do I will implement a proper SPI controller in VHDL, which will move the bit shifting from MPU code to logic in DISCo, which will speed up SPI data transfers immensely.

The behaviour of the OPL2 sound interface has also been more thoroughly tested then before. I did this by writing a 6809 routine to read in and play back DOSBox DRO format files, which had previously been written out by my laptop. I was surprised by the quality of the playback from the OPL2 on the MAXI09 board, it is indistinguishable from the DOSBox playback, but when recording it for YouTube the sound capture was poor, so I thought I would try again to do it justice. The following recording was made using a USB microphone:

So, to Operating Systems. Here is a brief summary of what I think should be achievable on my MAXI09 micro:

User's point of view
  • Multiple, at least 4, VDC+keyboard text consoles switchable via function keys
  • Serial console active concurrently
  • "DOS" like command shell: List directory, copy file, type file, delete file, create directory, run external command, etc
  • Date command; set and get time from the RTC
  • Possibly very simple batch files
  • If I really went crazy, I could think about writing an editor and assembler
  • The ability to act as a serial terminal for a Linux box, with basic VT102 terminal emulation
  • Fall back machine code monitor, entered via special key combination, based on already written one but with improvements such as command names instead of letters
Low level
  • Pre-emptive multitasking with at least 8 tasks
  • Signal based events and messages
  • No direct hardware access for tasks, with the exception of the VDC (necessary for performance reasons)
  • Byte-wise driver abstraction layer for at least: VDC text mode, keyboard, UARTs, parallel port, SPI, joysticks
  • Block abstraction layer for IDE
  • "Special" drivers for OPL2
  • Timer driver for repeat and non repeat delays, with an interface into the RTC via SPI abstraction
  • Polled or interrupt/signal IO
  • MinixFS support, read and write; likely whole files not proper byte-streams
  • Dynamic memory allocation; used by the system and user tasks
  • Runtime loadable modules for drivers and libraries
This feels very ambitious, but so was the MAXI09 hardware. I have no intention of writing an "8 bit UNIX"; others have already done this. Some elements, eg. the multiple console idea, are borrowed from Linux. But the bigger influence is my usage of AmigaOS with its light-weight multitasking.

I have a few bits of hardware in the MAXI09 to help with things which, on the surface, seem complex. The V9958 should make multiple consoles fairly trivial to do because the screen text for all the consoles can be held in video RAM at all times. Switching consoles should largely be a matter of writing to a couple of registers.

Lots of questions remain. One of the interesting ones is what should the interface to the Operating System calls look like? The "pure" approach is to use software interrupts. This is the preferred way because the service routine is isolated from the caller with a new stack frame, with hardware interrupts from within the service routine automatically disabled. It is also even more robust in the face of future change then a jump table because the number of the service routine to execute can be expressed as a simple register value. None the less they have a significant performance penalty compared to a simple jump table. For now I have not dealt with this problem; system calls are just like any other subroutine call.

Another big question mark hangs over my ideas for using MuDdy to implement an MMU. This will be useful for isolating tasks and should enable a single task to access the full 64KByte memory space. But for now I will ignore this idea.

I have begun by prototyping up some of the facilities which the OS (unimaginatively named MAXI09OS) will need: a dynamic memory system, linked lists, task switching, and the driver model. I've also written some test tasks to help me verify that everything is working.

Dynamic memory management is required principally because the number and size of the tasks, and other objects, in the system is a variable quantity. Tasks themselves will need to allocate memory as they need it and the memory required to hold the code for the task itself will need to be allocated, by the OS, from the free pool. Without a multitasking environment most 8 bit systems get by without a dynamic memory manager.

My prototype is simple. A memory block has the following form:
  1. Pointer to the next block, or 0 for the end of the list
  2. The size of the block (including this header)
  3. A byte for flags, which is currently used to indicate wether the block is free or not
  4. The data block bytes themselves
In this trivial design these blocks are arranged across the dynamic memory area, know as the heap, which is initialised as a single block marked as being free. Allocation of a portion of memory is achieved by scanning the list of memory blocks until a large enough free one is found. If it is, then the block is split in two: the first half is the claimed space, and the second half is the remainder of the original, free, block. The next pointers and the block sizes need to be adjust appropriately to keep the list of blocks intact.

As an illustration of how this works, here is what the heap looks like after a sequence of operations are performed. The size is of the heap is 100 bytes.

For ease of illustration all values are in decimal and the heap begins at location 0. First the freshly initialised heap:

Next 10 bytes is allocated:

Only 5 bytes are useable. The 10 byte block is freed, and a 50 byte block is allocated:

Note the fragmentation, and the fact that the 50 byte block does not use any space from the freed up 10 byte block. This can be cured by coalescing adjacent free blocks, something I will get round to doing soon.

Anyway despite, or perhaps because of, its small code size - only about 80 lines of 6809 assembly - my dynamic memory allocator seems to work quite well. To actually get this implemented I first wrote the routines, then constructed test routines around them using the existing monitor program.

Next, linked lists. The requirements for the structure of memory areas in the heap are limited; the list will only ever need to be scanned one way. Therefore singly linked lists are appropriate: there is no benefit in maintaining links backwards through the chain. But for other OS “objects” this is not good enough - backwards scanning will be needed, so doubly linked lists would be beneficial.

Being familiar with AmigaOS I have decided to implement the same optimisation as used in AmigaOS’s Exec linked list: a hidden node in the the list header. This avoids some special casing on list operations and generally makes for a more efficient operation then a traditional list which uses nulls in the header to indicate an empty list.

The following list operations are available:
  • Initialise the list (since it is not possible to simply fill the header with nulls) (initlist)
  • Append a node to the tail (addtail)
  • Prepend a node to the head (addhead)
  • Remove the node from the tail, returning it in a register (remtail)
  • Remove the node from the head, returning it in a register (redhead)
  • Remove a node from anywhere in the list (remove)
The lists are similar to “minimal lists” in AmigaOS terms. There is not, yet, a priority field and thus no means to insert a list at it’s priority position. Nor is it possible to insert a node after a given node. None the less, the operations available should be sufficient for my little OS. For anyone interested in this code, you can see it here.

I have also made a start on a device abstraction layer. In the case of the UART driver this includes interrupts for reception.

A note on the terminology I'm trying to stick to. A driver is a set of routines for dealing with a type of device. Device are open for a driver type, with perhaps additional parameters passed in for things like the port to open.

Abstraction layers appear even in very simple operating environments. They are useful because it allows user programs to be written without regard for how input and output gets into them. For now, at least, the abstraction will be of an IO device, and not of a file on disk. It’s not yet clear how I will implement file IO. Another advantage of a driver abstraction is they allow the same generic routines to be used regardless of the underlying hardware.

The starting point for this is a “open device handle”. This is an object handle which can be passed to a “write” system function to output a byte on a particular device, be it the screen or a particular UART channel. It's nature as a handle means that the external user task will not be concerned with what it contains.

Opening a device first requires a driver object. The structure representing a driver has the following properties:
  1. Pointer to the driver prepare routine
  2. Pointer to the device open subroutine
  3. A device name (up to 8 characters plus a null)
The driver prepare routine allows global initialisation of all possible devices of that type to occur. These routines are ran before the OS itself is fully running. This usually includes setting up of the device list, but it might also include setting up interrupt routing if that is better done prior to a device being opened.

Drivers are currently held in a table in ROM. A routine, sysopen, takes a device name (null terminated) in the x register, and any other applicable arguments in other registers. For instance, with the UART driver the channel number (0 for PORT A, etc) is held in the a register. Once sysopen has located the address of the particular driver's open routine, that routine is then run. This performs the necessary functions for the driver. For instance, in the UART case it ensures the channel is not already in use, allocates memory for the "open device" handle, and then initialises the hardware in the UART IC. Finally it configures interrupts in the interrupt routing register inside DISCo. There will be more about interrupts later.

Upon a successful call to the “open device” subroutine, a device handle is returned. This handle is driver dependant but contains at least:
  1. A list node
  2. A pointer back to the driver table entry with which this device is a type of
  3. A pointer to the device's close routine
  4. Same for the read routine
  5. And the write routine
  6. A pointer to the generic "control" routine
The reason the close, read, write and control routine pointers are held in the open device structure is speed; since write, for example, takes an argument - the device structure pointer - if the write routine pointer was in in the driver structure an additional indirection would be needed to locate it on each syswrite call. Nonetheless I may re-evaluate this decision.

The purpose of the control routine is to provide a mechanism to perform arbitrary operations on the the device which are not data reads or writes. For example, the timer driver uses this to start and stop a timer. The UART driver will use this, some day, for allowing the sending of a break signal, changing baud rates etc.

Open devices are held in a per driver-type list, the purpose of which is to facilitate interrupt processing.

In the case of the UART driver, a whole load of additional information is needed:
  1. RX circular buffer (32 bytes)
  2. TX circular buffer (32 bytes, currently unused)
  3. ISR and "user" counters for the two buffers
  4. A pointer to the task that opened the UART channel (this is used for task switching)
  5. The allocated signal mask (signals are explained later)
  6. The base address for the hardware IO port for this port
  7. The port number
There's also some global state which is not part of the open device structure. This includes things like wether a port is in use. In theory this is redundant because of the port being in the device record, and the device record being pushed into a list, but the list node was added relatively recently.

At the moment writes are not interrupt based, but reads are. In fact, on a relatively slow machine like the MAXI09, interrupt-driven writes for the serial port are not all that useful, since even with the action of the UART FIFO, the UART would need its write-based ISR servicing rather frequently, unless the baud rate was quite slow. Nonetheless I will implement interrupt-driven writes at some point. Interrupt based reads are, on the other hand, very useful, since the task handling the UART channel can be put to sleep whilst it waits for data, most likely from the remote end's keyboard. This allows the computer to run other tasks while it waits for UART activity, without the performance hit you'd see polling the port. This leads us on nicely to the next topic.

Multitasking. This is a massive area. Modern OSes on PC are huge beasties and, ignoring fluff like the User Interface, much of this complexity revolves around the multitasking environment presented to the user. But multitasking is possible on nearly anything that might be classed as a computer, from microcontrollers and up. I have decided to attempt to implement a fairly simple, but effective, form of multitasking; it is loosely based, once gain, on AmigaOS ideas, but obviously it is much simpler to work on the 8 bit 6809. Whilst it is preemptive - tasks and interrupt handlers can force other tasks to be scheduled - there is no priority value so all tasks, except the special case idle task, are treated the same. The first thing to do was figure out how task switching should actually work from a MPU standpoint.

On a simple machine like a 6809 without any way to implement memory protection, task switching can be achieved by setting up a periodic interrupt and having the Stack Pointer adjusted so that when the RTI (Return from Interrupt) is executed at the bottom of the interrupt handler, a different path of execution is taken then the one the system was on when the interrupt was entered. If all of the machines registers are also saved via the stack, then the system should behave just as if the task being switched into had never been paused while another task ran.

This setup requires some kind of record to hold basic information on each task, including the stack pointer at the time the task was last running, as well as space set aside for the task's stack. The memory allocation system, just described, is used to create space for these structures, which currently look like this:
  1. List node
  2. Saved Stack Pointer
  3. Initial Program Counter (this is only used to identify tasks when debugging, in lieu of a task name field)
Preceding this record is the task's stack, currently fixed at 100 bytes. Eventually tasks will get names, and possibly an ID number.

The location of the first byte in a task structure is used as a handle to a task. Creating a task consists of the following steps. Currently the only parameter to this subroutine is the initial Program Counter, ie. the start of the task's code.
  1. Allocate 100 + the size of the task record bytes
  2. Calculate what the SP will need to be at task start
  3. Write a value into the stack for the Condition Code register which has the E (entire register set) bit set
  4. Write the initial PC into the stack
  5. Write the initial PC and SP into the task record
The stack for a full register set, as generated by an active signal on the IRQ line, looks like this:

From the 6809 Programming Manual

Thus the SP for a task, prior to starting it with an RTI, points to the address of the first register stacked (Condition Code). Upon the RTI instruction the machine state, including the Program Counter will be pulled off and the task will be started. The MPU pulls off the entire machine state  because the E bit in the Condition Code register will be set.

The timer interrupt handler, which is where task scheduling usually occurs, is roughly described by the following steps:
  1. First the current task pointer is retrieved from the global used to hold it
  2. The Stack Pointer register is saved into the current task structure
  3. If the ready queue is empty then no tasks are in their runnable state, so schedule the idle task and proceed to step 5
  4. Otherwise, rotate the ready queue; the previous head ready task becomes the tail task
  5. In any case, update the current task pointer and pull off that tasks stack pointer
Now we can RTI to run that task, which might be the idle task, or the same task as when this ISR was entered, or a completely different task.

One obvious optimisation to this sequence is to skip the ready list (queue) if there is only a single task in the ready set - if that's the case we can just leave the ready list alone.

Tasks can be in one of two lists: the ready list or the waiting list. Currently the running task is also in the ready queue; this simplifies some of the logic. Tasks are normally in the ready queue unless they are blocked waiting for an event; a signal. Signals are generated either by another task, or by an interrupt handler. Signal semantics are loosely copied from AmigaOS signals:
  • One difference is that only 8 signals are available instead of 32
    • At a future time I will utililise one special bit for forced task termination
  • Tasks, and drivers, obtain a signal (actually a mask with a single bit set) via allocsignal
  • Waiting, via wait, takes a bit mask. If any signal bit becomes set the task wakes up. Also, if the signal is pending before wait is called then the bit is cleared and the tasks skips the wait. In any case the bits that woke up (or not) the task are returned
  • Signalling, via signal, takes a task structure pointer and a set of bits. There's also a variant of this routine, intsignal, for use by interrupt handlers
So far the only use of signals is so that tasks can sleep (block) while they wait for serial input. This is really nice as it means that tasks that are just sat there waiting for button presses don't chew up MPU time. A further usage of signals is so that tasks can be informed of events, such as tasks being told to exit - eg. by a keystroke like Ctrl-C. AmigaOS built its messaging system (pending messages attached to either an anonymous or named list) with signals being the main mechanism used to inform the recipient of a new message. This was used for everything from GUI events to a high level application automation mechanism - via Arexx. I'm not yet sure if I'll need a message system for MAXI09OS. It would be trivial to implement on top of the pieces I have; I'm just unsure if I will need such a system yet. In addition, a concern I have about this kind of system is that each message will require a memory allocation, which will invariably lead to problems with memory fragmentation.

Actual interrupt handling is an involved processes. I must say I am not especially happy with the sheer number of steps required to process, say, a UART interrupt. DISCo's interrupt routing makes things marginally more efficient but the very rough order of processing is still something like:
  1. Read the interrupt status register from the DISCo FPGA
  2. Loop through the claimed interrupt signals, in priority order
  3. Jump to the handler when a match is found
  4. In the device handler (UART in this case) loop through the active channel list
  5. From the device pointer, obtain the start address for the UART channel
  6. Get the UART port IRQ status (on the 6809 this can be done with: lda ISR16C654,x)
  7. Test it to check the event type
  8. For reads, jump to the UART read handler, passing the device handler and base address for the channel
  9. Get the line state to see if there are bytes in the FIFO (lda LSR16C654,x)
  10. Loop Testing for RX having data, writing it into the CPUs 32 byte circular data
  11. When we have filled the buffer, or there isn't any more exit that loop
  12. Schedule the task which owns the port by moving it to the top of the ready list
  13. Enter the scheduler, which will save the current tasks state (which probably isn't the task that owns the UART channel)
  14. Make the top of the ready list task the active one by setting the Stack Pointer up for it
  15. Finally we can RTI
In the case of the UART the FIFO helps a lot. Even with it I have still had to reduce the console baud rate to 19,200 for fear of loosing data. A viable alternative to lowering the data rate would be to introduce hardware flow control, but I'm not that keen on that. I'm sure further optimisations in interrupt processing are possible anyway.

I have implemented two additional drivers just to prove the model: one for the LED attached to DISCo, and one for timer events.

The LED driver is just pointless fun. Ordinary I don't want tasks to be banging on hardware directly. Eventually an MMU implementation should prevent it. To show how hardware would always be accessed through a driver layer I have written a driver for the simplest hardware on the board: an LED. Writing a 1 turns the LED on; you can probably guess what writing a 0 does. To show the utility of a driver even for this trivial hardware, only one task can open the LED at a time. And the task which owns the LED is currently the idle task, which strobes the LED as it runs so you can tell how busy (or idle) the system is.

The timer driver is more useful. The system itself uses a 40hz interrupt to schedule tasks, and this driver hangs off of it. Each instance of an open timer device allows for a single timer action which is configured via syscontrol. Timers can be stopped or started. When started, the interval and wether the timer is a one off or in repeat mode can be set via a simple command block passed via y.

The timer device interrupt handler is fairly straightforward. It has to scan the open timer devices, updating counts and dealing with repeating timers. If a timer has reached its interval value the task that owns the timer is signalled.

So far I have written two test tasks.

The first task is the more interesting of the two. I wrote it so I could test timers and waiting for multiple signals in one task.

The task starts by opening UART port A, and a timer. The timer is initially configured to repeat once a second. The main loop of the task waits on both device signal bits, indicating which of the two signals was received via the UART port. In the case of the UART, outstanding bytes are pulled down via sysread, until no more bytes are outstanding. To make things more interesting if a space character is read the timer is reconfigured, via syscontrol, to be a non repeating 5 seconds one.

This all works quite well, as the following screenshot tries to show:

The top of the screenshot shows the periodic (one second) timer expiring a couple of times. Then I typed h, e, l, l, o and finally space (20 hex), which turns the timer into a 5 second non repeating timer.

The second one is a version of one of the first test programs I wrote in 6809 assembly: a loop that prompts for a string, gets the string, and then outputs the string. This task makes use of two wrapper functions, getstr and putstr that wrap the device routines for byte-based IO. The getstr routine uses wait when sysread indicates there is no data available yet. It is not possible for these routines to read from multiple devices at the same time, or to do anything else at all whilst getting the string. The source code for these routines is available here.

Here's a not very interesting screenshot of this task running:

Of course this is all the more interesting because both tasks were running at the same time. I have even had two computers (the regular Linux box and the MacBook) attached at the same time, one for each task.

I must have a bug somewhere though as very occasionally I have seen system lock ups. To try to diagnose this, and earlier since corrected problems, I wrote a simple debug handler using the spare UART port C (the TTL level channel). So far no luck figuring out this particular problem, but I'm sure I will work it out soon.

All in all I am pleased with how my little OS is coming along. There's obviously a great more still to do, but I'm very happy that large pieces of the OS core are starting to take shape. The next thing I will start on, after fixing the lock up bug, is the VDC console driver...

Saturday, 27 February 2016

Physical construction of the MAXI09 PCB is now complete

Physical construction of the MAXI09 PCB is now complete. A final, 3 hour, session with the soldering iron completed that task. Here is a picture of the top of the board:

I have also found a solution to the problem of unwanted flux residue. After trying out various things including isopropyl alcohol, it seems that the best approach is good old white spirit applied with an old toothbrush, followed by a rinse under the tap. Thanks to my work colleague Rebecca Gellman of the Retro Computer Museum for the tip! I'm no longer embarrassed to include a picture of the bottom of the board:

You can see the fix for the MEMRDY and /DMAREQ pins, which sour the otherwise error-free board.

The other piece of construction I've done is to, albeit temporarily, mount the PCB and keyboard on a piece of acrylic. I hope to eventually redo this in a more permanent way, but for now I can at least type on the keyboard. MAXI09 now resembles a real computer!

Here is a picture of the computer. At the time it was plugged into the living room TV:

This picture, if you look closely, shows a Compact Flash attached to the IDE port. I was pleased that this works nicely, as before, though thus far the CF is operating only in 8 bit mode, and without the help of the DMA Controller.

Although there remains much work to be done, both in programmable logic and in terms of software, MAXI09 can now be used almost like a real 80s computer:
  • Attaches to a TV/RGB monitor
  • Keyboard
  • Joystick ports
  • Sound
  • Mass storage
To illustrate the above, here's a picture of the computer playing the Snake game I wrote a year or so ago for my previous board:

The Snake game was loaded from the Compact Flash, with the keyboard used for operating the computer.

The computers's main limitations are down to the software. My Monitor is not a proper operating environment. Eventually I hope to write a simple multitasking Operating System, but in the mean time the Monitor can be used to test the hardware, including the parts implemented in programmable logic, and to try out ideas for the OS I hope to write, investigate interrupts, etc.

Another thing I have been working on is a boot loader. The main purpose of this is to allow the EEPROM to be reprogrammed regardless of its content. One of the flaws with the previously implemented in-system EEPROM reprogramming is that the code to reprogram the device is stored in the EEPROM. If the reprogramming fails, maybe because of a power interruption, then the computer is un-bootable, necessitating a pull of the EEPROM and a reprogram on my home made EEPROM programmer. The solution to this problem is to put the reprogramming code somewhere else. In the case of the MAXI09 there is one interesting possibility: the MuDdy FPGA. The Flex10K (PDF) are interesting because as well as the programmable logic, they also have RAM bits. These can be used for anything including data logging or lookup tables for maths functions, but a simpler use is to expose the memory on external pins. It is also possible to pre-load the memory with content set at configuration time. And removing the memory array's write line makes it behave as a ROM.

The EPF10K10 has 3 lots of 2048 bits, or 3 by 256 bytes of RAM, called EAB (PDF) by Altera. 256 bytes is enough to implement a flash routine in 6809 machine code. This leaves 2 by 256 bytes for other purposes, like the MMU functionality when I get round to implementing it. It might also be possible to implement a boot loader which reads the OS code from blocks in the IDE device, or receives the image via XMODEM.

One further complication in my boot loader is that it needs to overlay the top most page of EEPROM at 0xFF00 to 0xFFFF. This is necessary because after the boot loader has switched to the runtime system (the OS, or as it currently stands the Monitor), that image and not the loader needs to occupy the top page so that the interrupt vectors are in place.

This is accomplished by having another register in MuDdy which controls what is presented at the top page. A zero, the default, puts the loader FPGA ROM there and a one puts the EEPROM at that page. For completeness it is also possible, via another bit in the same register, to write project the EEPROM. This is in addition to write protection hardware jumper.

This register is called MUDDYSTATE, although this is probably a bad name for it. Bit 0, when 1, write enables the EEPROM. Bit 1, when 1, maps the EEPROM onto the top most page. All other bits are currently unused and will be ignored here.

A simplified description of the new MAXI09 boot process is as follows:
  1. MUDDYSTATE is set to 0b00 after MuDdy has been configured.
  2. 6809 resets into the loader code
  3. Loader outputs start-up banner saying: press f to flash
  4. Two second, approximately, wait for input
  5. If no input, jump to the last step
  6. If we got an f then the operator wants to reprogram the EEPROM
  7. Copy the loader image, 256 bytes, to a scratch area of RAM
  8. Calculate the new position for the programming routine and jump to it
  9. Enable writing to the EEPROM, and map it into the top page by writing 0b11 into MUDDYSTATE
  10. Copy 16 KByte of 64 byte pages from the serial port to the EEPROM, much as done previously
  11. Disable writing to the EEPROM by writing 0b10 to MUDDYSTATE
  12. Send the new content of the EEPROM back to host computer so it can verify the content, much as before
  13. Jump to the top of EEPROM, at 0xC000, which will start the OS or Monitor
Inside the OS, almost the very first thing it needs to do to is write 0b10 into MUDDYSTATE, in case the EEPROM was not just flashed. This will ensure the loader ROM is not at the top page regardless of wether it was previously reprogrammed, and that writing to the EEPROM is disabled.

The loader code is cobbled together from various bits of Monitor code, including serial routines and the reprogramming code itself. Even after I finally implement it fully, serial IO inside the loader program will always be polled and not interrupt driven. This is both to reduce the size of the code and to keep it as simple as possible. Currently the loader program is about 220 bytes in length, so not much more room is available.

Actually getting the loader code into the MuDdy FPGA is an interesting process. After producing the 256 loader binary via the ASxxxx assembler, it needs to be converted into a format that the Altera tools require. Two file formats are supported: Intel HEX and Altera's own Memory Image File (MIF). After trying and failing to get the ASxxxx linker, aslink, to produce compatible Intel HEX  files, I eventually settled on writing a simple Perl script that converts a binary file into the MIF format. This file is then associated with a RAM array VHDL component which in turn is instantiated by a wizard within the Quartus tool.

The summary of all this is that the loader works very well. Regardless of the state of the EEPROM, I can always recover the system. It is even possible to start the computer without the EEPROM socketed; albeit all you see at the serial console is the prompt to start the reprogramming process.

As I've said, there is much more to be done before starting properly on planning the OS and related software.
  • Whilst the IDE interface works fine in 8 bit mode, I want to get 16 bit transfers implemented. This will involve implementing a latch in DISCo to hold the high byte of the 16 bit IDE data bus, as I've described previously.
  • Also, now that I have a, albeit 8 bit IDE interface, it would be nice to modify my, somewhat ropey, low level IDE routines to make use of the DMA Controller implemented in MuDdy.
  • Although the Amiga 600 keyboard is functional, the work needs finishing. Scan-codes need to be interpreted for the cursor and other non alpha-numeric keys, commands need to be added to twiddle with the RGB LED and caps lock LED, a keyboard-activated reset is needed, etc.
  • Now that the SPI peripheral ICs have been attached to the board, I have no excuse not to implement an SPI bus controller inside DISCo. It will permit access to to DS1305 (PDF) Real Time Clock, analogue joystick pots, and the 32KB SPI EEPROM.
Another piece of hardware I want to play with is my "new" printer. I found, through eBay, a Citizen 120D printer. Here's a stock picture of one of these 9 pin dot matrix printers:

I paid a good price, and it is complete with a manual and even the C64 interface module. This was the first printer I ever owned; I used one with my Amiga 1200 in the 90s. I'd like to test that the parallel port on the MAXI09 board works properly by sending the printer data from memory, using a Monitor command specially written for this purpose.

Another thing I hope to do very soon is record a video of the MAXI09 in action...

Update: And here is that video. The "production values" are terrible, but hopefully someone will find it interesting:

Friday, 22 January 2016

Sound, video, DMA; more bring up

Construction of the MAXI09 is continuing. With each part soldered onto the board comes a limited amount of testing with the monitor. Enough to exercise the very basic level of functionality, nothing more. Proper testing, and actually making the part do cool things, will come later.

The first part added was the OPL2:

Since this was previously prototyped on breadboard I didn't foresee any problems, and sure enough there weren't any, save for a strange problem with the 3.5mm mono socket. I had assumed that I could still use a stereo plug, with the sound coming out of both channels. But instead I got a sound mostly consisting of fuzz. I have solved the problem by using a splitter cable, between my PC speakers and the MAXI09, but it would be nice to understand what's going on. Testing was done by sending the simplest stream of register writes that will play a tone.

The next parts to be attached was the keyboard controller section:

As you can see, the RGB LED is not yet soldered. After soldering it was time to test that the QUART could talk to the controller. I did this by modifying the AVR controller code to output the scancodes as a stream of bytes instead of printable characters, and writing a routine in the monitor to print this stream as bytes are received from the keyboard controller port, Port D in the QUART. It was a big relief to see this working. MAXI09 will have a nice keyboard!

I haven't yet tested comms the other way. This will be needed to control the RGB LED, and possibly configure key repeat parameters, though I may decide to implement that in the MPU. Nor have I made the controller code process these messages.

Continuing on with the QUART, the next step was to attach the RJ45 serial ports and the accompanying MAX238 (PDF):

After the problems I had with the previous board's RS232 level serial ports, and despite all the checking I'd done on the MAXI09 schematic, I still did not expect these ports to work. But strangely enough, both ports work flawlessly. I have since switched to using an RJ45 port as the main console port, since it's use does not require fiddling with the small wires at the end of the USB TTL serial converter lead.

At some point in the design of the PCB I made a small blunder. Somehow I hadn't noticed that I'd used the footprint for a very small buzzer. Like the previous board MAXI09 includes a buzzer, this time attached to the DISCo FPGA. But this time I accidentally chose a footprint for a 7mm buzzer instead of the 12mm ones used before. This was especially irritating since I have at least a dozen 12mm buzzers in my parts drawer. The solution was simple though and I sourced some 7mm buzzers from eBay. Unfortunately the results are not great; these buzzers are not very loud. I thought it might be weak drivers in the FPGA, but the 12mm buzzer sounded just fine on the old FPGA breadboard. It's not a big problem though I do wish I'd used the correct footprint.

The final bit, for now, of hardware assembly involved the V9958 VDC:

Though I have a reasonable level of experience with this part, I was still a little apprehensive about the transistor amplifier section. The only difficulty I encountered was in soldering the actual transistors. The TO-92 footprint I used had the 3 terminals in a line. This made soldering them needlessly difficult with the chisel tip that was fine for soldering the rest of the board. Next time I will use a footprint with splayed pads.

Once the construction was completed, and the ICs inserted into their sockets, all that was left to do was modify MuDdy's VHDL to get it to generate the /VDCREAD and /VDCWRITE signals, and modify the monitor to look for the VDC at its new IO address.

The first thing I tried worked first time - my old Snake game. I had to modify the game to read the joystick position from the new joystick port instead of reading it from the old AY-8912, but otherwise the game code is unchanged from before.

I am very happy and relieved that MAXI09 has working sound, video and keyboard!

There remains some more hardware work to be done but before getting on with that I have made a start at the task of implementing the DMA Controller. I previously, about a year ago now, hacked a DMAC into the old board, with reasonable results. With the resources available in the EPF10K10 (PDF) I have now got a fairly useful DMAC implemented in MuDdy.

Because I have not yet tackled the MMU, my DMAC is currently, just like the rest of the computer, limited to the MPUs 16 bit address space. The following features are available:
  • 16 bit (arbitrary) source, destination and length registers
  • Increment control on source and destination
  • Invert source before writing (not very useful but fun)
  • Write only, don't read
The bits in the flags register are currently arranged as follows:
  • 7-4 Unused
  • 3 Write only
  • 2 Negate source
  • 1 Increment destination
  • 0 Increment source
The main purpose of the write only facility is to make the DMAC efficient at clearing memory.

Here's a screenshot of the DMAC being exercised to copy a grand total of 16 byes, from EEPROM to RAM:

First the original, pre DMA transfer, source and destination memory blocks are dumped out. Then the source, destination and length registers are set up. Finally the flags register is written to, which triggers the transfer. After its completed the MPU is un-halted and execution resumes. The new destination memory block is dumped out, showing that the transfer has been successful.

Perhaps more interesting is a look at the key signals in the system whilst this is going on:

This shows the E clock, /HALT, BA, BS MPU lines, as well as /READ and /WRITE as generated by MuDdy. When BA and BS are high, the MPU has relinquished the busses and the DMA Controller can begin reading and writing bytes. The machine cycles before and after the DMA transfer is occurring are "dead" cycles; neither the MPU or the DMA Controller are owning the busses. This is an unfortunate facet of how the 6809 manages the busses.

The speed of the transfer is a byte copied every two machine cycles, or one cycle if the controller is only filling memory. This is significantly faster then 6809 code could carry out the same task.

Whilst working on the construction of the MAXI09 board, I found another weakness in the circuit. I had planned to be able to swap in a 6309 at some point, so I could experiment with the part. However, after I'd submitted the PCB design for manufacture, I found a small but significant difference between the 6309 and 6809: the EXTAL pin, when the MPU is being clocked by an external oscillator, needs to be left floating instead of being tied to ground as it is with the 6809. If I'd know this I would have included a jumper between the pin and ground. Whilst there are "dirty" workarounds for this problem, such as pin removal, I think I will stick to using the 6809 in the board.

I'm now in the process of tidying up my somewhat rushed VHDL. One of the things I have struggled with is writing "nice" designs, designs which aren't riddled with cut and paste and other nasties. I'm getting there though. Hopefully soon it will be presentable enough to put on github...