Saturday, 22 April 2017

Code reorganisation, debug monitor, IDE, MinixFS

I have restructured the source code tree for MAXI09OS. Source files are now grouped in directories, with a directory for drivers, another one for the core of the OS, etc. The assembly source files are no longer stuck in a single directory. The makefiles are, by necessity, a little more complex as a result. After toying with using make in a recursive (PDF) fashion, I've instead settled on using include files which enumerate the build files.

One other improvement is that the source for the 256 byte boot loader, described previously, has been folded into the main MAXI09OS repo, along with the flasher tool for transferring new ROM images to the MAXI09 board.

All in all, it's a neater setup even if it is a little over-engineered. I learned a bit more about make writing it as well. There are a few further improvements I could make if I wanted to, like a better header dependency system than simply doing a full rebuild whenever one of the headers is changed, but since a full rebuild takes only seconds it's probably pointless.

Back in the OS code itself, I have been improving my monitor by folding it into the OS as a kind of debug task. Commands are now words instead of a single letter. So "dump" instead of "d". Much nicer. The "command table" is handled through a generic subroutine, so other environments, like the Shell, can use the same mechanism.

While in use, the monitor task is the only one which will be scheduled. It is entered by hitting return on whatever IO device it is running on, which is usually a UART port. Interrupts are still enabled, but a flag variable is incremented which causes the scheduler, when it runs under the main ticker interrupt, to always return into the same task. In this state the monitor sees a mostly static system. This is the same mechanism (forbid()/permit()) used when a task needs to be the only task running in the system.
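The forbid()/permit() idea can be sketched in Python (the names and class are mine, purely illustrative of the mechanism, not the actual MAXI09OS symbols). Note the use of a counter rather than a boolean, so that nested forbid() calls behave sensibly:

```python
# Illustrative model of a forbid()/permit() scheduler lock: while the
# counter is non-zero, the ticker's scheduler pass keeps returning into
# the same task instead of round-robining.
class Scheduler:
    def __init__(self, tasks):
        self.tasks = tasks          # simple round-robin ready list
        self.current = 0
        self.forbid_count = 0       # >0 means task switching is disabled

    def forbid(self):
        self.forbid_count += 1

    def permit(self):
        assert self.forbid_count > 0
        self.forbid_count -= 1

    def tick(self):
        """Called from the ticker interrupt: pick the task to run next."""
        if self.forbid_count == 0:
            self.current = (self.current + 1) % len(self.tasks)
        return self.tasks[self.current]
```

Interrupts still fire and are serviced; only the switch decision is suppressed, which is why the monitor sees a "mostly" static system rather than a frozen one.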

I've added a command that shows memory stats, and the tasks in the system, or just the information about a specific task as this screenshot shows:
The "Largest" value from the memory command warrants some elaboration. This is the size of the largest free memory block. Since the simple linked list approach to memory management lacks a mechanism to coalesce free blocks, it is very prone to memory fragmentation. The Largest value might therefore be considerably less than the total free memory. Even after coalescing adjacent free blocks, free memory could still be fragmented.

I've also been working on making the monitor useful for exercising device drivers directly, without the need to write a specific test program.

With the sysopen command, it is possible to set the A register (which is usually the unit number) as well as the B register, which is used to set the baud rate in the UART driver but is otherwise not specifically assigned to a particular purpose.

The main motivation for doing this was to make it easier to write a driver for the IDE interface.

The IDE driver is for sector level access to the attached disk; the filesystem layer, described later, sits on top of it.

The same driver mechanism, and subroutines (sysopen, sysread, etc) are used for the IDE driver, except that in the case of sysread additional registers are used since the read is a sector read and not a byte read.

The following registers are used, in both sysread and syswrite:
  • X: the device handle, as usual
  • Y: the memory address to write to (sysread) or read from (syswrite)
  • U: the sector number (LBA) to start from
  • A: the count of sectors to read or write
Currently no interrupts are used, so the IO operations busy-wait until the device is ready to send (or receive) the data. There are obstacles in MAXI09OS to doing this which I'll write about later. In reality this would only really matter if MAXI09 was ever attached to a very slow, old hard disk. With a CompactFlash attached, an interrupt would most likely fire only a few iterations into the polling loop; such is the speed of the MPU in MAXI09 relative to what a CF would more usually be attached to. All that said, getting interrupts working with the IDE interface would be nice.
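The busy-wait looks something like this sketch. The status bits are the standard ATA ones (BSY=0x80, DRQ=0x08, ERR=0x01); read_status() is a hypothetical stand-in for reading the IDE status register on MAXI09:

```python
# Illustrative busy-wait poll on the ATA status register. BSY must be
# clear and DRQ set before data can be transferred; ERR aborts.
BSY, DRQ, ERR = 0x80, 0x08, 0x01

def wait_for_data(read_status, max_polls=100000):
    """Spin until the drive is ready to transfer a sector, or give up."""
    for _ in range(max_polls):
        status = read_status()
        if status & ERR:
            return False                  # drive reported an error
        if not (status & BSY) and (status & DRQ):
            return True                   # ready: data can be transferred
    return False                          # timed out
```

With a CF card the loop typically succeeds within the first few iterations, which is why polling is tolerable here.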

I'm also using syscontrol for a few things (more will come later):
  • IDECTRL_IDENTIFY - 0: perform an identify command on the drive and fill out the 512 byte sector of information referenced by the Y register
  • IDECTRL_READ_MBR - 1: read sector 0 into device handle memory and copy partition information into a system table
The partition table, which is a section of 4 lots of 16 bytes within the MBR sector, contains the start sector and length of each partition, as well as other non-pertinent data. The syscontrol action reads this in, and the start sectors are later used as offsets when doing IO on a partition. Currently no info about the partition table is returned by the syscontrol call. I will probably change this at some point so the user can view the table etc.
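The parse is straightforward; here it is sketched in Python against the standard MBR layout (table at offset 446, little-endian start LBA at entry offset 8 and sector count at offset 12, 0x55AA signature at the end of the sector):

```python
import struct

# Sketch of pulling the four 16-byte partition entries out of an MBR
# sector. Entries with a zero type byte are treated as unused.
def parse_partition_table(mbr):
    assert len(mbr) == 512 and mbr[510:512] == b'\x55\xaa'
    parts = []
    for i in range(4):
        entry = mbr[446 + i * 16 : 446 + (i + 1) * 16]
        ptype = entry[4]                             # partition type byte
        start, length = struct.unpack_from('<II', entry, 8)
        if ptype != 0:
            parts.append((start, length))
    return parts
```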

Inside the sysopen call, partitions map nicely onto units, with unit 0 being used to access the entire disk. The partition table information is used to calculate an offset for each partition/unit. At present the length of each partition is not enforced, so accesses to the first partition could overlap into the second, etc. This would be trivial to honour if I wanted to write some extra code.

This screenshot shows the monitor being used to open the IDE device and issue the IDENTIFY command. The first 128 bytes of the resultant sector are dumped out, showing the vendor and model:
Here's a look at the monitor being used to:
  1. Open the entire disk
  2. Read the MBR
  3. Close the entire disk
  4. Open partition 1
  5. Read in the Minix superblock (2 sectors) into memory location 0x4000
  6. Dump out the first 128 bytes of the superblock
(The way I worked out that this was a MinixFS superblock was to spot the 0x8f13 sequence at offset 0x10. This is the magic number sequence for a Minix filesystem with 30 character file names, byte swapped since the format on disk is little-endian.)

After implementing the low level IDE layer (and MBR reading) the next task was to write routines for dealing with the filesystem layer.

For anyone interested in the guts of this ancient filesystem I suggest you read this overview, as well as this look at some of the data structures. Needless to say, MinixFS versions 1 and 2 are about the simplest form of UNIX filesystem, all the more interesting (for me and this project) because the important structure fields are 16 bits wide.

The functionality splits nicely into two modules:


This wraps a mounted Minix filesystem handle.

It contains code to parse the superblock structure (including lots of little to big endian swaps), and a routine to read an arbitrary 1KB FS block, which calls into the underlying device, ie. the IDE layer, to do the reading at the calculated physical offset. This routine uses a block offset calculated when the filesystem is mounted (of course, the underlying IDE layer will apply its own partition-based offset). The filesystem-layer offset calculation uses fields from the superblock, including the fields which indicate the number of blocks used for the inode and data zone bitmaps.
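The arithmetic can be sketched like so, assuming the standard Minix V1 on-disk layout (boot block, superblock, inode bitmap blocks, zone bitmap blocks, inode table, then data zones, all in 1KB blocks):

```python
# Sketch of the mount-time offset calculation. The partition offset is
# applied separately, by the IDE layer underneath.
SECTORS_PER_BLOCK = 2    # 1KB filesystem blocks on 512-byte sectors

def inode_table_block(imap_blocks, zmap_blocks):
    """First 1KB block of the inode table: boot + superblock + bitmaps."""
    return 2 + imap_blocks + zmap_blocks

def block_to_sector(fs_block):
    """Physical sector, within the partition, for a filesystem block."""
    return fs_block * SECTORS_PER_BLOCK
```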

There's also a routine to read in an arbitrary inode. This uses a simple cache: the last disk block of inodes read in. If the requested inode has already been read in then there won't be any disk access.
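The caching rule looks like this sketch (names are mine). Minix V1 inodes are 32 bytes, so 32 of them fit in each 1KB block, and only a request outside the cached block costs a disk read:

```python
# Illustrative one-block inode cache. read_block() stands in for the
# real filesystem block read routine.
INODES_PER_BLOCK = 1024 // 32

class InodeCache:
    def __init__(self, read_block):
        self.read_block = read_block
        self.cached_block = None
        self.data = None

    def get_inode(self, inode_num, table_start):
        # Inodes are numbered from 1; find the block holding this one.
        block = table_start + (inode_num - 1) // INODES_PER_BLOCK
        if block != self.cached_block:
            self.data = self.read_block(block)   # disk access only here
            self.cached_block = block
        offset = ((inode_num - 1) % INODES_PER_BLOCK) * 32
        return self.data[offset:offset + 32]
```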

An interesting aspect of this module is that it is possible to have multiple mounted file systems in the system at the same time. However to keep things simple the init task is responsible for mounting a system-wide root filesystem.

Also, since an open "low level" device handle is the thing which is mounted, in theory the code as written supports adding other block devices under the filesystem code, say a SCSI disk attached to a controller.

Those good things said, I have not attempted to implement a VFS-type layer in MAXI09OS. Only Minix FS volumes can be mounted and the subroutine is called "mountminix". I did toy with writing more abstractions that would allow, hypothetically, the writing of a FAT16 layer without changing any user level code but in the end I concluded it would add yet more complexity and time and not really teach me anything useful.


This wraps an open "file", and presents the same entry points for an open file as the other drivers in the system, with the addition of a new sysseek call to move the file pointer around the file. It also contains other routines, for things like stat-ing a file by its filename.

The basic "thing which is opened" is a file referenced by its inode number. sysopen is called with X pointing to "minix", Y pointed at a mounted filesystem superblock handle obtained with mountminix, and U with the inode number. As usual the device/file handle is returned in X.

These handles are used for all types of "files". Opening one principally consists of locating and reading in its inode. This inode includes the "data zones" (block numbers) for the entry's data. For directories this data consists of an array of 32 byte directory entries AKA dirents. The structure of a dirent is trivial:
  1. 2 byte inode number
  2. 30 byte (max) null-padded filename
Thus to open a file by its filename the first thing to do is to open the directory the file is in. We have to start somewhere, and what we start with is the root directory inode, which in Minix is always inode 1. (Inode 0 is reserved.)

The dirent array is iterated through, possibly reading in additional data zones (filesystem blocks) if the directory contains more entries than would fit in a single filesystem block.

If a match of filename is found, the inode number for the file becomes known and its inode structure can be read from disk. The content - data zones - of the file can then be read in using the data zone pointers.
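The lookup over one block's worth of dirents can be sketched as follows (a zero inode number conventionally marks a deleted or empty slot):

```python
import struct

# Sketch of the dirent walk: each 32-byte entry is a 2-byte little-endian
# inode number followed by a null-padded filename (up to 30 bytes here).
def find_in_directory(dir_data, name):
    for off in range(0, len(dir_data), 32):
        inode_num = struct.unpack_from('<H', dir_data, off)[0]
        if inode_num == 0:
            continue                                  # unused slot
        entry_name = dir_data[off + 2:off + 32].split(b'\0', 1)[0]
        if entry_name == name:
            return inode_num
    return None
```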

The following screenshot shows the monitor being used to:
  1. Open the IDE disk.
  2. Read the partition table.
  3. Close the IDE disk.
  4. Open partition 1.
  5. Mount the MinixFS.
  6. Open inode 1, the root directory.
  7. Read in the first few dirents and dump them out.
As well as simple linear reading through a file it is also possible to seek to an arbitrary byte position, and continue reading from there.

One of the massive tasks I've not yet even started is writing to the filesystem. This is a pretty big job and opens up all sorts of interesting scenarios that need dealing with. For instance, how to deal with two tasks, one which is writing to a file, whilst another has it open for reading? Things get very challenging, very fast.

The last thing I have been working on is the command-line Shell. Currently it is very, very basic. You can perform the following operations:
  1. Change directory to a named directory (cd)
  2. List the current directory (list)
  3. List in long form the current directory (list -l)
  4. Output files by name (type)
These are internal commands, effectively subroutines in the ROM. But external commands should be perfectly possible too. Since I can now read from a filesystem, if the user enters the filename of a command that file will be opened, read into RAM, and jumped to as a subroutine of the Shell process.

The list functionality warrants a little discussion. In a simple listing, it is only necessary to iterate the dirents in a directory. The user cannot even tell whether the entries being listed are files or directories, since those attributes (the file "mode") are stored in the inode and not with the filenames. List in long form opens each file's inode and displays the type of the "file" (regular, directory, pipe, device node, etc), along with its file size, user and group etc. Thus list long, on MAXI09 as on a real UNIX/Linux, is more expensive than simply listing the names of the files.

I've yet to write a routine to output decimal numbers, so all the values for things like file size are still in hex.
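That routine, when it comes, is the classic repeated-division-by-ten conversion. As a sketch of the algorithm for a 16-bit value (on the 6809 the divide itself has to be synthesised, since the MPU has a MUL instruction but no divide):

```python
# Sketch of hex-to-decimal conversion by repeated division: peel off the
# least significant digit each pass, then reverse the collected digits.
def to_decimal(value):
    assert 0 <= value <= 0xFFFF
    if value == 0:
        return '0'
    digits = []
    while value:
        value, digit = divmod(value, 10)
        digits.append(chr(ord('0') + digit))
    return ''.join(reversed(digits))
```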

The following screenshot shows the Shell being used to wander around the Minix filesystem, listing directories etc:
The Shell is coming along quite nicely. Of course, what's not shown here is that it is possible to have multiple Shells running at the same time, on the virtual consoles and on the UART ports. There are a few limitations not shown in the screenshot, like the fact that you can only change directory into a directory within the current directory, not to an arbitrary path.

So MAXI09OS is coming along quite nicely, though you still can't really "do" anything useful with it yet.

I think I'll take a break from working on the software side for a while now and switch to working on the hardware:
  • There's the SPI controller VHDL code in DISCo to finish. I can then write a driver for it, and finally some utilities for setting and getting the date and time from the Real Time Clock.
  • The Citizen 120D+ printer I bought about a year ago has yet even to be powered on. I could have a go at writing a driver so I can print to it, much like text can be output to the console. This might also prove the need to implement command redirection.
  • I could have a go at finishing the building of an analogue joystick I started a year or so ago, and experimenting with that
  • The keyboard controller can generate a reset signal on a particular key combination, but I've not even experimented with getting that working yet
I've had MAXI09 up and running for more than a year now, and it continues to keep on giving with no end in sight...

Thursday, 19 January 2017

More progress.... Lots of different areas

Looking again at the VHDL for my interrupt router, I realised that adding a priority encoder was fairly simple:

Instead of being the masked and active interrupt lines, ROUTED is now the number (bit position plus one) of the active line. 0 is used as a special case: no interrupt lines which are enabled by the mask are active. This should never happen inside the ISR because when there are no active interrupts the MPU's interrupt lines should stay high, but it is catered for "just in case". The only downside with this method is that interrupt priorities are set in the VHDL. In other words the fact that a QUART interrupt overrides (and hides) an RTC interrupt can't be changed by MPU code.
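A software model of the encoder makes the behaviour clear (here the highest-numbered line wins; the actual ordering is whatever is fixed in the VHDL):

```python
# Model of the priority encoder: ROUTED is the bit position plus one of
# the highest-priority active, unmasked line; 0 means nothing is pending.
def routed(active, mask, width=8):
    pending = active & mask
    for bit in range(width - 1, -1, -1):   # scan from highest priority
        if pending & (1 << bit):
            return bit + 1
    return 0
```

A masked line simply never appears in the pending set, which is why a masked interrupt can't reach the MPU at all.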

This all means my main interrupt service routine is reduced to something approximating:

As each driver is opened, it will set up its interrupt handler routine pointer in the appropriate table, and set DISCo's interrupt mask bit to allow the incoming interrupt through to the MPU. For example, here is the relevant code in the UART driver:

This is much more efficient than before, when the main ISR had to manipulate mask bits while looping over a list of possible interrupt lines. The top level (IC level) interrupt processing is now nice and fast, and most of the time spent handling an interrupt is now used dealing with the actual interrupt-generating peripheral IC's registers and not "housekeeping".

I've also been working on the keyboard microcontroller code and have implemented, at last, key repeat and support for caps lock.

Key repeat was easier than I expected. Key repeat in any keyboard generally consists of two separately configured delay values:
  • Typematic delay: the pause between a key being pressed and the first repeat press being generated.
  • Typematic rate: the pause between subsequent repeated key press events being generated.

Both values have reasonable default values set in the MCU code. I have also introduced a command code so the 6809 can change these values via the UART channel.

Because of wanting to keep all commands in a single byte, I have had to be a bit frugal. The command byte is arranged as follows:
  • 0b00000000 - red LED off
  • 0b00000001 - red LED on
  • 0b00000010 - green LED off
  • 0b00000011 - green LED on
  • 0b00000100 - blue LED off
  • 0b00000101 - blue LED on
  • 0b00000110 - re-initialize the controller state
  • 0b01xxxxxx - set typematic delay to xxxxxx
  • 0b10xxxxxx - set typematic rate to xxxxxx

The delay values are scaled such that 63 (decimal) is a suitable - slow enough - value. There is currently no acknowledging of the byte sent to the keyboard controller. I will look at doing that when I have figured out transmission interrupts in the main computer.
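The encoding from the table above can be sketched directly (the function names are mine; the bit layout is as described):

```python
# Sketch of the single-byte keyboard controller command encoding.
def cmd_led(led, on):
    """led: 0=red, 1=green, 2=blue; bit 0 selects on/off."""
    return (led << 1) | (1 if on else 0)

CMD_REINIT = 0b00000110   # re-initialize the controller state

def cmd_typematic_delay(value):
    assert 0 <= value <= 63          # six bits of payload
    return 0b01000000 | value

def cmd_typematic_rate(value):
    assert 0 <= value <= 63
    return 0b10000000 | value
```

Packing everything into one byte is what forces the frugality: two bits of prefix leave only six bits for a delay value, hence the 0 to 63 range.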

From the 6809 user-level code point of view, sending a command to the controller is accomplished via a syscontrol call on any open console. However the delay values impact all consoles. Eventually I will write a system utility to set these values when the system starts up, as well as interactively.

I had a couple of ideas for how to program in the operation of the caps lock key and its LED. Ideally I wanted the LED to be turned on and off by the 6809 sending a control byte to the MCU when the caps lock key was pressed. This would allow the caps lock LED to be used, in a crude way, to determine if the system was responsive since the LED would only toggle if the 6809 was running and able to pick up the keypresses, generating a command to turn the LED on or off in reply.

This would have required a byte to be sent in response to one received, in the console driver's sysread function. The problem with this is that task switching would have to be disabled while the byte was sent, because another task might also want to send a byte to the keyboard MCU, for the same or a different reason. That is a clear millisecond at 9600 baud, and whilst I could increase the rate to sidestep this problem, I instead decided on a simpler approach.

The caps lock LED is handled entirely in the keyboard MCU code, which tracks whether caps lock is on or not. Communicating caps lock changes with the main 6809 MPU is done by sending a key down scancode byte (high bit clear) when caps lock turns on, and a key up sequence (high bit set) when caps lock is turned off. Cunningly, this mirrors what an ordinary shift key does. This means that the actual event of the caps lock key being released is not sent, but there seems to be no reason why the 6809 would ever need to know about this occurrence.

If anyone is interested, the code for my keyboard controller is available on github as normal. I'm pleased with how relatively simple the controller code has remained with these two bits of extra functionality.

One small "annoyance" that I noticed after making the initial changes to scancode translation in the console driver: caps lock does not behave exactly like shift on "real" keyboards. Indeed, some early computer keyboards had two "lock" keys, a shift lock and a caps lock. If caps lock is on, numbers should still be available instead of punctuation. This has complicated the 6809 scancode translation a little. Instead of yet another translation table, a call to a toupper subroutine is made to convert a byte, if it's a lowercase letter, to uppercase. This occurs - only if caps lock is on - after the shift and control key checks have selected an alternative scancode translation table.

I've also been busy improving the core OS itself. Tasks now have a default IO device. IO routines - character based like sysread, and the string based wrappers like getstr - have been wrapped in *defio variants which pull the default IO channel from a variable in RAM. For efficiency, this variable is updated from the copy held in the task structure each time a task is scheduled. It is also possible to set this IO device when a task is created.

With this work it is possible to use the same task code in multiple tasks. The "echo test" task, as I now call it, allocates the memory to hold the inputted string and uses the default IO channel to get and put the string. Multiple tasks all use the same code, without making a copy, and operate on two virtual consoles as well as a UART port. In other words this task code is reentrant. This method will eventually be used when the echo test task is replaced with something resembling a Shell and should be a significant memory saver.

One issue to address in most multitasking systems is how a task should exit. This is a surprisingly complex operation. I have borrowed some ideas from UNIX (massively simplified) and hold an exit code in the task's structure. The exiting child signals its parent which can, in turn, extract the task structure pointer (now free memory) and exit code, returned in x and a respectively.

I am still not sure if I'll use the above described mechanism for the running of all sub-tasks, since it is quite a bit of setup. After assuming for years AmigaDOS used this kind of mechanism in its CLI, I have since learned that instead external commands are simply read off disk and run as subroutines of the CLI task. The reason is obvious: efficiency. However, this simple technique makes pipelining commands more awkward, since the whole output has to be buffered for the next task (this probably explains why AmigaDOS lacked pipes). I will have to experiment when I get closer to having a useable Shell to determine which approach to use.

Finally, I've been working on improving the debug output which is presented, if enabled at assembly time, on UART port 2 - the TTL header. There are code, and text output, improvements.

Each debug message now has a "section type", for example messages related to the memory allocator are marked against DEBUG_MEMORY. At assembly time, it is possible to select which debug messages should be generated for that particular build. So if I have a problem with the interrupt processing I can choose to only output those messages, etc. Debug messages were previously always included in the generated binary; now they are only included if debug mode is enabled.

The messages also look neater because each one is prefixed by a section label. To print out that task address with the task name, the following code is used:

debugxtaskname ^'Scheduling task: ',DEBUG_TASK

This required some new uses of macros in ASxxxx. The macro has to allocate space for the message to print in the generated code, but it can't go directly into the stream of code since otherwise the MPU would try to run the message as if it were code. Instead a new "area" was created, called DEBUGMSG. This is towards the end of ROM and contains all the debug messages. This particular message only shows if DEBUG_TASK is enabled for the build.

Typical output from the debug serial line looks like the following:
All told, I'm pretty pleased with how my weird little OS is coming along. There's still lots of interesting things to work on:
  • I'm keen to try organising the code a little better. There are now about 30 files, and it would be nice to have a directory layout with drivers in their own directory etc.
  • More drivers: joystick, printer port - I have yet to try hooking my Citizen 120D+ printer up. The SPI bus needs an interface too.
  • Improvements to the console driver: I could look at some of the VT100 codes, and try to make the serial terminal more useable with them
  • Transmission interrupts on the UART. I still have yet to crack them.
  • Extend my, very preliminary, debug monitor. I have started "porting" my old monitor to the MAXI09OS so I can use it for debugging by dumping out things like task structures, the allocated memory blocks etc.
I could also have a crack at extending the VHDL in MuDdy and DISCo:
  • The IDE port needs a high byte latch implemented in DISCo
  • DISCo also needs a proper SPI controller, instead of the basic SPI pin interface which is currently written
Or I could have a think about how I will implement a storage interface in the OS. I have a few ideas, but more thinking and planning is needed...

Saturday, 15 October 2016

Console driver and other OS progress

The last few months' free time has been spent knee deep in the Operating System for my 8 bit micro. The main area of attack has been on implementing a console (screen and keyboard) driver, but I've been working on a few other things as well.

Up until recently I have not tried linking the code that makes up the monitor, or MAXI09OS. Instead the assembler source code files are concatenated together via include statements. While this works, it means all labels and other symbols are global across the code. It's generally not nice structuring things this way, especially as the project grows to be quite large.

MAXI09OS is now assembled into object files and linked using aslink, a component of asxxxx. This means that only symbols that need to be global are marked as such, and other named symbols can be file-local. In all, it is a much tidier arrangement. There are still some further improvements I'd like to make to how MAXI09OS is built, especially around the idea of producing header (include) files which the "user" programs will need. I also want to structure the source code tree so that drivers have their own subdirectory etc.

I have made a fair amount of progress on the console driver. It's far from the VT102-emulated terminal I have in mind for the "finished" version, but it is useable.

The first thing I did was split out the low level hardware aspects of the UART driver into a new source file. The reason for doing this is because the console driver shares much of the UART code for interfacing with the keyboard controller. Once this was done the UART driver itself was modified to use the low level functions and re-tested. As a consequence of the keyboard controller using the slower 9600 baud rate, the low level UART code also contains a routine for setting the rate. This is also useable by the higher level UART driver's open routine through the b register.

The first thing to get working, within the console driver, was keyboard input. This operates in a similar manner to the UART driver's reading operation except that within the interrupt handler the scan codes, obtained from the keyboard MCU, are fed through a translation routine to turn them into ASCII. This routine also deals with shift key and control key handling. As before, no attempt is currently made to deal with key repeat. The resultant ASCII characters are fed into the user side circular buffer, to be picked up by the sysread subroutine when the task is woken up by a signal.

One additional complication revolves around "virtual" console support. Like with UART ports, each console is a "unit". But obviously the input side (keyboard) and output side (screen) are shared. Since they require very little MPU resources, and only consume the relatively bountiful video memory in the V9958 VDC, there are six virtual consoles. In the keyboard interrupt handler the function keys F1 to F6 are checked and if one is pushed the appropriate console is displayed by writing to the "pattern name" base register. Video memory is arranged as follows:
  • 0x0000 - patterns AKA fonts
  • 0x8000 - console unit 0
  • 0x9000 - console unit 1
This continues up to console unit 5. Note a few details: I am only using the lowest 64KB of video RAM out of a possible 192KB, and also that although the consoles only consume 24 rows of 80 columns (1920 bytes), it is not possible to pack the consoles closer together because of limitations in the pattern name base register. Anyway, this all means that console switching is instantaneous, something I'm quite pleased with.
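The layout arithmetic is simple, sketched here with the 4KB console stride implied by the 0x8000/0x9000 addresses above:

```python
# Model of the virtual console video memory map: each 80x24 console
# needs 1920 bytes but occupies a 4KB slot, the granularity imposed by
# the pattern name base register.
CONSOLE_BASE = 0x8000
CONSOLE_STRIDE = 0x1000   # 4KB slots; 1920 bytes used, rest idle

def console_address(unit):
    assert 0 <= unit <= 5
    return CONSOLE_BASE + unit * CONSOLE_STRIDE
```

Switching consoles is then just writing one of these base addresses into the pattern name base register, which is why it is instantaneous.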

Actually getting the console to display something was accomplished by "porting" the previously written V9958 monitor VDC code into the console's write function. The code has been tidied up quite a lot but still only supports the very basic control codes: carriage return, new line, backspace, tab, and formfeed - which, per the norm, clears the console. I have also implemented vertical scrolling.

One quite serious restriction exists. Since the VDC manages its video memory via address pointer registers it is necessary to prevent any other task from manipulating the VDC's registers. The first, brute force, approach to this was to disable interrupts while the VDC is being used to stop other tasks from being scheduled. This certainly worked but it prevents all interrupts from being serviced, even if that interrupt had nothing to do with a task which might be using the VDC.

The solution I'm currently using is to leave interrupts enabled, but disable task switching. This is accomplished by having the scheduler return into the current task if task switching is disabled.
This solution remains suboptimal; there might be tasks waiting to run which are not using the VDC; they are needlessly blocked from running. The better way is to use a mutex to pause tasks that want access to the VDC once one task has obtained access to it. Other tasks would not be affected. I might try implementing some mutex routines, though I am concerned about them being efficient enough.

Putting this console driver with the work described previously, I now have a test setup for my little OS which includes the following test tasks, each running on their own virtual console:
  1. Firstly the timer-based task as previously described. It opens the console and a timer, starting a repeating one-second timer, until the space bar is pressed, at which point a 5 second non-repeating timer starts. It also shows the hex values for what is read from the console driver, ie. the ASCII values for key down events.
  2. Two copies of the "enter a string, print the string" looping task.
  3. A serial terminal. This ties together a virtual console and a UART port. Both are opened and then a loop entered. In the loop both devices are waited on. The device that caused the task to wake is read, and the opposite device is written to. UART port 1 is used here.
  4. I also have a task which performs the "enter a string, print the string" operation on UART port 0.
In general, things are looking pretty good. The timer task updates its console while another task's console is displayed, and the UART port presents its task simultaneously.

When it comes to the serial terminal task, there are a few issues, beyond the fact that the serial terminal is only honouring the very simplest control codes.

The big issue is with scrolling. The V9958 is a great video controller but it has no special hardware support for scrolling the screen in text mode. Instead the video RAM data has to be read into MPU RAM and then written out again, shifted up a line. For most uses of the console this is ok; the task just has some extra work to do to scroll the display. But for the serial terminal task this extra work needs to be done before the serial port's FIFO fills up.

This problem is, perhaps, unfixable, at least without hardware assistance. The problem lies in the time it takes the MPU to do the line scroll vs the time taken for the UART to receive a line of text. At 9600 baud each character takes 10 bit times, so a full 80 character line takes roughly 80 × 10 / 9600 seconds, which is 83ms. I have cycle counted the MPU scroll code and it takes - ignoring setup code - 43ms. This means everything is fine if the lines are long, but if a (say) 8 character line is received there is only 8.3ms to scroll the screen, which isn't long enough. Even with the UART FIFO and MPU circular buffer, data loss will eventually occur as "short" text lines are received.
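The timing argument, as arithmetic (the 43ms scroll figure is the cycle-counted value from above):

```python
# At 9600 baud each character costs 10 bit times (start + 8 data + stop),
# so the slack before the next newline arrives depends on line length.
def line_time_ms(chars, baud=9600, bits_per_char=10):
    return chars * bits_per_char * 1000.0 / baud

SCROLL_MS = 43.0   # cycle-counted MPU scroll time, setup code ignored

def scroll_keeps_up(line_length):
    """True if a line of this length arrives slower than one scroll."""
    return line_time_ms(line_length) >= SCROLL_MS
```

By this measure the break-even line length is around 42 characters; anything shorter, arriving back to back, will eventually overrun the buffering.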

The "solution" I have come up with is to scroll the screen in half-screen chunks instead of every line. This reduces both the frequency of scrolling that's required, and it also reduces the work required to do the scrolling since more time is spent filling the video memory with zeros instead of writing out previously read in data. The result of this is not very pretty - using the console feels a bit strange - but it mostly cures the data loss problem. Only mostly though; on very short lines - the yes command is a nice way to show the problem - occasionally characters are still lost.

There is, perhaps, a better solution involving the usage of the UARTs hardware flow control lines. Hardware flow control can be used to pause data transmission when the receiver is not ready to receive it. In theory this could be used to pause the transfer while the MPU is scrolling the display. Unfortunately despite playing with this for a few hours I cannot get it to work.

So far I'm very happy with my OS's progress. The scrolling problem is annoying but I'm going to move on - serial terminal support was always going to be used as a way to test the driver model instead of being anything actually "practical". The serial terminal is a nice way to improve, and learn about, terminal emulation though, when I get around to that.

There's a few different things I could work on next:
  • The previously mentioned VT102 improvements.
  • Some ability to measure how busy the system is would be nice. Currently I use the idle LED to get a rough idea for how idle the system is, but it would be nice to know which tasks are using the MPU the most.
  • Some means for open device inheritance when a task creates another task. Currently each task is responsible for opening a UART port or a console. It would be better if the parent task did it, as this would mean the same task code could be used regardless of the IO device. It would then even be possible to run multiple instances of the same task's code, provided the task either used only the stack or kept its variables in allocated memory, with the pointer to that memory stored on the stack (or perhaps in the u register)
  • Another massive chunk of the Operating System revolves around providing disk access to tasks. I have thus far given this problem very little thought.
I feel what I have now is at a nice point to go back and fill in some smallish missing parts before I start thinking about the disk routines, which will be a large chunk of work. Some smaller areas to tackle include:
  • Tidying up the source tree into different subsystem directories.
  • Improving the keyboard behaviour. Key repeat is something I've put off doing for months. And then there's support for the caps lock key.
  • Reworking the debug console to make the output cleaner and easier to read.
  • UART transmit interrupts.
  • Improving interrupt processing speed. This also involves some work on DISCo's VHDL. I have a few ideas for how to simplify, and thus speed up, interrupt processing.
  • More device drivers, for example a joystick driver would be interesting to do. It would need polled and event (changes) modes.
I've also made a short video showing the test tasks, including the serial terminal:

So many ideas, but so little time...

Friday, 17 June 2016

Prototyping parts of an OS for MAXI09

The past few months have been spent thinking about, and prototyping, ideas for the Operating System I want to write for the computer. But before talking about that, a summary of the other little pieces of hardware progress. Anyone who has watched the video will be familiar with most of this.

First up, SPI. I have implemented a trivial SPI controller in DISCo. It is really nothing more than an MPU interface to the SPI pins: SCLK, MISO and MOSI, as well as the chip selects SS0 through SS3. Bit shifting is done in the MPU, so it is far from a proper controller, but it is adequate to verify that the SPI peripherals have been wired up correctly. So far I have verified the function of the DS1305 (PDF) Real Time Clock, and the CAT25256 (PDF) 32KByte EEPROM. When I have less interesting things to do I will implement a proper SPI controller in VHDL, which will move the bit shifting from MPU code into logic in DISCo and speed up SPI data transfers immensely.

The behaviour of the OPL2 sound interface has also been tested more thoroughly than before. I did this by writing a 6809 routine to read in and play back DOSBox DRO format files, which had previously been written out by my laptop. I was surprised by the quality of the playback from the OPL2 on the MAXI09 board: it is indistinguishable from the DOSBox playback. But when recording it for YouTube the sound capture was poor, so I thought I would try again to do it justice. The following recording was made using a USB microphone:

So, to Operating Systems. Here is a brief summary of what I think should be achievable on my MAXI09 micro:

User's point of view
  • Multiple, at least 4, VDC+keyboard text consoles switchable via function keys
  • Serial console active concurrently
  • "DOS" like command shell: List directory, copy file, type file, delete file, create directory, run external command, etc
  • Date command; set and get time from the RTC
  • Possibly very simple batch files
  • If I really went crazy, I could think about writing an editor and assembler
  • The ability to act as a serial terminal for a Linux box, with basic VT102 terminal emulation
  • Fall back machine code monitor, entered via special key combination, based on already written one but with improvements such as command names instead of letters
Low level
  • Pre-emptive multitasking with at least 8 tasks
  • Signal based events and messages
  • No direct hardware access for tasks, with the exception of the VDC (necessary for performance reasons)
  • Byte-wise driver abstraction layer for at least: VDC text mode, keyboard, UARTs, parallel port, SPI, joysticks
  • Block abstraction layer for IDE
  • "Special" drivers for OPL2
  • Timer driver for repeat and non repeat delays, with an interface into the RTC via SPI abstraction
  • Polled or interrupt/signal IO
  • MinixFS support, read and write; likely whole files not proper byte-streams
  • Dynamic memory allocation; used by the system and user tasks
  • Runtime loadable modules for drivers and libraries
This feels very ambitious, but so was the MAXI09 hardware. I have no intention of writing an "8 bit UNIX"; others have already done this. Some elements, e.g. the multiple console idea, are borrowed from Linux. But the bigger influence is my usage of AmigaOS, with its light-weight multitasking.

I have a few bits of hardware in the MAXI09 to help with things which, on the surface, seem complex. The V9958 should make multiple consoles fairly trivial to do because the screen text for all the consoles can be held in video RAM at all times. Switching consoles should largely be a matter of writing to a couple of registers.

Lots of questions remain. One of the interesting ones is what the interface to the Operating System calls should look like. The "pure" approach is to use software interrupts. This is the preferred way because the service routine is isolated from the caller with a new stack frame, and hardware interrupts are automatically disabled within the service routine. It is also more robust in the face of future change than a jump table, because the number of the service routine to execute can be expressed as a simple register value. Nonetheless software interrupts carry a significant performance penalty compared to a simple jump table. For now I have not dealt with this problem; system calls are just like any other subroutine call.

Another big question mark hangs over my ideas for using MuDdy to implement an MMU. This will be useful for isolating tasks and should enable a single task to access the full 64KByte memory space. But for now I will ignore this idea.

I have begun by prototyping up some of the facilities which the OS (unimaginatively named MAXI09OS) will need: a dynamic memory system, linked lists, task switching, and the driver model. I've also written some test tasks to help me verify that everything is working.

Dynamic memory management is required principally because the number and size of the tasks, and other objects, in the system is variable. Tasks will need to allocate memory as they run, and the memory which holds a task's code will itself need to be allocated, by the OS, from the free pool. Without a multitasking environment, most 8 bit systems get by without a dynamic memory manager.

My prototype is simple. A memory block has the following form:
  1. Pointer to the next block, or 0 for the end of the list
  2. The size of the block (including this header)
  3. A byte for flags, which is currently used to indicate whether the block is free or not
  4. The data block bytes themselves
In this trivial design these blocks are arranged across the dynamic memory area, known as the heap, which is initialised as a single block marked as free. Allocating a portion of memory is achieved by scanning the list of memory blocks until a large enough free one is found. The block is then split in two: the first half is the claimed space, and the second half is the remainder of the original free block. The next pointers and the block sizes are adjusted appropriately to keep the list of blocks intact.

As an illustration of how this works, here is what the heap looks like after a sequence of operations is performed. The size of the heap is 100 bytes.

For ease of illustration all values are in decimal and the heap begins at location 0. First the freshly initialised heap:

Next, 10 bytes are allocated:

Only 5 bytes are usable. The 10 byte block is freed, and a 50 byte block is allocated:

Note the fragmentation, and the fact that the 50 byte block does not use any space from the freed up 10 byte block. This can be cured by coalescing adjacent free blocks, something I will get round to doing soon.

Anyway, despite (or perhaps because of) its small code size - only about 80 lines of 6809 assembly - my dynamic memory allocator seems to work quite well. To implement it I first wrote the routines, then constructed test routines around them using the existing monitor program.
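The allocator just described can be modelled in a few lines. This is an illustrative Python model, not the 6809 code; the 5 byte header size (2 byte next pointer, 2 byte size, 1 flag byte) is an assumption for the sketch.

```python
# Illustrative model of the first-fit heap described above. Each dict stands
# in for a block header: address, size (including the header) and a free
# flag. HDR = 5 bytes is an assumed header size for this sketch.

HDR = 5

class Heap:
    def __init__(self, size):
        # The heap starts life as one big free block.
        self.blocks = [{"addr": 0, "size": size, "free": True}]

    def alloc(self, n):
        want = n + HDR
        for i, b in enumerate(self.blocks):
            if b["free"] and b["size"] >= want:
                remainder = b["size"] - want
                b["size"] = want
                b["free"] = False
                if remainder > 0:
                    # Split: the second half stays free.
                    self.blocks.insert(i + 1, {"addr": b["addr"] + want,
                                               "size": remainder,
                                               "free": True})
                return b["addr"] + HDR   # data starts after the header
        return None                      # no block big enough

    def free(self, data_addr):
        for b in self.blocks:
            if b["addr"] + HDR == data_addr:
                b["free"] = True
                return

heap = Heap(100)
a = heap.alloc(10)      # first allocation sits at the bottom of the heap
heap.free(a)
b = heap.alloc(50)      # too big for the freed 10 byte block: fragmentation
print([(blk["addr"], blk["size"], blk["free"]) for blk in heap.blocks])
```

Running the same sequence as the illustration shows the fragmentation: the 50 byte allocation skips the freed block because, header included, it doesn't fit, exactly the behaviour that coalescing adjacent free blocks would cure.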

Next, linked lists. The requirements for the structure of memory areas in the heap are limited; the list will only ever need to be scanned one way. Therefore singly linked lists are appropriate: there is no benefit in maintaining links backwards through the chain. But for other OS “objects” this is not good enough - backwards scanning will be needed, so doubly linked lists would be beneficial.

Being familiar with AmigaOS, I have decided to implement the same optimisation as used in AmigaOS's Exec linked lists: a hidden node in the list header. This avoids some special casing in the list operations and generally makes for more efficient operation than a traditional list, which uses nulls in the header to indicate an empty list.

The following list operations are available:
  • Initialise the list (since it is not possible to simply fill the header with nulls) (initlist)
  • Append a node to the tail (addtail)
  • Prepend a node to the head (addhead)
  • Remove the node from the tail, returning it in a register (remtail)
  • Remove the node from the head, returning it in a register (remhead)
  • Remove a node from anywhere in the list (remove)
The lists are similar to “minimal lists” in AmigaOS terms. There is not, yet, a priority field, and thus no means to insert a node at its priority position. Nor is it possible to insert a node after a given node. Nonetheless, the operations available should be sufficient for my little OS. For anyone interested in this code, you can see it here.
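The hidden-node trick can be sketched like this, using the operation names listed above. Python serves as pseudocode for the 6809 routines; the two sentinel pseudo-nodes model the hidden node in the header.

```python
# Sketch of the Exec-style doubly linked list with hidden nodes in the
# header. Because head and tail sentinels always exist, addhead/addtail and
# remove never special-case an empty list.

class Node:
    def __init__(self, name=None):
        self.name, self.next, self.prev = name, None, None

class List:
    def __init__(self):
        # initlist: wire the header so the list reads as empty.
        self.head, self.tail = Node(), Node()
        self.head.next, self.tail.prev = self.tail, self.head

    def addhead(self, n):                 # prepend at the head
        n.prev, n.next = self.head, self.head.next
        self.head.next.prev = n
        self.head.next = n

    def addtail(self, n):                 # append at the tail
        n.next, n.prev = self.tail, self.tail.prev
        self.tail.prev.next = n
        self.tail.prev = n

    def remove(self, n):                  # unlink from anywhere, no search
        n.prev.next, n.next.prev = n.next, n.prev
        return n

    def remhead(self):
        return None if self.head.next is self.tail else self.remove(self.head.next)

    def remtail(self):
        return None if self.tail.prev is self.head else self.remove(self.tail.prev)

l = List()
l.addtail(Node("a")); l.addtail(Node("b")); l.addhead(Node("c"))
print(l.remhead().name, l.remtail().name)   # c b
```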

I have also made a start on a device abstraction layer. In the case of the UART driver this includes interrupts for reception.

A note on the terminology I'm trying to stick to. A driver is a set of routines for dealing with a type of device. Devices are opened for a driver type, with perhaps additional parameters passed in for things like the port to open.

Abstraction layers appear even in very simple operating environments. They are useful because they allow user programs to be written without regard for how input and output gets into them. For now, at least, the abstraction will be of an IO device, and not of a file on disk. It’s not yet clear how I will implement file IO. Another advantage of a driver abstraction is that the same generic routines can be used regardless of the underlying hardware.

The starting point for this is an “open device handle”. This is an object handle which can be passed to a “write” system function to output a byte on a particular device, be it the screen or a particular UART channel. Its nature as a handle means that the external user task need not be concerned with what it contains.

Opening a device first requires a driver object. The structure representing a driver has the following properties:
  1. Pointer to the driver prepare routine
  2. Pointer to the device open subroutine
  3. A device name (up to 8 characters plus a null)
The driver prepare routine allows global initialisation of all possible devices of that type to occur. These routines are run before the OS itself is fully running. This usually includes setting up the device list, but it might also include setting up interrupt routing if that is better done prior to a device being opened.

Drivers are currently held in a table in ROM. A routine, sysopen, takes a device name (null terminated) in the x register, and any other applicable arguments in other registers. For instance, with the UART driver the channel number (0 for PORT A, etc) is held in the a register. Once sysopen has located the address of the particular driver's open routine, that routine is then run. This performs the necessary functions for the driver. For instance, in the UART case it ensures the channel is not already in use, allocates memory for the "open device" handle, and then initialises the hardware in the UART IC. Finally it configures interrupts in the interrupt routing register inside DISCo. There will be more about interrupts later.
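The lookup could be sketched as follows. This is a Python model of the shape described above, not the real code; the uart_prepare and uart_open bodies are hypothetical and only show where the per-driver work happens.

```python
# Model of the ROM driver table and the sysopen lookup described above.
# Each entry carries a prepare routine, an open routine and a device name,
# mirroring the three-field driver structure in the text.

def uart_prepare(state):
    state["uart_in_use"] = [False] * 4        # hypothetical: ports A-D

def uart_open(state, channel):
    if state["uart_in_use"][channel]:
        return None                           # channel already claimed
    state["uart_in_use"][channel] = True
    # The real routine would also allocate the handle, initialise the
    # UART IC and configure interrupt routing in DISCo.
    return {"driver": "uart", "channel": channel}

DRIVERS = {"uart": {"prepare": uart_prepare, "open": uart_open}}

def sysopen(state, name, *args):
    """Locate the named driver and run its open routine."""
    drv = DRIVERS.get(name)
    return drv["open"](state, *args) if drv else None

state = {}
for drv in DRIVERS.values():
    drv["prepare"](state)                     # run before the OS is up

h = sysopen(state, "uart", 0)                 # open PORT A
print(h, sysopen(state, "uart", 0))           # second open of PORT A fails
```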

Upon a successful call to the “open device” subroutine, a device handle is returned. This handle is driver dependent but contains at least:
  1. A list node
  2. A pointer back to the driver table entry of which this device is an instance
  3. A pointer to the device's close routine
  4. Same for the read routine
  5. And the write routine
  6. A pointer to the generic "control" routine
The reason the close, read, write and control routine pointers are held in the open device structure is speed: since write, for example, takes the device structure pointer as an argument, if the write routine pointer were in the driver structure an additional indirection would be needed to locate it on each syswrite call. Nonetheless I may re-evaluate this decision.

The purpose of the control routine is to provide a mechanism to perform arbitrary operations on the device which are not data reads or writes. For example, the timer driver uses this to start and stop a timer. The UART driver will use it, some day, to allow sending a break signal, changing baud rates, etc.

Open devices are held in a per driver-type list, the purpose of which is to facilitate interrupt processing.

In the case of the UART driver, a whole load of additional information is needed:
  1. RX circular buffer (32 bytes)
  2. TX circular buffer (32 bytes, currently unused)
  3. ISR and "user" counters for the two buffers
  4. A pointer to the task that opened the UART channel (this is used for task switching)
  5. The allocated signal mask (signals are explained later)
  6. The base address for the hardware IO port for this port
  7. The port number
There's also some global state which is not part of the open device structure. This includes things like whether a port is in use. In theory this is redundant, because the port number is in the device record and the device record is pushed onto a list, but the list node was added relatively recently.

At the moment writes are not interrupt based, but reads are. In fact, on a relatively slow machine like the MAXI09, interrupt-driven writes for the serial port are not all that useful, since even with the action of the UART FIFO, the UART would need its write-based ISR servicing rather frequently, unless the baud rate was quite slow. Nonetheless I will implement interrupt-driven writes at some point. Interrupt based reads are, on the other hand, very useful, since the task handling the UART channel can be put to sleep whilst it waits for data, most likely from the remote end's keyboard. This allows the computer to run other tasks while it waits for UART activity, without the performance hit you'd see polling the port. This leads us on nicely to the next topic.
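The RX circular buffer and its two counters from the open-device structure above can be modelled like so. This is a sketch, not the 6809 code; the point it illustrates is that the ISR only ever advances the write counter and the task only the read counter, so the two sides need no locking against each other.

```python
# Model of the 32 byte RX circular buffer with separate ISR and "user"
# counters, as in the UART open-device structure above.

SIZE = 32

class RxBuffer:
    def __init__(self):
        self.data = [0] * SIZE
        self.isr_count = 0     # total bytes written by the interrupt handler
        self.user_count = 0    # total bytes consumed by the task

    def isr_put(self, byte):
        if self.isr_count - self.user_count >= SIZE:
            return False       # buffer full: this is where data loss happens
        self.data[self.isr_count % SIZE] = byte
        self.isr_count += 1
        return True

    def user_get(self):
        if self.user_count == self.isr_count:
            return None        # empty: the task would wait() here
        byte = self.data[self.user_count % SIZE]
        self.user_count += 1
        return byte

buf = RxBuffer()
for c in b"hello":
    buf.isr_put(c)             # bytes arriving from the UART ISR
out = bytes(iter(buf.user_get, None))
print(out)                     # b'hello'
```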

Multitasking. This is a massive area. Modern OSes on PC are huge beasties and, ignoring fluff like the User Interface, much of this complexity revolves around the multitasking environment presented to the user. But multitasking is possible on nearly anything that might be classed as a computer, from microcontrollers up. I have decided to attempt to implement a fairly simple, but effective, form of multitasking; it is loosely based, once again, on AmigaOS ideas, but obviously it is much simpler to work on the 8 bit 6809. Whilst it is preemptive - tasks and interrupt handlers can force other tasks to be scheduled - there is no priority value, so all tasks, except the special case idle task, are treated the same. The first thing to do was figure out how task switching should actually work from an MPU standpoint.

On a simple machine like a 6809, without any way to implement memory protection, task switching can be achieved by setting up a periodic interrupt and having the Stack Pointer adjusted so that when the RTI (Return from Interrupt) is executed at the bottom of the interrupt handler, a different path of execution is taken than the one the system was on when the interrupt was entered. If all of the machine's registers are also saved via the stack, then the system should behave just as if the task being switched into had never been paused while another task ran.

This setup requires some kind of record to hold basic information on each task, including the stack pointer at the time the task was last running, as well as space set aside for the task's stack. The memory allocation system, just described, is used to create space for these structures, which currently look like this:
  1. List node
  2. Saved Stack Pointer
  3. Initial Program Counter (this is only used to identify tasks when debugging, in lieu of a task name field)
Preceding this record is the task's stack, currently fixed at 100 bytes. Eventually tasks will get names, and possibly an ID number.

The location of the first byte in a task structure is used as a handle to the task. Creating a task consists of the following steps. Currently the only parameter to this subroutine is the initial Program Counter, i.e. the start of the task's code.
  1. Allocate 100 bytes (the stack) plus the size of the task record
  2. Calculate what the SP will need to be at task start
  3. Write a value into the stack for the Condition Code register which has the E (entire register set) bit set
  4. Write the initial PC into the stack
  5. Write the initial PC and SP into the task record
The stack for a full register set, as generated by an active signal on the IRQ line, looks like this:

From the 6809 Programming Manual

Thus the SP for a task, prior to starting it with an RTI, points to the address of the first stacked register (the Condition Code register). Upon the RTI instruction the machine state, including the Program Counter, will be pulled off and the task will be started. The MPU pulls off the entire machine state because the E bit in the Condition Code register is set.
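The task creation steps can be sketched as follows. This Python model assumes the standard 6809 IRQ frame (from the stack pointer upwards: CC, A, B, DP, X, Y, U, PC) and the 100 byte fixed stack from the text; everything else about it is illustrative.

```python
# Sketch of building a new task's initial stack so that an RTI starts it.
# Setting the E bit (0x80) in CC tells RTI to pull the entire register set.

CC_E = 0x80                      # Entire-register-set flag in CC

def initial_frame(pc):
    """12 byte IRQ-style register frame for a task that has never run."""
    frame = [CC_E]               # CC with E set, everything else clear
    frame += [0, 0, 0]           # A, B, DP
    frame += [0] * 6             # X, Y, U (16 bits each)
    frame += [pc >> 8, pc & 0xFF]  # PC: the task's entry point
    return frame

STACK_SIZE = 100                 # fixed per-task stack, as in the text

def create_task(entry_pc):
    stack = [0] * STACK_SIZE
    frame = initial_frame(entry_pc)
    sp = STACK_SIZE - len(frame)  # SP points at the stacked CC register
    stack[sp:] = frame
    return {"sp": sp, "initial_pc": entry_pc, "stack": stack}

t = create_task(0x8000)
print(hex(t["stack"][t["sp"]]))  # 0x80: CC with the E bit set
```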

The timer interrupt handler, which is where task scheduling usually occurs, is roughly described by the following steps:
  1. First the current task pointer is retrieved from the global used to hold it
  2. The Stack Pointer register is saved into the current task structure
  3. If the ready queue is empty then no tasks are in their runnable state, so schedule the idle task and proceed to step 5
  4. Otherwise, rotate the ready queue; the previous head ready task becomes the tail task
  5. In any case, update the current task pointer and pull off that task's stack pointer
Now we can RTI to run that task, which might be the idle task, or the same task as when this ISR was entered, or a completely different task.

One obvious optimisation to this sequence is to skip rotating the ready list (queue) if there is only a single task in the ready set - in that case we can just leave the ready list alone.
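The scheduling steps above, including the single-task optimisation, can be sketched like this. It is a Python model of the logic only; saving the Stack Pointer is reduced to a placeholder.

```python
# Sketch of the ticker-interrupt scheduler described above: save the
# outgoing task's state, rotate the ready queue (head becomes tail), fall
# back to the idle task when the queue is empty, and skip the rotation
# when only one task is ready.

from collections import deque

def schedule(current, ready, idle_task):
    current["sp"] = "saved"          # step 2: stash the stack pointer
    if not ready:                    # step 3: nothing runnable
        return idle_task
    if len(ready) > 1:               # optimisation: single task, no rotate
        ready.rotate(-1)             # step 4: head task goes to the tail
    return ready[0]                  # step 5: new current task

idle = {"name": "idle"}
ready = deque([{"name": "a"}, {"name": "b"}, {"name": "c"}])
print(schedule(ready[0], ready, idle)["name"])  # b: the queue rotated
print(schedule(ready[0], ready, idle)["name"])  # c
print(schedule(idle, deque(), idle)["name"])    # idle: empty ready queue
```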

Tasks can be in one of two lists: the ready list or the waiting list. Currently the running task is also in the ready queue; this simplifies some of the logic. Tasks are normally in the ready queue unless they are blocked waiting for an event; a signal. Signals are generated either by another task, or by an interrupt handler. Signal semantics are loosely copied from AmigaOS signals:
  • One difference is that only 8 signals are available instead of 32
    • At a future time I will utilise one special bit for forced task termination
  • Tasks, and drivers, obtain a signal (actually a mask with a single bit set) via allocsignal
  • Waiting, via wait, takes a bit mask. If any signal bit becomes set, the task wakes up. Also, if the signal is pending before wait is called then the bit is cleared and the task skips the wait. In either case, the bits that woke (or would have woken) the task are returned
  • Signalling, via signal, takes a task structure pointer and a set of bits. There's also a variant of this routine, intsignal, for use by interrupt handlers
So far the only use of signals is so that tasks can sleep (block) while they wait for serial input. This is really nice as it means that tasks that are just sat there waiting for button presses don't chew up MPU time. A further usage of signals is so that tasks can be informed of events, such as tasks being told to exit - eg. by a keystroke like Ctrl-C. AmigaOS built its messaging system (pending messages attached to either an anonymous or named list) with signals being the main mechanism used to inform the recipient of a new message. This was used for everything from GUI events to a high level application automation mechanism - via Arexx. I'm not yet sure if I'll need a message system for MAXI09OS. It would be trivial to implement on top of the pieces I have; I'm just unsure if I will need such a system yet. In addition, a concern I have about this kind of system is that each message will require a memory allocation, which will invariably lead to problems with memory fragmentation.
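The signal semantics just listed can be sketched as a small model. This is illustrative Python, not the 6809 routines; the 8 bit limit and the allocsignal/wait/signal names come from the text, while the Task fields are assumptions.

```python
# Model of the signal semantics above: allocsignal hands out a single-bit
# mask, wait returns immediately if a requested signal is already pending
# (clearing those bits), and signal sets bits and wakes a sleeping task.

class Task:
    def __init__(self):
        self.allocated = 0    # bits handed out by allocsignal
        self.pending = 0      # signals received but not yet waited on
        self.asleep = False

def allocsignal(task):
    for bit in range(8):      # only 8 signals, unlike AmigaOS's 32
        mask = 1 << bit
        if not task.allocated & mask:
            task.allocated |= mask
            return mask
    return 0                  # no free signals

def wait(task, mask):
    got = task.pending & mask
    if got:                   # signal already pending: skip the sleep
        task.pending &= ~got
        return got
    task.asleep = True        # real code blocks here until signalled
    return 0

def signal(task, mask):       # intsignal would be the ISR-safe variant
    task.pending |= mask
    if task.asleep and task.pending & mask:
        task.asleep = False   # move the task back to the ready queue

t = Task()
uart_sig = allocsignal(t)     # 0b00000001
timer_sig = allocsignal(t)    # 0b00000010
signal(t, timer_sig)
print(bin(wait(t, uart_sig | timer_sig)))  # 0b10: the timer signal woke us
```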

Actual interrupt handling is an involved process. I must say I am not especially happy with the sheer number of steps required to process, say, a UART interrupt. DISCo's interrupt routing makes things marginally more efficient, but the very rough order of processing is still something like:
  1. Read the interrupt status register from the DISCo FPGA
  2. Loop through the claimed interrupt signals, in priority order
  3. Jump to the handler when a match is found
  4. In the device handler (UART in this case) loop through the active channel list
  5. From the device pointer, obtain the start address for the UART channel
  6. Get the UART port IRQ status (on the 6809 this can be done with: lda ISR16C654,x)
  7. Test it to check the event type
  8. For reads, jump to the UART read handler, passing the device handle and the base address for the channel
  9. Get the line state to see if there are bytes in the FIFO (lda LSR16C654,x)
  10. Loop testing for RX having data, writing it into the MPU's 32 byte circular buffer
  11. When the buffer is full, or there is no more data, exit that loop
  12. Schedule the task which owns the port by moving it to the top of the ready list
  13. Enter the scheduler, which will save the current task's state (the current task probably isn't the one that owns the UART channel)
  14. Make the top of the ready list task the active one by setting the Stack Pointer up for it
  15. Finally we can RTI
In the case of the UART the FIFO helps a lot. Even with it I have still had to reduce the console baud rate to 19,200 for fear of losing data. A viable alternative to lowering the data rate would be to introduce hardware flow control, but I'm not that keen on that. I'm sure further optimisations in interrupt processing are possible anyway.

I have implemented two additional drivers just to prove the model: one for the LED attached to DISCo, and one for timer events.

The LED driver is just pointless fun. Ordinarily I don't want tasks banging on hardware directly. Eventually an MMU implementation should prevent it. To show how hardware would always be accessed through a driver layer I have written a driver for the simplest hardware on the board: an LED. Writing a 1 turns the LED on; you can probably guess what writing a 0 does. To show the utility of a driver even for this trivial hardware, only one task can open the LED at a time. And the task which owns the LED is currently the idle task, which strobes the LED as it runs so you can tell how busy (or idle) the system is.

The timer driver is more useful. The system itself uses a 40Hz interrupt to schedule tasks, and this driver hangs off of it. Each instance of an open timer device allows for a single timer action, which is configured via syscontrol. Timers can be stopped or started. When started, the interval and whether the timer is one-off or in repeat mode can be set via a simple command block passed in y.

The timer device interrupt handler is fairly straightforward. It has to scan the open timer devices, updating counts and dealing with repeating timers. If a timer has reached its interval value the task that owns the timer is signalled.
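That handler could be sketched as follows. The 40Hz tick and the repeat/one-off behaviour come from the text; the field names and dict layout are illustrative, and the signalled list stands in for intsignal.

```python
# Sketch of the timer driver's interrupt handler: each tick of the 40 Hz
# system interrupt walks the open timer devices, counts them up and signals
# the owning task when a timer reaches its interval. Repeat-mode timers
# reset; one-off timers stop.

TICK_HZ = 40

def timer_tick(timers, signalled):
    for t in timers:
        if not t["running"]:
            continue
        t["count"] += 1
        if t["count"] >= t["interval"]:
            signalled.append(t["owner"])   # stand-in for intsignal
            if t["repeat"]:
                t["count"] = 0             # go round again
            else:
                t["running"] = False       # one-off: stop the timer

# A one-second repeating timer and a one-off 5 second timer, as in the
# first test task described below.
timers = [
    {"owner": "task1", "interval": 1 * TICK_HZ, "count": 0,
     "repeat": True, "running": True},
    {"owner": "task2", "interval": 5 * TICK_HZ, "count": 0,
     "repeat": False, "running": True},
]
fired = []
for _ in range(6 * TICK_HZ):               # simulate six seconds of ticks
    timer_tick(timers, fired)
print(fired.count("task1"), fired.count("task2"))  # 6 1
```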

So far I have written two test tasks.

The first task is the more interesting of the two. I wrote it so I could test timers and waiting for multiple signals in one task.

The task starts by opening UART port A, and a timer. The timer is initially configured to repeat once a second. The main loop of the task waits on both devices' signal bits, and indicates, via the UART port, which of the two signals was received. In the case of the UART, outstanding bytes are pulled down via sysread until no more bytes are outstanding. To make things more interesting, if a space character is read the timer is reconfigured, via syscontrol, to be a non-repeating 5 second one.

This all works quite well, as the following screenshot tries to show:

The top of the screenshot shows the periodic (one second) timer expiring a couple of times. Then I typed h, e, l, l, o and finally space (20 hex), which turns the timer into a 5 second non repeating timer.

The second one is a version of one of the first test programs I wrote in 6809 assembly: a loop that prompts for a string, gets the string, and then outputs the string. This task makes use of two wrapper functions, getstr and putstr that wrap the device routines for byte-based IO. The getstr routine uses wait when sysread indicates there is no data available yet. It is not possible for these routines to read from multiple devices at the same time, or to do anything else at all whilst getting the string. The source code for these routines is available here.

Here's a not very interesting screenshot of this task running:

Of course this is all the more interesting because both tasks were running at the same time. I have even had two computers (the regular Linux box and the MacBook) attached at the same time, one for each task.

I must have a bug somewhere though, as very occasionally I have seen system lock ups. To try to diagnose this, and earlier since-corrected problems, I wrote a simple debug handler using the spare UART port C (the TTL level channel). So far no luck figuring out this particular problem, but I'm sure I will work it out soon.

All in all I am pleased with how my little OS is coming along. There's obviously a great deal more still to do, but I'm very happy that large pieces of the OS core are starting to take shape. The next thing I will start on, after fixing the lock up bug, is the VDC console driver...

Saturday, 27 February 2016

Physical construction of the MAXI09 PCB is now complete

Physical construction of the MAXI09 PCB is now complete. A final, 3 hour, session with the soldering iron completed that task. Here is a picture of the top of the board:

I have also found a solution to the problem of unwanted flux residue. After trying out various things including isopropyl alcohol, it seems that the best approach is good old white spirit applied with an old toothbrush, followed by a rinse under the tap. Thanks to my work colleague Rebecca Gellman of the Retro Computer Museum for the tip! I'm no longer embarrassed to include a picture of the bottom of the board:

You can see the fix for the MEMRDY and /DMAREQ pins, which sour the otherwise error-free board.

The other piece of construction I've done is to, albeit temporarily, mount the PCB and keyboard on a piece of acrylic. I hope to eventually redo this in a more permanent way, but for now I can at least type on the keyboard. MAXI09 now resembles a real computer!

Here is a picture of the computer. At the time it was plugged into the living room TV:

This picture, if you look closely, shows a Compact Flash attached to the IDE port. I was pleased that this works nicely, as before, though thus far the CF is operating only in 8 bit mode, and without the help of the DMA Controller.

Although there remains much work to be done, both in programmable logic and in terms of software, MAXI09 can now be used almost like a real 80s computer:
  • Attaches to a TV/RGB monitor
  • Keyboard
  • Joystick ports
  • Sound
  • Mass storage
To illustrate the above, here's a picture of the computer playing the Snake game I wrote a year or so ago for my previous board:

The Snake game was loaded from the Compact Flash, with the keyboard used for operating the computer.

The computer's main limitations are down to the software. My Monitor is not a proper operating environment. Eventually I hope to write a simple multitasking Operating System, but in the meantime the Monitor can be used to test the hardware, including the parts implemented in programmable logic, to try out ideas for the OS I hope to write, to investigate interrupts, etc.

Another thing I have been working on is a boot loader. The main purpose of this is to allow the EEPROM to be reprogrammed regardless of its content. One of the flaws with the previously implemented in-system EEPROM reprogramming is that the code to reprogram the device is stored in the EEPROM itself. If the reprogramming fails, maybe because of a power interruption, then the computer is un-bootable, necessitating a pull of the EEPROM and a reprogram on my home made EEPROM programmer. The solution to this problem is to put the reprogramming code somewhere else. In the case of the MAXI09 there is one interesting possibility: the MuDdy FPGA. The Flex10K (PDF) parts are interesting because, as well as programmable logic, they also contain RAM bits. These can be used for anything, including data logging or lookup tables for maths functions, but a simpler use is to expose the memory on external pins. It is also possible to pre-load the memory with content set at configuration time. And removing the memory array's write line makes it behave as a ROM.

The EPF10K10 has 3 lots of 2048 bits, or 3 by 256 bytes of RAM, called EAB (PDF) by Altera. 256 bytes is enough to implement a flash routine in 6809 machine code. This leaves 2 by 256 bytes for other purposes, like the MMU functionality when I get round to implementing it. It might also be possible to implement a boot loader which reads the OS code from blocks on the IDE device, or receives the image via XMODEM.

One further complication in my boot loader is that it needs to overlay the topmost page of EEPROM, at 0xFF00 to 0xFFFF. This is necessary because after the boot loader has switched to the runtime system (the OS or, as it currently stands, the Monitor), that image, and not the loader, needs to occupy the top page so that the interrupt vectors are in place.

This is accomplished by having another register in MuDdy which controls what is presented at the top page. A zero, the default, puts the loader FPGA ROM there; a one puts the EEPROM at that page. For completeness it is also possible, via another bit in the same register, to write protect the EEPROM. This is in addition to the write protection hardware jumper.

This register is called MUDDYSTATE, although this is probably a bad name for it. Bit 0, when 1, write enables the EEPROM. Bit 1, when 1, maps the EEPROM onto the topmost page. All other bits are currently unused and will be ignored here.
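The bit layout can be summarised with a small decoder. This is only an illustration of the register semantics described above; the constant names are mine, not taken from the MAXI09OS source:

```python
# Hypothetical names for the MUDDYSTATE bits described above.
MUDDYSTATE_WREN  = 0b01  # bit 0: 1 = EEPROM writes enabled
MUDDYSTATE_EEMAP = 0b10  # bit 1: 1 = EEPROM mapped at the top page

def describe(value):
    """Decode a MUDDYSTATE value into a human-readable summary."""
    top = "EEPROM" if value & MUDDYSTATE_EEMAP else "loader ROM"
    wr = "enabled" if value & MUDDYSTATE_WREN else "disabled"
    return top + " at top page, write " + wr

print(describe(0b00))  # loader ROM at top page, write disabled
print(describe(0b11))  # EEPROM at top page, write enabled
print(describe(0b10))  # EEPROM at top page, write disabled
```

0b10 is the normal running state: EEPROM mapped in, writes disabled.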

A simplified description of the new MAXI09 boot process is as follows:
  1. MUDDYSTATE is set to 0b00 after MuDdy has been configured.
  2. 6809 resets into the loader code
  3. Loader outputs start-up banner saying: press f to flash
  4. Two second, approximately, wait for input
  5. If no input, jump to the last step
  6. If we got an f then the operator wants to reprogram the EEPROM
  7. Copy the loader image, 256 bytes, to a scratch area of RAM
  8. Calculate the new position for the programming routine and jump to it
  9. Enable writing to the EEPROM, and map it into the top page by writing 0b11 into MUDDYSTATE
  10. Copy 16 KByte of 64 byte pages from the serial port to the EEPROM, much as done previously
  11. Disable writing to the EEPROM by writing 0b10 to MUDDYSTATE
  12. Send the new content of the EEPROM back to host computer so it can verify the content, much as before
  13. Jump to the top of EEPROM, at 0xC000, which will start the OS or Monitor
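The decision logic in the steps above can be sketched as follows. The callable names are hypothetical stand-ins: the real loader is 6809 assembly, and the copy-to-RAM relocation in steps 7 and 8 is omitted here:

```python
import time

def boot_flow(getchar, flash_eeprom, write_muddystate, jump_to_os,
              timeout=2.0):
    """Sketch of the loader's control flow (the real code is 6809 asm).

    getchar() polls the serial port and returns a character or None;
    the other callables stand in for the routines described above.
    """
    print("MAXI09 loader: press f to flash")
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:    # step 4: roughly two second wait
        if getchar() == "f":              # step 6: operator wants to flash
            write_muddystate(0b11)        # step 9: write enable, map EEPROM
            flash_eeprom()                # steps 10 and 12: program and verify
            write_muddystate(0b10)        # step 11: write protect again
            break
    jump_to_os()                          # step 13: enter the EEPROM image
```

With no input the loop simply times out and the loader falls through to starting the OS or Monitor.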
Inside the OS, almost the very first thing it needs to do is write 0b10 into MUDDYSTATE, in case the EEPROM was not just flashed. This ensures the loader ROM is not at the top page regardless of whether the EEPROM was previously reprogrammed, and that writing to the EEPROM is disabled.

The loader code is cobbled together from various bits of Monitor code, including serial routines and the reprogramming code itself. Even after I finally implement it fully, serial IO inside the loader program will always be polled and not interrupt driven. This is both to reduce the size of the code and to keep it as simple as possible. Currently the loader program is about 220 bytes in length, so not much more room is available.

Actually getting the loader code into the MuDdy FPGA is an interesting process. After producing the 256 byte loader binary via the ASxxxx assembler, it needs to be converted into a format that the Altera tools require. Two file formats are supported: Intel HEX and Altera's own Memory Initialization File (MIF). After trying and failing to get the ASxxxx linker, aslink, to produce compatible Intel HEX files, I eventually settled on writing a simple Perl script that converts a binary file into the MIF format. This file is then associated with a RAM array VHDL component which in turn is instantiated by a wizard within the Quartus tool.
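A minimal Python equivalent of that Perl script might look like this. The header fields are the standard MIF ones; the choice of hex radices and 0xFF padding for short images is my assumption, not necessarily what the original script does:

```python
def bin_to_mif(data: bytes, depth: int = 256, width: int = 8) -> str:
    """Convert a raw binary image into Altera's Memory Initialization
    File (MIF) format, padding short images with 0xFF."""
    assert len(data) <= depth, "image too large for the memory"
    lines = [
        f"DEPTH = {depth};",
        f"WIDTH = {width};",
        "ADDRESS_RADIX = HEX;",
        "DATA_RADIX = HEX;",
        "CONTENT BEGIN",
    ]
    for addr in range(depth):
        byte = data[addr] if addr < len(data) else 0xFF
        lines.append(f"  {addr:02X} : {byte:02X};")
    lines.append("END;")
    return "\n".join(lines) + "\n"

# Example: a 256 byte image that is all RTS (0x39) opcodes
print(bin_to_mif(bytes([0x39] * 256)).splitlines()[5])  # "  00 : 39;"
```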

The summary of all this is that the loader works very well. Regardless of the state of the EEPROM, I can always recover the system. It is even possible to start the computer without the EEPROM socketed, though all you see at the serial console is the prompt to start the reprogramming process.

As I've said, there is much more to be done before starting properly on planning the OS and related software.
  • Whilst the IDE interface works fine in 8 bit mode, I want to get 16 bit transfers implemented. This will involve implementing a latch in DISCo to hold the high byte of the 16 bit IDE data bus, as I've described previously.
  • Also, now that I have a working, albeit 8 bit, IDE interface, it would be nice to modify my somewhat ropey low-level IDE routines to make use of the DMA controller implemented in MuDdy.
  • Although the Amiga 600 keyboard is functional, the work needs finishing. Scan-codes need to be interpreted for the cursor and other non-alphanumeric keys, commands need to be added to twiddle with the RGB LED and caps lock LED, a keyboard-activated reset is needed, etc.
  • Now that the SPI peripheral ICs have been attached to the board, I have no excuse not to implement an SPI bus controller inside DISCo. It will permit access to the DS1305 (PDF) Real Time Clock, the analogue joystick pots, and the 32KB SPI EEPROM.
Another piece of hardware I want to play with is my "new" printer. I found, through eBay, a Citizen 120D printer. Here's a stock picture of one of these 9 pin dot matrix printers:

I paid a good price, and it is complete with a manual and even the C64 interface module. This was the first printer I ever owned; I used one with my Amiga 1200 in the 90s. I'd like to test that the parallel port on the MAXI09 board works properly by sending the printer data from memory, using a Monitor command specially written for this purpose.

Another thing I hope to do very soon is record a video of the MAXI09 in action...

Update: And here is that video. The "production values" are terrible, but hopefully someone will find it interesting: