Big Mess o’ Wires


An 8-bit home-built CPU

Multitasking Success

After a LOT of fiddling, I finally pulled off a successful multitasking demo. It’s not the most exciting thing in the world to watch, but once you understand what’s happening, you’ll appreciate the nerd factor. The photo below shows BMOW powering-up and starting two processes, which are then multitasked to create the appearance of simultaneous execution. First the boot code prints “boot” to prove that the machine is alive, and then a few punctuation marks that serve as status codes while it’s initializing the two processes. Following that is the output from the processes themselves. One prints the letters A-M, and the other prints the digits 0-9.

multitasking demo shot

The two processes have different delay lengths between each printed character, which is why the sequence doesn’t appear strictly as letter-number-letter-number. The differing delays were intended to help expose race conditions: wait long enough, and the two processes should attempt to print to the LCD at the same time, exposing any possible bugs with resource contention.

How it Works

BMOW multitasking required writing a simple process manager and scheduler, and takes advantage of the machine’s semi-protected memory banks. BMOW memory is organized into banks of 64K each, with the OS occupying bank 0, and other processes each occupying a private bank. Code executing outside of bank 0 is unable to directly view or modify memory in other banks. It can only invoke specific system procedures to do such work on its behalf, using the JSP instruction (jump to system procedure).

At power-on time, the boot code copies the letters program into bank 1, and the numbers program into bank 2. It then calls a scheduler routine to add process table entries for the two processes. This routine sets up some initial stack data in the appropriate bank, to mimic what would be on the stack if the process had already been running, but was suspended by an interrupt. It then initializes the process table entry to contain the process’ stack pointer value, and marks that table entry as active. Finally, the boot code sets the real-time clock to generate an interrupt every 2 ms, and halts.

Two milliseconds later, the first interrupt is triggered. The interrupt service routine first pushes the current register contents and program counter value onto the stack. Since each process (each bank) has a separate stack, this means the registers are saved onto the stack of whatever process was interrupted. Then the ISR checks to see if it’s a timer interrupt (it could also be keyboard or USB). If it’s a timer interrupt, it calls a scheduler routine to find a new process to run.

The scheduler stores the value of the stack pointer into the current process’ table entry. Then it scans the process table, beginning at the next entry after the current process, looking for active entries. When it finds one, it copies the value in the process table entry into the stack pointer, and then executes the RTI instruction (return from interrupt). This instruction pops the program counter and register contents off the stack, and resumes execution of the new process where it left off.

Each process knows nothing about the other processes, and thinks that it’s in complete control of a 64K machine. From the point of view of the process, it can’t even tell that anything special happens when it’s suspended and restored, except that some time has passed that can’t be accounted for. It’s a bit like an alien abduction for a BMOW process.

Observant readers may have wondered: where do the letters and numbers programs come from? On a typical computer, the OS loads and runs programs from disk or some other storage medium, but BMOW has none. For this demo, the programs are compiled separately, and then embedded into the OS itself, which is stored in ROM. The programs appear as data to the OS, and it just blindly copies their bytes into the appropriate memory banks. This works, but obviously isn’t an ideal solution, because it ties the OS and the demo programs together into a single unit. It’s a reasonable work-around until BMOW gets some storage medium that can be altered independently of the OS boot ROM.

The Road from Here to There

Planning everything out and writing the scheduler code was actually pretty simple. I had the multitasking demo working fine on the BMOW simulator a week ago, but on the real hardware, none of the processes would start. Then I procrastinated looking into it further, and built the Robo-Flower instead (see previous post).

It turned out that there were multiple hardware problems to overcome. After a lot of experimentation, I was able to determine that the processes weren’t starting because their program codes weren’t correctly copied into their memory banks. A bug in the memory copy routine, maybe? But then why wouldn’t it show up in the simulator? I tried writing a simpler memory copy routine just for this purpose, and it worked, but I was stumped as to why, and the simpler routine wouldn’t work in the general case.

Finally I noticed that my original copy routine was doing an indirect load through a pointer that happened to be stored at an address ending in 7F hex. Reading the two byte pointer required incrementing the address from 7F to 80 in order to read the high byte. On a hunch, I changed the program slightly to store the pointer at 80 instead (incrementing to 81 for the high byte), and it worked!

This looked suspiciously like a very nasty clock glitch problem I ran into a few months ago, that laid waste to BMOW for a while. In that case, the low byte of the program counter would sometimes increment from FF to 01, or from 7F to 81, as if it were being incremented twice. Now I had the address register doing the same thing. I “fixed” the problem with the PC by making a nonsensical change to the GAL equations that implemented it, but I never really resolved the underlying issue. Unfortunately when I made the same change to the address register GAL equations, it didn’t help.

Back when the PC incrementing problem occurred, I was pretty sure it was due to some noise or signal reflection on the clock wire. I tried a couple of quick signal termination tests, but they didn’t help, and I didn’t follow-up at the time. Now I was considering more signal termination experiments again, when it occurred to me to wonder why I had my clock signal daisy-chained to so many chips to begin with. The clock signal comes from a buffer with eight outputs, so I could easily run a new clock line to some of the chips, shortening the existing clock line, and hopefully reducing noise and reflections. All it took were a few minutes with the wire wrap tool, and it worked like a champ! I don’t know why I didn’t do that in the first place.

I wasn’t quite out of the woods yet, though. The two processes ran now, but the output was a mess. Every 10-20 characters was some garbage symbol rather than a letter or number, and the display didn’t word-wrap properly. After an initial heart attack, I quickly realized this was likely the result of a bug in the microcode for BRK (break) that I fixed a while back, where the X register wasn’t correctly preserved across interrupts. Although I’d made a fix in the microcode source a few weeks ago, I’d postponed reprogramming the micro-ROMs, since every time I touch them something bad seems to happen. This time proved to be no exception: I reprogrammed the micro-ROMs, powered up the machine, and it failed to boot at all. After a second heart attack, I tried pushing and prodding the micro-ROMs until BMOW came back to life, which it eventually did with successful results. Obviously there’s a loose connection somewhere on one of the micro-ROM pins, which will undoubtedly come back to haunt me later.

Future Plans

I’m pretty happy with how the multitasking demo worked out, and I can now check off one of the “stretch goals” listed on the About BMOW page. To really finish the multitasking support, though, there’s still a fair bit of work left to be done. For one, there’s no way to stop a process once it’s started! There’s also no way to get info about running processes. A lot of time is wasted in the scheduler and the system procedure interface that could be optimized. There’s no test-and-set atomic operation that can be used as a semaphore. There’s no dedicated stack for interrupt handling: it uses the interrupted process’ stack, which means if that process had an invalid SP, the whole machine will crash.

Really the whole JSP (jump to system procedure) interface for calling OS routines needs to be reworked. JSP is implemented as a jump to an address in bank 0. Supply a bad address, and it will crash. This should really be replaced by some kind of indexed trap system, where JSP takes a trap number as a parameter rather than an address.

At this point, I’m going to archive what I’ve got, and then turn BMOW back into a single-process machine. Without a windowed graphical display or multiple output terminals, there’s not really any compelling use case for multitasking. But running only a single process on a multitasking-capable machine is very inefficient, at least how I’ve implemented it. The scheduler overhead and the indirection of the JSP interface eat up a lot of time that could otherwise be better-spent by the single process.

I’ll try to avoid doing anything major that would prevent switching back to multi-tasking in the future, so for now I’ll just disable the scheduler interrupt, and maybe provide some lighter-weight interface to the OS routines. I’ll keep the 64K bank model for the time being, since I don’t yet have any examples that would benefit from more than 64K, and changing it would be non-trivial. In particular I’d need to change the microcode for all the instructions to expect 3-byte addresses instead of 2-byte ones, which would add an extra clock cycle to the execution time of any instruction that manipulates an address. Worse, certain address-manipulation instructions might become impossible for 3-byte addresses unless I added a second temporary register somehow. So I’ll push those problems off into the future, and concentrate on other things.

No comments

Robo-Flower

I recently took a break from BMOW’s multi-tasking support to work on something completely different. (Actually, I have a multi-tasking demo working in the simultator, but it doesn’t work on the real hardware yet.) The “something different” was a robo-flower: a pulsing light powered from solar energy that’s made to look like a flower from Junkyard Wars. The circuit was based on this Make Magazine article, but the construction design is my own. Here’s a photo of the result:

Robo-Flower

During the day, the solar cell charges the main storage capacitor. At night, the storage capacitor powers the circuit, making the LED in the flower’s center pulse organically every few seconds. I followed the circuit design from the Make article, with these exceptions:

  • Replaced the 1000uF “pumm” capacitor (C3) with a 3300uF one. This was one of the suggested alterations.
  • Replaced the rechargable battery with a 1F “super capacitor”. A 10F capacitor was a suggested alteration, but I couldn’t find one.
  • Where their diagram shows resistor R3 connected to the positive power supply, I instead connected it through a diode to the negative end of the LED. This makes the LED turn off all the way at the end of each pumm, which I think looks better.

The robo-flower works, but I’m pretty disappointed with how it turned out. For one thing, the pulses of light are surprisingly dim, compared to what I’d expected. They reach a brightness I’d call “on”, but not bright. If you look at the circuit diagram, you’ll see that the pulses come from the 3300uF capacitor (charged to about 3v) discharging through an LED. With no current-limiting resistor, there should be a brief pulse of high current = high brightness, but that’s not what I’m seeing.

The bigger disappointment is that even after leaving the flower in the sun until the main 1F storage capacitor is fully charged (as measured with a voltmeter), the pulsing only lasts for about 15 minutes after the sun’s illumination is lost. That’s pretty lame. Other designs that use rechargable batteries pulse nearly all night, I think, so I was expecting much more. Yes, it’s a capacitor powering the circuit and not a battery, but 1F is a mighty-big capacitance.

If I get motivated enough, I may build another copy of the circuit on a protoboard, so I can experiment with it further, or measure the voltage at various points on the oscilloscope, to see if I can’t improve the behavior. Otherwise it’ll be one more piece of odd electronics junk littering my home. :-)

No comments

User Mode Programs

I made a little progress this week on separating what I hope will eventually become the operating system software from everything else. Up until now, all the test programs I’ve written have essentially been the operating system, or part of it: their executable code exists in ROM, they execute in memory bank 0, and they make direct subroutine calls to library routines for handling the keyboard and LCD. I finally cleaned this up a little, and got my first working example of a user program, separate from the OS. It just prints the letter “A” endlessly and doesn’t make for an exciting demo, but it’s a start.

Since there’s no disk or other storage media yet, the user program was actually embedded in the OS boot code, as a piece of data. At boot-up, the OS copies the user program into memory bank 1, which is RAM instead of ROM, and then jumps to it to begin execution. The program then executes inside of bank 1, and has no visibility or access to other banks. In order to access hardware like the keyboard or LCD, it uses a new instruction I added named JSP, or “jump to system procedure”. This instruction takes a two byte immediate argument, which indicates the subsystem (LCD, keyboard, clock, etc) and procedure to be invoked. The microcode for this instruction switches to bank 0, and then uses the arguments to lookup into a jump table to find the address of a system procedure handler routine to invoke. The handler does the requested work, and a RSP instruction (return from system procedure) returns control to the original program in its original bank.

The idea is that JSP should allow user programs to access hardware resources in a controlled way, without doing a jump directly to some bank 0 handler address, since handlers might change location between versions of the OS. Jumping to the wrong bank 0 address might also cause the OS to write invalid data or crash in the worst case. Of course the trade-off for this extra layer of protection is that access to hardware resources is more cumbersome and slow.

This is all still executing as a single process for now: The OS program starts, sets up the user program, and jumps to it. Once that happens, there is no more OS program running. However, the next logical step is to add an interrupt handler to swap between several user programs. In theory this shouldn’t be difficult: just store a table of saved registers for each process somewhere in bank 0. Whenever a timer interrupt occurs, the handler would copy the current register contents into an entry in the table, fill the registers using another entry in the table, and then return. I think the harder part is actually thinking of two programs that would make sense to run at the same time on BMOW. Multiprocessing makes a lot more sense on a graphical display with windows than on a text screen. With only one LCD, output from two programs would be jumbled together. Maybe I could make one program that uses the keyboard and LCD for I/O, while another runs simultaneously using the USB interface. Or maybe someone here can suggest a more impressive multiprocessing demo?

2 comments

Clocks and Interrupts

Today I moved a few steps closer to completion of the initial hardware design. First, I verified that BMOW’s interrupt mechanism actually works. My previous experiments with the keyboard and USB employed busy loops to continuously poll for new values. Although I had built in support for an interrupt line from the beginning, I never actually tested it until today. Luckily, it works fine. When interrupts are enabled, and a device signals an interrupt request, the next time an instruction is fetched from memory it will read zero instead of the instruction opcode. Opcode zero is the “break” instruction, which saves the current register contents and jumps to an interrupt service routine whose address is stored in a dedicated trap vector. The ISR then queries each device to determine which one generated the interrupt, and handles whatever work needs to be done. Finally, an RTI (return from interrupt) instruction restores the register contents and re-fetches the instruction that was interrupted.

As a simple test, I rewrote my earlier keyboard editor demo to use interrupts for the keyboard rather than polling. I had the main program print a continuous series of dots to the LCD, while the interrupt service routine decoded keyboard input and echoed the key to the LCD. It worked as you’d expect: a continuous series of dots that got intermixed with other letters if you typed at the keyboard.

The second task of the afternoon was to add BMOW’s real-time clock. I’d previously planned for the RTC and incorporated it into the design schematics, but hadn’t actually connected the hardware until now. I’m using a Texas Instruments BQ4847 integrated RTC module with a built-in battery. The RTC enables BMOW to know what date and time it is, which is nice, but its more important use is as a programmable interrupt timer. I plan to use a repeating RTC timer interrupt with a period of 1ms or so as the basis for multi-tasking in the OS. Each time the interrupt hits, the ISR will suspend whatever program was running, restore the state of another previously suspended program, and return. As long as programs are time-sliced rapidly enough, and all the relevant machine state is preserved when a program is swapped in and out, it should appear as if all the programs are running simultaneously.

To test both functions of the real-time clock, I configued an interrupt to trigger every 1 second, and an interrupt service routine to print the current date and time.

Real-Time Clock

One other interesting feature of the RTC module is its ability to turn a separate static RAM into a battery-backed SRAM. The RTC’s Vout pin can be used to provide power to an SRAM from the clock battery when the system power is off, preserving the SRAM contents. The RTC will also write-protect the SRAM by forcing its chip-enable input to false when the system power is in transition or off. It would be pretty easy to convert BMOW’s existing 512K SRAM to a battery-backed RAM this way, or to add a second 512K battery-backed SRAM. A second battery-backed RAM would be a simple way to add a small “disk” to the machine, instead of specialized hardware to connect to a compact flash or SD card, or a real hard disk.

In other news, I updated the “Technical Info” page with more details, so check it out.

Edit: I just noticed that I set the RTC for “Saturday May 30, 2008″, which is a non-date given that today is actually Saturday May 31. It’s interesting that the RTC didn’t barf on it. It must take the day of the week as an input along with date and time, rather than determining the day of the week from the date.

8 comments

Small C

I’ve begun looking more deeply into the “Small C” compiler that Bill mentioned. It looks very cool, yet also wildly inefficient, which is about what I expected. With some more effort, I think I could get a working C compiler for BMOW that generated not-too-awful code, which would be great for porting programs from other systems, or something like getting an open-source chess program running. But for cases where the size and speed of the code are important, Small C isn’t going to replace hand-written assembly.

There are several versions of Small C floating around on the internet. Small C Plus appears to be the most fully-featured, with support for structs and floating point math. Unfortunately it looks difficult to port, being targted for some ancient Amstrad CP/M computer. Maybe I’ll take a shot at it someday.

Small C v2.1 by James E. Hendrix seems to be the “official” version, and is the one discussed in the Small C handbook that I bought. I had a lot of difficulty getting it to compile and run with a modern compiler (Visual C++ 2005), though. It seems to assume that uninitialized memory will contain zero, and uses some non-standard library functions that only exist if you compile Small C with itself. I did eventually manage to get it to build and run, and compile a simple program, but it looks awkward to port.

That leaves Small C for UNIX, which looks to be the most feature-poor, yet the easiest to port. I was able to get it to compile and run under Visual C++ 2005 in a few minutes. All the machine-specific code generation routines have already been pulled out into a separate file, so in theory it should be easy to retarget to the BMOW instruction set.

I was surprised at how small the compiler’s source code is. I expected all kinds of code related to context free grammars and what not. Instead it’s only about 3000 lines of code, which is pretty manageable. As far as I can tell, it does only one forward pass through the code, with a lot of lines like:

if (*ptr == '[' )
    do_array();

There’s a lot that Small C doesn’t have, of course. There are no structs, floats, or bools, and no function pointers. All functions are assumed to return int. You can’t define a variable when you declare it (like int x=4;). And function headers must be written in ancient K&R style C, where the types of the formal parameters are given separately from their names, in the space before the function’s first open curly brace. But in exchange for these limitations, you get a compiler that’s simple enough to understand and retarget without going crazy.

The compiler assumes that the CPU has two general registers, and a stack. It also assumes that each register is large enough to hold an int, which is also the size of a pointer. The minimum entry requirements are therefore two 16-bit registers and a stack pointer, which is already a problem for BMOW. I can treat two 8-bit registers as a single 16-bit register by having the compiler emit assembly to manipulate them individually, but BMOW only has three 8-bit registers, which makes one and a half 16-bit registers. I would need to add a fourth 8-bit register somehow if I wanted to meet Small C’s assumptions about the hardware. A work-around is to use a fixed memory location as the missing register, but that’s ugly and slow.

Small C also assumes that the registers all support the same operations for most purposes. This isn’t the case for BMOW, where many operations like addition are only supported for the accumulator register. To add a number to another register, it’s necessary to save the accumulator somewhere, copy the other register to the accumulator, perform the addition, copy the result back, and restore the original accumulator value. That’s not the kind of code you want your compiler to be generating very often.

The biggest mismatch between Small C and BMOW is address pointers. BMOW has no such concept: you can’t store a pointer in a register (or a pair of registers) and then dereference it. Pointers in memory can be dereferenced, and there are enough flexible addressing modes like stack-relative, x-indexed, so that in most cases your program shouldn’t need to explicitly dereference a pointer. In contrast, Small C assumes that either of the registers can be treated as pointers and dereferenced with ease. Any time a value must be loaded from memory, the compiler emits code to first load the address into the primary register, and then load the data using that address. This results in a block of horribly convoluted BMOW assembly to accomplish the same thing.

An example would probably help. Consider the following C program:

func()
{
    int g;
    g = 0xDEAD;
    return g;
}

This could easily be collapsed to a single line that returns 0xDEAD without the need for a local variable, but ignore that for the moment. So this program must:

  • Allocate stack space for a 2-byte local variable
  • Store 0xDEAD in the stack location of the local variable
  • Retrieve the current value of the local variable from the appropriate stack location (yes it will always be 0xDEAD), and store it in the primary register
  • Return

I’ve defined the “primary register” as the Y and A registers, with Y holding the high byte and A the low byte. So some hand-written assembly for func() might be:

func:
    ; reserve space for local variable "g"
    pha ; push anything on the stack, value doesn't matter, only the change in the stack pointer
    pha

    ; store 0xDEAD in g
    lda #$DE
    sta 2,s ; store DE at stack + 2
    lda #$AD
    sta 1,s ; store AD at stack + 1

    ; retrieve g into primary register
    lda 2,s ; get high byte from stack + 2
    tay ; transfer it to the Y regster
    lda 1,s ; get low byte from stack + 1

    plp ; free the space on the stack used by the local variable
    plp
    rts ; return

That’s not too bad. Compare that with the code generated by my BMOW back-end for the Small C compiler. In addition to YA as the primary register, it also uses X and a fixed memory location (cc_regw) as a secondary register, and another fixed memory location (cc_regtemp) as scratch space. It also makes use of two subroutines to put (cc_pint) and get (cc_gint) an int to/from the primary register, similar to the original 8080 back-end for Small C. cc_gint assumes the address is already in the primary register, and overwrites it with the value at that address. cc_pint assumes the address is in the secondary register, and the value to store is in the primary register.

cc_regw = $4FFF
cc_regtemp = $4FFD

cc_gint:
    sta cc_regtemp
    sty cc_regtemp+1
    phx
    ldx #1
    lda (cc_regtemp),x
    tay
    dex
    lda (cc_regtemp),x
    plx
    rts

cc_pint:
    stx cc_regtemp
    phx
    ldx cc_regw
    stx cc_regtemp+1
    pha
    ldx #1
    tya
    sta (cc_regtemp),x
    pla
    dex
    sta (cc_regtemp),x
    plx
    rts

;func()
func:
;{
;    int g;
    pha
    pha
;    g=0xDEAD;
    stx cc_regtemp
    tsr
    ldx cc_regtemp
    clc
    adc    #<1
    bcc +
    iny
+   pha
    tya
    clc
    adc    #>1
    tay
    pla
    phy
    pha
    lda    #<57005
    ldy    #>57005
    plx
    stx cc_regtemp
    plx
    stx cc_regw
    ldx cc_regtemp
    jsr cc_pint
;    return g;
    stx cc_regtemp
    tsr
    ldx cc_regtemp
    clc
    adc    #<1
    bcc +
    iny
+   pha
    tya
    clc
    adc    #>1
    tay
    pla
    jsr cc_gint
    jmp cc1
;}
cc1:
    plp
    plp
    rts

The tsr instruction transfers the stack pointer to the X, Y, and A registers. X gets the bank byte of the stack pointer, which we don’t need, and since we also don’t want to destroy the value in X, the code pushes and pops it around the tsr.

It all works, but… yuck.

3 comments

Keyboard Kaos

After another round of bug fixing and the creation of a BMOW keyboard driver, the keyboard works! And it truly works this time: pressing the A key shows “a” on the LCD, and pressing SHIFT and the A key shows “A”, just like you’d expect. The craziness with key scan codes has been conquered. I even added support for newline, backspace, and scrolling to the LCD driver, to create a decently functional 20×4 text editor.

Keyboard Text Entry

I’m using an old Compaq PS/2  keyboard that I had lying around. With its own keyboard input and LCD output, BMOW is now an interactive, self-contained computer. No connection to a PC or terminal is needed. It’s just psychological, but somehow this makes it feel more “real”.

Writing the keyboard driver wasn’t too bad. I wrote it first in C, then hand-translated the C to assembly. It’s a simple state machine that maintains the current up/down state of all the modifier keys (shift, control, alt), as well as some state related to extended key codes, and make/break flags. When a new scan code arrives, it decides if that represents an internal state change (shift was pressed, etc), or a real key event. If it’s a real key event, it maps the scan code to an ASCII value using the modifier values and a lookup table, stores it in OS keyboard buffer, and sets a flag indicating that a new key is ready for handling by the program.

When I first powered on the CPU to test the keyboard driver, it didn’t work at all. The hardware went back to the annoying state of “nothing happens at all”, which is a pain to debug. After some sleuthing, I traced the cause to a bug in the GAL equations for the stack pointer: the most significant bit of the stack was getting cleared whenever the SP was incremented. Doh! How did that ever work before? After fixing the stack pointer bug, the keyboard still didn’t work. With more sleuthing, I discovered that two of the wire connections on the keyboard connector had broken off. Double doh!

Oddly, having a working text “editor” with the keyboard and LCD seems less interesting than the scan code debug display I had before. Typing text on a keyboard and seeing it on a screen is something we do all the time, and it seems completely unremarkable. We just expect the right letters to appear, backspace to work, and return to begin a new line, without even thinking about it. I guess BMOW has now reached the same level of dull predictability as any other PC!

2 comments

Microcode Cleanup

I’ve gotten a bit derailed from the task of writing the keyboard scan code handler, into the bigger issue of how device drivers like this should work in general in BMOW. That’s led me to an entirely different question of the BMOW operating system (if you can call it that), and what kind of instruction support might be needed for the OS and user programs to communicate. It’s not something I need to resolve now, really, but I’d rather give it some thought and hopefully get it right the first time.

The simplest approach would be a single-user machine, similar to how vintage 8-bit computers like the Apple II typically worked. Then the “OS” wouldn’t be much more than just a library of helpful routines for programs to call to interface with the hardware.

It may be a pipe dream, but one of my larger goals is to support some kind of multi-tasking OS, at least in a simple form. The key to this is some kind task scheduler, and a protected memory design to prevent one user program from interfering with another. My plan is to make all the user programs run as if BMOW had 16-bit addresses, with a 64KB memory space. Since BMOW actually has 24-bit addresses, the uppper 8-bits will be treated as a process ID. None of the normal CPU instructions will be able to view or modify the upper 8-bits of an address register. When an interrupt hits or an OS routine is called, however, the upper 8-bits will be set to zero, which will essentially be the OS’s process ID, with its own memory space. I can then add some additional CPU instructions for manipulating full 24-bit addresses, whose microprograms verify the upper 8-bits of the PC are zero before doing anything else. So in theory at least, each user program will exist in its own 64KB space and be invisible to other user programs, but the OS can still view and modify them all. There are still many details to work out, however, and my original hardware design didn’t exactly foresee all this, so it’s a challenge.

For task switching, I would use the yet-to-be-installed real-time clock, and set it to trigger an interrupt every few milliseconds. The interrupt mechanism would then push all the CPU registers onto the current user program’s stack, then jump to an OS interrupt handler. This handler would need to copy the stack pointer into a process table somewhere in OS memory, install the stack pointer from a different process, pop the saved registers off the stack, and continue execution of the next process. Boom, it’s a multi-tasking OS. Of course I’m glossing over many details, and in practice it will probably be much more complicated than this. There are many questions to answer, such as whether the OS itself can be pre-empted by an interrupt.

Assuming this all works, BMOW should be able to support as many processes as there are 64KB banks available, minus one for the OS. With the 512KB currently installed, that’s 7 user processes, which is probably plenty. Increasing the installed RAM to the 1MB I originally planned for would raise the user process limit to 15. I can’t imagine that I could run 15 simultaneous programs on this hardware at any kind of useful speed, but it might be interesting to try!

4 comments

Keyboardin’ Part 3

keyboard
The keyboard hardware interface works! I’m now a happy key-tapping fool, because with its own keyboard and LCD display, BMOW can operate completely independently from any host PC. The 4×20 LCD is certainly a very cramped workspace, so a Hyperterminal connection over USB from a PC is still the most practical solution for any serious BMOW interaction. But for the sheer cool factor, a stand-alone BMOW with its own I/O beats a USB tether to a Wintel box hands down.

Of course I can’t actually type anything useful with the keyboard interface yet, that would be too easy. You might think that the keyboard sends ASCII bytes to the computer as you type them, similar to a terminal connection, but in fact it’s a totally different interface. Each keypress sends a scan code to the computer, which is normally a single byte but can sometimes be two or three bytes. Then when the key is released, the keyboard sends a “break” scan code, followed by another scan code for the released key, which is usually but not always the same as the keypress scan code. Even the shift key works this way. So to tell if the user typed a capital “A”, you’ve got to keep track of the current state of the shift key by listening for its keypress and key release scan codes, then translate the scan code byte 0×1C to the “a” key, then finally convert that to the character “A” when the shift key state is factored in. I haven’t written any of the key state and scan code conversion software yet, so while it’s true that my hardware interface works, all it does currently is display scan codes.

My original hardware design for the keyboard interface almost worked on the first try. The only problem I ran into was with the clock signal generated by the keyboard, which is used to synchronize the data it sends to the computer. The rise time on the clock signal was mind-numbingly slow: about 1.2 microseconds to go from 0 to 5 volts. Compare that to BMOW’s main system clock, which rises in about 15 nanoseconds. The slow keyboard clock was causing trouble, because the different BMOW chips connected to it sensed a low-to-high clock transition at slightly different voltages, which equated to substantially different transition times due to the slow rise time. My solution was to feed the keyboard clock signal through a Schmitt trigger inverter, which turns a slowly-changing input like this one into a nice clean, quick low-to-high step.

4 comments

New Instruction Set?

I’m sort of at a crossroads now, with the basic hardware working, and I need to decide what to do next. There’s still some remaining hardware work, of course: the keyboard and real-time clock, but neither of those is essential. Then there are possible hardware extras, like video. And there’s the entire operating system and software library to consider. My eventual goal is to support multiple programs running at once, managed by a multi-tasking OS.

I find myself gravitating towards a rewrite of the BMOW instruction set. Since the machine is microcoded, there’s nothing specifically tying the hardware to the choice and syntax of instructions. The present instruction set is nearly identical to the 6502 instruction set, which seems like a reasonable fit, but I’m getting more and more annoyed with it. I’ve been looking at the 68000 instruction set, and while its hardware assumptions (e.g. 32-bit registers) don’t match BMOW very well, it feels like a good foundation for a cleaner, simpler instruction set.

My main gripes with the current 6502-style instruction set are:

  • Every combination of operation and operand is a unique instruction, which feels clunky and difficult to follow. Loading the X register is LDX, storing Y is STY, transferring A to Y is TAY. Compare this with a 68000-style MOVE SRC, DST. So the previous examples would like something like MOVE (A0), X, MOVE Y, (A0), MOVE A, Y.
  • Some obvious-seeming instructions are just missing. You can push A on the stack with PHA, but there’s no PHX or PHY. There’s no instruction to add two registers: only to add a register to a value in memory.
  • There are no stack-relative addressing modes. Support for the stack pointer in general is pretty minimal. This makes sense for the 6502’s single-byte stack pointer, but BMOW has a full 24-bit pointer for the stack.
  • There’s no concept of an address stored in a register. The ability to auto-increment and decrement address registers is similarly hidden from the assembly programmer.
  • All the opcodes are three letters. This is a small thing, but it’s annoying and makes the assembly less readable. It’s not like there’s some budget for opcode name lengths.

I’m thinking of rewriting BMOW’s microcode to implement a new instruction set much closer to the 68000’s. To the assembly programmer, BMOW would appear as three 8-bit data registers (the current A, X, and Y) and two 24-bit address registers (the AR and SP, which presently are visible only at the microcode level). The 8-bit T register would remain hidden. I think I still need it as a temporary location for implementing certain instructions in microcode, but if not, I could expose it as a fourth data register.

Exposing the address registers is the most powerful change. Consider a hypothetical program to copy 100 bytes between two memory locations. First the 6502-style:

LDX #99
loop: LDA $4000, X
STA $5000, X
DEX
BPL loop

Then the 68000-style program:

MOVE #99, D0
MOVE $4000, A0
MOVE $5000, A1
loop: MOVE (A0)+, (A1)+
SUB #1, D0
BGE loop

It may not be obvious, but I could implement the 68000-style instructions to run this program far faster. For instance, that MOVE (A0)+, (A1)+ could be done in 3 clock cycles, compared with 10-12 for the LDA/STA pair, because the address registers are already set up and no address arithmetic is needed. Furthermore, the 68000-style program could be a subroutine in which the actual source and destination addresses are computed at runtime, while the 6502-style program forces the addresses to be known at assembly time, or to use even slower indirect, indexed addressing.

There are some downsides to replacing the instruction set with a new 68000-style one:

  • Performance may be worse in some cases with a 68000-style instruction set. In general, such an instruction set would have slightly simpler instructions, and so would require more instructions to accomplish the same work as a 6502 instruction set. For example, the 6502 instruction LDA $4000 would be translated as the two 68000 instructions MOVE $4000, A0 and MOVE (A0), D0.
  • The hardware can’t easily support indexed address modes like MOVE 4(A0), D0. That instruction is supposed to load a value from a location in memory 4 bytes beyond where A0 points. That means adding 4 to A0, but there’s no place to store the result of that addition other than back into A0 or A1. So the microcode would need to save and restore the old value of A0 (slow), or make indexed address modes actually change the address register (an unwanted change in semantics), or drop support for that address mode entirely.
  • All software and tools would need to be written from scratch. Currently I’m using a modified 6502 assembler, and I can sort of run existing 6502 programs with a little work. But a BMOW-68000 would be different enough from a real 68000 that I’d need to write my own assembler, and couldn’t hope to run existing 68000 programs.

One other option would be to create an instruction set modeled on the 65816, the 16-bit successor to the 6502. It does add stack-relative addressing, as well as some of the obvious-but-missing instructions from the 6502. It still strikes me as rather clunky and non-symmetric, but I should probably take a deeper look at it before jumping to conclusions.

5 comments

USB Action

BMOW is now USB-enabled for PC communication, for both reads and writes. Yahoo!

I am typing this

I wrote a simple program to log when the USB interface goes up and down, and to echo all bytes received from the PC to BMOW’s LCD. The program also adds one to each received byte before transmitting it back, so I could prove it was transmitting real data and I wasn’t seeing some local echo. The result was that whatever letter I typed on the PC, the next letter of the alphabet got sent back in reply. Here’s the Windows Hyperterminal session:

Hyperterminal

The whole exercise went very smoothly, and I didn’t have to do any debugging at all. Once I got the USBMOD4 in place, it all worked on the first attempt you see here.

The only wrinkle was with the USBMOD4 itself. I bought one months ago that was damaged in shipping: some pins were broken off. Futurlec sent me a replacement, but I didn’t open it until yesterday, only to discover it was the wrong part! After scouring the USBMOD4 data sheet, I worked out a modified solution that let me use the damaged module, which was only possible because all but one of the broken pins were ground pins. I ended up needing to desolder a SMD component on the USBMOD4 body, which was ugly, and I was also forced to use the integrated USB jack rather than the external USB jack I’d planned on. But I can live with the little flaws… and hey, now I’ve got USB!

1 comment

« Previous PageNext Page »


Homebuilt CPUs WebRing

JavaScript by Qirien Dhaela

Join the ring?

David Brooks, designer of the Simplex-III homebrew computer, has founded the Homebuilt CPUs Web Ring. To join, drop Dave a line, mentioning your page's URL. It will then be added to the list. You'll need to copy this code fragment into your page.