<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/2.3.1" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>Big Mess o' Wires</title>
	<link>http://www.stevechamberlin.com/cpu</link>
	<description>A home-built CPU, and other messy electronics adventures</description>
	<pubDate>Sun, 07 Mar 2010 20:27:08 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.3.1</generator>
	<language>en</language>
			<item>
		<title>Verilog Headaches</title>
		<link>http://www.stevechamberlin.com/cpu/2010/03/07/verilog-headaches/</link>
		<comments>http://www.stevechamberlin.com/cpu/2010/03/07/verilog-headaches/#comments</comments>
		<pubDate>Sun, 07 Mar 2010 20:19:54 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.stevechamberlin.com/cpu/2010/03/07/verilog-headaches/</guid>
		<description><![CDATA[I&#8217;m having some trouble finding the best way to structure the Verilog code for this CPU. In particular, I&#8217;ve encountered one small headache and one larger one.
The small headache relates to the best way to describe complex combinatorial logic that doesn&#8217;t involve any registers. Consider some hypothetical logic that determines the value of the incrementPC [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m having some trouble finding the best way to structure the Verilog code for this CPU. In particular, I&#8217;ve encountered one small headache and one larger one.</p>
<p>The small headache relates to the best way to describe complex combinatorial logic that doesn&#8217;t involve any registers. Consider some hypothetical logic that determines the value of the incrementPC and loadA control signals, based on the current state. One way to do this would be:<br />
<code><br />
&nbsp;&nbsp;&nbsp;&nbsp;wire incrementPC, loadA;<br />
&nbsp;&nbsp;&nbsp;&nbsp;assign incrementPC = (state == s1) || (state == s3) || (state == s4);<br />
&nbsp;&nbsp;&nbsp;&nbsp;assign loadA = (state == s0) || (state == s2) || (state == s4);</p>
<p></code></p>
<p>That works fine, and it&#8217;s pretty clear what it does. But for more complex designs, it&#8217;s clearer to use procedural assignment and a case statement, grouping all of the control signals for each state together:<br />
<code><br />
&nbsp;&nbsp;&nbsp;&nbsp;reg incrementPC, loadA;<br />
&nbsp;&nbsp;&nbsp;&nbsp;always @* begin<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;case (state)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;s0: begin<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;incrementPC = 1'b0;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;loadA = 1'b1;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;// other control signals...<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;end<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;s1: begin<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;incrementPC = 1'b1;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;loadA = 1'b0;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;// other control signals...<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;end<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;s2: begin<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;incrementPC = 1'b0;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;loadA = 1'b1;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;// other control signals...<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;end<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;s3: begin<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;incrementPC = 1'b1;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;loadA = 1'b0;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;// other control signals...<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;end<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;s4: begin<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;incrementPC = 1'b1;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;loadA = 1'b1;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;// other control signals...<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;end<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;endcase<br />
&nbsp;&nbsp;&nbsp;&nbsp;end</p>
<p></code><br />
The problem with this approach is visible in the first line: incrementPC and loadA must be declared as type &#8220;reg&#8221;, even though they are not registers. During synthesis, no register will be created as long as your code is correct, but Verilog demands that the target of a procedural assignment like this always be type &#8220;reg&#8221;. So reg does not always mean that something is a register. I find this very confusing and misleading, because it means you can&#8217;t just look at the Verilog code to see which signals are registers and which are purely combinatorial. </p>
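<p>One way to make sure &#8220;reg&#8221; signals like these stay purely combinatorial is to give every output a default value at the top of the always block. Then no path through the case statement leaves a signal unassigned, and the synthesis tool has no reason to infer a latch. Here&#8217;s a minimal sketch of that idiom. The module name, the port list, and the 3-bit state encodings are assumptions made up for illustration, not part of the actual CPU design:</p>
<p><code><br />
&nbsp;&nbsp;&nbsp;&nbsp;// Sketch only: module name, ports, and state encodings are hypothetical<br />
&nbsp;&nbsp;&nbsp;&nbsp;module controlSketch(<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;input wire [2:0] state,<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;output reg incrementPC,  // type reg by syntax, but no flip-flop is described<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;output reg loadA<br />
&nbsp;&nbsp;&nbsp;&nbsp;);<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;localparam s0 = 3'd0, s1 = 3'd1, s2 = 3'd2, s3 = 3'd3, s4 = 3'd4;<br />
&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;always @* begin<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;// defaults: every signal is assigned on every path<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;incrementPC = 1'b0;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;loadA = 1'b0;<br />
&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;case (state)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;s0, s2: loadA = 1'b1;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;s1, s3: incrementPC = 1'b1;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;s4: begin<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;incrementPC = 1'b1;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;loadA = 1'b1;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;end<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;// the unused encodings 5-7 simply keep the defaults<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;endcase<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;end<br />
&nbsp;&nbsp;&nbsp;&nbsp;endmodule<br />
</code></p>
<p>The trade-off is that each state now lists only the signals it asserts, which is shorter but hides the explicit 0 values that the longer version spells out.</p>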
<p>My bigger problem is more subtle, and is about good HDL design practices rather than any quirk of the Verilog standard. I&#8217;m unsure how explicit I should be in defining the structure of the virtual hardware described by the Verilog code. At one extreme, I could write a high-level functional description of *what* the CPU does, ignoring *how* it does it, and leave the synthesis software to figure it out. Or at the other extreme, I could work out a block diagram of the CPU consisting of familiar real-world elements like registers, an arithmetic unit, muxes, and buses, and then write Verilog code to describe these elements and how they&#8217;re all connected. </p>
<p>To help make this distinction clearer, here&#8217;s an example based on section 6.2.4 of the book <a href="http://www.amazon.com/FPGA-Prototyping-Verilog-Examples-Spartan-3/dp/0470185325">FPGA Prototyping by Verilog Examples</a>. Imagine a state-driven system that can add two input registers, and store the output in a third register. One way to describe this would be high-level, functional:</p>
<p><code><br />
&nbsp;&nbsp;&nbsp;&nbsp;always @(posedge clk) begin<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;case (state)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;s0:<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;d0 <= a + b;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;s1:<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;d1 <= b + c;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;s2:<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;d2 <= a + c;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;endcase<br />
&nbsp;&nbsp;&nbsp;&nbsp;end<br />
&nbsp;&nbsp;&nbsp;&nbsp;<br />
</code></p>
<p>Great, that&#8217;s compact and clear. But what does the datapath of this hardware look like? Is there one adder unit, or three? Who knows? It&#8217;s a black box, relying entirely on the synthesis software to do the right thing.</p>
<p>A second approach would be to explicitly define a single adder unit:</p>
<p><code><br />
&nbsp;&nbsp;&nbsp;&nbsp;assign mout = in1 + in2;<br />
&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;always @* begin<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;// default: maintain same values<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;d0_next = d0;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;d1_next = d1;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;d2_next = d2;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;case (state)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;s0:<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;begin<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;in1 = a;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;in2 = b;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;d0_next = mout;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;end<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;s1:<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;begin<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;in1 = b;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;in2 = c;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;d1_next = mout;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;end<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;s2:<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;begin<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;in1 = a;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;in2 = c;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;d2_next = mout;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;end<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;endcase<br />
&nbsp;&nbsp;&nbsp;&nbsp;end<br />
&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;always @(posedge clk) begin<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;d0 <= d0_next;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;d1 <= d1_next;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;d2 <= d2_next;<br />
&nbsp;&nbsp;&nbsp;&nbsp;end<br />
&nbsp;&nbsp;&nbsp;&nbsp;<br />
</code></p>
<p>That makes the hardware design clearer, so it&#8217;s unambiguous that there&#8217;s only one adder. Is this second approach better than the first, then? Maybe, maybe not. If you&#8217;re optimizing for space, and don&#8217;t trust the synthesis software to be as smart as you are, then the second example is probably better. But if you&#8217;re optimizing for speed, having three separate adders (or at least the possibility of three) may actually be better.</p>
<p>Even this second design is somewhat ambiguous. Presumably there are some muxes at the input to the adder, and a mux or load enable at the input to each D register too. But the Verilog code leaves this all implied and unspecified. Here&#8217;s a third example that spells everything out in full detail:</p>
<p><code><br />
&nbsp;&nbsp;&nbsp;&nbsp;reg [1:0] in1Select, in2Select;<br />
&nbsp;&nbsp;&nbsp;&nbsp;assign in1 = (in1Select == 2'b00) ? a :<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(in1Select == 2'b01) ? b :<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(in1Select == 2'b10) ? c :<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;d;<br />
&nbsp;&nbsp;&nbsp;&nbsp;assign in2 = (in2Select == 2'b00) ? a :<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(in2Select == 2'b01) ? b :<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(in2Select == 2'b10) ? c :<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;d;<br />
&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;assign mout = in1 + in2;<br />
&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;reg loadEnableD0;<br />
&nbsp;&nbsp;&nbsp;&nbsp;reg loadEnableD1;<br />
&nbsp;&nbsp;&nbsp;&nbsp;reg loadEnableD2;<br />
&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;always @* begin<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;// default: disable all loads<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;loadEnableD0 = 1'b0;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;loadEnableD1 = 1'b0;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;loadEnableD2 = 1'b0;<br />
&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;case (state)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;s0:<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;begin<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;in1Select = 2'b00;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;in2Select = 2'b01;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;loadEnableD0 = 1'b1;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;end<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;s1:<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;begin<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;in1Select = 2'b01;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;in2Select = 2'b10;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;loadEnableD1 = 1'b1;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;end<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;s2:<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;begin<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;in1Select = 2'b00;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;in2Select = 2'b10;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;loadEnableD2 = 1'b1;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;end<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;endcase<br />
&nbsp;&nbsp;&nbsp;&nbsp;end<br />
&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;always @(posedge clk) begin<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (loadEnableD0)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;d0 <= mout;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (loadEnableD1)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;d1 <= mout;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (loadEnableD2)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;d2 <= mout;<br />
&nbsp;&nbsp;&nbsp;&nbsp;end<br />
&nbsp;&nbsp;&nbsp;&nbsp;<br />
</code></p>
<p>This approach makes it very clear what&#8217;s happening in terms of the hardware, and you could build an equivalent physical circuit from 7400 parts. Is this better or worse than the other two approaches? I find it better in terms of understanding what will be synthesized, but it&#8217;s worse in terms of length. I also suspect that by specifying all the details in this way, it may be over-constraining the synthesis software, preventing it from using some clever optimizations to pack the same amount of logic into less space. </p>
<p>I find myself going around in circles with variations of these three approaches, unable to really get started with the actual CPU design work.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.stevechamberlin.com/cpu/2010/03/07/verilog-headaches/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Cramming Everything In</title>
		<link>http://www.stevechamberlin.com/cpu/2010/03/03/cramming-everything-in/</link>
		<comments>http://www.stevechamberlin.com/cpu/2010/03/03/cramming-everything-in/#comments</comments>
		<pubDate>Thu, 04 Mar 2010 03:56:32 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
		
		<category><![CDATA[Bit Bucket]]></category>

		<guid isPermaLink="false">http://www.stevechamberlin.com/cpu/2010/03/03/cramming-everything-in/</guid>
		<description><![CDATA[I&#8217;ve made a little bit of progress on the CPU in a CPLD project. As mentioned previously, this will be an 8-bit CPU with a 10-bit address space, targeting a 128 macrocell CPLD. The instruction set will be a simplified version of BMOW&#8217;s, which itself was a close cousin of the 6502 instruction set. Exactly [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve made a little bit of progress on the CPU in a CPLD project. As mentioned previously, this will be an 8-bit CPU with a 10-bit address space, targeting a 128 macrocell CPLD. The instruction set will be a simplified version of BMOW&#8217;s, which itself was a close cousin of the 6502 instruction set. Exactly how &#8220;simplified&#8221; it needs to be in order to fit remains to be seen, but I&#8217;m planning to omit the Y register, zero page, and indirect addressing modes. It will still have A and X registers, a hardware stack pointer, and all the &#8220;standard&#8221; opcodes in immediate, absolute, and indexed addressing modes. I&#8217;ve mostly just been planning and not writing much Verilog yet, but after fleshing out the datapath and a tiny bit of control logic, I&#8217;ve used 73 macrocells so far.</p>
<p>
Working with the Altera software has been pretty good so far, and I&#8217;ve been much less frustrated than when I was working with the Xilinx software to create 3DGT from an FPGA. I&#8217;m not sure if that&#8217;s the software itself, or simply that I&#8217;m working on a simpler project and a simpler device, but it&#8217;s a welcome change.</p>
<p>
I found a couple of similar CPU projects that might provide some inspiration:</p>
<p>
<strong> MCPU</strong> - <a href="http://www.opencores.com/project,mcpu">http://www.opencores.com/project,mcpu</a> - A very tiny CPU that fits in a 32 macrocell CPLD. It has a single 8-bit register, and just a 6-bit (64 word) address space. It also has only four instructions: NOR, ADD, store, and conditional jump. Yet with combinations of those instructions, you can do some pretty complicated stuff. Very clever! Check it out.</p>
<p>
<strong> MPROZ</strong> - <a href="http://www.unibwm.de/ikomi/pub/mproz/mproz_e.pdf">http://www.unibwm.de/ikomi/pub/mproz/mproz_e.pdf</a> - MCPU borrows its instruction set from here. MPROZ has a 15-bit address space, but NO data registers. All computation is done directly on locations in memory. It also does MCPU one better by having only three instructions: NOR, ADD, and branch. It fits in an FPGA with 484 macrocells.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.stevechamberlin.com/cpu/2010/03/03/cramming-everything-in/feed/</wfw:commentRss>
		</item>
		<item>
		<title>A CPU in a CPLD</title>
		<link>http://www.stevechamberlin.com/cpu/2010/02/28/a-cpu-in-a-cpld/</link>
		<comments>http://www.stevechamberlin.com/cpu/2010/02/28/a-cpu-in-a-cpld/#comments</comments>
		<pubDate>Mon, 01 Mar 2010 02:36:32 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
		
		<category><![CDATA[Bit Bucket]]></category>

		<guid isPermaLink="false">http://www.stevechamberlin.com/cpu/2010/02/28/a-cpu-in-a-cpld/</guid>
		<description><![CDATA[OK, the CPU design spark is back, sooner than I&#8217;d expected. I have an urge to implement a minimal CPU using a CPLD. If you&#8217;re not familiar with the term, a CPLD is a simple programmable logic chip, existing somewhere on the complexity scale between PALs (like the 22V10&#8217;s I used in BMOW) and FPGAs. [...]]]></description>
			<content:encoded><![CDATA[<p>OK, the CPU design spark is back, sooner than I&#8217;d expected. I have an urge to implement a minimal CPU using a CPLD. If you&#8217;re not familiar with the term, a CPLD is a simple programmable logic chip, existing somewhere on the complexity scale between PALs (like the 22V10&#8217;s I used in BMOW) and FPGAs. Typically a CPLD has a similar internal structure to PALs, with macrocells containing a single flip-flop and some combinatorial logic for sum-of-products expressions. They are also non-volatile like PALs. Yet typical CPLDs contain 10x as many macrocells as a PAL, with some macrocells used for internal purposes and not connected to any pin. FPGAs are generally much larger and more complex, with thousands of macrocells and specialized hardware blocks for tasks like multiplication and clock synthesis. FPGAs also normally contain some built-in RAM, and are themselves RAM-based, requiring configuration by some other device whenever power is applied.</p>
<p>I&#8217;m attracted to CPLDs because I&#8217;m hoping they&#8217;ll provide a good step up from PALs, without drowning me in FPGA complexity, as happened when I worked on 3D Graphics Thingy.  I&#8217;m pretty confident I can figure out how to work with CPLDs without driving myself crazy, increasing the chances that I might actually finish this project. Given the limited hardware resources of CPLDs, fitting a CPU will also be an interesting challenge.</p>
<p>I&#8217;ve also been wanting to design my own custom PCBs for quite some time, and this will give me an opportunity. The end goal of this project will be a single-board computer on a custom PCB, with my CPLD-CPU, RAM, ROM, some input buttons/switches, and some output LEDs/LCD. I need to limit myself to CPLDs that come in a PLCC package, so I can use a through-hole socket and solder it myself. Unfortunately that will limit my choices pretty severely. I think it&#8217;s theoretically possible to hand-solder the more common TQFP surface-mount package, but I&#8217;m not excited to try it. And for other package types, forget it.</p>
<p>Here&#8217;s some back-of-the-envelope figuring to get the ball rolling. This is assuming an 8-bit CPU with a 10-bit address space (1K).</p>
<p><strong>I/O pins needed:</strong></p>
<ul>
<li>8 data bus</li>
<li>10 address bus</li>
<li>1 clock</li>
<li>1 /reset</li>
<li>1 /irq</li>
<li>1 read-write</li>
<li>~4 chip selects for RAM, ROM, peripherals</li>
</ul>
<p>That&#8217;s 26 I/Os. So a PLCC-44 package should be fine, as CPLDs in that package typically have about 34 I/Os.</p>
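<p>As a rough sketch, that pin list could map onto a top-level Verilog port list like the one below. Every name here is hypothetical, the active-low &#8220;_n&#8221; suffix is just one common convention, and the four chip selects are the approximate figure from the list above rather than a settled number:</p>
<p><code><br />
&nbsp;&nbsp;&nbsp;&nbsp;// Sketch only: ports with no internals, all names hypothetical<br />
&nbsp;&nbsp;&nbsp;&nbsp;module cpldCpuPins(<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;inout wire [7:0] data,     // 8 data bus<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;output wire [9:0] addr,    // 10 address bus<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;input wire clk,            // 1 clock<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;input wire reset_n,        // 1 /reset<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;input wire irq_n,          // 1 /irq<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;output wire rw,            // 1 read-write<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;output wire [3:0] cs_n     // ~4 chip selects<br />
&nbsp;&nbsp;&nbsp;&nbsp;);<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;// 8 + 10 + 1 + 1 + 1 + 1 + 4 = 26 pins<br />
&nbsp;&nbsp;&nbsp;&nbsp;endmodule<br />
</code></p>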
<p><strong>Macrocells needed for holding CPU state:</strong></p>
<ul>
<li>10 program counter</li>
<li>10 stack pointer</li>
<li>10 scratch/address register</li>
<li>8 opcode register</li>
<li>3 opcode phase</li>
<li>8 accumulator</li>
<li>8 index register</li>
<li>3 ALU condition codes</li>
</ul>
<p>That&#8217;s 60.</p>
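<p>Written out as Verilog declarations, that state might look something like this sketch. The register names are placeholders I&#8217;ve made up for illustration, and the count assumes each stored bit costs exactly one macrocell flip-flop:</p>
<p><code><br />
&nbsp;&nbsp;&nbsp;&nbsp;// Sketch only: declarations with hypothetical names, no logic<br />
&nbsp;&nbsp;&nbsp;&nbsp;module cpuStateSketch;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;reg [9:0] pc;         // 10 program counter<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;reg [9:0] sp;         // 10 stack pointer<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;reg [9:0] scratch;    // 10 scratch/address register<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;reg [7:0] opcode;     // 8 opcode register<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;reg [2:0] phase;      // 3 opcode phase<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;reg [7:0] a;          // 8 accumulator<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;reg [7:0] x;          // 8 index register<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;reg [2:0] flags;      // 3 ALU condition codes<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;// 10 + 10 + 10 + 8 + 3 + 8 + 8 + 3 = 60 bits of state<br />
&nbsp;&nbsp;&nbsp;&nbsp;endmodule<br />
</code></p>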
<p>Then I&#8217;ll need some macrocells for combinatorial logic. This is a lot harder to predict, and in many cases I should be able to use the combinatorial logic resources and the flip-flop from a single macrocell. I&#8217;ll just pull some numbers out of thin air.</p>
<p><strong>Macrocells needed for combinatorial logic: </strong></p>
<ul>
<li>16? arithmetic/logic unit (8-bit add, AND, OR, shift, etc)</li>
<li>16? control/sequencing logic</li>
<li>??? other stuff I forgot</li>
</ul>
<p>So that&#8217;s a grand total of 92 macrocells (60 + 16 + 16) for everything, not counting the other stuff I forgot.</p>
<p>If I shrank the address space down, and maybe changed to a 4-bit word size, I might be able to fit it in a 64 macrocell CPLD. But more than likely, it seems I&#8217;ll be looking for a CPLD in the 100 to 128 macrocell range. Considering my requirement for PLCC packaging, that will limit the choices to two or three possibilities, but more on that later.</p>
<p>I think the most challenging part of this project will be the control/sequencing logic, and the assignment of opcodes. BMOW was microcoded, and used a separate microcode ROM to execute a 16-instruction microprogram to implement each CPU instruction. In this case, I&#8217;ll need to create dedicated combinatorial logic to drive all the enables, selects, and other inputs in the right sequence to ferry data around the CPU to execute the instructions. Doing this with minimal logic will be a real challenge, and undoubtedly I&#8217;ll be using the bits of the opcode itself to derive many of those control signals.</p>
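<p>To illustrate what deriving control signals from the opcode bits might look like, here&#8217;s a small sketch. The encoding is entirely invented for this example (low two bits select the addressing mode, top bit marks instructions that write the accumulator, phase 2 is the write-back phase). It is not BMOW&#8217;s encoding, and the real design will almost certainly differ:</p>
<p><code><br />
&nbsp;&nbsp;&nbsp;&nbsp;// Sketch only: the opcode bit assignments below are hypothetical<br />
&nbsp;&nbsp;&nbsp;&nbsp;module decodeSketch(<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;input wire [7:0] opcode,<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;input wire [2:0] phase,<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;output wire useImmediate,<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;output wire useAbsolute,<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;output wire useIndexed,<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;output wire loadA<br />
&nbsp;&nbsp;&nbsp;&nbsp;);<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;// assumed: opcode[1:0] selects the addressing mode<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;wire [1:0] mode = opcode[1:0];<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;assign useImmediate = (mode == 2'b00);<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;assign useAbsolute = (mode == 2'b01);<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;assign useIndexed = (mode == 2'b10);<br />
&nbsp;&nbsp;&nbsp;&nbsp;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;// assumed: opcode[7] marks instructions that write A, during phase 2<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;assign loadA = (phase == 3'd2) ? opcode[7] : 1'b0;<br />
&nbsp;&nbsp;&nbsp;&nbsp;endmodule<br />
</code></p>
<p>The appeal of this style is that a control signal taken straight from an opcode bit costs almost no logic, so the opcode assignments end up being chosen to suit the hardware rather than the other way around.</p>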
]]></content:encoded>
			<wfw:commentRss>http://www.stevechamberlin.com/cpu/2010/02/28/a-cpu-in-a-cpld/feed/</wfw:commentRss>
		</item>
		<item>
		<title>RC Servo Signal Decoder, Part 2</title>
		<link>http://www.stevechamberlin.com/cpu/2010/02/25/rc-servo-signal-decoder-part-2/</link>
		<comments>http://www.stevechamberlin.com/cpu/2010/02/25/rc-servo-signal-decoder-part-2/#comments</comments>
		<pubDate>Fri, 26 Feb 2010 04:55:59 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.stevechamberlin.com/cpu/2010/02/25/rc-servo-signal-decoder-part-2/</guid>
		<description><![CDATA[It works! I&#8217;ve continued poking away at this circuit to decode an RC airplane servo signal and trigger a camera shutter during flight, and I&#8217;m happy to report success!
Once I switched to using the CD4013 flip-flop with a positive logic clear input instead of negative logic, it was a piece of cake. I have to [...]]]></description>
			<content:encoded><![CDATA[<p>It works! I&#8217;ve continued poking away at this circuit to decode an RC airplane servo signal and trigger a camera shutter during flight, and I&#8217;m happy to report success!</p>
<p>Once I switched to using the CD4013 flip-flop with a positive logic clear input instead of negative logic, it was a piece of cake. I have to say, living just a mile from one of the USA&#8217;s largest electronics dealers (Jameco) is pretty sweet. I can hit their web site and place an order for practically any obscure electronic component I can think of, then cruise down to their offices and pick it up from the will-call desk an hour later. Nice!</p>
<p>I rebuilt the decoder circuit that I discussed last time, soldering everything together &#8220;dead bug&#8221; style. This was necessary in order to keep everything as small as possible, so I could fit it <strong>inside </strong>the camera body.  I forgot to take a photo before I closed everything up, but it looks very similar to this example from <a href="http://www.laureanno.com/RC">laureanno.com</a>:</p>
<p><img src="http://www.stevechamberlin.com/cpu/dead-bug.jpg" width="401" height="260" /></p>
<p>When I first connected the servo, decoder, and camera, it didn&#8217;t work. Nothing happened when I toggled the switch on my RC transmitter. Setting up the oscilloscope again, I was able to see that the reference pulse width generated by the RC circuit I&#8217;d built was about twice as long as it should have been. I&#8217;m not sure how that happened, even with 20% tolerance components, but I was able to quickly swap in a different value resistor, and get it working perfectly. Then with a bit of creative packing, I managed to cram it all back inside the camera body.</p>
<p>Today during my lunch hour, I was able to try it out for the first time. The shutter trigger worked fabulously! I wish I could say the same for the quality of the pictures, but unfortunately the focus wasn&#8217;t set quite right, and the photos are a little blurry. They&#8217;re still pretty fun to look at though. I was flying next to the headquarters of Oracle Corporation in Redwood City, California. Those are the clustered cylinder-shaped mirrored buildings you see in the photos. The plane looks like it was a little higher than the tallest building, which I think is 20 stories tall. See if you can find me in some of the photos!</p>
<p>Click any of the thumbnails below to see the full-sized version.</p>
<p><a href="http://www.stevechamberlin.com/cpu/aerial1.jpg"><img src="http://www.stevechamberlin.com/cpu/aerial1s.jpg" width="384" height="288" /></a>     <a href="http://www.stevechamberlin.com/cpu/aerial2.jpg"><img src="http://www.stevechamberlin.com/cpu/aerial2s.jpg" width="384" height="288" /></a></p>
<p><a href="http://www.stevechamberlin.com/cpu/aerial3.jpg"><img src="http://www.stevechamberlin.com/cpu/aerial3s.jpg" width="384" height="288" /></a>     <a href="http://www.stevechamberlin.com/cpu/aerial4.jpg"><img src="http://www.stevechamberlin.com/cpu/aerial4s.jpg" width="384" height="288" /></a></p>
<p><a href="http://www.stevechamberlin.com/cpu/aerial5.jpg"><img src="http://www.stevechamberlin.com/cpu/aerial5s.jpg" width="384" height="288" /></a>     <a href="http://www.stevechamberlin.com/cpu/aerial6.jpg"><img src="http://www.stevechamberlin.com/cpu/aerial6s.jpg" width="384" height="288" /></a></p>
<p><strong>February 27 Edit:</strong> I corrected the focus problem, and tried again. Unfortunately I got the propeller in some of the shots, and this new set wasn&#8217;t from as high an altitude. But I did get some great shots of the bay, an aerial self-portrait, and a flock of Canada geese.</p>
<p><a href="http://www.stevechamberlin.com/cpu/IMG_0054.jpg"><img src="http://www.stevechamberlin.com/cpu/IMG_0054s.jpg" width="384" height="288" /></a>     <a href="http://www.stevechamberlin.com/cpu/IMG_0055.jpg"><img src="http://www.stevechamberlin.com/cpu/IMG_0055s.jpg" width="384" height="288" /></a></p>
<p><a href="http://www.stevechamberlin.com/cpu/IMG_0058.jpg"><img src="http://www.stevechamberlin.com/cpu/IMG_0058s.jpg" width="384" height="288" /></a>     <a href="http://www.stevechamberlin.com/cpu/IMG_0018.jpg"><img src="http://www.stevechamberlin.com/cpu/IMG_0018s.jpg" width="384" height="288" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.stevechamberlin.com/cpu/2010/02/25/rc-servo-signal-decoder-part-2/feed/</wfw:commentRss>
		</item>
		<item>
		<title>RC Servo Signal Decoder for Camera Shutter Switch</title>
		<link>http://www.stevechamberlin.com/cpu/2010/02/21/rc-servo-signal-decoder-for-camera-shutter-switch/</link>
		<comments>http://www.stevechamberlin.com/cpu/2010/02/21/rc-servo-signal-decoder-for-camera-shutter-switch/#comments</comments>
		<pubDate>Sun, 21 Feb 2010 20:29:25 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
		
		<category><![CDATA[Bit Bucket]]></category>

		<guid isPermaLink="false">http://www.stevechamberlin.com/cpu/2010/02/21/rc-servo-signal-decoder-for-camera-shutter-switch/</guid>
		<description><![CDATA[Hey, I&#8217;m back. I think my oscilloscope made me do it. For the past six months I&#8217;ve been working with RC airplanes, not doing any electronics work. The oscilloscope has been taking up space on my desk while it sits untouched, gathering dust. Last week I finally decided I was never going to use it [...]]]></description>
			<content:encoded><![CDATA[<p>Hey, I&#8217;m back. I think my oscilloscope made me do it. For the past six months I&#8217;ve been working with RC airplanes, not doing any electronics work. The oscilloscope has been taking up space on my desk while it sits untouched, gathering dust. Last week I finally decided I was never going to use it again, and packed it away in a closet. But that got me to thinking about electronics again, and about what kind of projects I could do related to RC. So after just a few days, the oscilloscope has returned from its closet banishment and is in use once more for a new project.</p>
<p>I recently bought an Aiptek SD 1.3 megapixel camera, with the idea to mount it on the fuselage of one of my planes, and do some aerial photography. The Aiptek weighs just 52 grams (about 2 ounces), and so it won&#8217;t weigh down the plane excessively. But the tricky part is finding a way to activate the shutter while the plane is in the air. It turns out that this is mostly a solved problem, and it&#8217;s possible to build a circuit to decode the servo signal from an unused receiver channel, creating a 0 or 1 pulse depending on the position of a transmitter switch or stick. Then by hacking into the camera guts and a bit of soldering, that pulse can be used to trigger the shutter.</p>
<p><img src="http://www.stevechamberlin.com/cpu/servohack2.jpg" alt="Spektrum 6110 receiver with servo hacking harness connected" width="659" height="535" /></p>
<p>Here&#8217;s one of my planes (a GWS Slow Stick), with three spare wires hooked into the receiver&#8217;s &#8220;gear&#8221; channel (which I don&#8217;t normally use), connected to the oscilloscope and a growing circuit on the protoboard. It turns out that these servo signals for the channels are ideal for hacking with digital logic. Of the three wires connected to the receiver, one is ground, one is a regulated +5 volts, and one is a modulated position signal that indicates the desired position for that channel (rudder, elevator, aileron, flaps, gear, whatever). The connectors are even standard 0.1 inch male headers. What could be easier?</p>
<p><img src="http://www.stevechamberlin.com/cpu/servohack1.jpg" alt="Slow Stick servo decoder" width="692" height="526" /></p>
<p>I examined the servo signal with the oscilloscope. It&#8217;s a regular pulse train with a 22ms period. The width of the pulse varies depending on the desired position for the channel. The width is about 1.2ms at the minimum position, and 2ms at the maximum position. Taking 1.6ms as the midpoint, what&#8217;s needed is a circuit that outputs 0 if the pulse width is less than 1.6ms, and 1 if it&#8217;s greater than 1.6ms. This could be done many different ways: the first two that come to mind are a small microcontroller, or a low-pass filter that turns the servo signal into a DC voltage, and compares it to a reference voltage.</p>
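<p>For comparison, here is a minimal sketch in Python of the decision a small microcontroller would have to make: measure the high time of one frame and compare it with the 1.6ms midpoint. The function names and the 0.1ms sample period are assumptions made for illustration, not part of any real firmware.</p>

```python
# Hypothetical model of the decode step; the 1.6 ms threshold and the
# 22 ms frame period come from the oscilloscope measurements above.
THRESHOLD_MS = 1.6


def make_frame(width_ms, period_ms=22.0, sample_period_ms=0.1):
    """Build one servo frame as 0/1 samples: the pulse first, then idle low."""
    high = round(width_ms / sample_period_ms)
    total = round(period_ms / sample_period_ms)
    return [1] * high + [0] * (total - high)


def pulse_width_ms(samples, sample_period_ms=0.1):
    """Return the width of the first high pulse found in the samples."""
    count = 0
    seen_high = False
    for level in samples:
        if level:
            seen_high = True
            count += 1
        elif seen_high:
            break  # the pulse has ended
    return count * sample_period_ms


def switch_state(width_ms, threshold_ms=THRESHOLD_MS):
    """1 if the pulse is longer than the threshold, otherwise 0."""
    return 1 if width_ms > threshold_ms else 0


if __name__ == "__main__":
    for width in (1.2, 2.0):
        frame = make_frame(width)
        print(width, switch_state(pulse_width_ms(frame)))
```

<p>With the measured extremes, a 1.2ms pulse decodes as 0 and a 2ms pulse decodes as 1.</p>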
<p>I&#8217;ve decided to follow another example I found, which I thought was especially clever. It uses just two flip-flops and a couple of passive components. You can check out the <a href="http://www.laureanno.com/RC/Aiptek-1-3M-RC-Switch.gif">circuit schematic</a> for the details. The servo signal pulse train is used to clock the first flip-flop. Its D input is tied high. When it&#8217;s clocked, its Q output goes high, which begins to charge an RC circuit. When the capacitor voltage gets high enough, it activates the asynchronous reset, clearing the Q output. The complementary /Q output is used to clock the second flip-flop, whose D input is the servo signal. If the RC time constant is chosen correctly, then the second flip-flop will be clocked 1.6ms after the first one, sampling the servo signal at that time. If the pulse width is less than 1.6ms it will sample a 0, otherwise it will sample a 1. Pretty neat!</p>
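<p>The RC delay can be estimated with the standard charging equation, V(t) = Vdd * (1 - exp(-t / RC)). The Python sketch below makes two assumptions that are not taken from the schematic: the reset input trips at half the supply voltage (a common rule of thumb for CMOS inputs, but it varies from part to part), and the capacitor is 0.1uF. The function names are hypothetical.</p>

```python
import math


def rc_delay_s(r_ohms, c_farads, vth_fraction=0.5):
    """Time for the capacitor to charge from 0 to vth_fraction * Vdd."""
    return -r_ohms * c_farads * math.log(1.0 - vth_fraction)


def resistor_for_delay(delay_s, c_farads, vth_fraction=0.5):
    """Resistance that gives the requested delay for a chosen capacitor."""
    return delay_s / (-c_farads * math.log(1.0 - vth_fraction))


def sampled_bit(pulse_width_s, delay_s):
    """Value captured by the second flip-flop when it is clocked.

    The servo line is still high at the sampling instant only if the
    pulse lasts longer than the RC delay.
    """
    return 1 if pulse_width_s > delay_s else 0


if __name__ == "__main__":
    c = 0.1e-6  # assumed capacitor value, not read from the schematic
    r = resistor_for_delay(1.6e-3, c)
    print(f"R = {r:.0f} ohms for a 1.6 ms delay with C = 0.1 uF")
    print(sampled_bit(1.2e-3, 1.6e-3), sampled_bit(2.0e-3, 1.6e-3))
```

<p>Under those assumptions the resistor works out to roughly 23K. Since the real threshold depends on the chip and the supply voltage, a trimmer would probably be needed to land on 1.6ms in practice.</p>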
<p>My only headache is that I don&#8217;t have the 4013 CMOS flip-flop called for in the circuit. I do have lots of 74LS74 flip-flops, which are similar, but are TTL designs with an active low asynchronous reset instead of active high. I&#8217;d thought it would be simple to modify the circuit to work with an active low reset, but after a couple of hours of futzing around with it, I concluded that it&#8217;s either not possible, or I&#8217;m just not smart enough. I started by swapping the positions of the resistor and capacitor, but the circuit initializes in the reset state and never exits it. And even if I found a solution to that, the input current on this LS series chip is so high that, with a 10K resistor to ground, the voltage at the input pin is actually pulled up to 2 volts! Ack! I decided I&#8217;ll just buy a 4013 for a few cents, and stop banging my head.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.stevechamberlin.com/cpu/2010/02/21/rc-servo-signal-decoder-for-camera-shutter-switch/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Fail</title>
		<link>http://www.stevechamberlin.com/cpu/2010/02/04/fail/</link>
		<comments>http://www.stevechamberlin.com/cpu/2010/02/04/fail/#comments</comments>
		<pubDate>Fri, 05 Feb 2010 03:45:28 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
		
		<category><![CDATA[3D Graphics Thingy]]></category>

		<guid isPermaLink="false">http://www.stevechamberlin.com/cpu/2010/02/04/fail/</guid>
		<description><![CDATA[OK, it&#8217;s time to admit defeat. 3D Graphics Thingy is not going to happen. It&#8217;s been six months since I worked on it. Heck, I even let my web hosting account expire due to neglect.
So what happened? I ran hard into the memory interface wall. Getting a decent DRAM controller working proved to be far, [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.stevechamberlin.com/cpu/thomas.jpg" align="left" width="228" height="165" />OK, it&#8217;s time to admit defeat. 3D Graphics Thingy is not going to happen. It&#8217;s been six months since I worked on it. Heck, I even let my web hosting account expire due to neglect.</p>
<p>So what happened? I ran hard into the memory interface wall. Getting a decent DRAM controller working proved to be far, far more difficult than I&#8217;d expected, even with the assistance of Xilinx wizards and prebuilt controller packages. And since getting a working memory interface is a precondition to actually doing any of the 3D stuff, well, that sure put a damper on things.</p>
<p>A second reason for failure is that I found working with FPGAs to be abstract and unsatisfying, and the tool software to be a nightmare. When I built BMOW, I was constantly wiring things, debugging with the oscilloscope, buying new chips, soldering switches, and generally being hands-on. In contrast, 3DGT development ended up being nothing but writing Verilog in a text editor, and wondering why the Xilinx synthesis tools never did what I expected them to. The FPGA hardware itself just sat, untouched.</p>
<p>So what&#8217;s next? Since last summer, I haven&#8217;t done any electronics work at all, except building a light saber from a string of Christmas lights and a fluorescent tube cover. I&#8217;ve gotten pretty involved in remote control vehicles, primarily RC planes, which give a few excuses to solder and build simple circuits. I have half an idea to use an Arduino with my Slow Stick somehow, to collect acceleration data in flight, or automate aerial photography or something.</p>
<p>Maybe I&#8217;ll come back to the CPU design thing again at some point. I still have a 68008 and some other parts I bought last year that I never got to use, so those are still waiting for me. For all those who contacted me asking if they could build something like BMOW or 3DGT, or asking for advice, send me a note and let me know how your projects are progressing now.</p>
<p>Happy hacking wishes to you all!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.stevechamberlin.com/cpu/2010/02/04/fail/feed/</wfw:commentRss>
		</item>
		<item>
		<title>SDRAM</title>
		<link>http://www.stevechamberlin.com/cpu/2009/07/14/sdram/</link>
		<comments>http://www.stevechamberlin.com/cpu/2009/07/14/sdram/#comments</comments>
		<pubDate>Wed, 15 Jul 2009 04:39:18 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.stevechamberlin.com/cpu/2009/07/14/sdram/</guid>
		<description><![CDATA[I think I&#8217;m making life more difficult than it needs to be, trying to get this DDR2 SDRAM interface to work. It&#8217;s not that the logical interface is so complicated, really&#8230; you set your row and column addresses, do a burst transaction, check for refresh&#8230; not trivial, but not rocket science either. And the Xilinx [...]]]></description>
			<content:encoded><![CDATA[<p>I think I&#8217;m making life more difficult than it needs to be, trying to get this DDR2 SDRAM interface to work. It&#8217;s not that the logical interface is so complicated, really&#8230; you set your row and column addresses, do a burst transaction, check for refresh&#8230; not trivial, but not rocket science either. And the Xilinx MIG or other vendor-specific wizard will generate a memory interface for you to use as a starting point.</p>
<p>No, what seems to be difficult is that the margin for error with DDR2 SDRAM is much smaller than with SRAM or plain (single data rate) SDRAM. The voltages are lower, the timing tolerances are tighter, and much more care must be given to compensating for things like possible skew, process variation between different FPGAs, power supply tolerances, and a host of other worries.</p>
<p>I&#8217;ve been reading a LOT on this topic in the past couple of weeks, and I&#8217;ve been struck by one thing. Except for my Xilinx Spartan 3A starter board, and Altera&#8217;s comparable Cyclone III board, I&#8217;ve seen zero boards that use DDR or DDR2 memory. They all use plain SDR SDRAM, also known as PC100 or PC133 depending on the speed. I looked at boards in the $150 to $300 range from <a href="http://www.opalkelly.com/products/xem3010/">Opal Kelly</a>, <a href="http://www.knjn.com/FPGA-FX2.html">KNJN</a>, <a href="http://www.xess.com/prods/prod035.php">XESS</a>, and others, and they all use plain SDR SDRAM. Maybe I should take a hint?</p>
<p>Meanwhile, I&#8217;ve been digesting as much FPGA documentation as I can. So far I&#8217;ve chewed through about 1500 pages of the Xilinx MIG user manual, Spartan 3 series user manual, and Spartan 3A addendum, and I&#8217;m midway through the comprehensive book <a href="http://www.amazon.com/gp/product/0470185325?ie=UTF8&amp;tag=runworks-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0470185325">FPGA Prototyping by Verilog Examples: Xilinx Spartan-3 Version</a>. It&#8217;s the best &#8220;getting started&#8221; reference I&#8217;ve seen yet, with good coverage of Verilog, FPGA hardware, and the Xilinx software tools.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.stevechamberlin.com/cpu/2009/07/14/sdram/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Small Progress</title>
		<link>http://www.stevechamberlin.com/cpu/2009/07/05/small-progress/</link>
		<comments>http://www.stevechamberlin.com/cpu/2009/07/05/small-progress/#comments</comments>
		<pubDate>Mon, 06 Jul 2009 04:44:50 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
		
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.stevechamberlin.com/cpu/2009/07/05/small-progress/</guid>
		<description><![CDATA[Finally, some small progress on the memory interface. After banging my head every which way against the Xilinx tools, and reading everything I could find on the subject, I came across Leo Silvestri&#8217;s page on modifying the Xilinx MIG memory controller design for a Spartan 3E board. It&#8217;s for a different kit and an older [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.stevechamberlin.com/cpu/s3aleds.jpg" width="368" align="left" height="223" />Finally, some small progress on the memory interface. After banging my head every which way against the Xilinx tools, and reading everything I could find on the subject, I came across Leo Silvestri&#8217;s page on <a href="http://whoyouvotefor.info/mig20.html">modifying the Xilinx MIG memory controller design for a Spartan 3E board</a>. It&#8217;s for a different kit and an older version of the software, but with his help I was finally able to build the reference design and testbench for the Spartan 3A board, program it to the FPGA, and see the LED that indicates success. It&#8217;s not very exciting, but it&#8217;s progress.</p>
<p>I still can&#8217;t believe all the steps I went through, and the whole process has made me quite bitter about Xilinx&#8217;s software tools. I&#8217;m sure it would be easier if I had better general knowledge of this field, but the last few weeks of this project have been like being lost at sea, and totally disoriented. It still feels more like a series of disconnected guesses than a genuine understanding, but here&#8217;s what I&#8217;ve managed to piece together on the topic of using the DDR2 SDRAM that&#8217;s on the Spartan 3A kit board.<br />
<br clear="all" /></p>
<ol>
<li>The Xilinx MIG can&#8217;t be used to generate a new memory controller design for the Spartan 3A board. This is because the way the SDRAM on the board is connected to the FPGA pins violates some of the MIG design rules. The only solution is to use the pre-built Spartan 3A board reference controller design, which then locks you into a specific burst length and CAS latency, or to hand-modify the code generated by the MIG, which is way beyond the skills of a noob like me.</li>
<li>Using the newest version of the Xilinx ISE and MIG, attempting to add the Spartan 3A reference design to your project will cause a crash. No answer from Xilinx support on this.</li>
<li>You can also get the Spartan 3A reference design as a zip file. But if you unzip it, add all the files to a new ISE project, and try to build it, you&#8217;ll get lots of errors about nonexistent nets that I couldn&#8217;t resolve.</li>
<li>There&#8217;s also a batch file in the zip file that will create a new ISE project for you. But try to build it, and you&#8217;ll be told that the design requires a license for ChipScopePro, Xilinx&#8217;s software logic analyzer. I found a discussion of this on the Xilinx forums, but no resolution other than to create a new controller design that omits ChipScopePro support, which is impossible for this board due to issue number 1 above.</li>
<li>What finally worked was to hand-edit the reference design, deleting parts of it semi-randomly until the ChipScopePro error disappeared. It turned out that required removing three modules called icon, ila, and vio, none of which seemed obviously related to debugging to me.</li>
</ol>
<p>So there you have it. The next step will be to begin to actually use this interface for something more interesting than lighting up an LED. I&#8217;m just now realizing that the interface created by the MIG is just the first, small step towards what the 3DGT memory controller must eventually become. It&#8217;s not enough to simply have an interface that permits reading and writing. To achieve half-way decent performance, much care will be required to manage and coordinate those reads and writes, minimizing waiting and wasted time, and maximizing throughput. And to top it off, it&#8217;s going to need a bus master to arbitrate memory access between the display circuit, pixel processors, vertex processors, and any other consumers of memory. All this is a substantial project in itself, that will need to be at least partially completed before any real progress can begin on the 3D part of 3DGT. Looks like a long, slow climb, but I&#8217;m moving ahead.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.stevechamberlin.com/cpu/2009/07/05/small-progress/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Xilinx Memory Controller</title>
		<link>http://www.stevechamberlin.com/cpu/2009/07/03/xilinx-memory-controller/</link>
		<comments>http://www.stevechamberlin.com/cpu/2009/07/03/xilinx-memory-controller/#comments</comments>
		<pubDate>Sat, 04 Jul 2009 07:58:04 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
		
		<category><![CDATA[3D Graphics Thingy]]></category>

		<guid isPermaLink="false">http://www.stevechamberlin.com/cpu/2009/07/03/xilinx-memory-controller/</guid>
		<description><![CDATA[I think I&#8217;m about ready to crush this Xilinx starter kit under my boot, and use the pulverized component dust to scrub my toilet. That&#8217;s not quite fair, though, as my frustration isn&#8217;t really with the hardware, but with the inexplicable Xilinx software. At this point, I&#8217;ve spent about 20 hours over a couple of [...]]]></description>
			<content:encoded><![CDATA[<p>I think I&#8217;m about ready to crush this Xilinx starter kit under my boot, and use the pulverized component dust to scrub my toilet. That&#8217;s not quite fair, though, as my frustration isn&#8217;t really with the hardware, but with the inexplicable Xilinx software. At this point, I&#8217;ve spent about 20 hours over a couple of weeks, just trying to instantiate the sample Xilinx SDRAM memory controller. I&#8217;m amazed that something so central to the use of a Xilinx FPGA or starter kit could be so obtuse. Or maybe it&#8217;s me that&#8217;s obtuse, but regardless, I was never so exasperated in all the time I was working on BMOW. Back then, at least each piece of hardware was small and understandable, and any errors were of my own making. Now I&#8217;m spending hour upon hour attempting to decode the error messages from Xilinx&#8217;s software, and trying to guess at how they intended this process to work. I expected something like:</p>
<ol>
<li>Create new project</li>
<li>Run &#8220;memory interface generator&#8221; wizard (which Xilinx calls the M.I.G.)</li>
<li>Choose memory type, speed, etc.</li>
<li>The wizard adds some auto-generated .v and .ucf (user constraints) files to my project</li>
<li>Optionally, wizard also adds a test bench, or some kind of example</li>
<li>Synthesize the example, program it to the starter kit, and blink some LEDs to show that it worked.</li>
</ol>
<p>That was the theory anyway. The reality has been a long series of software errors and omissions too dull to recount in detail. The short version is that when I use the MIG to generate an interface specifically for the Spartan 3A starter kit, the MIG crashes. If I follow some hazy instructions for manually adding the reference design to the project without using the MIG, then I get something that fails the &#8220;translate&#8221; step. If I use the MIG to generate a new interface design for a board that just happens to have the same hardware as the Spartan 3A starter kit, I also get something that fails the &#8220;translate&#8221; step. In either case, before the fatal errors, there are many warnings saying that dozens of flip-flops were determined to have a constant 0 or 1 value, and so were optimized away, as well as copious other warnings. Clearly I&#8217;m doing something very wrong, but creating a sample design using the reference memory interface on the reference board seems like it should be about as simple a case as it&#8217;s possible to get.</p>
<p>I would have given up on it a while ago, except that with no memory interface, there can be no 3D Graphics Thingy. This simply must be made to work in order for the project to progress any further. Unfortunately I&#8217;m about out of ideas. I need to find a simple walk-through tutorial that starts with &#8220;open ISE, press the New Project button&#8221; and finishes with happy green checkmarks next to all the steps in the processes window for an example design using the MIG controller. There are only about 10 mouse clicks needed between that start and finish, so it would seem hard to mess it up much. Either I&#8217;m doing something basic wrong, or omitting something, or my computer is haunted. With luck, it will become clear tomorrow.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.stevechamberlin.com/cpu/2009/07/03/xilinx-memory-controller/feed/</wfw:commentRss>
		</item>
		<item>
		<title>More on Memory</title>
		<link>http://www.stevechamberlin.com/cpu/2009/06/27/more-on-memory/</link>
		<comments>http://www.stevechamberlin.com/cpu/2009/06/27/more-on-memory/#comments</comments>
		<pubDate>Sat, 27 Jun 2009 17:44:54 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
		
		<category><![CDATA[3D Graphics Thingy]]></category>

		<guid isPermaLink="false">http://www.stevechamberlin.com/cpu/2009/06/27/more-on-memory/</guid>
		<description><![CDATA[I&#8217;ve been working hard the past week on a DDR2 memory controller for the Xilinx starter kit, and refining my estimates for 3D Graphics Thingy&#8217;s memory bandwidth requirements. There&#8217;s been progress, but it feels like things are moving at a snail&#8217;s pace.
I wrote earlier about some basic bandwidth estimates, and have revised them somewhat here. [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.stevechamberlin.com/cpu/memory.jpg" align="left" width="200" height="122" hspace="10" />I&#8217;ve been working hard the past week on a DDR2 memory controller for the Xilinx starter kit, and refining my estimates for 3D Graphics Thingy&#8217;s memory bandwidth requirements. There&#8217;s been progress, but it feels like things are moving at a snail&#8217;s pace.</p>
<p>I wrote earlier about some basic bandwidth estimates, and have revised them somewhat here. Assume pixels and texels are 16 bits (5-6-5 RGB format), z-buffer entries are 24 bits, and the screen resolution is 640&#215;480 @ 60Hz. Let&#8217;s also assume a simple case where there&#8217;s no alpha blending being performed, and every triangle has one texture applied to it, using point sampling for the texture lookup. For every pixel, every frame, the hardware must:<br />
<br clear="all" /></p>
<ol>
<li>Clear the z-buffer, at the start of the frame: 3 bytes</li>
<li>Clear the frame buffer, at the start of the frame: 2 bytes</li>
<li>Read the z-buffer, when a new pixel is being drawn: 3 bytes</li>
<li>Write the z-buffer, if the Z test passes: 3 bytes</li>
<li>Read the texture data, if the Z test passes: 2 bytes</li>
<li>Write the frame buffer, if the Z test passes: 2 bytes</li>
<li>Read the frame buffer, when the display circuit paints the screen: 2 bytes</li>
</ol>
<p>Assume too that the scene&#8217;s depth complexity is 4, meaning the average pixel is covered by 4 triangles, and steps 3-6 will be repeated 4 times. Add everything up, and that&#8217;s 47 bytes per pixel, times 640 x 480 is 14.43 MB per frame, times 60 Hz is 866.3 MB/s.</p>
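<p>The arithmetic above can be checked with a few lines of Python. This is only a sketch of the estimate: the dictionary keys and function names are made up for illustration, and MB here means one million bytes.</p>

```python
# Bytes moved per pixel for each of the seven steps listed above.
BYTES = {
    "z_clear": 3,     # step 1
    "fb_clear": 2,    # step 2
    "z_read": 3,      # step 3
    "z_write": 3,     # step 4
    "tex_read": 2,    # step 5
    "fb_write": 2,    # step 6
    "fb_scanout": 2,  # step 7
}

WIDTH, HEIGHT, REFRESH_HZ = 640, 480, 60
DEPTH_COMPLEXITY = 4


def bytes_per_pixel(depth_complexity=DEPTH_COMPLEXITY):
    """Steps 1, 2 and 7 happen once; steps 3-6 repeat per covering triangle."""
    once = BYTES["z_clear"] + BYTES["fb_clear"] + BYTES["fb_scanout"]
    per_triangle = (BYTES["z_read"] + BYTES["z_write"]
                    + BYTES["tex_read"] + BYTES["fb_write"])
    return once + depth_complexity * per_triangle


def bytes_per_frame(depth_complexity=DEPTH_COMPLEXITY):
    return bytes_per_pixel(depth_complexity) * WIDTH * HEIGHT


def bytes_per_second(depth_complexity=DEPTH_COMPLEXITY):
    return bytes_per_frame(depth_complexity) * REFRESH_HZ


if __name__ == "__main__":
    print(bytes_per_pixel(), "bytes per pixel")
    print(bytes_per_frame() / 1e6, "MB per frame")
    print(bytes_per_second() / 1e6, "MB/s")
```

<p>Changing the depth complexity argument shows how quickly the total grows: every extra layer of overdraw adds 10 bytes per pixel, or about 184 MB/s.</p>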
<p>The DDR2 memory on the Xilinx starter kit board has a theoretical maximum bandwidth of 1064 MB/s, so that might just fit. I have serious reservations about my ability to later recreate such a high-speed memory interface on a custom PCB, but ignore that for now. Unfortunately you&#8217;ll never get anything close to the theoretical bandwidth in real world usage, unless you&#8217;re streaming a huge chunk of data to consecutive memory addresses. Even half the theoretical bandwidth would be doing well. I&#8217;ll be conservative and assume I can reach 1/3 of the theoretical bandwidth, which means 355 MB/s. That&#8217;s not enough. And I&#8217;ll also need some bandwidth for vertex manipulations, since I&#8217;ve only considered pixel rasterization, and possibly for CPU operations too. It looks like things will definitely be bandwidth constrained.</p>
<p>Fortunately there are some clever tricks that can be used to save lots of memory bandwidth.</p>
<ol>
<li><strong>Z occlusion:</strong> When a pixel fails the Z test at step 3, then steps 4-6 can be skipped. With a depth complexity of 4, and assuming randomly-ordered triangles, then on average 1 + 1/2 + 1/3 + 1/4 = 2.08 triangles will pass the Z test and get drawn, not 4. That&#8217;s a savings of 14 bytes per pixel, or 258 MB/s!</li>
<li><strong>Back-face culling:</strong> When drawing solid objects, it&#8217;s guaranteed that any triangle facing away from the camera will be overdrawn by some other triangle facing towards the camera. These back-face triangles can be ignored completely, skipping steps 3-6 and saving 10 bytes per culled pixel. Assuming half the pixels are part of back-facing triangles, then that&#8217;s a savings of 369 MB/s. Of course some of the pixels rejected due to back-face culling would also have been rejected by Z occlusion, so it&#8217;s not valid to simply add the savings from the two techniques.</li>
<li><strong>Z pre-pass:</strong> Another technique is to draw the entire scene while skipping steps 5 and 6, so only the Z buffer is updated. Then the scene is drawn again, but step 3 is changed to test for an exactly equal Z value, and step 4 is eliminated. This guarantees that steps 5 and 6 are only performed once per pixel, for the front-most triangle. However, step 3 must now be performed twice as many times, and all the vertex transformation and triangle setup work not accounted for here must be done twice. Whether this results in an appreciable overall savings depends on many factors.</li>
<li><strong>Skip frame buffer clear:</strong> If the rendered scene is indoors and covers the entire screen, then the frame buffer clear in step 2 can be omitted. That&#8217;s a savings of 37 MB/s.</li>
<li><strong>Skip Z-buffer clear: </strong>If the rendered scene covers the entire screen, then the Z-buffer clear in step 1 can also be omitted, at the cost of one bit of Z-buffer accuracy. On even frames, the low half of the Z-buffer range can be used. On odd frames, the high half can be used, along with a reversal in the sense of direction, so larger values are treated as being closer to the camera. This means that every Z value from an even frame is farther away than any Z value from an odd frame, so each frame effectively clears the Z-buffer for the next one. This provides a savings of 55 MB/s.</li>
<li><strong>Texture compression:</strong> Compression formats like DXT1 can provide a 4:1 or better compression ratio for texture data. If the rasterizer can be structured so that an entire texture is read into a cache, and then used for calculations on many adjacent pixels, this can translate directly into a 4:1 bandwidth savings on step 5. Assuming less than perfect gains of 2:1, that translates to a savings of 18 MB/s.</li>
<li><strong>Texture cache:</strong> Neighboring pixels on the screen are likely to access the same texels, when the textures are drawn magnified. A texture that&#8217;s tiled many times across the face of a triangle may also result in many reads of the same texel. The expected savings depend on the particular model that&#8217;s rendered, but are probably similar to those for texture compression, or about 18 MB/s.</li>
<li><strong>Tiled Z-Buffer:</strong> The Z-buffer can be divided into many 8&#215;8 squares, with a small amount of state data cached for each square: the farthest point (largest Z value) in the square, and a flag indicating if the square has been cleared. That&#8217;s 25 bits per square, or 15 KB for a 640&#215;480 Z-buffer. That should fit in the FPGA&#8217;s block RAM. Then when considering a pixel before step 3, if the pixel&#8217;s Z value is larger than the cached Z-max for that square, the pixel can be rejected without actually doing the Z-buffer read. Furthermore, when the Z-buffer needs to be cleared, the cleared flag for the block can be set without actually clearing the Z-buffer values. Then the next time that Z-buffer square is read, if the cleared flag is set, the hardware can return a square filled with Z-far without actually reading the Z-buffer values. This skips both a Z write and a Z read for the entire square. In order to gain the benefit of the cleared flag, the hardware must operate on entire 8&#215;8 blocks at once before writing the result back to the Z-buffer. The total savings for both these techniques is at least 110 MB/s, and possibly as much as 165 MB/s depending on how much is occluded with the square-level Z test.</li>
<li><strong>Z-buffer compression:</strong> 8&#215;8 blocks of Z-buffer data can be stored compressed in memory, using some kind of differential encoding scheme. Like the previous technique, this would require the hardware to operate on an entire 8&#215;8 block at a time in order to see any benefit. The cost of all Z-buffer reads and writes might be reduced by 2:1 to 4:1, at the cost of additional latency and hardware complexity to handle the compression. This could provide a savings in the range of 350 MB/s.</li>
</ol>
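<p>Several of the per-technique numbers in the list can be reproduced with a short Python sketch. The function names are hypothetical, and MB again means one million bytes. The Z occlusion figure relies on the fact that, with randomly ordered triangles, the k-th triangle drawn at a pixel is the nearest so far with probability 1/k.</p>

```python
PIXEL_RATE = 640 * 480 * 60  # pixels per second at 640x480 @ 60Hz


def mb_per_s(bytes_per_pixel):
    """Convert a per-pixel, per-frame byte count into MB/s."""
    return bytes_per_pixel * PIXEL_RATE / 1e6


def expected_z_passes(depth_complexity):
    """Expected number of randomly ordered triangles that pass the Z test."""
    return sum(1.0 / k for k in range(1, depth_complexity + 1))


def z_occlusion_savings_bytes(depth_complexity, bytes_steps_4_to_6=7):
    """Bytes per pixel saved by skipping steps 4-6 for failed Z tests."""
    skipped = depth_complexity - expected_z_passes(depth_complexity)
    return skipped * bytes_steps_4_to_6


def tile_state_bytes(width=640, height=480, tile=8, bits_per_tile=25):
    """Cache size for the tiled Z-buffer: 24-bit Z-max plus a cleared flag."""
    tiles = (width // tile) * (height // tile)
    return tiles * bits_per_tile / 8


if __name__ == "__main__":
    print("expected Z passes:", expected_z_passes(4))
    print("Z occlusion, unrounded:", mb_per_s(z_occlusion_savings_bytes(4)))
    print("Z occlusion, 14 bytes:", mb_per_s(14))
    print("back-face culling, 2 x 10 bytes:", mb_per_s(20))
    print("frame buffer clear:", mb_per_s(2))
    print("Z-buffer clear:", mb_per_s(3))
    print("tile state bytes:", tile_state_bytes())
```

<p>One detail this exposes: the 14 bytes per pixel quoted for Z occlusion appears to round the 1.92 skipped draws up to 2. Without that rounding the savings come to about 13.4 bytes per pixel, or roughly 247 MB/s, which doesn&#8217;t change the overall picture.</p>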
<p>Unfortunately the savings from all these techniques can&#8217;t merely be summed, and the savings I&#8217;ve estimated for each one are assuming it&#8217;s done by itself, without any of the other techniques. However, when used together, the combination of backface culling plus Z-occlusion should provide at least 400 MB/s in savings, texture compression and caching another 30 MB/s, and Z-buffer tiling another 110 MB/s. That lowers the total bandwidth needs down to 326 MB/s, roughly the same as my conservative estimate of real-world available bandwidth.</p>
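<p>As a final sanity check, here is the combined estimate as a Python sketch. The savings figures are the rough combined values from the paragraph above, not measurements, and the names are made up for illustration.</p>

```python
BASELINE_MB_S = 866      # unoptimized requirement estimated earlier
THEORETICAL_MB_S = 1064  # DDR2 peak on the starter kit board

# Rough combined savings, as estimated in the text (MB/s).
COMBINED_SAVINGS_MB_S = {
    "back-face culling + Z occlusion": 400,
    "texture compression + caching": 30,
    "Z-buffer tiling": 110,
}


def remaining_mb_s():
    """Bandwidth still needed after the combined savings."""
    return BASELINE_MB_S - sum(COMBINED_SAVINGS_MB_S.values())


def budget_mb_s(fraction=1.0 / 3.0):
    """Usable bandwidth, assuming only a fraction of the peak is reachable."""
    return THEORETICAL_MB_S * fraction


if __name__ == "__main__":
    need, have = remaining_mb_s(), budget_mb_s()
    print(f"need {need} MB/s, budget {have:.0f} MB/s, margin {have - need:.0f} MB/s")
```

<p>That leaves a margin of under 30 MB/s, and it still doesn&#8217;t account for vertex processing or CPU traffic, so the design remains bandwidth constrained even in the optimistic case.</p>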
]]></content:encoded>
			<wfw:commentRss>http://www.stevechamberlin.com/cpu/2009/06/27/more-on-memory/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
