Device Clock Generation

Summary

A technical blog post discussing methods for generating device clocks in FPGA and ASIC designs for interfacing with peripherals such as NOR flash and NAND flash.

Cached at: 06/12/26, 05:51 AM

# Device Clock Generation

Source: [https://zipcpu.com/blog/2025/12/17/devclk.html](https://zipcpu.com/blog/2025/12/17/devclk.html)

After building a [CPU](https://zipcpu.com/about/zipcpu.html), [utilities for handling bus interconnects](https://github.com/ZipCPU/wb2axip), several DMAs and memory controllers, I often find my time focused on building interfaces between designs and external peripherals. This seems to be where most of the business has landed for me. Often, these peripherals require a clock output, coming from the design, and so I’d like to spend some time describing how to generate such a “device” clock.

Fig 1. A Basic SOC with Peripherals

![](https://zipcpu.com/img/devclk/soc.svg)

There are actually two topics that need to be discussed when working with modern high speed peripheral design. One of them is *generating* the clock to be sent to the peripheral, as Fig. 1 above illustrates. The second one involves *processing* a clock returned from the peripheral, as shown in Fig. 2 below. This is a key component of high speed designs such as DDR memories, eMMC, HyperRAM, or even NAND flash protocols. This second topic is one we shall need to come back to at a later date.

Fig 2. Data returned with a clock

![](https://zipcpu.com/img/devclk/bidir-clk.svg)

Today, I’d like to discuss how to go about *generating* a clock to control device interaction.

I first came across this problem when building a [NOR flash controller](https://zipcpu.com/blog/2019/03/27/qflexpress.html), based on first a [SPI interface](https://zipcpu.com/blog/2018/08/16/spiflash.html) and later a [Quad SPI interface](https://zipcpu.com/blog/2019/03/27/qflexpress.html). [My controller](https://github.com/ZipCPU/qspiflash) was designed for FPGAs, and so the clock could be built with a single frequency. This design had the added complication that the clock needed to be paused from time to time. Specifically, the clock needed to be turned off when nothing was going on.
Likewise, the clock needed to be turned off for one cycle after dropping (i.e. activating) the chip select pin, and for a couple cycles after the transaction was complete but before raising (deactivating) the chip select.

I had to deal with a similar problem when controlling a HyperRAM, but … [that design](https://github.com/ZipCPU/wbhyperram) failed when I wasn’t (yet) prepared to handle the return clock properly. I did say this deserved an article in its own right, did I not? Processing data on a return clock properly can be a challenge.

I then built [a similar design for ASIC platforms](https://www.arasan.com/product/xspi-psram-master/). Unlike the FPGA, the final clock speed wouldn’t be known until run time. It might be that the design started at a slower clock speed, only to later speed up to the full rate at run time. Unlike an FPGA which can be fixed later, there’s really no room for failure in [ASIC work](https://zipcpu.com/blog/2017/10/13/fpga-v-asic.html). At least with an FPGA, if my board didn’t support a particular frequency, I could just rebuild the design for the clock frequency it did support. This doesn’t work, though, for an ASIC–since it tends to be cost prohibitive to rebuild the design at a later time when you decide to connect it to a slower part than the one you designed it for.

The next design I worked with was a [NAND flash design](https://www.arasan.com/product/onfi-4-2-controller-phy/). NAND flash can be a challenge, since the protocol requires you to start at a slow frequency and only after you bring up the connection are you allowed to change to a faster frequency. [This particular design](https://www.arasan.com/product/onfi-4-2-controller-phy/) was built for ASIC environments, and so it depended upon an analog component generating all the clocks I needed. This worked great, up until someone wanted to purchase the design to work on an FPGA, then another wanted it to work on an FPGA, and another and so on.

Fig 3.
Single Data Rate (SDR) vs Dual Data Rate (DDR)

SDR

![](https://zipcpu.com/img/devclk/sdr.svg)

DDR

![](https://zipcpu.com/img/devclk/ddr.svg)

Just to add another twist to the problem, many protocols require data transitions on both edges of the clock, a protocol often known as “Dual Data Rate” (DDR). Unlike the other designs above, these often require a clock that is 90 degrees offset from the data–so that each clock transition takes place in the middle of each data valid window, rather than on the edges of the window. This sort of “offset” clock is necessary to guarantee setup and hold times within the slave peripheral. An example of the clock and data relationship required by DDR as opposed to a traditional “single data rate” (SDR) clock is shown in Fig. 3.

By the time I got to my [SDIO/eMMC controller](https://github.com/ZipCPU/sdspi), I think I finally had the clock division problem handled. An [SDIO controller](https://github.com/ZipCPU/sdspi) needs to bring up the SD card at 400kHz, and then depending upon the card, the PCB, and the controller, the speed may then be raised to 25MHz, 50MHz, 100MHz, or even 200MHz. The clock may also be stopped whenever either there’s nothing to send or receive, or when the SOC can’t load or unload the data to the controller. For example, you might ask an SD card to read and thus produce many blocks of data, then read the first two of these blocks into your internal buffers only to find that the CPU is slow in draining those buffers. In that case, you would need to stop the interface clock before the external card tries to send you a third block of data that would have nowhere to go.

Other devices require user programmable device clock controllers, such as:

- [10M/100M/1Gb Ethernet controllers](https://github.com/ZipCPU/videozip/tree/master/rtl/ethernet)

  While each of these speeds might use a single clock, building a truly trimode controller requires some extra work.
- [(DDR) SDRAM controllers](https://zipcpu.com/zipcpu2025/05/28/memtest.html)

  SDRAM controllers from an FPGA standpoint tend to be simple: just produce a clock. However, you can turn the clock off for better power performance. Yes, there are rules … but we won’t get into those here today.

- I2S

  [We discussed generating an I2S clock at a totally arbitrary frequency](https://zipcpu.com/blog/2019/06/28/genclk.html) some time ago.

- [I2C](https://zipcpu.com/blog/2021/11/15/ultimate-i2c.html)

  In general, I2C is too slow to be the focus of this article. There is an I3C protocol that is built on top of I2C. The techniques we discuss today might work well for I3C masters, but I’m not nearly as familiar with those.

- [SPI – not just NOR flash](https://github.com/ZipCPU/wbspi)

  While SPI *slaves* have a device clock as well, handling these clocks is fundamentally different from what I’m describing today. My focus today will be on *generating* clock signals for the purpose of controlling external devices–such as an SPI master might need to do.

Specifically, today I want to look at and discuss generating a clock with one or more of the following characteristics:

- **Output Signal:** We’re talking about interface clocks–those generated by the “master” of the interface. These are *digital* signals, output from an FPGA (or ASIC) device. The output may be accomplished via a component like an [ODDR](https://zipcpu.com/blog/2020/08/22/oddr.html) or an OSERDES, with or without an additional analog delay following.
- **Discontinuous:** The clock may be discontinuous. Many protocols ([flash](https://zipcpu.com/blog/2019/03/27/qflexpress.html), [SDIO/eMMC](https://github.com/ZipCPU/sdspi), etc.) allow, or even require, the clock to be stopped, or otherwise only toggled when there’s something to send or receive. As mentioned above, stopping the clock may also be useful for pausing a transmission in progress before a source buffer runs dry, or an incoming buffer overflows.
- **Dynamic Frequency:** Often, the outgoing clock needs to change frequency during operation as part of the protocol. For example, the SDIO protocol needs to start at 400kHz, and then increase to 25MHz (or more). Therefore, a good clock generator will need to be able to naturally generate multiple clock frequencies as the protocol requires.
- **Minimum pulse width:** Switching between frequencies must be done by rule: clock glitches must be fully disallowed and guaranteed against. Too-short clock pulses cannot be allowed. Clock high and low durations must always be at least a half period of the fastest allowable clock.
- **90 Degree Offset for DDR Signaling:** As shown in Fig 3, many modern protocols require both positive and negative edge signaling (DDR). This drops the required clock frequency by 2x, reducing the bandwidth that must be carried over the PCB for the same data rate. However, the clock signal required to support such DDR signaling often needs to be delayed 90 degrees from the data, so that it transitions in the middle of the data valid period.
- **Faster than the controller’s clock:** Just to make matters worse, in [my eMMC design](https://github.com/ZipCPU/sdspi), I needed to generate a 200MHz DDR device clock from a 100MHz system clock.

All this is to say that our goal today will be to create a divided clock using digital, rather than analog, logic. (Yes, I can hear my analog engineering friends jump in here with the comment that “Everything is analog!” God bless you, my friends.)

## The Problem

The first approach I often see to this problem is the straightforward integer clock division approach. Generally, it looks something like the following:

```
always @(posedge src_clk)
if (reset)
	counter <= 0;
else if (!active_clock)
	counter <= 0;
else // if (active_clock)
	counter <= counter + 1;

assign	dev_clk = (high_speed) ? (src_clk && active_clock)
			: counter[user_selected_bit];
```

In this case, `active_clock` controls whether or not the clock is stepping, and `user_selected_bit` controls which level of clock division we are interested in. As for the `src_clk`, that can be either the system clock or alternatively whatever is required to generate the fastest clock frequency required by the protocol.

Note that we’ve done nothing to guarantee this clock won’t glitch between speed selections, nor can we necessarily guarantee the minimum of two clock rates. We’ll come back to these requirements later, albeit with a different (better) implementation.

The user logic required to use this clock looks very simple at first:

```
always @(posedge dev_clk or posedge reset)
if (reset)
begin
	// Reset logic
end else begin
	pedge_data <= // Logic controlling any flops based on the dev_clk
end
```

When a protocol requires data on both edges of the clock, getting the data right for the second edge of the clock is also important. But, how shall we output data on the negative edge of a clock we’ve just created out of thin air? We’ll need to transition on the negative edge to do this.

```
always @(negedge dev_clk or posedge reset)
if (reset)
begin
	// Reset logic
end else begin
	nedge_data <= // Logic controlling the negative clock's data
end

assign	output_data = (dev_clk || !ddr_mode) ? pedge_data : nedge_data;
```

This approach leaves us with two problems. The first is that we’re using our clock as a logic signal when we assign `dev_clk` to possibly be the same as our source clock. The second problem is that we are transitioning user logic on this clock. Worse, though, we’re now transitioning our user logic on both edges of the clock. This violates [*the rules*](https://zipcpu.com/blog/2017/08/21/rules-for-newbies.html) of good digital logic design.

These aren’t necessarily issues when building ASIC designs.
However, in FPGA design, this clock will need to get onto the clocking network’s backbone somehow, and that’s not automatic. Worse, this new clock is *not* the same as the original `src_clk`–even when they are at the same frequency. There will always be a delay between the two clocks–a delay that may not be captured by pre-synthesis simulation, and so it can be a dangerous delay the engineer isn’t expecting when building this logic.

This leads to two commercial ASIC design challenges. First, when designing an ASIC IP, you want to be able to test as much of the IP on an FPGA as possible. Non-FPGA compatible logic needs to be moved to the periphery of the design and carefully controlled. Second, from a business point of view, it helps to be able to sell the ASIC design to FPGA customers in addition to ASIC customers. So, even though you *can* do something like this on an ASIC, that doesn’t mean you *should*.

There are other problems.

- [Clock domain crossings (CDCs)](https://zipcpu.com/blog/2017/10/20/cdc.html)

  Since the `src_clk` and `dev_clk` are now two separate and distinct clock domains, you’ll need to properly manage every [clock domain crossing](https://zipcpu.com/blog/2017/10/20/cdc.html) between these two clock domains. This can create additional delays through what otherwise might be high speed logic.

  Likewise, the positive and negative edges of the same clock are also (technically) separate clock domains. Moving between them is “possible, but not recommended.”

- Gating

  You may have noticed we haven’t properly gated our clock above. Sure, we used an `active_clock` signal to provide gating, but this signal does not guarantee the maximum frequency of the output clock. This, however, is a minor problem that most engineers reading this blog would be able to easily fix with a little bit of additional logic.

Two problems in particular, though, become deal breakers when it comes to this type of design.
The first is that DDR interfaces often require a clock delayed by 90 degrees from the data, as shown in Fig. 3 above. The simple approach will not generate such a 90 degree delay. While one might use an analog delay element, such as a Xilinx ODELAY element, to delay the clock signal by an appropriate amount, this will only work for high speed clocks and not for clocks less than 50MHz or so.

The second problem is, what do you do when you need a device clock that’s faster than your `src_clk`, like I did in my [SDIO/eMMC controller](https://github.com/ZipCPU/sdspi) design?

As a result, we really need another approach.

## The Solution

The basic solution is to return to [the rules](https://zipcpu.com/blog/2017/08/21/rules-for-newbies.html), and so avoid all transitions on the device clock edge. Instead, we’ll continue to transition on our source clock and then use either an [ODDR](https://zipcpu.com/blog/2020/08/22/oddr.html) or an OSERDES to generate the final outgoing clock. In the meantime, we’ll treat the newly generated device clock as a traditional logic signal–rather than a “clock” within our design. That is, we’ll let it be and remain *logic*.

Let’s start by looking at Fig. 3 above, and dividing the clock period into sections, as shown in Fig. 4 below.

Fig 4. Dividing the clock period

![](https://zipcpu.com/img/devclk/ddrbyfour.svg)

Nominally, we’d want at least two sections per clock–one for each piece of data in a DDR transmission. Sadly, this isn’t enough, since the clock might need to be offset by 90 degrees. Hence, we’ll need to break each clock period into four logically distinct time periods. We can label these time periods 3:0, with the left-most (most significant) being 3, down to the right-most (least significant) being 0.

From here, we can generate what I’m going to call a *wide* clock, four bits at a time. This wide clock will then be output via a 4:1 OSERDES–if it is to keep pace with the source clock within our design.
At its fastest speed, this clock will be either `0011` (where the MSB ‘0’ is transmitted “first”), or `0110` if a 90 degree offset clock is required for DDR transmissions (as shown in Fig. 4). At its next slowest speed, the clock would be `0000` followed by `1111`, or `0011` followed by `1100`. Further clock divisions will use wide clocks of `0000` or `1111`.

If you wish to use an [ODDR](https://zipcpu.com/blog/2020/08/22/oddr.html) instead of a 4:1 OSERDES, you can still use this approach, save that you would be generating 2 wide clock bits at a time instead of four. The fastest clock would be a repeating `01`, but this fastest clock would be unable to handle the 90 degree offsets of a DDR signal. The next fastest would be either `00` followed by `11`, or the 90 degree offset version of the same at `01` followed by `10`.

If you want a clock running at twice your system frequency, you could use an eight-bit wide clock signal, designed to feed an 8:1 SERDES. Your fastest clock would become `00110011` (non-DDR) or `01100110` when working with DDR signals.

That’s the first step–the wide clock.

The second step is to generate, together with the wide clock signal, two other signals. The first signal, let’s call this `new_edge`, will indicate that a new clock cycle is beginning. The second, which I shall call the `half_edge`, will indicate that the second half of a clock cycle is beginning. Both of these signals are also shown in Fig. 4 above, each indicating the portion of the clock cycle they represent.

All three of these *logic* signals can now be generated by a “clock generator” module. If necessary, this clock can be stopped either at the clock generator, or gated further down the signal pipeline by simply zeroing out the wide clock.

Let’s pause for a moment to illustrate what a “clock” like this might look like. We’ll start with the highest speed clock, running at the source clock rate. This clock will have a wide clock of `0011`, and new data on every clock edge.
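
The four-bit patterns described above can be sketched in Verilog. This is only a minimal illustration, not the clock generator from the article's repository: the module name `wideclk_sketch` and the signals `half_speed`, `clk90`, and `second_half` are hypothetical, and the sketch assumes both configuration inputs are held constant while the clock runs. Changing them on the fly could create a too-short pulse (for example, `0011` followed directly by `0110`), which is exactly what a real generator has to guard against.

```verilog
// Hypothetical sketch: produce a 4-bit wide clock word each src_clk cycle,
// for either the full rate or the half rate patterns described above.
// Bit 3 is sent first, matching the 3:0 labeling of the clock period.
// ASSUMPTION: half_speed and clk90 only change while reset is asserted.
module wideclk_sketch (
	input	wire		src_clk,
	input	wire		reset,
	input	wire		half_speed,	// 0: full rate, 1: half rate
	input	wire		clk90,		// 1: 90 degree offset (DDR)
	output	reg	[3:0]	wide_clock,
	output	reg		new_edge,
	output	reg		half_edge
);
	// At half rate, tracks which half of the period comes next
	reg	second_half;

	initial	{ wide_clock, new_edge, half_edge, second_half } = 0;

	always @(posedge src_clk)
	if (reset)
	begin
		wide_clock  <= 4'b0000;
		new_edge    <= 1'b0;
		half_edge   <= 1'b0;
		second_half <= 1'b0;
	end else if (!half_speed)
	begin
		// One full device clock period per source clock, so both
		// halves of the period land in the same source clock cycle
		wide_clock  <= clk90 ? 4'b0110 : 4'b0011;
		new_edge    <= 1'b1;
		half_edge   <= 1'b1;
		second_half <= 1'b0;
	end else begin
		// One device clock period per two source clocks
		if (clk90)
			wide_clock <= second_half ? 4'b1100 : 4'b0011;
		else
			wide_clock <= second_half ? 4'b1111 : 4'b0000;
		new_edge    <= !second_half;
		half_edge   <=  second_half;
		second_half <= !second_half;
	end
endmodule
```

In such a sketch, `wide_clock` would feed a 4:1 serializer, while `new_edge` and `half_edge` stay inside the design as ordinary logic signals.
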

Fig 5. Highest speed SDR

[![](https://zipcpu.com/img/devclk/h3.svg)](https://zipcpu.com/img/devclk/h3.svg)

Fig. 5 shows all of these key signals. First, you can see the system clock, which we called `src_clk` above, that everything is generated off of. Next, you can see the IO clock we create, followed by the `wide_clock` used to create it. This is followed by the `new_edge` control signal.

This clock might be the clock we would use for a data signal transitioning once per clock (SDR). To illustrate, I’ve also shown what a couple of periods of this data signal might look like.

Were this interface to run in DDR mode, sending one word of data on each edge of the clock, then the `wide_clock` would need to be (repeatedly) set to `0110`, as shown in Fig. 6 below.

Fig 6. Highest speed DDR

[![](https://zipcpu.com/img/devclk/h6.svg)](https://zipcpu.com/img/devclk/h6.svg)

There are a couple of key differences between Fig. 6 and Fig. 5 above. The first, and perhaps most obvious, is that the data in Fig. 6 are output at two words per system clock cycle. This is often desirable, in that twice the data rate may now be achieved. The second difference is that the IO clock is now offset 90 degrees from the data, instead of 180 degrees. This is often necessary to guarantee that there is a clock transition in the middle of the data valid period. To make this happen, the `wide_clock` is now set to `0110` in each clock period.

Using these clock signals, we can also pause the clock–as shown in Fig. 7 below.

Fig 7. Pausing the clock

[![](https://zipcpu.com/img/devclk/h6-pause.svg)](https://zipcpu.com/img/devclk/h6-pause.svg)

Note that the key signals, such as `new_edge` and `half_edge`, must also stop when the clock pauses (stops). Because there is no clock signal, the data output signals become don’t care.
(For power reasons, I could see holding the output at its previous value for short periods of time, `D2` in this case, but that’s another discussion.)

This same signaling approach also works when dividing the clock speed by two. Fig. 8 shows an example SDR signal with a clock speed set to half the system clock speed.

Fig 8. SDR at half the system clock rate

[![](https://zipcpu.com/img/devclk/h0f.svg)](https://zipcpu.com/img/devclk/h0f.svg)

Fig. 9 shows the same thing, but this time for a DDR signal with the clock at half the system clock speed.

Fig 9. DDR at half the system clock rate

[![](https://zipcpu.com/img/devclk/h3c.svg)](https://zipcpu.com/img/devclk/h3c.svg)

Before leaving this example, note how easy it was to change frequencies in this representation: we just adjusted the `wide_clock`, and then the new and half clock positions changed to match.

We can drop the clock frequency again to a quarter of the system clock speed, as shown in Fig. 10.

Fig 10. SDR at a quarter of the system clock rate

[![](https://zipcpu.com/img/devclk/h00ff.svg)](https://zipcpu.com/img/devclk/h00ff.svg)

We can also offset this clock by 90 degrees, as shown in Fig. 11.

Fig 11. DDR at a quarter of the system clock rate

[![](https://zipcpu.com/img/devclk/h0ff0.svg)](https://zipcpu.com/img/devclk/h0ff0.svg)

When using this type of “wide” clock, user logic becomes simplified as well.

This “simplified” user logic is easily illustrated with an example. For this example, let’s suppose we wished to control 8 data wires using this type of divided clock signaling. Let’s also assume, for the purposes of this illustration, that the source arrives via an AXI stream interface with signals `S_VALID` and `S_DATA[15:0]`, and a ready signal given by `S_READY`.

We’ll start with the `wide_clock`, `new_edge`, and `half_edge` signals from the clock generator.
Note that, as we propagate these signals through our pipeline (below), we won’t send the `wide_clock` straight to the output pad, but instead we’ll use it alongside our data processing pipeline. This way, if the pipeline must stall (and it might need to), the pipeline can also stall the outgoing clock at the same time. Hence, we’ll create a one clock delayed version of this `wide_clock` that we can call `outgoing_clock`. Further, a second signal, `active_clock`, can be used to keep track of whether or not we’ve committed to the current clock cycle.

```
always @(posedge src_clk)
if (i_reset)
begin
	outgoing_clock <= 4'h0;
	active_clock   <= 1'b0;
end else if ((S_VALID && S_READY) || (new_edge && second_edge))
begin
	// We commit to this clock if either
	//   1. We have new data and we are ready to consume this new data, *OR*
	//   2. We're in SDR (not DDR) mode, and we've already committed
	//      to a byte of data that we haven't (yet) sent.
	// In both cases, we need to start a clock period.
	//
	// Note that S_READY implies new_edge
	//
	outgoing_clock <= wide_clock;

	// The "active_clock" signal is used to let us know that we've committed
	// to this clock cycle.  From now until the next new_edge, we must
	// forward the wide_clock signal to the output.
	active_clock <= 1;
end else if (new_edge)
begin
	// The clock generator is creating an edge that ... we're not prepared
	// for or ready to handle.  There's just no data available, so ...
	// let's stop the clock.
	outgoing_clock <= 4'h0;

	// In this case, we're not forwarding the clock, nor will we until
	// the next clock period.
	active_clock <= 1'b0;
end else if (active_clock)
	// If we've already committed to this clock cycle, then we'll need to
	// continue it to its completion.
	outgoing_clock <= wide_clock;
```

Before we can get to the data, we need another key signal as well. This is the `second_edge` signal that we used above. Here’s why: our data is going to arrive, 16b at a time via AXI stream.
If we are in DDR mode, then we’ll consume 8b on each edge of this clock–and possibly all 16b at once. However, if we are only in SDR mode, then we’ll need to consume the second 8b on the next clock edge. Hence, we’re going to need a signal, which I’m calling `second_edge`, to tell us that we have 8b remaining of the 16b committed to us that didn’t get sent on the last clock tick.

```
always @(posedge src_clk)
if (reset && i_care_about_resets)
	second_edge <= 0;
else if (S_VALID && S_READY)
	// In SDR, we just accepted 16b and output 8b.
	// We need another new_edge to send the remaining 8b.
	// Note that S_READY implies new_edge
	//
	// Also note that we only use this signal in SDR modes
	second_edge <= !ddrmode;
else if (new_edge)
	// On any (other) new_edge, we can clear this signal
	second_edge <= 0;
```

That leads us to the `outgoing_data`. This is a 16 bit data signal, consisting of 8b, `outgoing_data[15:8]`, which will be output on the first half of the clock, and another 8b, `outgoing_data[7:0]`, which will be output on the second half of the clock. A third signal, `next_byte`, will be used for keeping track of the second byte of data in the case where we don’t output both bytes in the same clock period.

```
always @(posedge src_clk)
if (reset && i_care_about_resets)
begin
	outgoing_data <= 0;
	next_byte <= 0;
end else if (S_VALID && S_READY)
begin // new_edge is implied by S_READY
	if (ddrmode && half_edge)
	begin
		// Set data for both halves of the clock
		// The first half in the MSBs
		outgoing_data[15:8] <= S_DATA[15: 8];
		// The second half in the LSBs
		outgoing_data[ 7:0] <= S_DATA[ 7: 0];
	end else begin
		// Set only the first half of the data, but set it to be
		// output twice.  We'll need to come back later for the second
		// outgoing byte.
		outgoing_data <= {(2){S_DATA[15:8]}};
	end

	// Keep track of that second byte, so we can come back to it later.
	next_byte <= S_DATA[7:0];
end else if (new_edge || (ddrmode && half_edge))
begin
	outgoing_data <= {(2){next_byte}};
end
```

The final signal we need to define is the `S_READY` signal. In this example, we can accept new data on any new clock edge, *unless* we have 8b remaining from the last clock edge that have yet to be output.

```
assign	S_READY = new_edge && !second_edge;
```

This approach provides us with a couple big advantages to our user logic over what we had before.

First and foremost, [all of our user logic now takes place on the same `src_clk`](https://zipcpu.com/blog/2017/08/21/rules-for-newbies.html). We didn’t need any [CDCs](https://zipcpu.com/blog/2017/10/20/cdc.html). AXI slave data, generated externally on this `src_clk`, can now be used within our design on the same clock it was generated on.

Second, did you notice how we were able to [simply gate the clock](https://zipcpu.com/blog/2021/10/26/clk-gate.html) when there was no data available? If not, go back up and look again at the `active_clock` signal.

Third, unlike the previous approach, we’ve now guaranteed that this clock signal won’t glitch. That is, assuming the outgoing OSERDES won’t generate glitches from our glitchless data signals. The previous clock generator, on the other hand, could well have had glitches between the clock and the data enabling it.

Also look at how easy it was to do pipelined processing. The clock was generated prior to our pipeline, and simply propagated through the pipeline. Although this pipeline only contains a single clock cycle, we could’ve easily extended the pipeline for multiple clock cycles if necessary by simply passing the `wide_clock`, `new_edge`, and `half_edge` signals through the pipeline–adjusting them if and where necessary along the way.

As a result of this example, all IO pins can now be driven using a 4:1 OSERDES.
(You could also use [ODDR](https://zipcpu.com/blog/2020/08/22/oddr.html)s for the data, if you trusted them to have the same timing relationship as the OSERDES.)

What about frequency changes, or adjusting between the unshifted clock and the clock shifted by 90 degrees? What about when the clock is off, and needs to be turned on? All of these challenges and more now reside within the clock generator.

## The Clock Generator

For discussion purposes, let’s take a look at the [clock generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdckgen.v) I used for [my SDIO/eMMC controller](https://github.com/ZipCPU/sdspi). As mentioned above, this [clock generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdckgen.v) has the particular requirement of being able to generate two outgoing clock periods per system clock cycle, but otherwise it’s a fairly straightforward example of the discussion above.

From a configuration standpoint, there are a couple of configuration options. First, I wasn’t certain that I’d always have an 8:1 SERDES available to me, nor do all digital environments necessarily offer 2:1 [ODDR](https://zipcpu.com/blog/2020/08/22/oddr.html) components. Therefore, we allow those to be adjusted. Second, I want to know the maximum number of bits required in my clock divider. Still, these configuration parameters are fairly straightforward.

```
module sdckgen #(
		// OPT_SERDES is required for generating an 8:1 output.
		parameter [0:0]	OPT_SERDES = 0,
		// If no 8:1 SERDES are available, we can still create a clock
		// using a 2:1 ODDR via OPT_DDR
		parameter [0:0]	OPT_DDR = 0,
		// To hit 100kHz from a 100MHz system clock, we'll need to
		// divide our 100MHz clock by 4, and then by another 250.
		// Hence, we'll need Lg(256)-2 bits.  (The first three speed
		// options are special)
		localparam	LGMAXDIV = 8
	) (
```

The [clock generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdckgen.v) is primarily controlled via three signals. The first tells us whether we want our clock offset by 90 degrees for DDR outputs or not. The second controls the speed of the outgoing clock. The final signal tells us we can shut the clock down.

```
		input	wire			i_cfg_clk90,
		input	wire	[LGMAXDIV-1:0]	i_cfg_ckspd,
		input	wire			i_cfg_shutdown,
```

When shut down, the wide clock output will be fixed at zero, as will both the `new_edge` and `half_edge` control signals.

The shutdown signal is actually really useful at slow clock speeds. Sure, you could shut the clock down, as we did above, by just not forwarding it through the pipeline. On the other hand, once the clock has been shut down, you’d like to be able to restart it on a dime. The shutdown control signal to our [clock generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdckgen.v) allows us to do that. Once set, the [clock generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdckgen.v) takes the remainder of a clock cycle to shut down, and then stays ready to restart the clock at a moment’s notice.

The outputs from this module are just about what you would expect. You have the three signals we’ve already discussed. In this case, `o_ckstb` is the `new_edge` signal we’ve mentioned, `o_hlfck` is the `half_edge` signal, and `o_ckwide` is the `wide_clock` signal.

```
		//
		output	reg			o_ckstb,	// new_edge
		output	reg			o_hlfck,	// half_edge
		output	reg	[7:0]		o_ckwide,	// wide_clock
		output	wire			o_clk90,
		output	reg	[LGMAXDIV-1:0]	o_ckspd
	);
```

The two new signals are `o_clk90` and `o_ckspd`.
These are feedback signals returned to the [control module](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdaxil.v), used to tell us when any frequency shift or phase shift operations are complete. These feedback signals solve an issue I was having in my [eMMC controller](https://github.com/ZipCPU/sdspi), where the clock would be at some crazy low frequency (100kHz or so), and I’d want to speed it up. Just setting the new clock speed wasn’t enough, since it might take a thousand clocks to finish a single cycle at the 100kHz clock speed. However, [by checking these return signals via the register set, the software driver could then tell if any clock frequency change had fully taken effect](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/sw/emmcdrv.c#L1591-L1593) before going on to any next operation.

The next logic block is part of a two process finite state machine. The first process, shown below, is the combinatorial process. The second will be the clocked logic.

Personally, I’m not a big fan of two process state machines. I’m just not. They often seem to me to be adding extra work and complexity. However, two process state machines allow me to reference logic results even before the full logic path is complete. They also give me the ability to describe more complicated logic than the simple single process state machine, so a two process state machine it is.

In this case, we are going to generate the next signal for the strobe, `nxt_stb`, the clock, `nxt_clk`, and the counter, `nxt_counter`.

Of these signals, `nxt_clk` is the simplest to explain. This signal indicates that we’re about to start a new clock cycle. In many ways, this is the combinatorial version of what is to become the `new_edge` once latched.

Clock cycles themselves come in four phases, just like the four bits of the wide clock we discussed before. You can think of these phases as the `0110` of the fastest clock before.
The first bit, 0, is the first phase of the clock. Our `new_edge` bit, `o_ckstb`, will only ever be true on this phase. The second bit, 1, is where the clock rises. The third bit, 1 again, is the only phase where the `half_edge`, `o_hlfck`, will be set. Finally, the clock will return to zero in the last phase. If the clock is ever idle, it will idle in this first phase prior to delivering a `new_edge` signal.

This background will help explain how I’ve divided up the counter. There are `NCTR` bits to the counter. Of those bits, the top two control the phase bits we just described, whereas the others are the clock divider. The `nxt_stb` signal, mentioned above and below, is simply a signal that these top two phase-control bits are about to change.

With that as background, let’s take a look at how this works.

In general, the first step of any combinatorial block is to set all the values that will be determined within the block. This is a good practice to get into to avoid accidentally generating any latches.

```
always @(*)
begin
    nxt_stb = 1'b0;
    nxt_clk = 1'b0;
    nxt_counter = counter;
```

From here, we subtract one from the bottom (non-phase) bits of our counter on every cycle. When these bits are zero, subtracting one will cause the counter to underflow and set our `nxt_stb` signal, so we can know when to adjust the phase bits.

```
    { nxt_stb, nxt_counter[NCTR-3:0] } = counter[NCTR-3:0] - 1;
    if (nxt_stb)
    begin
        // Advance the top two bits
        { nxt_clk, nxt_counter[NCTR-1:NCTR-2] } = nxt_counter[NCTR-1:NCTR-2] + 1;
```

If our clock speed is set to 0 (wide clock of either `01100110` or `00110011`) or 1 (wide clock of `00111100` or `00001111`), then we are always generating a new clock cycle. In this case, we’ll hold the counter at zero and (roughly) ignore the phase.
```
        if ((OPT_DDR || OPT_SERDES) && ckspd <= 1)
        begin
            nxt_clk = 1;
            nxt_counter[NCTR-3:0] = 0;
```

Likewise, if the clock speed is equal to two, the wide clock will either alternate between `0000_0000` and `1111_1111`, or `0000_1111` and `1111_0000`, and so our phase will alternate, but otherwise everything else can be kept to zero.

```
        end else if (ckspd <= 2)
        begin
            nxt_clk = counter[NCTR-1];
            nxt_counter[NCTR-3:0] = 0;
```

Finally, in the more general case, we’ll just set the bottom bits to count down from `ckspd-3` to zero. Yes, this is “just” a counter, but the maximum value is offset by three for the three special speeds we just discussed above.

```
        end else
            nxt_counter[NCTR-3:0] = ckspd-3;
    end
```

You may have noticed that we’ve only adjusted the bottom bits of this counter–the bits that count down. We’ve done nothing to update the phase bits at the top of this “counter”, so let’s handle those next. (Spoiler alert: these MSBs don’t act like counter bits in this implementation.)

Of course, for the highest frequencies, the counter will never change. It sits at zero, with a permanent next phase of 3.

```
    if (nxt_clk)
    begin
        if ((OPT_DDR || OPT_SERDES) && new_ckspd <= 1)
            nxt_counter = {2'b11, {(NCTR-2){1'b0}} };
```

When the speed setting is 2, we allow the top two bits to toggle back and forth. If `nxt_clk` is set, we need to reset these bits only.

```
        else if (new_ckspd <= 2)
            nxt_counter = { 2'b01, {(NCTR-2){1'b0}} };
```

Finally, for the general case, we return the phase to zero and reset the clock.

```
        else begin
            nxt_counter[NCTR-1:NCTR-2] = 0;
        end
    end
end
```

This is only the first half of this “two process” FSM. The second half, with respect to the counter, is just about as simple. Perhaps it is even more so, given that we’ve done all of the hard work above.
```
always @(posedge i_clk)
if (i_reset)
begin
    if (OPT_SERDES)
        counter <= 0;
    else if (OPT_DDR)
        counter <= { 2'b11, {(NCTR-2){1'b0}} };
    else
        counter <= { 2'b01, {(NCTR-2){1'b0}} };
end else if (nxt_clk && i_cfg_shutdown)
    counter <= { 2'b11, {(NCTR-2){1'b0}} };
else
    counter <= nxt_counter;
```

The big thing to notice here is the `nxt_clk && i_cfg_shutdown`. Remember, if the user ever asserts `i_cfg_shutdown`, we need to wait for the clock cycle to complete before shutting it down. Hence, we wait for the `nxt_clk` signal before acting. Then, once set, we leave the `counter` in a state where it will perpetually set `nxt_clk`. This way, the moment `i_cfg_shutdown` is released, we’ll be back to generating a clock again.

To explain this a bit better, imagine the clock generator is producing an output clock from ten periods of the source/system clock: five system clocks of `0000_0000`, followed by five more clocks of `1111_1111`. Imagine again that we’ve had several periods of these 10 clock cycles before the user asserts the clock shutdown signal. We then wait another 10 cycles for the clock to fully shut down. Now, if the user drops the shutdown signal after a further 3 cycles, we could either wait another 7 cycles (to complete the 10), or start immediately. Here, we try to arrange to start a stopped clock immediately without violating any of our clocking rules.

The next signal, `clk90`, controls whether or not we’re generating a clock offset from `new_edge`, `o_ckstb`, by 90 degrees.

```
always @(posedge i_clk)
if (i_reset)
    clk90 <= 0;
else
    clk90 <= w_clk90;

assign o_clk90 = clk90;
```

This logic isn’t very interesting yet, since we’ve basically split a two process FSM. It will become more so when we get to `w_clk90`, and the first process of the FSM, below. The key is, this logic must determine what the current 90 degree offset setting is. Hence, when you look at the outgoing wide clock, this signal must match it.

How about the clock speed?
In this case, we go through some error checking.

```
initial ckspd = (OPT_SERDES) ? 8'd0 : (OPT_DDR) ? 8'd1 : 8'd2;
always @(posedge i_clk)
if (i_reset)
    ckspd <= (OPT_SERDES) ? 8'd0 : (OPT_DDR) ? 8'd1 : 8'd2;
else
    ckspd <= w_ckspd;

always @(*)
if (OPT_SERDES)
    new_ckspd = i_cfg_ckspd;
else if (OPT_DDR && i_cfg_ckspd <= 1 && !i_cfg_clk90)
    new_ckspd = 1;
else if (i_cfg_ckspd <= 2 && (OPT_DDR || !i_cfg_clk90))
    new_ckspd = 2;
else if (i_cfg_ckspd <= 3)
    new_ckspd = 3;
else
    new_ckspd = i_cfg_ckspd;

assign w_clk90 = (nxt_clk) ? i_cfg_clk90 : clk90;
assign w_ckspd = (nxt_clk) ? new_ckspd : ckspd;
```

The error checking is here to guarantee that a clock speed of 0 is only used when `OPT_SERDES` is set. Likewise, a clock speed of 1 may be used in [ODDR](https://zipcpu.com/blog/2020/08/22/oddr.html) mode (wide clock of `00001111`), but not when the `clk90` configuration is set (calling for a wide clock of `0011_1100`, which is too complex for an [ODDR](https://zipcpu.com/blog/2020/08/22/oddr.html) output module to produce). This continues for a clock speed of two, which is fine for a non-offset clock (wide clock of `0000_0000` followed by `1111_1111`), but not for an offset clock (wide clock of `0000_1111` followed by `1111_0000`) unless the `OPT_DDR` option is set.

Finally, the two values `w_clk90` and `w_ckspd` are used to tell us what values our registered logic should use when generating a clock. As such, they are either the registered values, or (when we’re about to start a new cycle) the new values.

With all this as background, we can now dig into the core of this logic–generating the three key signals we will be outputting.

On reset, these signals will simply be set to indicate a clock of the fastest rate, ready to go, but otherwise one that is idle (`o_ckwide=0`).
```
initial o_ckstb = 0;
initial o_hlfck = 0;
initial o_ckwide = 0;
always @(posedge i_clk)
if (i_reset)
begin
    o_ckstb <= 0;
    o_hlfck <= 0;
    o_ckwide <= 0;
```

Next, if we want to shut down the clock, we can only do so on `nxt_clk`. When shut down, the wide clock will be zero and the new edge signals will all be suppressed.

```
end else if (nxt_clk && i_cfg_shutdown)
begin
    o_ckstb <= 1'b0;
    o_hlfck <= 1'b0;
    o_ckwide <= 8'h0;
```

As mentioned above, the key here is that the clock can suddenly start if the `i_cfg_shutdown` signal is released. Using this logic, it does not need to remain phase coherent with whatever phase the clock had prior to being shut down.

Moving on to our highest speed clock, we simply set that according to the 90 degree clock configuration. In general, this speed will only ever generate one of two values: `01100110` or `00110011`.

```
end else if (OPT_SERDES && w_ckspd == 0)
begin
    o_ckstb <= 1;
    o_hlfck <= 1;
    o_ckwide <= (i_cfg_clk90) ? 8'h66 : 8'h33;
```

When running from a 100MHz system (`src_clk`) clock, this plus the OSERDES will generate a 200MHz clock signal to the external device.

One might argue that the `OPT_SERDES` here is really redundant. There should be enough logic elsewhere to keep `w_ckspd` at a non-zero value if `OPT_SERDES` is not set. Why use it? It’s here specifically to provide a strong hint to the synthesis tool regarding logic that can be cleaned up if `OPT_SERDES` is not set. This block is complicated enough as it is, so adding it in should simplify our logic.

The problem with putting this value here, and generating a clock module based upon parameters such as `OPT_SERDES` and `OPT_DDR`, is that I now need to formally verify the IP under several conditions before I can know if it works. This applies to simulation as well. It is now no longer sufficient to run the simulation tool once when you do something like this. It must now be run many times under different conditions.
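
One way to keep that cost visible in simulation might be to instantiate every parameter combination side by side in a single bench, so that one run exercises them all. The sketch below is only an illustration of that idea: `rstspd_sketch` is a hypothetical stand-in that reproduces nothing but the reset speed selection shown earlier, and `tb_allcfgs_sketch` is an assumed bench name. Neither exists in the sdspi repository.

```verilog
// Hypothetical stand-in module: it models only the reset value of ckspd,
// as selected by the OPT_SERDES and OPT_DDR parameters.
module rstspd_sketch #(
    parameter [0:0] OPT_SERDES = 1'b0,
    parameter [0:0] OPT_DDR = 1'b0
) (
    output wire [7:0] o_ckspd
);
    assign o_ckspd = (OPT_SERDES) ? 8'd0 : (OPT_DDR) ? 8'd1 : 8'd2;
endmodule

// One bench, three configurations, each checked on its own
module tb_allcfgs_sketch;
    wire [7:0] spd_serdes, spd_ddr, spd_plain;

    rstspd_sketch #(.OPT_SERDES(1'b1), .OPT_DDR(1'b1))
        u_serdes (.o_ckspd(spd_serdes));
    rstspd_sketch #(.OPT_SERDES(1'b0), .OPT_DDR(1'b1))
        u_ddr (.o_ckspd(spd_ddr));
    rstspd_sketch #(.OPT_SERDES(1'b0), .OPT_DDR(1'b0))
        u_plain (.o_ckspd(spd_plain));

    initial begin
        #1;    // Let the continuous assignments settle
        if (spd_serdes !== 8'd0) $display("FAIL: SERDES configuration");
        if (spd_ddr    !== 8'd1) $display("FAIL: DDR configuration");
        if (spd_plain  !== 8'd2) $display("FAIL: plain configuration");
        $finish;
    end
endmodule
```

The same pattern would apply to the real module, at the price of tripling the simulation work in every run.
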
As an engineer, I need to be aware of costs like this whenever I invoke logic like this. In this case, I wanted to support multiple types of FPGAs (and/or ASICs), and so this was the logic I chose.

Our next speed, `ckspd=1`, has almost the same logic. As before, `o_ckstb` and `o_hlfck` are both set continually in this mode. In this case, our wide clock output will either be `0011_1100` or `0000_1111` depending on whether or not we need a 90 degree offset clock for DDR.

```
end else if ((OPT_SERDES || OPT_DDR) && w_ckspd <= 1)
begin
    o_ckstb <= 1'b1;
    o_hlfck <= 1'b1;
    o_ckwide <= (OPT_SERDES && w_clk90) ? 8'h3c : 8'h0f;
```

When running from a 100MHz system (`src_clk`) clock, this generates a 100MHz clock as well.

You may note that there’s no real two-cycle output signal. The signaling, with `o_ckstb` and `o_hlfck`, allows us to describe a new clock together with or separate from the second half of that clock period, but offers nothing for describing two clock cycles in the same source clock period. This is just a limitation in our chosen signaling. The solution to this problem is specific to the [eMMC controller](https://github.com/ZipCPU/sdspi) that we’ve drawn [our example](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdckgen.v) from. In this case, I look at both the DDR setting and the clock speed before generating any transmit data. From this, I determine if I should be sending one byte, two bytes, or four bytes of data per clock. The actual logic is more complex, due to the fact that the eMMC interface may run in 1b, 4b, or 8b modes, but that’s the story of [another piece of logic, found outside of the clock controller](https://github.com/ZipCPU/sdspi/blob/master/rtl/sdtxframe.v).

As with clock speeds of either 0 (200MHz) or 1 (100MHz), the clock speed of 2 (50MHz) is also handled specially.
This is the speed that alternates between two outputs, generating either `00001111` followed by `11110000` in the offset mode (`o_clk90=1`), or simply `00000000` followed by `11111111` in the normal mode.

```
end else if (w_ckspd == 2)
begin
    { o_ckstb, o_hlfck } <= (!nxt_counter[NCTR-1]) ? 2'b10 : 2'b01;
    if (w_clk90 && (OPT_SERDES || OPT_DDR))
        o_ckwide <= (!nxt_counter[NCTR-1]) ? 8'h0f : 8'hf0;
    else
        o_ckwide <= (!nxt_counter[NCTR-1]) ? 8'h00 : 8'hff;
```

When running from a 100MHz system clock (`src_clk` above), this generates a 50MHz output clock signal. This might be the “fastest” speed you would normally think of for an integer clock “divider”. As you can see, though, we’ve already generated outgoing 200MHz and 100MHz clocks above.

This brings us to the general case–a divided clock running at less than half our source clock rate. Here, we’ve already done all of the hard work for `nxt_clk`, so the outgoing next edge signal `o_ckstb` is done.

```
end else begin
    o_ckstb <= nxt_clk;
```

The half edge signal is determined by the counter. The lower bits must be zero, indicating a new phase, and the top two bits indicate the new phase will be the third of four–so just entering halfway.

```
    o_hlfck <= (counter == {2'b01, {(NCTR-2){1'b0}} });
```

The wide clock is determined by the top two phase bits of the next counter. It’s either equal to the most significant bit, when there’s no clock offset, or the exclusive OR of the top two bits when there is.

```
    if (w_clk90)
        o_ckwide <= {(8){nxt_counter[NCTR-1] ^ nxt_counter[NCTR-2]}};
    else
        o_ckwide <= {(8){nxt_counter[NCTR-1]}};
end
```

This leaves us with only one final signal: the current clock speed. In this case, all the work has been done above, and nothing more need be done with it.

```
always @(posedge i_clk)
    o_ckspd <= w_ckspd;
```

That’s the basic idea. In summary:

- There are four phases to the outgoing clock, either `0011` or `0110`.
- A counter generally helps us know when to transition from one phase to the next.
- High speeds get special attention.
- Data changes on the outgoing next edge signal, `o_ckstb`. In DDR modes, data can also change on the outgoing `o_hlfck` signal.

Key features of this approach include:

- There’s no need for any [clock domain crossings](https://zipcpu.com/blog/2017/10/20/cdc.html) in the outgoing data path. All outgoing signals are handled in the source clock domain.
- The clock may be gated at will, and (re)started quickly if necessary.
- Frequency changes are controlled, and will take place between clock periods.
- Although the clock is generated in logic, it doesn’t trigger any logic. That is, nowhere in the design will anything in the outgoing logic path depend upon either `@(posedge dev_clk)` or `@(negedge dev_clk)`. Instead, all of the logic is triggered off of the `o_ckstb` or `o_hlfck` signals while still running on the same `src_clk` we started from.

But … does it work?

## Simulation testing

Just to get this clock generator off the ground, I built a [quick simulation test bench](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/bench/verilog/tb_sdckgen.v). You can [find it here](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/bench/verilog/tb_sdckgen.v), and we’ll walk through it quickly.

The first step was pretty boilerplate. I simply started a VCD trace, placed the design into reset, and generated a 100MHz clock.

```
initial begin
    $dumpfile("tb_sdckgen.vcd");
    $dumpvars(0,tb_sdckgen);
    reset = 1'b1;
    clk = 0;
    forever
        #5 clk = !clk;
end
```

For the second step, I wanted to place the design in a variety of configurations to see how it would work in each. I chose to leave it in each configuration for five clock cycles before moving to the next. I then defined a simple task, `capture_beats`, that I could call to wait out five cycles of a given clock setting before moving on.
```
task capture_beats;
begin
    repeat(5)
    begin
        wait(w_ckstb);
        @(posedge clk);
    end
end endtask
```

The last step, then, was to walk through one clock setting after another to see what would happen. I started by taking the design out of reset, and configuring the inputs for a (rough) 100kHz clock.

```
initial begin
    { cfg_shutdown, cfg_clk90, cfg_ckspd } = 10'h0fc;
    repeat (5)
        @(posedge clk);
    @(posedge clk)
        reset <= 0;

    // 100kHz (10us)
    capture_beats;
```

You can pretty well read the comments below to see the configurations I checked.

```
    // 200 kHz (5us)
    @(posedge clk)
        { cfg_shutdown, cfg_clk90, cfg_ckspd } = 10'h07f;
    capture_beats;

    // 400 kHz (2.52us)
    @(posedge clk)
        { cfg_shutdown, cfg_clk90, cfg_ckspd } = 10'h041;
    capture_beats;

    // 1MHz (1us)
    @(posedge clk)
        { cfg_shutdown, cfg_clk90, cfg_ckspd } = 10'h01b;
    capture_beats;

    // 5MHz (200ns)
    @(posedge clk)
        { cfg_shutdown, cfg_clk90, cfg_ckspd } = 10'h007;
    capture_beats;

    // 12.5MHz (80ns)
    @(posedge clk)
        { cfg_shutdown, cfg_clk90, cfg_ckspd } = 10'h004;
    capture_beats;

    // 25MHz (40ns)
    @(posedge clk)
        { cfg_shutdown, cfg_clk90, cfg_ckspd } = 10'h003;
    capture_beats;

    // 50MHz (20ns)
    @(posedge clk)
        { cfg_shutdown, cfg_clk90, cfg_ckspd } = 10'h002;
    capture_beats;

    // 100MHz
    @(posedge clk)
        { cfg_shutdown, cfg_clk90, cfg_ckspd } = 10'h001;
    capture_beats;

    // 200MHz
    @(posedge clk)
        { cfg_shutdown, cfg_clk90, cfg_ckspd } = 10'h000;
    capture_beats;

    // 25MHz, CLK90
    @(posedge clk)
        { cfg_shutdown, cfg_clk90, cfg_ckspd } = 10'h103;
    capture_beats;

    // 50MHz, CLK90
    @(posedge clk)
        { cfg_shutdown, cfg_clk90, cfg_ckspd } = 10'h102;
    capture_beats;

    // 100MHz, CLK90
    @(posedge clk)
        { cfg_shutdown, cfg_clk90, cfg_ckspd } = 10'h101;
    capture_beats;

    // 200MHz, CLK90
    @(posedge clk)
        { cfg_shutdown, cfg_clk90, cfg_ckspd } = 10'h100;
    capture_beats;

    $finish;
end
```

These are basically all of the configurations I wanted to use the design with. Using the generated trace, I can visually see all of the signals within this design working as intended.
Further, unlike the formal verification we’ll discuss next, I can actually see *many* clocks of this design. This allows me to verify, for example, that the 100kHz, 200kHz, and 400kHz clock divisions work as designed.

Sadly, this test is woefully inadequate for any real or professional purpose.

The biggest problem with [this simple test bench script](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/bench/verilog/tb_sdckgen.v) is that it’s not self checking. I can run it, but the only way to know if the design did the right thing or not is to pull up a viewer and check the [VCD file](https://zipcpu.com/blog/2017/07/31/vcd.html). Sure, this might get me off the ground, but it is *horrible* for maintenance. How should I know, for example, if a small and otherwise minor change breaks things?

The second problem with [this test bench](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/bench/verilog/tb_sdckgen.v) is that it does nothing to try out unreasonable input signals. How shall I know, for example, that this design will never go faster than the fastest allowed frequency? That is, it should only ever be able to go as fast as the current speed, or the newly commanded speed.

Perhaps some of you may remember my comments on twitter about getting excited to try this new design as a whole (not just the clock generator) on an FPGA, only to be mildly (not) surprised that it didn’t work before all the formal proofs were finished? (I couldn’t find them when I looked today …) Yeah, there’s always a surprise you aren’t expecting that takes place when you work with real hardware.

So, while [this](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/bench/verilog/tb_sdckgen.v) looks nice, and while the resulting traces look really pretty, [this test bench](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/bench/verilog/tb_sdckgen.v) is highly insufficient.
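
A first step toward a self-checking bench might be to compute the period each speed setting ought to produce, so a measured period has something to be compared against. The sketch below is not taken from the sdspi repository, and its names (`ckspd_period_sketch`, `expected_halfclks`, `check`) are hypothetical. It assumes, from the counter logic above, that any speed of 3 or more spends `ckspd-2` system clocks in each of the four phases. Periods are counted in half system clocks (5ns at 100MHz) so that the 200MHz setting remains an integer. Here it only cross-checks that formula against the periods quoted in the test bench comments; wiring it to a period measured from `o_ckwide` is left out.

```verilog
module ckspd_period_sketch;
    integer errors;

    // Expected output clock period, in half system clock periods.
    // Assumption: derived from reading the counter logic, not from
    // any official specification of the module.
    function integer expected_halfclks;
        input integer ckspd;
        begin
            if (ckspd == 0)
                expected_halfclks = 1;  // Two output cycles per system clock
            else if (ckspd == 1)
                expected_halfclks = 2;  // One output cycle per system clock
            else if (ckspd == 2)
                expected_halfclks = 4;  // One output cycle per two clocks
            else
                // Four phases of (ckspd-2) system clocks each
                expected_halfclks = 8 * (ckspd - 2);
        end
    endfunction

    // Compare one setting against a period given in nanoseconds,
    // assuming a 100MHz (10ns) system clock
    task check;
        input integer ckspd;
        input integer period_ns;
        begin
            if (expected_halfclks(ckspd) * 5 != period_ns)
            begin
                $display("MISMATCH: ckspd=%0d computes %0d ns, not %0d ns",
                    ckspd, expected_halfclks(ckspd) * 5, period_ns);
                errors = errors + 1;
            end
        end
    endtask

    initial begin
        errors = 0;
        check(8'hfc, 10000);    // 100kHz
        check(8'h7f,  5000);    // 200kHz
        check(8'h41,  2520);    // (roughly) 400kHz
        check(8'h1b,  1000);    // 1MHz
        check(8'h07,   200);    // 5MHz
        check(8'h04,    80);    // 12.5MHz
        check(8'h03,    40);    // 25MHz
        check(8'h02,    20);    // 50MHz
        check(8'h01,    10);    // 100MHz
        check(8'h00,     5);    // 200MHz
        if (errors == 0)
            $display("All expected periods match");
        $finish;
    end
endmodule
```

Even with such a check in place, the bench would still only cover the settings someone thought to list, which is the second problem described above.
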
Let’s move onto something more substantial.

## Formal Properties

I like to think of [this clock module](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdckgen.v) as a basic clock divider. It’s not much more than a glorified counter, together with a 4-state phase machine. Yeah, sure, you can run through all 4 states in one clock cycle, but it’s still not really all that much more. Formally verifying [this clock generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdckgen.v) should therefore be pretty simple.

One of the big keys to this proof is [the interface property set](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/bench/formal/fclk.v). [I’ve discussed interface properties before](https://zipcpu.com/formal/2020/06/12/four-keys.html). The idea is born from the fact that one component, such as [this clock generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdckgen.v), is going to generate signals that another component, in this case [the transmit data generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdtxframe.v), will use. Further, these two proofs will be independent of each other. Hence, anything the [transmitter’s](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdtxframe.v) proof needs to assume should then be asserted in the [clock generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdckgen.v) and vice versa. That’s the purpose of the [property set](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/bench/formal/fclk.v).

The [property set](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/bench/formal/fclk.v) also greatly simplifies the assertions found within the design itself.

Still, let’s look over the design assertions for now.
We’ll come back to the [property set](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/bench/formal/fclk.v) in the next section.

We’ll start with the `f_en` signal.

```
initial f_en = 1'b1;
always @(posedge i_clk)
if (i_reset)
    f_en <= 1'b1;
else if (nxt_clk)
    f_en <= !i_cfg_shutdown;
```

This just captures whether the clock should be shut down during the current cycle or not. It’s that simple.

Many engineers just starting out with formal verification struggle to see past the assertions and the assumptions within the language to realize they can still use regular Verilog when generating formal properties. In this case, `f_en` is nothing more than a register which we are going to use in our formal proof. Nothing prevents you from doing this. Indeed, you are more than able to write [more complicated state machines](https://zipcpu.com/formal/2019/02/21/txuart.html) when generating formal properties as well. Just make sure that your new logic doesn’t make the same expressions as the logic you are verifying, or you might convince yourself something works when it doesn’t.

When teaching, I like to explain it this way: the best way to verify that `A` divided by `B` is `C` is to multiply `C` and `B` together. If the result of the multiply is `A`, then you’ve verified your result. Why does this work? Because you use different logic paths in your brain for division than you do for multiplication. Hence, if you make a mistake in dividing, you aren’t likely to make the same mistake when multiplying. The same is true of formal methods. You can use logic in formal methods, just like you do in your design; you just don’t want to use the same logic lest your mind falsely convince you it’s right when it isn’t. This is sort of like having one witness to a murder called onto the stand twice under the same name.

Anyway, let’s move on.
The next step is to instantiate a copy of [the clock interface properties](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/bench/formal/fclk.v).

```
fclk #(
    .OPT_SERDES(OPT_SERDES),
    .OPT_DDR(OPT_DDR)
) u_ckprop (
    .i_clk(i_clk),
    .i_reset(i_reset),
    //
    .i_en(f_en),
    .i_ckspd(o_ckspd),
    .i_clk90(clk90),
    //
    .i_ckstb(o_ckstb),
    .i_hlfck(o_hlfck),
    .i_ckwide(o_ckwide),
    //
    .f_pending_reset(f_pending_reset),
    .f_pending_half(f_pending_half)
);
```

See how simple that was?

In addition to the assertions within [this property set](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/bench/formal/fclk.v), [the property set](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/bench/formal/fclk.v) provides two output signals that we can use to connect the state of our design to the internal state of [the property set](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/bench/formal/fclk.v). These signals are:

- `f_pending_reset`

  This otherwise annoying signal is required for us to be able to handle the clock anomalies between reset and the first clock strobe. This signal is set on a reset, and released once the clock gets started.

- `f_pending_half`

  This signal is simpler. It simply means that we’ve seen the `new_edge` (`o_ckstb`) and not the `half_edge`, herein called `o_hlfck`. If `f_pending_half` is true, then the clock must generate `o_hlfck` before it can generate `o_ckstb`.

With these signals, we can express things like this:

```
always @(*)
if (!i_reset && !o_hlfck && !o_ckstb && !f_pending_reset)
    assert(f_pending_half == (counter[NCTR-1:NCTR-2] < 2'b10));
```

This helps us through long periods of time with neither `o_hlfck` nor `o_ckstb`. During this time, `f_pending_half` should be equivalent to the top two bits of our counter being either `2'b00` or `2'b01`.

Let’s look at some other assertions.
For example, if we shut the clock down, then we shouldn’t get any more new edges, `o_ckstb`:

```
always @(posedge i_clk)
if (f_past_valid)
begin
    if ($past(!i_reset && i_cfg_shutdown))
    begin
        assert(!o_ckstb);
    end
```

Now we can look at some of the specific options. For example, the clock speed should only be zero (200MHz) if `OPT_SERDES` is set. While set to zero, either `o_ckstb` should be set on every clock cycle or we should’ve received a clock shutdown request.

```
    if (ckspd == 0)
    begin
        assert(OPT_SERDES);
        assert(o_ckstb || $past(i_cfg_shutdown));
        assert(counter == 0 || counter == {2'b11,{(NCTR-2){1'b0}} });
    end
```

Likewise, we should only ever be in a clock speed of 1 (100MHz) if either `OPT_SERDES` or `OPT_DDR` is set. Further, if `OPT_SERDES` is not set, we shouldn’t ever be implementing a 90 degree clock offset.

```
    if (ckspd == 1)
    begin
        assert(OPT_SERDES || OPT_DDR);
        if (!OPT_SERDES)
        begin
            assert(!clk90);
        end
        assert(counter == {2'b11,{(NCTR-2){1'b0}} });
    end
```

A clock speed of two (50MHz) is available to all configurations. In this case, the bottom bits–the non-phase description bits–must always be zero.

```
    if (ckspd == 2)
        assert(counter == 0
            || counter == {2'b01,{(NCTR-2){1'b0}} }
            || counter == {2'b10,{(NCTR-2){1'b0}} }
            || counter == {2'b11,{(NCTR-2){1'b0}} });
```

Finally, in all other clock speeds, all we insist is that the lower bits of the counter be no greater than the clock speed minus three.

```
    if (ckspd >= 3)
        assert(counter[NCTR-3:0] <= (ckspd-3));
end
```

There are only two ways both `o_ckstb` and `o_hlfck` can be true at once. The first is if the speed indicates either 200MHz or 100MHz. The second is if the clock is stopped, and so the wide clock output is zero and a new clock is expected on the next clock cycle.
```
always @(*)
if (!i_reset && o_ckstb && o_hlfck)
    assert(ckspd <= 1 || (o_ckwide == 0 && nxt_clk));
```

The difficult part of these assertions is that these aren’t enough to limit the output of the [clock generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdckgen.v). Just to make certain the outputs are properly limited, I enumerate each together with the conditions under which they may be produced.

We’ll start with a zero output. This can come from either a stopped clock, or one of two slow clock situations.

```
always @(*)
if (!i_reset)
case(o_ckwide)
8'h00: if (nxt_clk)
    begin
        // A stopped clock
        assert(counter == {2'b11,{(NCTR-2){1'b0}} } || ckspd == 0);
    end else if(!clk90)
    begin
        // In slow situations with no offset
        assert(counter[NCTR-1] == 1'b0);
    end else if(clk90)
    begin
        // In slow (DDR) situations with a 90 degree clock offset
        assert(counter[NCTR-1:NCTR-2] == 2'b00
            || counter[NCTR-1:NCTR-2] == 2'b11);
    end
```

An output of `8'h0f` means we’re either in speed one with no clock offset and both clock edges active, or we’re in the first half of speed two.

```
8'h0f: assert((!clk90 && ckspd == 1 && o_ckstb && o_hlfck)
        || (clk90 && ckspd == 2 && o_ckstb));
```

An output of `8'hf0` can only mean we’re in the second half of speed two.

```
8'hf0: assert(clk90 && ckspd == 2 && !o_ckstb && o_hlfck);
```

An output of `8'hff` is common at slow speeds, but also completely determined by the two top phase bits of the counter.

```
8'hff: if(!clk90)
        assert(counter[NCTR-1] == 1'b1);
    else
        assert(counter[NCTR-1:NCTR-2] == 2'b01
            || counter[NCTR-1:NCTR-2] == 2'b10);
```

The last several outputs are very specific to their settings. `8'h3c` is only possible in a speed of 1 with a 90 degree clock offset.

```
8'h3c: assert( clk90 && ckspd == 1 && o_ckstb && o_hlfck);
```

That leaves the two possible double-clock outputs. First, the double clock with no 90 degree offset.
```
8'h33: assert(!clk90 && ckspd == 0 && o_ckstb && o_hlfck);
```

The last possibility is the double clock with the 90 degree offset.

```
8'h66: assert( clk90 && ckspd == 0 && o_ckstb && o_hlfck);
```

Everything else is specifically disallowed.

```
default: assert(0);
endcase
```

## Interface File

While I might like to leave things there, a full proof of this [clock generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdckgen.v) requires we go over the [formal interface file](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/bench/formal/fclk.v).

Remember, the purpose of the formal interface file is to separate two proofs. In this case, we want to both formally verify the [clock generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdckgen.v), as well as the [transmitter data generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdtxframe.v) that will use the results of the [clock generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdckgen.v). Further, unlike the [clock generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdckgen.v), the [transmitter data generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdtxframe.v) doesn’t really care if the signals to and from the [clock generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdckgen.v) are realistic. It only cares that they follow whatever rules it requires–things like either 1) both `new_edge && half_edge` at the same time, or 2) an alternating `new_edge` with the `half_edge`, and so forth.

You can find this [formal interface file](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/bench/formal/fclk.v) among the other files associated with the formal proofs for this design.
Although it is written in Verilog, it’s not really something that could or would be synthesized. For this reason I keep it in the `bench/formal` subdirectory of the project, rather than the `rtl/` subdirectory.

Starting at the top, our [property set](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/bench/formal/fclk.v) must operate in at least three configurations: 1) in an environment where the `wide_clock` commands an 8:1 OSERDES, 2) an environment where it commands an [ODDR](https://zipcpu.com/blog/2020/08/22/oddr.html) instead, or 3) a simpler environment where neither option is available to us.

```
module fclk #(
    parameter [0:0] OPT_SERDES = 1'b0,
                    OPT_DDR = 1'b0
) (
```

Yes, we’ll need to run at least [3 formal proofs](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/bench/formal/sdckgen.sby#L2-L4), one for each option, to make sure we’ve truly captured each option. This, however, is just the price of doing business with configurable logic.

Our [formal properties](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/bench/formal/fclk.v) will need the same inputs as the [clock generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdckgen.v). The outputs of the [clock generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdckgen.v) also need to be listed as inputs to this [property set](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/bench/formal/fclk.v). While the [formal property set](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/bench/formal/fclk.v) will primarily consist of assertions and assumptions, it will also produce two outputs–as discussed above.
These are necessary for making sure the [formal property set](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/bench/formal/fclk.v)’s state is consistent with the internal state of the design.

```
        input   wire        i_clk, i_reset,
        //
        input   wire        i_en,
        input   wire [7:0]  i_ckspd,
        input   wire        i_clk90,
        //
        input   wire        i_ckstb, i_hlfck,
        input   wire [7:0]  i_ckwide,
        //
        output  reg         f_pending_reset,
        output  reg         f_pending_half
    );
```

Some of you may recall the [challenges I’ve struggled through when trying to verify two co-dependent components](https://zipcpu.com/formal/2018/12/18/skynet.html). My original approach was to [swap assumptions and assertions](https://zipcpu.com/formal/2018/04/23/invariant.html) between the two components. This [didn’t work](https://zipcpu.com/formal/2018/12/18/skynet.html), primarily because it was possible for the resulting *assumptions* to render one or more assertions to be irrelevant or vacuous. In that example, the logic of a design acted as an assumption as well.

In our case, we’re going to disconnect the two designs that will use this property set entirely. The [clock generator (the master)](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdckgen.v) will make assertions that the [transmitter data generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdtxframe.v) will later assume, and vice versa. To make this work, we’ll have the [SymbiYosys script](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/bench/formal/sdckgen.sby) for the [clock generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdckgen.v) define a `CKGEN` macro.
This will then tell us whether this property set is being used as part of the [clock generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdckgen.v)’s proof, or the [transmitter data generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdtxframe.v)’s. If a part of the [clock generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdckgen.v)’s proof, we’ll make assertions about our outputs. If a part of the [transmitter data generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdtxframe.v)’s proof, those “outputs” will now be inputs of the [transmitter data generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdtxframe.v), and so we should be making assumptions about them instead. To do this, we’ll create a macro, `SLAVE_ASSUME`, that can be used to describe properties of these outputs with either `assert` or `assume` statements.

```
`ifdef  CKGEN
`define SLAVE_ASSUME    assert  // Clock generator proof
`else
`define SLAVE_ASSUME    assume  // Transmit data generator proof
`endif
```

The next step is boilerplate: create an `f_past_valid` register to let us know if we can use the `$past()` function or not. (Remember, the value of `$past()` is invalid on the first clock of any proof.)

```
    reg         f_past_tick, f_past_valid;
    reg         last_reset, last_en, last_pending;
    reg [7:0]   last_ckspd;

    initial f_past_valid = 0;
    always @(posedge i_clk)
        f_past_valid <= 1;
```

Likewise, `f_pending_reset` will be true between the `i_reset` signal and the first clock edge.
```
    initial f_pending_reset = 1'b0;
    always @(posedge i_clk)
    if (i_reset)
        f_pending_reset <= 1'b1;
    else if (i_ckstb || i_hlfck)
        f_pending_reset <= 1'b0;
```

Our second output, `f_pending_half`, is true from the top of the clock to the second half of the clock, but *only* if the top of the clock didn’t include the `half_edge` signal (called `i_hlfck` herein).

```
    initial f_pending_half = 1'b0;
    always @(posedge i_clk)
    if (i_reset)
        f_pending_half <= 1'b0;
    else if (i_ckstb)
        f_pending_half <= !i_hlfck;
    else if (i_hlfck)
        f_pending_half <= 1'b0;
```

A third signal, `f_past_tick`, will allow us to reason about whether or not we just passed an edge. We’ll get to this one in a bit.

```
    initial f_past_tick = 0;
    always @(posedge i_clk)
        f_past_tick <= i_ckstb || i_hlfck;
```

Now that we have the two pending signals, `f_pending_reset` and `f_pending_half`, we can state with certainty that we can’t start a new clock cycle while waiting for the second half of a clock cycle. Likewise, if we are in the second half of a clock cycle, we shouldn’t see the half edge again unless we’re starting a new (and high speed) clock.

```
    always @(posedge i_clk)
    if (!i_reset && !f_pending_reset)
    begin
        if (f_pending_half)
            `SLAVE_ASSUME(!i_ckstb);
        else if (i_hlfck)
            `SLAVE_ASSUME(i_ckstb);
    end
```

With this as background, we can now make assertions about our various clock speeds, and the outputs that should be produced in each. Note that in this [formal property set](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/bench/formal/fclk.v), the `i_ckspd` input reflects our *current* clock speed, and not just the *requested* clock speed that we worked with in the [clock generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdckgen.v). Hence, it is an *output* of the [clock generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdckgen.v), and no longer the requested clock speed.
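
Before moving on to the per-speed rules, it can help to watch the `f_pending_half` bookkeeping in a plain simulation. What follows is only a sketch under stated assumptions: the testbench module `pending_half_tb` and its `step` task are hypothetical names introduced for illustration and are not part of the sdspi repository. The tracker logic is copied from the property file with the `i_` prefixes dropped, and the 10ns clock period simply assumes a 100MHz system clock.

```verilog
`timescale 1ns/1ps

// Hypothetical stand-alone testbench (not from the sdspi repository).
// It replays a few {ckstb, hlfck} patterns and prints f_pending_half.
module pending_half_tb;
    reg clk, reset, ckstb, hlfck;
    reg f_pending_half;

    // Same update rule as f_pending_half in the formal property file
    initial f_pending_half = 1'b0;
    always @(posedge clk)
    if (reset)
        f_pending_half <= 1'b0;
    else if (ckstb)
        f_pending_half <= !hlfck;
    else if (hlfck)
        f_pending_half <= 1'b0;

    // Assumed 100MHz system clock (10ns period)
    initial clk = 1'b0;
    always #5 clk = !clk;

    // Apply one {ckstb, hlfck} pair for exactly one system clock cycle
    task step(input stb, input hlf);
    begin
        ckstb = stb;
        hlfck = hlf;
        @(posedge clk);
        #1; // Let the nonblocking update land before printing
        $display("ckstb=%b hlfck=%b -> f_pending_half=%b",
                 stb, hlf, f_pending_half);
    end endtask

    initial begin
        reset = 1'b1; ckstb = 1'b0; hlfck = 1'b0;
        @(posedge clk);
        #1 reset = 1'b0;

        // Slow (divided) clock: new edge, idle, half edge, idle
        step(1'b1, 1'b0);
        step(1'b0, 1'b0);
        step(1'b0, 1'b1);
        step(1'b0, 1'b0);
        // Fast clock: both edges arrive in the same system clock cycle
        step(1'b1, 1'b1);
        $finish;
    end
endmodule
```

In this sketch, the flag should rise after the lone new edge, stay high through the idle cycle, and clear on the half edge. When both edges arrive together, as in the two fastest modes, nothing is ever left pending. That is the behavior the speed-specific properties rely on.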
Let’s start with the highest speed (200MHz) clock output.

```
    always @(posedge i_clk)
    if (!i_reset)
    case(i_ckspd)
    0: begin
        // We can only run in this speed if OPT_SERDES is set.
        `SLAVE_ASSUME(OPT_SERDES);

        // This speed has no pending half cycles.  All clock cycles
        // are complete in one cycle.
        `SLAVE_ASSUME(f_pending_reset || !f_pending_half);

        if (i_ckwide == 0)
        begin
            // Clock is either *off*/inactive, or we're still coming
            // out of a reset.
            `SLAVE_ASSUME(f_pending_reset || (!i_ckstb && !i_hlfck));
        end else begin
            // Clock is active, both edges are active in a clock
            // tick
            `SLAVE_ASSUME(i_ckstb && i_hlfck);
        end
```

The `wide_clock` output, herein called `i_ckwide`, can only have one of two values when active at this speed.

```
        if (i_clk90)
        begin
            // In the case of a 90 degree offset clock, if the
            // clock is active, it must be 0110_0110
            `SLAVE_ASSUME(i_ckwide == 0 || i_ckwide == 8'h66);
        end else begin
            // Otherwise, if the clock is active, it must be
            // 0011_0011
            `SLAVE_ASSUME(i_ckwide == 0 || i_ckwide == 8'h33);
        end
    end
```

Those are just the rules for 200MHz (assuming a 100MHz system clock). Now let’s drop down a speed, and look at the 100MHz clock. In this mode, the new edge and half edge signals must also be present on the same clock. Likewise, there’s no allowable means to have a pending second half–the first and second half must always show up on the same clock cycle.

```
    1: begin
        if (i_ckwide == 0)
        begin
            `SLAVE_ASSUME(f_pending_reset || (!i_ckstb && !i_hlfck));
        end else begin
            `SLAVE_ASSUME(i_ckstb && i_hlfck);
        end

        if (!f_pending_reset)
            `SLAVE_ASSUME(!f_pending_half);
```

At 100MHz, the outgoing wide clock can only be `0011_1100` (90 degree offset), or `0000_1111`. The former requires `OPT_SERDES`, the latter may also be possible in `OPT_DDR` mode–since the first four bits are all equal, as are the last four.
```
        if (i_clk90)
        begin
            `SLAVE_ASSUME(i_ckwide == 0 || i_ckwide == 8'h3c);
            `SLAVE_ASSUME(OPT_SERDES);
        end else begin
            `SLAVE_ASSUME(i_ckwide == 0 || i_ckwide == 8'h0f);
            `SLAVE_ASSUME(OPT_SERDES || OPT_DDR);
        end
    end
```

Our last special clock speed is 50MHz. For this case, we break our properties into two parts: the 90 degree offset, and the normal (SDR) case. For the 90 degree offset clock, the clock must either be `0000_1111` if we’re not waiting on the next half clock cycle, or `1111_0000` if we are. Likewise, either the new or half edge signal must be true on every cycle. The only exception is for if/when the clock is stopped. Further, this output will require either `OPT_SERDES` or `OPT_DDR`.

```
    2: begin
        if (i_clk90)
        begin
            `SLAVE_ASSUME(i_ckwide == 0 || i_ckwide == 8'h0f
                    || i_ckwide == 8'hf0);
            if (i_en)
            begin
                `SLAVE_ASSUME(i_ckwide != 0);
            end

            `SLAVE_ASSUME(OPT_SERDES || OPT_DDR);

            if (!f_pending_reset && f_pending_half)
            begin
                `SLAVE_ASSUME(i_ckwide == 8'hf0);
            end

            if (i_ckwide == 8'h00)
            begin
                `SLAVE_ASSUME(!i_ckstb && !i_hlfck);
            end else if (i_ckwide == 8'h0f)
            begin
                `SLAVE_ASSUME(i_ckstb);
            end else begin
                `SLAVE_ASSUME(i_hlfck);
            end
```

The normal offset is simpler. This doesn’t require `OPT_SERDES` or `OPT_DDR`. The wide clock can either be `0000_0000` or `1111_1111`. Further, if ever the clock output is `1111_1111`, then we must be on the second half edge.

```
        end else begin
            `SLAVE_ASSUME(i_ckwide == 0 || i_ckwide == 8'hff);
            if (i_ckwide == 8'hff)
                `SLAVE_ASSUME(i_hlfck);
        end
    end
```

This brings us to the default clock–the very slow clock generated by integer division (i.e. the counter). As before, the wide clock can either be `0000_0000` or `1111_1111` and hence needs no special hardware such as either `OPT_SERDES` or `OPT_DDR`.
```
    default: begin
        `SLAVE_ASSUME(i_ckwide == 0 || i_ckwide == 8'hff);
        if (!f_pending_reset && !i_clk90 && last_en && i_en)
        begin
            if (i_ckstb)
            begin
                `SLAVE_ASSUME(i_ckwide == 8'h00);
            end else if (i_hlfck)
            begin
                `SLAVE_ASSUME(i_ckwide == 8'hff);
            end else if (f_pending_half)
            begin
                `SLAVE_ASSUME(i_ckwide == 8'h00);
            end else // if (!f_pending_half)
                `SLAVE_ASSUME(i_ckwide == 8'hff);
        end
    end
    endcase
```

Just as a quick sanity check, if we have no special hardware, then both new and half edges can never be true on the same cycle.

```
    always @(posedge i_clk)
    if (!OPT_SERDES && !OPT_DDR)
        assert(!i_ckstb || !i_hlfck);
```

Let’s come back and double check the high speed cases. These are the only cases where both new and half edge may be allowed at the same time. In all other cases, one or both signals should be zero.

```
    always @(posedge i_clk)
    if (f_past_valid && !last_reset && (last_en || i_ckstb || i_hlfck))
    begin
        case(i_ckspd)
        0: `SLAVE_ASSUME(!i_en || (i_ckstb && i_hlfck));
        1: `SLAVE_ASSUME(!i_en || (i_ckstb && i_hlfck));
        default: `SLAVE_ASSUME(!i_ckstb || !i_hlfck);
        endcase
    end
```

Feel free to check the [property set](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/bench/formal/fclk.v) out yourself. While there are a couple more properties to it, these are the most significant.

## Coverage Checking

Any good verification set should include not just a simulation, not just formal induction based proofs, but also a set of coverage checks. These are critical to making sure you haven’t (accidentally) assumed away some key component of the device’s operation. Were that to happen, then the formal proof would be irrelevant–even if it did pass. Hence, we add some cover properties here to the [clock generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdckgen.v).

The first step is just to check if the clock is active, and if so, what mode it is active in.
```
    reg         cvr_active, cvr_clk90;
    reg [7:0]   cvr_spd, cvr_count;

    always @(posedge i_clk)
    if (!cvr_active)
    begin
        cvr_spd   <= i_cfg_ckspd;
        cvr_clk90 <= i_cfg_clk90;
    end

    initial cvr_active = 0;
    always @(posedge i_clk)
    if (i_reset)
        cvr_active <= 1'b0;
    else if (cvr_spd != o_ckspd || cvr_spd != i_cfg_ckspd || !f_en
            || cvr_clk90 != i_cfg_clk90 || cvr_clk90 != clk90)
        // We want to prove what our clock output can do over
        // time, not so much what happens when/if it changes.
        cvr_active <= 0;
    else if (o_ckstb)
        cvr_active <= 1;
```

If the clock is active, we can then start counting every new edge that takes place while active.

```
    always @(posedge i_clk)
    if (i_reset || !cvr_active)
        cvr_count <= 8'b0;
    else if (o_ckstb && !(&cvr_count))
        // Don't allow the counter to overflow, but otherwise
        // count the beginnings of each clock cycle.
        cvr_count <= cvr_count + 1;
```

With that as background, we can start looking at traces! Let’s get cover traces for a variety of potential frequencies.

```
    always @(posedge i_clk)
    if (!i_reset)
    begin
        cover(cvr_spd == 2 && !clk90 && cvr_count > 2); // 50MHz
        cover(cvr_spd == 3 &&  clk90 && cvr_count > 2); // 25MHz
        cover(cvr_spd == 3 && !clk90 && cvr_count > 2);
        cover(cvr_spd == 4 &&  clk90 && cvr_count > 2); // 12MHz
        cover(cvr_spd == 4 && !clk90 && cvr_count > 2);
        cover(cvr_spd == 5 &&  clk90 && cvr_count > 2); // 8MHz
        cover(cvr_spd == 5 && !clk90 && cvr_count > 2);
        cover(cvr_spd == 6 &&  clk90 && cvr_count > 2); // 6MHz
        cover(cvr_spd == 6 && !clk90 && cvr_count > 2);
    end
```

We’ll have to handle covering the high speed options a bit differently. In this case, we *only* want to check speeds requiring `OPT_SERDES` if `OPT_SERDES` is actually set. We can’t use an `if` for this, lest the formal tool decide we failed the cover check. Hence, we’ll use a generate statement, so that the cover statements requiring `OPT_SERDES` are *only* generated if `OPT_SERDES` is true. Now we can check for 200MHz, 100MHz, and 50MHz.
```
    generate if (OPT_SERDES)
    begin : CVR_SERDES

        always @(posedge i_clk)
        if (!i_reset)
        begin
            cover(cvr_spd == 0 &&  clk90 && cvr_count > 5);
            cover(cvr_spd == 1 &&  clk90 && cvr_count > 5);
            cover(cvr_spd == 1 && !clk90 && cvr_count > 5);
            cover(cvr_spd == 2 &&  clk90 && cvr_count > 5);
            cover(cvr_spd == 2 && !clk90 && cvr_count > 5);
        end
```

We can apply the same logic to `OPT_DDR`, but we’ll have fewer clock options to check. In this case, it’s only the 100MHz and 50MHz options.

```
    end else if (OPT_DDR)
    begin : CVR_DDR

        always @(posedge i_clk)
        if (!i_reset)
        begin
            cover(cvr_spd == 1 && !clk90 && cvr_count > 5);
            cover(cvr_spd == 2 &&  clk90 && cvr_count > 5);
            cover(cvr_spd == 2 && !clk90 && cvr_count > 5);
        end

    end endgenerate
```

By the time you get to this point, you should have a strong confidence that [this device clock generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdckgen.v) actually does what it needs to. I certainly do, and it hasn’t failed me (that I recall) since going through this exercise. Yes, other parts of this design have had problems, particularly the [front end](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdfrontend.v), but the [clock generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdckgen.v) has been quite reliable.

## Conclusions

This is now my go-to approach whenever I need to generate a device clock:

- Generate the “clock” in logic.
- Generate the “clock” wide, so it can be output via either OSERDES or [ODDR](https://zipcpu.com/blog/2020/08/22/oddr.html).
- Maintain all logic transitions on the original source clock.
- Use logical signals like you would enables to handle data transitions.

What did this gain us?
We received several advantages from this approach:

- A glitchless outgoing clock
- An outgoing clock that can …
  - change frequency upon command,
  - turn on and off as necessary,
  - stop, and yet restart on a dime, and
  - switch between being data aligned and offset by 90 degrees.

This is everything we would want of an outgoing clock, with none of the challenges associated with breaking [*the rules*](https://zipcpu.com/blog/2017/08/21/rules-for-newbies.html). Indeed, this approach works nicely in both FPGA and ASIC contexts, as I’ve now used it quite successfully in both for multiple projects. No, I don’t use the same [clock generator](https://github.com/ZipCPU/sdspi/blob/a1d912367ce71389ef25ced4b83d34d23b05b391/rtl/sdckgen.v) for all my projects, but that’s for both requirements (the 200MHz clock is unique) and [legal reasons](https://zipcpu.com/blog/2020/01/13/reuse.html).

This leaves us with the topic of the “return clock”, which we’ll need to come back to and discuss on another day.
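
For readers who want to see the four steps from the conclusions in one place, here is a deliberately small sketch. It is not the `sdckgen.v` discussed above: the module name `toy_devclk`, the `HALF_TICKS` parameter, the `i_load`/`i_data`/`o_dat` ports and the internal registers are all hypothetical, and it only supports the slow, divided clock (wide words of `8'h00` and `8'hff`), with no 90 degree offset and no runtime speed changes.

```verilog
// Hypothetical sketch only: this is NOT the article's sdckgen.v
module toy_devclk #(
        // System clocks per half period of the device clock (>= 1)
        parameter [7:0] HALF_TICKS = 8'd2
    ) (
        input  wire       i_clk, i_reset,
        input  wire       i_en,     // Request a running device clock
        input  wire       i_load,   // Load i_data (while the clock is idle)
        input  wire [7:0] i_data,   // Byte to send, MSB first
        output reg        o_ckstb,  // New device clock cycle (clock low)
        output reg        o_hlfck,  // Second half begins (clock high)
        output reg  [7:0] o_ckwide, // Clock word for an 8:1 OSERDES
        output reg        o_dat     // Serial data to the device
    );

    reg       running, phase;   // phase: 0 = low half, 1 = high half
    reg [7:0] counter, sreg;

    // Step 1: the "clock" is plain logic.  These are enables, not clocks.
    wire new_edge  = i_en && (!running || (counter == 0 && phase));
    wire half_edge = running && counter == 0 && !phase;

    initial { running, phase, counter, o_ckwide } = 0;
    initial { o_ckstb, o_hlfck, o_dat, sreg } = 0;

    // Steps 2 and 3: build the clock as a wide word, clocked by i_clk only
    always @(posedge i_clk)
    if (i_reset)
    begin
        running  <= 1'b0;
        phase    <= 1'b0;
        counter  <= 8'h00;
        o_ckwide <= 8'h00;
    end else if (new_edge)
    begin
        running  <= 1'b1;
        phase    <= 1'b0;
        counter  <= HALF_TICKS - 8'd1;
        o_ckwide <= 8'h00;
    end else if (half_edge)
    begin
        phase    <= 1'b1;
        counter  <= HALF_TICKS - 8'd1;
        o_ckwide <= 8'hff;
    end else if (running && counter != 0)
        counter <= counter - 8'd1;
    else if (running)
    begin
        // High half finished and i_en is low: stop with the clock low
        running  <= 1'b0;
        phase    <= 1'b0;
        o_ckwide <= 8'h00;
    end

    // Registered strobes line up with the clock word they describe
    always @(posedge i_clk)
    begin
        o_ckstb <= !i_reset && new_edge;
        o_hlfck <= !i_reset && half_edge;
    end

    // Step 4: data moves on i_clk, with new_edge used as an enable
    always @(posedge i_clk)
    if (i_reset)
        { o_dat, sreg } <= 9'h0;
    else if (i_load)
        sreg <= i_data;
    else if (new_edge)
        { o_dat, sreg } <= { sreg, 1'b0 };
endmodule
```

Because every register here is clocked by `i_clk`, the outgoing clock word cannot glitch, and dropping `i_en` always lets the current cycle finish before the clock parks low. Each new data bit is launched in the same system clock cycle that the clock word goes low, so the device can sample it on the following rising edge. On a part without an OSERDES or ODDR, a single bit of `o_ckwide` could drive the pin directly in this slow mode.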
