GUIDELINE
1.1.1 INTRODUCTION
ASIC designs are becoming more complex as more IP blocks are integrated on a chip, and data is frequently transferred from one clock domain to another. Handling clock domain crossing (CDC) correctly is therefore an increasingly important factor in building a stable multi-clock chip.
This document covers the following topics:
a. Where CDC occurs;
b. What problems CDC can cause;
c. How to design CDC logic correctly.
1.1.2 APPLICATION AREA
In a multi-clock design, clock domain crossing occurs whenever data is transferred from a flop driven by one clock to a flop driven by another clock, as shown in Figure 1-1.
Figure 1-1 Clock domain crossing
*Note: definition of terminology:
Source clock: Clock A in Figure 1-1;
Destination clock: Clock B in Figure 1-1;
Source clock domain: all logic whose reference clock is Clock A, such as flip-flop FA in Figure 1-1;
Destination clock domain: all logic whose reference clock is Clock B, such as flip-flop FB in Figure 1-1.
1.1.3 PROBLEM DEFINITION
Meta-stability, glitches, multi-fanout and re-convergence may occur in an asynchronous design. They may drive the design into an unexpected state and cause functional errors.
1.1.3.1 Meta-stability
A signal that propagates across asynchronous clock domains may become meta-stable if it violates the setup or hold time of the capturing flip-flop, as shown in Figure 1-2.
Figure 1-2 Meta-stable issue
1.1.3.2 Glitch
Combinational logic in the synchronization path produces glitches because of propagation delays. These glitches may be latched and cause false pulses at the synchronizer output, as shown in Figure 1-3.
*Note:
Synchronization path: either of the following paths is a synchronization path,
1. A path from the source clock domain to the destination clock domain, such as the path from Q of DA1/DA2 to D of DB1 in Figure 1-3;
2. A path from Q to D between two flip-flops in the destination clock domain, such as the path from Q of DB1 to D of DB2 in Figure 1-3.
Figure 1-3 Glitch issue
1.1.3.3 Multi-fanout
Multi-fanout on the synchronization path may produce different values at the synchronizer outputs because of different propagation delays, as shown in Figure 1-4.
Figure 1-4 Multi-fanout issue
1.1.3.4 Re-convergence (signal re-convergence)
Signals that re-converge after synchronization may cause functional errors, as shown in Figure 1-5.
Figure 1-5 Re-convergence issue
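The hazard can be illustrated with a minimal Verilog sketch. It is not the circuit of Figure 1-5; the module name, port names and the active-low reset are assumptions made for illustration. Two related signals are synchronized separately and then combined, so the combined result can be wrong for a cycle whenever the two synchronizers resolve on different destination clock edges.

```verilog
// Illustration of a re-convergence hazard (what NOT to rely on).
// All names are hypothetical.
module reconverge_hazard (
    input  wire clk_b,    // destination clock (Clock B)
    input  wire rst_n,    // asynchronous reset, active low (assumed)
    input  wire a0,       // two related signals, both launched from
    input  wire a1,       // flops in the Clock A domain
    output wire both_b    // re-converged result in the Clock B domain
);
    reg a0_s1, a0_s2;     // two-flop synchronizer for a0
    reg a1_s1, a1_s2;     // two-flop synchronizer for a1

    always @(posedge clk_b or negedge rst_n) begin
        if (!rst_n) begin
            a0_s1 <= 1'b0;  a0_s2 <= 1'b0;
            a1_s1 <= 1'b0;  a1_s2 <= 1'b0;
        end else begin
            a0_s1 <= a0;    a0_s2 <= a0_s1;
            a1_s1 <= a1;    a1_s2 <= a1_s1;
        end
    end

    // Even if a0 and a1 change on the same Clock A edge, a0_s2 and a1_s2
    // may update one clk_b cycle apart, so both_b can briefly take a value
    // that never existed in the source domain.
    assign both_b = a0_s2 & a1_s2;
endmodule
```

Each synchronizer is correct on its own; the problem only appears where the two synchronized outputs meet.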
Re-convergence is a special CDC issue that deserves extra attention from the logic designer. All other CDC issues, such as meta-stability, multi-fanout and glitches, can be detected by a CDC checking tool (for example, Cadence's Conformal). Some re-convergence issues, however, are so complex that they are hard for tools to check. Deep re-convergence, shown in Figure 1-6, is one example.
Figure 1-6 Deep re-convergence issue
CLK_A: rhflrhflrhflrhflrhflrhflrhflrhflr
CLK_B: llrflrflrflrflrflrflrflrflrflrflrf
D0_A: rhhhflllllllllllllllllllllllllllll
D0_B: rhhhflllllllllllllllllllllllllllll
D0_C: rhhhflllllllllllllllllllllllllllll
D0_D: rhhhflllllllllllllllllllllllllllll
D1_A: llllrhhhflllllllllllllllllllllllll
D1_B: llllrhhhflllllllllllllllllllllllll
D1_C: lllllrhhhfllllllllllllllllllllllll
D1_D: lllllrhhhfllllllllllllllllllllllll
D2_A: llllllllrhhfllllllllllllllllllllll
D2_B: llllllllrhhfllllllllllllllllllllll
D2_C: lllllllllllrhhflllllllllllllllllll
D2_D: lllllllllllrhhflllllllllllllllllll
D3_E_syn: lllllllllllrhhflllllllllllllllllll
D3_F_syn: llllllllllllllrhhfllllllllllllllll
D3_G_syn: llllllllllllll llllll llllllllllllll
Figure 1-7 Timing diagram of deep re-convergence
Figure 1-6 shows that the circuit has two levels of re-convergence: the first-level re-convergence gate is the flip-flop marked in blue, and the second-level gate is the flip-flop marked in red. The Cadence CDC tool identifies and reports first-level re-convergence, but it cannot identify second-level re-convergence; this is a tool limitation. Re-convergence at either level may cause a functional error, as shown in Figure 1-7.
1.1.4 IMPLEMENTATION
This chapter introduces several logic schemes for designing clock domain crossing logic correctly. These schemes keep data transfer between clock domains stable. Nearly all clock domain crossing issues can be avoided if the designer follows the schemes introduced in this chapter.
1.1.4.1 Two synchronizer scheme
For a one-bit signal crossing between clock domains, the general solution is to synchronize it through two flip-flops in the destination clock domain. The precondition is that the signal from the source clock domain is held long enough for the destination clock to sample it; in other words, for a signal that is one source clock cycle wide, the frequency of clock A should be lower than that of clock B. The circuit is shown in Figure 1-8:
Figure 1-8 two flip-flop sync
When using the two synchronizer scheme, the designer should keep combinational cells (except inverters and buffers) out of the CDC path (*note); otherwise, the glitch issue shown in Figure 1-3 and the multi-fanout issue shown in Figure 1-4 may occur.
*Note:
CDC path: the path from Q of a flip-flop in the source clock domain to D of a flip-flop in the destination clock domain, for example the path from FA/Q to FB1/D in Figure 1-8.
As examples of applying the two synchronizer scheme, Figures 1-9 and 1-10 show how to design the logic to avoid the glitch and multi-fanout issues.
Figure 1-9 solution of glitch issue
Figure 1-10 solution of multi-fanout issue
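The two synchronizer scheme can be sketched in Verilog as follows. This is a minimal sketch and not a copy of the circuit in Figure 1-8: the module name, the port names and the active-low asynchronous reset are assumptions. It also assumes that d_a is driven directly by a flip-flop in the source clock domain, with no combinational logic on the CDC path.

```verilog
// Two flip-flop synchronizer sketch. All names are hypothetical.
module sync_2ff (
    input  wire clk_b,    // destination clock (Clock B)
    input  wire rst_n,    // asynchronous reset, active low (assumed)
    input  wire d_a,      // one-bit signal from a flop in the Clock A domain
    output wire q_b       // synchronized copy for use in the Clock B domain
);
    reg ff1;              // first stage: may go meta-stable
    reg ff2;              // second stage: samples ff1 one cycle later

    always @(posedge clk_b or negedge rst_n) begin
        if (!rst_n) begin
            ff1 <= 1'b0;
            ff2 <= 1'b0;
        end else begin
            ff1 <= d_a;
            ff2 <= ff1;
        end
    end

    assign q_b = ff2;
endmodule
```

Only q_b should be used by destination logic. Reading ff1 directly would reintroduce the meta-stability and multi-fanout issues described earlier.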
1.1.4.2 MUX structure sync scheme
For multi-bit signals crossing between clock domains, the MUX scheme (shown in Figure 1-11) can be used to make the crossing safe.
The MUX scheme suits designs in which a group of data is transferred from one clock domain to another together with a qualifier signal that, when asserted, indicates the data is stable, like Sel in Figure 1-11. The data from the source clock domain stays unchanged until the qualifier has been synchronized through two flip-flops and detected in the destination clock domain; see Figure 1-12 for details.
Figure 1-11 MUX structure scheme
Clock A: lrhflrhflrhflrhflrhflrhflrhflrhflrh
Clock B: lrfrfrfrfrfrfrfrfrfrfrfrfrfrfrfrfr
Data: *ddddddddddddddxddddddddddddddddddx
Sel: lllrhhhfllllllllllllrhhhfllllllllll
Sel_a: lllllrhhhflllllllllllrhhhflllllllll
Sel_b: llllllllrhhhflllllllllllrhhflllllll
Dmux_o:“”””””””xdddddddddddddddxdddddddddd
Dout :“”””””””””*ddddddddddddddd*dddddddd
Figure 1-12 MUX scheme timing diagram
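A minimal Verilog sketch of the MUX scheme is given below. The signal names follow the labels in the timing diagram (Sel, Sel_a, Sel_b, Dmux_o, Dout), but treating Sel_a and Sel_b as the two synchronizer stages is an assumption, as are the module name, the WIDTH parameter and the active-low reset.

```verilog
// MUX structure synchronizer sketch. Names and WIDTH are hypothetical.
module sync_mux #(
    parameter WIDTH = 8
) (
    input  wire             clk_b,   // destination clock (Clock B)
    input  wire             rst_n,   // asynchronous reset, active low (assumed)
    input  wire [WIDTH-1:0] data,    // multi-bit data from Clock A, held stable
    input  wire             sel,     // qualifier from Clock A (flop output)
    output reg  [WIDTH-1:0] dout     // data captured in the Clock B domain
);
    reg sel_a;                       // first synchronizer stage
    reg sel_b;                       // second synchronizer stage

    // MUX: take new data when the synchronized qualifier is high,
    // otherwise recirculate the value already captured.
    wire [WIDTH-1:0] dmux_o = sel_b ? data : dout;

    always @(posedge clk_b or negedge rst_n) begin
        if (!rst_n) begin
            sel_a <= 1'b0;
            sel_b <= 1'b0;
            dout  <= {WIDTH{1'b0}};
        end else begin
            sel_a <= sel;
            sel_b <= sel_a;
            dout  <= dmux_o;
        end
    end
endmodule
```

Only the one-bit qualifier passes through a synchronizer. The data bus is sampled without one, which is safe only because the source domain keeps it unchanged for as long as sel_b can be high.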
1.1.4.3 Handshake scheme
Handshake is a design based on the following protocol: the source clock domain asserts a request to the destination clock domain and keeps it asserted until it receives the grant from the destination clock domain; the destination clock domain receives the request and keeps its grant asserted until it sees that the request from the source clock domain has been de-asserted. A handshake design can be a simple feedback synchronizer, a full handshake or a half handshake.
1.1.4.3.1 Feedback synchronizer
Figure 1-13 shows a logic implementation of the feedback synchronizer. This circuit places no restriction on the relationship between the frequencies of clock A and clock B: clock A may be slower or faster than clock B. However, the circuit applies only when signal A consists of 1T pulses and the time from the de-assertion of one pulse to the assertion of the next is longer than (2 clock A cycles + 2 clock B cycles).
Figure 1-13 full feedback synchronizer logic
The timing diagram in Figure 1-14 shows how the full feedback synchronizer circuit operates.
CLKA: lcccccccccccccccccccccccccccccccccc
A: lrflllllllrflllllllllllllllllllllll
FA0/Q: llrflllllllrfllllllllllllllllllllll
FA1/D: llrhhhhhhhfrhhhhhhhflllllllllllllll
FA1/Q: lllrhhhhhhhfrhhhhhhhfllllllllllllll
FA2/Q: llllllllrhhhhfllrhhhhflllllllllllll
FA3/Q: lllllllllrhhhhfllrhhhhfllllllllllll
CLKB: lrfrfrfrfrfrfrfrfrfrfrfrfrfrfrfrfrf
FB1/Q: lllllrhhhhhflrhhhhhflllllllllllllll
FB2/Q: lllllllrhhhhhflrhhhhhflllllllllllll
FB3/Q: lllllllllrhhhhhflrhhhhhflllllllllll
B: lllllllllrhhhhhflrhhhhhflllllllllll
Figure 1-14 full feedback synchronizer timing
Figure 1-15 shows a simple feedback synchronizer that is commonly used in logic design, and Figure 1-16 is its timing diagram.
Figure 1-15 Simple feedback synchronizer logic
Figure 1-16 Simple timing diagram of feedback synchronizer
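One possible form of a feedback synchronizer is sketched below in Verilog. It is not a transcription of Figure 1-13 or Figure 1-15, and every name in it is hypothetical. The 1T pulse is stretched into a level in the Clock A domain, the level is synchronized into Clock B, and the synchronized level is fed back through a second synchronizer to clear the stretched level.

```verilog
// Feedback (pulse stretching) synchronizer sketch. All names are hypothetical.
module feedback_sync (
    input  wire clk_a,     // source clock (Clock A)
    input  wire rst_a_n,   // reset for the Clock A side, active low (assumed)
    input  wire pulse_a,   // 1T pulse in the Clock A domain
    input  wire clk_b,     // destination clock (Clock B)
    input  wire rst_b_n,   // reset for the Clock B side, active low (assumed)
    output wire level_b,   // stretched copy of the pulse in Clock B
    output wire pulse_b    // one clk_b cycle strobe per input pulse
);
    reg hold_a;            // set by pulse_a, cleared by the feedback
    reg b1, b2, b3;        // synchronizer into Clock B plus edge-detect stage
    reg fb1, fb2;          // synchronizer for the feedback into Clock A

    always @(posedge clk_a or negedge rst_a_n) begin
        if (!rst_a_n) begin
            hold_a <= 1'b0;
            fb1    <= 1'b0;
            fb2    <= 1'b0;
        end else begin
            fb1 <= b2;                     // feedback crosses back to Clock A
            fb2 <= fb1;
            if (pulse_a)  hold_a <= 1'b1;  // stretch the pulse
            else if (fb2) hold_a <= 1'b0;  // destination has seen it
        end
    end

    always @(posedge clk_b or negedge rst_b_n) begin
        if (!rst_b_n) begin
            b1 <= 1'b0;
            b2 <= 1'b0;
            b3 <= 1'b0;
        end else begin
            b1 <= hold_a;
            b2 <= b1;
            b3 <= b2;
        end
    end

    assign level_b = b2;
    assign pulse_b = b2 & ~b3;             // rising edge of the synchronized level
endmodule
```

In this sketch a new pulse must not arrive while the feedback is still high, otherwise hold_a is cleared too early. This is the reason a minimum spacing between pulses is needed.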
1.1.4.3.2 REQ-GNT scheme
Figure 1-17 shows the principle of the handshake implementation. The request and grant signals passed between Tx and Rx each need double synchronization, and the data must be held until Tx sees the grant de-asserted.
Figure 1-17 handshake implementation
Full handshake flow (see Figure 1-18):
1. Tx asserts the request signal (REQ) to Rx;
2. When Rx receives the request (REQ_B1), it asserts the grant signal (GNT) to Tx;
3. When Tx detects the grant signal (GNT_A1), it de-asserts its request;
4. When Rx detects that the request is de-asserted, it de-asserts its grant;
5. When Tx detects that the grant is de-asserted, the transfer is finished.
Figure 1-18 Full handshake timing diagram
A full handshake occupies five clock cycles in the Tx clock domain and six clock cycles in the Rx clock domain. It is the safest way to transfer data because Tx and Rx always know each other's status.
A half handshake is the same as a full handshake except that Tx and Rx de-assert their request and grant signals before receiving the response.
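The full handshake flow can be sketched in Verilog as follows. This is a minimal sketch under stated assumptions, not the circuit of Figure 1-17: the module name, the send_a, busy_a and valid_b ports, the WIDTH parameter and the resets are all hypothetical. The internal names req_b1 and gnt_a1 are chosen to mirror REQ_B1 and GNT_A1 in the flow.

```verilog
// Full (four-phase) REQ-GNT handshake sketch. Names are hypothetical.
module req_gnt_handshake #(
    parameter WIDTH = 8
) (
    input  wire             clk_a,    // Tx clock
    input  wire             rst_a_n,  // Tx reset, active low (assumed)
    input  wire             send_a,   // start a transfer when not busy
    input  wire [WIDTH-1:0] din_a,
    output wire             busy_a,   // high until step 5 completes
    input  wire             clk_b,    // Rx clock
    input  wire             rst_b_n,  // Rx reset, active low (assumed)
    output reg  [WIDTH-1:0] dout_b,
    output reg              valid_b   // one clk_b cycle strobe with new data
);
    reg             req_a;            // REQ, driven by Tx
    reg [WIDTH-1:0] data_a;           // held stable during the handshake
    reg             gnt_b;            // GNT, driven by Rx
    reg             req_b0, req_b1;   // REQ double-synchronized into Rx
    reg             gnt_a0, gnt_a1;   // GNT double-synchronized into Tx

    assign busy_a = req_a | gnt_a1;

    always @(posedge clk_a or negedge rst_a_n) begin
        if (!rst_a_n) begin
            req_a  <= 1'b0;
            data_a <= {WIDTH{1'b0}};
            gnt_a0 <= 1'b0;
            gnt_a1 <= 1'b0;
        end else begin
            gnt_a0 <= gnt_b;
            gnt_a1 <= gnt_a0;
            if (send_a && !busy_a) begin       // step 1: assert REQ
                data_a <= din_a;
                req_a  <= 1'b1;
            end else if (req_a && gnt_a1) begin // step 3: de-assert REQ
                req_a  <= 1'b0;
            end
        end
    end

    always @(posedge clk_b or negedge rst_b_n) begin
        if (!rst_b_n) begin
            req_b0  <= 1'b0;
            req_b1  <= 1'b0;
            gnt_b   <= 1'b0;
            dout_b  <= {WIDTH{1'b0}};
            valid_b <= 1'b0;
        end else begin
            req_b0  <= req_a;
            req_b1  <= req_b0;
            valid_b <= 1'b0;
            if (req_b1 && !gnt_b) begin         // step 2: capture data, assert GNT
                dout_b  <= data_a;
                gnt_b   <= 1'b1;
                valid_b <= 1'b1;
            end else if (!req_b1 && gnt_b) begin // step 4: de-assert GNT
                gnt_b   <= 1'b0;
            end
        end
    end
endmodule
```

The data bus is not synchronized. It is sampled only after the synchronized request is seen, by which time data_a has been stable for at least two Rx clock edges.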
1.1.4.4 FIFO synchronizer scheme
When data is transferred in bursts between two clock domains, a FIFO synchronizer is a common solution.
Figure 1-19 shows the implementation of a FIFO synchronizer. It includes a dual-port RAM, write and read control modules, and two flip-flop synchronizers.
The key issue in implementing a FIFO synchronizer is generating the FIFO status indicator signals: full, half full, empty and half empty. Generally, the write and read pointers use Gray code, so that they can be synchronized directly with two flip-flop synchronizers.
Figure 1-19 Dual-clock asynchronous FIFO (ports: wclk, winc, wdata, wfull, rclk, rinc, rdata, rempty)
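Gray code suits pointer crossing because only one bit changes per increment, so a synchronizer that samples the pointer in mid-transition sees either the old or the new value, never an unrelated one. A minimal Verilog sketch of a Gray-coded pointer is given below. The module name, the parameter N and the port names are assumptions, and this binary-counter-plus-conversion style differs from the case-based gray_inc function used in Appendix A.

```verilog
// Gray-coded FIFO pointer sketch. Names and N are hypothetical.
module gray_ptr #(
    parameter N = 3                  // pointer is N+1 bits wide
) (
    input  wire       clk,
    input  wire       rst_n,         // asynchronous reset, active low (assumed)
    input  wire       inc,           // advance the pointer by one
    output reg  [N:0] bin,           // binary pointer, for local addressing
    output reg  [N:0] gray           // Gray pointer, sent to the other domain
);
    wire [N:0] bin_nxt  = bin + inc;
    wire [N:0] gray_nxt = (bin_nxt >> 1) ^ bin_nxt;  // binary to Gray

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            bin  <= {(N+1){1'b0}};
            gray <= {(N+1){1'b0}};
        end else begin
            bin  <= bin_nxt;
            gray <= gray_nxt;
        end
    end
endmodule
```

The Gray value is registered before it leaves the module, so the CDC path starts at a flip-flop output and contains no combinational logic.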
The 2n-Queue FIFO is a typical example of an asynchronous FIFO. It is a small FIFO whose depth is 2n, where n can be set to any value the design needs. Theoretically, a 2n-Queue FIFO can stably transfer signals in logic whose source clock is n times as fast as the destination clock.
The detailed Verilog implementation of the 2n-Queue FIFO is given in Appendix A (2n-Queue FIFO implementation code).
1.1.5 LAYOUT GUIDE
None
1.1.6 LIMITATION
None
1.1.7 APPENDIX A
2n-Queue FIFO implementation code:
module USBD_ASFF2(RST_, ICLK, OCLK, DIN, FULL, OUTE, OUTD);
  input  RST_;   // Asynchronous reset, active low
  input  ICLK;   // Write domain clock
  input  OCLK;   // Read domain clock
  input  DIN;    // Write active indicator
  output FULL;
  output OUTE, OUTD;

  parameter PTR_1BIT = 1;

  reg  [PTR_1BIT:0] WR_PTR, RD_PTR;   // Read/write address, Gray code
  reg               FULL;
  wire [PTR_1BIT:0] Wr_Ptr_Nxt;
  wire [PTR_1BIT:0] Rd_Ptr_Nxt;
  wire [PTR_1BIT:0] Rd_Ptr_Wclk;      // Read pointer synchronized to ICLK
  wire [PTR_1BIT:0] Wr_Ptr_Rclk;      // Write pointer synchronized to OCLK
  reg               EMPTY;

  /************************
   * define gray count
   ************************/
  parameter gray0 = 2'b00,
            gray1 = 2'b01,
            gray2 = 2'b11,
            gray3 = 2'b10;

  /***********************************************
   * translation of signals ==> RD is driven from EMPTY
   * (a read happens whenever the FIFO is not empty)
   ***********************************************/
  wire RD = !EMPTY;

  /***********************************************
   * Write pointer
   * - when DIN is active, go to next gray count
   ***********************************************/
  assign Wr_Ptr_Nxt = gray_inc(WR_PTR);

  always @(posedge ICLK or negedge RST_) begin
    if (!RST_) begin
      WR_PTR <= 2'b0;
    end else if (DIN) begin
      WR_PTR <= Wr_Ptr_Nxt;
    end
  end

  /***********************************************
   * Read pointer
   * - when RD is active, go to next gray count
   ***********************************************/
  assign Rd_Ptr_Nxt = gray_inc(RD_PTR);

  always @(posedge OCLK or negedge RST_) begin
    if (!RST_) begin
      RD_PTR <= 2'b0;
    end else if (RD) begin
      RD_PTR <= Rd_Ptr_Nxt;
    end
  end

  /******************************************
   * Full condition
   * - sync read pointer to write clock domain
   ******************************************/
  USBD_CDCS DNT_FL1(.Q(Rd_Ptr_Wclk[1]), .D(RD_PTR[1]), .CK(ICLK), .R(RST_));
  USBD_CDCS DNT_FL0(.Q(Rd_Ptr_Wclk[0]), .D(RD_PTR[0]), .CK(ICLK), .R(RST_));

  // Full when the write pointer (current, or next if writing) equals the
  // bitwise inverse of the synchronized read pointer.
  wire NFull = DIN &
               ((Wr_Ptr_Nxt[0] == (~Rd_Ptr_Wclk[0])) & (Wr_Ptr_Nxt[1] == (~Rd_Ptr_Wclk[1])));
  wire FullX = NFull |
               ((WR_PTR[0] == (~Rd_Ptr_Wclk[0])) & (WR_PTR[1] == (~Rd_Ptr_Wclk[1])));

  always @(posedge ICLK or negedge RST_)
    if (!RST_) FULL <= 1'b0;
    else       FULL <= FullX;

  /***********************************************
   * EMPTY
   * - sync write pointer to read clock domain
   ***********************************************/
  USBD_CDCS DNT_EP1(.Q(Wr_Ptr_Rclk[1]), .D(WR_PTR[1]), .CK(OCLK), .R(RST_));
  USBD_CDCS DNT_EP0(.Q(Wr_Ptr_Rclk[0]), .D(WR_PTR[0]), .CK(OCLK), .R(RST_));

  wire NEMPTY = (Rd_Ptr_Nxt == Wr_Ptr_Rclk) & RD;
  wire EMPTYX = (RD_PTR == Wr_Ptr_Rclk);
  wire EMPTYD = NEMPTY | EMPTYX;

  always @(posedge OCLK or negedge RST_) begin
    if (!RST_) EMPTY <= 1'b1;
    else       EMPTY <= EMPTYD;
  end

  assign OUTD = !EMPTYD;
  assign OUTE = !EMPTY;

  // Next value in the 2-bit Gray sequence 00 -> 01 -> 11 -> 10 -> 00
  function [PTR_1BIT:0] gray_inc;
    input [PTR_1BIT:0] gray;
    begin
      case (gray) // synopsys parallel_case full_case
        gray0  : gray_inc = gray1;
        gray1  : gray_inc = gray2;
        gray2  : gray_inc = gray3;
        gray3  : gray_inc = gray0;
        default: gray_inc = 2'bxx;
      endcase
    end
  endfunction
endmodule
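module USBD_CDCS(R, CK, D, Q);
  input  R;    // Asynchronous reset, active low
  input  CK;
  input  D;
  output Q;
  reg    Q;
  reg    QX;

  // First stage: samples D on the falling edge of CK
  always @(negedge R or negedge CK) begin
    if (~R)
      QX <= 1'b0;
    else
      QX <= D;
  end

  // Second stage: samples QX on the rising edge of CK
  always @(negedge R or posedge CK) begin
    if (~R)
      Q <= 1'b0;
    else
      Q <= QX;
  end
endmodule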