Design and optimization of CMOS based 4-bit comparator

The paper focuses on optimizing circuit delay and energy consumption, specifically in gate-level circuits. Three methods are employed for circuit optimization. The first minimizes transistor count to reduce both delay and energy consumption. The second prioritizes logic gates based on the underlying hardware, favoring simpler circuit structures whenever possible, since the design is built primarily from logic gates. The third adjusts the number of stages to optimize delay. To validate these rules, three circuits implementing a 4-bit absolute-value comparator were designed, each corresponding to one of the rules. Through simulation, calculation, and comparison, the best circuit was identified, validating the rules. The second part of the paper optimizes delay and energy consumption by adjusting the sizing of the logic gates and the supply voltage to achieve the best overall performance. Further research is needed to corroborate these three rules and to identify additional ones, laying the foundation for intelligent circuit optimization.


Introduction
Moore's Law states that, at constant price, the number of transistors on a chip doubles roughly every 18 to 24 months, and performance doubles along with it. As chip integration increases, delay, power consumption, and area remain the three most critical factors that determine circuit performance [1]. A 4-bit absolute-value detector is a digital circuit that takes a 4-bit input in two's complement format, generates its absolute value, compares that value with a specified threshold, and outputs a Boolean value: 1 if the absolute value is greater than the threshold and 0 otherwise [2].
It has significant and wide-ranging applications in industry [3]. In signal processing and measurement, it can measure input signals quickly and accurately, especially in audio, video, and sensor signal processing. In alarm and fault detection, it can detect whether a signal exceeds a preset threshold range, ensuring normal and safe operation. In control, it can be used to implement the PID algorithm, providing stable and accurate comparison results by comparing the difference between the input signal and the preset target value [4]. Its applications extend well beyond these examples. The working logic of the 4-bit absolute-value detector is shown in figure 1. As shown, absolute-value detection has two main components: the absolute value circuit and the comparator. Compared with TTL, CMOS consumes much less power, but it is slower than TTL in most cases [5]. As a result, when designing a CMOS-based circuit, delay is a more important factor in overall performance than power consumption, and the main problem in the initial stage is how to design a CMOS-based circuit with shorter delay.

Boolean circuit or functional circuit
Firstly, there are two options for implementing the absolute value circuit. A functional circuit is one that can be divided into functional blocks that are interconnected and cooperate to realize the overall function. A Boolean circuit consists purely of logic gates, and the entire circuit cannot be split into smaller units that each realize a separate sub-function. When the 4-bit absolute value circuit is designed as a functional circuit, the basic logic is as follows: if the input is negative (A[3] = 1), invert A[2:0] and add 1 to obtain the absolute value; if the input is non-negative, the absolute value equals A[2:0]. The absolute value circuit can then be split into a selector, 3 inverters, and a 3-bit adder. When it is designed as a Boolean circuit, all that needs to be done is to list the truth table, simplify it with Karnaugh maps, and draw the circuit with AND-OR gates. The absolute value circuit can then be implemented as a 4-stage AND-OR gate circuit.
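To make the two options concrete, the sketch below models both in Python, assuming the 4-bit input is given as an integer from 0 to 15 that holds the two's-complement bit pattern. `abs_functional` follows the invert-and-add-1 rule described above, while `abs_boolean` writes the same mapping as direct Boolean equations derived here for illustration; it is not the AND-OR circuit of figure 2. Both function names are hypothetical.

```python
def abs_functional(a: int) -> int:
    """Functional model: selector + 3 inverters + 3-bit adder.

    a is a 4-bit two's-complement pattern (0..15); the result is 3 bits.
    For a = 0b1000 (-8) the 3-bit result wraps to 0, an edge case the
    invert-and-add-1 rule does not cover on its own.
    """
    sign = (a >> 3) & 1              # A[3]
    low = a & 0b111                  # A[2:0]
    if sign:                         # negative: invert A[2:0], then add 1
        return ((~low & 0b111) + 1) & 0b111
    return low                       # non-negative: pass A[2:0] through


def abs_boolean(a: int) -> int:
    """Gate-level model written directly as Boolean equations."""
    a0, a1, a2, a3 = (a >> 0) & 1, (a >> 1) & 1, (a >> 2) & 1, (a >> 3) & 1
    y0 = a0                          # negation never changes the LSB
    y1 = a1 ^ (a3 & a0)              # flip A1 if negative and A0 = 1
    y2 = a2 ^ (a3 & (a1 | a0))       # flip A2 if negative and any lower bit is 1
    return (y2 << 2) | (y1 << 1) | y0


# Print the truth table and check that both models agree on all 16 inputs.
for a in range(16):
    value = a - 16 if a & 0b1000 else a
    assert abs_functional(a) == abs_boolean(a)
    print(f"{a:04b} ({value:+d}) -> {abs_boolean(a):03b}")
```

The Boolean form hints at why the gate-level version can be shallower: Y0 needs no gate at all, and each higher bit needs only one AND/OR term followed by one XOR.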
Drawing the critical paths shows that the functional implementation is much longer than the Boolean one. This is because, in the adder-based implementation, one operand is fixed, so many gates are redundant and could be optimized away, whereas in the pure logic implementation no gates overlap and each branch leads to a unique result. Therefore, in this case, a pure logic circuit is recommended for the absolute value circuit, and its design is shown in figure 2. The comparator circuit has 6 input signals. If a pure logic circuit is used, the large number of inputs makes the circuit deeper and increasingly difficult to simplify. When a functional circuit is used instead, the 3-bit comparator is divided into three 1-bit comparators: if the higher bits are equal, the next lower bit is compared, until the result is determined. As seen in figure 3, each unit has a simple structure and a clear function. Z represents the result of the comparison, and carry indicates whether the next 1-bit comparator is enabled: Z = 1 if Y > T and Z = 0 otherwise; carry = 1 if Y = T and carry = 0 otherwise. In this case, implementing the comparator as a functional circuit is recommended. It follows that circuit design calls for case-by-case analysis, with the goal of simplifying the circuit and reducing the number of logic gates. If the number of input signals is small (fewer than 4) and the outputs are clear, a pure logic circuit should be preferred, as shown in figure 4. If there are many input signals and the circuit can be split into simple, repeated units, a functional circuit may be better.
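The cascaded comparator of figure 3 can be sketched as follows, assuming 3-bit unsigned operands Y (the absolute value) and T (the threshold), and assuming the final output is the OR of the per-unit Z signals, which the text does not specify. `compare_gt` is a hypothetical name.

```python
def compare_gt(y: int, t: int, width: int = 3) -> int:
    """Cascade of 1-bit comparators, scanned from MSB to LSB.

    Each unit receives an enable ('carry' from the unit above), outputs
    Z = 1 when it alone decides Y > T, and passes carry = 1 downward only
    when its two bits are equal.
    """
    carry = 1                                  # the MSB unit is always enabled
    result = 0
    for i in reversed(range(width)):
        yi = (y >> i) & 1
        ti = (t >> i) & 1
        z = carry & yi & (ti ^ 1)              # enabled, Y_i = 1 and T_i = 0
        result |= z                            # any deciding unit sets the output
        carry = carry & ((yi ^ ti) ^ 1)        # XNOR: continue only on equality
    return result


# Exhaustive check against Python's own comparison for 3-bit operands.
for y in range(8):
    for t in range(8):
        assert compare_gt(y, t) == int(y > t)
print("3-bit cascade comparator matches y > t for all 64 cases")
```

Every unit is the same small cell (one AND-type term for Z and one XNOR-and-AND for carry), which is the repeated-unit structure that makes the functional style attractive here.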

The model of delay
From the basic RC model, the propagation delay of a gate is proportional to its equivalent resistance multiplied by the total capacitance it must charge:
$$t_p \propto R_{gate}\,(C_{par} + C_{out}).$$
This paper then introduces three variables $d$, $f$, and $p$: $d$ is the normalized delay, $f$ is the effort delay, and $p$ is the parasitic delay [6], so that
$$d = f + p.$$
Next, $d$ is normalized to the delay of an FO-1 inverter (without self-load). With $R_{gate} = R_0$, $d$ equals the fanout plus the normalized parasitic delay, so $f$ is essentially equivalent to the fanout. Because it is normalized, $d$ is independent of process, voltage, and temperature. The formula for $t_p$ of each gate can then be written from its transistor structure in figure 5.
As a result, if the delay is normalized to a fictitious "technology time constant" $\tau$ and the effort delay is written as the product of the logical effort $g$ and the electrical effort (fanout) $h$, then
$$d = \frac{t_p}{\tau} = gh + p.$$
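As a worked example of $d = gh + p$, assuming textbook logical-effort values (an inverter with $g = 1$ and $p = 1$, and a 2-input NAND with $g = 4/3$ and $p = 2$, both for a 2:1 PMOS/NMOS mobility ratio) rather than values taken from this paper, consider each gate driving four copies of itself ($h = 4$):

$$
\begin{aligned}
\text{FO4 inverter:}\quad & g = 1,\; h = 4,\; p = 1 \;\Rightarrow\; d = 1 \cdot 4 + 1 = 5,\quad t_p = 5\tau \\
\text{FO4 NAND2:}\quad & g = \tfrac{4}{3},\; h = 4,\; p = 2 \;\Rightarrow\; d = \tfrac{16}{3} + 2 = \tfrac{22}{3} \approx 7.33,\quad t_p \approx 7.33\tau
\end{aligned}
$$

The same load therefore costs the NAND gate about 47% more delay than the inverter, which is the kind of difference the gate priorities below are built on.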

Comparison of logic gate delays
From the circuit topology, an AND gate is equivalent to a NAND gate followed by an inverter, and an OR gate is equivalent to a NOR gate followed by an inverter, so the NAND gate has higher priority than the AND gate and, similarly, the NOR gate has higher priority than the OR gate. An XOR gate is equivalent to two 2-input AND gates and a 2-input OR gate, and calculation shows that the delay of the XOR gate is less than the total delay of the AND-OR implementation. Moreover, XOR can be implemented in TPL and other forms to reduce the delay further, so the XOR gate has the highest priority [7,8]. As the number of inputs increases, as shown in figure 6, a 3-input NOR gate can be split into a 2-input NAND gate and a 2-input NOR gate [9]. Calculation verifies that the delay of the 3-input gate is greater than the total delay of the 2-input NOR gate and the 2-input NAND gate. Likewise, it can be shown that (n-1)-input gates have higher priority than n-input gates. The resulting priority relationship is shown in table 1.
Table 1. The priority of logic gates.
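The gate comparisons above can be explored numerically. The Python sketch below assumes the standard textbook logical-effort values for static CMOS ($g_{NAND} = (n+2)/3$, $g_{NOR} = (2n+1)/3$, $p = n$, for a 2:1 PMOS/NMOS mobility ratio and an inverter parasitic delay of 1) rather than the paper's own parameters, and evaluates $d = gh + p$ for a hypothetical fanout $h = 1$. The function names are illustrative.

```python
from fractions import Fraction


def nand_effort(n: int) -> tuple[Fraction, Fraction]:
    """Textbook (g, p) of an n-input static CMOS NAND (assumed values)."""
    return Fraction(n + 2, 3), Fraction(n)


def nor_effort(n: int) -> tuple[Fraction, Fraction]:
    """Textbook (g, p) of an n-input static CMOS NOR (assumed values)."""
    return Fraction(2 * n + 1, 3), Fraction(n)


def stage_delay(g: Fraction, p: Fraction, h: Fraction) -> Fraction:
    """Normalized stage delay d = g*h + p."""
    return g * h + p


h = Fraction(1)   # hypothetical electrical effort (fanout)
for n in range(1, 6):
    d_nand = stage_delay(*nand_effort(n), h)
    d_nor = stage_delay(*nor_effort(n), h)
    print(f"n={n}: d_NAND={float(d_nand):.2f}  d_NOR={float(d_nor):.2f}")
```

Both delays grow with the number of inputs, and the NOR delay grows faster because, under these assumed values, its logical effort rises by 2/3 per input instead of 1/3 (its series PMOS stack).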

2.3. The number of stages of critical path
2.3.1. Calculate the optimum number of stages of critical path
The inverter chain is shown in figure 8 [10]. Its total delay depends on N-1 unknowns, the input capacitances of the intermediate stages. Taking the partial derivative with respect to each of these N-1 unknowns and setting it to zero gives the minimum delay. The conclusion is that the delay is minimal when every stage has the same fanout, and from this condition the optimal number of stages can be obtained.
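The optimum can also be computed numerically. The sketch below assumes the classic inverter-chain model in which the optimal per-stage fanout satisfies $f = e^{1 + \gamma/f}$, where $\gamma$ is the ratio of an inverter's self-load to its gate capacitance ($\gamma = 0$ corresponds to the no-self-load normalization used in this paper), and then takes $N_{opt} = \ln F / \ln f$. The overall fanout `F = 32` is a hypothetical value: with $\gamma = 0$ it happens to give about 3.47, but the text does not state which $F$ produced its result, so the match is only illustrative.

```python
import math


def optimal_stage_fanout(gamma: float) -> float:
    """Solve f = exp(1 + gamma / f) for the optimal per-stage fanout.

    Uses the equivalent form f * (ln f - 1) = gamma, whose left side is
    increasing for f > e, so plain bisection on [e, hi] converges.
    gamma = 0 (no self-load) gives f = e.
    """
    def residual(f: float) -> float:
        return f * (math.log(f) - 1.0) - gamma

    lo, hi = math.e, 2.0 * math.e
    while residual(hi) < 0:          # grow the bracket until it holds the root
        hi *= 2.0
    for _ in range(200):             # bisection
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def optimal_stage_count(F: float, gamma: float = 0.0) -> float:
    """N_opt = ln F / ln f_opt for an overall fanout F = C_L / C_g1."""
    return math.log(F) / math.log(optimal_stage_fanout(gamma))


F = 32.0   # hypothetical overall fanout, not a value from the paper
for gamma in (0.0, 1.0):
    print(f"gamma={gamma}: f_opt={optimal_stage_fanout(gamma):.3f}, "
          f"N_opt={optimal_stage_count(F, gamma):.2f}")
```

Since $N$ must be an integer, a non-integer optimum such as 3.47 is rounded to the neighbouring integers (3 or 4), and the delay curve is flat enough near the optimum that either is close to minimal.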

Optimize circuit delay by decreasing or increasing the number of stages
From the present circuit, the critical path can be identified and its number of stages counted.
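Finding the critical path and counting its stages amounts to a longest-path search over the gate graph. The sketch below assumes a hypothetical netlist (not the circuit of figure 9) in which each gate lists the signals driving it, treats every gate as one stage, and relies on the circuit being combinational, i.e. acyclic.

```python
# Hypothetical gate-level netlist: gate name -> list of driving signals.
# Any name that is not a key is treated as a primary input (zero stages).
NETLIST: dict[str, list[str]] = {
    "n1": ["A0", "A3"],
    "n2": ["A1", "A0"],
    "n3": ["n2", "A3"],
    "y1": ["A1", "n1"],
    "y2": ["A2", "n3"],
    "out": ["y1", "y2"],
}


def critical_path(netlist: dict[str, list[str]], output: str) -> list[str]:
    """Return the longest chain of gates (by stage count) ending at output."""
    memo: dict[str, list[str]] = {}

    def longest(sig: str) -> list[str]:
        if sig not in netlist:                   # primary input
            return []
        if sig not in memo:                      # memoized depth-first search
            best = max((longest(src) for src in netlist[sig]), key=len)
            memo[sig] = best + [sig]
        return memo[sig]

    return longest(output)


path = critical_path(NETLIST, "out")
print(" -> ".join(path), f"({len(path)} stages)")
```

Counting gates is the simplest delay proxy; weighting each gate by its $d = gh + p$ instead of 1 would turn the same search into a delay-based critical path.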
Typically, for a circuit that implements a relatively complex function, the critical path has many more stages than the optimum, so this paper mainly discusses reducing the number of stages. If the number of stages needs to be increased, the opposite approach applies.
This paper proposes three solutions. The first is to optimize the logic: the cases of the truth table can be classified in multiple ways to avoid cases where the logic becomes too complicated. If common submodules are needed, encapsulated circuits, which have often already been optimized, can be used. The second is to use logic gates with more inputs. A gate with more inputs can replace two or more gates, thus reducing the number of stages. However, this does not always help: the logic gates on the critical path may have branches, so using a more complex gate with more inputs may also cover cases in a branch, complicating the critical path and increasing the delay.
The third is to replace series connections with parallel ones. This is a very effective way to reduce delay: when there are many modules, consider how to connect them in parallel to shorten the total critical path. The final optimized circuit of this paper is shown in figure 9.
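The effect of replacing a series connection with a parallel one can be seen in a simple case: combining n signals with an associative 2-input function such as AND or OR. The sketch below, with hypothetical function names, compares a series chain with a balanced tree; both use n - 1 gates, but the tree's critical path grows only logarithmically.

```python
def chain_depth(n_inputs: int) -> int:
    """Stages on the critical path of a series chain of 2-input gates."""
    return max(n_inputs - 1, 0)


def tree_depth(n_inputs: int) -> int:
    """Stages on the critical path of a balanced tree of 2-input gates.

    Equals ceil(log2(n_inputs)) for n_inputs >= 1, computed exactly with
    integer bit operations instead of floating-point logarithms.
    """
    return (n_inputs - 1).bit_length() if n_inputs > 1 else 0


for n in (2, 4, 6, 8, 16):
    print(f"{n:2d} inputs: chain={chain_depth(n)} stages, tree={tree_depth(n)} stages")
```

Restructuring only helps when the operation is associative, which is why it applies to merging many comparator outputs but not directly to an arbitrary chain of different gates.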

Optimization of delay and power consumption based on varying sizing and $V_{DD}$
3.1. The optimization of delay
3.1.1. The calculation of the optimal sizing of logic gates in critical path
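The model of a circuit with branches is shown in figure 10. Building on the theory in 2.2, this paper adds branching effort to account for the effect of branches on delay. For each logic gate with a branch, the branching effort is
$$b = \frac{C_{on\text{-}path} + C_{off\text{-}path}}{C_{on\text{-}path}}.$$
For example, in figure 10, inverter 1 drives 2 branches. The path branching effort is $B = \prod_i b_i$, and the path effort is
$$F = GBH, \qquad G = \prod_i g_i, \qquad H = \frac{C_{out}}{C_{in}}.$$
Based on the result in 2.3.1, the total delay is minimized when every stage bears the same effort $\hat{f} = F^{1/N}$, so
$$D_{min} = N F^{1/N} + P, \qquad P = \sum_i p_i.$$
Finally, the optimal gate sizes are found by working backward from the load: $C_{in,i} = g_i\, C_{out,i} / \hat{f}$.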
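The sizing procedure above can be sketched in Python. The function below assumes lists of per-stage logical effort `g`, branching effort `b`, and parasitic delay `p` (hypothetical inputs, not the values of tables 2 and 3), computes $F = GBH$, $\hat{f} = F^{1/N}$, and $D = N\hat{f} + P$, and then sizes the gates backward from the load with $C_{out,i} = b_i C_{in,i+1}$, so that $C_{in,i} = g_i b_i C_{in,i+1} / \hat{f}$.

```python
import math


def size_path(g, b, p, c_in, c_load):
    """Logical-effort sizing of an N-stage path with branching.

    g, b, p: per-stage logical effort, branching effort, parasitic delay.
    c_in: input capacitance of the first gate; c_load: final load.
    Returns (f_hat, d_min, input capacitance of every stage).
    """
    n = len(g)
    G = math.prod(g)                  # path logical effort
    B = math.prod(b)                  # path branching effort
    H = c_load / c_in                 # path electrical effort
    F = G * B * H                     # path effort
    f_hat = F ** (1.0 / n)            # equal effort per stage
    d_min = n * f_hat + sum(p)        # D = N * F^(1/N) + P

    # Backward pass: C_out,i = b_i * C_in,(i+1), and C_in,i = g_i * C_out,i / f_hat.
    sizes = [0.0] * n
    c_next = c_load
    for i in reversed(range(n)):
        sizes[i] = g[i] * b[i] * c_next / f_hat
        c_next = sizes[i]
    return f_hat, d_min, sizes


# Hypothetical 4-stage path: inverter (2 equal branches), NAND2, NOR2, inverter.
f_hat, d_min, sizes = size_path(
    g=[1.0, 4 / 3, 5 / 3, 1.0],
    b=[2.0, 1.0, 1.0, 1.0],
    p=[1.0, 2.0, 2.0, 1.0],
    c_in=1.0,
    c_load=100.0,
)
print(f"f_hat={f_hat:.3f}, D_min={d_min:.2f}")
print("stage input capacitances:", [round(c, 2) for c in sizes])
```

A useful self-check is that the backward pass returns exactly `c_in` for the first stage, because the product of all stage ratios equals $F$.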

The calculation of minimum delay
The critical path is shown in figure 11.
Using the parameters of each stage in table 2, the minimum delay is $D_{min} = 40.9$. When the delay is minimized, the sizing of each gate follows from the optimal-sizing relation in 3.1.1 and is shown in table 3.

The model of power consumption
Each time the output switches, the load capacitance $C_L$ is charged or discharged. To predict the power consumption of the circuit, it is therefore necessary to know how often $C_L$ is charged or discharged. From basic physics, each $0 \to 1$ transition draws an energy of $C_L V_{DD}^2$ from the supply, so the total energy consumed over $N$ cycles is
$$E_N = C_L V_{DD}^2 \, n_{0\to1},$$
where $n_{0\to1}$ is the number of $0 \to 1$ transitions in $N$ cycles. To calculate the average power, this paper introduces the transition probability $\alpha$.
$$\alpha_{0\to1} = \lim_{N\to\infty} \frac{n_{0\to1}}{N} = p(\text{Out} = 1)\cdot p(\text{Out} = 0) \qquad (34)$$
Therefore, for every logic gate, the average switching energy per cycle is $E = \alpha_{0\to1} C_L V_{DD}^2$.

In general, optimizing circuit energy requires changing the sizing and $V_{DD}$ at the same time. However, delay and energy constrain each other. So, when optimizing energy, this paper fixes the delay at $1.5\,D_{min}$ and varies $V_{DD}$ and the sizing through nonlinear programming to find the minimum energy. The results are shown in table 5, and the curve of minimum energy versus $V_{DD}$ under this delay constraint is shown in figure 12. Under the constraint of $1.5\,D_{min}$, the minimum energy consumption is 11.93 W. The other nine groups of data are processed in the same way to find the minimum energy consumption under different multiples of $D_{min}$, as shown in table 6, and the relationship between delay and energy consumption is drawn in figure 13. As the delay constraint is relaxed, the minimum energy consumption decreases significantly. Changing the sizing and $V_{DD}$ of the logic gates is therefore an effective way to optimize a circuit according to the requirements at hand.
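Equation (34) can be evaluated directly for individual gates. The sketch below assumes inputs that are independent, each equal to 1 with probability 0.5, and uncorrelated between successive cycles (assumptions the text does not state). It computes $\alpha_{0\to1} = p(\text{Out}=1)\,p(\text{Out}=0)$ by enumerating the truth table and scales by a hypothetical normalized $C_L V_{DD}^2 = 1$ to get the average energy per cycle.

```python
from itertools import product


def output_one_probability(gate, n_inputs, p_in=0.5):
    """P(out = 1) for a gate whose independent inputs are 1 with probability p_in."""
    prob = 0.0
    for bits in product((0, 1), repeat=n_inputs):
        weight = 1.0
        for bit in bits:
            weight *= p_in if bit else (1.0 - p_in)
        if gate(*bits):
            prob += weight
    return prob


def alpha_0to1(gate, n_inputs, p_in=0.5):
    """alpha_{0->1} = P(out = 0) * P(out = 1), assuming temporal independence."""
    p1 = output_one_probability(gate, n_inputs, p_in)
    return (1.0 - p1) * p1


GATES = {
    "INV":   (lambda a: 1 - a, 1),
    "NAND2": (lambda a, b: 1 - (a & b), 2),
    "NOR2":  (lambda a, b: 1 - (a | b), 2),
    "XOR2":  (lambda a, b: a ^ b, 2),
}

C_LOAD, VDD = 1.0, 1.0   # hypothetical normalized load and supply voltage
for name, (fn, n) in GATES.items():
    alpha = alpha_0to1(fn, n)
    print(f"{name}: alpha={alpha:.4f}, energy/cycle={alpha * C_LOAD * VDD ** 2:.4f}")
```

Under these assumptions NAND2 and NOR2 switch less often (3/16) than an inverter or XOR2 (1/4), so gate choice affects energy through $\alpha$ as well as through capacitance.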

Conclusion
As a commonly used basic module, the 4-bit absolute-value comparator plays an important role in industry. In the first part, I provide three ways to optimize the circuit: choosing between pure Boolean circuits and circuits built from functional blocks, setting priorities for logic gates, and using a more appropriate number of stages on the critical path. As a sophomore undergraduate, I chose this topic to study circuit design starting from the relatively simple topology of a 4-bit absolute-value comparator, and used it as an example to build a basic understanding of circuit design through exploration. Moreover, my aim is to derive more general methods and conclusions for circuit design on my own and to offer more ideas for solving design and optimization problems. I therefore think this research direction is meaningful not only for me but also for the field. In the second part, I discuss how to optimize delay and energy consumption by changing the sizing and $V_{DD}$. For delay optimization, the delay is minimal when every logic gate on the critical path bears the same stage effort. For energy optimization, nonlinear programming is used to find the minimum energy at given multiples of $D_{min}$, which also verifies the trade-off between delay and minimum energy. Although there is still much to improve and deepen in this project, it has been a valuable and challenging experience for me and has given me a clearer plan for my future studies.

Figure 2. The circuit of absolute value (Photo/Picture credit: Original).

2.2. The priority of logic gates
Commonly used logic gates are the AND, OR, NOT, and XOR gates. If complementary static CMOS is used, the transistor structure of each logic gate is as shown in figure 5.

Figure 5. The circuits of logic gates (Photo/Picture credit: Original).

Figure 6. The delay of the NOR gate versus the number of inputs (Photo/Picture credit: Original).
Based on this priority, the circuit can be optimized as shown in figure 7.

Figure 7. The optimized circuit with higher-priority gates (Photo/Picture credit: Original).

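Figure 8. Inverter chain [10].
From the figure above, the total delay of the chain is the sum of the stage delays:
$$t_p = t_{p,1} + t_{p,2} + \cdots + t_{p,N} \qquad (12)$$
The delay is minimized when
$$C_{g,j} = \sqrt{C_{g,j-1}\, C_{g,j+1}} \qquad (16)$$
that is, when every stage has the same fanout and the same delay. When each stage is sized up by the same factor $f$ and therefore has the same fanout $f = \sqrt[N]{F}$, where $F = C_L / C_{g,1}$ is the overall fanout of the chain, the optimal number of stages is 3.47. This means that a critical path with 3 or 4 stages may have the smallest delay.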

Figure 9. The optimized circuit with fewer stages (Photo/Picture credit: Original).

Figure 10.The model of a circuit with branches (Photo/Picture credit: Original).

Figure 12. The relationship between energy consumption and $V_{DD}$ under

Figure 13. The relationship between delay and energy consumption (Photo/Picture credit: Original).
From these transistor structures, the logical effort $g$ and parasitic delay $p$ of different logic gates can be calculated. Taking the NOR gate as an example, with $n$ the number of inputs, $g$ and $p$ can be expressed as functions of $n$; substituting them into $d = gh + p$ gives the delay curve in figure 6.

Table 2. The calculation of parameters of each stage.

Table 3. The sizing of each stage.

Table 4. The calculation of parameters needed in the energy consumption model.

Table 5. The results of sizing of each stage.

Table 6. The results of delay and energy consumption under different multiples of $D_{min}$.