Synthesis of Chebyshev-I filter using folding and retiming

In synthesizing digital signal processing (DSP) architectures, low silicon area and high performance are important goals that can be achieved by various optimization techniques. To this end, we employ two design optimization techniques, folding and retiming, which are applied to a third-order Chebyshev-I high-pass digital filter to minimize the number of functional units (adders, multipliers) and to reduce the number of registers. The folding transformation is used to determine the control circuits in the DSP architecture by executing multiple algorithm operations on a single functional unit. Retiming with register minimization is applied after folding, thereby reducing the number of multipliers from 7 to 1 and the number of adders from 6 to 1, without affecting the input and output characteristics of the filter.


INTRODUCTION
The tremendous growth of digital signal processing (DSP) and its importance has promoted advances in application fields such as telecommunication, military, instrumentation and control, image processing, seismology, speech processing and biomedical signal processing. DSP programs are executed repetitively and are assumed to be non-terminating (Jackson et al., 2003; Salivahanan et al., 2010). This property can be exploited to design more efficient DSP systems in terms of speed, area and power. The strategy for designing an efficient filter also needs to concentrate on reducing the number of functional units.
Advances in technology and emerging trends require DSP architectures with less area and lower power consumption, so signal processing algorithms are transformed to fit the circuit. To achieve goals such as small area, high speed and low power, different transformation techniques have been proposed, such as pipelining, folding and retiming. The transformation in which multiple algorithm operations are time-multiplexed onto a single functional unit is known as folding. It provides a technique for designing control circuits for hardware and helps to synthesize DSP architectures that can be operated using single or multiple clocks. Although folding reduces the number of functional units, it may also lead to the use of a large number of registers (Keshab, 2012; Rajalakshmi et al., 2013). To avoid this, a retiming technique is used to compute the minimum number of registers required to implement a folded DSP architecture and to allocate data to these registers, providing an architecture with low silicon area (Rajapadhy and Kiaei, 1991). This design optimization platform is designed using MATLAB/Simulink and Xilinx.

*Corresponding author. E-mail: lallei.chanu98@gmail.com. Author(s) agree that this article remain permanently open access under the terms of the Creative Commons Attribution License 4.0 International License.

Folding
This transformation technique helps to determine the control circuits in a DSP system in which multiple algorithm operations are time-multiplexed onto a single functional unit. This reduces the number of functional units (such as adders and multipliers) and hence the silicon area. Figure 1(a) shows an example of a DSP program with two addition operations, which computes $y(n) = a(n) + b(n) + c(n)$. Here, one output sample is produced every 2 clock cycles, and hence each input is valid for 2 clock cycles ($2l+0$ and $2l+1$, where $l$ is the iteration). In cycle $2l+0$, $a(n) + b(n)$ is computed. In cycle $2l+1$, the partial sum $a(n) + b(n)$ is switched back into the adder along with $c(n)$, and the final sum is available in cycle $2l+2$. The folded architecture, in which the 2 addition operations are folded onto a single pipelined adder, is shown in Figure 1(b).
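The time multiplexing described above can be sketched in a few lines of Python. This is a minimal behavioural model, not the authors' implementation: it assumes the program computes y(n) = a(n) + b(n) + c(n), and the names `PipelinedAdder`, `clock` and `folded_sum` are hypothetical.

```python
class PipelinedAdder:
    """Hypothetical model of one adder with a single output register."""

    def __init__(self):
        self.register = 0  # result latched at the end of each cycle
        self.uses = 0      # number of clock cycles in which the adder worked

    def clock(self, x, y):
        self.uses += 1
        self.register = x + y


def folded_sum(a, b, c):
    """Compute y(n) = a(n) + b(n) + c(n) with ONE adder, two cycles per sample."""
    adder = PipelinedAdder()
    outputs = []
    for a_n, b_n, c_n in zip(a, b, c):
        # Cycle 2l+0: the adder computes the partial sum a(n) + b(n).
        adder.clock(a_n, b_n)
        # Cycle 2l+1: the partial sum is switched back in together with c(n).
        adder.clock(adder.register, c_n)
        outputs.append(adder.register)
    return outputs, adder.uses


if __name__ == "__main__":
    y, cycles = folded_sum([1, 2, 3], [10, 20, 30], [100, 200, 300])
    print(y, cycles)
```

The model trades hardware for time: the unfolded program would need two adders and one cycle per sample, whereas the folded one needs a single adder and two cycles per sample.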

Retiming
Retiming changes the locations of delay elements without changing the input/output characteristics of the system, as illustrated in Figure 2(a) and (b). Although the filters in Figure 2(a) and 2(b) have their delays at different locations and are therefore described by different sets of equations, they have the same input-output characteristics and can be derived from one another by retiming.
Retiming can be used to increase the clock rate, to decrease the number of registers and to reduce the power consumption of a circuit (Keshab et al., 1992).
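A small simulation can make the equivalence concrete. The two structures below are an assumption (a textbook-style pair of IIR filters, not recovered from Figure 2), and the function names are hypothetical. Structure A keeps one delay after the adder that combines the feedback terms; structure B moves that delay onto the two multiplier outputs. Both should reduce to y(n) = a·y(n−2) + b·y(n−3) + x(n).

```python
def filter_a(x, a, b):
    """w(n) = a*y(n-1) + b*y(n-2);  y(n) = w(n-1) + x(n)."""
    y, w_prev = [], 0
    for n, x_n in enumerate(x):
        y.append(w_prev + x_n)
        y1 = y[n - 1] if n >= 1 else 0
        y2 = y[n - 2] if n >= 2 else 0
        w_prev = a * y1 + b * y2      # becomes w(n-1) in the next iteration
    return y


def filter_b(x, a, b):
    """w1(n) = a*y(n-1); w2(n) = b*y(n-2); y(n) = w1(n-1) + w2(n-1) + x(n)."""
    y, w1_prev, w2_prev = [], 0, 0
    for n, x_n in enumerate(x):
        y.append(w1_prev + w2_prev + x_n)
        w1_prev = a * (y[n - 1] if n >= 1 else 0)
        w2_prev = b * (y[n - 2] if n >= 2 else 0)
    return y


def reference(x, a, b):
    """Direct evaluation of y(n) = a*y(n-2) + b*y(n-3) + x(n)."""
    y = []
    for n, x_n in enumerate(x):
        y2 = y[n - 2] if n >= 2 else 0
        y3 = y[n - 3] if n >= 3 else 0
        y.append(a * y2 + b * y3 + x_n)
    return y


if __name__ == "__main__":
    impulse = [1, 0, 0, 0, 0, 0, 0, 0]
    print(filter_a(impulse, 2, 3) == filter_b(impulse, 2, 3))
```

Integer coefficients are used so that the comparison is exact; zero initial conditions are assumed for every delay element.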

CHEBYSHEV FILTER
Type I Chebyshev filters are all-pole filters that exhibit equiripple behavior in the passband and a monotonic characteristic in the stopband. As the order N increases, the Chebyshev response approaches the ideal response. These filters minimize the error between the idealized and the actual filter characteristic over the range of the filter. This type of filter is named after Pafnuty Chebyshev (John and Dimitris, 1996). The magnitude response of a Chebyshev type I filter can be expressed as:

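$$|H(j\Omega)| = \frac{A}{\sqrt{1 + \varepsilon^2 C_N^2(\Omega/\Omega_c)}}$$

where $A$ is the filter gain, $\varepsilon$ is a constant, $\Omega_c$ is the 3 dB cutoff frequency and $C_N(\cdot)$ is the Chebyshev polynomial of order $N$.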
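The magnitude response can be evaluated numerically with a short sketch. It assumes the standard low-pass prototype form |H(jΩ)| = A / sqrt(1 + ε² C_N²(Ω/Ω_c)), where C_N is the Chebyshev polynomial of order N; the function names and the sample values of ε and Ω_c are illustrative assumptions, not values taken from the paper.

```python
import math


def chebyshev_poly(order, x):
    """Chebyshev polynomial of the first kind, C_N(x), via its closed forms."""
    if abs(x) <= 1.0:
        return math.cos(order * math.acos(x))
    # Outside [-1, 1] the polynomial grows like cosh; odd orders are odd functions.
    sign = -1.0 if (x < 0 and order % 2 == 1) else 1.0
    return sign * math.cosh(order * math.acosh(abs(x)))


def cheby1_magnitude(omega, omega_c, order, eps, gain=1.0):
    """|H(j*omega)| of the assumed Chebyshev type I prototype."""
    c = chebyshev_poly(order, omega / omega_c)
    return gain / math.sqrt(1.0 + (eps * c) ** 2)


if __name__ == "__main__":
    for omega in (0.0, 0.5, 1.0, 2.0):
        print(omega, cheby1_magnitude(omega, omega_c=1.0, order=3, eps=0.5))
```

At Ω = Ω_c the polynomial equals 1 for every order, so the response there is A / sqrt(1 + ε²) regardless of N; beyond Ω_c it falls off monotonically.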

DATA FLOW GRAPH (DFG)
The operations of a DSP algorithm are assumed to be executed repetitively. The DSP filter blocks to be optimized can be represented by a DFG because the representation is simple, efficient and compact. A DFG is a directed graph G with a set of nodes (vertices) V and a set of edges E (Edward, 1991; Rakshi et al., 2010). Each node in the DFG represents an algorithm operation, and an arc U → V with w(e) delays states that the output of the $l$-th iteration of U is used to execute the $(l+w(e))$-th iteration of V. Arcs with and without delays represent inter-iteration and intra-iteration precedence constraints, respectively (John and Dimitris, 1996).
The DFG is used to describe the hardware architecture, which depends on the folding factor (N), the number of operations folded onto a single functional unit. Hu and Hv denote the operators that execute the operations U and V in the hardware DFG. The operations processed by an operator form a folding set S. Each folding set contains N entries, some of which may be null operations, denoted as ∅. A delay or register element in the hardware represents a storage unit.

FOLDING EQUATIONS
Folding is a transformation technique used to reduce the silicon area by time-multiplexing many algorithm operations onto single functional units, such as adders and multipliers. It provides a systematic process for designing the control circuits of the hardware. Folding is applied to the filter to reduce the chip area (Keshab, 2012).
Consider an edge e connecting the nodes U and V with w(e) delays. Let the $l$-th iterations of the nodes U and V be scheduled at the time units $Nl+u$ and $Nl+v$, respectively, where u and v are the folding orders of nodes U and V and satisfy $0 \le u, v \le N-1$. Hu and Hv denote the functional units that execute the nodes U and V. If Hu is pipelined by $P_u$ stages, the result of the $l$-th iteration of node U is available at time unit $Nl+u+P_u$. This result is used by the $(l+w(e))$-th iteration of node V, which is executed at $N(l+w(e))+v$. Thus, the result must be stored for

$$D_F(U \rightarrow V) = [N(l + w(e)) + v] - [Nl + P_u + u] = N w(e) - P_u + v - u$$

time units, which is independent of $l$. A folding set is an ordered set of N operations executed by the same functional unit, ordered according to the folding order. The folding order of a node is the time partition in which the node is scheduled to execute in the hardware. The folding sets of the third-order Chebyshev filter shown in Figure 3 are given by S1 = {1, 2, 3, 4, 5, 6, ∅} and S2 = {7, 8, 9, 10, 11, 12, 13}. Using the above folding sets, the filter is folded with folding factor 6, which means that the iteration period of the folded architecture is 6 units of time (u.t.). Each node of the filter is executed once every 6 u.t. in the folded architecture; that is, the folded hardware executes six operations. The folding set S1 contains one null operation in position 6, during which no operation is performed by the adder. The folding equations for each edge are given in Table 1.

RETIMING
Retiming is also a transformation technique, used to change the locations of the delay elements without affecting the input and output characteristics of the circuit. Retiming has to be performed before folding to enforce causality of the system (Leiserson et al., 1986; Monteiro et al., 1993). The negative values of the above folding equations are made non-negative by using cutset retiming, a special case of retiming that only affects the weights of the edges in the cutset. It consists of adding k delays to each edge from the disconnected subgraph G1 to G2 and removing k delays from each edge from G2 to G1. Using retiming, the weight of the retimed edge e from U to V is computed as:

$$w_r(e) = w(e) + r(V) - r(U) \quad (2)$$

The retimed folding constraints are obtained using the relation

$$r(U) - r(V) \le \left\lfloor \frac{D_F(U \rightarrow V)}{N} \right\rfloor$$

where $\lfloor x \rfloor$ is the floor of x, which is the largest integer less than or equal to x. The retimed folding constraints are: -r(2)≤0, r(9)-r(4)≤0, r(4)-r(2)≤0, r(11)-r(6)≤-1, r(6)-r(4)≤0, r(13)-r(6)≤0. From these folding constraints, we can form the constraint graph. The inequalities can be solved using the Floyd-Warshall algorithm, and the final retiming values after applying the algorithm are: r(1)=0, r(2)=0, r(3)=0, r(4)=0, r(5)=0, r(6)=0, r(7)=0, r(8)=0, r(9)=0, r(10)=-1, r(11)=-1, r(12)=-1, r(13)=0.
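One common way to solve such a system of inequalities is sketched below. It is an illustration rather than the authors' code: each constraint r(U) − r(V) ≤ k becomes an edge V → U of weight k in a constraint graph, an extra source node is connected to every node with weight 0, and the shortest path from the source to each node gives a feasible r. The function name `solve_retiming` is hypothetical, and only the five constraints that are fully legible in the text are used, so the result is a feasible solution for that subset and need not coincide with the full solution reported above.

```python
import math


def solve_retiming(num_nodes, constraints):
    """Solve r(u) - r(v) <= k for all (u, v, k) with Floyd-Warshall.

    Returns {node: r(node)} for nodes 1..num_nodes, or None if infeasible.
    """
    size = num_nodes + 1                      # index 0 is the added source node
    dist = [[math.inf] * size for _ in range(size)]
    for i in range(size):
        dist[i][i] = 0
    for node in range(1, size):
        dist[0][node] = 0                     # source reaches every node for free
    for u, v, k in constraints:
        dist[v][u] = min(dist[v][u], k)       # r(u) - r(v) <= k  ->  edge v -> u
    for mid in range(size):
        for i in range(size):
            for j in range(size):
                through = dist[i][mid] + dist[mid][j]
                if through < dist[i][j]:
                    dist[i][j] = through
    if any(dist[i][i] < 0 for i in range(size)):
        return None                           # negative cycle: no solution exists
    return {node: dist[0][node] for node in range(1, size)}


# The legible constraints from the text, written as (u, v, k) for r(u) - r(v) <= k.
CONSTRAINTS = [(9, 4, 0), (4, 2, 0), (11, 6, -1), (6, 4, 0), (13, 6, 0)]

if __name__ == "__main__":
    print(solve_retiming(13, CONSTRAINTS))
```

A negative cycle in the constraint graph means the inequalities contradict each other, in which case the chosen folding sets cannot be realized by retiming alone.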
The new retimed edge weights are found using Equation 2. Applying the folding equations to the retimed graph gives the new delays, and cutset retiming is then applied so that all values are non-negative, from which the architecture can be derived.
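The arithmetic of this step can be sketched as follows. The sketch assumes the standard folding equation D_F(U→V) = N·w(e) − P_u + v − u together with the retimed weight w_r(e) = w(e) + r(V) − r(U); the `Edge` class, the function names and all numeric values are hypothetical and are not taken from Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Hypothetical DFG edge U -> V with w(e) delays and the schedule of its ends."""
    delays: int        # w(e)
    u: int             # folding order of the source node U
    v: int             # folding order of the destination node V
    pipeline_u: int    # P_u, pipeline stages of the unit executing U


def folded_delay(edge, folding_factor):
    """D_F(U -> V) = N*w(e) - P_u + v - u."""
    return folding_factor * edge.delays - edge.pipeline_u + edge.v - edge.u


def retimed(edge, r_u, r_v):
    """Return the edge with w_r(e) = w(e) + r(V) - r(U)."""
    return Edge(edge.delays + r_v - r_u, edge.u, edge.v, edge.pipeline_u)


if __name__ == "__main__":
    N = 4                                          # assumed folding factor
    e = Edge(delays=0, u=3, v=0, pipeline_u=2)
    d = folded_delay(e, N)
    print("before retiming:", d)                   # negative: not realizable
    bound = d // N                                 # floor(D_F / N) bounds r(U) - r(V)
    e_r = retimed(e, r_u=0, r_v=-bound)            # choose r(U) - r(V) = bound
    print("after retiming:", folded_delay(e_r, N))
```

Python's `//` operator rounds toward negative infinity, so it matches the floor in the retimed folding constraint even when D_F is negative.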

REGISTER MINIMIZATION TECHNIQUE
The main objective here is to minimize the architectural area by minimizing the number of registers. The folded structure contains a larger number of registers because the intermediate results need to be stored (Keshab, 2012; Parhi, 1992; Rajapadhy and Kiaei, 1991). The minimization process follows two steps: (a) lifetime analysis table and lifetime chart; (b) data allocation using forward and backward register allocation. In lifetime analysis, a data sample (variable) is live from the time it is produced through the time it is consumed. After the variable is consumed, it is dead. While a variable is live, it occupies one register (Deepa and Vijaya, 2012). Here, the number of live variables at each time unit is computed, and from it the number of registers needed by the folded architecture. The lifetime table (Table 2) is constructed from two parameters and their relations:

$$T_{input} = u + P_u$$

$$T_{output} = T_{input} + \max_V \{ D_F(U \rightarrow V) \}$$
The linear lifetime chart is shown in Figure 4, which graphically represents the lifetime of each variable.
Here, the horizontal lines represent the clock cycles and the vertical lines represent the lifetime of each variable. With the help of this chart, the minimum number of registers is obtained as the maximum number of live variables at any time step: max{0, 0, 1, 1, 2, 3, 4, 4, 3, 4, 4, 5, 5, 5, 5, 4, 5, 5, 6, 6} = 6. From the lifetime chart, the minimum number of registers required to implement the architecture is therefore 6, and the data are allocated to these registers. The registers are named R1, R2, R3, R4, R5 and R6. The allocation scheme dictates how the variables are assigned to registers (Figure 5).
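The counting step of lifetime analysis can be sketched as below. The lifetimes are hypothetical (they are not the entries of Table 2), and the sketch assumes a linear, non-periodic chart in which a variable produced at T_input occupies a register during cycles T_input + 1 through T_output.

```python
def live_counts(lifetimes):
    """Number of live variables in each cycle 0..max(T_output).

    lifetimes maps a variable name to (t_input, t_output).
    """
    last = max(t_out for _, t_out in lifetimes.values())
    return [
        sum(1 for t_in, t_out in lifetimes.values() if t_in < t <= t_out)
        for t in range(last + 1)
    ]


def min_registers(lifetimes):
    """Minimum register count = maximum number of simultaneously live variables."""
    return max(live_counts(lifetimes))


# Hypothetical (T_input, T_output) pairs for nine variables.
LIFETIMES = {
    "a": (0, 4), "b": (1, 7), "c": (2, 10),
    "d": (3, 5), "e": (4, 8), "f": (5, 11),
    "g": (6, 6), "h": (7, 9), "i": (8, 12),
}

if __name__ == "__main__":
    print(live_counts(LIFETIMES))
    print(min_registers(LIFETIMES))
```

A variable whose T_input equals its T_output, such as "g" here, is consumed as soon as it is produced and never occupies a register. For a periodic schedule, lifetimes that run past the iteration period would additionally have to be wrapped around and added to the early cycles.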

Folded architecture
The folded architecture with respect to the folding equations and allocation table with a minimum of 6 registers is derived and shown in Figure 6.

RESULTS
Tables 3 and 4 compare the unfolded and folded filters with respect to the number of adders, multipliers and registers, and show the device utilization for both architectures. The two architectures (unfolded, and folded with register minimization) are synthesized for a Spartan 3A/3AN device. The number of functional units is reduced to 1 adder and 1 multiplier, with 6 registers. In addition, the number of device resources used, namely slices, slice flip-flops, look-up tables (LUTs) and input-output blocks (IOs), is reduced because of the reduction in functional units.

Conclusion
This paper addresses the challenges and opportunities of minimizing filter architectures in view of the growing trend of VLSI DSP systems. It has been demonstrated that, for the Chebyshev-I high-pass filter architecture, the required number of functional units is substantially reduced, from 6 adders to 1 and from 7 multipliers to 1, by folding and retiming with register minimization. This work describes an effective and efficient heuristic and provides an optimized environment for digital filter design. Our experimental results demonstrate that folding and retiming can significantly reduce the silicon area, thereby reducing the cost and effort required of designers.

Figure 1.(a) A DSP program with 2 addition operations; (b) A folded architecture where the 2 addition operations are folded to a single pipelined adder.

Figure 2. Two versions of an IIR filter; the computation times of the nodes are shown in parentheses.

Figure 3. Direct form II, 3rd-order Chebyshev I filter.