ece 565 presentation

Leakage Power Reduction using Modified Force Directed Scheduling

Lakshmi Yasaswi Kamireddy, 651771619

Equations for Delay and Leakage Power

Relationship between delay and threshold voltage:

Where β is the power law constant taken from [Koichi and Sakurai, ASP-DAC’2000].

Relationship between leakage power and threshold voltage:

from [Sheu,et al ,IEEE ‘1987]. The above equations show that varying Vth affects both delay and leakage power.

With the introduction of multiple threshold voltage techniques, it is possible to replace a low-Vth resource with a high-Vth resource to save leakage power without affecting performance.

Problem Definition: We are given a data flow graph, a dual-Vth module library that contains both low-Vth and high-Vth designs of the same functional unit, a timing constraint Tcon given as the maximum number of control steps, and a resource constraint Rtp for each resource type tp. The task is to schedule operations into appropriate control steps and assign them threshold voltages, while respecting the resource constraints, so as to achieve the desired performance with minimum total leakage power consumption.

Min-cut based Algorithm for Power Aware Scheduling: [Wang et.al,IEEE,2013]

[Flowchart: overall flow of the algorithm]

Input: DFG, timing and resource constraints.

Phase 1, min-cut based Vth assignment:
1) Initialize all operations to high Vth.
2) Evaluate the total latency.
3) Is Tcon violated? If yes, build a critical DAG, run the min-cut algorithm on the critical DAG, reassign the selected operations' Vth from high to low, and return to step 2. If no, go to phase 2.

Phase 2, modified FDS integrated with Vth assignment:
4) FDS an operation.
5) Is Rtp violated? If yes, build the MOG, run the min-cut algorithm on the MOG, and reassign the selected operations' Vth from high to low.
6) Is scheduling over? If no, return to step 4. If yes, stop.

Output: scheduling result and threshold voltage assignment.

Some Definitions:

Tasap(vi) and Talap(vi) are the ASAP and ALAP times of an operation vi.

Mobility: M(vi) = [Tasap(vi), Talap(vi)].

Lattp is the latency of a functional unit of type tp.

Critical node: an operation whose mobility equals the latency of its functional unit, |M(vi)| = Lattp.

Critical path: a path from s to t is a critical path if it goes through only critical nodes.

Execution Interval (EI): a set of Lattp consecutive control steps. An EI that starts from control step j is denoted EIj.

For an operation with mobility M(vi) = [Tasap(vi), Talap(vi)], the candidate execution intervals are EIj with Tasap(vi) ≤ j ≤ Talap(vi) - Lattp + 1.
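The definitions above can be made concrete with a small sketch. It assumes that Talap(vi) is the last control step the operation may occupy (inclusive), which is what the bound j ≤ Talap(vi) - Lattp + 1 suggests; the function names are illustrative and do not come from the papers.

```python
def mobility_size(t_asap, t_alap):
    """Number of control steps in M(v) = [t_asap, t_alap], both ends inclusive."""
    return t_alap - t_asap + 1


def is_critical(t_asap, t_alap, latency):
    """A node is critical when its mobility leaves no room to move: |M(v)| = Lat."""
    return mobility_size(t_asap, t_alap) == latency


def candidate_intervals(t_asap, t_alap, latency):
    """Start steps j of the candidate EI_j: t_asap <= j <= t_alap - latency + 1."""
    return list(range(t_asap, t_alap - latency + 2))


def steps_of_interval(j, latency):
    """Control steps covered by EI_j."""
    return list(range(j, j + latency))


# An operation with M(v) = [1, 4] on a unit of latency 2 can start in step 1, 2 or 3.
print(candidate_intervals(1, 4, 2))
# With M(v) = [2, 3] and latency 2 there is a single candidate, so the node is critical.
print(is_critical(2, 3, 2), candidate_intervals(2, 3, 2))
```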

First, all operations are initialized to high Vth. If a critical path violates the timing constraint, we build the critical DAG and assign the operations selected by the min-cut to low Vth. This procedure is repeated until the timing constraint is met. In the next phase, operations are scheduled and their threshold voltages are adjusted by modified force-directed scheduling. After an operation is scheduled, the resource constraints are checked; if any of them is violated, the threshold voltages of appropriate operations are adjusted by computing the min-cut of the mobility overlap graph (MOG). The output satisfies both the timing and the resource constraints.


Min-cut based Vth Assignment:

A. Construction of Critical DAG

Let start(e) and end(e) be the start and end nodes of edge e. The critical DAG G'(V',E') generated from G(V,E) is constructed as:
1) V' = {s,t} ∪ V'op, where V'op = {vi | vi is a critical node}.
2) E' = {e | e is on a critical path}.

B. Construction of split DAG

Each node in V'op is split into two nodes joined by a new edge whose direction follows the data flow. For example, node vi is split into $v_i^{in}$ and $v_i^{out}$, and the corresponding directed edge is $(v_i^{in}, v_i^{out})$. The split DAG $\hat{G}(\hat{V}, \hat{E})$ is constructed as:
1) $\hat{V} = \{s, t\} \cup \{v_i^{in}, v_i^{out} \mid v_i \in V'_{op}\}$.
2) $\hat{E} = E_s \cup E'$, where $E_s = \{(v_i^{in}, v_i^{out}) \mid v_i \in V'_{op}\}$.
3) $\mathrm{Weight}(e) = SS(v_i)$ if $e = (v_i^{in}, v_i^{out}) \in E_s$, and $\infty$ otherwise.

Here SS(vi) is the sensitivity, defined as the ratio of the operation's leakage power increment to its latency decrement when its threshold voltage is changed from high to low: $SS(v_i) = \Delta P_{lk}(v_i) / \Delta Lat(v_i)$, where ΔPlk(vi) is the leakage power variation due to the threshold voltage reassignment of vi, and ΔLat(vi) is vi's latency decrement.

C. Sensitivity Calculation

The leakage power increment in the numerator of the sensitivity is the difficult part to estimate. It depends on the resource usage, so estimating the resource usage lets us estimate this term.

1) Probability-based resource usage estimation: To order the operations, a function Prob[ck] on control step k (ck) is defined as the probabilistic number of similar operations whose execution intervals contain ck.

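Let Prob[vi, EIj], vi ∈ Vop, 1 ≤ j ≤ Tcon - Lattp + 1, be the probability that vi is scheduled into EIj; let Prob[vi, ck], 1 ≤ k ≤ Tcon, be the probability that vi's execution interval contains ck; and let Fvi be the set of EIs covered by M(vi).

$Prob[v_i, c_k] = \sum_{EI_j \in F_{v_i},\; c_k \in EI_j} Prob[v_i, EI_j]$

$Prob[c_k] = \sum_{v_i \in V_{op}} Prob[v_i, c_k]$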


Probmax is defined as the maximum Prob[ck], 1 ≤ k ≤ Tcon. Prob[ck] estimates the resource usage in control step k, and Probmax estimates the total resource usage.

Explanation: Operations whose execution intervals contain the same control step cannot share a resource. Because Prob[ck] indicates the concurrency of similar operations, it estimates the number of resources required to schedule all the operations in ck. Probmax estimates the total resource usage of this type.
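A minimal sketch of this estimate follows. It assumes, as in standard force-directed scheduling, that an operation is equally likely to be placed in each of its candidate execution intervals; the slides do not state the distribution, so this is an assumption. The operation list is made up for illustration.

```python
from fractions import Fraction


def prob_per_step(ops, t_con):
    """Return {k: Prob[c_k]} for k = 1..t_con.

    ops is a list of (t_asap, t_alap, latency) for operations of one type.
    Assumption: each candidate EI_j of an operation is equally likely.
    """
    prob = {k: Fraction(0) for k in range(1, t_con + 1)}
    for t_asap, t_alap, latency in ops:
        starts = range(t_asap, t_alap - latency + 2)   # candidate EI_j
        p_interval = Fraction(1, len(starts))          # uniform Prob[v, EI_j]
        for j in starts:
            for k in range(j, j + latency):            # steps covered by EI_j
                prob[k] += p_interval
    return prob


# Three hypothetical operations of the same type, Tcon = 3.
ops = [(1, 2, 1), (1, 1, 1), (2, 3, 2)]
prob = prob_per_step(ops, t_con=3)
prob_max = max(prob.values())
print(prob, prob_max)
```

Exact fractions are used so that the per-step sums can be compared without rounding error.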


Reassigning vi's Vth changes both the high-Vth and the low-Vth Probmax. Let ΔProbmax_Vhigh and ΔProbmax_Vlow be these changes. When an operation is reassigned from high Vth to low Vth, ΔProbmax_Vhigh is negative because one fewer high-Vth unit is used, and ΔProbmax_Vlow is positive because one more low-Vth unit is used.

ΔPlk(vi) = ΔPlk_Vhigh(vi) + ΔPlk_Vlow(vi)

Plk_Vhigh(vi) and Plk_Vlow(vi) are each given as a sum of two terms, evaluated in the multiplier example below. The resource leakage power variations due to the replacement are given as:

ΔPlk_Vhigh(vi) = Plk_Vhigh(vi) · ΔProbmax_Vhigh

ΔPlk_Vlow(vi) = Plk_Vlow(vi) · ΔProbmax_Vlow

ΔPlk(vi) = Plk_Vhigh(vi) · ΔProbmax_Vhigh + Plk_Vlow(vi) · ΔProbmax_Vlow

SS(vi) can then be calculated from ΔPlk(vi). Consider a multiplier X:

LPtp_Vhigh(X) = 2.3/3 ≈ 0.77
LPtp_Vlow(X) = 124.6/2 = 62.3
ΔProbmax_Vhigh(X) = -1, ΔProbmax_Vlow(X) = 1
Plk_Vhigh(X) = 0.77·3 + 0 ≈ 2.3
Plk_Vlow(X) = 0 + 62.3·3 = 186.9
ΔPlk_Vhigh(X) = 2.3·(-1) = -2.3
ΔPlk_Vlow(X) = 186.9·1 = 186.9
ΔPlk(X) = -2.3 + 186.9 = 184.6
SS(X) = 184.6/(3 - 2) = 184.6
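The multiplier arithmetic can be checked with a few lines of code. This sketch takes Plk_Vhigh and Plk_Vlow as given inputs (2.3 and 186.9 from the example) rather than deriving them; the function name and argument names are illustrative.

```python
def sensitivity(plk_vhigh, plk_vlow, d_probmax_vhigh, d_probmax_vlow,
                lat_high, lat_low):
    """SS(v) = leakage power increment / latency decrement for a high-to-low switch."""
    d_plk = plk_vhigh * d_probmax_vhigh + plk_vlow * d_probmax_vlow
    return d_plk / (lat_high - lat_low)


# Multiplier from the example: latency 3 at high Vth, 2 at low Vth.
ss_mult = sensitivity(plk_vhigh=2.3, plk_vlow=186.9,
                      d_probmax_vhigh=-1, d_probmax_vlow=1,
                      lat_high=3, lat_low=2)
print(round(ss_mult, 1))  # 184.6
```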

Example of Min cut based Vth Assignment

White nodes are high-Vth operations. Gray nodes are low-Vth operations. Dashed nodes are the operations in the min-cut. Say the timing constraint is Tcon = 3 control steps.

Functional unit   Low-Vth                         High-Vth
                  LP (uW)   Lat (control steps)   LP (uW)   Lat (control steps)
ALU               19.5      1                     0.3       2
Multiplier        124.6     2                     2.3       3


[Figure: critical DAG, with panels "Initial state", "1st iteration", "2nd iteration" and "3rd iteration"]

Initial state: the critical path takes 5 control steps, which violates Tcon.

1st iteration: 3 adders are selected by the min-cut algorithm and assigned to low Vth. This reduces the critical path length to 4 control steps and causes a leakage power increment of 174.15 units. With 4 control steps, Tcon is still violated.

2nd iteration: the critical DAG is updated and the SS() values of the high-Vth operations are recalculated. An adder and the multiplier are selected and their threshold voltages are lowered. This causes a leakage power increment of 242.65 units and reduces the critical path delay to 3 control steps, so Tcon is satisfied.

Final panel: 3 control steps after assigning 3 operators to low Vth; Tcon is satisfied.


Modified Force Directed Scheduling: Force-directed scheduling is modified to schedule operations and to adjust their threshold voltage assignments while taking the resource constraints into account. The resource usage of type tp is estimated by Probmax_Vhigh + Probmax_Vlow. Let Rexceed = max(Probmax_Vhigh + Probmax_Vlow - Rtp, 0), which checks for a resource constraint violation, where Rtp is the resource constraint of type tp.

Rexceed > 0 means the resource usage will violate the resource constraint, so our goal is to reduce Rexceed to 0.

Let Resti(ck) be an estimate of the total number of high-Vth and low-Vth resources required in ck: Resti(ck) = ProbVhigh[ck] + ProbVlow[ck].

Let the jam control steps Cjam be the set of control steps in which the resource constraint is violated, i.e. the set of all ck at which ProbVhigh[ck] + ProbVlow[ck] > Rtp.

To meet the resource constraints we need to reduce Resti(ck) for all ck ∈ Cjam.

Explanation: Changing a high-Vth operation vi in ck (ck ∈ Cjam) to low Vth decreases ProbVhigh[ck] by ProbVhigh[vi, ck] and increases ProbVlow[ck] by ProbVlow[vi, ck]. vi's working time is shortened while its mobility remains the same, which makes ProbVhigh[vi, ck] > ProbVlow[vi, ck], and hence Resti(ck) decreases. If vi and vj are high-Vth operations of the same operator type with ck ∈ M(vi), ck ∈ M(vj), and |M(vi)| < |M(vj)|, then reassigning vi reduces Resti(ck) faster.
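The violation check and the jam set can be sketched as follows. The per-step probability values are invented for illustration, and the function names are not taken from the papers.

```python
def r_exceed(probmax_vhigh, probmax_vlow, r_tp):
    """Rexceed = max(Probmax_Vhigh + Probmax_Vlow - Rtp, 0)."""
    return max(probmax_vhigh + probmax_vlow - r_tp, 0)


def jam_steps(prob_vhigh, prob_vlow, r_tp):
    """Cjam: control steps k where ProbVhigh[c_k] + ProbVlow[c_k] > Rtp."""
    return sorted(k for k in prob_vhigh
                  if prob_vhigh[k] + prob_vlow.get(k, 0) > r_tp)


# Hypothetical estimates for one resource type over three control steps, Rtp = 2.
prob_vhigh = {1: 1.0, 2: 2.5, 3: 1.5}
prob_vlow = {1: 0.0, 2: 0.5, 3: 0.5}
c_jam = jam_steps(prob_vhigh, prob_vlow, r_tp=2)
excess = r_exceed(max(prob_vhigh.values()), max(prob_vlow.values()), r_tp=2)
print(c_jam, excess)
```

Step 3 is not in Cjam here because its estimate equals Rtp exactly and the definition uses a strict inequality.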


When reassigning operations in Cjam, operators whose mobilities overlap cannot be reassigned simultaneously, because that would over-reduce the resource usage. We also want to keep as many operations at high Vth as possible to limit leakage power.

To handle this, a Mobility Overlap Graph (MOG) is constructed: MOG = (Vr, Er), where Vr = {s,t} ∪ Vr' and Vr' is the set of high-Vth operations of the same type in ck, ck ∈ Cjam. There is a directed edge (vi, vj) ∈ Er if the mobilities of vi and vj overlap and vi is on the left of vj in the mobility graph; otherwise there is no edge.

Operations in a MOG are of the same type, and their mobilities become the main factor in determining the threshold voltage reassignment.

[Figure: example MOG, supposing all control steps are in Cjam]


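Construction of Split Mobility Overlap Graph: each node vi ∈ Vr' is split into $v_i^{in}$ and $v_i^{out}$, joined by the edge $(v_i^{in}, v_i^{out})$. SMOG = $(\hat{V}_r, \hat{E}_r)$ is constructed as:
1) $\hat{V}_r = \{s, t\} \cup \{v_i^{in}, v_i^{out} \mid v_i \in V_r'\}$.
2) $\hat{E}_r = E_{rs} \cup E_r$, where $E_{rs} = \{(v_i^{in}, v_i^{out}) \mid v_i \in V_r'\}$.
3) $\mathrm{Weight}(e) = SS_r(v_i)$ if $e = (v_i^{in}, v_i^{out}) \in E_{rs}$, and $\infty$ otherwise.

An operation with smaller mobility should be given a smaller sensitivity, so that it has a higher priority to be selected by the min-cut algorithm. So $SS_r(v_i) = |M(v_i)|^{\gamma}$, where γ > 0 is an empirical coefficient. The min-cut algorithm is executed on this split mobility overlap graph (SMOG) to find a set of operations with minimum total sensitivity that covers all the paths. The operations selected in each iteration have no mobility overlap, which avoids over-reducing Resti(ck), ck ∈ Cjam.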
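The node-splitting trick turns "pick nodes of minimum total sensitivity that cover every s-t path" into an ordinary edge min-cut. The sketch below uses a plain Edmonds-Karp max-flow as a stand-in, since the slides do not say which max-flow routine the paper uses. It assumes every s-t path passes through at least one operation node; the graph and weights are hypothetical.

```python
from collections import defaultdict, deque

INF = float("inf")


def min_weight_node_cut(weights, edges):
    """weights: {node: sensitivity}; edges: directed (u, v) pairs, may use 's' and 't'.

    Returns the set of nodes whose split edges form a minimum s-t cut.
    """
    cap = defaultdict(lambda: defaultdict(float))

    def tail(u):   # edges leave a node from its "out" half
        return u if u in ("s", "t") else (u, "out")

    def head(v):   # edges enter a node at its "in" half
        return v if v in ("s", "t") else (v, "in")

    for v, w in weights.items():            # split edge carries the node weight
        cap[(v, "in")][(v, "out")] = w
        cap[(v, "out")][(v, "in")] += 0     # residual edge
    for u, v in edges:                      # original edges can never be cut
        cap[tail(u)][head(v)] = INF
        cap[head(v)][tail(u)] += 0

    def reachable_from_source():
        parent = {"s": None}
        queue = deque(["s"])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = reachable_from_source()
        if "t" not in parent:
            break
        path, v = [], "t"
        while parent[v] is not None:        # walk the augmenting path back to s
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow

    # A node is cut when its "in" half is still reachable but its "out" half is not.
    return {v for v in weights
            if (v, "in") in parent and (v, "out") not in parent}


edges = [("s", "a"), ("s", "b"), ("a", "c"), ("b", "c"), ("c", "t")]
print(min_weight_node_cut({"a": 2, "b": 3, "c": 4}, edges))   # {'c'}
print(min_weight_node_cut({"a": 1, "b": 2, "c": 4}, edges))   # a and b
```

In the first call cutting c alone (weight 4) is cheaper than cutting a and b (weight 5); in the second the balance flips.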


The second paper [Wang et.al,IEICE,2014] uses weighted interval scheduling instead of building the mobility overlap graph. Let start(v) and end(v) be the first and last control steps of M(v). Each operation's mobility M(v) corresponds to an interval whose start time, finish time and weight are start(v), end(v) and Rredc(v), where Rredc(v) is the resource usage reduction when v is reassigned from high Vth to low Vth. The problem is to find a set Vw of non-overlapping intervals with maximum total weight. The intervals are first sorted in ascending order of start time and indexed from 1 to n. For 1 ≤ i ≤ n, let wi be the weight of the ith interval, let value(i) be the maximum total weight of non-overlapping intervals with indices of at least i, and let set(i) store the indices of the intervals whose weights sum to value(i). Then

$value(i) = \max\{w_i + value(j),\; value(i+1)\}$

$set(i) = \{i\} \cup set(j)$ if $w_i + value(j) \ge value(i+1)$, and $set(i) = set(i+1)$ otherwise,

where j is the index of the first interval that starts after interval i ends. If no such interval exists, value(j) = 0 and set(j) is empty.

To find the maximum weight:
1) Set value(n) = wn and set(n) = {n}.
2) For 1 ≤ i ≤ n, find the smallest j > i such that interval j does not overlap interval i.
3) For i = n - 1 down to 1, compute value(i) and set(i) using the above equations.
4) The maximum weight of non-overlapping intervals is value(1), and set(1) stores the corresponding interval indices.
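The recurrence can be written as a short dynamic program. This sketch uses 0-based indices with a sentinel entry at position n (value 0, empty set) instead of the 1-based indexing in the text, and treats mobilities as inclusive control-step ranges. The intervals and weights are invented; they are not the operations of the example in the slides.

```python
def max_weight_nonoverlapping(intervals):
    """intervals: list of (name, start, end, weight), start/end inclusive.

    Returns (best_total_weight, chosen_names).
    """
    items = sorted(intervals, key=lambda it: it[1])    # ascending start time
    n = len(items)

    # nxt[i]: index of the first interval that starts after interval i ends.
    nxt = []
    for i in range(n):
        j = i + 1
        while j < n and items[j][1] <= items[i][2]:
            j += 1
        nxt.append(j)

    value = [0] * (n + 1)                  # value[n] = 0 is the sentinel
    chosen = [[] for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        take = items[i][3] + value[nxt[i]]
        if take >= value[i + 1]:           # keep interval i
            value[i] = take
            chosen[i] = [items[i][0]] + chosen[nxt[i]]
        else:                              # skip interval i
            value[i] = value[i + 1]
            chosen[i] = chosen[i + 1]
    return value[0], chosen[0]


# Hypothetical mobilities with integer weights standing in for Rredc(v).
ops = [("v1", 1, 3, 5), ("v2", 2, 4, 3), ("v3", 4, 6, 4), ("v4", 5, 6, 2)]
print(max_weight_nonoverlapping(ops))   # (9, ['v1', 'v3'])
```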

Leakage Power Aware Scheduling in High Level Synthesis: [Wang et.al,IEICE,2014]

Example of weighted interval scheduling based Vth assignment

Tcon = 6 control steps, RC = 2

Functional unit   Low-Vth                         High-Vth
                  LP (uW)   Lat (control steps)   LP (uW)   Lat (control steps)
ALU               19.5      1                     0.3       2
Multiplier        124.6     2                     2.3       3

The resource constraint is violated: 2 resources are available but c2 needs three. The mobilities of v1, v2, v5 and v7 overlap, so weighted interval scheduling is used to find the maximum-weight set of non-overlapping operations, which in this case is just v1. So v1 is assigned to low Vth.

The resource constraint is violated again, because only one high-Vth resource is available after the other has been assigned to low Vth. Cjam = {c2,c3,c4,c5}. The weights of all six operations are calculated, and Vw = {v2,v3,v4}.


Resource constraints and timing constraints satisfied.


The third paper [Tang et.al,DAC,2005] is mainly based on slack = Talap(v) - Tasap(v). During module selection, slack indicates the maximum delay increase a node can afford when its current module is replaced with a slower module instance.

Module Instance Usage Graph (MIUG(m)) = (Vm, Em). A module instance usage graph is generated from the synthesized DFG to represent the resource allocation result. Each node v ∈ Vm is a node in the DFG bound to module instance m. There is a directed edge (u,v) ∈ Em if, during the usage, instance m executes operation node u immediately before node v.

Composite Constraint Graph (CCG) = (V, Ec). A composite constraint graph is generated to calculate new slacks. Each node v ∈ V is a node in the DFG. There is a directed edge (u, v) ∈ Ec if and only if there is either an edge (u, v) in the original DFG or an edge (u, v) in any MIUG. The two types of edges are called data dependency edges and resource constrained edges, respectively.

[Figure panels: a sample DFG; initial synthesis; MIUG; CCG]

The example uses two slow adders with a delay of 2 cycles (sa_1, sa_2), one fast adder with a delay of 1 cycle (fa_1), three slow subtractors with a delay of 2 cycles (ss_1, ss_2 and ss_3), and one fast subtractor with a delay of 1 cycle (fs_1). All these instances are implemented in low-Vth technology. Assume the latency constraint is 8 cycles.

Leakage Power Optimization with Dual Vth Library: [Tang et.al,DAC,2005]

By running the resource-unconstrained ASAP and ALAP algorithms on the CCG, the slack graph (SG) on the composite constraint graph is generated. The value inside each node is its current delay, and the triplet shown next to each node v denotes tASAP(v) / tALAP(v) / s(v).

The paper proposes a heuristic to reduce leakage power in an already scheduled and resource-allocated design.

Slack sensitive graph and slack sensitive transitive closure graph: The slacks in the composite constraint graph give designers freedom for module replacements. A slack change on one node affects the slack of other nodes because of data dependencies and resource sharing.

Determine slack sensitive edges: On a composite constraint graph, an edge (u, v) ∈ Ec is called slack sensitive if either tASAP(v) - tASAP(u) = d(u) or tALAP(v) - tALAP(u) = d(u). A slack sensitive edge implies that the slacks of its two nodes are sensitive to each other's slack change.

Slack Sensitive Graph (SSG) = (V, Es). Each node v ∈ V is a node in the DFG. There is a directed edge (u, v) ∈ Es if and only if (u, v) is a slack sensitive edge.
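The slack sensitive test can be tried on a toy graph. This sketch assumes tASAP and tALAP are start times counted from 0, that a node with no successors must finish by the latency constraint, and that d(u) is the delay of node u; the four-node graph is made up and is not the example from the slides.

```python
def asap_alap(delay, edges, t_con):
    """Resource-unconstrained ASAP and ALAP start times on a DAG."""
    preds = {v: [] for v in delay}
    succs = {v: [] for v in delay}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)
    asap, alap = {}, {}

    def t_asap(v):   # earliest start: after every predecessor has finished
        if v not in asap:
            asap[v] = max((t_asap(u) + delay[u] for u in preds[v]), default=0)
        return asap[v]

    def t_alap(v):   # latest start: finish before the earliest successor deadline
        if v not in alap:
            alap[v] = min((t_alap(w) for w in succs[v]), default=t_con) - delay[v]
        return alap[v]

    for v in delay:
        t_asap(v)
        t_alap(v)
    return asap, alap


def slack_sensitive_edges(delay, edges, t_con):
    """Edges (u, v) with tASAP(v) - tASAP(u) = d(u) or tALAP(v) - tALAP(u) = d(u)."""
    asap, alap = asap_alap(delay, edges, t_con)
    return [(u, v) for u, v in edges
            if asap[v] - asap[u] == delay[u] or alap[v] - alap[u] == delay[u]]


delay = {"A": 1, "B": 3, "C": 2, "D": 1}
edges = [("A", "C"), ("A", "D"), ("B", "D")]
print(slack_sensitive_edges(delay, edges, t_con=6))   # [('A', 'C'), ('B', 'D')]
```

Edge (A, D) is not slack sensitive here: D's earliest start is set by B, and A's latest start is set by C.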


[Figure panels: Slack Sensitive Graph; Composite Constraint Graph]

Slack Sensitive Transitive Closure Graph (SSTG) = (V, Est). Each node v ∈ V is a node in the DFG. There is a directed edge (u,v) ∈ Est if there is a directed path from u to v in the SSG. Two nodes u, v ∈ V are called slack insensitive if there is no edge (u, v) or (v, u) in the SSTG. A set SI = {v1, v2, …, vk} in the DFG is called slack insensitive if all nodes in the set are pairwise slack insensitive.

Slack insensitive sets {E, F, I}, {A, B, D}, {G, H}.

An instance replacement is safe if, after the replacement, the slack of every node is still non-negative. To check the feasibility of a replacement, assign the new delay to the affected nodes in the module instance graph and perform resource-unconstrained ASAP and ALAP scheduling to calculate the new slacks for all nodes.

If any node has negative slack, the replacement is not safe. A flag records whether the node is safe for replacement.
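A sketch of this safety check, under the same assumptions as before (start times counted from 0, sink nodes must finish by the latency constraint). The graph, delays and function names are hypothetical; `new_delays` maps the operations bound to the replaced instance to their delay on the slower design.

```python
def slacks(delay, edges, t_con):
    """Slack s(v) = tALAP(v) - tASAP(v) from resource-unconstrained scheduling."""
    preds = {v: [] for v in delay}
    succs = {v: [] for v in delay}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)
    asap, alap = {}, {}

    def t_asap(v):
        if v not in asap:
            asap[v] = max((t_asap(u) + delay[u] for u in preds[v]), default=0)
        return asap[v]

    def t_alap(v):
        if v not in alap:
            alap[v] = min((t_alap(w) for w in succs[v]), default=t_con) - delay[v]
        return alap[v]

    return {v: t_alap(v) - t_asap(v) for v in delay}


def is_safe_replacement(delay, edges, t_con, new_delays):
    """True if every node keeps non-negative slack after applying new_delays."""
    trial = {**delay, **new_delays}        # original delays stay untouched
    return all(s >= 0 for s in slacks(trial, edges, t_con).values())


delay = {"A": 1, "B": 3, "C": 2, "D": 1}
edges = [("A", "C"), ("A", "D"), ("B", "D")]
print(is_safe_replacement(delay, edges, 6, {"A": 3}))   # True
print(is_safe_replacement(delay, edges, 6, {"B": 6}))   # False
```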

Plk_Vhigh(vi) and Plk_Vlow(vi) are defined as before. The leakage power variation due to the replacement is given as: ΔPlk_Vhigh(vi) = Plk_Vlow(vi) - Plk_Vhigh(vi)


In order to consider the resource sharing constraints, an undirected module instance sensitive graph (MISG) is constructed from the transitive closure graph, with nodes sharing the same module instance being combined into one node.

Consider any two combined nodes U and V. Suppose the node U consists of operation {u1, u2, … , up}, and V consists of {v1, v2,…, vq}. There is an edge (U, V) in the graph if and only if there is an edge (ui, vj) in the original transitive graph.

If the replacement is feasible for a module instance m, we assign a leakage power reduction weight to the node in the new graph according to the equation ΔPlk_Vhigh(vi)= Plk_Vlow(vi)-Plk_Vhigh(vi); otherwise, the weight on the node is zero.

If the new instance graph MISG is a transitive graph, find the maximum weight independent set (MWIS) of the MISG. Otherwise, use a greedy approach to get a near-maximum weight independent set.

After replacements of one set of independent module instances, the slack of operation nodes in original DFG might not become zero, and there might still exist some safe replacements for module instances. The leakage power consumption of the data flow graph will be iteratively reduced.
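For the non-transitive case, the slides only say that a greedy approach is used. One common heuristic, shown here purely as an illustration and not as the paper's method, is to take nodes in descending weight order and discard the neighbours of each chosen node. The module instance names and weights are invented.

```python
def greedy_independent_set(weights, edges):
    """Pick nodes by descending weight, skipping neighbours of nodes already picked.

    weights: {node: leakage power reduction weight}, 0 for unsafe replacements.
    edges: undirected MISG edges as (u, v) pairs.
    """
    adjacent = {v: set() for v in weights}
    for u, v in edges:
        adjacent[u].add(v)
        adjacent[v].add(u)

    chosen, blocked = [], set()
    for v in sorted(weights, key=lambda node: weights[node], reverse=True):
        if weights[v] > 0 and v not in blocked:   # zero weight means nothing to gain
            chosen.append(v)
            blocked |= adjacent[v] | {v}
    return chosen


weights = {"m1": 120, "m2": 100, "m3": 90, "m4": 0}
edges = [("m1", "m2"), ("m2", "m3")]
print(greedy_independent_set(weights, edges))   # ['m1', 'm3']
```

This heuristic gives no optimality guarantee, which matches the slides' remark that the near-MWIS result is not optimal.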

[Figure panels: MISG; Transitive Closure Graph]


Algorithm: Leakage Power Optimization With Dual-Vth Library

Input: An initial synthesized data flow graph G, resource allocation table T, timing constraint, a dual-Vth module library, leakage power table for each module instance

Output: A synthesized data flow graph that satisfies the timing constraint with less leakage power consumption

Begin
(1) Construct module instance usage graphs MIUGs from G and T, and label each MIUG with RSafe = TRUE to indicate that the module instance might be feasible to change its implementation.
(2) Construct the composite constraint graph CCG.
(3) Construct a general transitive closure graph TG from CCG.
(4) Construct a module instance sensitive graph MISG from TG.
(5) Recognize whether the MISG is a transitive graph and find a transitive orientation for it.
(6) For each module instance with RSafe = TRUE, build the slack graph SG from CCG through resource-unconstrained ASAP and ALAP algorithms on CCG, and update the safety of the replacement by setting the RSafe value. If there is no safe replacement, return.
(7) For each node U in the MISG with RSafe = TRUE, calculate and assign a weight to node U according to Equation (2). For a node U with RSafe = FALSE, the weight is set to 0.
(8) Find the maximum weight independent set of MISG if MISG is a transitive graph; otherwise, find a near-maximum weight independent set of MISG using a heuristic approach.
(9) Replace the module instance nodes in the set found in (8) with their high-Vth designs, and set their RSafe to FALSE.
(10) Update the delay of each operation node in G.
(11) Go to (6).
End


Comparisons: The two algorithms [Wang et.al,IEEE,2013; Wang et.al,IEICE,2014] were shown to achieve better leakage power reduction than MWIS (the maximum weight independent set approach of [Tang et.al,DAC,2005]) because resource usage is well estimated, which makes the scheduling and Vth adjustment accurate. They also have lower run time than MWIS because they use min-cut/max-flow, which accelerates the algorithm.

Time complexity stated:

Min-cut SMOG   Min-cut weighted interval   MWIS
O(Tcon*N^2)    O(Tcon*N)                   O(M*V^3)

In the algorithms of [Wang et.al,IEEE,2013; Wang et.al,IEICE,2014], FDS helps smooth the Prob[ck] values, which helps the solution to be obtained faster. Both the min-cut SMOG approach and weighted interval scheduling are optimal, but in the case of MWIS, if the MISG is not a transitive graph, a near-MWIS is used, which is not optimal.

Weakness: The time complexity is treated as independent for min-cut and FDS, but the first two papers do not provide a sufficiently detailed time complexity analysis. In my view, the time complexities for these two papers will be higher than the ones stated. The results of the first two papers are not compared with each other, and I think the two techniques should be compared to see which one performs better.

A better approach could be: The min-cut based approach can be further modified so that the probability estimate is introduced into the FDS force calculations, since FDS also uses a similar concept of mobility. There are approaches for dynamic power minimization, as in [Allam and Ramanujan,ICICDT,2006] and [Gupta and Katkoori, ISVLSI,2002], which, viewed this way, can also be extended to leakage power minimization.

References

[1] [Koichi and Sakurai, ASP-DAC’2000]: Koichi Nose, Takayasu Sakurai, "Optimization of Vdd and Vth for Low-Power and High-Speed Applications", ASP-DAC, 2000.

[2] [Sheu,et al ,IEEE ‘1987]: B. Sheu, D. Scharfetter, P. Ko and M. Jeng, "BSIM: Berkeley Short-Channel IGFET Model for MOS Transistors", IEEE J. Solid-State Circuits, vol. SC-22, no. 4, 1987.

[3] [Tang et.al,DAC,2005]: Xiaoyong Tang, Hai Zhou, Prith Banerjee, "Leakage Power Optimization With Dual-Vth Library In High-Level Synthesis", DAC, 2005.

[4] [Wang et.al,IEEE,2013]: N. Wang, S. Chen and T. Yoshimura, "Min-Cut Based Leakage Power Aware Scheduling in High-Level Synthesis", International Symposium on Quality Electronic Design (ISQED), pp. 164-169.

[5] [Wang et.al,IEICE,2014]: Nan Wang, Song Chen, Cong Hao, Haoran Zhang, Takeshi Yoshimura, "Leakage Power Aware Scheduling in High-Level Synthesis", IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. E97-A, no. 4, 2014.

[6] [Allam and Ramanujan,ICICDT,2006]: A. K. Allam and J. Ramanujam, "Modified Force-Directed Scheduling for Peak and Average Power Optimization using Multiple Supply-Voltages", in ICICDT '06: Proceedings of the 2006 IEEE International Conference on Integrated Circuit Design and Technology, 2006.

[7] [Gupta and Katkoori, ISVLSI,2002]: S. Gupta and S. Katkoori, "Force-Directed Scheduling for Dynamic Power Optimization", in ISVLSI '02: Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2002.
