Trending ▼   ResFinder  

Greedy algorithm

15 pages, 0 questions, 0 questions with responses, 0 total responses,    0    0
suan
  
+Fave Message
 Home > suan >

Formatting page ...

Greedy Algorithms Minimum Spanning tree problem (MST problem) : A weighted graph is a graph with a weight ( a number) associated with each edge. A spanning tree is a connected acyclic sub-graph. If the number of vertices in the graph is n, then the number of edges in a spanning tree is n 1. The weight of a sub-graph is the sum of the weights of the edges in the sub-graph. The problem is : Given a graph G, find a spanning tree of minimum weight. Kruskal s algorithm : We start with T = , and at each iteration try to insert an edge from the rest so that it does not lead to a cycle in T. If it does, then we discard it and try another edge. If n 1 edges get inserted, then we get a spanning tree and if we run out of edges before that then the graph is disconnected and there is no spanning tree. With this strategy, one way to find the MST will be to try all choices of edge to be inserted at each stage and find out a sequence of choices which will lead to the minimum weight. But this method will have complexity which is exponential. Kruskal s algorithm makes the choice at each stage in a greedy manner by choosing the edge with minimum weight from the rest and it can be proved that this leads to the MST. Thus a greedy algorithm which is typically used for optimization problems is one in which the solution can be built up in stages where in each stage a choice is to be made. An intuitively obvious (Greedy) choice is made. The choice should meet some conditions and if it does not, then it is discarded and the next choice is made. If we run out of choices before finding a solution then the problem has no solution. Otherwise an optimal solution is obtained. Greedy algorithms are usually very efficient. Kruskal s algorithm is thus a greedy algorithm. In many cases the optimal solution and candidate solutions are subsets of a given set and in such case the typical starting point is an empty set. Elements are then added in a greedy manner till a solution is obtained. Kruskal(G,E) // G : Graph with n vertices, E : set of edges {Set of edges T = ; heap of edges A = buildheap(E) (e) initialize(n) // for a partition of T into connected components (n) //partition as inverted tree with weighted merge and path compression nc = n // number of components While nc > 1 { if A is empty not connected : no MST c = (u,v) =deletemin(A) (loge) ( (n)) x = find(u) : y = find(v) 1 if ( x not equal to y) {merge(x,y) (1) insert c in T //T as linked list with insertbeg (1) nc } The loop executes at most e times and for some examples exactly e times. Therefore complexity is (e + e(loge + (n)) i.e. (eloge). But e is O(n2) and hence loge is O(logn2) i.e. O(logn). Since e >= n, loge is (logn). Therefore complexity is (elogn). Example 5.1 : Find a MST for the following graph using Kruskal s algorithm. Vertices Adjacent vertices/weight 1 2/6 3/5 5/5 2 1/6 4/2 5/1 3 1/5 5/4 4 2/2 5/5 6/1 5 1/5 2/1 3/4 6 1/7 4/1 2/3 Edges sorted according to weight Edge 2,5 4,6 2,4 2,6 3,5 Weight 1 1 2 3 4 Chosen at stage 1 2 3x 4 Components at different stages Stage Components 1 2,5 2 2,5 4,6 3 2,4,5,6 4 2,3,4,5,6 5 1,2,3,4,5,6 Minimum weight 1 + 1 + 2 + 4 + 5 = 13 6/7 6/3 4/5 1,3 1,5 4,5 5 5 5 5 1,2 1,6 67 Prim s Algorithm : This is another greedy algorithm for the MST problem. Here the tree T grows by starting with a single vertex and adding at each stage an edge with one end in the current tree and the other end outside. Such an edge with the minimum weight is chosen. The choice is thus a greedy one. The vertices outside the tree are maintained in a priority queue with the key for v equal to the minimum value d(v) of w(u,v) for u in the current T. For every v outside T a p(v) is also maintained which is the nearest vertex of T. After obtaining v by a deletemin operation, the vertex v and the edge {p(v),v} is added to the tree. Once the vertex is inserted the key values of the neighbours u of v not in the tree are decreased to w(u,v) if it is less than the previous key value which can be done with decrease-key operations. In such a case p(u) is also updated to v. Thus we get : 2 Prim(G = (V,E,w), r) // w:E-> Z is an integer valued weight function // on the edges, r starting vertex, n no of vertices, e no of edges {T = for all v {d(v) = ; p(v) = null} key(r) = 0; Q = buildheap(V) O(n) while (Q ) O(n) {v = deletemin(Q); O(logn) If (d(v) = ) tree disconnected no MST If p(v) not null add edge {p(v),v} to T For every neighbour u of v not in T If w(u,v) < d(u) {decreasekey(u,w(u,v)); p(u) = v;} } } Using a bit vector of vertices to represent the current T, check of v not in T is O(1). Using a binary heap decrease-key is O(logn) and hence complexity is O(n + n logn + e logn) i.e. O(e logn) same as that of Kruskal s algorithm. However using Fibonacci heap decrease-key has an amortized complexity O(1) and the complexity improves to O(e+nlogn). Example 5.2 : Find a MST for the following graph using Prim s algorithm. Vertices Adjacent vertices/weight 1 2 3 4 5 6 Stage 1 v d(v) p(v) Stage 2 v d(v) p(v) Stage 3 v d(v) p(v) Stage 4 v d(v) p(v) 2/3 3/5 1/3 4/2 1/5 5/4 2/2 5/5 1/2 2/1 1/7 2/3 1234 xxxx 0 ^ 2345 11x1 35 2 ^ 2346 5551 1457 ^ 346 522 423 ^ 5/2 5/1 6/7 6/3 6/1 3/4 4/5 4/1 56 xx 6 1 7 3 Stage 5 v d(v) p(v) Stage 6 3 5 4 6 4 1 ^ v d(v) p(v) 3 5 4 ^ minimum weight = 2 + 1 + 2 + 1 + 4 = 10 Activity Selection Problem : Given a set of activities A = {a1, a2, ..ai,.... an} with each activity having a start time si and finish time fi such that0<=si< fi Two activities ai and aj are called compatible if fi <= sj or fj <= si. A subset of activities C of A is called compatible if any two activities in C are compatible. The activity selection problem is to find a maximum size compatible subset of activities. A greedy algorithm builds up the solution by starting with empty C and inserting at every stage an activity a in A\C chosen in a greedy manner such that a is compatible with all other members of C. The possible greedy choices are : 1) Choose one with minimum si 2) Choose one with minimum fi 3) Choose one with minimum duration fi - si 4) Choose one with fewest overlaps with the remaining members It can be proved that (2) leads to the optimal and others do not. Thus we get the algorithm : Sort A on end time C is empty i = 0 ; f0 = 0 /// i : last element inserted for j = 1 to n // j = current choice if (sj >= fi) {insert aj in C; i = j } Not counting the sort (the activities may already be sorted on finish times) the complexity is (n). Example : Activity no s f 123456 423564 645787 After sorting on f Activity no 2 3 1 4 6 5 s 234546 f 45 6778 * * * The set C builds up as : 4 * : selected {2} {2,1} {2,1,5} Huffman Coding : Let T be a text file containing characters c1, c2, .., cn with frequency f1, f2, ., fn (or probability). The code for a sequence of characters will simply be the concatenation of the codes of these characters. If we use a binary code C of length li for ci, then the file has length (expected length) L(C) = i lifi. A code is called unambiguous if it satisfies the property that the code for a sequence of characters a1a2 .an does not match the code for another sequence b1b2 bm unless m = n and ai = bi. We want to choose an unambiguous code C such that L(C) is minimum. Such a code is called an optimal code. Fixed length code like ASCII are unambiguous but may not optimize L(C). Suppose there is a file containing 100,000 characters consisting of the characters a,b,c,d,e,f with frequency 45,13,12,16,9,5 thousands. For a fixed length code 3 bits (minimum of 3 bits required) leads to a size 300K bits. A variable length code 0,101,100,111,1101,1100 gives a size 224K bits. It can be proved that this is an optimal code. For a string s[1..n] a substring s[1..m] with m <= n is called a prefix. The code for a character is called a codeword. A code is called a prefix code if no codeword is a prefix of another codeword. A prefix code is always unambiguous. We can consider prefix codes only since it can be proved that there is always one optimal prefix code. A binary prefix code can be represented by a binary tree whose leafs are the characters to be coded. The code is given by a path to the leaf putting a 1 if we go to the right child and 0 if we go to the left child. In our example the code tree is 0 1 a 0 1 1 0 1 0 b c 0 5 f 1 d e Encoding for any binary code is easy. We have a list of characters and their codes and simply concatenate the codes for the characters of the file e.g. fabc will be coded as 11000101100 with the above code. For a binary prefix code, decoding is also simple using the code-tree. We descend the tree from the root according to the bits in the file. As soon as a leaf is reached, a character is detected and we start from the root again. We omit the proofs of the following theorems. Theorem 1 : In an optimal binary prefix code any non-leaf node of the codetree has two children (i.e. is a full binary tree). Such a tree has |C| - 1 non-leaf nodes. Proof : Suppose a non-leaf node has one child. Let it be replaced by the child. Then lengths of the codes of all the leaf nodes which are descendents of this node will be reduced by one which implies that the original code was not optimal. The fact that such a tree has |C| - 1 non-leaf nodes follows from induction. Theorem 2 : If c is character with the longest code in an optimal binary prefix code then its leaf node has a sibling giving a code of the same length. Proof : A leaf with maximum depth must have a sibling by Theorem 1. This sibling cannot be a non-leaf node because then there will be nodes at a higher depth. The theorem follows. Theorem 3: Let c and c be characters represented by sibling nodes of biggest depth in an optimal binary prefix code for the character set S. Combine characters c and c to c with frequency f(c) + f(c ) and let S be the new character set. The nodes for c and c are combined with the parent to give a new leaf node for c and let T be the resulting code tree for S . Then T is an optimal code tree for S . Proof : Let T1 be another code tree for S . Split c with frequency f(c)+f(c ) into a non-leaf node and two child nodes with frequency f(c) and f(c ). Let T1 be the resulting code tree for S. Since the codes of c, c are of length one more than that of the code of c , we get, L(T1) = L(T1 ) + f(c) + f(c ) and L(T) = L(T ) + f(c) + f(c ) Since T is optimal L(T1) >= L(T) and hence L(T1 ) >= L(T ) i.e. T is optimal for S . 6 | | | S, T | | | S T c f(c) + f(c ) c f(c) c f(c ) | | | S, T1 / c | | | S , T1 | c c | | f(c)+f(c ) If we know how to choose c and c correctly then we get an algorithm for the code by combining c and c , getting a new smaller character set and going on like this recursively. The greedy choice is to choose c and c to be the characters with the least frequencies. We can prove that the resulting greedy algorithm correctly gives an optimal binary prefix code. An iterative implementation maintains a priority queue Q with nodes of the tree with key value f(x). This is known as the Huffman algorithm : Huffman(S,f) // S a character set with frequency function f { n = |S| Q = buildheap(S,f) // O(n) For I = 1 to n 1 // O(n) times {z = new node; z->left = x = deletemin(Q); // O(logn) z->right = y = deletemin(Q); // O(logn) z->f = x->f + y->f insert(Q,z); // O(logn) } return deletemin(Q); } The complexity is O(nlogn) Theorem 4 : The Huffman algorithm is correct. Proof : It is sufficient to prove that there is an optimal code tree with two least frequent characters being sibling nodes of biggest depth. Let T be an optimal code 7 tree where a,b are sibling nodes of biggest depth. Let c, c be the nodes for the two least frequent characters. Exchange nodes a with node c and node b with c to obtain the code tree T. Then dT(c) = dT (a) = dT(c ) = dT (b) = d1 dT(a) = dT (c) = d2 ; dT(b) = dT (c ) = d3 f(a) >= f(c) and f(b) >= f(c ) Also d2 = dT (c) <= dT(c) = d1 and d3 = dT (c ) <= dT(c ) = d1. Hence L(T) L(T ) = f(c)dT(c) + f(c )dT(c ) + f(a)dT(a) + f(b)dT(b) - f(c)dT (c) - f(c )dT (c ) f(a)dT (a) - f(b)dT (b) = f(c)(dT(c) dT (c)) + f(a)(dT(a) dT (a)) + f(b)(dT(b) dT (b)) + f(c )(dT(c ) dT (c )) = f(c)(d1 d2) + f(a)(d2 d1) +f(b)(d3 d1) + f(c )(d1 d3) = - (f(a) f(c))(d1 d2) (f(b) f(c ))(d1 d3) <= 0 Therefore T is also optimal. T | | a b T c d c 8 | | d a b Example 5.3 : Determination of the Optimal code tree for C given a/45 b/13 c/12 d/16 a/45 14 e/9 b/13 f/5 c/12 d/16 e/9 f/5 25 14 f/5 c/12 e/9 b/13 a/45 25 30 c/12 14 f/5 d/16 b/13 a/45 d/16 e/9 55 30 25 a/45 14 d/16 c/12 b/13 e/9 f/5 above. 9 100 a/45 55 25 c/12 30 b/13 14 f/5 d/16 e/9 Knapsack problem : There are n items with values vi and weights wi for i = 1 to n. There is a knapsack which can hold a maximum weight W. W, vi and wi are positive integers. Which items should be put I;n the knapsack so that it has the highest value. Since an item can either be taken as a whole or not taken, this problem is known as the 0-1 Knapsack problem. In the fractional Knapsack problem fractions of the items can be taken. A greedy algorithm works in this case. We sort the items in descending order of vi/wi. We try to take as much as possible of the first item. If that gets exhausted we go to the next item and so on. Thus we get : Sort the items in descending order of vi/wi A=W;i=1 While (A > 0 and i <= n) {If (A >= wi) {put wi of item i; A = A - wi } Else {put A of item I; A = 0; i++ } The complexity is (nlogn). Exercise : Prove that the algorithm is correct. The greedy algorithm does not work for the 0-1 Knapsack problem. For example for 10 Items -> v w v/w 12 6 10 12 65 3 12 3 4 W=5 The greedy algorithm gives items 1,2 with value 16 whereas optimal is 2,3 with value 22 A dynamic programming algorithm for the 0-1 Knapsack problem is based on the following recursion : Let c(i,j) be the optimum value for items 1,2, .,i and knapsack weight j. Then c(0,j) = 0 c(i,0) = 0 c(i,j) = max(c(i-1,j), c(i-1,j-wi) + vi) if j >= wi else = c(i-1,j) We need c(n,W) Let s(i,j) be 1 if item i is selected in c(i,j), then the algorithm is c(i,0) = c(0,j) = 0; s(i,j) = 0 For i = 1 to n For j =1 to W If (j < wi) c(i,j) = c(i-1,j); Else if (c(i-1,j) > c(i-1,j-wi)+ vi) c(i,j) = c(i-1,j); Else {c(i,j) = c(i-1,j-wi) + vi ; s(i,j) = 1 } S= // Set of selected items i=n;j=W while (i>0 and j > 0) {if (s(i,j) = 1) {j = j wi ; insert i in S;} i--; } Complexity is (nW) For the above example : c(i,j) j 0 1 2 i| v w 0000 6 1 1066 10 2 2 0 6 10 12 3 3 0 6 10 s(i,j) j i| 0 1 2 3 3 4 5 0 6 16 16 0 6 16 18 0 6 16 22 0 1 2 3 4 5 0 0 0 0 0 1 0 0 0 1 1x 0 0 1 1 0 0 1 1 1 0 1 1 1x Exercise 5.4 : Prove that this is not a polynomial-time algorithm. 11 Matroids : In many optimization problems maximize f : M R, M is a set of subsets of some set X. A greedy algorithm builds up the optimal subset A by inserting elements one by one selected by a greedy choice (e.g. Activity selection problem). M is called hereditary if it is closed under taking subsets i.e. if I is in M and I is a subset of I then I is also in M . Elements of S are called independent. Let an element x of X have a nonnegative weight w(x). The combinatorial optimization problem for the hereditary system M is to find the maximum weight independent set I in M. In certain hereditary subsets the following greedy algorithm solves the combinatorial optimization problem associated with M and the weight function w. Greedy(M,w) { A = empty B=M While (B not empty) {select maximum weight element x in B Delete x from B If A U {x} is independent A = A U {x} } } A hereditary system of subsets M of X is called a matroid if M has the exchange property i.e. if I and I are independent and |I | > |I| then there is some x in I I such that I U {x} is independent. Theorem : The following are equivalent. 1) M is a matroid. 2) If A is a subset of X and I, I are maximal independent subsets of A then |I| = |I |. 3) The greedy algorithm solves the combinatorial optimization problem associated with M with respect to any nonnegative weight function on X. Proof : 1) implies 2) : Let I, I be maximal independent subsets of A where A is some subset of X. If possible let |I| < |I |. Then there is some x in I I such that I U {x} is independent which implies that I is not maximal a contradiction. 2) implies 3) : Suppose the greedy algorithm does not solve the problem for some set of nonnegative weights w(e) i.e. the greedy algorithm yields a maximal independent set I = {e1, e2, ei} whereas there is a set J = {e1 , e2 , ,ej } such that w(J) > w(I). We can assume that J is maximal since if it is not we can insert some more elements till J becomes maximal. By property 2) i = j. We assume the elements of I and J to be ordered such that w(e1) >= w(e2) >= . w(ei) and w(e1 ) >= w(e2 ) >= .. w(ei ). We shall show that w(ek) >= w(ek ) which will contradict w(J) > w(I) and prove our assertion. For k = 1 this is trivially true. Suppose this is true for k = 1,2, ,m-1 and not true for k = m. Consider the set A = { e in X : w(e) >= w(em )}. Now {e1, e2, .., em-1} is a 12 maximal independent subset of A, because if {e1, e2, , em-1, e} is independent and w(e) >= w(em ) > w(em) then the greedy algorithm would have chosen e instead of em as the next element of I. However {e1 ,e2 ,..em } is another independent subset of A of larger number of elements. This proves that w(ek) >= w(ek ) for k = 1,2, i. 3) implies 1) : Suppose the greedy algorithm solves the combinatorial optimization problem for M with any nonnegative weight w. Suppose there are two independent subsets I and I with |I | > |I|. Then there is a subset I of I such that |I | = |I| + 1. Obviously I is independent and if there is e in I I such that I U {e} is independent then e is in I I also. Therefore our assertion will be proved if we can prove that for independent subsets I and I with |I| = p and |I | = p+1, there is some e in I such that I U {e} is independent. Suppose there is no such e. We consider the following weights : w(e) = p + 2 if e is in I p + 1 if e is in I I 0 otherwise Then w(I ) >= (p+1)2 > p(p+2) > w(I). Thus w(I) is not the optimal weight. However for these weights the greedy algorithm at first pick up all the elements of I and thereafter elements of zero weight only. Hence it will give the optimal weight to be w(I) i.e. the greedy algorithm will not work for these weights. This proves our assertion. Theorem : The set of acyclic subgraphs of a graph is a matroid. Proof : Let A be any subgraph of a graph G. Let A have components C1, C2, . Ck . Let n1, n2, nk be to number of the vertices in these. Then any maximal acyclic subgraph of A will be the union of spanning trees of the components and hence will have n1 + n2 + .+ nk - k edges. Hence by the above theorem the set of acyclic subgraphs of a graph is a matroid. For the MST problem we can take the new weight w = w0 w, where w0 > max(w). Then the MST problem is equivalent to the maximum weight independent set problem for the matroid of acyclic subgraphs. A Task-Scheduling problem : Given 1) a set of unit-time tasks S={a1,a2, ,an} scheduled to start at t=0 2) a set of integer deadlines d1,d2, dn such that 1<= di <=n 3) a set of penalties w1,w2, ..wn such that if ai is not finished by di, there is a penalty wi. Find a schedule for S that minimizes the total penalty. 13 We say that a task is early in a schedule if it finishes within the deadline otherwise we say that it is late. Given a schedule we can reschedule with early tasks coming first with increasing deadlines followed by late tasks in any order without changing the penalty. If a late task is followed by an early task they can be interchanged without changing the penalty. If the early jobs are not in increasing deadline then there must be ai, aj finishing at k and k+1 respectively with di>dj. Since aj is early dj > k+1 and hence di > k+1. Hence after interchanging ai is early. Since aj finishes earlier it remains early. Let a subset A of S be called independent if there is a schedule for A in which each task in A is early. Let the collection of independent subsets be M. Obviously M is hereditary. We shall prove that M is a matroid. Since minimizing the penalty for the late tasks is that same as maximizing the penalty for the early tasks, our problem will be to find the maximum weight independent set in this weighted matroid. For a set A of tasks let Nt(A) = number of tasks in A whose deadlines are <= t. N0(A) = 0 Lemma : For any set A of tasks the following are equivalent. 1) A is in M. 2) For t = 0 to n, Nt(A) <= t. 3) A can be scheduled with increasing deadlines such that no tasks are late. Proof : 1) implies 2) : If Nt(A) > t then this many tasks cannot be scheduled to end on or before t. 2) implies 3) : Since i-th largest deadline is at least i, it can be scheduled to end at i. 3) implies 1) Follows from the definition of M. Theorem : M is a matroid. Proof : Let A,B be in M with |A| < |B|. We have N0(A) = N0(B) = 0 and Nn(A) |A| < |B| < Nn(B) . Let k < n be such that Nj(A) < Nj(B) for j > k and Nk(B) <= Nk(A). Therefore Nk+1(B) > Nk+1(A) and therefore there is a task in B A with deadline k+1. Let A = AU{a}. For j <= k Nj(A ) = Nj(A) <= j since A is in M. Also for j >k Nj(A ) = Nj(A) + 1 <= Nj(B) <= j since B is in M. Therefore A is in M which concludes the proof. Because of this the greedy algorithm will work. The checking of the independence of a subset A can be done as follows : Initialize E(1..n) = 0 For a in A E(da)++ ni = 0 For i = 1 to n {ni += e(i) if (ni > i) return not independent 14 } return independent This will be (n) and hence the greedy algorithm will be (n2) Example : ai 1 2 3 4 5 6 7 di 4 2 4 1 6 4 3 wi 20 60 70 30 10 50 40 Chosen at stage x21x534 Optimal schedule : a2, a7, a6, a3, a5, a1, a4 a1, a4 late Penalty 20 + 30 = 50 15

Formatting page ...

Formatting page ...

Formatting page ...

Formatting page ...

Formatting page ...

Formatting page ...

Formatting page ...

Formatting page ...

Formatting page ...

Formatting page ...

Formatting page ...

Formatting page ...

Formatting page ...

Formatting page ...

 

  Print intermediate debugging step

Show debugging info


 

 


© 2010 - 2026 ResPaper. Terms of ServiceContact Us Advertise with us

 

suan chat