Algorithm Design and Analysis Notes 0x03: Greedy Algorithms (2)

Optimal Off-line Caching

  • Goal: minimize the number of evictions (cache misses).
  • Farthest-in-future: evict the item in the cache whose next request in the read sequence is farthest in the future (see the sketch below).

  • Reduced schedule: a schedule that brings an item into the cache only at the moment that item is requested.
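A minimal sketch of the farthest-in-future rule on an explicit request sequence; the function name, the sample sequence, and the cache size k are illustrative, not part of the notes:

```python
def farthest_in_future(requests, k):
    """Simulate farthest-in-future eviction on a known request sequence; return the number of evictions."""

    def next_use(x, start):
        # Index of the next request for x at or after `start`; items never requested again are "farthest".
        for j in range(start, len(requests)):
            if requests[j] == x:
                return j
        return float("inf")

    cache, evictions = set(), 0
    for i, d in enumerate(requests):
        if d in cache:
            continue                                            # cache hit
        if len(cache) < k:
            cache.add(d)                                        # reduced schedule: bring d in only when it is requested
            continue
        victim = max(cache, key=lambda x: next_use(x, i + 1))   # evict the item requested farthest in the future
        cache.remove(victim)
        cache.add(d)
        evictions += 1
    return evictions

print(farthest_in_future(list("abcbcabcab"), k=2))              # 4 evictions on this sequence
```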

Transformation Proof

  • Any unreduced schedule can be transformed into a reduced schedule without increasing the number of evictions.
  • Pf. (by induction on the number of unreduced items)
    • Suppose S brings d into the cache at time t, without a request.
    • Let c be the item S evicts when it brings d into the cache.
    • Case 1: d is evicted at time t’, before the next request for d. Then the modified schedule keeps c in the cache and never brings d in at all; evictions do not increase.
    • Case 2: d is requested at time t’ before d is evicted. Then the modified schedule brings d into the cache (evicting c) at time t’ instead of at time t; evictions do not increase.

Lemma

Let S be a reduced schedule that makes the same eviction decisions as $S_{FF}$ through the first j requests. Then there is a reduced schedule S’ that makes the same eviction decisions as $S_{FF}$ through the first j+1 requests, and incurs no more evictions than S does.

  • Consider (j+1)st request $d = d_{j+1}$.
  • Since S and $S_{FF}$ have agreed up until now, they have the same cache contents before request j+1.
  • Case 1: (d is already in the cache). S’ = S
  • Case 2: (d is not in the cache and S and $S_{FF}$ evict the same element). S’ = S
  • Case 3: (d is not in the cache; $S_{FF}$ evicts e; S evicts f $\neq$ e).
    • begin construction of S’ from S by evicting e instead of f
    • now S’ agrees with $S_{FF}$ on first j+1 requests
    • From request j+2 onward, we make S’ the same as S, but this becomes impossible when e or f is involved
    • Let j’ be the first time after j+1 that S and S’ take a different action (it must involve e or f, or both), and let g be the item requested at time j’.
    • Case 3a: g = e. Can’t happen with Farthest-In-Future since there must be a request for f before e.
    • Case 3b: g $\neq$ e, f. S must evict e. Make S’ evict f; now S and S’ have the same cache.
    • Case 3c: g = f. Element f can’t be in cache of S, so let e’ be the element that S evicts.
      • if e’ = e, S’ accesses f from cache; now S and S’ have same cache
      • if e’ $\neq$ e, S’ evicts e’ and brings e into the cache; now S and S’ have the same cache. S’ is no longer reduced, but can be transformed into a reduced schedule which
        • a) agrees with $S_{FF}$ through step j+1
        • b) has no more evictions than S

“Optimal” Proof

Farthest-in-future is an optimal eviction algorithm.

  • Pf. (by induction on number of requests j)
    • Base case (trivial):
      • There exists an optimal reduced schedule S that makes the same eviction decisions as $S_{FF}$ through the first 0 requests.
    • Inductive step (implied by the lemma):
      • If there exists an optimal reduced schedule S that agrees with $S_{FF}$ through the first j requests,
      • then there exists an optimal reduced schedule S’ that agrees with $S_{FF}$ through the first j+1 requests.

On-line vs. Off-line

Offline: full sequence of requests is known a priori.
Online (reality): requests are not known in advance.

  • LIFO. Evict page brought in most recently.
  • LRU. Evict page whose most recent access was earliest.
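To illustrate the online setting, a minimal LRU sketch; the function name and the sample sequence are made up:

```python
from collections import OrderedDict

def lru_misses(requests, k):
    """Count cache misses under LRU: evict the page whose most recent access was earliest."""
    cache = OrderedDict()                   # keys ordered from least to most recently used
    misses = 0
    for page in requests:
        if page in cache:
            cache.move_to_end(page)         # refresh recency on a hit
            continue
        misses += 1
        if len(cache) == k:
            cache.popitem(last=False)       # evict the least recently used page
        cache[page] = True
    return misses

print(lru_misses(list("abcbcabcab"), k=2))  # 8 misses on this sequence
```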

Minimum Spanning Tree

Given a connected graph G = (V, E) with real-valued edge weights $c_e$, a Minimum Spanning Tree (MST) is a subset of the edges T $\subseteq$ E such that T is a spanning tree whose sum of edge weights is minimized.

Cayley’s formula. There are $n^{n-2}$ spanning trees of the complete graph on n vertices.
Assume all edge costs $c_e$ are distinct.

  • Cut property: Let S be any subset of nodes, and let e be the min cost edge with exactly one endpoint in S. Then the MST contains e.
    • Suppose e does not belong to T*, and let’s see what happens.
    • Adding e to T* creates a cycle C.
    • Edge e is in cycle C and has exactly one endpoint in S $\Rightarrow$ there exists another edge f in C with exactly one endpoint in S.
    • T’ = T* $\cup$ { e } – { f } is also a spanning tree.
    • Since $c_e$ < $c_f$, cost(T’) < cost(T*).
    • This is a contradiction.
  • Cycle property: Let C be any cycle, and let f be the max cost edge belonging to C. Then the MST does not contain f.
    • Suppose f belongs to T*, and let’s see what happens.
    • Deleting f from T* disconnects T*; let S be the set of nodes on one side of the resulting cut.
    • Edge f is in cycle C and has exactly one endpoint in S $\Rightarrow$ there exists another edge e in C with exactly one endpoint in S.
    • T’ = T* $\cup$ { e } – { f } is also a spanning tree.
    • Since $c_e$ < $c_f$, cost(T’) < cost(T*).
    • This is a contradiction.

Prim’s Algorithm

  • Initialize S = { any node }.
  • Apply the cut property to S.
  • Add the min cost edge in the cutset corresponding to S to T, add the newly explored endpoint u to S, and repeat until S = V.

Implementation Use a priority queue (a sketch follows after this list).

  • Maintain set of explored nodes S.
  • For each unexplored node v, maintain attachment cost a[v] = cost of cheapest edge from v to a node in S.
  • $O(n^2)$ with an array; $O(m \log n)$ with a binary heap.
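A minimal heap-based Prim sketch, assuming the graph is given as an adjacency-list dict {node: [(cost, neighbor), ...]}; it uses lazy deletion (duplicate heap entries) rather than maintaining the attachment costs a[v] with decrease-key:

```python
import heapq

def prim_mst(adj):
    """adj: {u: [(cost, v), ...]} for a connected, undirected graph. Returns (total cost, MST edges)."""
    start = next(iter(adj))
    explored = {start}                                 # the set S of explored nodes
    pq = [(c, start, v) for c, v in adj[start]]        # edges leaving S, keyed by cost
    heapq.heapify(pq)
    total, mst = 0, []
    while pq and len(explored) < len(adj):
        c, u, v = heapq.heappop(pq)
        if v in explored:
            continue                                   # stale entry: v was already reached more cheaply
        explored.add(v)                                # cut property: (u, v) is the min cost edge leaving S
        total += c
        mst.append((u, v, c))
        for c2, w in adj[v]:
            if w not in explored:
                heapq.heappush(pq, (c2, v, w))
    return total, mst

adj = {
    'a': [(4, 'b'), (1, 'c')],
    'b': [(4, 'a'), (2, 'c'), (5, 'd')],
    'c': [(1, 'a'), (2, 'b'), (8, 'd')],
    'd': [(5, 'b'), (8, 'c')],
}
print(prim_mst(adj))                                   # total cost 1 + 2 + 5 = 8
```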

 

Kruskal’s Algorithm

  • Consider edges in ascending order of weight.
  • Case 1: If adding e to T creates a cycle, discard e according to cycle property.
  • Case 2: Otherwise, insert e = (u, v) into T according to cut property where S = set of nodes in u’s connected component.

Implementation Use the union-find data structure (a sketch follows after this list).

  • Build set T of edges in the MST.
  • Maintain set for each connected component.
  • $O(m \log m)$ for sorting and $O(m \, \alpha(m, n))$ for union-find operations, where $\alpha$ is the inverse Ackermann function.
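A minimal Kruskal sketch with a path-compressed union-find (union by rank is omitted for brevity); the (cost, u, v) edge-list format is an assumption:

```python
def kruskal_mst(n, edges):
    """edges: list of (cost, u, v) over nodes 0..n-1. Returns (total cost, MST edges)."""
    parent = list(range(n))

    def find(x):                              # union-find root with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, mst = 0, []
    for c, u, v in sorted(edges):             # consider edges in ascending order of cost
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                          # cycle property: adding (u, v) would create a cycle
        parent[ru] = rv                       # cut property: join the two components
        total += c
        mst.append((u, v, c))
    return total, mst

edges = [(4, 0, 1), (1, 0, 2), (2, 1, 2), (5, 1, 3), (8, 2, 3)]
print(kruskal_mst(4, edges))                  # total cost 1 + 2 + 5 = 8
```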

 

There is another algorithm: the Reverse-Delete algorithm (sketched below).

  • Start with T = E. Consider edges in descending order of cost. Delete edge e from T unless doing so would disconnect T.
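A minimal, deliberately naive sketch of Reverse-Delete (it re-runs a DFS connectivity check per edge, so it is far from the best known running time); the node list and edge format are assumptions:

```python
def reverse_delete_mst(nodes, edges):
    """edges: list of (cost, u, v) with distinct costs. Returns the MST edge list."""
    tree = list(edges)                                # start with T = E

    def connected(edge_set):
        adj = {u: [] for u in nodes}
        for _, u, v in edge_set:
            adj[u].append(v)
            adj[v].append(u)
        seen, stack = {nodes[0]}, [nodes[0]]          # DFS from an arbitrary node
        while stack:
            for w in adj[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return len(seen) == len(nodes)

    for e in sorted(edges, reverse=True):             # consider edges in descending order of cost
        without_e = [x for x in tree if x != e]
        if connected(without_e):                      # delete e unless doing so would disconnect T
            tree = without_e
    return tree

nodes = [0, 1, 2, 3]
edges = [(4, 0, 1), (1, 0, 2), (2, 1, 2), (5, 1, 3), (8, 2, 3)]
print(reverse_delete_mst(nodes, edges))               # same MST as Kruskal: cost 1 + 2 + 5 = 8
```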

Clustering

Given a set U of n objects labeled $p_1$, …, $p_n$, partition them into clusters so that objects in different clusters are far apart.

K-clustering Divide objects into k non-empty groups.

Distance function

  • $d(p_i,p_j)=0$ iff $p_i=p_j$ (identity of indiscernibles)
  • $d(p_i,p_j)\geq 0$ (nonnegativity)
  • $d(p_i,p_j)=d(p_j,p_i)$ (symmetry)

Spacing Min distance between any pair of points in different clusters.

Clustering of maximum spacing Given an integer k, find a k-clustering of maximum spacing.

Greedy Clustering Algorithm

Single-link k-clustering algorithm

  • Create n clusters, one for each object.
  • Find the closest pair of objects such that each object is in a different cluster; add an edge between them and merge the two clusters.
  • Repeat n-k times until there are exactly k clusters.

Key observation. This procedure is precisely Kruskal’s algorithm (except we stop when there are k connected components); a sketch follows below.
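A minimal sketch, assuming the input is the full list of pairwise distances as (d, i, j) triples; it is the Kruskal sketch above stopped at k components, and it also reports the resulting spacing:

```python
def single_link_clustering(n, dists, k):
    """dists: list of (d(p_i, p_j), i, j) over all pairs, points labelled 0..n-1."""
    parent = list(range(n))

    def find(x):                                  # union-find root with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components, spacing = n, float("inf")
    for d, i, j in sorted(dists):                 # closest pairs first, exactly as in Kruskal
        ri, rj = find(i), find(j)
        if ri == rj:
            continue                              # already in the same cluster
        if components == k:
            spacing = d                           # next Kruskal edge = (k-1)st most expensive MST edge
            break
        parent[ri] = rj                           # merge the two clusters
        components -= 1

    clusters = {}
    for p in range(n):
        clusters.setdefault(find(p), []).append(p)
    return list(clusters.values()), spacing

points = [0, 1, 9, 10, 22]                        # toy 1-D data: d(p_i, p_j) = |p_i - p_j|
dists = [(abs(points[i] - points[j]), i, j) for i in range(5) for j in range(i + 1, 5)]
print(single_link_clustering(5, dists, k=3))      # ([[0, 1], [2, 3], [4]], 8)
```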

Theorem Let C* denote the clustering $C^*_1$, …, $C^*_k$ formed by deleting the k-1 most expensive edges of an MST. C* is a k-clustering of max spacing.
Pf. Let C denote some other clustering $C_1$, …, $C_k$.

  • The spacing of C* is the length d* of the $(k-1)^{st}$ most expensive edge in the MST.
  • Since C $\neq$ C*, there exist $p_i$, $p_j$ in the same cluster in C*, say $C^*_r$, but in different clusters in C, say $C_s$ and $C_t$.
  • Some edge (p, q) on the $p_i$-$p_j$ path in $C^*_r$ spans two different clusters in C.
  • All edges on the $p_i$-$p_j$ path have length $\leq$ d* since Kruskal chose them.
  • Spacing of C is $\leq$ d* since p and q are in different clusters.

  • Kruskal’s algorithm repeatedly picks the shortest available edge until the graph becomes a single connected component.
  • The single-link k-clustering algorithm repeatedly picks the shortest available edge until there are exactly k connected components (clusters).

Coin Changing

  • Goal: make change using as few coins as possible.
    • The amount to be paid and the coin denominations are given.
  • Cashier’s algorithm
    • At each iteration, add the coin of the largest value that does not take us past the amount to be paid.

Pseudocode
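A minimal sketch of the cashier’s rule for the U.S. denominations; the function name and the sample amount are illustrative:

```python
def cashier(amount, denominations=(100, 25, 10, 5, 1)):
    """Greedy change-making: repeatedly take the largest coin that does not overshoot the amount."""
    coins = []
    for c in sorted(denominations, reverse=True):
        while amount >= c:               # largest coin value that does not take us past the amount
            amount -= c
            coins.append(c)
    return coins

print(cashier(134))                      # [100, 25, 5, 1, 1, 1, 1]: 7 coins
```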

 

  • Whether the greedy algorithm is optimal depends on the coin denominations (see the check after the table below).
| k | $c_k$ | All optimal solutions must satisfy | Max value of coins 1, 2, …, k-1 in any OPT |
|---|-------|------------------------------------|--------------------------------------------|
| 1 | 1     | P $\leq$ 4                         |                                            |
| 2 | 5     | N $\leq$ 1                         | 4                                          |
| 3 | 10    | N + D $\leq$ 2                     | 4 + 5 = 9                                  |
| 4 | 25    | Q $\leq$ 3                         | 20 + 4 = 24                                |
| 5 | 100   | no limit                           | 75 + 24 = 99                               |

(P = pennies, N = nickels, D = dimes, Q = quarters.)
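As a quick check of that dependence: for the hypothetical coin set {1, 3, 4} and amount 6, the cashier’s rule gives 4 + 1 + 1 (three coins) while 3 + 3 (two coins) is optimal. A small comparison against a dynamic-programming optimum:

```python
def greedy_coins(amount, denominations):
    """Cashier's rule: repeatedly take the largest coin that fits; returns the number of coins used."""
    count = 0
    for c in sorted(denominations, reverse=True):
        count += amount // c
        amount %= c
    return count

def min_coins_dp(amount, denominations):
    """Optimal number of coins by dynamic programming, for comparison."""
    best = [0] + [float("inf")] * amount
    for a in range(1, amount + 1):
        for c in denominations:
            if c <= a:
                best[a] = min(best[a], best[a - c] + 1)
    return best[amount]

print(greedy_coins(6, (1, 3, 4)), min_coins_dp(6, (1, 3, 4)))   # 3 vs. 2: greedy is not optimal here
```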

Proof Skills

  • Greedy algorithm stays ahead. Show that after each step of the greedy algorithm, its solution is at least as good as any other algorithm’s.
  • Structural. Discover a simple “structural” bound asserting that every possible solution must have a certain value. Then show that your algorithm always achieves this bound.
  • Exchange argument. Gradually transform any solution to the one found by the greedy algorithm without hurting its quality.
