# Algorithm Design and Analysis Notes 0x03: Greedy Algorithms (2)

## Optimal Off-line Caching

• Goal: given a cache of size k and a known sequence of item requests, minimize the number of evictions (cache misses).
• Farthest-in-future: evict the item in the cache whose next request occurs farthest in the future.

• Reduced schedule: a schedule that brings an item into the cache only at the moment that item is requested.
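As a minimal sketch (a hypothetical helper, not from the notes), the farthest-in-future rule can be simulated directly on a known request sequence, counting evictions:

```python
def farthest_in_future(requests, cache_size):
    """Count evictions incurred by the farthest-in-future rule
    on a fully known (offline) request sequence."""
    cache = set()
    evictions = 0
    for i, d in enumerate(requests):
        if d in cache:
            continue  # cache hit: a reduced schedule does nothing
        if len(cache) == cache_size:
            # Evict the cached item whose next request is farthest
            # in the future (or never requested again).
            def next_use(x):
                for j in range(i + 1, len(requests)):
                    if requests[j] == x:
                        return j
                return float('inf')
            cache.remove(max(cache, key=next_use))
            evictions += 1
        cache.add(d)  # bring d in only when it is requested (reduced)
    return evictions
```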

### Transformation Proof

• Claim: any unreduced schedule can be transformed into a reduced schedule with no more evictions.
• Pf (by induction on number of unreduced items)
• Suppose S brings d into the cache at time t, without a request.
• Let c be the item S evicts when it brings d into the cache.
• Case 1: d evicted at time t’, before next request for d. Then S’ can skip bringing d in at all, with no extra eviction.
• Case 2: d requested at time t’ before d is evicted. Then S’ can wait and bring d in at time t’, when it is actually requested, with no extra eviction.

### Lemma

Let S be a reduced schedule that makes the same eviction decisions as $S_{FF}$ through the first j requests. Then there is a reduced schedule S’ that makes the same eviction decisions as $S_{FF}$ through the first j+1 requests, and incurs no more evictions than S does.

• Consider (j+1)st request $d = d_{j+1}$.
• Since S and $S_{FF}$ have agreed up until now, they have the same cache contents before request j+1.
• Case 1: (d is already in the cache). S’ = S
• Case 2: (d is not in the cache and S and $S_{FF}$ evict the same element). S’ = S
• Case 3: (d is not in the cache; $S_{FF}$ evicts e; S evicts f $\neq$ e).
• begin construction of S’ from S by evicting e instead of f
• now S’ agrees with $S_{FF}$ on first j+1 requests
• From request j+2 onward, make S’ behave the same as S, until the first time this becomes impossible, which must involve e or f.
• Let j’ be the first time after j+1 that S and S’ take a different action (it must involve e or f, or both), and let g be the item requested at time j’.
• Case 3a: g = e. Can’t happen with Farthest-In-Future since there must be a request for f before e.
• Case 3b: g $\neq$ e, f. S must evict e. Make S’ evict f; now S and S’ have the same cache.
• Case 3c: g = f. Element f can’t be in cache of S, so let e’ be the element that S evicts.
• if e’ = e, S’ accesses f from cache; now S and S’ have same cache
• if e’ $\neq$ e, S’ evicts e’ and brings e into the cache; now S and S’ have the same cache. S’ is no longer reduced, but can be transformed into a reduced schedule which
• a) agrees with $S_{FF}$ through step j+1
• b) has no more evictions than S

### Optimality Proof

Theorem. Farthest-in-future is an optimal eviction algorithm.

• Pf. (by induction on number of requests j)
• Base case (trivial):
• There exists an optimal reduced schedule S that makes the same eviction decisions as $S_{FF}$ through the first 0 requests.
• Inductive step (implied by the lemma):
• If there exists an optimal reduced schedule S that agrees with $S_{FF}$ through the first j requests,
• then there exists an optimal reduced schedule S’ that agrees with $S_{FF}$ through the first j+1 requests.

### On-line vs. Off-line

Offline: full sequence of requests is known a priori.
Online (reality): requests are not known in advance.

• LIFO. Evict page brought in most recently.
• LRU. Evict page whose most recent access was earliest.
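As an illustration (a sketch, not part of the notes), LRU can be simulated with `collections.OrderedDict`, which keeps insertion/access order:

```python
from collections import OrderedDict

def lru_misses(requests, cache_size):
    """Count cache misses under the LRU eviction policy."""
    cache = OrderedDict()
    misses = 0
    for d in requests:
        if d in cache:
            cache.move_to_end(d)       # refresh recency on a hit
        else:
            misses += 1
            if len(cache) == cache_size:
                cache.popitem(last=False)  # evict least recently used
            cache[d] = True
    return misses
```

LRU performs well in practice because real request streams exhibit locality of reference.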

## Minimum Spanning Tree

Given a connected graph G = (V, E) with real-valued edge weights $c_e$, a minimum spanning tree (MST) is a subset of the edges T $\subseteq$ E such that T is a spanning tree whose sum of edge weights is minimized.

Cayley’s formula. There are $n^{n-2}$ spanning trees of the complete graph on n vertices.
Assume all edge costs $c_e$ are distinct.

• Cut property: Let S be any subset of nodes, and let e be the min cost edge with exactly one endpoint in S. Then the MST contains e.
• Suppose e does not belong to T*, and let’s see what happens.
• Adding e to T* creates a cycle C in T*.
• e is in a cycle C with exactly one endpoint in S $\Rightarrow$ there exists another edge f in C with exactly one endpoint in S.
• T’ = T* $\cup$ { e } – { f } is also a spanning tree.
• Since $c_e$ < $c_f$, cost(T’) < cost(T*).
• Cycle property: Let C be any cycle, and let f be the max cost edge belonging to C. Then the MST does not contain f.
• Suppose f belongs to T*, and let’s see what happens.
• Deleting f from T* creates a cut S in T*.
• f is in the cycle C with exactly one endpoint in S $\Rightarrow$ there exists another edge e in C with exactly one endpoint in S.
• T’ = T* $\cup$ { e } – { f } is also a spanning tree.
• Since $c_e$ < $c_f$, cost(T’) < cost(T*).

### Prim’s Algorithm

• Initialize S = any node.
• Apply cut property to S.
• Add min cost edge in cutset corresponding to S to T, and add one new explored node u to S.

Implementation. Use a priority queue.

• Maintain set of explored nodes S.
• For each unexplored node v, maintain attachment cost a[v] = cost of cheapest edge from v to a node in S.
• $O(n^2)$ with an array; $O(m \log n)$ with a binary heap.
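The priority-queue implementation can be sketched as follows (a lazy-deletion variant; the adjacency-list format is an assumption, not from the notes):

```python
import heapq

def prim_mst_cost(graph, start):
    """graph: {node: [(cost, neighbor), ...]} for an undirected graph.
    Returns the total MST cost; O(m log n) with a binary heap."""
    explored = set()
    total = 0
    pq = [(0, start)]  # (attachment cost a[v], node v)
    while pq and len(explored) < len(graph):
        cost, u = heapq.heappop(pq)
        if u in explored:
            continue  # stale heap entry, skip (lazy deletion)
        explored.add(u)
        total += cost
        for c, v in graph[u]:
            if v not in explored:
                heapq.heappush(pq, (c, v))
    return total
```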

### Kruskal’s Algorithm

• Consider edges in ascending order of weight.
• Case 1: If adding e to T creates a cycle, discard e according to cycle property.
• Case 2: Otherwise, insert e = (u, v) into T according to cut property where S = set of nodes in u’s connected component.

Implementation. Use the union-find data structure.

• Build set T of edges in the MST.
• Maintain set for each connected component.
• $O(m \log m)$ for sorting and $O(m\,\alpha(m, n))$ for union-find, where $\alpha$ is the inverse Ackermann function.
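A compact sketch of Kruskal’s algorithm with union-find (path compression only; union by rank omitted for brevity; the edge-list format is an assumption):

```python
def kruskal_mst(n, edges):
    """edges: list of (cost, u, v) with nodes 0..n-1.
    Returns (total cost, list of MST edges)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    total, tree = 0, []
    for c, u, v in sorted(edges):      # ascending order of weight
        ru, rv = find(u), find(v)
        if ru != rv:                   # no cycle: safe by cut property
            parent[ru] = rv
            total += c
            tree.append((u, v))
    return total, tree
```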

There is another algorithm: the reverse-delete algorithm.

• Start with T = E. Consider edges in descending order of cost. Delete edge e from T unless doing so would disconnect T.
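A sketch of reverse-delete under the same assumed edge-list format (connectivity is rechecked by DFS after each tentative deletion, so this version is slow but direct):

```python
def reverse_delete(n, edges):
    """edges: list of (cost, u, v), nodes 0..n-1, graph connected.
    Returns the list of edges kept in the MST."""
    order = sorted(edges, reverse=True)  # descending order of cost
    kept = list(order)

    def connected(edge_set):
        adj = {i: [] for i in range(n)}
        for _, u, v in edge_set:
            adj[u].append(v)
            adj[v].append(u)
        seen, stack = {0}, [0]
        while stack:
            for w in adj[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return len(seen) == n

    for e in order:
        trial = [x for x in kept if x != e]
        if connected(trial):  # deleting e keeps T connected: delete it
            kept = trial
    return kept
```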

## Clustering

Given a set U of n objects labeled $p_1$ , …, $p_n$ , partition into
clusters s.t. objects in different clusters are far apart.

K-clustering. Divide objects into k non-empty groups.

Distance function

• $d(p_i,p_j)=0$ iff $p_i=p_j$ (identity of indiscernibles)
• $d(p_i,p_j)\geq 0$ (nonnegativity)
• $d(p_i,p_j)=d(p_j,p_i)$ (symmetry)

Spacing. Min distance between any pair of points in different clusters.

Clustering of maximum spacing. Given an integer k, find a k-clustering of maximum spacing.

### Greedy Clustering Algorithm

• Create n clusters, one for each object.
• Find the closest pair of objects such that each object is in a different cluster; add an edge between them and merge the two clusters.
• Repeat n-k times until there are exactly k clusters.

Key observation. This procedure is precisely Kruskal’s algorithm (except we stop when there are k connected components).
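The key observation can be sketched directly: run Kruskal-style merging until k components remain; the next distance that would merge two clusters is the spacing (the pairwise-distance list format and function name are assumptions):

```python
def max_spacing_k_clustering(n, dists, k):
    """dists: list of (d, i, j) pairwise distances, points 0..n-1.
    Returns the spacing of the k-clustering found by greedy merging."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    clusters = n
    for d, i, j in sorted(dists):      # closest pairs first
        ri, rj = find(i), find(j)
        if ri == rj:
            continue                   # same cluster already
        if clusters == k:
            return d                   # first inter-cluster distance = spacing
        parent[ri] = rj
        clusters -= 1
    return float('inf')                # k == 1: no inter-cluster pair
```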

Theorem. Let C* denote the clustering $C^*_1$, …, $C^*_k$ formed by deleting the k-1 most expensive edges of an MST. C* is a k-clustering of max spacing.
Pf. Let C denote some other clustering $C_1$, …, $C_k$.

• The spacing of C* is the length d* of the $(k-1)^{st}$ most expensive edge in the MST.
• Let $p_i$, $p_j$ be in the same cluster in C*, say $C^*_r$, but different clusters in C, say $C_s$ and $C_t$.
• Some edge (p, q) on the $p_i$-$p_j$ path in $C^*_r$ spans two different clusters in C.
• All edges on the $p_i$-$p_j$ path have length $\leq$ d* since Kruskal chose them.
• Spacing of C is $\leq$ d* since p and q are in different clusters.

• Kruskal’s algorithm repeatedly adds the shortest available edge until the graph becomes a single connected component.

## Coin Changing

• Goal: make change using as few coins as possible.
• The amount and the coin denominations are known.
• Cashier’s algorithm:
• At each iteration, add the coin of the largest value that does not take us past the amount to be paid.
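The steps above can be sketched as follows (a hypothetical helper, not from the notes):

```python
def cashier_change(amount, denominations):
    """Greedy change-making: repeatedly take the largest coin
    that does not overshoot the remaining amount."""
    coins = []
    for c in sorted(denominations, reverse=True):
        while amount >= c:
            amount -= c
            coins.append(c)
    return coins
```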

### Optimality Analysis

• Whether the greedy algorithm is optimal depends on the coin denominations. For U.S. coins (P = pennies, N = nickels, D = dimes, Q = quarters):

| k | $c_k$ | All optimal solutions must satisfy | Max value of coins 1, 2, …, k-1 in any OPT |
|---|-------|------------------------------------|--------------------------------------------|
| 1 | 1     | P $\leq$ 4                         |                                            |
| 2 | 5     | N $\leq$ 1                         | 4                                          |
| 3 | 10    | N + D $\leq$ 2                     | 4 + 5 = 9                                  |
| 4 | 25    | Q $\leq$ 3                         | 20 + 4 = 24                                |
| 5 | 100   | no limit                           | 75 + 24 = 99                               |
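To illustrate that optimality depends on the denominations (a standard counterexample, not from the notes), compare the greedy coin count with a dynamic-programming optimum:

```python
def greedy_coins(amount, denoms):
    """Cashier's greedy: number of coins taken largest-first."""
    count = 0
    for c in sorted(denoms, reverse=True):
        count += amount // c
        amount %= c
    return count

def optimal_coins(amount, denoms):
    """True minimum number of coins, by dynamic programming."""
    best = [0] + [float('inf')] * amount
    for a in range(1, amount + 1):
        best[a] = min(best[a - c] + 1 for c in denoms if c <= a)
    return best[amount]

# U.S. coins: greedy matches the optimum (99 cents takes 9 coins).
# Denominations {1, 3, 4}: greedy pays 6 as 4+1+1, but 3+3 is optimal.
```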

## Proof Skills

• Greedy algorithm stays ahead. Show that after each step of the greedy algorithm, its solution is at least as good as any other algorithm’s.
• Structural. Discover a simple “structural” bound asserting that every possible solution must have a certain value. Then show that your algorithm always achieves this bound.
• Exchange argument. Gradually transform any solution to the one found by the greedy algorithm without hurting its quality.