# 算法基础笔记0x05 —— 查找与选择

## Search Algorithm

• 给定一个排好序的数组，搜索输入x
• Idea Maintain an interval between [l, r] where x may be. Shrink the interval as quickly as possible.
• Initially the interval is all of A, i.e. l = 1, r = |A|.
• Use a probe point m = (l + r) / 2.
• If x < A[m], x can’t lie in [m, r], so keep searching in [l, m-1].
• If x > A[m], keep searching in [m+1, r].
• If l = r but x ≠ A[m], then x is not in A.

• 一个函数 f 是单峰(unimodal)的如果存在一个 x, 使得在 y≤x上, f 单调递减, y>x上, f 单调递增
• 即 f 存在唯一最小值
• 反之亦然
• Our goal is to find x.
• Maintain an interval [l,r] containing x.
• Use two probe points $x_1$ and $x_2$.
• If $f(x_1)<f(x_2)$, $x_2$ mush be in the increasing part of f, and x can’t be in $[x_2, r]$. So we keep searching in $[l,x_2]$.
• If $f(x_1)>f(x_2)$, keep searching in $[x_1,r]$.
• If $f(x_1)=f(x_2)$, search in $[x_1,x_2]$.
• Set $x_1=l+\frac{r-1}{3}$, $x_2=r-\frac{r-l}{3}$.
• Then $[l,r]$ decreases by at least 1/3 each iteration.
• So we can find x in $O(log(R-L))$ iterations, where $[L,R]$ is an initial interval containing x.

## Selection

• Given list A of n numbers, and i ≤ n, select(A,i) finds i’th largest number in A.
• Assume numbers all distinct, for simplicity.
• Select generalizes median.
• median of A = Select(A, n/2).
• We show how to do select, and hence median, in O(n) time.
• 将A分为小的部分，递归选出各个小部分的中位数，最终再在各个中位数中选出中位数。

### Algorithm

• 假设n有i项，我们想要第i大的一项，为了简易举例，我们假设n整除5
• Divide A to n/5 parts of 5 elements each.
• For each part, find its median.
• Let B is the set of n/5 medians, recursively find median in B.
• Select(B, n/10)????

• Partition A using x as pivot.(like quicksort)
• Say there are k values ≤ x in A, and n-k values > x.
• If i = k, return x.
• We want the i’th largest item in A, which is x.
• If i < k, return Select(L, i).
• Ex If i=5 and k=10, then 5th largest item in A is 5 th largest item in L.
• Else i > k, return Select(R, i-k).
• Ex if i=15 and k=10, then 15th largest item in A is 5 th largest item in R.

### Analysis

• Let S(n) be an upper bound on select’s running time given a list of size n.
• S(n) = S(n/5) + S(u) + O(n)
• u is the size of whichever partition we recurse on.
• O(n) term comes from two parts.
• First, for each of the n/5 groups of 5 numbers, find their median in constant time ⇒ O(n) time overall.
• Next, partition array in O(n) time.
• S(n/5) to find the median of the n/5 group medians.
• The key to ensuring select runs fast is ensuring u is small.
• We will show u ≤ 7n/10. (i.e. length of left part L and right part R ≤ 7n/10)

• Groups of 5 elements are shown in columns.
• Medians are shown in white.
• Elements less than x are shown to in the shadow.
• There are n/5*1/2=n/10 medians less than x.
• For each such median, there are two more values from the group less than the median.
• There are 3 values from each group < x.
• So there are at least 3n/10 values < x. I.e. |L| ≥ 3n/10. Similarly, |R| ≥ 3n/10.
• 分为n/5个部分，其中右边的一半求出的中值小于最终中值，而在这一半中有3个元素小于最终中值。
• We have S(n) ≤ S(n/5) + S(7n/10) + O(n).
• Let S(n) = cn, O(n) = bn.
• cn = c(n/5) + c*7n/10 + bn, we can get c=10b.
• Thus, S(n) ≤ 10bn = O(n) for some constant b.

Used in Quicksort

• 因为Select算法时间上限为O(n)，我们可以用于quicksort(nlogn)中选取pivot的步骤
• 然而Select运算时间中省去的常数b相较于随机选取pivot的算法时间常数(1.39)太过庞大
• 所以大O分析省略了常数，有时隐藏了重要信息

## Randomized Select

• Just as quicksort runs fast in practice by selecting random pivots, we can make select run faster by selecting random pivots.
• The runtime is still O(n), but with a smaller constant factor.
• Pick a random pivot and partition the array.
• Again, use the quicksort partition method. Runs in O(n) time.
• Recurse on one of the parts that has the target value.
• Notice the difference with quicksort, which recurses on both parts.

• We will select i’th largest value in A.
• If A has one element, return it.
• Otherwise, choose a random index q and partition array wrt x=A[q].
• Suppose there are k-1 values < x, and n-k values > x.
• If i=k, then x is i’th element of A.
• Else if i<k, recursively select i’th element in left side.
• Else i>k, recursively select i-k’th element in right side.

### Analysis

• S(n)=S(u) + cn.
• u is the size of the side we recurse on.
• cn time to partition the array wrt random pivot.
• Assume u=n/2

S(n) = cn + S(n/2)
= cn + cn/2 + S(n/4)
= cn + cn/2 + cn/4 + S(n/8)
= …
= cn + cn/2 + cn/4 + … + c
= cn (1+1/2+1/4+1/8+… + 1/n)
≤ 2cn.