算法基础笔记0x05 —— 查找与选择

Search Algorithm

  • 给定一个排好序的数组,搜索输入x
  • Idea Maintain an interval between [l, r] where x may be. Shrink the interval as quickly as possible.
    • Initially the interval is all of A, i.e. l = 1, r = |A|.
  • Use a probe point m = (l + r) / 2.
    • If x < A[m], x can’t lie in [m, r], so keep searching in [l, m-1].
    • If x > A[m], keep searching in [m+1, r].
    • If l = r but x ≠ A[m], then x is not in A.


  • 一个函数 f 是单峰(unimodal)的如果存在一个 x, 使得在 y≤x上, f 单调递减, y>x上, f 单调递增
    • 即 f 存在唯一最小值
    • 反之亦然
  • Our goal is to find x.
  • Maintain an interval [l,r] containing x.
  • Use two probe points $x_1$ and $x_2$.
    • If $f(x_1)<f(x_2)$, $x_2$ mush be in the increasing part of f, and x can’t be in $[x_2, r]$. So we keep searching in $[l,x_2]$.
    • If $f(x_1)>f(x_2)$, keep searching in $[x_1,r]$.
    • If $f(x_1)=f(x_2)$, search in $[x_1,x_2]$.
  • Set $x_1=l+\frac{r-1}{3}$, $x_2=r-\frac{r-l}{3}$.
  • Then $[l,r]$ decreases by at least 1/3 each iteration.
  • So we can find x in $O(log(R-L))$ iterations, where $[L,R]$ is an initial interval containing x.

Selection

  • Given list A of n numbers, and i ≤ n, select(A,i) finds i’th largest number in A.
    • Assume numbers all distinct, for simplicity.
  • Select generalizes median.
    • median of A = Select(A, n/2).
  • We show how to do select, and hence median, in O(n) time.
  • 将A分为小的部分,递归选出各个小部分的中位数,最终再在各个中位数中选出中位数。

Algorithm

  • 假设n有i项,我们想要第i大的一项,为了简易举例,我们假设n整除5
  • Divide A to n/5 parts of 5 elements each.
  • For each part, find its median.
  • Let B is the set of n/5 medians, recursively find median in B.
    • Select(B, n/10)????

  • Partition A using x as pivot.(like quicksort)
  • Say there are k values ≤ x in A, and n-k values > x.
  • If i = k, return x.
    • We want the i’th largest item in A, which is x.
  • If i < k, return Select(L, i).
    • Ex If i=5 and k=10, then 5th largest item in A is 5 th largest item in L.
  • Else i > k, return Select(R, i-k).
    • Ex if i=15 and k=10, then 15th largest item in A is 5 th largest item in R.

Analysis

  • Let S(n) be an upper bound on select’s running time given a list of size n.
  • S(n) = S(n/5) + S(u) + O(n)
    • u is the size of whichever partition we recurse on.
    • O(n) term comes from two parts.
      • First, for each of the n/5 groups of 5 numbers, find their median in constant time ⇒ O(n) time overall.
      • Next, partition array in O(n) time.
    • S(n/5) to find the median of the n/5 group medians.
    • The key to ensuring select runs fast is ensuring u is small.
    • We will show u ≤ 7n/10. (i.e. length of left part L and right part R ≤ 7n/10)

  • Groups of 5 elements are shown in columns.
  • Medians are shown in white.
  • Elements less than x are shown to in the shadow.
  • There are n/5*1/2=n/10 medians less than x.
  • For each such median, there are two more values from the group less than the median.
  • There are 3 values from each group < x.
  • So there are at least 3n/10 values < x. I.e. |L| ≥ 3n/10. Similarly, |R| ≥ 3n/10.
  • 分为n/5个部分,其中右边的一半求出的中值小于最终中值,而在这一半中有3个元素小于最终中值。
  • We have S(n) ≤ S(n/5) + S(7n/10) + O(n).
    • Let S(n) = cn, O(n) = bn.
    • cn = c(n/5) + c*7n/10 + bn, we can get c=10b.
    • Thus, S(n) ≤ 10bn = O(n) for some constant b.

Used in Quicksort

  • 因为Select算法时间上限为O(n),我们可以用于quicksort(nlogn)中选取pivot的步骤
  • 然而Select运算时间中省去的常数b相较于随机选取pivot的算法时间常数(1.39)太过庞大
  • 所以大O分析省略了常数,有时隐藏了重要信息

Randomized Select

  • Just as quicksort runs fast in practice by selecting random pivots, we can make select run faster by selecting random pivots.
    • The runtime is still O(n), but with a smaller constant factor.
  • Pick a random pivot and partition the array.
    • Again, use the quicksort partition method. Runs in O(n) time.
  • Recurse on one of the parts that has the target value.
    • Notice the difference with quicksort, which recurses on both parts.

  • We will select i’th largest value in A.
  • If A has one element, return it.
  • Otherwise, choose a random index q and partition array wrt x=A[q].
  • Suppose there are k-1 values < x, and n-k values > x.
  • If i=k, then x is i’th element of A.
  • Else if i<k, recursively select i’th element in left side.
  • Else i>k, recursively select i-k’th element in right side.

Pseudocode

Analysis

  • S(n)=S(u) + cn.
    • u is the size of the side we recurse on.
    • cn time to partition the array wrt random pivot.
  • Assume u=n/2

S(n) = cn + S(n/2)
= cn + cn/2 + S(n/4)
= cn + cn/2 + cn/4 + S(n/8)
= …
= cn + cn/2 + cn/4 + … + c
= cn (1+1/2+1/4+1/8+… + 1/n)
≤ 2cn.

Leave a Reply

Your email address will not be published.