# 算法设计与分析笔记0x07 —— 动态规划(1)

Divide and Conquer 小问题中独立解决

• Dynamic Programming 小问题是有重叠的
• 分隔时，父问题共享子问题

## Weighted Interval Scheduling

Notation. 将工作按结束时间排序 $f_1 \leq f_2 \leq … \leq f_n$.
Def. p(j) = largest index i < j such that job i is compatible with j.
Goal: 求最大化工作权重之和

OPT(j)的公式
OPT(j) = value of optimal solution to the problem consisting
of job requests 1, 2, …, j.

• Case 1: OPT selects job j.
• can’t use incompatible jobs {p(j) + 1, p(j) + 2, …, j – 1}
• must include optimal solution to problem consisting of remaining compatible jobs 1, 2, …, p(j)
• Case 2: OPT does not select job j.
• must include optimal solution to problem consisting of remaining compatible jobs 1, 2, …, j-1

### Memoization

Store results of each sub-problem in a cache; lookup as needed.

Memoized version of algorithm takes O(nlogn) time.

• Sort by finish time: O(n log n).
• Computing p(.) : O(n) after sorting by start time

M-Compute-Opt(j) : O(n)

• Each entry M[j] is computed only once
• The computation of M[j] invokes M-Compute-Opt twice

### Top-down vs. Bottom-up

Bottom-Up Version:

Top-down 有时可跳过一些不需要计算的OPT(j)
Bottom-up 省去递归额外的消耗

## Segmented Least Squares

• Given n points in the plane: $(x_1, y_1)$, $(x_2, y_2)$, …, $(x_n, y_n)$.
• Find a line y = ax + b that minimizes the sum of the squared error(和方差)
$$SSE=\sum_{i=1}^{n}(y_i -ax_i -b)^2$$
• Solution, min error is achieved when
• $$a=\frac{n\sum_i x_i y_i – \sum_i x_i \sum_i j_i}{n\sum_i x_{i}^{2} – (\sum_i x_i)^2}$$
• $$b=\frac{\sum_i j_i – a\sum_i x_i}{n}$$

• 用多条线段拟合, 平衡accuracy和parsimony
• 每条线段都有误差平方和E
• L为线段数量, c为常数, 控制二者平衡
• 找到一系列线段，最小化E+cL

OPT(j) = minimum cost for points $p_1$, $p_2$, …, $p_j$.
e(i, j) = minimum sum of squares for points $p_i$, $p_{i+1}$, …, $p_j$.

$O(n^2)$ pairs, and $O(n)$ to calculate SSE per pair. So the total time is $O(n^3)$.

## Knapsack Problem

Goal: 重量最轻(不大于限制值)，价值最大

Greedy Algorithm

• repeatedly add item with maximum value vi .
• repeatedly add item with maximum weight wi .
• repeatedly add item with maximum ratio vi / wi .
• Not so good.
• Greedy thief won’t get the most. 🙂

OPT(i,w), 只考虑前i个物品，在容量上限为w的情况下的最优解

• Case 1: OPT does not select item i.
• OPT selects best of { 1, 2, …, i-1 }
• Case 2: OPT selects item i.
• new weight limit = w – wi
• OPT selects best of { 1, 2, …, i–1 } using this new weight limit

Running time: $\Theta(nW)$

• exponential
• Pseudo-polynomial
• NP-complete — 目前未发现polynomial的算法

## RNA Secondary Structure

Goal: 给定碱基序列，找到配对数最大的二级结构
Second structure A set of pairs S = {(bi, bj)} that satisfy:

• [Watson-Crick.] S is a matching and each pair in S is a Watson-Crick complement: A-U, U-A, C-G, or G-C.
• [No sharp turns.] The ends of each pair are separated by at least 4 intervening bases. If (bi, bj)∈S, then i < j – 4.
• [Non-crossing.] If (bi, bj) and (bk, bl) are two pairs in S, then we cannot have i < k < j < l.

Sharp turn:

Crossing:

OPT(i,j) 第i个到第j个碱基最大配对数

• If i ≥ j – 4.
• OPT(i, j) = 0 by no-sharp turns condition.
• If i < j-4: take max of two cases
• Case 1. Base bj is not involved in a pair.
• OPT(i, j-1)
• Case 2. Base bj pairs with bt for some i x ≤ t < j – 4.
• Non-crossing constraint decouples resulting sub-problems
• $1 + max_t \{ OPT(i, t-1) + OPT(t+1, j-1) \}$

Running time: $O(n^3)$.

## Sequence Alignment

For two strings:

• ocurrance
• occurrence
Different alignment may result different mismatches

Edit distance

• Gap penalty $\delta$; mismatch penalty $\alpha_{pq}$.
• Cost = sum of gap and mismatch penalties.

Application

• Basis for Unix diff
• Speech recognition
• Computational biology

Goal: 给定两个字符串X=x1 x2 … xm, Y=y1 y2 … yn，找到使得cost最小的对齐方式

• 不能出现前后交叉

### Problem Structure

OPT(i,j) = min cost of strings ‘x1 x2 … xi’ and ‘y1 y2 … yj’

• Base Case:
• i=0, OPT(i,j)=j$\delta$
• j=0, OPT(i,j)=i$\delta$
• Case 1: OPT mastches xi-yj
• 只看 OPT(i-1,j-1)
• Case 2: OPT leaves xi unmatched
• gap penalty for xi and OPT(i-1,j)
• Case 3: OPT leaves yj unmatched
• gap penalty for yj and OPT(i,j-1)

### Algorithm

Analysis $\Theta(mn)$ time and space