## Binary Tree

- 二叉树包含0\1\2个节点。
- 一棵二叉树有一个根节点(root)。
- 无子节点的节点为叶(leaves)。
- 其他节点为中间节点(internal nodes)。
- 一个点的子树为这个点和点下面的全部点。
- 每个点到根节点有唯一的路径。
- 一个点的深度(depth)是路径长度。
- 一棵树的高度(height)是所有点中最大深度。
- In a complete binary tree
- All layers, except possibly the last, are full.
- Nodes in the bottom layer are as far left as possible.

### Complete Binary Trees

- Suppose a complete binary tree has height h.
- There are $2^k$ nodes at level k < h.
- At level h, there are at most $2^h$ nodes.
- The total number of nodes is
- At least $1+2+4+…+2^{h-1} +1 = 2^h$.
- At most $1+2+4+…+2^{h-1} +2^h = 2^{h+1} -1$.

- Lemma Height = h ⇔ # nodes ∈ [$2^h$, $2^{h+1} -1$].
- Lemma # nodes = n ⇔ height = $\lfloor log_2 n\rfloor$.
- Even a short complete binary tree contains many nodes!

## Heaps

- A (max) heap is a complete binary tree where
- Each node contains a value.
- The value of any node is larger than the value of any other node in its subtree.
- In a min heap, value of any node is smaller than value of any other node in subtree.

- A heap with n nodes has height $\lfloor log_2 n\rfloor$.

### Heapify

- When there is a wrong place in heap, heapify operation restores the heap property everywhere, i.e. produces a correct heap.
- Take the broken node, and swap it with the larger of its two children.
- If there’s only one child, swap with it.
- If the node at its new position is still broken, repeat the swapping process.
- Until the node becomes a leaf.

- Lemma If a heap is broken at only one node, then after heapifying the node, the heap property holds everywhere.

### Build A Heap

- Label nodes of a complete binary tree: Left to right, from top to bottom.

**Now convert an array of size n to a heap with n nodes.**

- First map the array to a complete tree of size n.
- A[i] goes to node with label i.
- However, the tree might not be a heap.
- Note only internal (non-leaf) nodes can violate heap property.

- Apply heapify to all internal nodes.
- Heapify in reverse label order.
- Right to left, bottom up.

## Heapsort

- Given an array to sort, first build a heap from the array.
- Remove the root node.
- Place the last, i.e. bottom-most and rightmost node, x, in the root position.
- Heapify x.
- Repeat till heap is empty.

- Lemma 3 Given a heap of height h and a node at depth d, applying heapify to the node takes O(h-d) time.
- Lemma 4 Building a heap of size n takes O(n) time.
- Proof To build the heap, we first heapify all the depth h-1 nodes, then h-2 nodes, etc.
- There are $2^{h-i}$ nodes at depth h-i, and each such node take O(i) time to heapify.
- So the time to heapify all of them is $O(\sum_{i=1}^h i2^{h-i})=O(2^{h+1}-h-2)$.
- Since h=O(logn), it takes O(n) time to build the whole heap.

### Complexity

Heapsort

- Convert array to heap.
- Call heapify n times.

- Repeatedly extract root.
- Call heapify n times.

- Thm Heapsort sorts n numbers in O(nlong) time.
- Proof Converting array to heap takes O(n) time by previous Lemma 4.
- Extracting root and reheapifying takes O(h) = O(logn) time by Lemma 3.
- Reheapify n times in total. So overall time is O(n) + O(nlogn) = O(nlogn).

### Comparison

- Quicksort generally a little faster than mergesort, and uses less memory.
- Heapsort is generally slower than others on processors due to cache effects.
- Cache is a small bit of on-chip memory.
- Cache is up to 100X faster to access than main memory.
- Most memory accesses can be served from cache if algorithm reads data in a regular (e.g. sequential) way.
- Heap operations access memory in a random way, causing expensive cache misses.
- Mergesort and quicksort, due to divide and conquer structure, have good locality of reference, i.e. many cache hits.

- Mergesort and quicksort can be parallelized. It’s not clear how to parallelize heapsort.
- However, heaps have many other uses, e.g. priority queues for shortest paths and MSTs.