# 算法基础笔记0x04 —— 堆排序

## Binary Tree

• 二叉树包含0\1\2个节点。
• 一棵二叉树有一个根节点(root)。
• 无子节点的节点为叶(leaves)。
• 其他节点为中间节点(internal nodes)。
• 一个点的子树为这个点和点下面的全部点。
• 每个点到根节点有唯一的路径。
• 一个点的深度(depth)是路径长度。
• 一棵树的高度(height)是所有点中最大深度。
• In a complete binary tree
• All layers, except possibly the last, are full.
• Nodes in the bottom layer are as far left as possible.

### Complete Binary Trees

• Suppose a complete binary tree has height h.
• There are $2^k$ nodes at level k < h.
• At level h, there are at most $2^h$ nodes.
• The total number of nodes is
• At least $1+2+4+…+2^{h-1} +1 = 2^h$.
• At most $1+2+4+…+2^{h-1} +2^h = 2^{h+1} -1$.
• Lemma Height = h ⇔ # nodes ∈ [$2^h$, $2^{h+1} -1$].
• Lemma # nodes = n ⇔ height = $\lfloor log_2 n\rfloor$.
• Even a short complete binary tree contains many nodes!

## Heaps

• A (max) heap is a complete binary tree where
• Each node contains a value.
• The value of any node is larger than the value of any other node in its subtree.
• In a min heap, value of any node is smaller than value of any other node in subtree.
• A heap with n nodes has height $\lfloor log_2 n\rfloor$.

### Heapify

• When there is a wrong place in heap, heapify operation restores the heap property everywhere, i.e. produces a correct heap.
• Take the broken node, and swap it with the larger of its two children.
• If there’s only one child, swap with it.
• If the node at its new position is still broken, repeat the swapping process.
• Until the node becomes a leaf.
• Lemma If a heap is broken at only one node, then after heapifying the node, the heap property holds everywhere.

### Build A Heap

• Label nodes of a complete binary tree: Left to right, from top to bottom.

Now convert an array of size n to a heap with n nodes.

• First map the array to a complete tree of size n.
• A[i] goes to node with label i.
• However, the tree might not be a heap.
• Note only internal (non-leaf) nodes can violate heap property.
• Apply heapify to all internal nodes.
• Heapify in reverse label order.
• Right to left, bottom up.

## Heapsort

• Given an array to sort, first build a heap from the array.
• Remove the root node.
• Place the last, i.e. bottom-most and rightmost node, x, in the root position.
• Heapify x.
• Repeat till heap is empty.

• Lemma 3 Given a heap of height h and a node at depth d, applying heapify to the node takes O(h-d) time.
• Lemma 4 Building a heap of size n takes O(n) time.
• Proof To build the heap, we first heapify all the depth h-1 nodes, then h-2 nodes, etc.
• There are $2^{h-i}$ nodes at depth h-i, and each such node take O(i) time to heapify.
• So the time to heapify all of them is $O(\sum_{i=1}^h i2^{h-i})=O(2^{h+1}-h-2)$.
• Since h=O(logn), it takes O(n) time to build the whole heap.

### Complexity

Heapsort

• Convert array to heap.
• Call heapify n times.
• Repeatedly extract root.
• Call heapify n times.
• Thm Heapsort sorts n numbers in O(nlong) time.
• Proof Converting array to heap takes O(n) time by previous Lemma 4.
• Extracting root and reheapifying takes O(h) = O(logn) time by Lemma 3.
• Reheapify n times in total. So overall time is O(n) + O(nlogn) = O(nlogn).

### Comparison

• Quicksort generally a little faster than mergesort, and uses less memory.
• Heapsort is generally slower than others on processors due to cache effects.
• Cache is a small bit of on-chip memory.
• Cache is up to 100X faster to access than main memory.
• Most memory accesses can be served from cache if algorithm reads data in a regular (e.g. sequential) way.
• Heap operations access memory in a random way, causing expensive cache misses.
• Mergesort and quicksort, due to divide and conquer structure, have good locality of reference, i.e. many cache hits.
• Mergesort and quicksort can be parallelized. It’s not clear how to parallelize heapsort.
• However, heaps have many other uses, e.g. priority queues for shortest paths and MSTs.