算法基础笔记0x04 —— 堆排序

Binary Tree

  • 二叉树包含0\1\2个节点。
  • 一棵二叉树有一个根节点(root)。
  • 无子节点的节点为叶(leaves)。
  • 其他节点为中间节点(internal nodes)。
  • 一个点的子树为这个点和点下面的全部点。
  • 每个点到根节点有唯一的路径。
  • 一个点的深度(depth)是路径长度。
  • 一棵树的高度(height)是所有点中最大深度。
  • In a complete binary tree
    • All layers, except possibly the last, are full.
    • Nodes in the bottom layer are as far left as possible.

Complete Binary Trees

  • Suppose a complete binary tree has height h.
  • There are $2^k$ nodes at level k < h.
  • At level h, there are at most $2^h$ nodes.
  • The total number of nodes is
    • At least $1+2+4+…+2^{h-1} +1 = 2^h$.
    • At most $1+2+4+…+2^{h-1} +2^h = 2^{h+1} -1$.
  • Lemma Height = h ⇔ # nodes ∈ [$2^h$, $2^{h+1} -1$].
  • Lemma # nodes = n ⇔ height = $\lfloor log_2 n\rfloor$.
  • Even a short complete binary tree contains many nodes!

Heaps

  • A (max) heap is a complete binary tree where
    • Each node contains a value.
    • The value of any node is larger than the value of any other node in its subtree.
    • In a min heap, value of any node is smaller than value of any other node in subtree.
  • A heap with n nodes has height $\lfloor log_2 n\rfloor$.

Heapify

  • When there is a wrong place in heap, heapify operation restores the heap property everywhere, i.e. produces a correct heap.
  • Take the broken node, and swap it with the larger of its two children.
    • If there’s only one child, swap with it.
    • If the node at its new position is still broken, repeat the swapping process.
    • Until the node becomes a leaf.
  • Lemma If a heap is broken at only one node, then after heapifying the node, the heap property holds everywhere.

Build A Heap

  • Label nodes of a complete binary tree: Left to right, from top to bottom.


Now convert an array of size n to a heap with n nodes.

  • First map the array to a complete tree of size n.
    • A[i] goes to node with label i.
    • However, the tree might not be a heap.
    • Note only internal (non-leaf) nodes can violate heap property.
  • Apply heapify to all internal nodes.
    • Heapify in reverse label order.
    • Right to left, bottom up.

Heapsort

  • Given an array to sort, first build a heap from the array.
  • Remove the root node.
  • Place the last, i.e. bottom-most and rightmost node, x, in the root position.
  • Heapify x.
  • Repeat till heap is empty.


  • Lemma 3 Given a heap of height h and a node at depth d, applying heapify to the node takes O(h-d) time.
  • Lemma 4 Building a heap of size n takes O(n) time.
  • Proof To build the heap, we first heapify all the depth h-1 nodes, then h-2 nodes, etc.
    • There are $2^{h-i}$ nodes at depth h-i, and each such node take O(i) time to heapify.
    • So the time to heapify all of them is $O(\sum_{i=1}^h i2^{h-i})=O(2^{h+1}-h-2)$.
    • Since h=O(logn), it takes O(n) time to build the whole heap.

Complexity

Heapsort

  • Convert array to heap.
    • Call heapify n times.
  • Repeatedly extract root.
    • Call heapify n times.
  • Thm Heapsort sorts n numbers in O(nlong) time.
  • Proof Converting array to heap takes O(n) time by previous Lemma 4.
    • Extracting root and reheapifying takes O(h) = O(logn) time by Lemma 3.
    • Reheapify n times in total. So overall time is O(n) + O(nlogn) = O(nlogn).

Comparison

  • Quicksort generally a little faster than mergesort, and uses less memory.
  • Heapsort is generally slower than others on processors due to cache effects.
    • Cache is a small bit of on-chip memory.
    • Cache is up to 100X faster to access than main memory.
    • Most memory accesses can be served from cache if algorithm reads data in a regular (e.g. sequential) way.
    • Heap operations access memory in a random way, causing expensive cache misses.
    • Mergesort and quicksort, due to divide and conquer structure, have good locality of reference, i.e. many cache hits.
  • Mergesort and quicksort can be parallelized. It’s not clear how to parallelize heapsort.
  • However, heaps have many other uses, e.g. priority queues for shortest paths and MSTs.

Leave a Reply

Your email address will not be published.