A dictionary, also called a dynamic set, supports the following operations.

- find(k) returns 1 if k is in D, 0 otherwise.
- insert(k) puts k in D.
- delete(k) removes k from D.
- pred(k) / succ(k) return the predecessor / successor to k in D, resp.
- min / max return the minimum / maximum value in D, resp.

Ways to implement a dictionary.

Data structure | Time complexity | Comments |
---|---|---|
Linked list | O(n) | n: # of items; slow |
Hash table | O(1) expected for find, insert, delete; O(n) for pred / succ and min / max | Randomized algorithm; O(n) in the worst case |
Balanced search tree | O(logn) | Many variants: AVL, red-black, B-tree |
Skip lists, splay trees, etc. | O(logn) | Additional functionality, e.g. parallelizable, faster access to recent items |

## Binary Search Tree

- In a binary search tree, each node x has at most two children, the left and right child, denoted x.l and x.r.
- The node also stores a key, denoted x.key.
- BST Property For any node x, keys in the left subtree of x are all less than x.key, and keys in the right subtree are all greater than x.key.

### Basic Operations

**Find Operation**

- To find whether k is in the tree, start at the root and move downwards.
- At node x, compare k’=x.key and k.
- If k=k’, return node x.
- If k<k’, then if k is in tree, it must be left of x. So go to x’s left child.
- Else k>k’, and go to x’s right child.

- Once we reach a leaf with key k’: if k’ = k, then we found the key.
- If k≠k’, then k is not in the tree, so output FALSE.
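The descent described above can be sketched in Python. The `Node` class is a minimal hypothetical helper (the notes only name the fields x.key, x.l, x.r):

```python
# Minimal BST node; a hypothetical helper, not from the notes.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.l = left
        self.r = right

def find(root, k):
    """Return the node with key k, or None if k is not in the tree."""
    x = root
    while x is not None:
        if k == x.key:
            return x                      # k = k': found
        x = x.l if k < x.key else x.r     # k < k': go left, else go right
    return None                           # fell off the tree: k is absent
```

Falling off the tree (reaching an empty child) plays the role of reaching a leaf with a mismatched key.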

**Insertion Operation**

- To insert key k into the tree, first run find(k).
- If find returns a node containing k, then k is already in D, so stop.
- Otherwise, find ends at a leaf x.
- Let k’=x.key.
- If k<k’, make a left child for x, and set its key to k.
- Else, make a right child for x, and set its key to k.
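A sketch of the insertion walk, using the same hypothetical `Node` class as assumed for find:

```python
# Minimal BST node; a hypothetical helper, not from the notes.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.l = left
        self.r = right

def insert(root, k):
    """Insert key k into the BST; return the root. Duplicates are ignored."""
    if root is None:
        return Node(k)
    x = root
    while True:
        if k == x.key:
            return root           # k already in D: stop
        if k < x.key:
            if x.l is None:       # find(k) would stop here
                x.l = Node(k)     # make a left child for x with key k
                return root
            x = x.l
        else:
            if x.r is None:
                x.r = Node(k)     # make a right child for x with key k
                return root
            x = x.r
```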

**Find Successors**

- The successor to a key k is the smallest key in D larger than k.
- Ex succ(4) = 6, succ(10) = 11.

- Assume k is in D. To find k’s successor, first find node x with key k.
- If x has no right child, then
- If x is a left child, x’s parent is its successor.
- Ex succ(3) = 4

- If x is a right child, keep following parent pointers until an ancestor has a bigger key.
- Ex succ(9) = 10

- Otherwise x has a right child.
- Go to the right child, then keep following left children until reaching a node with no left child; that node is the successor.
- Ex succ(10) = 11.

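The parent-pointer walk above can equivalently be done top-down from the root, remembering the last node where we branched left; this covers both the right-subtree case and the ancestor case in one loop. A sketch (hypothetical `Node` class as before):

```python
# Minimal BST node; a hypothetical helper, not from the notes.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.l = left
        self.r = right

def succ(root, k):
    """Return the smallest key in the tree strictly greater than k, or None."""
    x, best = root, None
    while x is not None:
        if k < x.key:
            best = x.key   # x is a candidate successor; look for a smaller one
            x = x.l
        else:
            x = x.r        # x.key <= k: the successor must lie to the right
    return best
```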

**Deletion Operation**

- First, do a find(k) to get to a node x.
- If find(k) failed, stop.
- Otherwise, consider whether x has 0, 1 or 2 children.
- 0 children Just remove x.
- 1 child Remove x, connect x’s child to x’s parent.
- 2 children First, find the successor k’ of k.
- Since x has a right child, we’re in case 3 of the successor operation.
- k’ has no left child.

- Consider whether the successor node (holding k’) is the right child of x.
- Case 1 If it is, replace x by the successor node.
- Case 2 Otherwise, replace x by the successor node, and replace the successor node by its own right child.

- In short: if the node being deleted has two children, find its successor and replace the deleted node with it.
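A recursive sketch of deletion. For the two-children case it copies the successor's key into x and then deletes the successor's old node (which has no left child); this has the same effect as the splicing described above. The `Node` class is the same hypothetical helper:

```python
# Minimal BST node; a hypothetical helper, not from the notes.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.l = left
        self.r = right

def delete(root, k):
    """Delete key k from the tree; return the (possibly new) root."""
    if root is None:
        return None                     # find(k) failed: stop
    if k < root.key:
        root.l = delete(root.l, k)
    elif k > root.key:
        root.r = delete(root.r, k)
    else:
        if root.l is None:              # 0 or 1 child: splice in the other child
            return root.r
        if root.r is None:
            return root.l
        s = root.r                      # 2 children: find the successor,
        while s.l is not None:          # the leftmost node of the right subtree
            s = s.l
        root.key = s.key                # replace k by its successor k'
        root.r = delete(root.r, s.key)  # remove the successor's old node
    return root
```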

### Balance and Complexity

- It’s too costly to make all nodes completely balanced.
- We settle for nearly balanced trees, where for all nodes x: |ht(x.r)-ht(x.l)| ≤ 1, where ht(z) is height of a node z.
- For each node x, let bal(x) = ht(x.r) – ht(x.l).
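Height and balance as defined here can be computed directly; a sketch where an empty subtree has height 0, so a single node has height 1 (hypothetical `Node` class as before):

```python
# Minimal BST node; a hypothetical helper, not from the notes.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.l = left
        self.r = right

def ht(x):
    """Height of the subtree rooted at x; an empty subtree has height 0."""
    if x is None:
        return 0
    return 1 + max(ht(x.l), ht(x.r))

def bal(x):
    """bal(x) = ht(x.r) - ht(x.l); nearly balanced means |bal(x)| <= 1."""
    return ht(x.r) - ht(x.l)
```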

### Rotations – Fix Imbalance

If a node is imbalanced, then rotating at the node can improve its balance.

Two types, left and right rotation.
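Both rotations are local pointer updates that preserve the BST property; each returns the new subtree root for the caller to re-link. A sketch (hypothetical `Node` class as before):

```python
# Minimal BST node; a hypothetical helper, not from the notes.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.l = left
        self.r = right

def rotate_left(x):
    """Left rotation at x: x's right child y becomes the subtree root."""
    y = x.r
    x.r = y.l      # y's left subtree (keys between x and y) moves under x
    y.l = x
    return y

def rotate_right(x):
    """Right rotation at x: mirror image of rotate_left."""
    y = x.l
    x.l = y.r      # y's right subtree (keys between y and x) moves under x
    y.r = x
    return y
```

Note that the two rotations are inverses of each other, so either can undo the other's effect on balance.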

## AVL Tree

- An AVL tree is a BST where the balance at every node is -1, 0 or 1.
- Call this the AVL invariant.

### Operations

**Insertion operation**

- After insertion, if all nodes have balance 0, 1 or -1, done.
- Otherwise, find the lowest ±2 balance node x on the path from the inserted node to the root.
- Lemma After the rotation(s), all nodes outside of x’s subtree also have balance 1, -1 or 0.

**Deletion operation**

- After deletion, again find lowest ±2 balance node x, if any.
- Lemma After the rotation(s), ht(x) decreases by at most 1.

### Complexity

- Thm An AVL tree with n nodes has O(logn) height.
- Proof Let s(h) be the minimum number of nodes in an AVL tree with height h.
- s(1)=1, s(2)=2.
- For h≥3, the root of the AVL tree has one subtree with height h-1, and another subtree with height ≥ h-2, by the AVL invariant.
- Both subtrees are AVL trees, so s(h) ≥ 1 + s(h-1) + s(h-2).
- Also, s(h-1) ≥ s(h-2), so s(h) ≥ 2s(h-2).
- Unrolling this recurrence gives s(h) ≥ $2^{h/2}$ for h ≥ 2.
- So if there are n nodes in the AVL tree, n ≥ $2^{h/2}$, and so h=O(logn)
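The recurrence from the proof can be checked numerically; a small sketch computing s(h) from its base cases:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def s(h):
    """Minimum number of nodes in an AVL tree of height h (s(1)=1, s(2)=2)."""
    if h == 1:
        return 1
    if h == 2:
        return 2
    # Smallest tree of height h: subtrees of heights h-1 and h-2, plus the root.
    return 1 + s(h - 1) + s(h - 2)
```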

- Finding, inserting and deleting keys from an AVL tree with n nodes take O(logn) time.

**Other balanced trees**

- RB trees also use rotations to maintain approximate balance, but have looser balance conditions.
- Advantage of RB trees is that they require only O(1) rotations for insertion and deletions.
- AVL trees require O(1) rotations for insertions, but O(logn) rotations for deletions.

- However, AVL trees have tighter bounds on height.
- An AVL tree with n nodes has height ≤ $1.44\log_2 n$.
- An RB tree with n nodes has height ≤ $2\log_2 n$.
- So AVL trees are faster for search intensive workloads, whereas RB trees are better for more dynamic BSTs.