In this post, we will talk about a special kind of tree - the binary search tree.
This post covers the fundamentals. The purpose is to build the fundamentals for the really interesting parts, which will be covered during the next posts: Enhancing program execution speed with help of the
A little bit of theory
We will not to go to deep here, but we need to know some basics in order to understand the rest of the post.
What is a binary tree?
A binary tree is a structure where each node can have up to two children (left child and right child), also called nodes. A binary tree has some very convenient characteristics as we will see soon. Let's look at the following example:
In total there are 11 nodes. For each node, the left children are smaller than the node, the right children are larger. This is an example for a sorted tree. It is also balanced, because all layers are complete except for the lowest one.
A binary tree with
n layers can store up to
(2^n) -1 entries:
Two layers: 1+2 = 3 nodes
Three layers: 1+2+4 = 7 nodes
Four layers: 1+2+4+8 = 15 nodes
Why should I care about binary trees?
Binary trees enable extremely efficient searching processes. Say we want to look up node 14. We start at the root (value 50):
14 < 50 --> go to left
14 < 17 --> go left
14 > 12 --> go right --> found it!
This means that if the tree is balanced and sorted, every node can be reached within max.
log(n) (binary) steps. Imagine you want to reach a specific items in a list of length
n. If you're unlucky, your item is the very last one, and you need to do
This means: If you have 1 Million entries to search, in worst case, it will take you 20 steps in a binary tree, but 1 Mio. steps in a normal list. This should be enough for a motivation, right? Let's start!
Now let's plant a PicoLisp binary tree!
Simple index trees
The function that inserts a single item into a binary search tree is called
idx (for "index"). Let's try to recreate the tree from the example (only the first 3 layers to keep it short). Start the REPL by typing
pil + in the console and try:
: (off Tree) # declare empty tree -> NIL : (for X (50 17 12 23 72 54 76) (idx 'Tree X T)) # loop through list and add items -> NIL
We have two possibilities to check our
Tree: The default prints a list which follows the (recursive) syntax:
(root (left-child) right-child). A more graphical print-out is possible using the function
view with the
T flag for binary trees.
: Tree -> (50 (17 (12) 23) 72 (54) 76) : (view Tree T) 76 72 54 50 23 17 12
This corresponds exactly to our example above. Good! Let's check its depth using the
depth returns two values: the maximum depth of the longest branch, and the average depth of all nodes.
: (depth Tree) -> (4 . 3) # max. depth: 4, average depth: 3
Worst case scenario: Sorted lists
In our previous example, we received a perfectly balanced tree, which means that all layers were "filled" up before the next layer was started. However, this cannot be guaranteed as we will see in the next example.
Let's enter a sorted list.
: (for X (1 2 3 4) (idx 'Tree X T)) -> NIL : (view Tree T) 4 3 2 1 : (depth Tree) -> (4 . 3) # depth: 4, average: 3
Oh no, our tree almost looks like a list! We lost our favorite binary tree feature, the logarithmic search.
idx function works best with random values. Otherwise we need to improve the structure by balancing the tree during the insertion. This can be done using the
balanced automatically creates balanced trees out of a sorted input list. If the values are not sorted yet, we can use
sort function before we hand them over.
: (balance 'Tree (sort ( 1 7 4 3 5 6 2 ))) : (view Tree T) 7 6 5 4 3 2 1
This looks much better, right? Now we can reach any of the 7 nodes within max. 2 steps.
With this, we have covered the fundamentals of binary trees in PicoLisp. In the next posts, we will solve the "Tree Traversal Task" from the Rosetta Code to get a feeling how binary tree search works.