Concepts and Data Types of PicoLisp
Form follows Function
6 min read
We mentioned in previous posts that PicoLisp is striving for "simplicity". But what does that mean, actually?
The Simple Way of PicoLisp
If you have studied a higher programming language like Python or Java, you might have come across many data types: strings, arrays, integers, float, boolean... However, you probably never thought about how these data types are represented in the internal memory. How does the computer "know" what is the value of "A"? What actually is a "function"?
In most modern programming languages, this is intransparent for the programmer, who shouldn't know and also shouldn't care. This is a big difference to the PicoLisp philosophy and one of the main reasons why it is so interesting to learn:
In PicoLisp, you know everything about the internal machinery.
And the reason for this is because the internal implementation is drastically simplified to the core. The high-level constructs of the programming language directly map to that machinery, making it understandable and predictable. In this regards, it is similar to assembly language programming, where the programmer has complete control over the machine.
This article will only highlight the main concepts. We will not cover things like "pointer offsets", "transient and external symbols" and stuff like that. For the "real thing", refer to the PicoLisp Reference Page.
At the Core: The Cell.
The PicoLisp virtual machine is both simpler and more powerful than most current (hardware) processors. At the lowest level, it is constructed from a single data structure called "cell": A cell is a pair of 64-bit machine words, which traditionally are called CAR and CDR in the Lisp terminology. These words can represent either a numeric value (scalar) or the address of another cell (pointer). All higher level data structures are built out of cells.
Usually, we need to know two things about our data in order to work with it:
- Where is it stored?
- What is the type?
The location is stored in pointers, which basically represent the memory address of a certain cell. This is an example of a 64-bit pointer to a list:
The "X" stands for "don't care" (can be either 0 or 1). The last four digits represent the data type.
0000 shows that it is a list pointer.
On the virtual machine level, PicoLisp supports
- three base data types: Numbers, Symbols and Lists,
- the special symbol NIL.
Numbers and Symbols are also referred to as "atoms". Let's go through it step by step.
Like already mentioned, a cell is a pair of 2x 64 bit words, where the last 4 bits are reserved to specify the data type.
If the number is less than 60 bits long (short number), then the value is directly represented by the pointer.
If the number is longer than 60 bits (big number), we need another cell to represent the other digits. This could look like this:
As we already heard, the last four bits represent the data type. A short number is non-zero at the second-last bit, i.e.
S010, while a big number is non-zero at the third-last bit (
S stands for the so-called sign-bit and is 0 if the number is positive and 1 if the number is negative.
Example: How would the number 10 is represented?
- Calculate the binary representation of 10:
- Data type "positive short number": the pointer ends with
- Fill up the rest with
Now let's talk about symbols. A symbol is more complex than a number. Each symbol has a value, and optionally a name and an arbitrary number of properties. It is recognized by the interpreter if it looks like this:
Example Let's take a variable named
A with the value
A is represented by 65 in ASCII. So the cell will have the following content:
+-----+-----+ | 65 | 7 | Cell of symbol named "A" with value 7 +-----+-----+
As you can see, the CAR contains the name of the symbol. The CDR of a symbol cell contains the value (also called
As you might have noticed, there is no "String" datatype in picoLisp. This is because technically speaking, a string is also a symbol. Let's say I want to decleare a variable called "name" with the value "Mia", which is a string. Then it will look like this:
+------+-----+ |"name"| | | +------+--+--+ | V +------+-----+ | "Mia"| | +------+-----+ ("name" and "Mia" should be ASCII-representations).
We have a symbol with the name "name". Its VAL points to another symbol with the name "Mia".
Now let's have a little bit more advanced example of a symbol with properties. A property is a key-value pair. Note that for a key-value pair, the CAR contains the
VAL and the CDR contains the
key. (This is opposite to symbols!)
This is the example of the symbol
A with value
7 from above. Let's say it has a property
n of value
3. Then the cells will look like this:
+-----+-----+ A | | | 7 | +--+--+-----+ | V +-----+-----+ | | | 65 | ASCII "A" = 65 +--+--+-----+ | V +-----+-----+ | 3 | | | +-----+--+--+ | V +-----+-----+ n | 110 | / | ASCII "n" = 110 +-----+-----+ '/' represents a pointer to the symbol NIL.
A still has its value
7 in the CDR, while the CAR points to another cell with the
symbol name in CDR. The CAR points to an arbitrary number of key-value-pairs, where the
VAL is stored in the CAR and the
key pointer is stored in the CDR.
The special symbol
There is one special symbol, which is
NIL.Among others, it is used:
- to represent the boolean value "false"
- to represent the value "Not a Number"
- as an end-of-list marker
- to represent the empty list
- to represent undefined variables
It exists only once in the whole system: Everytime you see a
NIL in the source code, it is always pointing exactly to the same cell.
Lists are the base type for every composite data structure like trees, arrays, stacks and so on. Similar to big numbers, the CDR of the cell points to the next cell in order to connect the cells to a list.
How about the rest?
We have covered "big" and "small" numbers, symbols and lists, including strings and symbols with properties (sometimes called Objects in some other languages). But don't we need some more?
How about Boolean? It is quite simple: We have
NILas representation for all "false" values. Everything not
NILis "true" (
T). No need for another datatype.
How about Functions? Functions are "nothing special". They are either pointers to the position where the program is stored in memory, or lists. We will see more examples of these later. It is one of the cores of the functional programming principles in PicoLisp.
How about Arrays? Arrays are basically lists that take up consecutive space in the memory. In many cases, they can be replaced by lists. For some special cases, there are array-like implementations of lists in PicoLisp but this should also be covered in a seperate article.
There are three basic data types: Numbers, symbols and lists, all of which are built up from cells (CAR+CDR in lisp terminology). Complex values like key-value pair, lists or big numbers are created by using one part of the cell as pointer to another cell, which contains the needed values.
This should be enough theory to get started! Also check out the reference guide whenever the information covered in this article was not detailed enough.