Concepts and Data Types of PicoLisp

Concepts and Data Types of PicoLisp

Form follows Function

·

6 min read

We mentioned in previous posts that PicoLisp is striving for "simplicity". But what does that mean, actually?

The Simple Way of PicoLisp

If you have studied a higher programming language like Python or Java, you might have come across many data types: strings, arrays, integers, float, boolean... However, you probably never thought about how these data types are represented in the internal memory. How does the computer "know" what is the value of "A"? What actually is a "function"?

In most modern programming languages, this is intransparent for the programmer, who shouldn't know and also shouldn't care. This is a big difference to the PicoLisp philosophy and one of the main reasons why it is so interesting to learn:

In PicoLisp, you know everything about the internal machinery.

And the reason for this is because the internal implementation is drastically simplified to the core. The high-level constructs of the programming language directly map to that machinery, making it understandable and predictable. In this regards, it is similar to assembly language programming, where the programmer has complete control over the machine.

This article will only highlight the main concepts. We will not cover things like "pointer offsets", "transient and external symbols" and stuff like that. For the "real thing", refer to the PicoLisp Reference Page.


At the Core: The Cell.

cellcompare.png

The PicoLisp virtual machine is both simpler and more powerful than most current (hardware) processors. At the lowest level, it is constructed from a single data structure called "cell": A cell is a pair of 64-bit machine words, which traditionally are called CAR and CDR in the Lisp terminology. These words can represent either a numeric value (scalar) or the address of another cell (pointer). All higher level data structures are built out of cells.

Usually, we need to know two things about our data in order to work with it:

  • Where is it stored?
  • What is the type?

The location is stored in pointers, which basically represent the memory address of a certain cell. This is an example of a 64-bit pointer to a list:

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx0000

The "X" stands for "don't care" (can be either 0 or 1). The last four digits represent the data type. 0000 shows that it is a list pointer.


Data Types

On the virtual machine level, PicoLisp supports

  • three base data types: Numbers, Symbols and Lists,
  • the special symbol NIL.

Numbers and Symbols are also referred to as "atoms". Let's go through it step by step.

Numbers

Like already mentioned, a cell is a pair of 2x 64 bit words, where the last 4 bits are reserved to specify the data type.

If the number is less than 60 bits long (short number), then the value is directly represented by the pointer.

If the number is longer than 60 bits (big number), we need another cell to represent the other digits. This could look like this:

happybignum.png

As we already heard, the last four bits represent the data type. A short number is non-zero at the second-last bit, i.e. S010, while a big number is non-zero at the third-last bit (S100). S stands for the so-called sign-bit and is 0 if the number is positive and 1 if the number is negative.

Example: How would the number 10 is represented?

  1. Calculate the binary representation of 10: 1010
  2. Data type "positive short number": the pointer ends with 0010
  3. Fill up the rest with 0's
0000000000000000000000000000000000000000000000000000000010100010

Symbols

Now let's talk about symbols. A symbol is more complex than a number. Each symbol has a value, and optionally a name and an arbitrary number of properties. It is recognized by the interpreter if it looks like this:

 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx1000

Example Let's take a variable named A with the value 7. A is represented by 65 in ASCII. So the cell will have the following content:

      +-----+-----+
      |  65 |  7  |       Cell of symbol named "A" with value 7
      +-----+-----+

As you can see, the CAR contains the name of the symbol. The CDR of a symbol cell contains the value (also called VAL).

As you might have noticed, there is no "String" datatype in picoLisp. This is because technically speaking, a string is also a symbol. Let's say I want to decleare a variable called "name" with the value "Mia", which is a string. Then it will look like this:

      +------+-----+
      |"name"|  |  |         
      +------+--+--+
                |
                V
         +------+-----+
         | "Mia"|     |
         +------+-----+

("name" and "Mia" should be ASCII-representations).

We have a symbol with the name "name". Its VAL points to another symbol with the name "Mia".


Now let's have a little bit more advanced example of a symbol with properties. A property is a key-value pair. Note that for a key-value pair, the CAR contains the VAL and the CDR contains the key. (This is opposite to symbols!)

This is the example of the symbol A with value 7 from above. Let's say it has a property n of value 3. Then the cells will look like this:

      +-----+-----+
   A  |  |  |  7  |
      +--+--+-----+
         |
         V
         +-----+-----+
         |  |  |  65 |   ASCII "A" = 65
         +--+--+-----+
            |
            V
            +-----+-----+
            |  3  |  |  |
            +-----+--+--+
                     |
                     V
               +-----+-----+
            n  | 110 |  /  |   ASCII "n" = 110
               +-----+-----+

'/' represents a pointer to the symbol NIL.

The symbol A still has its value 7 in the CDR, while the CAR points to another cell with the symbol name in CDR. The CAR points to an arbitrary number of key-value-pairs, where the VAL is stored in the CAR and the key pointer is stored in the CDR.

The special symbol NIL

There is one special symbol, which is NIL.Among others, it is used:

  • to represent the boolean value "false"
  • to represent the value "Not a Number"
  • as an end-of-list marker
  • to represent the empty list
  • to represent undefined variables
  • ...

It exists only once in the whole system: Everytime you see a NIL in the source code, it is always pointing exactly to the same cell.


Lists

Lists are the base type for every composite data structure like trees, arrays, stacks and so on. Similar to big numbers, the CDR of the cell points to the next cell in order to connect the cells to a list.

happyList.png


How about the rest?

We have covered "big" and "small" numbers, symbols and lists, including strings and symbols with properties (sometimes called Objects in some other languages). But don't we need some more?

  • How about Boolean? It is quite simple: We have NIL as representation for all "false" values. Everything not NIL is "true" (T). No need for another datatype.

  • How about Functions? Functions are "nothing special". They are either pointers to the position where the program is stored in memory, or lists. We will see more examples of these later. It is one of the cores of the functional programming principles in PicoLisp.

  • How about Arrays? Arrays are basically lists that take up consecutive space in the memory. In many cases, they can be replaced by lists. For some special cases, there are array-like implementations of lists in PicoLisp but this should also be covered in a seperate article.


Summary

There are three basic data types: Numbers, symbols and lists, all of which are built up from cells (CAR+CDR in lisp terminology). Complex values like key-value pair, lists or big numbers are created by using one part of the cell as pointer to another cell, which contains the needed values.

This should be enough theory to get started! Also check out the reference guide whenever the information covered in this article was not detailed enough.


Sources

Cover pic: 100 Years of Bauhaus
software-lab.de/doc/ref.html