Working with Lists: Destructive Operations

In functional languages, we care a lot about the so-called "side effects" of functions. Based on the definition of Eric Normand, a function with side effect could also be called an action, while a function without side effekt is a calculation (you can find more on this in the post "What is functional programming?".

What is the difference? If a function relies on, or modifies, something outside its local environment to do something, for example write to a database, file, or mutate a global variable, it is an action and has side effects.

Clearly, it is very helpful to distinguish between calculations and actions in order to keep the code reusable and maintainble. In this post, I want to give a little overview about list functions with and without side-effects and show the difference between them.

What is a destructive list operation?

In case of lists functions, actions are often called "destructive" functions. When exactly is it destructive? I found several definitions, not all of them applicable to PicoLisp. In this post, I want to define "destructive" as follows:

A destructive list operation is an operation that modifies cells which are accessible via other refernces. In other words, if there is a pointer to the original list, that pointer will point also to the modified list.

For many list functionalities, we have destructive and non-destructive operations as pair. They basically fulfill the same functionality, but while the non-destructive operation returns a copy of the function, the destructive operation modifies the original data.

The advantages of both approaches are clear: The non-destructive version is more safe and reusable, because it has no side-effects. The destructive version modifies the original data, which is of course more dangerous, but also more efficient. Let's go through some examples.

`append` vs. `conc`

As first example, let's take a look at append and conc. We can use both functions to create a new list C out of two lists A and B.

First let's define A and B:

: (setq A '(a b))
-> (a b)
: (setq B '(c d))
-> (c d)

As you might know, PicoLisp lists are built out of cells. This means, we can draw the cell representation of A and B like this:

  +-----+-----+     +-----+-----+
A |  a  |  ---+---->|  b  |  /  |
  +-----+-----+     +-----+-----+

  +-----+-----+     +-----+-----+
B |  c  |  ---+---->|  d  |  /  |
  +-----+-----+     +-----+-----+

The pointer of A points to the cell with val a, and the pointer of B points to the cell with val b. Now we concatenate those two lists with append.

: (setq C (append A B))
-> (a b c d)

append is non-destructive. This means that A and B are unaffected by this function. If we check A, B and C, we see that we can access all those lists separately:

: A
-> (a b)
: B
-> (c d)
: C
-> (a b c d)

If we look at the internal representation in cells, you will see why: C contains a copy of the A list.

  +-----+-----+     +-----+-----+     +-----+-----+     +-----+-----+
C |  a  |  ---+---->|  b  |  ---+---->|  c  |  ---+---->|  d  |  /  |
  +-----+-----+     +-----+-----+     +-----+-----+     +-----+-----+
                                         ^
  +-----+-----+     +-----+-----+        |
A |  a  |  ---+---->|  b  |  /  |        |
  +-----+-----+     +-----+-----+        |
                                         |
                                         |
B ---------------------------------------+

Now on the other hand, let's look at conc.

: (setq D (conc A B))
-> (a b c d)

The output is the same, but if we check A and B, you will see that A has been modified.

: A
-> (a b c d)
: B
-> (c d)

This is because B has been appended to the original list A.

: (setq D (conc A B))
-> (a b c d)

  +-----+-----+     +-----+-----+     +-----+-----+     +-----+-----+
D |  a  |  ---+---->|  b  |  ---+---->|  c  |  ---+---->|  d  |  /  |
  +-----+-----+     +-----+-----+     +-----+-----+     +-----+-----+
     ^                                   ^
     |                                   |
A ---+                                   |
                                         |
                                         |
                                         |
B ---------------------------------------+

In other words, we actually don't need the new variable D. We can simply run (conc A B) and use A. Here you can see why the destructive function is both more efficient and more dangerous:

We save the list copying process because we operate on the list directly. This can be a huge efficiency increase (depending on the use case, of course).
If the programmer is not aware that the original list A is modified, it can lead to bugs that are hard to detect.

`reverse` and `flip`

Next example: reverse and flip, where reverse returns a copy while flip modifies the original list.

: (setq L '(a b c d))
-> (a b c d)
: (setq R (reverse L))
-> (d c b a)

L has kept its original value:

: R
-> (d c b a)
: L
-> (a b c d)

In other words, reverse simply creates a copy of all cells.

On the contrary, flip modifies its argument.

: (setq F (flip L))
-> (D C B A)
: L
-> (a)
: F
-> (d c b a)

L only contains the CAR. Again, this can be surprising, right? Let's look at the cell representation to see what's going on.

F ---------------------------------------+
                                         |
                                         |
     +-----------------------+           |
     |                       |           |
     v                       |           v
  +-----+-----+     +-----+--|--+     +-----+-----+
L |  a  |  /  |     |  b  |  |  |     |  c  |  |  |
  +-----+-----+     +-----+-----+     +-----+--|--+
                       ^                       |
                       |                       |
                       +-----------------------+

As you can see, flip basically changes the pointer direction within the cells. As side effect, the CAR of L "loses" it's reference to the CDR. Again, this operation is obviously more efficient as we avoid copying all cells, but you need to be aware that the original list becomes inaccessible by this.

`cut` and `rid`

Now let's look at removing items from a list with cut and rid. Despite the name, cut is not destructive according to the definition above.

Let's check out what happens. We take a list A and "cut" out the first two elements.

: (setq A '(a b c d)
-> (a b c d)
: (setq C (cut 2 'A)
-> (a b)

Now C contains the first two and A contains the rest of the list.

: A
-> (c d)
: B
-> (a b)

Now you might say, "oh, but A is different to before, so isn't this destructive"?

The point is, that if we had a pointer to the original list, we could still access all original items of A! Let's try again:

: (setq A '(a b c d)
-> (a b c d)
: (setq B A)
-> (a b c d)

: (setq C (cut 2 'A)
-> (a b)
: A
-> (c d)
: B
-> (a b c d)

Visualized as ASCII:

  +-----+-----+     +-----+-----+
C |  a  |  ---+---->|  b  |  /  |
  +-----+-----+     +-----+-----+

  +-----+-----+     +-----+-----+
A |  c  |  ---+---->|  d  |  /  |
  +-----+-----+     +-----+-----+

  +-----+-----+     +-----+-----+     +-----+-----+     +-----+-----+
B |  a  |  ---+---->|  b  |  ---+---->|  c  |  ---+---->|  d  |  /  |
  +-----+-----+     +-----+-----+     +-----+-----+     +-----+-----+

On the contrary, if we remove items with rid, a second pointer will not help us at all. This is why we can call rid destructive and cut non-destructive.

: (setq X '(a b c d))
-> (a b c d)
: (setq Y X)
-> (a b c d)

Now if we remove an item from the middle of the list, both pointers are affected:

: (rid 'X 'b)
-> (a c d)
: Y
-> (a c d)
: X
-> (a c d)

Some more destructive functions

All destructive functions are marked in the documentation as destructive, together with some examples how to use them.

For example: sort:

: (setq A (4 2 3 1))
-> (4 2 3 1)

  +-----+-----+     +-----+-----+     +-----+-----+     +-----+-----+
A |  4  |  ---+---->|  2  |  ---+---->|  3  |  ---+---->|  1  |  /  |
  +-----+-----+     +-----+-----+     +-----+-----+     +-----+-----+


: (setq B (sort A))
-> (1 2 3 4)

A ---+
     |
     |                 +-----------------------------------------+
     |                 |                                         |
     |                 |     +-----------+                       |
     |                 |     |           |                       |
     v                 v     |           v                       |
  +-----+-----+     +-----+--|--+     +-----+-----+     +-----+--|--+
  |  4  |  /  |     |  2  |  |  |     |  3  |  |  |     |  1  |  |  |
  +-----+-----+     +-----+-----+     +-----+--|--+     +-----+-----+
     ^                                         |          ^
     |                                         |          |
     +-----------------------------------------+          |
                                                          |
                                                          |
B --------------------------------------------------------+

Similarly to flip, sort modifies the pointers in the cells, which means that the original list is destroyed.

con concatenates the first cell of a list with another list, which means that the CDR of the original list is lost.

: (setq L '(a b c d))
-> (a b c d)
: (setq M '(e f g h))
-> (e f g h)
: (con L M)
-> (e f g h)
: L
-> (a e f g h)

In the next Euler post ("10001st prime number"), we will see an example how con can be used to dynamically remove elements from a list.

Similary, queue destructively concatenates elements to the end of a list:

: (queue 'A 1)
-> 1
: (queue 'A 2)
-> 2
: (queue 'A 3)
-> 3
: A
-> (1 2 3)

Sources

software-lab.de/doc/index.html