Although any expression in Joy can be considered to describe a tree with the quotes as compound nodes and the non-quote values as leaf nodes, in this page I want to talk about ordered binary trees and how to make and use them.
The basic structure, in a crude type notation, is:
BTree :: [] | [key value BTree BTree]
That says that a BTree is either the empty quote []
or a quote with four items: a key, a value, and two BTrees representing the left and right branches of the tree.
Let's take a crack at writing a function that can recursively iterate or traverse these trees.
[]
¶The stopping predicate just has to detect the empty list:
BTree-iter == [not] [E] [R0] [R1] genrec
And since there's nothing at this node, we just pop
it:
BTree-iter == [not] [pop] [R0] [R1] genrec
[key value left right]
¶Now we need to figure out R0
and R1
:
BTree-iter == [not] [pop] [R0] [R1] genrec
== [not] [pop] [R0 [BTree-iter] R1] ifte
Let's look at it in situ:
[key value left right] R0 [BTree-iter] R1
R0
is almost certainly going to use dup
to make a copy of the node and then dip
on some function to process the copy with it:
[key value left right] [F] dupdip [BTree-iter] R1
[key value left right] F [key value left right] [BTree-iter] R1
For example, if we're getting all the keys F
would be first
:
R0 == [first] dupdip
[key value left right] [first] dupdip [BTree-iter] R1
[key value left right] first [key value left right] [BTree-iter] R1
key [key value left right] [BTree-iter] R1
Now R1
needs to apply [BTree-iter]
to left
and right
. If we drop the key and value from the node using rest
twice we are left with an interesting situation:
key [key value left right] [BTree-iter] R1
key [key value left right] [BTree-iter] [rest rest] dip
key [key value left right] rest rest [BTree-iter]
key [left right] [BTree-iter]
Hmm, will step
do?
key [left right] [BTree-iter] step
key left BTree-iter [right] [BTree-iter] step
key left-keys [right] [BTree-iter] step
key left-keys right BTree-iter
key left-keys right-keys
Wow. So:
R1 == [rest rest] dip step
We have:
BTree-iter == [not] [pop] [[F] dupdip] [[rest rest] dip step] genrec
When I was reading this over I realized rest rest
could go in R0
:
BTree-iter == [not] [pop] [[F] dupdip rest rest] [step] genrec
(And [step] genrec
is such a cool and suggestive combinator!)
F
per-node processing function.¶[F] BTree-iter == [not] [pop] [[F] dupdip rest rest] [step] genrec
Working backward:
[not] [pop] [[F] dupdip rest rest] [step] genrec
[not] [pop] [F] [dupdip rest rest] cons [step] genrec
[F] [not] [pop] roll< [dupdip rest rest] cons [step] genrec
Ergo:
BTree-iter == [not] [pop] roll< [dupdip rest rest] cons [step] genrec
from notebook_preamble import D, J, V, define, DefinitionWrapper
define('BTree-iter == [not] [pop] roll< [dupdip rest rest] cons [step] genrec')
J('[] [23] BTree-iter') # It doesn't matter what F is as it won't be used.
J('["tommy" 23 [] []] [first] BTree-iter')
J('["tommy" 23 ["richard" 48 [] []] ["jenny" 18 [] []]] [first] BTree-iter')
J('["tommy" 23 ["richard" 48 [] []] ["jenny" 18 [] []]] [second] BTree-iter')
Let's consider adding nodes to a BTree structure.
BTree value key BTree-add == BTree
If the current node is []
then you just return [key value [] []]
:
BTree-add == [popop not] [[pop] dipd BTree-new] [R0] [R1] genrec
Where BTree-new
is:
value key BTree-new == [key value [] []]
value key swap [[] []] cons cons
key value [[] []] cons cons
key [value [] []] cons
[key value [] []]
BTree-new == swap [[] []] cons cons
define('BTree-new == swap [[] []] cons cons')
V('"v" "k" BTree-new')
(As an implementation detail, the [[] []]
literal used in the definition of BTree-new
will be reused to supply the constant tail for all new nodes produced by it. This is one of those cases where you get amortized storage "for free" by using persistent datastructures. Because the tail, which is ((), ((), ()))
in Python, is immutable and embedded in the definition body for BTree-new
, all new nodes can reuse it as their own tail without fear that some other code somewhere will change it.)
We now have to derive R0
and R1
, consider:
[key_n value_n left right] value key R0 [BTree-add] R1
In this case, there are three possibilites: the key can be greater or less than or equal to the node's key. In two of those cases we will need to apply a copy of BTree-add
, so R0
is pretty much out of the picture.
[R0] == []
The first thing we need to do is compare the the key we're adding to see if it is greater than the node key and branch
accordingly, although in this case it's easier to write a destructive predicate and then use ifte
to apply it nullary
:
[key_n value_n left right] value key [BTree-add] R1
[key_n value_n left right] value key [BTree-add] [P >] [T] [E] ifte
[key_n value_n left right] value key [BTree-add] P >
[key_n value_n left right] value key [BTree-add] pop roll> pop first >
[key_n value_n left right] value key roll> pop first >
key [key_n value_n left right] value roll> pop first >
key key_n >
Boolean
P > == pop roll> pop first >
P < == pop roll> pop first <
P == pop roll> pop first
define('P == pop roll> pop first')
V('["k" "v" [] []] "vv" "kk" [0] P >')
Here the parantheses are meant to signify that the right-hand side (RHS) is not literal, the code in the parentheses is meant to have been evaluated:
[key_n value_n left right] value key [BTree-add] T == [key_n value_n left (BTree-add key value right)]
infra
on K
.¶So how do we do this? We know we're going to want to use infra
on some function K
that has the key and value to work with, as well as the quoted copy of BTree-add
to apply somehow:
right left value_n key_n value key [BTree-add] K
...
right value key BTree-add left value_n key_n
Pretty easy:
right left value_n key_n value key [BTree-add] cons cons dipdd
right left value_n key_n [value key BTree-add] dipdd
right value key BTree-add left value_n key_n
So:
K == cons cons dipdd
And:
[key_n value_n left right] [value key [BTree-add] K] infra
T
.¶So now we're at getting from this to this:
[key_n value_n left right] value key [BTree-add] T
...
[key_n value_n left right] [value key [BTree-add] K] infra
And so T
is just:
value key [BTree-add] T == [value key [BTree-add] K] infra
T == [ K] cons cons cons infra
define('K == cons cons dipdd')
define('T == [K] cons cons cons infra')
V('"r" "l" "v" "k" "vv" "kk" [0] K')
V('["k" "v" "l" "r"] "vv" "kk" [0] T')
This is very very similar to the above:
[key_n value_n left right] value key [BTree-add] E
[key_n value_n left right] value key [BTree-add] [P <] [Te] [Ee] ifte
In this case Te
works that same as T
but on the left child tree instead of the right, so the only difference is that it must use dipd
instead of dipdd
:
Te == [cons cons dipd] cons cons cons infra
This suggests an alternate factorization:
ccons == cons cons
T == [ccons dipdd] ccons cons infra
Te == [ccons dipd] ccons cons infra
But whatever.
define('Te == [cons cons dipd] cons cons cons infra')
V('["k" "v" "l" "r"] "vv" "kk" [0] Te')
This means we must find:
[key_n value_n left right] value key [BTree-add] Ee
...
[key value left right]
This is another easy one:
Ee == pop swap roll< rest rest cons cons
[key_n value_n left right] value key [BTree-add] pop swap roll< rest rest cons cons
[key_n value_n left right] value key swap roll< rest rest cons cons
[key_n value_n left right] key value roll< rest rest cons cons
key value [key_n value_n left right] rest rest cons cons
key value [ left right] cons cons
[key value left right]
define('Ee == pop swap roll< rest rest cons cons')
V('["k" "v" "l" "r"] "vv" "k" [0] Ee')
define('E == [P <] [Te] [Ee] ifte')
BTree-add
¶BTree-add == [popop not] [[pop] dipd BTree-new] [] [[P >] [T] [E] ifte] genrec
Putting it all together:
BTree-new == swap [[] []] cons cons
P == pop roll> pop first
T == [cons cons dipdd] cons cons cons infra
Te == [cons cons dipd] cons cons cons infra
Ee == pop swap roll< rest rest cons cons
E == [P <] [Te] [Ee] ifte
BTree-add == [popop not] [[pop] dipd BTree-new] [] [[P >] [T] [E] ifte] genrec
define('BTree-add == [popop not] [[pop] dipd BTree-new] [] [[P >] [T] [E] ifte] genrec')
J('[] 23 "b" BTree-add') # Initial
J('["b" 23 [] []] 88 "c" BTree-add') # Less than
J('["b" 23 [] []] 88 "a" BTree-add') # Greater than
J('["b" 23 [] []] 88 "b" BTree-add') # Equal to
J('[] 23 "a" BTree-add 88 "b" BTree-add 44 "c" BTree-add') # Series.
We can use this to make a set-like datastructure by just setting values to e.g. 0 and ignoring them. It's set-like in that duplicate items added to it will only occur once within it, and we can query it in $O(\log_2 N)$ time.
J('[] [3 9 5 2 8 6 7 8 4] [0 swap BTree-add] step')
define('to_set == [] swap [0 swap BTree-add] step')
J('[3 9 5 2 8 6 7 8 4] to_set')
And with that we can write a little program to remove duplicate items from a list.
define('unique == [to_set [first] BTree-iter] cons run')
J('[3 9 3 5 2 9 8 8 8 6 2 7 8 4 3] unique') # Filter duplicate items.
cmp
combinator¶Instead of all this mucking about with nested ifte
let's just go whole hog and define cmp
which takes two values and three quoted programs on the stack and runs one of the three depending on the results of comparing the two values:
a b [G] [E] [L] cmp
------------------------- a > b
G
a b [G] [E] [L] cmp
------------------------- a = b
E
a b [G] [E] [L] cmp
------------------------- a < b
L
We need a new non-destructive predicate P
:
[key_n value_n left right] value key [BTree-add] P
[key_n value_n left right] value key [BTree-add] over [Q] nullary
[key_n value_n left right] value key [BTree-add] key [Q] nullary
[key_n value_n left right] value key [BTree-add] key Q
[key_n value_n left right] value key [BTree-add] key popop popop first
[key_n value_n left right] value key popop first
[key_n value_n left right] first
key_n
[key_n value_n left right] value key [BTree-add] key [Q] nullary
[key_n value_n left right] value key [BTree-add] key key_n
P == over [popop popop first] nullary
Here are the definitions again, pruned and renamed in some cases:
BTree-new == swap [[] []] cons cons
P == over [popop popop first] nullary
T> == [cons cons dipdd] cons cons cons infra
T< == [cons cons dipd] cons cons cons infra
E == pop swap roll< rest rest cons cons
Using cmp
to simplify our code above at R1
:
[key_n value_n left right] value key [BTree-add] R1
[key_n value_n left right] value key [BTree-add] P [T>] [E] [T<] cmp
The line above becomes one of the three lines below:
[key_n value_n left right] value key [BTree-add] T>
[key_n value_n left right] value key [BTree-add] E
[key_n value_n left right] value key [BTree-add] T<
The definition is a little longer but, I think, more elegant and easier to understand:
BTree-add == [popop not] [[pop] dipd BTree-new] [] [P [T>] [E] [T<] cmp] genrec
from joy.library import FunctionWrapper
from joy.utils.stack import concat
from notebook_preamble import D
@FunctionWrapper
def cmp_(stack, expression, dictionary):
'''
cmp takes two values and three quoted programs on the stack and runs
one of the three depending on the results of comparing the two values:
a b [G] [E] [L] cmp
------------------------- a > b
G
a b [G] [E] [L] cmp
------------------------- a = b
E
a b [G] [E] [L] cmp
------------------------- a < b
L
'''
L, (E, (G, (b, (a, stack)))) = stack
expression = concat(G if a > b else L if a < b else E, expression)
return stack, expression, dictionary
D['cmp'] = cmp_
from joy.library import FunctionWrapper, S_ifte
@FunctionWrapper
def cond(stack, expression, dictionary):
'''
like a case statement; works by rewriting into a chain of ifte.
[..[[Bi] Ti]..[D]] -> ...
[[[B0] T0] [[B1] T1] [D]] cond
-----------------------------------------
[B0] [T0] [[B1] [T1] [D] ifte] ifte
'''
conditions, stack = stack
if conditions:
expression = _cond(conditions, expression)
try:
# Attempt to preload the args to first ifte.
(P, (T, (E, expression))) = expression
except ValueError:
# If, for any reason, the argument to cond should happen to contain
# only the default clause then this optimization will fail.
pass
else:
stack = (E, (T, (P, stack)))
return stack, expression, dictionary
def _cond(conditions, expression):
(clause, rest) = conditions
if not rest: # clause is [D]
return clause
P, T = clause
return (P, (T, (_cond(rest, ()), (S_ifte, expression))))
D['cond'] = cond
J("1 0 ['G'] ['E'] ['L'] cmp")
J("1 1 ['G'] ['E'] ['L'] cmp")
J("0 1 ['G'] ['E'] ['L'] cmp")
from joy.library import DefinitionWrapper
DefinitionWrapper.add_definitions('''
P == over [popop popop first] nullary
T> == [cons cons dipdd] cons cons cons infra
T< == [cons cons dipd] cons cons cons infra
E == pop swap roll< rest rest cons cons
BTree-add == [popop not] [[pop] dipd BTree-new] [] [P [T>] [E] [T<] cmp] genrec
''', D)
J('[] 23 "b" BTree-add') # Initial
J('["b" 23 [] []] 88 "c" BTree-add') # Less than
J('["b" 23 [] []] 88 "a" BTree-add') # Greater than
J('["b" 23 [] []] 88 "b" BTree-add') # Equal to
J('[] 23 "a" BTree-add 88 "b" BTree-add 44 "c" BTree-add') # Series.
It may seem silly, but a big part of programming in Forth (and therefore in Joy) is the idea of small, highly-factored definitions. If you choose names carefully the resulting definitions can take on a semantic role.
get-node-key == popop popop first
remove-key-and-value-from-node == rest rest
pack-key-and-value == cons cons
prep-new-key-and-value == pop swap roll<
pack-and-apply == [pack-key-and-value] swoncat cons pack-key-and-value infra
BTree-new == swap [[] []] pack-key-and-value
P == over [get-node-key] nullary
T> == [dipdd] pack-and-apply
T< == [dipd] pack-and-apply
E == prep-new-key-and-value remove-key-and-value-from-node pack-key-and-value
BTree-iter
that does In-Order Traversal¶If you look back to the non-empty case of the BTree-iter
function we can design a varient that first processes the left child, then the current node, then the right child. This will allow us to traverse the tree in sort order.
BTree-iter-order == [not] [pop] [R0 [BTree-iter] R1] ifte
To define R0
and R1
it helps to look at them as they will appear when they run:
[key value left right] R0 [BTree-iter-order] R1
Staring at this for a bit suggests dup third
to start:
[key value left right] R0 [BTree-iter-order] R1
[key value left right] dup third [BTree-iter-order] R1
[key value left right] left [BTree-iter-order] R1
Now maybe:
[key value left right] left [BTree-iter-order] [cons dip] dupdip
[key value left right] left [BTree-iter-order] cons dip [BTree-iter-order]
[key value left right] [left BTree-iter-order] dip [BTree-iter-order]
left BTree-iter-order [key value left right] [BTree-iter-order]
So far, so good. Now we need to process the current node's values:
left BTree-iter-order [key value left right] [BTree-iter-order] [[F] dupdip] dip
left BTree-iter-order [key value left right] [F] dupdip [BTree-iter-order]
left BTree-iter-order [key value left right] F [key value left right] [BTree-iter-order]
If F
needs items from the stack below the left stuff it should have cons
'd them before beginning maybe? For functions like first
it works fine as-is.
left BTree-iter-order [key value left right] first [key value left right] [BTree-iter-order]
left BTree-iter-order key [key value left right] [BTree-iter-order]
First ditch the rest of the node and get the right child:
left BTree-iter-order key [key value left right] [BTree-iter-order] [rest rest rest first] dip
left BTree-iter-order key right [BTree-iter-order]
Then, of course, we just need i
to run BTree-iter-order
on the right side:
left BTree-iter-order key right [BTree-iter-order] i
left BTree-iter-order key right BTree-iter-order
BTree-iter-order
¶The result is a little awkward:
R1 == [cons dip] dupdip [[F] dupdip] dip [rest rest rest first] dip i
Let's do a little semantic factoring:
fourth == rest rest rest first
proc_left == [cons dip] dupdip
proc_current == [[F] dupdip] dip
proc_right == [fourth] dip i
BTree-iter-order == [not] [pop] [dup third] [proc_left proc_current proc_right] genrec
Now we can sort sequences.
define('BTree-iter-order == [not] [pop] [dup third] [[cons dip] dupdip [[first] dupdip] dip [rest rest rest first] dip i] genrec')
J('[3 9 5 2 8 6 7 8 4] to_set BTree-iter-order')
Let's derive a function that accepts a tree and a key and returns the value associated with that key.
tree key BTree-get
------------------------
value
[]
¶As before, the stopping predicate just has to detect the empty list:
BTree-get == [pop not] [E] [R0] [R1] genrec
But what do we do if the key isn't in the tree? In Python we might raise a KeyError
but I'd like to avoid exceptions in Joy if possible, and here I think it's possible. (Division by zero is an example of where I think it's probably better to let Python crash Joy. Sometimes the machinery fails and you have to "stop the line", methinks.)
Let's pass the buck to the caller by making the base case a given, you have to decide for yourself what [E]
should be.
tree key [E] BTree-get
---------------------------- key in tree
value
tree key [E] BTree-get
---------------------------- key not in tree
tree key E
Now we define:
BTree-get == [pop not] swap [R0] [R1] genrec
Note that this BTree-get
creates a slightly different function than itself and that function does the actual recursion. This kind of higher-level programming is unusual in most languages but natural in Joy.
tree key [E] [pop not] swap [R0] [R1] genrec
tree key [pop not] [E] [R0] [R1] genrec
The anonymous specialized recursive function that will do the real work.
[pop not] [E] [R0] [R1] genrec
[key value left right]
¶Now we need to figure out R0
and R1
:
[key value left right] key R0 [BTree-get] R1
We want to compare the search key with the key in the node, and if they are the same return the value and if they differ then recurse on one of the child nodes. So it's very similar to the above funtion, with [R0] == []
and R1 == P [T>] [E] [T<] cmp
:
[key value left right] key [BTree-get] P [T>] [E] [T<] cmp
So:
get-node-key == pop popop first
P == over [get-node-key] nullary
The only difference is that get-node-key
does one less pop
because there's no value to discard. Now we have to derive the branches:
[key_n value_n left right] key [BTree-get] T>
[key_n value_n left right] key [BTree-get] E
[key_n value_n left right] key [BTree-get] T<
The cases of T>
and T<
are similar to above but instead of using infra
we have to discard the rest of the structure:
[key_n value_n left right] key [BTree-get] T> == right key BTree-get
[key_n value_n left right] key [BTree-get] T< == left key BTree-get
So:
T> == [fourth] dipd i
T< == [third] dipd i
E.g.:
[key_n value_n left right] key [BTree-get] [fourth] dipd i
[key_n value_n left right] fourth key [BTree-get] i
right key [BTree-get] i
right key BTree-get
And:
[key_n value_n left right] key [BTree-get] E == value_n
E == popop second
So:
fourth == rest rest rest first
get-node-key == pop popop first
P == over [get-node-key] nullary
T> == [fourth] dipd i
T< == [third] dipd i
E == popop second
BTree-get == [pop not] swap [] [P [T>] [E] [T<] cmp] genrec
# I don't want to deal with name conflicts with the above so I'm inlining everything here.
# The original Joy system has "hide" which is a meta-command which allows you to use named
# definitions that are only in scope for a given definition. I don't want to implement
# that (yet) so...
define('''
BTree-get == [pop not] swap [] [
over [pop popop first] nullary
[[rest rest rest first] dipd i]
[popop second]
[[third] dipd i]
cmp
] genrec
''')
J('[] "gary" [popop "err"] BTree-get')
J('["gary" 23 [] []] "gary" [popop "err"] BTree-get')
J('''
[] [[0 'a'] [1 'b'] [2 'c']] [i BTree-add] step
'c' [popop 'not found'] BTree-get
''')
Now let's write a function that can return a tree datastructure with a key, value pair deleted:
tree key BTree-delete
---------------------------
tree
If the key is not in tree it just returns the tree unchanged.
So:
BTree-Delete == [pop not] swap [R0] [R1] genrec
[Er] BTree-delete
-------------------------------------
[pop not] [Er] [R0] [R1] genrec
[n_key n_value left right] [BTree-get]
[n_key n_value left right] [BTree-get] E
[n_key n_value left right] [BTree-get] T<
Now we get to figure out the recursive case:
w/ D == [pop not] [Er] [R0] [R1] genrec
[node_key node_value left right] key R0 [D] R1
[node_key node_value left right] key over first swap dup [D] R1
[node_key node_value left right] node_key key key [D] R1
And then:
[node_key node_value left right] node_key key key [D] R1
[node_key node_value left right] node_key key key [D] cons roll> [T>] [E] [T<] cmp
[node_key node_value left right] node_key key [key D] roll> [T>] [E] [T<] cmp
[node_key node_value left right] [key D] node_key key [T>] [E] [T<] cmp
Now this:;
[node_key node_value left right] [key D] node_key key [T>] [E] [T<] cmp
Becomes one of these three:;
[node_key node_value left right] [key D] T>
[node_key node_value left right] [key D] E
[node_key node_value left right] [key D] T<
[node_key node_value left right] [key D] T>
-------------------------------------------------
[node_key node_value left key D right]
First:
right left node_value node_key [key D] dipd
right left key D node_value node_key
right left' node_value node_key
Ergo:
[node_key node_value left right] [key D] [dipd] cons infra
So:
T> == [dipd] cons infra
T< == [dipdd] cons infra
[node_key node_value left right] [key D] E
We have to handle three cases, so let's use cond
.
The first two cases are symmetrical, if we only have one non-empty child node return it.
E == [
[[pop third not] pop fourth]
[[pop fourth not] pop third]
[default]
] cond
(If both child nodes are empty return an empty node.)
The initial structure of the default function:
default == [E'] cons infra
[node_key node_value left right] [key D] default
[node_key node_value left right] [key D] [E'] cons infra
[node_key node_value left right] [[key D] E'] infra
right left node_value node_key [key D] E'
If both child nodes are non-empty, we find the highest node in our lower sub-tree, take its key and value to replace (delete) our own, then get rid of it by recursively calling delete() on our lower sub-node with our new key.
(We could also find the lowest node in our higher sub-tree and take its key and value and delete it. I only implemented one of these two symmetrical options. Over a lot of deletions this might make the tree more unbalanced. Oh well.)
First things first, we no longer need this node's key and value:
right left node_value node_key [key D] roll> popop E''
right left [key D] node_value node_key popop E''
right left [key D] E''
Then we have to we find the highest (right-most) node in our lower (left) sub-tree:
right left [key D] E''
Ditch the key:
right left [key D] rest E'''
right left [D] E'''
Find the right-most node:
right left [D] [dup W] dip E''''
right left dup W [D] E''''
right left left W [D] E''''
Consider:
left W
We know left is not empty:
[L_key L_value L_left L_right] W
We want to keep extracting the right node as long as it is not empty:
left [P] [B] while W'
The predicate:
[L_key L_value L_left L_right] P
[L_key L_value L_left L_right] fourth
L_right
(This has a bug, can run on []
so must be guarded:
if_not_empty == [] swap [] ifte
?fourth == [fourth] if_not_empty
W.rightmost == [?fourth] [fourth] while
The body is also fourth
:
left [fourth] [fourth] while W'
rightest W'
We know rightest is not empty:
[R_key R_value R_left R_right] W'
[R_key R_value R_left R_right] uncons uncons pop
R_key [R_value R_left R_right] uncons pop
R_key R_value [R_left R_right] pop
R_key R_value
So:
W == [fourth] [fourth] while uncons uncons pop
And:
right left left W [D] E''''
right left R_key R_value [D] E''''
Final stretch. We want to end up with something like:
right left [R_key D] i R_value R_key
right left R_key D R_value R_key
right left' R_value R_key
If we adjust our definition of W
to include over
at the end:
W == [fourth] [fourth] while uncons uncons pop over
That will give us:
right left R_key R_value R_key [D] E''''
right left R_key R_value R_key [D] cons dipdd E'''''
right left R_key R_value [R_key D] dipdd E'''''
right left R_key D R_key R_value E'''''
right left' R_key R_value E'''''
right left' R_key R_value swap
right left' R_value R_key
So:
E' == roll> popop E''
E'' == rest E'''
E''' == [dup W] dip E''''
E'''' == cons dipdd swap
Substituting:
W == [fourth] [fourth] while uncons uncons pop over
E' == roll> popop rest [dup W] dip cons dipdd swap
E == [
[[pop third not] pop fourth]
[[pop fourth not] pop third]
[[E'] cons infra]
] cond
Minor rearrangement:
W == dup [fourth] [fourth] while uncons uncons pop over
E' == roll> popop rest [W] dip cons dipdd swap
E == [
[[pop third not] pop fourth]
[[pop fourth not] pop third]
[[E'] cons infra]
] cond
W.rightmost == [fourth] [fourth] while
W.unpack == uncons uncons pop
E.clear_stuff == roll> popop rest
E.delete == cons dipdd
W == dup W.rightmost W.unpack over
E.0 == E.clear_stuff [W] dip E.delete swap
E == [
[[pop third not] pop fourth]
[[pop fourth not] pop third]
[[E.0] cons infra]
] cond
T> == [dipd] cons infra
T< == [dipdd] cons infra
R0 == over first swap dup
R1 == cons roll> [T>] [E] [T<] cmp
BTree-Delete == [pop not] swap [R0] [R1] genrec
By the standards of the code I've written so far, this is a huge Joy program.
DefinitionWrapper.add_definitions('''
first_two == uncons uncons pop
fourth == rest rest rest first
?fourth == [] [fourth] [] ifte
W.rightmost == [?fourth] [fourth] while
E.clear_stuff == roll> popop rest
E.delete == cons dipdd
W == dup W.rightmost first_two over
E.0 == E.clear_stuff [W] dip E.delete swap
E == [[[pop third not] pop fourth] [[pop fourth not] pop third] [[E.0] cons infra]] cond
T> == [dipd] cons infra
T< == [dipdd] cons infra
R0 == over first swap dup
R1 == cons roll> [T>] [E] [T<] cmp
BTree-Delete == [pop not] swap [R0] [R1] genrec''', D)
J("['a' 23 [] ['b' 88 [] ['c' 44 [] []]]] 'c' ['Er'] BTree-Delete ")
J("['a' 23 [] ['b' 88 [] ['c' 44 [] []]]] 'b' ['Er'] BTree-Delete ")
J("['a' 23 [] ['b' 88 [] ['c' 44 [] []]]] 'a' ['Er'] BTree-Delete ")
J("['a' 23 [] ['b' 88 [] ['c' 44 [] []]]] 'der' ['Er'] BTree-Delete ")
J("['a' 23 [] ['b' 88 [] ['c' 44 [] []]]] 'der' [pop] BTree-Delete ")
One bug, I forgot to put not
in the first two clauses of the cond
.
The behavior of the [Er]
function should maybe be different: either just silently fail, or maybe implement some sort of function that can grab the pending expression up to a sentinel value or something, allowing for a kind of "except"-ish control-flow?
Then, once we have add, get, and delete we can see about abstracting them.
Let's consider a tree structure, similar to one described "Why functional programming matters" by John Hughes, that consists of a node value and a sequence of zero or more child trees. (The asterisk is meant to indicate the Kleene star.)
tree = [] | [node [tree*]]
treestep
¶In the spirit of step
we are going to define a combinator treestep
which expects a tree and three additional items: a base-case value z
, and two quoted programs [C]
and [N]
.
tree z [C] [N] treestep
If the current tree node is empty then just leave z
on the stack in lieu:
[] z [C] [N] treestep
---------------------------
z
Otherwise, evaluate N
on the node value, map
the whole function (abbreviated here as k
) over the child trees recursively, and then combine the result with C
.
[node [tree*]] z [C] [N] treestep
--------------------------------------- w/ K == z [C] [N] treestep
node N [tree*] [K] map C
Since this is a recursive function, we can begin to derive it by finding the ifte
stage that genrec
will produce. The predicate and base-case functions are trivial, so we just have to derive J
.
K == [not] [pop z] [J] ifte
The behavior of J
is to accept a (non-empty) tree node and arrive at the desired outcome.
[node [tree*]] J
------------------------------
node N [tree*] [K] map C
So J
will have some form like:
J == .. [N] .. [K] .. [C] ..
Let's dive in. First, unquote the node and dip
N
.
[node [tree*]] i [N] dip
node [tree*] [N] dip
node N [tree*]
Next, map
K
over teh child trees and combine with C
.
node N [tree*] [K] map C
node N [tree*] [K] map C
node N [K.tree*] C
So:
J == i [N] dip [K] map C
Plug it in and convert to genrec
:
K == [not] [pop z] [i [N] dip [K] map C] ifte
K == [not] [pop z] [i [N] dip] [map C] genrec
[not] [pop z] [i [N] dip] [map C] genrec
[not] [pop z] [i [N] dip] [map C] genrec
[not] [z] [pop] swoncat [i [N] dip] [map C] genrec
[not] z unit [pop] swoncat [i [N] dip] [map C] genrec
z [not] swap unit [pop] swoncat [i [N] dip] [map C] genrec
\ .........TS0............./
\/
z TS0 [i [N] dip] [map C] genrec
z [i [N] dip] [TS0] dip [map C] genrec
z [[N] dip] [i] swoncat [TS0] dip [map C] genrec
z [N] [dip] cons [i] swoncat [TS0] dip [map C] genrec
\ ......TS1........./
\/
z [N] TS1 [TS0] dip [map C] genrec
z [N] [map C] [TS1 [TS0] dip] dip genrec
z [N] [C] [map] swoncat [TS1 [TS0] dip] dip genrec
z [C] [N] swap [map] swoncat [TS1 [TS0] dip] dip genrec
The givens are all to the left so we have our definition.
treestep
¶ TS0 == [not] swap unit [pop] swoncat
TS1 == [dip] cons [i] swoncat
treestep == swap [map] swoncat [TS1 [TS0] dip] dip genrec
DefinitionWrapper.add_definitions('''
TS0 == [not] swap unit [pop] swoncat
TS1 == [dip] cons [i] swoncat
treestep == swap [map] swoncat [TS1 [TS0] dip] dip genrec
''', D)
[] 0 [C] [N] treestep
---------------------------
0
[n [tree*]] 0 [sum +] [] treestep
--------------------------------------------------
n [tree*] [0 [sum +] [] treestep] map sum +
J('[] 0 [sum +] [] treestep')
J('[23 []] 0 [sum +] [] treestep')
J('[23 [[2 []] [3 []]]] 0 [sum +] [] treestep')
Let's simplify the tree datastructure definition slightly by just letting the children be the rest
of the tree:
tree = [] | [node tree*]
The J
function changes slightly.
[node tree*] J
------------------------------
node N [tree*] [K] map C
[node tree*] uncons [N] dip [K] map C
node [tree*] [N] dip [K] map C
node N [tree*] [K] map C
node N [tree*] [K] map C
node N [K.tree*] C
J == uncons [N] dip [K] map C
K == [not] [pop z] [uncons [N] dip] [map C] genrec
define('TS1 == [dip] cons [uncons] swoncat') # We only need to redefine one word.
J('[23 [2] [3]] 0 [sum +] [] treestep')
J('[23 [2 [8] [9]] [3] [4 []]] 0 [sum +] [] treestep')
I think these trees seem a little easier to read.
BTree = [] | [[key value] left right]
What kind of functions can we write for this with our treestep
? The pattern for processing a non-empty node is:
node N [tree*] [K] map C
Plugging in our BTree structure:
[key value] N [left right] [K] map C
[key value] uncons pop [left right] [K] map i
key [value] pop [left right] [K] map i
key [left right] [K] map i
key [lkey rkey ] i
key lkey rkey
J('[[3 0] [[2 0] [] []] [[9 0] [[5 0] [[4 0] [] []] [[8 0] [[6 0] [] [[7 0] [] []]] []]] []]] 23 [i] [uncons pop] treestep')
Doesn't work because map
extracts the first
item of whatever its mapped function produces. We have to return a list, rather than depositing our results directly on the stack.
[key value] N [left right] [K] map C
[key value] first [left right] [K] map flatten cons
key [left right] [K] map flatten cons
key [[lk] [rk] ] flatten cons
key [ lk rk ] cons
[key lk rk ]
So:
[] [flatten cons] [first] treestep
J('[[3 0] [[2 0] [] []] [[9 0] [[5 0] [[4 0] [] []] [[8 0] [[6 0] [] [[7 0] [] []]] []]] []]] [] [flatten cons] [first] treestep')
There we go.
treestep
.¶From here:
key [[lk] [rk]] C
key [[lk] [rk]] i
key [lk] [rk] roll<
[lk] [rk] key swons concat
[lk] [key rk] concat
[lk key rk]
So:
[] [i roll< swons concat] [first] treestep
J('[[3 0] [[2 0] [] []] [[9 0] [[5 0] [[4 0] [] []] [[8 0] [[6 0] [] [[7 0] [] []]] []]] []]] [] [i roll< swons concat] [uncons pop] treestep')
Let's reexamine:
[key value left right] R0 [BTree-iter-order] R1
...
left BTree-iter-order key value F right BTree-iter-order
[key value left right] unstack swap
key value left right swap
key value right left
key value right left [BTree-iter-order] [cons dipdd] dupdip
key value right left [BTree-iter-order] cons dipdd [BTree-iter-order]
key value right [left BTree-iter-order] dipdd [BTree-iter-order]
left BTree-iter-order key value right [BTree-iter-order]
left BTree-iter-order key value right [F] dip [BTree-iter-order]
left BTree-iter-order key value F right [BTree-iter-order] i
left BTree-iter-order key value F right BTree-iter-order
So:
R0 == unstack swap
R1 == [cons dipdd [F] dip] dupdip i
[key value left right] R0 [BTree-iter-order] R1
[key value left right] unstack swap [BTree-iter-order] [cons dipdd [F] dip] dupdip i
key value right left [BTree-iter-order] [cons dipdd [F] dip] dupdip i
key value right left [BTree-iter-order] cons dipdd [F] dip [BTree-iter-order] i
key value right [left BTree-iter-order] dipdd [F] dip [BTree-iter-order] i
left BTree-iter-order key value right [F] dip [BTree-iter-order] i
left BTree-iter-order key value F right [BTree-iter-order] i
left BTree-iter-order key value F right BTree-iter-order
BTree-iter-order == [not] [pop] [unstack swap] [[cons dipdd [F] dip] dupdip i] genrec
cons cons
¶cons2 == cons cons
Refactoring:
BTree-new == swap [[] []] cons2
T == [cons2 dipdd] cons2 cons infra
Te == [cons2 dipd] cons2 cons infra
Ee == pop swap roll< rest rest cons2
It's used a lot because it's tied to the fact that there are two "data items" in each node. This point to a more general factorization that would render a combinator that could work for other geometries of trees.
A general form for tree data with N children per node:
[[data] [child0] ... [childN-1]]
Suggests a general form of recursive iterator, but I have to go walk the dogs at the mo'.
For a given structure, you would have a structure of operator functions and sort of merge them and run them, possibly in a different order (pre- post- in- y'know). The Cn
functions could all be the same and use the step
trick if the children nodes are all of the right kind. If they are heterogeneous then we need a way to get the different Cn
into the structure in the right order. If I understand correctly, the "Bananas..." paper shows how to do this automatically from a type description. They present, if I have it right, a tiny machine that accepts some sort of algebraic data type description and returns a function that can recusre over it, I think.
[data.. [c0] [c1] ... [cN]] [F C0 C1 ... CN] infil
--------------------------------------------------------
data F [c0] C0 [c1] C1 ... [cN] CN
[F]
a parameter.¶We can generalize to a sort of pure form:
BTree-iter == [not] [pop] [[F]] [R1] genrec
== [not] [pop] [[F] [BTree-iter] R1] ifte
Putting [F]
to the left as a given:
[F] unit [not] [pop] roll< [R1] genrec
[[F]] [not] [pop] roll< [R1] genrec
[not] [pop] [[F]] [R1] genrec
Let's us define a parameterized form:
BTree-iter == unit [not] [pop] roll< [R1] genrec
So in the general case of non-empty nodes:
[key value left right] [F] [BTree-iter] R1
We just define R1
to do whatever it has to to process the node. For example:
[key value left right] [F] [BTree-iter] R1
...
key value F left BTree-iter right BTree-iter
left BTree-iter key value F right BTree-iter
left BTree-iter right BTree-iter key value F
Pre-, ??-, post-order traversals.
[key value left right] uncons uncons
key value [left right]
For pre- and post-order we can use the step
trick:
[left right] [BTree-iter] step
...
left BTree-iter right BTree-iter
We worked out one scheme for ?in-order? traversal above, but maybe we can do better?
[key value left right] [F] [BTree-iter] [unstack] dipd
[key value left right] unstack [F] [BTree-iter]
key value left right [F] [BTree-iter]
key value left right [F] [BTree-iter] R1.1
Hmm...
key value left right [F] [BTree-iter] tuck
key value left right [BTree-iter] [F] [BTree-iter]
[key value left right] [F] [BTree-iter] [unstack [roll>] dip] dipd
[key value left right] unstack [roll>] dip [F] [BTree-iter]
key value left right [roll>] dip [F] [BTree-iter]
key value left roll> right [F] [BTree-iter]
left key value right [F] [BTree-iter]
left key value right [F] [BTree-iter] tuck foo
left key value right [BTree-iter] [F] [BTree-iter] foo
...
left BTree-iter key value F right BTree-iter
We could just let [R1]
be a parameter too, for maximum flexibility.
If I understand it correctly, the "Bananas..." paper talks about a way to build the processor function automatically from the description of the type. I think if we came up with an elegant way for the Joy code to express that, it would be cool. In Joypy the definitions can be circular because lookup happens at evaluation, not parsing. E.g.:
A == ... B ...
B == ... A ...
That's fine. Circular datastructures can't be made though.