Tuesday, May 24, 2005

Argument order

For multiple-arity functions, it would be more abstract to pass arguments by label rather than by position. Just as pattern matching is more abstract than using explicit selectors, labeled arguments can be supplied in any order, which would allow a more general form of currying. It would also simplify partial evaluation, since you wouldn't have to rearrange a function's arguments to apply it to a subset of them.

Nevertheless, tuples are a lighter-weight data structure than records, and lighter-weight solutions are easier to write and often easier to read. Positional function arguments are here to stay.

Because the order of arguments isn't always particularly meaningful, the choice can seem arbitrary. When you design a procedure, it's sometimes hard to know which order to pick. Here's one criterion: think about currying. It can be useful to put the arguments expected to be known earlier before the arguments expected to be known later.

For example, in Syntactic Abstraction in Scheme, the heart of the definition of the macro expansion algorithm is the rule for macro applications:
expand(e, r) =
  case parse(e, r) of
    ...
    macro-application(i, e) → expand(mark(t(mark(e, m)), m), r)
      where t = r(resolve(i)) and m is fresh
This is pretty hard to read. Without getting too much into the specific details of the algorithm, the idea of this rule is that to expand a macro application, you mark the expression with a fresh mark, apply the macro transformer t to the marked expression, then mark the output of that transformer again with the same mark, and finally expand the result.

There's a functional pipeline buried in that rule, but it's obscured by the order of arguments. This is actually a good candidate for point-free style. Since we expect the environment argument r for expand to be known before the expression, and the mark argument m for mark to be known before the expression argument, let's swap the order of arguments to both those functions. Now we get the new rule
...
macro-application(i, e) → ((expand r) ⋅ (mark m) ⋅ t ⋅ (mark m)) e
...
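
To see the payoff concretely, here's a minimal Haskell sketch of the same refactoring; Term, Env, Mark, and the stubbed bodies are hypothetical stand-ins for the expander's real data structures:

  -- Hypothetical placeholders for the expander's real data structures.
  data Term = Term
  data Env  = Env
  type Mark = Int

  -- With the "known earlier" arguments first, both functions curry nicely.
  expand :: Env -> Term -> Term
  expand r e = e        -- stub for the real expansion algorithm

  mark :: Mark -> Term -> Term
  mark m e = e          -- stub for the real marking operation

  -- The macro-application rule becomes a point-free pipeline: mark the
  -- expression, transform it, mark the output, and expand the result.
  expandMacroApp :: Env -> (Term -> Term) -> Mark -> Term -> Term
  expandMacroApp r t m = expand r . mark m . t . mark m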

Thursday, May 19, 2005

Freshness... with respect

In A New Approach to Abstract Syntax with Variable Binding, Gabbay and Pitts describe a "freshness" quantifier for their variant of set theory, which lets you quantify over fresh names. This allows you to describe freshness conditions more formally than the usual "x does not occur in M," and even to reason about fresh names with formal logic. The quantification Nx (typeset in the paper as a reversed N) can be read: "for fresh x ..." This has a cool, dual universal/existential property: because the ambient universe of names is infinite, it's always true that there exists a fresh x, and because we don't care which one we choose, any proposition we prove using the freshness quantifier holds for all such fresh names.

What confused me for a while was that the quantifier just says "x is fresh"; it doesn't say "x is fresh with respect to M." But it seemed that if you take a name that's fresh at one point and then use it in another context, you might accidentally pick a name that conflicts with a name in the new context. In other words, I was concerned that a local freshness condition, without talking globally about all the names you're ever going to use, doesn't carry enough information to avoid an incorrect choice of fresh name.

You have to dig a little deeper to see why this isn't a problem. The real meaning of the quantifier comes from the observation that while the universe of names is infinite, the construction of any particular term involves only a finite number of names. So not only can you find a fresh name at any point, you can choose any name other than the finitely many used in that term: all but finitely many names are fresh. This is the more precise interpretation of Nx: "for all but finitely many names x ..."
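
One way to write this reading down precisely (my notation, rendering the quantifier as a plain N and writing A for the set of all names):

  Nx. φ(x)  ⟺  { x ∈ A | φ(x) } is cofinite
            ⟺  there is a finite S ⊆ A such that φ(x) holds for all x ∉ S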

So the freshness quantifier isn't actually committing to a name. It's just saying that there's a huge set of names for which the proposition holds. Moving in and out of the scope of particular terms therefore doesn't affect the truth of a proposition built with the quantifier: while the particular set of fresh names might change, the fact that it contains all but finitely many names does not. This lets us abstract away from the actual choice of fresh names in a program when we write propositions about them.

Friday, May 13, 2005

The lambda machine

If the lambda calculus is a minimalist assembly language for modeling computation, then we can look at a lambda program as a mathematical black box with two possible outputs: halt and loop. These "outputs" of course correspond to whether a reduction in the abstract machine terminates, but we could just as well give them arbitrary labels, say, true and false.

When you encode a datatype and a computation in the lambda calculus, you might want to check that your computation gives the correct result, not just that it halts. But the theorem will still be stated as a truth-valued proposition: either the computation got the right result, or it didn't. So assuming the encoding includes operations that can compare two results, you can always wrap the computation in an observer that drives the "lambda machine" to produce its true output (halt) if the computation produced the right result, and its false output (loop forever) if it produced anything else.

For example, say we want to show that two different implementations of a function, f and f′, always produce the same result. We could do this directly by comparing their results, but this involves reasoning about the encoded result values. Since the datatype already has operations that simulate the inspection of its values, this would in a sense be a duplication of effort. Instead, we can use the observer operations--an equality test, for example--to compare the results to an expected value and then either halt or loop.
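
As a sanity check, here's a hedged Haskell sketch of such an observer over Church booleans, with laziness standing in for normal-order reduction (the names are mine):

  {-# LANGUAGE RankNTypes #-}

  -- Church booleans: true selects its first argument, false its second.
  type Church = forall a. a -> a -> a

  true, false :: Church
  true  t _ = t
  false _ f = f

  -- A diverging term standing in for the machine's "loop" output.
  loop :: a
  loop = loop

  -- The observer: produce a value on true, diverge on false. Wrapping
  -- a comparison like (equal (f v) (f' v)) in observe turns "right
  -- answer" into "halts" and "wrong answer" into "loops".
  observe :: Church -> ()
  observe b = b () loop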

Thus if there is a context in which f and f′ are not equivalent, that is, in which they produce different results, then there is a context in which one of them causes the machine to halt and the other causes it to loop. The statement of contextual equivalence is usually given as the contrapositive: if there is no such context, that is, if in every context they either both halt or both loop (they "co-terminate"), then f and f′ are equivalent.
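
In symbols, writing C[f]⇓ for "the machine halts on C[f]," the usual statement reads:

  f ≅ f′  ⟺  for all contexts C, C[f]⇓ ⟺ C[f′]⇓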

These two outputs of the lambda machine are thus sufficient to formulate any proposition about expressions, so long as it depends only on properties observable within the lambda calculus.

How to read a semantics

Some tips for myself on how to read a semantics:
  1. Follow the textual description first. Usually the authors have thought about an order in which you should try to understand the rules.
  2. If you get stuck on one of the rules, circle the part that's confusing or write down what you don't get and skip it. Don't spend too long on any one part. (More often than not I embarrass myself by beating my head over something confusing only to find that it's explained in the next paragraph.)
  3. To understand the role a particular expression form plays, read its introduction and elimination rules. To understand the role an environment plays, read its extension and lookup rules. To understand any one of these rules, read its dual rule. Most of these rules make much more sense in pairs.
Another couple of general tips: take notes on the semantics and actually copy down the rules, but don't copy down the rules that are simple structural inductions or unchanged from previous variations. Also, take breaks. And never get caught reading papers after 2am.

Tuesday, May 10, 2005

Lambda-encoding derivation, take 2

I had the basic idea of the lambda-encoding of lambda terms right, but I was sloppy with the type derivation. The important piece that I missed was that the result type of the continuation needs to appear in two places. In general, we say a type α is equivalently representable as a function from continuations to final results:
α = (α → ο) → ο
Or you can think of this in logical terms as the equivalence
α = (α ⇒ ο) ⇒ ο
I can prove this easily with classical logic, but I had trouble finding resources presenting the axioms and standard equivalences of intuitionistic logic, and I don't feel like trying to come up with a correct, intuitionistic derivation.
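
For what it's worth, one direction is easy even intuitionistically: you can always CPS a value. A quick Haskell rendering (the name toCont is mine):

  -- The easy direction of α = (α → ο) → ο: inject a value into CPS.
  toCont :: a -> ((a -> o) -> o)
  toCont x = \k -> k x

The converse direction is where the classical reasoning lives: reading ο as falsity, it is exactly double-negation elimination.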

The derivation of disjoint unions as pure arrow types is simple, at least using the normal rules of classical logic:
  α ∨ β
= ((α ∨ β) ⇒ ο) ⇒ ο
= ((α ⇒ ο) ∧ (β ⇒ ο)) ⇒ ο
= (α ⇒ ο) ⇒ (β ⇒ ο) ⇒ ο
Notice how the currying makes much more sense than it did last time, now that the second occurrence of the result type is there. Implication and conjunction are obviously not equivalent (duh)! But you can curry a conjunction that occurs on the left-hand side of an implication.

Again, I'm not sure how hard this is to prove intuitionistically. But assuming that goes through okay, we get the type equivalence:
α + β = (α → ο) → (β → ο) → ο
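
Rendered as a hedged Haskell sketch, with a universally quantified result type standing in for the fixed ο (the names Sum, inl, inr, and toEither are mine):

  {-# LANGUAGE RankNTypes #-}

  -- α + β encoded as (α → ο) → (β → ο) → ο.
  type Sum a b = forall o. (a -> o) -> (b -> o) -> o

  inl :: a -> Sum a b
  inl x = \ka _ -> ka x

  inr :: b -> Sum a b
  inr y = \_ kb -> kb y

  -- Case analysis is just application; the round trip through Either
  -- shows the encoding loses no information.
  toEither :: Sum a b -> Either a b
  toEither s = s Left Right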

Saturday, May 07, 2005

Generative vs. generative

Crap. I just realized that I was borrowing a term to describe macros that already has another meaning in the macro literature. Generative macros, according to Ganz et al., are macros that do not inspect their arguments syntactically. I was using the term as I think it's used in the parameterized-module literature, to mean procedures that produce different results every time they are invoked. Mumble.

Lambda-encoded lambda terms

In The Theory of Fexprs is Trivial, Mitch encodes a datatype in the lambda calculus that itself represents the abstract syntax of lambda calculus terms. The encoding trick involves both higher-order abstract syntax and some cute type equivalences.

Here's a data definition of the abstract syntax terms:
data Term α = Var α
            | Abs (α → Term α)
            | App (Term α) (Term α)
Notice that the variant representing abstractions is itself encoded using a (meta-language) abstraction. So we can represent the program λx.(x x) as:
Abs (λx.(App (Var x) (Var x)))
This is already a useful hack, because we don't have to come up with a datatype to represent variables, and if we wanted to deal with substitution it would be handled automatically by the substitution mechanisms of the meta-language.

But we need to have some way of representing algebraic datatypes. For this we use the following equivalences:
  α + β → ο
≈ (α → ο) × (β → ο)
≈ (α → ο) → (β → ο)
So to reduce the implementation of a k-variant disjoint union to pure lambda calculus, we CPS the values and split out their continuations into k separate partial continuations. Thus we get our final encoding of the abstract syntax of lambda terms:
⌈x⌉     = λabc.ax
⌈λx.M⌉  = λabc.b(λx.⌈M⌉)
⌈(M N)⌉ = λabc.c⌈M⌉⌈N⌉
Update: I hadn't made the Term datatype polymorphic in its variable type. I think this is right now.
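
To tie this together, here's a hedged Haskell sketch of the whole encoding. The newtype wrapper is mine (Haskell's type synonyms can't be recursive), as are the constructor names:

  {-# LANGUAGE RankNTypes #-}

  -- Term v = (v → ο) → ((v → Term v) → ο) → (Term v → Term v → ο) → ο
  newtype Term v = Term (forall o.
       (v -> o)                  -- partial continuation for Var
    -> ((v -> Term v) -> o)      -- partial continuation for Abs
    -> (Term v -> Term v -> o)   -- partial continuation for App
    -> o)

  var :: v -> Term v
  var x = Term (\a _ _ -> a x)

  abs_ :: (v -> Term v) -> Term v
  abs_ f = Term (\_ b _ -> b f)

  app :: Term v -> Term v -> Term v
  app m n = Term (\_ _ c -> c m n)

  -- ⌈λx.(x x)⌉:
  selfApp :: Term v
  selfApp = abs_ (\x -> app (var x) (var x))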