Saturday, June 12, 2010

RAII vs finally

Since I'm pretty new to C++, I wasn't too deeply familiar with RAII; like most Schemers I just thought of it as "C++'s version of dynamic-wind."

This week I learned an important distinction between C++ destructors and Java's finally. The latter, of course, unilaterally executes when the body terminates, regardless of how or when it terminates. The thing that gives destructors more expressiveness for dealing with cleanup is that they only execute for the objects that have been initialized. This means that if control exits a block after only half of the stack-local objects have been constructed, only those half of the objects have their destructors invoked. With finally, all that bookkeeping is the responsibility of the programmer.

(That said, I still see RAII used all over the place to construct awkward, special-purpose classes whose sole purpose is to run some cleanup code. In these cases, having to create a named object and a named class to go along with it is pretty perverse.)

Monday, May 03, 2010

A Theory of Typed Hygienic Macros

A Theory of Typed Hygienic Macros
PhD Dissertation, 2010

We present the λm-calculus, a semantics for a language of hygienic macros with a non-trivial theory. Unlike Scheme, where programs must be macro-expanded to be analyzed, our semantics admits reasoning about programs as they appear to programmers. Our contributions include a semantics of hygienic macro expansion, a formal definition of α-equivalence that is independent of expansion, and a proof that expansion preserves α-equivalence. The key technical component of our language is a type system similar to Culpepper and Felleisen’s “shape types,” but with the novel contribution of binding signature types, which specify the bindings and scope of a macro’s arguments.

Friday, April 23, 2010

Effective ML

Yaron Minsky has giving great advice on effective ML lately. For my money, the single most important lesson is #3: make illegal states unrepresentable. There are so many benefits-- and I've seen the awful drawbacks of not following it.

Thursday, April 22, 2010

Cyclic reference graphs FTW

I've just created my new Mozilla-hosted blog! I don't have intention of stopping this one here, but I'm moving my Mozilla-related (and consequently ECMAScript-related) thoughts to that space.

Also, um, I should hopefully have some pretty good news in a couple weeks.

Wednesday, April 14, 2010

Single-frame continuations for Harmony, ctd

Continuing last week's series of posts on single-frame continuations for ECMAScript Harmony:

The inimitable Waldemar Horwat points out the flaw in my strawman proposal (strawman strawman?):
Since in the final expression you have A(... A(x)), you'll just end up executing the same finally block twice under certain circumstances.
This program demonstrates the bug:
function captured() {
try {
handler->();
throw "throw";
}
finally {
alert("finally!");
}
}

function handler(k) {
k();
}
The captured function calls handler, with the finally clause on the stack. Then handler invokes the captured continuation, which places the finally clause on the stack again. So after throwing an exception, unwinding the stack passes through two copies of the finally clause.

The takeaway for me is that a capturing mechanism that involves a call to a user function has a pigeonhole problem: the suspended continuation ought to capture the catch and finally clauses of the function activation, but the call to the handler also ought to be protected by those same catch and finally clauses. And yet the finally block should only executed once per activation. Note that generators do not suffer from this problem, since yield immediately suspends and aborts the activation, without calling user code in the middle.

For what it's worth, I tried the above program with NarrativeJS and it died with an internal error. But I didn't investigate further, so that may have been a mistake on my part. (It's difficult to tell what NJS should do since the docs don't really specify its semantics.) Nonetheless, I'm starting to think that finally (and in particular, the intended invariant that finally happens at most once per activation) renders unworkable any attempt to combine continuation-capture with a function call.

Actually, there are two more takeaways: 1) the notation of evaluation contexts is a nice way to describe and demonstrate the bug--notice Waldemar's wording!--and 2) thank goodness for Waldemar.

Wednesday, April 07, 2010

Harmony first-class activations

This morning I've been blogging about single-function activations for Harmony, the history and design space of continuations, and informal models and design sketches for control operators. A few more thoughts here.

One-shot continuations

(I won't try to model this in the framework I sketched earlier, because it introduces mutation, and then we have to add a heap to the model, and it gets mucky enough not to be worth it here.)

You can limit the expressivity of continuations to only allow control to enter them at most once. This is totally compatible with most any design we pick along the other dimensions. For example, in the semantics I sketched earlier, if the callee throws an exception, we throw back into the activation, and it becomes marked as un-reusable. So if the callee saved the captured continuation, invoking it later would be an error. But if the callee calls the captured continuation before returning and then returns normally, the caller skips past the continuation and returns normally.

There's one very subtle aspect of multi-shot continuations that is probably their single biggest liability: finally. In ES, when you execute a try block with a finally block, the finally block is guaranteed to be executed exactly once (assuming the body of the try block completes without an infinite loop and the program isn't abruptly terminated). Now, even with one-shot continuations, it's possible that you'll suspend the continuation before the try block completes and never resume it. But if you do resume it and the try block completes, it'll still execute the finally block exactly once.

With multi-shot continuations, you can resume that code as many times as you like, and that finally block will get executed again and again. (This is analogous to Common Lisp's unwind-protect and Scheme's dynamic-wind.) This is pretty subtle, and makes it harder to write correct "clean-up" code.

Implementing JS1.7 generators

One-shot, single frame continuations should be expressive enough to implement JS1.7 generators pretty easily. I sent a draft implementation of generators via one-frame continuations to the es-discuss list. As soon as there's a prototype implementation of single-frame continuations somewhere, it should be possible to test that code (mutatis mutandis).

My current preference

So far I sort of prefer the semantics I described earlier today, with one-shot continuations. But I need to implement and experiment before I trust my opinion. There are certainly questions of API and syntax design that aren't addressed by my design sketches. For that I'd really prefer to look at real callback-heavy DOM code and see what would fit the most smoothly.

Thinking about continuations

Following up on my previous posts about continuations and Harmony, I want to show how I use the modeling tools of evaluation contexts and operational semantics to sketch out informal designs. I'll use a plausible syntax for a new Harmony operator (based on Neil Mix's NarrativeJS) and start with a reasonably simple semantics based on the well-known shift/reset operators, and sort of work out from there.

I'll keep this as light and informal as possible. To stay on point, I'll ignore most of the ES features-- no mutation (so we don't have to model the heap / object graph), and not even scope chains. For a faithful and precise model, we'd need all of these things. But for design sketches, I can afford to be sloppier.

Stacks and activations

I'll model the stack as a non-empty sequence of function activations:
S ::= A | S(A)
An activation is a function body, but we want to highlight the current statement being executed or expression being evaluated (aka the code pointer). So I'll model the activation as two things: the next bit of code to execute (a statement or expression), and the function body without that bit of code. If the code pointer is at an expression, the activation is a function body with a placeholder that I'll spell () (an "expression hole"); if it's a statement, the activation is a function body with {()} (a "statement hole") in place of the current statement.

For the curious, what I just did was -- informally -- define evaluation contexts for ES.

If that was a little too opaque, here's an example. Let's say we have a function
function foo(x) { f(x); g(x, 2 * x, null); return x + 4; }
If we call foo(42), then at first, our current code is f(42); and our activation is
{ {()} g(42, 2 * 42, null); return 42 + 4; }
When we're about to evaluate the second argument to g, then the current code is 2 * 42 and the current activation is
{ g(42, (), null); return 42 + 4; }
When we get to the return statement, current code is return 42 + 4; and the current activation is
{ {()} }
Notice how after we execute some code, it's gone from the activation; the activation just needs to keep track of the code we have left to run.

For talking about the activation and the current code together, I'll write A(expr) or A{stmt}. This is what people call "plugging" code into the hole of a context. Essentially, it's just a modeling trick that makes it convenient to call out where the current code pointer is, but it also turns out to be really natural for talking about first-class control operators-- because it explicitly models the very things we want to manipulate.

Operational semantics

The above is more or less enough to start writing simple transition rules that describe how programs run, e.g.:
S(A{return val;}) → S(val)
In English, this rule says: "in a stack starting with S and with activation A on top, when the next code to execute is the statement return val; (for some value val), then remove the current activation and place val in the expression hole of the stack."

One thing to make clear: this transition arrow describes a runtime state transition of the program, not a compile-time transformation.

Starting with shift and reset

Here's how a design based on shift and reset might work, specified in one line:
S(A(val->(val1, ..., valn))) → S(val(val1, ..., valn, function(x) A(x)))
The special syntax of a capturing function call comes is from NarrativeJS. The transition rule describes several things:
  1. After evaluating the callee and its arguments, the operator immediately aborts the current activation and saves it in a function.
  2. Next, it calls the callee with its arguments along with the captured continuation value as an additional argument.
  3. If the captured continuation is called, it executes the captured function activation, with its new argument plugged into the hole of that activation.
There's a serious problem with this semantics for ECMAScript: exception handlers. Aborting A will discard any try or finally blocks in A before we call the function. So you'd get surprising behavior like:
function mightFail(k) {
// ...
throw "boo!";
}
function main() {
try {
mightFail->()
}
catch (x) {
return 42
}
}
main() // uncaught exception: "boo!"
A pretty clear violation of principle of least astonishment.

When to abort

Having the semantics in such a concise format makes it easy to tweak and experiment with. Let me modify the above semantics in the following way: instead of aborting before we call the callee, let's abort the activation after we return from the call. That way, if it throws exceptions, they'll be caught by any handlers in A.

We'll have to invent an expression form that aborts the current function activation after evaluating an expression. This is pretty much just like return, exception it's an expression form.
S(A(val->(val1, ..., valn)))
→ S(A(RETURN val(val1, ..., valn, function(x) A(x))))
Actually, we could do this without positing a magic RETURN operator if we have something like the newly-proposed let expressions, which allow you to execute statements inside an expression. Then we'd have:
S(A(val->(val1, ..., valn)))
→ S(A(let (x = val(val1,..., valn, function(x) A(x))) {return x;}))
So that's a brief demo of using evaluation contexts and operational transition rules to sketch the semantics of continuation operators.

Edit: I forgot to mention, it's easy enough to specify the behavior of RETURN:
S(A(RETURN val)) → S(val)
As I say, just like return statements, but as an expression form. To be clear, this is just a specification mechanism, not something I'd suggest as a language feature.

The design space of continuations

I mentioned earlier that the design space for first-class continuations has a lot of historical research behind it. This list of papers on continuations up to around 2005 is a start, but by no means complete. (The history of continuations is actually pretty astoundingly far-reaching. Apparently my PhD advisor's PhD advisor was one of a number of people who independently discovered continuations in different research programs and under different guises.)

The research literature has been helpful to me in understanding the design space of control operators better, particularly because it gives me good models for reasoning, formally or informally, about control effects.

Here are some of the design questions raised by the research literature and good modeling frameworks. This isn't a complete list, and it isn't the place to cite adequately. All I'm interested in is highlighting some of the takeaways.

How much of the continuation can be captured?

Scheme's call/cc allows you to capture the "whole" continuation, up to wherever the language implementor decides is the limit. But delimited continuations make this boundary more explicit, which has a couple benefits. First, by installing a delimiter, you can prevent code that you call from capturing parts of your continuation that you want to keep private. Second, when you capture a continuation, you have a clearer picture of what you're capturing.

What elements of the continuation are captured?

The answer at first seems obvious: you capture the control context and the environment (aka scope chain). But it gets trickier when you have things like exception handlers and other information associated with the dynamic state of the control context.

What kinds of delimiters do you give to users?

In our case, we're only interested in an implicit delimiter at function boundaries. But there are a number of different designs of control delimiters.

What kinds of control-abort operators do you give to users?

Turns out there's at least one in every C-like language: return. That's probably enough for our purposes. But there are several possibilities there, too, and they can get surprisingly subtle.

Does executing a captured continuation install a new delimiter?

This is the key difference between Felleisen's F/# operators [*] and Danvy and Filinski's shift/reset operators. With the former, you capture a continuation up to the nearest delimiter, but when you invoke the captured continuation, there's no new delimiter. With the latter, a captured continuation reinstalls a new delimiter every time you invoke it.

How many times can you enter a continuation?

The most general design allows a captured continuation to be used any number of times. With "one-shot" continuations, you can only enter a continuation once. Notice that "entering a continuation" could mean a number of things: returning normally, unwinding it by throwing exceptions, or calling into it later via a captured continuation.

When do we abort the current continuation?

This one is really, really variable. There are many plausible points in the semantics where an exit can occur, and many of them lead to plausible but distinct designs.

[*] That's pronounced "eff" and "prompt" -- no relation to F#. I just noticed that!

Delimited continuations? In ECMAScript?

Well, no.

Besides breaking a lot of language invariants, first-class continuations would be a nightmare for portability, since different implementations implement different API's natively. Either you mandate capturing continuations across native frames, which is a hardship for implementers and a performance-killer, or you don't, which causes observably different behavior across browsers when they place native delimiters at different points in the continuations-- leading to lots of painful bug reports and nasty market dynamics where everyone has to tailor their implementations to track the most popular.

So that won't happen. But!

JS 1.7 introduced Pythonic generators, which allow you to suspend a single function activation. This alleviates some of the painful CPS patterns people have to use on the web to interact with the event-loop-concurrent, single-thread-of-control semantics of JavaScript on the web. But it's also a little ad hoc. And Kris Zyp recently made the helpful suggestion that we could explore a design for a simpler, more general single-frame continuation-capture operation for Harmony.

What we're talking about here is a restricted version of delimited continuations: the delimiters are implicit--they're at function boundaries. But otherwise it's very much in that space. And there's a loooot of prior work on this topic. I'll post later with a little background and some notes about the design space of delimited continuations for ES.

Saturday, February 27, 2010

Generalizing Javadot

The idea of Javadot is to allow the rich lexical syntax of Lisp and Scheme with the elegance of the dot-notation from the C tradition by simply allowing scope to trump lexical splitting: if foo.bar is in scope as an identifier, then it parses as a variable reference; otherwise it parses as "foo" "." "bar". It's a simple compromise, it's easy to understand, it plays well with lexical scope, and (in an infix language) you can always circumvent it with whitespace. (I don't think it needs to complicate parsing too much, either, since you can do a post-hoc splitting of the lexeme, rather than forcing the lexer to understand scope the way you're forced to with the C typedef ambiguity).

My question is: couldn't you use this idea in an infix language, and generalize it to work for all infix operators? This would allow the use of other common operators such as -, +, *, /, <, and >, all of which I've loved being able to use in identifier names in Scheme, and all of which I also really like being able to use as infix operators.

Friday, February 12, 2010

Eich's Law

Found this gem in a C++ comment while digging in the SpiderMonkey codebase:
After much testing, it's clear that Postel's advice to protocol designers ("be liberal in what you accept, and conservative in what you send") invites a natural-law repercussion for JS as "protocol":

"If you are liberal in what you accept, others will utterly fail to be conservative in what they send."
The comment is unsigned, but it sounds like Brendan.

Monday, January 25, 2010

Wading into the C

No time for deep thoughts these days; too much hacking, dissertating, designing, and committeefying going on. Just a couple notes based on recent experiences hacking in C/C++:

1. Not being able to rely on recursion makes me sad.

2. "Downwards macro-args" in C:
#define MY_ENUM_LIST(m) \
m(RED, 0), \
m(GREEN, 1), \
m(BLUE, 2)

#define DEF_ENUM_ENTRY(c, v) c = v
#define QUOTE_ENUM_ENTRY(c, v) #c

typedef enum rgb {
MY_ENUM_LIST(DEF_ENUM_ENTRY)
} rgb;

const char *rgb_names[] = {
MY_ENUM_LIST(QUOTE_ENUM_ENTRY)
};
3. I am fast becoming acquainted with gdb.

4. And a few choice command-line shortcuts really save extraordinary amounts of time. My new fav: the !? bash-history shortcut.