Wednesday, April 30, 2008

Dynamic languages need modules

A dynamic language starts as an implementation. And that implementation almost always includes some variation on a REPL. Look at Lisp, Scheme, Python, Ruby, JavaScript, Lua, etc. One of the things that makes these languages "dynamic" is that they're attached to some long-running process: an IDE, a user prompt, an embedding application, a web page. So these languages are simply implemented with some global table of definitions that gets updated through the lifetime of the host process. Easy enough to understand from the implementor's perspective, but what happens to the user of the embedded language?

The problem with the "top-level" (that ever-changing table of program definitions) is the creeping specter of dynamic scope. While static vs. dynamic typing may be a debate that civilization takes to its grave, the dust seems to have more or less settled on dynamic scoping. The fact is, even though many decisions a program makes must depend on its context, it's very hard to understand a program if you can't nail down its definitions.

To some degree, if a dynamic language has lambda, you can escape the top-level with a design pattern that simulates a poor man's module. The module pattern is almost always a variation of what Schemers call "left-left-lambda"--the immediate application of a function literal. Bindings inside the lambda are no longer floating in that airy and transient stratosphere of the top-level; they're nailed to the function that is being applied. And you know that function is being applied only once, because once you've created it, it's applied an discarded.

This pattern goes a long way, and if you have macros, you can create sugar to give the pattern linguistic status. But a module system it ain't.

Linking: Nothing in this pattern deals with the relationships between modules. There's no way to declare what a module's imports and exports are. In fact, if you want a module to communicate with any other modules, the top-level's poison tends to seep back in. To export a value, a lambda-module can mutate some global variable, or it can return a value--but where does the caller save the value? You can always nest these solutions within more lambda-modules, but ultimately there's a problem of infinite regress: in the end you have to have at least one special top-most module surrounding your program.

Separate development: And that's only if you have control over the whole program. If you want to create a library and share it, there needs to be some shared common framework in which people and organizations can share code without stomping on each other's invariants or polluting each other's global environments. To be sure, a built-in module system doesn't eliminate all such issues (you still need conventions for naming and registering modules within a common framework), but modules help to standardize on these issues, and they can provide helpful errors when modules step on each other's toes, rather than silently overwriting one another.

Loading: There's not much flexibility in the loading of an immediately applied function. If your language involves multiple stages of loading, the implementation may be able to be smarter about loading and linking multiple modules at once.

Scoping conveniences: Lexical scope is a widget in the programmer's UI toolkit, and for different scenarios, there are different appropriate designs. The tree shape of expressions makes the lexical scoping rule ("inner trumps outer") appropriate; it favors the local over the global. But, ignoring nested modules for the moment, modules aren't tree shaped; they're more like a global table. In a sense, all modules are peers. So when you import the same name from two different modules, which one should win? You could say that whichever you import later wins, but this is much more subtle than the obvious nesting structure of ordinary variable bindings. I find it's more helpful for the module system to give me an error if I import the same name from different sources (unless it's a diamond import). Other useful facilities are selective import, import with renaming, or import with common prefixes. These are subtle usability designs where modules differ from lambda.

Extensibility: In PLT Scheme, we've used the module system as the point for language design and extension. By allowing modules to be parameterized over their "language", we have a natural way for introducing modalities into PLT Scheme. As languages grow, these modalities are an inevitability (cf. "use strict" in Perl and ECMAScript Edition 4). Buried within pragmas or nested expressions, this makes the design of the language much harder. But within a module, bindings are sacrosanct and interactions with other modules are limited to imports and exports. This significantly cuts down on the space of possible interactions and interferences between the language's modalities. As an example, Sam Tobin-Hochstadt has made good use of this for the design of Typed Scheme, a statically typed modality of PLT Scheme that can still interact reliably with dynamically typed modules.

The unrestricted mutation of the top-level environment is a good thing thing for many purposes: interactive development, self-adjusting execution environments, etc. But it's terrible for nailing down program definitions. Modules are a way of circumscribing a portion of code and declaring it "finished". It can still be useful to have an environment where modules can be dynamically loaded and possibly even replaced, but it's critical for the language to provide the programmer with basic invariants about the definitions within a module.

All of this is stuff I wish I'd had a clearer head about earlier in the ES4 process. But I hope that, down the road, we'll consider a module system for ECMAScript.

Monday, April 28, 2008

Literate code

I used to find the notion of literate code attractive, but right now my dissertation prototype implementation is undergoing some serious redevelopment and the horribly out-of-date docs are dragging me down. Now I find myself wishing I'd kept them separate.

Tuesday, April 22, 2008

How to spell StopIteration

Some of the pseudo-constants in JavaScript are actually mutable global variables: undefined is a notorious example. Luckily, there's generally some reliable way to generate such a value without having to hope that no one has reassigned a global variable.

For undefined, it's the void operator, which, given any argument, produces the value that undefined is initially bound to. Conventionally, people write void(0).

In JavaScript 1.7 and later, there's a special constant called StopIteration that's thrown after an iterator runs out of values. This too is a mutable global. But this value is a little trickier to get at reliably. Here's the simplest expression I could come up with that produces the StopIteration value:
(function(){try{(function(){true||(yield)})().next()}catch(e){return e}})()
The inner function is a generator by virtue of textually containing the yield keyword, but it returns immediately, yielding no values, because of the short-circuited logical or. (Note that because of yield's finicky grammar, it has to be parenthesized.) So by calling next() on it, it immediately raises StopIteration, which the outer function catches and returns.

Sunday, April 20, 2008

Compilation, obfuscation, encryption

It's interesting that we tend to use compilation as a proxy for encryption. It seems to me it would help clarify the system design of a language semantics and architecture if you separate the concerns of code ownership with performance. Compilation is primarily used to cache the transformation of one semantics into another--a partial evaluation of an interpreter. Using this as a weak encryption mechanism is pretty silly. If you want to encrypt program source, why not use real encryption? That way you can use whatever representation of the code you want; source, byte-code, microcode, whatever.

Friday, April 11, 2008

A poem for the weekend

Listen to Mustn'ts, child, listen to the Don'ts.
Listen to the Shouldn'ts, the Impossibles, the Won'ts.
Listen to the Never Haves, then listen close to me.
Anything can happen, child, Anything can be.

--Shel Silverstein

Tuesday, April 08, 2008

Different perspectives on parameterized types

To the PL theorist, parameterized types are a necessity for the expressiveness of a statically typed language. Imagine Hindley-Milner without them: you'd have to duplicate definitions all over the place. But in practice, statically typed languages have loopholes--long before generics, Java always had the type Object so that you could essentially "turn off" type-checking when you couldn't type things precisely. This meant that you could always write generic operations and generic collections, but without type-checking.

So in practice, people don't look at parameterized types as increasing the expressiveness of Java at all; it just looks like a way to increase the amount of type-checking in the language, to make it stricter. And they're right.

So you have two correct perspectives on parameterized types: roughly, that they make the language less restrictive and more restrictive. It's not actually a contradiction, but it's enough to have caused me confusion in some conversations.

Sunday, April 06, 2008

Keep your /tmp clean

For a couple years my laptop's cygwin installation has been dramatically, inexplicably slow to start up the first time after rebooting. I mean, like, 5 to 10 minutes to start up. (You can imagine how much more this makes me love Windows Update for forcing a reboot every time they patch the latest buffer overrun that authorizes Solitaire to initiate a nuclear first strike against North Dakota.)

Well, today I found the culprit. A program I haven't used in quite a long time had left thousands of orphaned temp files in /tmp. Don't ask my why, but this was enough to obliterate the performance of cygwin across the board, and especially on startup. Maybe /tmp is memory-mapped? No idea. Anyway, I wish I'd known to keep my /tmp clean. Now you know.

ESOP in a heartbeat

I presented my latest paper, A Theory of Hygienic Macros, co-authored with Mitch, at ESOP in Budapest this past week. Because I'm a genius, I managed to let my passport expire. Conveniently, there's an expiration notification service at every airport known as "not letting you on the plane."

It's a long story, but it involves emergency passport renewal, an extra weekend in San Francisco, and very merciful conference organizers and attendees.

All of this means I got to spend a total of about 48 hours in lovely Budapest--about long enough to see the Danube and a few of its glorious monuments, experience cheap beer, try delicious Tokaji, present a paper, chat with a few friends, and get right back on a plane again.

What a shame I couldn't have stayed longer. But the talk went great, and I'm still very glad I got to go.

For the record, the San Francisco passport office is staffed exclusively with miracle workers.

Case exhaustiveness for interactivity

In functional programming we're very good at exhaustive case analysis of our data definitions. But I've found I'm pretty bad at reasoning about cases in interactive programs. (Embarrassingly, I recently lost a week's worth of data in an Ajax application I wrote due to a bad response to an unpredicted server failure.)

What are all the possible actions the user could perform? What are all the failure cases? What are all the possible interleavings of interactions? These are harder questions than enumerating variants of a disjoint union.

I think this is where dynamic safety (e.g., runtime tag checking in dynamically typed languages or case-dispatch catchalls in statically typed languages) is so critical: if you don't have a sound model for enumerating the space of possible behaviors, you have to make sure that there's some reasonable recovery programmed into the system for when you encounter an event you didn't predict. This reminds me of recovery-oriented computing and Erlang's "let it fail" philosophy.

Wednesday, April 02, 2008

Monkey-patching evolves

Do you like the ability to dynamically mutate the definition of your program, but keep getting thwarted by sophisticated static analyses such as grep? Your prayers have been answered: ninja-patching is here.