Friday, August 18, 2006

A LISPie day. Data, Code, Context and Hints

I came to two conclusions about LISP today. On the one hand, I crystallized some thoughts on some of the flaws in LISP. On the other hand, I finally got the point of one of the rallying cries of the LISP community, namely “In Lisp, your code is data, and your data is code”.

Ever since I first heard that phrase and had a basic knowledge of LISP, I understood the basic idea behind it, but until today I didn’t have the right context. As a result, I think I focused more on the first half, which equates to the concept of functional programming, or at least first-class functions. The second half seemed far less interesting, and somewhat of a tautology. But reading yet another post from Steve Yegge showed me a context that made the statement far more interesting. The section that really did it for me was Beyond Logs.

What is curious is that earlier in the day I had a discussion that really solidified some vague thoughts I had about why I, and probably many others, have been so turned off by LISP despite its capability. You often hear LISP lovers comment that LISP has no (or very little) syntax. And it’s true. But that isn’t always a good thing. In fact, I think it’s a fundamental flaw for the majority of what programmers traditionally consider code. It’s nice to be able to think of code as data, and data as code, but it’s nice to be able to tell the difference too, or more specifically to be able to tell what is “more” code, or “more” data.

A truth that goes unnoticed by too many language, tool, framework and library designers is that we spend much more time reading code than writing it. Reading may seem like a very simple and straightforward activity, because it comes so naturally after years of practice, especially when compared to writing English prose. But what makes writing English so difficult is that you have to structure it so that someone else can understand it later.

There are many tricks to English, and it’s not just stupid syntax. The complex syntax rules of English may seem to make writing more difficult, but many of them, when followed, contribute to readability. Without those rules, writing would actually be more difficult, because not only would you need to devise a similar structure, but you’d also have to somehow communicate it to readers. Having a set of rules, no matter how arcane, is preferable to having no rules.

And that is where LISP’s lack of strong syntax becomes a problem. Sure, good LISP coders will create a well-thought-out sublanguage, and avoid using cryptic shortcuts and other reading impediments. With high-level elements this works well, because it is the sublanguage which eventually describes the solution. But at other levels, this can be a serious impediment because it is hard to separate out the layers. The syntax of most other languages gives readers more “hints”. Some of these hints may seem unnecessary when writing code (brackets, parentheses), but they help you spot different sections of code. In the C-derived family, when you see parentheses, you think “function call”. When you see braces, you think “loop/conditional”. Brackets make you think “array”, etc.

Those same structures exist in LISP, but they all look nearly identical. That’s great for writing code: you don’t have to think about which character to use (has any LISP user remapped the keyboard so they don’t need to use shift for a parenthesis?). For reading, however, the homogeneity is very poor. It would almost be like removing tense from all verbs. Usually you can decipher tense from context, but it’s almost always difficult if done wrong. Programming languages are more deterministic than natural languages, so it’s always possible to decipher LISP code from context, but it’s not always going to be easy.

The most interesting part of all this is how it ties into the “data is code” concept. If you’re putting code into your data, you really should have a way to make it stand out. Code is more complex than data and deserves special attention inside a mixed environment. It’s best to separate data from core code, so that each can change independently. I see the value of one unified model for both code and data, and the ability to work in the blurry areas in between. But as a reader, I’d prefer there to be some syntactical differentiation regardless of what’s under the covers.

It’s possible a super-intelligent IDE might be able to fix these problems, but I haven’t put enough thought into it to know if that’s truly feasible with the existing LISP syntax. From my current review of the state of LISP IDEs, there is a very long way to go before that type of capability starts to show up.

11 comments:

Anonymous said...

Despite common thinking Lisp has LOTS of 'syntax'. But it is not C-like syntax. You have to unlearn your C/C++/Java habits and free your mind to understand Lisp syntax rules and conventions. Lisp developers are trained to see these patterns and invent their own patterns where needed.

Just to name a few.

Basic datastructures have low-level syntax: Strings, characters, lists, symbols, numbers, arrays, and so on.

Naming conventions:

Variables: *foo*
Constants: +foo+
Predicate: foo-p
Setting places: foof like in setf, remf, ...
Accessor: class-accessor like foo-age
Function names have to be descriptive: update-hidden-rectangle
Defining macros: DEFsomething like defun, defmacro, defclass, ...
Scope building: with-foo, like with-secure-connection
Binding: let-something like let-record

and lots more...

then syntax. Look at the ANSI CL doc and see the syntax for the various macros and special forms.
Look up the syntax for DEFCLASS.

(defclass foo (superclass) (slot1 slot2)(:documentation "example"))

Now remove the parentheses:

defclass foo superclass (slot1 slot2) :documentation "example"

The syntax is there, so just dim the parentheses and they will disappear. The Lisp developer is trained to see the structure/syntax and not the parentheses. The parentheses are there to create the structure and to be able to map the syntax to lists.

There are lots of syntactical patterns that are used in Lisp code and supported by the language so you can define similar syntax patterns. These patterns are not like those in C.

Some patterns:

Arglist usage:

(foo-function arg1 arg2 :keysym1 arg3 :keysym2 arg4)

Arglist definition:
... (arg1 arg2 &key keysym1 keysym2)

Similar arglists are used in some other places.

Another code pattern: functional bodies

(definer name args (documentation | declaration)* body)

The above pattern is also often found.

Another code pattern: definition with many options

(def-something name ((item options*)*) (options*))

For example like:

(defclass foo (superclass-bar)
  ((slot1 :type number :documentation "...")
   (slot2 :accessor foo-slot2))
  (:documentation "class-documentation"))

Above pattern is also seen very often.

Another pattern: binding lists

For example:
(record-let ((r1 window-rec) (r2 sound-rec)) ...)

These were just some examples.

So there are lots of code patterns in Lisp that are used. The trained Lisp programmer understands those and can also create similar code patterns with macros.
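As a rough illustration of the last point, here is a minimal sketch of a scope-building macro in the with-foo style. The name with-collected-output is hypothetical and chosen only for this example; it simply wraps the standard with-output-to-string.

```lisp
;; WITH-COLLECTED-OUTPUT is a hypothetical name chosen for this sketch.
;; It follows the (with-foo (binding) body...) shape described above.
(defmacro with-collected-output ((stream-var) &body body)
  "Run BODY with STREAM-VAR bound to a string stream; return what was written."
  `(with-output-to-string (,stream-var)
     ,@body))

;; The call site reads like built-in syntax even though it is user-defined.
(defparameter *report*
  (with-collected-output (out)
    (format out "~a-~a" 1 2)))
```

The binding list and the indented body are the visual cues a trained reader picks up on, and the macro gets both for free by reusing the conventional shape.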

Leo Comerford said...

Martin Fowler's essay on "language workbenches" is relevant.

Leo Comerford said...

has any LISP user remapped the keyboard so they don’t need to use shift for a parenthesis?

The space-cadet keyboards of yore had them to the right of 'p', displacing the square brackets. (Source page.) Actually the really ruthless Huffman coding would be to put them unshifted on the home row. How often does anyone use 'j' or 'k' anyway? :)

Jules Jacobs said...

I swapped () and [].

Lisp really does need more syntax. lambda for example needs a special syntax like in Smalltalk.

(lambda (a) (foo a))
=>
[foo #]

# denotes the implicit argument.

(lambda (a b) (foo a b))
=>
[foo #1 #2]

or alternatively:

(lambda (a) (foo a))
=>
{a|foo a}

(lambda (a b) (foo a b))
=>
{a b|foo a b}

So:
(map (lambda (a) (+ a 2)) '(1 2 3))
=>
(map [+ # 2] '(1 2 3))

Lisp doesn't need much more syntax because indentation is used to read code easily. The editor can automatically indent everything because of the ()'s. defun/define don't need special syntax because it's easy to see what's happening by looking at the indentation.

Remember that:
foo(a, b)

is easier to read for you because you've seen it more often.

(foo a b)

is equally readable after a while.

Anonymous said...

Well, gee. Another person with little knowledge about but great opinion of Lisp. Instead of attempting to refute your argument, which is inherently a subjective one and does not apply at all to me and other Lisp programmers (there is a reason why we _like_ that syntax), I shall refer you to Common Lisp's reader macros. Lisp allows and encourages the programmer to shape the language, microsyntax included, towards ideas, instead of the mainstream other way around.
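For readers who have not seen reader macros, the following is a small sketch of the idea, not anything from the comment itself. It assumes a made-up bracket notation in which [a b c] reads as (list a b c); the function name read-bracket-list is hypothetical. This is deliberately simpler than the [foo #] lambda shorthand proposed above, which would need more work.

```lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  ;; Work on a private copy so the standard readtable stays untouched.
  (setf *readtable* (copy-readtable))

  ;; READ-BRACKET-LIST is a hypothetical helper for this sketch.
  (defun read-bracket-list (stream char)
    "Read forms up to the closing ] and wrap them in a LIST call."
    (declare (ignore char))
    (cons 'list (read-delimited-list #\] stream t)))

  (set-macro-character #\[ #'read-bracket-list)
  ;; Make ] terminate tokens the same way ) does.
  (set-macro-character #\] (get-macro-character #\))))

;; The reader now turns this into (list 1 (+ 1 1) 3) before evaluation.
(defparameter *numbers* [1 (+ 1 1) 3])
```

With this in place, brackets visually mark "this is a data list" while the underlying representation is still an ordinary list form.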

Ryan Baker said...

Thanks all, I appreciated the comments so much they inspired another post.

P.S. Jules: I disagree. foo(a, b) has the advantage of separating the action from the nouns, so all the data inside the parentheses serves the same purpose. Also consider (a b c), and tell me what it is. Is it a list, or a function call? Well, it is a list, but not necessarily a function call, and the only way to find out whether it is a function call is to look up “a”. I do, however, like your lambda suggestion; it illustrates my point very precisely.

Kartik Agaram said...

Lisp's syntax may not be ideal, but you get most of the way there by relying more on keyword arguments than on argument index or indentation. It's unfortunate that that's a relatively controversial part of lisp culture. A while ago I wrote a brief writeup on a bind macro that hints at this. (see the footnote)

Ryan Baker said...

I can see how that would help, but in theory an IDE should be able to display/hide argument names when you use indexed arguments.

Is there a LISP environment that does this? Environments like Visual Studio and Eclipse offer "hover tips" for this data, which is the same data in a less useful form. I've seen it for C#, Java, C++, VB and F# (an OCaml variant), but not for Ruby.

I can see how dynamic typing and metaprogramming can make it more difficult, but is it still possible?

Anthony C said...

Ryan - (a b c) will always be a function call. Well it could be macro expansion too, but I don't think that was your point. To get a list, you'll need to use '(a b c) or (quote (a b c)). Both of these constructs are very easy to spot in code.
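A short sketch of the distinction, using a hypothetical function named add so that the unquoted form has something to call:

```lisp
;; ADD is a hypothetical helper used only for this illustration.
(defun add (x y)
  (+ x y))

;; Unquoted: the evaluator treats the first element as the operator.
(defparameter *call-result* (add 1 2))

;; Quoted: the same characters are kept as a three-element list.
(defparameter *quoted-form* '(add 1 2))

;; The long spelling of the quote mark produces the identical list.
(defparameter *long-form* (quote (add 1 2)))
```

Here *call-result* holds a number, while *quoted-form* and *long-form* hold equal lists whose first element is the symbol add.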

Ryan Baker said...

Oops, I wasn't thinking of that... Still I'd stand by the first part and say it's simpler if the parameter list looks like a list, and the noun part doesn't have any similarity in appearance to a list item.

Jules Jacobs said...

You need to read Beyond Logs again ;-). (foo a b) could be a function call. It doesn't have to be. That's just how the standard eval handles it. It's just a list that's given to the eval procedure.

That's why it's a bad idea to make that foo(a, b): foo(a,b) would be expanded to (foo a b) anyway before execution. If you want to be consistent you have to make the list syntax foo(a,b) too. foo(a,b) is thus a list of foo, a and b. That's not natural in my opinion.

You could have a separate syntax for code and quoted lists but now you don't have code = data anymore.
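To make the "it's just a list handed to eval" point concrete, here is a minimal sketch in which the same list is given to two different consumers. The function summarize-form is hypothetical and stands in for any program that treats the list as plain data.

```lisp
;; One list, written once.
(defparameter *form* '(max 3 7))

;; Consumer 1: the standard evaluator treats it as a call to MAX.
(defparameter *evaluated* (eval *form*))

;; Consumer 2: SUMMARIZE-FORM (a hypothetical helper) never runs the form;
;; it only inspects the list structure.
(defun summarize-form (form)
  "Describe FORM as data without evaluating it."
  (list :operator (first form)
        :argument-count (length (rest form))))

(defparameter *summary* (summarize-form *form*))
```

Whether the list counts as "code" depends only on which procedure receives it, which is the property a separate call syntax would give up.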