↓ Archives ↓

Wikipedia: Halt and Catch Fire

A lovely piece of computer history courtesy of Wikipedia:

Halt and Catch Fire, known by the mnemonic HCF, was originally a fictitious computer machine code instruction claimed to be under development at IBM for use in their System/360 computers, along with many other amusing instructions such as “Execute Operator”.

XSLT + SOAP?

In my in-progress project, Ixan, I have shied away from developing its supplemental stylesheets in XSLT 2.0 in favor of XSLT 1.0 + EXSLT. XSLT 2.0 is truly awesome, and truly better than XSLT 1.0. But its only “open-source” implementation, Saxon-HE, has certain issues I can’t reconcile for the projects at hand.

Saxon-HE is more like an obnoxious shareware demo (the non-PC term is crippleware) than a truly FOSS package, lacking most of the cool, new stuff that makes this version of XSLT better than the last, including schema awareness and complete extension mechanisms. If you want those, you have to shell out for them. And while I’m not diametrically opposed to making money from one’s own work (which is excuse the author of Saxon—and editor of the XSLT 2.0 spec—uses for not opening more of it up), it’s trouble that there’s nowhere a free software developer (who isn’t interested in developing XSLT 2.0 libraries) can turn to complete a project using the full version.

And there’s another big glitch that ruins Saxon for me: It can only be run in a Java or CLI environment. Now, I’m a huge fan of Java, and it tends to be my language of choice when I need static typing. However, there are certain areas of the developer universe where nothing really has any clout until there’s a C implementation. Yes, it’s technically possible to access a Java VM from C, and thereby technically possible to bridge it to the C-based scripting environments (e.g. Perl, PHP, Ruby [in C], Python [in C] and so forth), but it’s a whole lot of extra steps to even get that interplay working, let alone to deploy to systems beyond the developer’s own.

What I’d like to see happen is any or all of these:

  • A complete FOSS implementation of XSLT 2.0 in C. It would preferably use one of the several widely supported portable C-based runtimes (GLib as with GNOME, NSPR as with Firefox, APR as with Apache HTTP Server) so that it would almost instantaneously work on several platforms. A well-written, portable C library would be an important step forward in adoption.
  • A complete FOSS implementation of XSLT 2.0 in Vala. Vala is an object-oriented programming language that is compiled to C sources, using GObject (based on GLib) for its object system. This would be almost as good as a C implementation and could do most of the same tricks.
  • A complete FOSS implementation of XSLT 2.0 in any language. I reiterate that there is no such thing yet! Even an implementation in bf, INTERCAL, or awk would be better than nothing.
  • A project to create FOSS implementations of the XSLT 2.0 functionality missing from Saxon-HE. It’s curious to me that nobody’s spearheaded this one yet. (After all, even more troublesome similar efforts such as GNU Classpath and Mono have existed, right?) Saxon-HE may be incomplete, but it is still somewhat functional and covered by MPL. Why reinvent the wheel?

Anyway, I don’t think any of the above are destined to be my labor of love, so I hope somebody else gives them a try.

What I’m turning over in my head is the best way to make xsltproc expandable. libxslt (which drives xsltproc) is a C-based XSLT 1.0 processor with support for extensions (such as a few EXSLT elements and functions supplied by libexslt). Despite its limitations, this library is fast and very functional for me. With libexslt installed, it even supports the EXSLT construct for defining functions callable from XPath, which I take delight in abusing regularly.

Of course, these extensions it supports must be written in C (or in code that can be called by C). I respect C’s massive role in the infrastructure of virtually everything I do, but developing C is trouble. I consider myself quite competent in the language, and it does in fact have an elegant primitiveness to it. While writing it, I tend to reflect on how nice it is to have a break from declaring things public or static, dealing with namespaces, and disambiguating the meaning of certain operators like = and ==. However, there’s one tiresome and error-prone thing that keeps me from staying too long: Memory allocation/deallocation. C doesn’t do any reference counting or garbage collection automatically, so things that you create don’t just go away when you’re done with them—they linger until you explicitly shove them out the door. And sometimes, if you’re busy, you simply forget to do that.

Of course, C is also troublesome because you can’t necessarily distribute your pre-built product expecting everything to be OK. If I write a Perl script and then post it, you can download it and run it straight away (provided that you have all of the dependencies, which CPAN will happily install for you if you ask). With Java, you can even compile your code and distribute that as a product, and chances are that a properly configured Java environment will be able to run it without much trouble. Meanwhile, unless you’re sure that your customer has a really similar setup to yours, you’re going to have to distribute all of the source with a meticulously crafted configure script or Makefile or near equivalent. The environment does very little to abstract anything away, so your scripts have to ask a bunch of questions about the C environment. In many cases, for the less-well-tested platforms especially, the part that you write to deal with those answers is very likely to fail, and the end user may not be able to tell you anything useful about it.

C builds are ideally left to package maintainers for OSes or very capable C hackers.

So, inevitably, the task of writing at least something will require a bit of C code. To minimize the amount of that C code, the best thing to do would be to implement some sort of scripting interface. Then, the C code can call out to some code from another universe (one without manual memory allocation) and then return an answer.

Obviously this is already done in things like, for example, XML::LibXSLT in Perl. What I was trying to consider is some interface that is more general than one language (a dependency on a scripting language might be ugly), more approachable than the C binding mechanism (ideally not requiring any compilation to extend), smaller in scope (perhaps not a scripting language by itself), and smaller in size (because it’s C, of course).

What does this describe? In my opinion, it describes a C middleware library—a SOAP library, for example, or even something more intensely efficient like Google’s Protocol Buffers. This is more general than one language since basically any environment with sockets can do RPC. It’s more approachable than binding in C because as a remote process, the application is guaranteed not to have to compile due to the caller, it’s smaller in scope because that’s what RPC is—the entire application sits outside the caller instead of the caller implementing the whole application.

It might not be the fastest thing, but anything that needs to be that much faster might be worth rewriting in C anyway, or at least an in-process scripting extension. :-)

Check this concept:

<soap:bind endpoint="http://example.com/api" prefix="xyzzy"
        templates="gen-foo"
        functions="is-foo is-bar is-unruly=is%20unruly"
        xmlns:soap="http://purl.org/halfgeek-names/xslt+soap"/>

A single new element like this could be used to easily incorporate functionality from anything with a SOAP library (which is currently any language I would bother developing with).

Think about it, won’t you?

Ixan presses forth

There hasn’t been a post here for a few days. Work is continuing on Ixan, but on other aspects than the reference parser. Projects have been started on Launchpad for the parser and the specification document. A significant amount of documentation has even been written, and I worked out a nice look for it based the SAX parser for POD and styles implemented in CSS and XSLT.

It’s nowhere near done, but you can check out what I’ve got at http://ixan.halfgeek.org/.

Ixan likes you!

I don’t think my descriptions of Ixan so far have been extremely illustrative of how much you’re probably going to want to use it when it’s done enough to use. Let’s remedy that now. The following is intended to be a cursory, not-too-deep walkthrough of how Ixan can be used to form a nested menu in XUL (and, by extension, arbitrary other documents based on the same data). I have a bad habit of rambling, so skip the parts you don’t understand; they’re not as important as staying with the pace of the idea. Please comment with any questions you might have.

The following is an Ixan tree that I made up.

Colors
        Primary
                Red
                Yellow
                Blue
        Secondary
                Orange
                Green
                Purple @value=violet

I entered it in a fairly natural fashion; all of the indentation is done with tabs (other whitespace is allowed but should be kept consistent; tab != 8 spaces). Note that on the purple line there is a “switch”, an annotation qualifying the text value of the line.

The Ixan source is passed through a script currently called ixan2xml:

ixan2xml < colors.ixan > colors.ante

The result is an <ix:ante/> XML document (“ante” being short for “antecedent”) similar to the following:

<ix:ante xmlns:ix="http://purl.org/halfgeek-names/ixan/main">
        <ix:line><ix:label>Colors</ix:label></ix:line>
        <ix:level>
                <ix:line><ix:label>Primary</ix:label></ix:line>
                <ix:level>
                        <ix:line><ix:label>Red</ix:label></ix:line>
                        <ix:line><ix:label>Yellow</ix:label></ix:line>
                        <ix:line><ix:label>Blue</ix:label></ix:line>
                </ix:level>
                <ix:line><ix:label>Secondary</ix:label></ix:line>
                <ix:level>
                        <ix:line><ix:label>Orange</ix:label></ix:line>
                        <ix:line><ix:label>Green</ix:label></ix:line>
                        <ix:line>
                                <ix:label>Purple</ix:label>
                                <ix:switch ix:sigil="@" ix:op="=" ix:name="value">violet</ix:switch>
                        </ix:line>
                </ix:level>
        </ix:level>
</ix:ante>

This is really little more than a parse tree for the original Ixan file, but it has noted some important bits. For example, when I said “Purple @value=violet”, the parser concluded that I wanted a line that has “Purple” as its literal label, but has a switch that associates a key “@value” (actually a key “value” qualified by a sigil “@”) to a value “violet”.

More importantly, the tree is now in a readily processable XML format, meaning that there is a significant presence of well-tested tools with which to post-process, including XSLT processors. For example, I have the following XSLT sheet handy:

<?xml version="1.0"?>
<xsl:transform
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:exsl="http://exslt.org/common"
 xmlns:ix="http://purl.org/halfgeek-names/ixan/main"
 extension-element-prefixes=""
 exclude-result-prefixes="xsl exsl ix"
 version="1.0"
 
 xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul"
>
  <xsl:output
   method="xml"
   version="1.0"
   encoding="utf-8"
   indent="yes"
 />
 
  <xsl:strip-space elements="*"/>
 
  <xsl:template match="/ix:ante">
    <window id="menu" title="menu" width="300" height="200">
      <toolbox flex="1">
        <menubar><xsl:apply-templates/></menubar>
      </toolbox>
      <script><![CDATA[
       var mi = document.getElementsByTagName('menuitem'), i;
       for(i = 0; i < mi.length; ++i) {
         (function(item) {
           item.addEventListener('command', function() {
             alert("Value: " + item.value);
           }, false);
         })(mi[i]);
       }
     ]]></script>
    </window>
  </xsl:template>
 
  <xsl:template match="ix:line">
    <xsl:variable name="vs" select="./ix:switch[@ix:sigil='@' and @ix:name='value'][1]"/>
   
    <!– Only render an option if this line isn’t a group caption –>
    <xsl:if test="not( following-sibling::*[1]/self::ix:level )">
      <menuitem label="{ix:label}">
        <xsl:attribute name="value">
          <xsl:choose>
            <xsl:when test="$vs"><xsl:value-of select="$vs"/></xsl:when>
            <xsl:otherwise><xsl:value-of select="ix:label"/></xsl:otherwise>
          </xsl:choose>
        </xsl:attribute>
      </menuitem>
    </xsl:if>
  </xsl:template>
 
  <xsl:template match="ix:level">
    <!– This is where the text of a line preceding a level goes –>
    <menu label="{preceding-sibling::*[1]/self::ix:line}">
      <menupopup><xsl:apply-templates/></menupopup>
    </menu>
  </xsl:template>
 
</xsl:transform>

I might use this sheet together with xsltproc:

xsltproc xulify.xsl colors.ante > colors.xul

The resulting XUL document is this:

<?xml version="1.0" encoding="utf-8"?>
<window xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul" id="menu" title="menu" width="300" height="200">
  <toolbox flex="1">
    <menubar>
      <menu label="Colors">
        <menupopup>
          <menu label="Primary">
            <menupopup>
              <menuitem label="Red" value="Red"/>
              <menuitem label="Yellow" value="Yellow"/>
              <menuitem label="Blue" value="Blue"/>
            </menupopup>
          </menu>
          <menu label="Secondary">
            <menupopup>
              <menuitem label="Orange" value="Orange"/>
              <menuitem label="Green" value="Green"/>
              <menuitem label="Purple" value="violet"/>
            </menupopup>
          </menu>
        </menupopup>
      </menu>
    </menubar>
  </toolbox>
  <script>
        var mi = document.getElementsByTagName(‘menuitem’), i;
        for(i = 0; i &lt; mi.length; ++i) {
          (function(item) {
            item.addEventListener(‘command’, function() {
              alert("Value: " + item.value);
            }, false);
          })(mi[i]);
        }
      </script>
</window>

If you save that to a .xul file on your machine and open it up in Firefox (or another Mozilla-based browser), you’ll note it does exactly what is expected.

A screenshot taken while running colors.xul When any of the leaf menu items is clicked, an alert appears displaying that item’s value attribute. Because I set a special value for the Purple item, the alert for that says “violet” instead.

[end of direct illustration.]

At this point, hopefully it’s obvious that what I’m attempting to create is a tool that allows a person to enter certain types of data fluently while also keeping the door open for easily setting additional parameters later. It’s intended as a supplement to XML and not a replacement. Actually, one might even think of it as an extension to XSLT, allowing it to process a certain kind of non-XML (but pre-XML) data. The ixan2xml tool (this might be renamed ixan2ante soon) doesn’t make many assumptions about the semantics of the data it reads; it simply makes it easier to pick things out with XPath queries.

Something I won’t cover in full detail yet but will also be important is the concept of constant substitution. In many places, the Dare parser (the current, experimental version) watches for certain escapes. For example, as label text it would interpret “Alpha ${bravo} charlie” as something like

Alpha <ix:expand ix:name="bravo"/> charlie

It also watches out for ?set directives, which put blocks of text into the ix:ante under a certain name. In ixan source, it would look something like

?set charlie
        Indent-bounded text
            (May contain multiple levels)

The resulting ix:ante node would look something like this (though the format isn’t quite set yet):

<ix:set ix:block-type="indent" ix:op="" ix:name="charlie">Indent-bounded text (May contain multiple levels)</ix:set>

Note that the Dare parser itself only marks where interpolation has been requested and doesn’t actually perform said interpolation. For that and a couple of other issues, I’m already at work on an XSLT post-processor called ante2neat, which (as the name suggests), changes an ix:ante to an ix:neat. This conversion isn’t lossless, but it will establish a certain set of conventions for the general idiom of casting ixan as XML. Apart from resolving set/expand interpolation (including dealing with circular references) it will also do slight reordering of lines so that a line of text and a following indented level are paired together as head and body of a superelement if it seems like that’s what it’s supposed to do. (That behavior is easy to bypass by entering a blank line or an explicit “?noop” directive between the two.) At some point it may also make certain assumptions about where certain switch types might map to attributes or child elements.

ix:neat is intended to get a few generally applicable organization tasks out of the way before handing off to a more application-specific transform; if the typical user’s transform pipeline starts with a transform from ix:neat rather that from ix:ante, that mission is accomplished. Dealing directly with ix:ante may occasionally be necessary for weirder requirements.

As before, the current code is in my junk bin at Launchpad. It’s slightly tricky to get started, but the following may help:

# Download the current source branch to a new "ixan" directory.
bzr branch lp:~psmay/+junk/ixan.dev ixan
cd ixan
 
# prun runs a perl script from Ixan-Parser-Dare/bin after setting up some
# environment.
# ixan2xml uses the Dare parser to convert an ixan to an ix:ante document.
./prun ixan2xml <test2.ixan >test2.ante.tmp.xml
 
# ante2neat is an XSLT sheet that transforms ix:ante to ix:neat. (It’s not yet
# feature-complete.)
xsltproc xslt/ante2neat.xsl test2.ante.tmp.xml >test2.neat.tmp.xml

If you have any problems that you think Ixan would be of some help in magically handling, comment away—an extended set of use cases would be helpful in shaping the future of Ixan. (Log in with OpenID if you can!)

A peek at “Ixan”

I’ve been working on a perl program that turns a tab-indented tree of text into a more or less direct (and largely round-trippable) XML representation. While this is not at all useful standing on its own, when post-processed with an XSLT pipeline it should become a serious means for entering data arranged in terms of both hierarchical relationships and ad-hoc metadata “tags”, then turning it into some significantly more useful form.

Part of the drive to create this is a need to be able to quickly enter some specifications for an HTML form and then be able to simultaneously generate HTML for the form, JavaScript to enable/disable certain fields on the fly depending on the values of others, and a matching schema for database fields.

It’s called Ixan (Indented XML-Andecedent Notation). It’s being implemented as a Perl SAX2 driver so that driving a SAX pipeline directly from the document is trivial. If you’re curious enough to check it out (and go to the trouble of installing a few dependencies), it’s in my junk bin on Launchpad.

Classic WTF: Meaninglessness

A little justification for our existence as software developers, courtesy of The Daily WTF.

Louis and Frans, in addition to doing their day to day calibration tasks, were responsible for going through the system — some proprietary DOS-based database application — and “recoding” each and every device. You see, sometime back in the 70’s or 80’s, someone decided that they would use the date “99” (as in, 1999) to indicate that the device never needed to be calibrated. And lo and behond, a few decades later, 1999 was starting to become a “real” date, and one that was just around the corner. Obviously, that was going to cause all sorts of problems.

Functional Programming

XSLT is a strangely fascinating development language. It’s a very good tool for its job, which is in general to turn some XML into some other XML, but if it didn’t have this to its credit it could easily be considered an esoteric language. I say this because while its facilities for querying XML documents is extremely powerful, performing what would be considered everyday tasks with XSLT is such a challenge as to be called a sport.

XSLT is technically defined as, among other things, a domain-specific language (one that is designed for only a very small part (or “domain”) of the general set of problems to be solved—in this case, transforming XML.

More importantly, XSLT has most or all of the hallmarks of a functional programming language (note: languages like C are not “functional”; what such languages call “functions” are equivalent to procedures). For those of you who, like me, were raised on BASIC and eventually graduated to C and Perl without ever having dealt with Lisp or Scheme, a functional programming language is theoretically one where a program is defined as a whole lot of nested mathematical functions being evaluated until the one at the top yields a final result.

It sounds straightforward enough—after all, you can nest functions and even make recursive calls in C, right? But there are some major connotations involved that are fit to frustrate the life out of many programmers, even the competent ones.

In C (let’s say for the remainder of the discussion C represents most or all procedural languages), it’s possible to mutate the state of the environment by altering the values of variables that exist outside the function but are still in scope. Those variables have a lifetime that includes the lifetimes of all calls made to the function; it therefore constitutes state.

In mathematics, functions don’t have such “side effects”. For example, it is possible to express

function f(x) { return x + 1; }

as a mathematical function f(x) = x + 1, but there is no equivalent for

var z = 0;
function g(x) { z = x; }

Therefore, in a “pure” functional language, you don’t really have variables to which you can assign values—you can pass values as parameters to a function, but the function itself cannot mutate those values either. In the context of C, this would be about like qualifying every variable and every function parameter definition in an entire program with const. This was nothing short of brain-rending when I first found it out. How can one write an entire program without mutables?

(It’s not so bad. I’ll get to that in a moment.)

The other big, weird thing about functional programming is the notion that there is no conditional loop (e.g. for, while, until, and selected uses of goto). There can be for-each loops, but these differ substantially from conditional loops in that they iterate over a finite sequence of values, which is guaranteed to terminate, instead of cycling repeatedly until a condition is true, which cannot (in general) be guaranteed to happen.

For example, in Perl, it’s easy enough to roundaboutly arrive at the string “12345″ using iteration:

my $out = "";
for(my $i = 1; $i <= 5; ++$i) {
        $out .= $i;
}

I count two reasons I can’t do this in a pure functional language. For one, it uses a conditional loop. The example above can easily use a for-each instead, so let’s do that.

my $out = "";
foreach(1 .. 5) {
        $out .= $_;
}

Now there’s no conditional loop, but there’s another problem from earlier: We’re accumulating in a variable, which is something we’re not allowed to do.

In the functional idiom (which is also well supported by Perl, incidentally), one might solve the problem this way:

sub snikt {
        my($cur, $max) = @_;
        return "$cur" . (($cur == $max) ? "" : snikt($cur+1,$max))
}
my $out = snikt(1,5);

I’ve succeeded in my task here—the result is the string “12345″ and I’ve arrived without starting any loops or mutating variables. To do this, I’ve sort of had to turn the original iterative algorithm inside out.

The pattern ought to be familiar to anyone who has majored in CS—think back to implementing fact() and/or fib(). It’s called recursion, and it’s something that happens whenever a function calls on itself (possibly with different parameters). It turns out that recursive algorithms, while implemented in many procedural languages, are what some would call the bread and butter of functional programming.

Farming part of a function’s implementation out to another function (or another call to itself) is a means of simulating state. This is so because a function call allocates memory (in almost the same way that declaring a variable does) to store its parameters. Sane programming thus begins to be possible.

An astute observer might be worried that this sort of thing would quickly consume the stack yielding a swift out-of-memory error. However, high-profile interpreters for functional languages can generally a lot of optimization, especially in the case of what is called tail recursion. This happens when a function makes a call to itself, but only just before exiting. The interpreter can reword this internally (this is tail call optimization), automatically writing the equivalent iterative loop behind the scenes, which saves time, and reusing the previous call’s stack space instead of allocating a new context, saving both time and stack space.

So, why would we go to the trouble of learning a functional programming language? I’m not completely clear on all of the points, but here are some that I think are probably true.

  • Since pure functions (like the majority of those used in functional programming) cannot mutate variables or have side effects, the runtime doesn’t have to worry about concurrency issues; these functions are automatically (thread-safe and) reentrant. The system is free to parallelize any such functions to arrive at a result.
  • Pure functions by definition are referentially transparent, meaning that they must have no side effects and must always give the same response for any given parameter list. Since their responses don’t vary between calls, referentially transparent functions are eligible for an optimization called memoization (not to be confused with “memorization”) where the return values from a time-intensive function may be cached for speed.
  • The arguments also include those for declarative programming (of which functional is a subset) versus imperative programming (which includes procedural). In imperative programming, you (as the programmer) direct the computer every step of the way in terms of implementation; in declarative, ideally, you simply set forth goals and let the computer decide the details of how to reach them. Writing only the ends and leaving out the means leaves a great deal of room for the computer to optimize code. This would be of some use on systems where on-the-fly code profiling and optimization is possible.
  • While we’re on that topic, declarative programming should theoretically be an aid in human understanding of a computer program—if I express a program in terms of goals instead of steps, someone reading the source would be able to get the point of the program without becoming mired in the details of its implementation.

This post was going to have a lot more to do with XSLT than it ended up doing. I intend to say what I was going to say in a new post that has more examples and less rambling.