Graut

From HalfgeekKB
Jump to navigation Jump to search

Graut is a specification for a preprocessor and templating engine. It accidentally resembles a very poor clone of Lisp, but the scope is much different.

Synopsis

A Graut source consists of plain text interleaved with function calls. Each function call begins with #(, ends with ), and contains a series of elements.

Specifically, a function call consists of the sigil # followed by a list containing elements. An element is one of these things:

  • A list, which starts with (, ends with ), and contains zero or more elements that are usually (but, depending on the element, not necessarily) separated by whitespace.
  • An atom, which is a string in one of a few forms:
    • An unquoted string, which is one or more consecutive characters in [!%&*+,./:;=?^|~A-Za-z0-9_-] or at or above U+80.
    • Monoquoted strings, '...' or "...", with JSON-esque backslash escapes, which also interpolate function calls as with the top level. (\# will prevent # from starting a call.)
    • Triquoted strings, '''...''' or """...""", with no escapes, which do not interpolate.
    • Shallow heredocs, <SOMETOKEN<...>SOMETOKEN>, with no escapes and a selectable close token. (The token has the same specification as an unquoted string.)
    • Deep heredocs, <<SOMETOKEN<...>SOMETOKEN>>, similar to shallow heredocs, but also allowing inline function calls #(...) in the same manner as at the top level.
  • A function call, #(...), consisting of the sigil # followed directly by a list expression.
  • An inlined list, @(...), consisting of the sigil @ followed directly by a list expression.
  • A variable expansion, $"...", consisting of the sigil $ followed directly by a name (a key).
  • An nonce, $, consisting of the sigil $ not followed directly by an expression (e.g. followed by a space or a )).

In truth, these comprise only the most common interpretation of a given abstract syntax tree.

Abstract syntax trees

The syntax tree of a Graut document consists entirely of the following node types:

  • A text node represents a string zero or more characters. Text nodes for fragments of monoquoted strings have already had backslash escapes processed; otherwise the text is exactly as read from source.
  • A frame node contains a mix of text nodes and #(...) subtrees (which are sigil nodes). String literals, as well as the top level of the document, become frame nodes. Literal strings (unquoted atoms, triquoted strings, shallow heredocs) contain exactly one text node (even for an empty string) and nothing else. The frame node contains information about the open quote mark or heredoc token, if any.
  • A list node consists of zero or more items bounded by ( and ).
  • A sigil node is one of the characters $@# as either a nullary or prefix unary operator. Specifically, it is unary if followed, without intervening space, by a topic, which is string, a list, or another sigil; it is nullary otherwise. If unary, the node contains the topic.
    • Note that $ represents a nonce constant. Nonces are used in Graut to model impure functions. Each instance of $ is assigned its own fixed, ineffable value. This binding is modeled as happening at the beginning of each run.

Not all possibilities are legal in all situations.

Evaluation contexts

The meaning of a tree depends on the context in which it is evaluated. In most situations, a tree is not evaluated unless and until its evaluated value is needed (i.e., lazy evaluation is in effect).

Value context

The top level of a document is evaluated in the value context.

Values that are intermediate have some internal meaning that cannot be properly expressed in output without conversion. It is an error to attempt to output such a value directly, and all valid usages convert them to some other form.

In this context, nodes have these meanings:

  • A frame node evaluates to the string value it represents.
  • A text node evaluates to the concatenated string-context values of its children.
  • A list node evaluates to a list value that is the value-context value of each of the elements of the node's tree-context evaluation.

...

String context

  • A frame node or a text node evaluates to its value-context value.
  • A list node evaluates to the concatenated string-context values of its elements.
  • A sigil node evaluates to the string-context value of its value-context value.

Key context

A key is an internal serialized form for string and list values so that they may be used as variable names, namespaces, hash keys, etc.

A string value is directly representable; a list value is directly representable when all of its elements are either string values or values of other directly representable lists. No other values can be represented directly.

  • A frame node or text node is represented as its string-context value.
  • A list node is represented as a list value whose elements have been converted to representable values.
  • A sigil node is represented by the representation that applied its value-context value.

Key values are normalized so that any two directly representable values have exactly the same representation; it is, however, possible for distinct values to share a representation if they are not directly representable (i.e., if they are or contain data other than strings and lists). They also have a deterministic ordering, though this ordering is only for canonicalization purposes and may not necessarily be coherent to the contained information.

Serialization

See Graut/Serialized S-expressions.

List-requiring context

List-requiring context accepts a list, a variable bound to a list, a function call evaluating to a list, or inlined elements ultimately consisting of a single element that is a list. Function calls are only evaluated where a @ is expanded; otherwise, everything is left alone.

  • A list node is replaced by a list node consisting of the list-element-context values of its elements, with any intermediate subseq elements expanded.
  • A unary $ is replaced by its value-context value, which is then evaluated.
  • A unary # is replaced by its value-context value, which is then evaluated.
  • A unary @ is replaced by its list-element-context value, which is then evaluated.
  • All other input is illegal.

List-element context

This allows recursive expansion of @.

  • For a unary @, its topic is evaluated in list-requiring context. If the resulting list contains exactly one element, evaluate to that element; otherwise, evaluate to an intermediate subseq containing all elements.
  • Otherwise, a node evaluates to itself.

Resolved context

Note that bound names are lexically scoped. Expressions may need to be annotated with their scope of origin.

  • Anything other than unary $ evaluates as itself.
  • A unary $ is replaced by the resolved tree of its topic evaluated in key context (error if no such key), which is then evaluated.

Sigils as used in expressions

# (Function call sigil)

#list
// i.e.
#(function-indicator param param ...)
#() -> #($(SYSTEM nop))

A function call sigil evaluates to the result of a function evaluated with zero or more parameters.

The first element of the list indicates which function to use. Theoretically, the first element should evaluate to a lambda function, but for notational convenience some other things are accepted. The rules are these:

  • If the list is empty, it is replaced with a list containing the single element $(SYSTEM nop).
  • If the first list element evaluates to a lambda function, that function is used.
  • If the first list element is a literal atom a such that #(lookup a #(func (fullname) #(is-function $fullname))) returns a non-empty name, the expansion of that name is used.
    • Note that this applies only to literal string values; not to other expressions that evaluate to a string.
  • Otherwise, the first list element is interpreted as a key k and the function indicated by $k is used; if this variable does not exist or is not a lambda function, an error occurs.

The list expression need not be literal (except within interpolated text, where #( is used as a lexical cue).

If the list expression is a (potentially infinite-length) stream, only as many elements are read as necessary to fill the function's formal parameters.

A function will, by default, allow more parameters to be passed than there are formal parameters to catch them. Some syntax should be introduced to catch the rest (the tail of the parameter list).

$ (Variable expand sigil)

$key

Evaluates to the value of the variable with the given key as it exists in the current scope.

"Variable" here indicates that the named value need not remain the same between calls; a variable or parameter is read-only after being set for the first time, like a final variable in Java.

$ (Nonce sigil)

$

Unfollowed by an expression, $ evaluates to a new (previously unused) nonce value.

A nonce is a value whose only purpose is to be distinct from any previously generated nonces within this run. Other than supporting a test for equality, it is fully opaque and its internal representation is implementation-dependent (for example, a monotonic counter or a UUID generator could be used). Nonces have no meaningful order; particularly, an older one is not guaranteed to be less than or greater than a newer one.

Functions in Graut are modeled, and allowed to be memoized, as pure functions; i.e., for any given set of (normalized) argument values, the same result is returned. Expecting a nonce as a parameter allows a function to return a different value for the same parameters, which is useful for activities with inherent state or other side effects, such as random number generation. For example, the function call #($notSoRandom 0 10) would have to return the same result every time, but a call like #($random $ 0 10) could return a different value every evaluation.

After it is generated, a nonce can be treated as a constant value. Consider the following:

#(def result #(* #($random $ 0 10) #($random $ 0 10)))
#(cat "first time: " $result " second time: " $result)

Even if the value of $result is always lazily expanded, as long as memoization is in place the same number will be printed in both places. Unlike a function call, the evaluation of $ is always strict, so specific values are immediately inserted. One might imagine the definition in the above to be processed as something like the following:

#(def result #(* #($random ineffable-a 0 10) #($random ineffable-b 0 10)))
#(* #($random ineffable-a 0 10) #($random ineffable-b 0 10))

Then, the other call would expand to:

#(cat "first time: "
  #(* #($random ineffable-a 0 10) #($random ineffable-b 0 10))
  " second time: "
  #(* #($random ineffable-a 0 10) #($random ineffable-b 0 10))
 )

So, even though $random is (theoretically) being called anew every time $result is expanded, the same nonces are being used every time, and the same result is yielded.

(This is liable to lean most heavily on memoization. Implementations that support long-running programs will need to give some attention to avoiding memory leaks by figuring out when to discard stashed results.)

Care must be taken to ensure that $ appears in places where it makes sense. No matter what the depth, each $ is only ever given one value; in particular, a $ appearing in the body of a function will have the same value each time that function is called. For distinct values on each call, the function should accept a nonce as a parameter.

A single nonce can be used to generate an arbitrarily long stream of nonces using the #(nonces nonce) function; for example, given some single nonce you may wish to make multiple calls with distinct nonces.

// Let our input nonce be in $nonce
#(def nn #(take 3 #(nonces $nonce)))
#(def n1 #(element 0 $nn))
#(def n2 #(element 1 $nn))
#(def n3 #(element 2 $nn))
#(cat #($random $n1 0 10) #($random $n2 0 10) #($random $n3 0 10))

// Or, more idiomatically
#(
  #(func (n1 n2 n3)
    #(cat #($random $n1 0 10) #($random $n2 0 10) #($random $n3 0 10))
  )
  @#(nonces $nonce)
)

For any given $nonce, #(nonces $nonce) shall return exactly the same sequence of nonces each time it is called. The sequence is an infinitely long stream that is generated and memoized as necessary.

@ (Inlined elements sigil)

@list
// i.e.
@(elem elem ...) -> elem elem ...

An item with a @ sigil is treated as a list whose elements are to be inlined.

(A B @(C D E) F G)	-> (A B C D E F G)

It is an error for the subject of a @ to evaluate to anything other than a list.

In actuality, any list containing a @ is treated as a notation for a call to $(SYSTEM flatten). Any list containing at least one @ item is implicitly converted using the following idiom:

If a list contains at least one @ item, each non-@ item in the list is converted to a @ item by wrapping it in a list first:

(A B @(C D E) F G)	-> (@(A) @(B) @(C D E) @(F) @(G))
	... or, since the elements don't technically need to be individual ...
(A B @(C D E) F G)	-> (@(A B) @(C D E) @(F G))

Any list consisting only of @ items can then be converted to a $(SYSTEM flatten) call by removing the @ sigils and using the resulting items as arguments.

(@(A) @(B) @(C D E) @(F) @(G))	-> #($(SYSTEM flatten) (A) (B) (C D E) (F) (G))
	... or ...
(@(A B) @(C D E) @(F G))	-> #($(SYSTEM flatten) (A B) (C D E) (F G))

This effort would be entirely unnecessary if @ worked only on literal lists, but it can be used on any list expression.

                    (  P Q  @$r   S  @(T U))	->
                    (@(P Q) @$r @(S) @(T U))	->
#($(SYSTEM flatten)   (P Q)  $r  (S)  (T U))

Naturally, @ can only be used within a list. In particular, it cannot be used as the subject of another sigil, but a list containing it can. For example, #@(A B C) doesn't make sense, but #(A B C) (or, equivalently, #(@(A B C))).

Optimizations apply for degenerate cases.

  • Literal lists can be inlined as part of the parsing process.
  • Whenever the expression (@item) is valid, it is exactly equivalent to item. (A call to $(SYSTEM flatten) is unnecessary, though it may still be necessary to validate that item evaluates to a list.)

Note that inlining a stream (which is potentially infinite-length) will cause the containing list to become a stream itself and will not force strict evaluation.

Notes to be evaluated

Extract specs

An extract spec is a structured list of names provided as a parameter to #(func ...) or #(extract ...). The extract spec is evaluated until it is a key (a key is either an atom or a list of keys).

In an extract spec, a quantifier suffix of * (zero or more), + (one or more), or ? (zero or one) has a special meaning even though it is not technically a sigil. Values extracted in this way are treated as lists. An error will occur if the extract pattern cannot be matched to the values being extracted. Some combinations of quantifiers may be invalid. Extracting directly to a name ending with one of these characters is not possible (this can be worked around using #(def ...)).

A key named by the empty string may be used any number of times to discard a value after extracting it.

Extraction				Equivalent def
----------				--------------
#(extract abc (1 2 3 4))		#(def abc (1 2 3 4))
#(extract (a b c) (1 2 3 4))		error (length mismatch)
#(extract (a b c d) (1 2 3 4))		#(def a 1)#(def b 2)#(def c 3)#(def d 4)
#(extract (a b c "") (1 2 3 4))		#(def a 1)#(def b 2)#(def c 3)
#(extract (a "" c d) (1 2 3 4))		#(def a 1)#(def c 3)#(def d 4)
#(extract (a bcd*) (1 2 3 4))		#(def a 1)#(def bcd (2 3 4))
#(extract (a b cd*) (1 2 3 4))		#(def a 1)#(def b 2)#(def cd (3 4))
#(extract (abc* d) (1 2 3 4))		#(def abc (1 2 3))#(def d 4)
#(extract (ab* c d) (1 2 3 4))		#(def ab (1 2))#(def c 3)#(def d 4)
#(extract (a bc* d) (1 2 3 4))		#(def a 1)#(def bc (2 3))#(def d 4)
#(extract (a * d) (1 2 3 4))		#(def a 1)#(def d 4)
	#''' Note that the name * is actually "" followed by "*" '''
#(extract (a bc* d) (5 6)		#(def a 5)#(def bc ())#(def d 6)
#(extract (a bc+ d) (5 6)		error (length mismatch)
#(extract (a bc* d) (7))		error (length mismatch)
#(extract (a b c d?) (1 2 3 4))		#(def a 1)#(def b 2)#(def c 3)#(def d (4))
#(extract (a b c d?) (1 2 3))		#(def a 1)#(def b 2)#(def c 3)#(def d ())
#(extract (a? b*) (1 2 3))		#(def a (1))#(def b (2 3))
#(extract (a? b*) (1 2))		#(def a (1))#(def b (2))
#(extract (a? b*) (1))			#(def a (1))#(def b ())
#(extract (a? b*) ())			#(def a ())#(def b ())

Syntax-level comments

Let the # sigil followed immediately by a quoted string be treated as a comment. This may appear anywhere that an ordinary string is valid, is not interpolated, and is skipped by the parser. This would allow somewhat more flexibility and readability than the previous notion of calling #(nop) for each comment.

Built-in functions

See Graut/Standard library.