Graut/Standard library

From HalfgeekKB

Built-in functions

Namespace SYSTEM

Each of these exists in the SYSTEM namespace, such that a builtin named name could be called using $(SYSTEM name). This namespace is normally implicit, but can be explicitly referenced if necessary. This documentation presumes that this namespace is still implicit for all names.

cat

#(cat expr expr ...)

Coerces each expr to a string value, then returns the results concatenated as a string.

choose

#(choose (cond exprt)+ exprf?)

Evaluates each supplied cond in order until one returns true, then returns the result of that condition's exprt. If no condition evaluates to true, returns exprf (which defaults to ()).
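The selection rule can be sketched in Python. This is only an illustration, not part of Graut: the name choose, the use of zero-argument callables to model unevaluated expressions, and the empty tuple standing in for () are all assumptions of the sketch. Whether Graut also defers evaluation of exprt is not stated above; the sketch assumes it does.

```python
def choose(pairs, exprf=lambda: ()):
    """Return the result paired with the first condition that is true.

    pairs: iterable of (cond, exprt), each a zero-argument callable.
    exprf: zero-argument callable for the fallback (assumed to default to ()).
    """
    for cond, exprt in pairs:
        if cond():
            # Conditions after the first true one are never evaluated.
            return exprt()
    return exprf()


# The second condition is the first true one, so its result is returned.
result = choose([(lambda: False, lambda: "a"), (lambda: True, lambda: "b")])
```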

def

#(def key value)

Bind $key, for the extent of the current scope, to resolve to value. Return empty string.

Error if the same key has already been set for this scope (not for an enclosing scope).

Note: In early drafts this is sometimes called #(= ...). That function is now used exclusively for numeric comparisons.

element

#(element index list) -> #(head #(skip index list))

extract

#(extract extractspec value)

Similar to def, but performs a destructuring assignment. An extract spec is either an atom or a list of zero or more extract specs. Each atom indicates a name in the current scope. For example, the result of

#(extract (alpha (bravo (charlie))) (1 (2 (3))))

is equivalent to the result of

#(def alpha 1)#(def bravo 2)#(def charlie 3)

Discard by assigning to empty

The empty atom may be specified multiple times, and any value assigned to the empty atom is discarded; this can be used for partial destructuring. For example, the result of

#(extract ("" (bravo (""))) (1 (2 (3))))

is equivalent to the result of

#(def bravo 2)

No non-atom keys can be assigned

Only atoms indicate values that can be assigned, so a definition of a list-like key such as

#(def (mike november) 123)

has no equivalent extract spec.
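As a rough model of the destructuring rules, the following Python sketch walks an extract spec alongside a value. It assumes atoms are modelled as Python strings, lists as Python lists, and the scope as a dict; the function name extract and the dict-based scope are illustrative only. How Graut treats a spec and value of different lengths is not stated above, so the sketch simply stops at the shorter of the two.

```python
def extract(spec, value, scope=None):
    """Bind each atom named in spec to the matching part of value."""
    if scope is None:
        scope = {}
    if isinstance(spec, str):
        # An atom names a binding; the empty atom discards its value.
        if spec != "":
            if spec in scope:
                # Mirrors the error def raises when a key is rebound.
                raise KeyError("already defined in this scope: " + spec)
            scope[spec] = value
        return scope
    # A list spec is matched element by element against the value.
    for sub_spec, sub_value in zip(spec, value):
        extract(sub_spec, sub_value, scope)
    return scope


full = extract(["alpha", ["bravo", ["charlie"]]], [1, [2, [3]]])
partial = extract(["", ["bravo", [""]]], [1, [2, [3]]])
```

Here full holds bindings for alpha, bravo, and charlie, while partial holds only bravo, matching the two examples above.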

false

#(false) -> #( $((SYSTEM boolean) parse-canonical) false )

Returns a false value.

filter

#(filter list function)

Returns a new list collected by including only elements for which function returns true.

flat-for

#(flat-for list function) -> #(flatten #(for list function))

flatten

#(flatten lists)

Returns a single list containing the immediate top-level elements of all parameters. Error if any parameter is not resolvable to a list.

Similar to #(stream ...), but all expressions are evaluated immediately. However, expressions that resolve to streams will not be strictly evaluated; the result of this function is itself a stream if one of the lists is a stream.

for

#(for list function)
// i.e.
#(for list #(func (paramname) expr-using-param))

Map function; returns a new list collected by applying function to each element of list.

If list is a stream, then the result is a stream as well.

force

#(force list)

Returns, as a list, the elements of list after having been strictly evaluated. If list is already a plain list, then it is returned. If list is a stream, a list is returned containing each of its elements as strictly evaluated.

Note that an infinitely-long stream will cause this never to return. To force the first n elements of a stream, call #(force #(take n stream)).

func

#(func (paramspec) preexpr* expr)

Returns an anonymous function that can be used as the first element of a function call list. The produced function accepts the parameters named by paramspec, and returns the result of evaluating expr with the parameters in lexical scope.

paramspec is an extract spec. When the function is called, the parameters given are assigned as if by #(extract ...). For example, after defining examplefn as

#(def examplefn #(func (alpha (bravo "") charlie) #(do-something-with $alpha $bravo $charlie)))

calling

#(examplefn 10 (20 30) 40)

would produce the same result as

#(do-something-with 10 20 40)

preexpr are expressions evaluated before expr. Their results are discarded, so it is only really meaningful to use expressions with side effect here (in particular, local definitions such as #(def ...) and #(extract ...)).

Values returned by this function are (objects) in the namespace (SYSTEM function). They cannot be serialized as keys.

head

#(head list)

Returns the first element of list.

If list is empty, this produces an error.

if

#(if cond exprt exprf?)

Returns exprt if cond evaluates true or exprf otherwise.

exprf is optional and defaults to ().

import

#(import namespace)
#(import namespace name+)

int

#(int 10) -> #($((SYSTEM number) parse-canonical) 10 1)
#(int atom) -> #(int atom #(int 10))
#(int atom radix)
  -> #(int atom #(int radix #(int 10))) if radix is not already an integer
#(int number) -> number iff its denominator value is 1

Parses atom to an integer value, failing if the value does not look like an integer.

In this discussion, an integer value refers to any number value whose denominator part is 1.

  • Whitespace is allowed at the start, at the end, and/or between the sign and the digits.
  • Zero may have a negative sign (the sign is ignored).
  • After the first digit, the first non-digit and all following characters, if any, are truncated.
  • Digits are matched case-insensitively (for radix > 10, e.g. A and a have the same value).

This corresponds with the pattern:

/^\s*(-\s*)?([$digits]+)/i

The radix parameter, if not already an integer value, is converted to one using #(int ...). Its value must be in 2 .. 36. The digit class consists of the first radix characters of the string:

"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

The value of each digit is equal to its zero-based index in that string (matched case-insensitively).

If a number is passed instead of an atom, the result is either the number itself (if already an integer) or the quotient of its numerator by its denominator, truncated toward zero (e.g. 2.9 would truncate to 2 while -2.9 would truncate to -2).
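The parsing rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the name parse_int is hypothetical, Graut numbers are modelled with fractions.Fraction, and Python's \s is taken as the definition of whitespace.

```python
import re
from fractions import Fraction

DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def parse_int(value, radix=10):
    """Parse an atom (str) or truncate a number toward zero."""
    if not 2 <= radix <= 36:
        raise ValueError("radix must be in 2 .. 36")
    if isinstance(value, (int, Fraction)):
        # Numbers: integer quotient of numerator by denominator, toward zero.
        value = Fraction(value)
        quotient = abs(value.numerator) // value.denominator
        return -quotient if value.numerator < 0 else quotient
    # The digit class is the first `radix` characters of DIGITS.
    pattern = r"^\s*(-\s*)?([" + DIGITS[:radix] + r"]+)"
    match = re.match(pattern, value, re.IGNORECASE)
    if match is None:
        raise ValueError("does not look like an integer: %r" % value)
    magnitude = int(match.group(2), radix)
    # A negative sign on zero is ignored because -0 == 0.
    return -magnitude if match.group(1) else magnitude
```

Trailing characters after the digits are ignored because the pattern is anchored only at the start.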

is-function

#(is-function expr)

Returns true iff the expression looks enough like a function that it could be called as one.

This returns true for all return values of #(func ...).

This function does not perform any lookup; for example, even though these are normally equivalent:

#(nop sometext)
#($(SYSTEM nop) sometext)

The following are not:

#(is-function nop) -> false ("nop" is a string value)
#(is-function $(SYSTEM nop)) -> true ($(SYSTEM nop) resolves to a function)

length

#(length list)

Returns, as if via #(int ...), the number of elements in the given list.

Note that it is possible for a list to be of infinite length; this function should only be used on finite lists.

lookup

#(lookup name)
#(lookup name test-function)
// i.e.
#(lookup name #(func (fullname) boolean-expr-using-fullname))

Determines whether the given name is available in the lookup path; if so, a list containing the qualified name is returned; otherwise, the empty list is returned instead.

If test-function is provided, a found name will not be returned unless the function, passed the qualified name, returns true. This can be used, for example, to test the next match if the first match is not of the right type.

The namespaces searched by this operation are determined by #(import ...). The lookup path begins as containing only the namespace SYSTEM; for example, if no #(import ...) has been run, the call #(lookup nop) would return (SYSTEM nop).
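One way to picture the search is the Python sketch below. The data model is an assumption made for illustration: the lookup path is an ordered list of namespace names, each namespace is a set of the names it defines, and qualified names are tuples. The order in which imported namespaces are searched is not specified above; the sketch searches in path order.

```python
def lookup(name, lookup_path, namespaces, test_function=None):
    """Return [qualified_name] for the first acceptable match, else []."""
    for namespace in lookup_path:
        if name in namespaces.get(namespace, ()):
            qualified = (namespace, name)
            # A rejected match does not end the search; the next
            # namespace on the path is tried instead.
            if test_function is None or test_function(qualified):
                return [qualified]
    return []


# With no imports, the path contains only SYSTEM.
found = lookup("nop", ["SYSTEM"], {"SYSTEM": {"nop", "def"}})
```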

nonces

#(nonces nonce)
// e.g.
#(nonces $)

Using a nonce as a basis, returns a stream (an arbitrarily long list) of nonce values. The sequence of values in the stream is always the same for a given input nonce.

nop

#(nop comments)
#()

Returns the empty string without attempting to resolve any of the arguments. To be used for comments (e.g. in a triple-quoted string).

not

#(not cond) -> #(if cond #(false) #(true))

Returns the boolean inverse of cond.

num

#(num atom)

Converts a string in a traditional/JSON-like floating-point format to a rational number.

If the string contains a single /, the string is split at that character and each side is parsed individually, and the result is the rational whose numerator is the left-side result and whose denominator is the right-side result. This fails if the right side evaluates to 0.

If the string contains multiple /, this fails.

If the string contains no /, the expected format, after ignoring leading and trailing whitespace, is that of JSON's "number" rule or a superset thereof.

# JSON number format
/^
	( - )?			# sign (opt)
	( (0 | [1-9][0-9]*) )	# int part (req)
	( \. [0-9]+ )?		# frac part (opt)
	( [Ee][+-]? [0-9]+ )?	# exp part (opt)
$/x

After separating a floating-point value into the component parts:

  • Let s = the sign coefficient, being -1 if a negative sign is present or 1 otherwise
  • Let m = the string of digits before the fractional point (or all digits of the significand, if there is no fractional point)
  • Let n = the string of digits after the fractional point (or the empty string, if there is no fractional point)
  • Let e = the integer value of the exponent (or 0, if there is no exponent)

and noting the environment:

  • Let b be the base (radix) in which m and n are defined.

find num and den:

  • Let e′ = e - length(n), where length(n) is the number of digits in n. (This is the exponent after the point is shifted off the right.)
  • Let z = a string of max(0, e′) zeroes.
  • Let e″ = min(0, e′). (This is the exponent after the zero padding.)
  • Then num = s × |num|, where |num| is the concatenation mnz interpreted as an integer in base b.
  • Then den = b^(-e″).

The resulting number is then the ratio of num to den (which is sign-corrected but not necessarily in lowest terms).

The adjustments in this computation first ensure that the significand is an integer by shifting the fractional point off the right, then ensure that the exponent is no greater than zero (i.e., that the denominator will not be a fraction) by zero-padding the significand on the right if the exponent is positive. After these steps, the candidate numerator and denominator are both integers.
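The computation can be traced with a short Python sketch. It is an illustration rather than a reference implementation: the names float_parts_to_ratio and parse_num are hypothetical, the base is fixed at 10 (the JSON format has only decimal digits), and fractions.Fraction stands in for Graut's rational numbers. The variables e1 and e2 correspond to e′ and e″.

```python
import re
from fractions import Fraction

JSON_NUMBER = re.compile(
    r"^(-)?(0|[1-9][0-9]*)(?:\.([0-9]+))?(?:[Ee]([+-]?[0-9]+))?$"
)
BASE = 10


def float_parts_to_ratio(text):
    """Return (num, den) for a JSON-style number; not in lowest terms."""
    match = JSON_NUMBER.match(text.strip())
    if match is None:
        raise ValueError("not a JSON-style number: %r" % text)
    sign, m, n, exponent = match.groups()
    s = -1 if sign else 1
    n = n or ""
    e = int(exponent) if exponent else 0
    e1 = e - len(n)            # exponent after shifting the point off the right
    z = "0" * max(0, e1)       # zero padding for a positive exponent
    e2 = min(0, e1)            # exponent after the zero padding
    num = s * int(m + n + z, BASE)
    den = BASE ** (-e2)
    return num, den


def parse_num(text):
    """Parse 'a', or 'a/b' with each side parsed individually."""
    parts = text.split("/")
    if len(parts) > 2:
        raise ValueError("multiple / in %r" % text)
    if len(parts) == 2:
        numerator, denominator = (parse_num(part) for part in parts)
        if denominator == 0:
            raise ValueError("denominator evaluates to 0")
        return numerator / denominator
    return Fraction(*float_parts_to_ratio(text))
```

For 1.234e-5 the intermediate pair is 1234 and 10^8, which Fraction then reduces to lowest terms.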

set-and

#(set-and list list ...)

Produces a new list containing each distinct element that appears in all given lists. The resulting list is in set-normal form.

set-normal

#(set-normal list)

Produces a new list containing each distinct element in list exactly once and in a deterministic order. The elements of the list are normalized to keys for ordering, and the resulting list contains the elements in their normalized forms.

Implementation note: It might be wise to have a bit on the resulting list structure specifying that this list is set-normal, so that repeat calls on the same list become no-ops.

set-or

#(set-or list list ...)

Produces a new list containing each distinct element that appears in any given list. The resulting list is in set-normal form.
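The three set functions can be modelled in Python as below. Graut's key normalization is not defined on this page, so the sketch substitutes repr() as a stand-in key (an assumption); any deterministic, equality-preserving key would serve the same role. All function names are illustrative.

```python
def key_of(element):
    # Stand-in for Graut's normalization to a key (assumption).
    return repr(element)


def set_normal(items):
    """Each distinct element exactly once, ordered by key."""
    unique = {key_of(item): item for item in items}
    return [unique[key] for key in sorted(unique)]


def set_and(first, *rest):
    """Elements that appear in every given list, in set-normal form."""
    common = {key_of(item) for item in first}
    for other in rest:
        common &= {key_of(item) for item in other}
    return set_normal(item for item in first if key_of(item) in common)


def set_or(*lists):
    """Elements that appear in any given list, in set-normal form."""
    return set_normal(item for items in lists for item in items)
```

Because the output of set_normal is already deduplicated and sorted by key, applying it a second time returns an equal list, which is the property the implementation note above would exploit.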

skip

#(skip count list)

Return the list consisting of all items of list after the first count items.

count is converted as if by #(int ...).

A negative count produces an error.

A count greater than the length of the list produces an empty list.

A count of 0 produces list itself.
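A Python sketch of skip, together with head, tail, and element as defined in terms of it, might look like this. Lists and streams are both modelled as Python iterables, and the lazy iterator returned here is an assumption about how an implementation could avoid forcing a stream.

```python
from itertools import islice


def skip(count, items):
    """All items after the first `count`; empty if count exceeds the length."""
    if count < 0:
        raise ValueError("count must not be negative")
    return islice(iter(items), count, None)


def head(items):
    """First element; an error if there is none."""
    for item in items:
        return item
    raise ValueError("head of an empty list")


def tail(items):
    return skip(1, items)


def element(index, items):
    return head(skip(index, items))
```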

skip-while

#(skip-while testfunc list)

Return the sequence consisting of all elements at or after the first element for which #(testfunc element) has a false result. The returned sequence is a stream. Note that strictly evaluating this stream results in an infinite loop if list is itself an infinite stream containing only values for which the test returns true.

stream

#(stream list-expressions)

Returns a stream, which is a lazily evaluated list. The result is similar to #(flatten ...), except the expressions themselves are not evaluated until needed.

For example, in the following, #(zip-with ...) is not called at all unless some element of index 2 or higher is strictly evaluated. (And #(zip-with ...) itself produces a stream.)

#(def fib
  #( stream (1 1) #(zip-with $fib #(tail $fib) #(func (a b) #(+ $a $b))) )
)
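For readers more familiar with generators, a loose Python analogue of this self-referential stream is sketched below. It is not a translation of Graut semantics: the generator fib and the use of itertools.tee to read the stream and its tail in parallel are choices made for this sketch.

```python
from itertools import islice, tee


def fib():
    """Lazily yield 1, 1, then the pairwise sums of the stream and its tail."""
    yield 1
    yield 1
    # Nothing below runs until a third element is requested.
    stream, tail = tee(fib())
    next(tail)  # drop the first element to obtain the tail
    for a, b in zip(stream, tail):
        yield a + b


first_ten = list(islice(fib(), 10))
```

Only the elements actually requested are computed; asking for the first two never reaches the zip.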

tail

#(tail list) -> #(skip 1 list)

take

#(take count list)

Return the list consisting of the first count items of list.

A negative count produces an error.

A count greater than the length of the list produces list itself.

A count of 0 produces an empty list.
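The interplay of take and force on an infinite stream can be sketched in Python. Streams are modelled as iterators and plain lists as Python lists; both function names mirror the Graut ones but are otherwise hypothetical.

```python
from itertools import count, islice


def take(n, items):
    """The first n items; all of them if n exceeds the length."""
    if n < 0:
        raise ValueError("count must not be negative")
    return islice(iter(items), n)


def force(items):
    """Strictly evaluate every element; never returns for an infinite stream."""
    return list(items)


# count(1) is an infinite stream, so it is bounded with take before forcing.
first_three = force(take(3, count(1)))
```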

take-while

#(take-while testfunc list)

Return the sequence consisting of all leading elements for which #(testfunc element) has a true result, ending either immediately before the first element for which the function returns false or at the end of the source, whichever comes first. The returned sequence is a stream.
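Both take-while and skip-while have close counterparts in Python's itertools, which makes their lazy behaviour easy to try out. The wrappers below are only a sketch; the argument order follows the Graut signatures.

```python
from itertools import count, dropwhile, islice, takewhile


def take_while(testfunc, items):
    """Leading elements for which testfunc is true, as a lazy iterator."""
    return takewhile(testfunc, items)


def skip_while(testfunc, items):
    """Everything from the first element for which testfunc is false."""
    return dropwhile(testfunc, items)


# Safe on an infinite stream because the test eventually fails.
small = list(take_while(lambda x: x < 4, count(1)))
# The result is itself infinite, so only a prefix is forced.
after = list(islice(skip_while(lambda x: x < 4, count(1)), 2))
```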

true

#(true) -> #( $((SYSTEM boolean) parse-canonical) true )

Returns a true value.

zip

#(zip list list ...) -> // If zip is in terms of zip-with
  #(zip-with list list ... #(func (elem elem ...) ($elem $elem ...)))

Returns a new list collected by traversing all given lists in parallel and collecting the cross-section into a list at each element. Traversal ends at the end of the shortest list.

zip-with

#(zip-with list list ... function) -> // If zip-with is in terms of zip
  #(for #(zip list list ...) #(func (cross) #(function @$cross)))
// i.e.
#(zip-with list list ... #(func (elem elem ...) ...))

Returns a new list collected by traversing all given lists in parallel and applying function to the cross-section at each element. Traversal ends at the end of the shortest list.
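The mutual definitions of zip and zip-with can be sketched in Python. Note one deliberate difference: the Graut form takes function as the last argument, whereas this sketch takes it first so that the lists can be variadic. The names zip_with and zip_lists are illustrative.

```python
def zip_with(function, *lists):
    """Apply function to each cross-section; stops at the shortest list."""
    for cross in zip(*lists):
        yield function(*cross)


def zip_lists(*lists):
    """zip in terms of zip-with: collect each cross-section into a list."""
    return zip_with(lambda *elems: list(elems), *lists)


sums = list(zip_with(lambda a, b: a + b, [1, 2, 3], [10, 20]))
pairs = list(zip_lists([1, 2], ["a", "b"]))
```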

Minimal subset of functions

The Cifl/Protocifl/Tags and unions spec, the initial application for graut, requires only the following functions in the SYSTEM namespace:

  • nop
  • def
  • func
  • for
  • if
  • not
  • true
  • false
  • set-and

It also specifies several functions specific to tags and unions, which would be in another namespace.

Built-in type constructors

number

#( $((SYSTEM number) parse-canonical) numerator denominator )

Returns a numeric value corresponding to the given string values. In Graut, a numeric value is always a finite rational; concepts such as infinity and NaN do not apply.

The canonical value space is defined thus:

  • The value is defined as the ratio of numerator to denominator.
  • numerator conforms to the pattern 0|(-?[1-9][0-9]*), describing a base-10 integer.
    • Zero is represented only by 0 (no additional digits or negative sign).
    • A non-zero value must not have 0 as its first digit.
  • denominator conforms to the pattern [1-9][0-9]*, describing a base-10 integer.
    • The value is never less than 1.
    • The first digit is never 0.
  • The ratio is in lowest terms; i.e. the greatest common divisor of the denominator and the (absolute value of the) numerator is 1.
    • If numerator is 0, denominator must be 1. (This follows from gcd(0,a) = a for a > 0.)

This value space is a superset of both integers and (finite, non-negative-zero) floating-point numbers.

Arbitrary-length integers should be used for the internal representation.
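The canonical-form rules can be checked mechanically. The Python sketch below is one possible reading: is_canonical tests a pair of strings against the patterns and the lowest-terms rule, and canonicalize reduces an arbitrary integer ratio to that form. Both names are hypothetical, and Python's arbitrary-length int plays the role of the internal representation.

```python
import re
from math import gcd

NUMERATOR = re.compile(r"0|-?[1-9][0-9]*")
DENOMINATOR = re.compile(r"[1-9][0-9]*")


def is_canonical(numerator, denominator):
    """True iff the two strings are a canonical numerator/denominator pair."""
    if not NUMERATOR.fullmatch(numerator):
        return False
    if not DENOMINATOR.fullmatch(denominator):
        return False
    # Lowest terms; gcd(0, d) == d forces the denominator to 1 for zero.
    return gcd(int(numerator), int(denominator)) == 1


def canonicalize(num, den):
    """Reduce an integer ratio to canonical strings."""
    if den == 0:
        raise ZeroDivisionError("denominator must not be 0")
    if den < 0:
        num, den = -num, -den  # the sign is carried by the numerator
    divisor = gcd(num, den)
    return str(num // divisor), str(den // divisor)
```

For example, canonicalize(1234, 10 ** 8) yields the pair 617 and 50000000.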

Integers:
0		-> #( $((SYSTEM number) parse-canonical) 0 1 )
123		-> #( $((SYSTEM number) parse-canonical) 123 1 )
-1		-> #( $((SYSTEM number) parse-canonical) -1 1 )

Floats in base-10:
1.234e-5 = 1234e-8 = 1234 / 10^8 = 617 / 50000000
		-> #( $((SYSTEM number) parse-canonical) 617 50000000 )
		
// Approximated 1/3
0.33333333333333333333333333333333 ->
		-> #( $((SYSTEM number) parse-canonical)
			33333333333333333333333333333333
			100000000000000000000000000000000 ) // 10^31
// Approximated pi
3.1415926535897932384626433832795 ->
		-> #( $((SYSTEM number) parse-canonical) 
			6283185307179586476925286766559
			2000000000000000000000000000000 )  // 2 * 10^30, lowest terms

Floats in base-16 (e.g. from Java Double.toHexString())
// Approximated 1/10
0x1.999999999999ap-4
		-> #( $((SYSTEM number) parse-canonical)
			3602879701896397
			36028797018963968 ) // 2^55, lowest terms

Actual rationals:
1/3		-> #( $((SYSTEM number) parse-canonical) 1 3 )
1/10		-> #( $((SYSTEM number) parse-canonical) 1 10 )


When collapsed into a key, a value of this type uses the namespace (SYSTEM number). Note that key ordering will not reflect the numeric value: exactly equal numbers compare equal as keys, but distinct numbers (even nearly equal ones, such as 1/3 versus 0.33333...) may sort in either order.

boolean

#( $((SYSTEM boolean) parse-canonical) atom )

Returns a boolean value corresponding to the string value. Error if the input string is not in canonical form.

The canonical form of a boolean value matches the pattern ^(true|false)$. To clarify:

  • If the value represented is true, the string is exactly true (U+0074, U+0072, U+0075, U+0065).
  • If the value represented is false, the string is exactly false (U+0066, U+0061, U+006C, U+0073, U+0065).

The pattern must match without normalizing the string (e.g. to adjust case, remove whitespace, etc.).
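The strictness of this rule is easiest to see in code. In the Python sketch below (the function name is hypothetical), the input is compared exactly, with no trimming or case folding.

```python
def parse_canonical_boolean(text):
    """Accept exactly 'true' or 'false'; anything else is an error."""
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError("not a canonical boolean: %r" % text)
```

Inputs such as "True" or " true" are rejected rather than normalized.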

When collapsed into a key, a value of this type uses the namespace (SYSTEM boolean). Key ordering will cause a false value to precede a true value; no other order relationship is defined.

nonce

#( $((SYSTEM nonce) parse-canonical) internal-values )

It is illegal for non-internal code to call this function; it exists only for the purpose of deserializing keys containing nonces. To create a new nonce, use the unfollowed $ sigil notation. The interpreter replaces this with a nonce value that has not yet been produced this run.

The representation used is implementation-defined, as long as it retains its value through conversion to and from a key.

Copies of the same value must compare equal. Otherwise, no ordering is specified. Older values need not be always less than or always greater than newer values. There should be no meaningful conversion to another value type.

When collapsed into a key, a value of this type uses the namespace (SYSTEM nonce). A serialized value should not be seen outside the internals of the implementation. Attempting to deserialize a value from a source outside the internals is an error.