Cifl/Protocifl

From HalfgeekKB
Jump to navigation Jump to search

Names

Namespaces

These are the major namespaces in which qualified names exist:

  • Packages, boxes (variables), functions
  • Type names
  • Annotations
<!-- Possible node structure -->
<pc:qname id="example-cifl"><pc:name>cifl</pc:name></pc:qname>
<pc:qname id="cifl.lang"><pc:package ref="example-cifl"/><pc:name>lang</pc:name></pc:qname>

A type may share a name with a package or a variable, but not both. Note that a (non-raw) struct, union, or enum simultaneously define a type name and a package. (This has the effect of implementing an equivalent to instance fields and static members, respectively.)

Multiple functions may be declared with the same name as long as their signatures are sufficiently different. (See the bit about overloading.)

While annotations exist, Cifl doesn't currently include a syntax for declaring or defining annotations; they are a translator-specific detail.

Mangling

For various reasons, a name or identifier in Cifl does not have any single prescribed transliteration in C. By default, the translator is free to assign any C identifier, multiple such identifiers, or no such identifiers at all to correspond to a given entity in Cifl.

For purposes of readability and debugging, a translator is encouraged to select names that are strongly based on their Cifl equivalents where possible. For example, the name some.package.x might be implemented by a C entity _EXAMPLE_some_package_x.

Because any string is a valid identifier, it is likely to be practical to transform all names in such a way that none is likely to clash with a C keyword or an entity declared in a standard library. For example, a locally scoped `if` in cifl will need a different name in C; _EXAMPLE_if might be a suitable substitute.

The translation will necessarily vary somewhat between C compilers, and this includes name mangling. If a particular C compiler is bounded by heavy length constraints (e.g. that only the first 6 characters of an identifier matter), it may be necessary to abandon actual encoding and instead assign compact C identifiers, then provide a facility to translate back from debug messages as necessary.

(Conversely, if a translator already exists for a given compiler or environment, it would probably be helpful to interoperate unless the existing implementation is grossly deficient in this regard.)

In areas where C interoperability has been enabled (this should probably not be default, but will generally be necessary in places), an entity in cifl may be declared in C under a different name by specifying an annotation:

@cifl.c.lang.name(identifier) // functions, variables, typedefs
@cifl.c.lang.tag(identifier) // tags on structs, unions, enums

Implementations shall not be required to verify the uniqueness of the mapped names, even within the same translation unit, so using these annotations could be dangerous and may require a warning-suppression. Implementations not supporting these annotations shall reject any code that uses them. To produce valid C, any identifier specified must be a valid C identifier for the context; the translator itself need not enforce this.

For implementations that support @cifl.c.lang.name, instances in which it is legal but not used will tend to be equivalent to it having been used with some default name.

In all implementations, raw structs, unions, and enums must not be declared with any tag unless one is specified explicitly; @cifl.c.lang.tag is one such mechanism though others may be legal.

Identifiers

An identifier is a name for some entity relative to some scope. An identifier can be composed of any sequence of characters; however, those characters must appear in a backticked token (like a string literal, but using `` for quotes rather than "") unless the sequence follows rules similar to those of a C identifier (and is not also a lexically important token on its own).

backticked-identifier : ` backticked-identifier-part* `
backticked-identifier-part : <any character except ` or \> | backslash-escape
backslash-escape : \ (` | \ | <others as defined with string literals>)

identifier-bare : identifier-start identifier-part*
identifier-start : _ | $ | <letter>
identifier-part : identifier-start | <number>
  • The pseudo-identifier _ is a keyword but is parsed as an identifier. `_` spells an actual identifier with the same name.
  • Otherwise, where a bare identifier is legal, backticks around the same identifier refer to the same entity; e.g. foo and `foo` refer to the same thing.
  • Backticks may also be used where the same sequence of characters would be something lexically other than an identifier, such as ``, `123`, `if`, `name with spaces`, `+`, `a.b` (which is distinct from a.b, which means `a`.`b`), and so forth.

Qualified and unqualified names

An unqualified name consists of a single identifier (e.g. foo, `some name`).

A qualified name consists of a package name followed by the token . and then finally an unqualified name (e.g. org.example.foo, org.example.`some name`).

Named entities

Package

A package is a namespace within which other entities, including other packages, may be declared. A package may be declared with either a qualified name or an unqualified name. If a package is declared within another package, the inner name is considered relative to the outer name (even if qualified). If a package is declared at the top level of a file, its name is considered top-level, or non-relative.

A package may be declared at the top of a file to be applied to the entire file, similarly to Java or Perl. The contents of a block within a file may also be declared into a subpackage.

package Alpha;
// Code here is in package Alpha
package Bravo {
  // Code here is in package Alpha.Bravo
}
package Charlie.Delta {
  // Code here is in package Alpha.Charlie.Delta
}

The top-of-file declaration is equivalent to a block declaration around the same content.

package Alpha {
  // Code here is in package Alpha
  package Bravo {
    // Code here is in package Alpha.Bravo
  }
  package Charlie.Delta {
    // Code here is in package Alpha.Charlie.Delta
  }
}

The only permitted children of the top-level scope are packages; it is not legal to specify other entities (functions, types, etc.) outside of a package.

The package is strictly a compile-time construct; information about packages is not guaranteed available at run time.

Functions and variables

As in C, a file-scoped variable may be defined, either with public (e.g. external) or file-private (e.g. static) linkage. Package-based access to such a variable is not implemented: a file-private variable may be accessed from a package outside of which it was defined by qualifying the name, as long as that access occurs within the same file; however, no package may access the variable from outside the file, including the package in which it was declared.

Functions share a namespace with variables; their names are effectively treated as constant function pointers.

Typedefs, structs, unions, enums

A typedef declares a name in the current package. It has linkage exclusively at the cifl level, where it is made available to other cifl code unless it is declared file-private.

Cifl unions and enums implicitly combine a type with a package. For example, definitions like

enum ExampleE (FOO, BAR);
union ExampleU (int, double, pair(val a : int, val b : int));

might also be treated in many cases as

struct ExampleE (_ : int);
package ExampleE {
  val FOO = ExampleE(0);
  val BAR = ExampleE(1);
}

union ExampleU (
  @match_as(int) _ : int,
  @match_as(double) _ : double,
  @match_as(pair) _ : ExampleU.pair
);
package ExampleU {
  typedef pair : struct(val a : cifl.lang.int, val b : cifl.lang.int);
}

though the implementation probably wouldn't be quite the same. The context in which the name appears would dictate which interpretation is used.

Annotations

Annotations exist at the cifl level at compile time only. No mechanism is currently defined to declare or define annotations in cifl; their existence has particular significance only to the translator. Annotations that are defined by the language itself appear periodically in the definitions of this language.

Keywords

Lexical keyword tokens

  • break
  • case
  • continue
  • def (non-C): Defines a function or an inline closure.
  • default
  • do
  • enum
  • for
  • if: Unless otherwise noted, unless can be substituted for if anywhere in the syntax.
  • match (non-C): Extended replacement for C switch.
  • redo (non-C): Loop jump (like break and continue) targeting start of body statement.
  • return
  • struct
  • throw (non-C): Return from function (like return) with designated exception condition.
  • typedef
  • union
  • unless (non-C): Synonym for if with negation.
  • until (non-C): Synonym for while with negation.
  • val (non-C): Same as var, but declares a read-only (C const) variable.
  • var (non-C): Declares a read-write (C non-const) variable.
  • while: Unless otherwise noted, until can be substituted for while anywhere in the syntax.

Equivalents to missing C keywords

  • auto: Is implicit wherever available.
  • char: See #Basic types.
  • const: Use val instead of var.
  • double: See #Basic types.
  • else: if-else blocks are replaced by usage of either match void (which uses if and default cases) or by use of the ternary operator (blocks are treated as expressions).
  • extern: Is implicit wherever available.
  • float: See #Basic types.
  • goto: Omitted in favor of structural jumps break, continue, and redo.
  • inline: Is an annotation @inline.
  • int: See #Basic types.
  • long: See #Basic types.
  • register: Is an annotation @register.
  • restrict: Is an annotation @restrict.
  • short: See #Basic types.
  • signed: See #Basic types.
  • sizeof: Is parsed as a function call.
  • static: File-local names may be qualified @private. Persistent variables in functions are omitted in favor of explicitly file-scoped variables (if necessary, this decision could be reversed using an annotation like @static without changing the syntax).
  • switch: Replaced in functionality by match. Note that match doesn't implement fallthrough.
  • unsigned: See #Basic types.
  • volatile: Is an annotation @volatile
  • _Bool: See #Basic types.
  • _Complex: See #Basic types.
  • _Imaginary: See #Basic types.

Operators

Though Cifl is intended to be C-like, it departs slightly in the area of operators to be more like Scala in order to simplify the syntax and generalize operators.

One major change is that there is no longer a postfix form of ++ or --. It's possible that e.g. n++ in C might be notated (&n).next().

An operator character is a character in the set of at least !#%&*+-/:<=>?@\^|~ and would be extended to other appropriate symbols in Unicode.

Unary

The operators in the syntax that can be unary are +, ++, -, --, !, ~, *, and &. With the exception of - and ~, these cannot be overridden. Only ++ and -- mutate their operand, but they are defined in terms of operations that return a value while not performing any mutation. (On primitive types this redirection should be optimized away.)

All unary operators are right-associative.

Expr	Meaning
+N	N as rvalue
-N	N.`-_`()
!N	bool(N) ? false : true
~N	N.`~_`()
*N	Lvalue at address N
&N	Address of lvalue N

++N	N = N.`+1`()
--N	N = N.`-1`()

(Supporting interfaces, which do not mutate N)
N.`-_`()	Returns additive opposite (numeric: 0 - N)
N.`~_`()	Returns binary opposite (integer: N ^ ~0)
N.`+1`()	Returns next value in sequence (numeric: N + 1)
N.`-1`()	Returns previous value in sequence (numeric: N - 1)

Infix

An assignment operator is any operator that

  • Ends with =
  • Is not <=, >=, or !=, and
  • If more than one character long, does not begin with =.

Assignment operators share a single precedence level lower than all other infix operators.

Assignment operators and infix operators ending with : are right-associative. Other infix operators are left-associative.

Ternary

The ternary conditional co-operators ? : cannot be redefined. They have equal precedence and are both right-associative; it is (probably) an error for them not to appear together.

The infix operator ?: is a short form of the operation where the left operand is used as the left and middle operands of the full operator; e.g. A ?: B is equivalent to A ? A : B. Similarly, the assignment operator ?:= is applicable; e.g. A ?:= B is equivalent to A = A ?: B, which in turn means A = A ? A : B.

Note that ?: and ?:= are not special cases of the operator but single-token operators on their own; e.g. ?: does not mean the same as ? :.

Precedence

With some exceptions, operators have a precedence decided by their first character.

(Highest)
N[T]
N(A) N.M N->M
++N --N +N -N !N ~N *N &N
(other**)
N*P N/P N%P (other* [*/%])
N+P N-P (other* [+-])
(other* [:])
N<<P N>>P (other* [<>])
N<P N<=P N>P N>=P
N==P N!=P (other* [!=])
N&P N&&P (other* [&])
N^P (other* [^])
N|P N||P (other* [|])
(other* [?])
A?B:C A?:C
A=B (and other assignment operators)
(Lowest)

* Operators not explicit here that begin with these characters
** All operators not otherwise mentioned

Types

All derivative affixes to an existing type are expressed in prefix notation (contrast C, which uses prefix for struct, union, and enum, postfix for array and pointer types, and both for function types!).

In this document, the word box is used for the entity in C that is called a "variable" regardless of whether it is declared const (which technically makes it non-variable most of the time). For example, in the C statements int a; const int b = 5;, a and b are the names of boxes. (In C parlance they are both "variables", but intuitively—at least to users of normal math—b is not really a variable. We use "box" to account for the discrepancy.)

There areis a distinction between value types, which describe data values, and box types, which describe boxes that hold these value. "Type" in cifl normally refers to a value type, but there are also types beyond value types to classify the boxes that hold values.

  • A value type describes the format and interpretation of data that represent a value.
  • A box type describes what kind of value a box can hold, and also other properties, such as its mutability and volatility, that apply to boxes but not the values themselves.

The properties encoded in a box type are as follows:

  • Value type: The box holds (or can hold) this type of value.
  • var and val (C: absence or presence of const): The value in a writable box can be changed by the code in which it is scoped; the value in a read-only (i.e. non-writable) box must not be changed by this code. A box is declared writable, using the var keyword (which may be omitted in some places), or read-only, using the val keyword instead.
    • In this work, var or var box in this work refers to a writable box, and
    • val or val box refers to a read-only box.
  • @volatile (C: volatile): The value in a non-volatile box is asserted not to ever be changed except by the code in which it is scoped. The value in a volatile box may be changed by some logic outside the scope, such as an ISR or another thread, and must therefore not be cached but reread every time it is used. A volatile box is declared with @volatile. A val may still be marked volatile; if so, then its value might be changed from outside the scope but never from inside.
  • @restrict (C restrict): The value of a pointer in a restrict box (or any expression based thereupon) is asserted to be the only means by which code in this scope will access the storage it addresses. If this property is not set, the underlying C compiler may need to assume that regions of memory addresses by two pointers in scope could overlap, making caching harder to do safely. A restrict box is declared with @restrict; this annotation is not legal unless the value type is a pointer type.

Cifl specifies that a typedef can only map a name to a value-type; the box-type must be defined with the box itself. However, pointers, arrays (as counted array pointers), functions, structs, and C unions abstract their components as boxes, not values, and these components are declared with box types.

// WRONG: This is illegal
typedef const_int : val int; // illegal
var n : const_int = 5;	// A box declared by "var" is guaranteed to be writable
			// since no "const"-type qualifier can sneak in with a
			// typedef.
// RIGHT:
val n : int = 5;
// You may also specify a pointer to a val int
val pn : * val int = &n;
// etc.

(Note that stack arrays are declared with their own notation.)

Syntax summary

value-type :
	pointer-type |
	counted-array-type |
	function-type |
	structure-type |
	union-type |
	enum-type |
	named-type

box-type : vaq? annot* value-type

vaq : ('val' | 'var') ('*' constant-expression)?

pointer-type : '*' box-type

counted-array-type : '++' box-type

// Accounts for C and cifl function types
function-type : 'def' '-'? annot* ('(' type-param-list ')')? ':' value-type

// Ellipsis, despite the syntax, modifies the last parameter in the list,
// so an empty list cannot have an ellipsis.
type-param-list : type-param-list-non-empty?
type-param-list-non-empty : type-param-list-non-empty-fixed '...'?
type-param-list-non-empty-fixed : box-type (',' box-type)*

// Accounts for C and cifl types
structure-type : 'struct' '-'? annot* '(' param-list-fixed ')'

param-list-fixed : (param-spec (',' param-spec)*)?
param-spec : vaq? annot* identifier : annot* value-type


union-type : 'union' (c-union-type-rest | x-union-type-rest)
c-union-type-rest : '-' annot* '(' param-list-fixed ')'
x-union-type-rest : annot* '(' union-members ')'

enum-type : 'enum' (c-enum-type-rest | x-enum-type-rest)
c-enum-type-rest : '-' annot* '(' c-enum-list ')'
x-enum-type-rest : annot* '(' x-enum-list ')'

c-enum-list : (c-enum-item (',' c-enum-item)*)?
c-enum-item : annot* identifier ('(' constant-expression ')')?
x-enum-list : (x-enum-item (',' x-enum-item)*)?
x-enum-item : annot* identifier

named-type : full-name
full-name : identifier ('.' identifier)*

union-members : (union-member (',' union-member)*)?
// union-member : named-type | identifier '(' param-list-fixed ')'
// Factored:
union-member : identifier
	(	('.' identifier)* // treat as named-type
	|	'(' param-list-fixed ')' // treat as local union member
	)

Basic types

Imports

Many types that are keywords in C are changed to identifiers in Cifl. (Unlike C, the syntax is designed specifically not to require types to be keywords.)

Note that size_t is in this list; C doesn't call it a basic type, but it is ubiquitous in C code and its definition is distinct from the basic integers, so it's important enough to be treated as primitive here.

{
	"Cifl" : {
		"lang" : {
			"void" :	["type", "void"],
			"cchar" :	["type", "char"],
			"scchar" :	["type", "signed char"],
			"short" :	["type", "short"],
			"int" :		["type", "int"],
			"long" :	["type", "long"],
			"llong" :	["type", "long long"],
			"bool" :	["type", "_Bool"],
			"ucchar" :	["type", "unsigned char"],
			"ushort" :	["type", "unsigned short"],
			"uint" :	["type", "unsigned int"],
			"ulong" :	["type", "unsigned long"],
			"ullong" :	["type", "unsigned long long"],
			"float" :	["type", "float"],
			"double" :	["type", "double"],
			"ldouble" :	["type", "long double"],
			"size_t" :	["type", "size_t"],
		},
		"math" : {
			"floatc" :	["type", "float _Complex"],
			"doublec" :	["type", "double _Complex"],
			"ldoublec" :	["type", "long double _Complex"],
		}
	}
}

Derived types

Pointer/array types

Cifl type	Equiv C typedef   	Notes

* Q		typedef Q *Foo		complete
* _		typedef void *Foo	any pointer (possibly same as *(void))
// examples
var p1 : * int; // int *p1;
var p2 : * val int; // const int *p2;
var p3 : * @volatile int; // volatile int *p3;
var p4 : * _; // void *p4;

Q is an optionally qualified type, which may be complete or incomplete.

If the following annotations are found, they are applied to the lvalue-type of the parameter:

  • @cifl.lang.restrict, which enables the restrict modifier in C. If present, value-type must be a pointer type itself.
  • @cifl.lang.volatile, which enables the volatile modifier in C.

(NB: The const modifier from C is encoded in the use of val instead of var.)

++ Q		typedef const size_t Foo__length; typedef Q * const Foo__data;

This type is a counted array pointer; it stands in for both a pointer and a corresponding length. The exact implementation may vary, but this will usually stand in for two distinct boxes, providing an abstraction for when both values are available, such as with C99 variable-length (runtime-length would be more accurate) arrays as well as the parameters of int main(int argc, char** argv), which might be something like def main(args : ++ *cchar) : int instead.

++ Q degenerates to * Q automatically.

Stack arrays

Stack arrays are treated a bit differently than dynamic arrays because of their distinct meaning.

In C, declaring an array A of S elements of box type E doesn't quite declare a box as much as it declares an immutable name A by which S boxes of box type E may be systematically accessed using pointer arithmetic.

  • "const" doesn't apply: A is naturally immutable; it cannot be reassigned. Even if it were assignable, the conversion rules in C state that an array name used as assigned value is converted automatically to a pointer. Therefore, for example, you can't assign an int[3] to an int[3] because it is converted to an int* automatically.
  • "volatile" doesn't apply: The environment, if it treats A as a variable at all, won't reassign it.
  • "restrict" doesn't apply: An array type is not a pointer type, even if it converts to one.

Because of this, it'd be pretty weird for cifl to treat the array type as a value type. So, we won't. Here's what we do instead:

C Cifl Notes
int a[5]
var*5 a : int
complete
int a[]
var*_ a : int
incomplete
const int a[5]
val*5 a : int

This notation eliminates the problem by making the static array a non-type. Instead of declaring a single array, you declare some number of boxes directly. The name you provide for the boxes becomes a read-only counted array pointer pre-set to the corresponding location.

Intuitively, for example, the phrase

var*3 @volatile x : int;

is semantically equivalent to

val x : ++ @volatile int = <counted array pointer of length 3>;
var @volatile x_0 : int;
var @volatile x_1 : int;
var @volatile x_2 : int;

except that x_0, x_1, and x_2 are invisible.

These semantics are similar to those in C, where the same array is declared

volatile int x[3];

Function types

Cifl makes no use of C function types, because in practically every circumstance of utility C itself automatically converts such values to function pointers. A Cifl raw function type refers directly to a C function pointer type.

Cifl type		Equiv C typedef

def- : R		typedef R (*Foo)(void)
def- (T) : R		typedef R (*Foo)(T)
def- (T,U) : R		typedef R (*Foo)(T,U)
def- (T,U,_...) : R	typedef R (*Foo)(T,U,...)
def- (T,U,V...) : R	typedef R (*Foo)(T,U,_COUNTED(V))
			where _COUNTED(V) indicates Cifl +(V)
			(i.e. same data structure as Cifl
				def-(T, U, ++V)
			)

The - qualification causes the type to correlate directly to a C function.

The last parameter may be specified as a normal parameter type followed by .... If it is, then the function accepts typesafe Cifl varargs, which are passed in by reference using a pseudo-array bounded pointer, making it iterable by conventional array traversal.

The last parameter may alternatively be specified as _ ... (i.e., using the special type name bare _ without specifying val or var or any annotations that cannot be used with ...). If it is, then the function accepts (type-unsafe) standard C varargs and will provide some abstraction of va_list to retrieve them.

The return type (R) cannot be a C array type or function type. (Note, however, that it can be an array pointer or function pointer.)

Parameter types (T, U) may only specify the @register storage class. In a definition, after adjustment, no incomplete types are allowed (whether this applies to the function type itself is not yet determined).

C context for adjustments follows: In C, an array-typed argument is automatically degraded to a pointer to the element type; if static is also specified (with the other type qualifiers inside the brackets), the array must have at least the specified number of elements. A function-typed argument is automatically degraded to a pointer to the function type. In non-definitions, incomplete types are allowed (implementing functions must then complete them), as are variable-length arrays of length * (a C99 feature in which auto arrays may have a length that is not determined until run-time; these can be used as parameters whereby the length is specified as some expression computable from any names in scope, including parameters that precede it; wildcard * may be substituted in non-definitions).

TODO: Requirements for R, T, U.

Cifl type		Similar Cifl type
def : R			struct(context: *_, run: def-(*_) : R)
def(T) : R		struct(context: *_, run: def-(*_,T) : R)
def(T,U) : R		struct(context: *_, run: def-(*_,T,U) : R)
def(T,U,_...) : R	struct(context: *_, run: def-(*_,T,U,_...) : R)
def(T,U,V...) : R	struct(context: *_, run: def-(*_,T,U,V...) : R)

A Cifl function type pairs a C function pointer with a context object on which to run it. A call F(a,b) to some contexted function F would be rewritten by the compiler to prepend the context to the parameters; given the above proposed structure, it might be rewritten as F.run(F.context, a, b).

The meanings of varargs are the same as with def-.

Structure types

Cifl type		Equiv C typedef

struct(a:A)		typedef struct { A a; } Foo
struct(a:A,b:B)		typedef struct { A a; B b; } Foo
struct(a:A,_:B)		typedef struct { A a; B; } Foo
struct(val a : A)	typedef struct { const A a; } Foo
struct(
 @bits(8) a : int)	typedef struct { int a : 8; } Foo
struct()		typedef CIFL_EMPTY_STRUCT_T Foo

The members of a struct are specified much like the type parameters above, except that they also have names. The name part of a param spec is not optional, but bare _ may be specified to omit a name for the member.

The lvalue-type annotations (i.e., for restrict and volatile) are specified before the identifier, not after the :. A bitfield size for a member is specified using the @cifl.lang.bits annotation on the lvalue.

It is explicitly legal to specify an empty struct in Cifl, though C99 forbids an empty structure. For a strict C99 mode, the compiler may produce a type that has a nominal dummy non-empty interpretation, such as

// (dummy implementation for empty struct may vary)
typedef struct { char _; } CIFL_EMPTY_STRUCT_T;

Unions

Cifl type		Equiv C typedef

union-(a:A)		typedef union { A a; } Foo
union-(a:A,b:B)		typedef union { A a; B b; } Foo
union-(a:A,_:B)		typedef union { A a; B; } Foo
union-(val a : A)	typedef union { const A a; } Foo
union-(
 @bits(8) a : int)	typedef union { int a : 8; } Foo

A raw C-style union is specified in exactly the same manner as a struct, except that union- is substituted as the keyword.

Cifl type		Equiv C typedef

			// The following presumes this definition:
			#define UNION_MEMBER(T,name) struct { CIFL_UNION_TAG_T __tag; T __value } name
union(int)		typedef union { CIFL_UNION_TAG_T __tag;
				UNION_MEMBER(int, cifl_lang_int); } Foo
union(int,long)		typedef union { CIFL_UNION_TAG_T __tag;
				UNION_MEMBER(int, cifl_lang_int);
				UNION_MEMBER(long, cifl_lang_long); } Foo
union(int,long,		
  a(x : int))		typedef struct { int x; } Foo_a_type;
			typedef union { CIFL_UNION_TAG_T __tag;
				UNION_MEMBER(int, cifl_lang_int);
				UNION_MEMBER(long, cifl_lang_long);
				UNION_MEMBER(Foo_a_type, a); } Foo
union(z())		typedef CIFL_EMPTY_STRUCT_T Foo_z_type;
			typedef union { CIFL_UNION_TAG_T __tag;
				UNION_MEMBER(Foo_z_type, z); } Foo
union(
  union(int),
  union(long))		// same as union(int, long)

The cifl "union" is a bit of a misnomer; more appropriate names might include "type superposition", "type choice", or "mutual exclusion type". But the latter already has a better known meaning in computer science (the "mutex"), and the others are hard to shorten meaningfully. The result is just close enough to a C union that people should still hopefully understand the word "union". (In CS, this is usually called "tagged union" or "discriminated union".)

A cifl union type specifies a number of forms it can take, much like a C union. Unlike a C union, however, a value of the union type is only observable in its origin type and none of the others (that's the superposition bit). A tag value internal to the implementation serves to distinguish one of the multiple values stored in the same place.

All cifl union type members must be named by identifiers. For example, it is legal to typedef struct(a:int,b:int) to the name ABPair and then use ABPair as a union member; however, you cannot directly specify struct(a:int,b:int) as a member.

If a type name is specified as a member, that member has no name and is extractable solely by its type name. When using a match to extract the value, the type's local identifier (i.e., without leading namespace) may be substituted for its full name unless there are multiple members that share the name. If this is the case, an alias may be specified for one or more of the members, or one or more of the members may be omitted from the running by assigning the alias bare _.

// Wrong
union(cifl.lang.int, some.other.int) // cannot match using name int; ambiguous
union(int : cifl.lang.int, some.other.int) // illegal; explicit alias clashes with some.other.int
// Right
union(cifl.lang.int, _ : some.other.int) // int matches cifl.lang.int, match other using qualified name
union(cifl.lang.int, oint : some.other.int) // int matches cifl.lang.int, match other using name oint
union(a : cifl.lang.int, b : some.other.int) // a matches cifl.lang.int, b matches some.other.int, int does not match

Each member having a distinct qualified type serves a goal of seamlessness between union types.

  • A union typed variable may unconditionally be assigned any value of a union type that has the same members or a subset of the same members, as determined by their qualified names (and not by their matching aliases). No explicit inheritance is required.
  • A pattern match can, instead of only picking out a single member, separate a union value into entire subtypes to be processed separately.

To add a second member with the same type name as some other member, there are options.

  • You might typedef the type to another name and use that name. Unlike in most C contexts, a typedefed name is not a transparent local alias; it would be treated as a distinct type while having an identical representation. (The degree to which this is true is subject to change later.)
  • You may specify a new struct type with the desired type as its sole parameter.
  • You may specify an inline structure member, which combines aspects of these two approaches.
// Wrong
union Foo (cifl.lang.int, otherInt : cifl.lang.int) // Cannot have two members of same qualified type
// Right
union Foo (cifl.lang.int, otherInt(_: cifl.lang.int)) // One member is cifl.lang.int, other is Foo.otherInt

// somewhat similar to
//	package Foo { typedef otherInt : cifl.lang.int; }
//	union Foo (cifl.lang.int, Foo.otherInt)
// except that this probably won't be legal (package and union can't share a name)

If the parameter list has only one member (other conditions may apply), the member should be implemented internally using that member's type. If the parameter list has zero members, storage for that member need not be implemented at all (the internal tag will still distinguish it). If the parameter list has multiple members, the member will be represented by a struct of the same layout.

Enums

enum.raw(A)		typedef enum { Foo_A } Foo;
enum.raw(A,B)		typedef enum { Foo_A,Foo_B } Foo;
enum.raw(A,B(10),C)	typedef enum { Foo_A,Foo_B=10,Foo_C } Foo;
enum.raw()		typedef enum { __dummy_constant_name } Foo;

A raw C-style enum is implemented directly as a C enum. Within cifl, a member of an enum is addressed as such and is not declared directly into the namespace of the enum itself. A raw-enum member has a pseudo-method .value() that returns its integer value; the member itself is a typesafe instance of the enum.

Cifl type		Equiv Cifl union
enum(A)			union(A())
enum(A,B)		union(A(), B())
enum(A,B,C)		union(A(), B(), C())
enum()			union()

A cifl enum is merely a short form of the cifl union in which all members are nonexistent singleton types. Of particular importance is that a cifl enum member is not at all convertible to an integer. Also, all members are distinct and duplicates are not permitted. As with other unions, enums may be folded into larger unions (using a new union()), and its member types may be individually addressed.

void and other singletons

As described earlier, cifl provides a straightforward way to declare singletons (here, a type having only one possible value). For example, if you specify the enum

typedef Foo : enum(Sing) // or union(Sing())

Then Foo.Sing is a type, a box of which can only contain the value Foo.Sing(). This is of little use other than as a kind of atom. It gets generally more useful when grouped with other singletons into a union (or enumeration). For example, given the declaration

typedef Foo : enum(Sing, Sang, Sung)

then a variable of type Foo may be assigned Foo.Sing(), Foo.Sang(), or Foo.Sung(). Meanwhile, each of Foo.Sing, Foo.Sang, and Foo.Sung are also types themselves, each a singleton allowing only its corresponding value.

The following applies to the individual types, not to the enum or union types where they were instantiated.

  • Since only one value is possible, a box declared to be of a singleton type is effectively immutable (even if declared var) and non-volatile (even if declared @volatile).
    • Any use of the box as an rvalue may be replaced with the type's sole value.
    • If the box is declared var, assignment to it is legal where it would be otherwise (e.g. types must agree) but a no-op.
  • A parameter or box declared to be of a singleton type need not have an actual representation, or one that is of nonzero length, in the translated code. However, some representation may be necessary to preserve the semantics of cifl (for example, if parameter-directed overloading is used, the code will need some discriminator to determine which of the alternatives to use even if singleton parameters are elided).

In cifl, void means basically everything it does in C: It's a zero-length type, and thus has one distinct value (i.e., it is a unit type), but pretends in a few places to no value at all. Cifl does a little more to embrace its unit-ness, however, making it hopefully more useful.

(Cifl has no type that is actually void (non-valued). The cifl enum() would be an actual void type, with no possible values, were it legal, but it is not.)

One can think of void as if it had been declared thus:

package cifl.lang {
  typedef @private _void_singleton_holder_ : enum(void_singleton_type)
  typedef void : _void_singleton_holder_.void_singleton_type
}

Or, depending on how the language progresses, it might even be

package cifl.lang {
  enum(void)
}
  • A block containing no expressions, e.g. {}, evaluates to the sole void value, cifl.lang.void(), at compile time. This value will usually be referred to as {}.
  • A parenthetical list with no elements, e.g. (), evaluates to {} at compile time.
  • The result value of several language constructs is, or is a union containing, the void type.
    • if results in its body if the test succeeds, or {} if it fails.
    • A result is made optional by adding void to the union of alternatives. (Unlike null in the languages that have it, {} is not a possible value by default.)

In most cases, void acts like any other singleton.

Literals

Numeric

Integer

// Decimal notation
12345
// Hexadecimal notation
0xface = 64206
// Generalized notation
10$12345 = 12345
16$face = 64206
8$022 = 18
36$LFLS = 1000000

The decimal notation is essentially identical to C, except that such a number can have a leading zero because the leading-0 octal from C is not present here.

The hexadecimal notation is essentially identical to C.

A new addition is the generalized notation, which leads with a prefix 2$ for base-2, 3$ for base-3, etc. to 36$ for base-36. The digits in a given base are 0 to 9 at face value, followed by A through Z (case-insensitive) for dec 10 to dec 35.

Floating-point

// Traditional notation
12345.6789
.6789
123.456e7 = 1234560000.0
123456000e-3 = 123456.0
// Generalized notation
10$12345$-3 = 10$12345 * 1e-3 = 12.345
10$12345.6789 = 10$123456789$-4 = 12345.6789
16$face123$-3 = 16$face123 / 16**3 = 262988067.0 / 4096.0 ≈ 64206.07104492188
16$face.cafe$+4 = 16$facecafe$0 = 4207856382.0

The traditional notation from C mostly holds. Note that a trailing point may not be used; e.g. 12345. is not valid but 12345.0 or 12345e0 indicate the intended number.

The generalized notation also applies here. Since it is not implemented directly in C, it shall be interpreted in the following way:

  • Normalize:
    • Recover the trailing exponential portion: A number B$S, without the trailing exponential portion, is equivalent to B$S$0.
    • Move any fractional portion into the exponent: A number B$W.F$E, with a fractional point, is equivalent to B$WF$E-len(F), where WF is W concatenated to F and E-len(F) is E minus the number of digits in F.
    • Truncate excess trailing zeros (optional): A number B$SZ$E, where S is nonzero and contains no fractional point and Z is one or more trailing zeros, is equivalent to B$S$E+len(Z), where E+len(Z) is the number of trailing zeros removed.
  • The value of a number B$S$E, where S contains no fractional point, is the rational value that is the product of the integer expressed by S (in base-B digits) multiplied by B raised to the E power.
    • Equivalently, the value of B$S$E is equivalent to the value B$S times the rational value B$1$E, where B$1$E = BE.
    • Equivalently, the value of B$S$E is equivalent to the value B$S over the rational value B$1$-E, where B$1$-E = 1 / (BE).

Example:

16$feed.face00
= 16$feed.face00$0 // recovered exponent
= 16$feedface00$-6 // moved fraction into exponent
= 16$feedface$-4 // removed trailing zeros
= 16$feedface * 16$1$-4 // separated value from exponent
= 16$feedface / 65536 // evaluate scale
= 4277009102 / 65536 // rational value of literal
≈ 65261.97970581055 // approximate floating value converted from literal

Notes:

  • The translator should probably encode this into C by substituting the entire rational form inline (for the above example, using something like 4277009102.0 / 65536.0). The C compiler can perform constant folding on this.
    • The rational should possibly be reduced to its lowest terms first. The above rational, for example, could also be expressed 2138504551.0 / 32768.0.
  • If the rational form is not used, it would probably then be for the best to oversample the rational, using arbitrary-precision math, to a few digits beyond the max precision expected to be used by the C platform. This is especially important in cases where a base-2n number has to be expressed in base-10; the added precision makes it more likely that the correct value is used internally.

Inferred types of literals

The numeric literals (and operators) in cifl use a special system to define where the work of type inference ends and the work of numeric promotion begins.

  • If a value has a type of "weak T", that means that a type inferred from the value is T, but otherwise the type is not defined. In general, "weak T" implies "T or higher-ranking type".
  • If the value is specified as integer (i.e. without a fractional or exponential part), the inferred type is the higher ranking of "weak int" or "weak T" where T is the narrowest integer type that can hold the value. For example, var n = 1000; specifies an int, but var n : long = 1000; specifies a long.
  • If the value is specified as floating-point (i.e. with a fractional part, and exponential part, or both), the inferred type is "weak double". For example, var x = 1.234; specifies a double but var x : ldouble = 1.234; specifies an ldouble.
  • When comparing types, T and "weak T" are at the same rank.
  • Ranking includes ldouble > double > float and llong > long > int > short > scchar.
  • If one operand is a floating type and the other an integer type, the integer type is ranked lower than the lowest floating type (i.e. the integer operand is converted to floating).
  • In a binary operation where two operands with types A and B must be promoted to match, the following is used:
    • If A and B are either both weak or both non-weak:
      • If A < B, promote A to B. Result type is B.
      • If A > B, promote B to A. Result type is A.
      • If A = B, do nothing. Result type is A = B.
    • If A is "weak T" and B is non-weak,
      • If T > B, promote B to A. Result type is A.
      • If T <= B, promote A to B. Result type is B (which is non-weak).
  • An arithmetic operation that accepts weak types may result in a weak type that depends on the types of its operands.

To compare a place where C and Cifl behavior differ:

int n = 0x100000000000 + 1; // C; compiler warning, value = 1
var n = 0x100000000000 + 1; // Cifl, value = 0x100000000001

The initial cifl interpretation is something like the following embellished code:

var n : infer? = weak_?(weak_long(0x100000000000) + weak_int(1));

The weak long and the weak int are the operands of a + operation, so the weak int is promoted and the result type becomes weak_long.

var n : infer = weak_long(weak_long(0x100000000000) + weak_long(1));

At the top, the inferred type is now defined to be long.

var n : long = weak_long(weak_long(0x100000000000) + weak_long(1));

In long context, a weak_long becomes a long. This works its way back down.

var n : long = long(weak_long(0x100000000000) + weak_long(1));
var n : long = long(0x100000000000) + long(1);

Finally, this is encoded into C:

long long n = 0x100000000000LL + 1LL;

On the other hand, if we specify the type to be int from the top, it works its way down from there.

var n : int = weak_?(weak_long(0x100000000000) + weak_int(1));

The unknown weak type takes its actual type from the specification.

var n : int = int(weak_long(0x100000000000) + weak_int(1));

The weak-typed operands take on the result type of the operator.

var n : int = int(0x100000000000) + int(1);

Encoded into C:

int n = (int)(0x100000000000LL) + 1;
// or
int n = 0x00000000 + 1;

Things to note:

  • The type specified under "weak" has no impact whatsoever unless the context in which it is defined infers its type.
  • Once the type of the context is inferred, operation proceeds as if the inferred type had been specified originally.
  • The C "L" and "U" suffixes don't apply on literals. To express a value that is always a long or an unsigned int, use long(value) and uint(value), respectively. Usually (we hope), the translator will instead just guess correctly most of the time.

The top

Syntax overview

top :
	package-head (
		// package as line
		';' package-body-elements |
		// package as block
		package-body-block
	) |
	// package name omitted
	package-body-elements

package-head : 'package' package-name

package-body-block : '{' package-body-elements '}'

package-name : identifier ('.' identifier)*

package-body-elements : import-decl* package-body-element*

package-body-element :
	package-head package-body-block |
	box-def |
	type-def |
	function-def

type-clause : ':' annot* type

function-def :
	'def' annot* identifier formal-params? type-clause? ('=' expr-stmt | block)

expr-stmt : non-block-expr ';' | block-expr

// Types and annots are distributive; assignments are not.
// If no type is specified, all names must be initialized and each will be
// type-inferred individually.
possibly-inited-ident : identifier ('=' expr)?
possibly-inited-idents : possibly-inited-ident (',' possibly-inited-ident)*

// vaq : ('val' | 'var') ('*' constant-expression)?
box-def : vaq annot* possibly-inited-idents type-clause? ';'

type-def : 'typedef' annot* identifier (',' identifier) type-clause ';'

Packages

// Top of file
package some.pkg;

// Mid-file
package mid {
  // here is some.pkg.mid
  package inner {
    // here is some.pkg.mid.inner
  }
}

The current package is where the names of functions, types, boxes, and subpackages in this scope are defined.

If there is no current package (i.e. the current location is not inside a package block and no top-of-file package was specified), the current package is an anonymous package that is not (trivially) addressable. Package blocks defined therein are named relative to the root rather than as subpackages to the anonymous package.

Packages are notated hierarchically for convenience; a package has no practical parent-child relationship with its subpackages.

Packages do not provide access separation; names defined within one package are accessible from a completely different package by qualifying or importing the name. However, annotating an identifier declared at the package level with @cifl.lang.here will prevent it from being linked externally; it will instead be available throughout the file in which it is defined.

Specifying a package at the top of a file, single-line style, is equivalent to surrounding the entire remainder of the file in a package block indicating the same name.

Imports

Functions

In general, and not just the top level, the syntax for a function is thus:

'def' annot* identifier? formal-params? type-clause? ('=' expr-stmt | block)

The identifier names the function. In fact, it is a shorthand for an assignment of an unnamed function expression. For example, the following are equivalent:

def someFunc(a : int, b : int) { return a * b; };
val someFunc = def(a : int, b : int) { return a * b; };

In the above example, someFunc is inferred to have the type def(int,int):int.

For those who like functions to seem more mathematical, the body of the function may be specified as an expression. The following is equivalent to the above version:

def someFunc(a : int, b : int) = a * b;

In fact, the block is an expression itself, so this also works:

def someFunc(a : int, b : int) = { a * b; };

The formal parameters are scoped as if they were declared at the top of the body block (or, if a non-block form is used, a block coterminous with the body).

If a formal parameter is only present as an unused part of a function interface, it can be ignored by using bare _ as its name.

val justReturnA : def(int,int):int = def(a : int, _ : int) = a;

Boxes

var n = 3;
var a, b, c : int;
val (p, q) = (5, 6); // Same as val p = 5; val q = 6;
val (r, s) : struct(int, int) = (7, 8); // Same as val r : int = 7; val s : int = 8;
val*_ v = (10,11,12,13,14);

As described elsewhere, box declarations are performed using var for variables and val for variable-like read-only values.

Each of these statements defines the type, the value, or both of one or more vlpatterns. An vlpattern is either an identifier or a parenthesized tuple of vlpatterns.

vlpattern : identifier | '(' (avlpattern (',' avlpattern)*)? ')'
avlpattern : annot* vlpattern

box-def :
  ('var' | 'val') (box-rest-d | box-rest-array)
box-rest-array :
  '*' expr annot* identifier type-clause? ('=' expr)? ';'
box-rest-d :
  annot* vlpattern (',' vlpattern)* type-clause? ('=' expr)? ';'

If a type clause is present, it applies to each vlpattern that is directly inline. If a vlpattern is a tuple, the type applies to the whole tuple, not to each of its elements. These are equivalent:

var a, b, c : int;
var a : int; var b : int; var c : int;
var (a, b, c) : struct(int, int, int);

If exactly one vlpattern is supplied inline, any value provided is assigned to that value. If that vlpattern is a tuple, the value is expected to be a tuple also.

var a : int = 1; // Like var a : int; a = 1;
var (a, b, c) = (1, 2, 3); // Like var a = 1; var b = 2; var c = 3;

If two or more vlpatterns are supplied, they are assigned as if they were tupled together.

var a, b, c : int = (1, 2, 3); // Like var a : int = 1; var b : int = 2; var c : int = 3;

The stack array form is intended to be equivalent to the individual elements as vlpatterns. For example, the statement

var*5 ary : int = (p, q, r, s, t);

is intended to be like

var ary_0, ary_1, ary_2, ary_3, ary_4 : int = (p, q, r, s, t);

except that the resulting vars are addressed by index rather than name.

It is possible to use bare _ as a vlpattern identifier in order to extract and then discard a value from a structure.

var (_, _, _, h, m, s) = getYearMonthDayHourMinuteSecond();

Recursively, the identifiers in each vlpattern are declared as if declared separately with the val or var keyword used and with any annotations immediately following that keyword.

Types

See value-type syntax summary.

Statements

As expressions

All statements should be evaluable to expressions. The semantics of this shall not be different from C unless the statement is actually used in a non-void context.

Kinds

Labels

C's "labeled statement" does not exist in cifl. The closest equivalent is the loop label.

Blocks

'{' (declaration | statement)* '}'

A block is a combination of a declaration scope and a sequence of statements.

The expression value of a block is the expression value of the last statement therein that runs.

  • If the block contains no code, its return type is cifl.lang.void.
  • If the block contains statements but no jumps, its return type is the type of the last contained statement.
  • If the block contains any unsuppressed statements whose return type is/contains a jump type, its return type is the union of the types of all such statements, plus the type of the last statement that would result if all optional jumps are missed.

Loops

WHILE: 'while' | 'until'
do-loop: 'do' statement WHILE expr ';'
while-loop: WHILE expr statement
for: ...

A loop statement suppresses redo, continue, and break jumps, but only if the jump target is this loop's label.

If a redo jump is suppressed, then the body is rerun.

If a continue jump is suppressed, the loop body ends as if it had ended on its own (continuing to the increment step of for or the test step of others).

If a break jump is suppressed, then the loop statement ends.

The return value of a loop is the return value of its body statement, excluding any jump types that have been suppressed.

Jumps

A jump statement is a statement that automatically evaluates to a certain jump type. An evaluated expression of a jump type automatically exits any contexts in which it is evaluated unless a statement containing the expression suppresses it.

redo LABEL
continue LABEL
break LABEL
return EXPR?
throw EXPR

A loop suppresses redo, continue, and break jumps whose target label matches its own; it propagates others.

A return jump propagates out of a function to the calling scope, where the call that caused it then implicitly suppresses it.

A throw jump propagates out of a function, the calling function, and so forth until it is suppressed with a catch mechanism.

(C implementation: If a function returns but does not throw, the return jump is suppressed at the function and converted into a normal C return value. If a function returns and throws, its C return value is a union of the return and throw jump types; the caller unpacks the value and, if it is a throw jump, cooperatively rethrows.)

A function itself cannot propagate redo, continue, or break. However, an exception might be made for some functional programming syntax wherein these jumps might be a legal way to express loop control.

Expressions

Tuples
()
(expr1)
(expr1, expr2)
(expr1, expr2, expr3)
...

A tuple is essentially a literal value made up of zero or more comma-separated values in parentheses.

A tuple expression with zero elements has an inferred type of void. If that inference is used, the value of the expression is equal to void().

A tuple expression with at least one element has the same inferred type as the last expression it contains. If that inference is used, or if the context does not expect a tuple, the value of the expression is equal to the value of that last expression. For example, the sequence (a += b * c, d -= e / f, g) would evaluate to g (but would still execute the previous elements for their side effects).

The context may expect a tuple specifically, however, in which case all elements, not just the last, would be utilized.

A tuple may be used to initialize multiple values specified with var or val:

var z = (1, 2, 3) // Similar to (1, 2); var z = 3
var (a, b, c) = (1, 2, 3) // Similar to var a = 1; var b = 2; var c = 3;
var*_ v = (1, 2, 3) // Create stack array of 3 elements valued 1, 2, 3

If the left side of an assignment is of a (mutable) structure type that has already been inferred, the right side need not necessarily include the name of the type, either. If the type is anonymous, this is the only way to perform initialization or assignment.

typedef P : struct(m : int, n : int)

// To infer from pctor name
var p = P(11,22)

// To assign when type has been made explicit
var p : P = (11, 22)

// To alter after init
p = (33, 44) // same as p = P(33, 44)

// With an anonymous type
var q : struct(m : int, n : int) = (11, 22)
q = (33, 44)

If the left hand side of an assignment is of a union type, the type of a right-side tuple is always as inferred; that is, the right side value is either the final expression value in the tuple or, if empty, {}. This is because a specific type name is required to assign to a union.

Block
{}
{ expr }
{ expr; expr }
...

The result value of a block depends on its contents as they exist after ignoring superfluous empty statements.

  • If there are no contents, the result type is void and the result is the value {} (i.e., void()).
  • Otherwise, the result type and value are those of the final expression evaluated before leaving the block.
    • Blocks cannot be the addressee of a jump. Any jump that would abruptly terminate a block would necessarily terminate past any expression whose type is inferred from it to a structure or loop containing it. For example, val a = { val z = b() + c(); if z == 0 break; z }; return a + 1 would either assign a value to a or exit the scope in which a can ever be used, so the subsequent code in the block may assume that a has been set.

Branching

Cifl if works mostly like in C as long as there is no else clause.

IF: 'if' | 'unless'
if-stmt : IF expr statement

Note that parentheses are optional around the condition. Note also that "unless" may be substituted for "if" to negate the meaning.

In order to simplify the grammar, there is no direct analogue to the if-else syntax. However, since statements are expressions in cifl, it is legal to use the ternary operator as a substitute.

(x < LOWER_LIMIT) ? {
  log("Below lower limit; fixing");
  x = LOWER_LIMIT;
} : (x > UPPER_LIMIT) ? {
  log("Above upper limit; fixing");
  x = UPPER_LIMIT;
} : {
  log("Level is okay");
};

Alternatively, one might use a form of the match statement:

match _ {
  if x < LOWER_LIMIT {
    log("Below lower limit; fixing");
    x = LOWER_LIMIT;
  }
  if x > UPPER_LIMIT {
    log("Above upper limit; fixing");
    x = UPPER_LIMIT;
  }
  default {
    log("Level is okay");
  }
}

The match statement subsumes the C switch and if-else, mixing in a type-safe polymorphic casting facility and sundry other functionality.

match : 'match' expr ( case | '{' case* '}' )
case : case-with-spec | case-without-spec
case-with-spec : case-spec guard? case-body
case-without-spec : guard case-body
case-spec : 'case' case-pattern | 'default'
alias-all : identifier '<-'
guard : IF expr
case-body : <a statement not starting with 'if' or 'unless'>

case-pattern :
	<prefer> alias-all unaliased-case-pattern |
	unaliased-case-pattern

unaliased-case-pattern :
	case-pattern-term (',' case-pattern-term)*

case-pattern-term :
	<prefer> identifier ('(' extractor-terms ')')? |
	constant-expression

extractor-terms : (extractor-term (',' extractor-term)*)?

extractor-term :
	<prefer> identifier |
	expression

If the match expression is an integer expression, match works in a virtually identical way to C switch, except that fallthrough doesn't happen; at the end of the statement the match is exited.

match n {
  case 0 log("Zero");
  case 1 log("One");
  case 2 log("Two");
  default log("Not zero, one, or two");
}

Each case may have an optional guard condition. Matches are made first on the case expression, unordered, and then on the guards, in the order specified.

Here's an atrocious example:

match n % 3 {
	case 0 if n == 0 log("Zero");
	case 1 log("Positive and one more than a multiple of three");
	if n < 0 log("Negative and two more than a multiple of three");
	case 0 if n < 0 log("Negative multiple of three");
	case -2 log("Negative and one more than a multiple of three");
	case 0 log("Positive multiple of three");
	default log("Positive and two more than a multiple of three");
}

The following is exactly equivalent:

match n % 3 {
	case 0 if n == 0
		log("Zero");
	case 0 if n < 0
		log("Negative multiple of three");
	case 0
		log("Positive multiple of three");
	case 1
		log("Positive and one more than a multiple of three");
	default if n < 0 // or just "if n < 0"
		log("Negative and two more than a multiple of three");
	default
		log("Positive and two more than a multiple of three");
	case -2
		log("Negative and one more than a multiple of three");
}

And this, in turn, is equivalent to:

match n % 3 {
	case 0 {
		(n == 0) ? {
			log("Zero");
		} : (n < 0) ? {
			log("Negative multiple of three");
		} : {
			log("Positive multiple of three");
		};
	}
	case 1 log("Positive and one more than a multiple of three");
	default {
		(n < 0) ? {
			log("Negative and two more than a multiple of three")
		} : {
			log("Positive and two more than a multiple of three");
		};
	}
	case -2 log("Negative and one more than a multiple of three");
}

Aside that, one of the more useful things match does is sorts through union discriminators for you.

For example, say you have this union variable.

var x = union(int, double, pair(x:int, y:int));

In a language like Java, one would discriminate this by casting, requiring occasionally unwieldy and fallible logic to determine that the value is in fact the expected type and also that it is not null. In addition, you would need to use a common superclass, such as Object, to express it, and risk some other, unexpected type being passed.

// Java
if(x instanceof Integer) {
  int n = (Integer)x; useInt(n);
} else if(x instanceof Double) {
  double d = (Double)x; useDouble(d);
} else if(x instanceof Pair) {
  Pair p = (Pair)x; usePair(p);
} else {
  assert false; // yuck
}

In cifl, this variable will never be null, the union doesn't accidentally take on new members, and the check and the cast are combined in a single, indivisible step. The case left parameter is a sort of pattern, so it can be used to destructure structural members of the union.

match x {
	case int(n) useInt(n);
	case double(d) useDouble(d);
	case pair(x, y) useXAndY(x, y);
	// x and y extracted; x masks the match x
}

In the case of a union, the member type is the key; i.e., if multiple cases match the same type, their guards are evaluated in order, but the types themselves are matched unordered (as if they were distinct int values, which under the hood is probably the case).

default is a synonym for case _ (i.e. anything matches, but only if no other case matched first). If a guard is present, default is implied if there is no explicit case.

To ignore one or more parameters of an extractor, substitute bare _ for the parameter. To ignore all parameters, leave off the parameter list entirely. For example, the following are equivalent:

case pair(_, _) log("got pair");
case pair log("got pair");

A short syntax for matching specific values is available; substitute a value for a name. The following are equivalent, except that in the former no variable names are declared:

case pair(4, 5) log("Got four and five")
case pair(_a, _b) if _a == 4 && _b == 5 log("Got four and five")

Identifiers as pattern arguments are used as aliases for the corresponding structure rather than for their data values. For example, this doesn't do what one might think is intuitive:

var a = 4, b = 5;
match ... {
  case pair(a, b) log("Pair matched");
}

This will match any pair, not just (4, 5), and a and b locally stand for the coordinates. To match as expected, qualify the parameters as values using unary +.

  case pair(+a, +b) ...

(Unary + is specified to resolve to the value of its parameter even if that parameter is not numeric.)

Blanks, names, and values can be mixed as well:

case pair(_, 2) ... // Left is anything, right must be 2
case pair(a, +b) ... // Left is anything mapped as a, right must be +b

To match multiple patterns in one case, they can be separated by commas. When doing this it's not possible to extract the values, but you can still specify a value to match (or _ for any value).

case double, pair(5, 6), pair(10, _) { ... }

This also works when matching integers.

case 2, 3, 5 { ... }

When matching a union, it is possible to catch all untested cases using

// Match and discard
case _ { ... }
// Match and use
case rest <- _ { ... }

If an extractor is specified, the value extracted will be of a union type of all members of the union that may not have been matched.

var m : union(alpha(), bravo(), charlie(), delta());
...
match m {
  case alpha log("got alpha");
  case bravo log("got bravo");
  case rest <- _ {
    log("got charlie or delta");
    match rest {
      case charlie log("it was charlie");
      case delta log("it was delta");
    }
  }
}

Clumping the other members into groups can be done with the <- operator. The type of the variable at the left is the union of all types matchable by the case, or, if there is only one matchable type, that type.

match m {
  case ab <- alpha, bravo {
    log("got alpha or bravo");
    match ab { ... }
  }
  case cd <- charlie, delta { // or case cd <- _
    log("got charlie or delta");
    match cd { ... }
  }
}

A single-case match may omit the block around the case, as long as case or default is explicit.

var (a, b) = (0, 0);
match x case pair(p, q) (a, b) = (p, q);

A match is exhaustive if all possibilities unconditionally (as computable at compile time) have a corresponding case.

  • A match is exhaustive if it has a case _ or default case without a guard condition.
  • When matching on a union, a match is also exhaustive if every member type is matched unconditionally.

For the purposes of this definition, the translator should treat a guard that is a constant "always" at compile time, such as "if true", as if it had been omitted.

When the match is exhaustive, the return type of the statement expression is the union of the types of all of its (reachable) body statements; otherwise, the type is the same union with cifl.lang.void added as a member, and this is the member that is returned if the match failed in all cases.

Closure implementation

If we can possibly avoid it, manual allocation won't be common in Cifl. Instead, high-level application state will be managed using closures, and closed local variables will be promoted to heap allocation as necessary to support them.

C structures

A potential handle for a closure would read something like:

struct {
	pcifl_Localstate *const localstate;
	RETURN_TYPE (*const run)(pcifl_Localstate*, OTHER_PARAMETERS);
}

This structure could be passed around by or as if by value.

pcifl_Localstate would be a structure that contains some generic information common to all localstates, such as reference counting and destruction information. The structure would be opaque in general, but may as an example look something like:

typedef struct _pcifl_Localstate pcifl_Localstate;
struct _pcifl_Localstate {
	size_t reference_count;
	mutex_t object_mutex;
	void change_reference_count(pcifl_Localstate*, bool);
}

What the rest of the structure looks like depends on the closed variables it represents. For example, if the closed variables are two ints called a and b, the structure might be:

struct {
	pcifl_Localstate localstate_header;
	struct {
		int a;
		int b;
	} vars;
}

The pointer to this structure would be castable to pcifl_Localstate*, making generic management operations possible.

Example: Simple closures

A cifl function that might result in that latter localstate structure:

typedef myclosures : struct(sum : def():int, dif : def():int)
def myexample (a : int, b : int) : myclosures {
	return (def { a + b }, def { a - b })
}

In longer hand, this might be

typedef void_to_int : def():int
typedef myclosures : struct(
	sum : void_to_int,
	dif : void_to_int
)
def myexample (a : int, b : int) : myclosures {
	val myexample_adder : void_to_int = def {
		return a + b
	}
	val myexample_subber : void_to_int = def {
		return a - b
	}
	return (myexample_adder, myexample_subber)
}

And then, converted to C:

typedef struct {
	pcifl_Localstate *const localstate;
	int (*const run)(pcifl_Localstate*);
} void_to_int ;

typedef struct {
	void_to_int sum;
	void_to_int dif;
} myclosures ;

typedef struct {
	pcifl_Localstate localstate_header;
	struct {
		int a;
		int b;
	} vars;
} myexample_localstate;

// This isn't likely to happen in reality, but for ease of readability here:
#define LOCALSTATE ((myexample_localstate*)(plst))
#define A (LOCALSTATE->vars.a)
#define B (LOCALSTATE->vars.b)

int myexample_adder(pcifl_Localstate* plst) {
	return A + B;
}
int myexample_subber(pcifl_Localstate* plst) {
	return A - B;
}

myclosures myexample(int a, int b) {
	pcifl_Localstate* plst =
		pcifl_allocLocalstate(sizeof(myexample_localstate));
	
	// Copy the parameters to the localstate. Non-parameter variables are
	// simply replaced from the outset instead, so no copying is necessary.
	A = a;
	B = b;
	
	{
		// Each copy we make of the localstate gets an addref. We skip the
		// first, though, because the refcount is already 1.
		void_to_int _adder = { plst, myexample_adder };
		// pcifl_addrefLocalstate(plst);
		void_to_int _subber = { plst, myexample_subber };
		pcifl_addrefLocalstate(plst);
		
		return (myclosures){ _adder, _subber };
	}
}

Heap vs. stack

The contract of the closure object is easily kept by always allocating localstates on the heap, and simple implementations could do this for simplicity at a minor (perhaps even negligible) potential performance cost. However, as long as the translator can verify that the localstate is not used in certain ways, it may be possible to use a stack-allocated localstate instead.

The specific actions that must be prevented are any that could possibly result in the lifetime of the localstate extending beyond the location where it is declared. This includes copying it to a global or outer variable, returning it from a surrounding function, passing it as a parameter to a function (that isn't somehow exempted), or doing any of the same things with an aggregate value that might contain it. Even if this information isn't used for memory management, it may also be useful for verifying the correctness of the program.

Three lifetime restriction levels are thus defined:

  • Normal: The lifetime of the object is effectively infinite-lived; it is either global or dynamically allocated.
    • If the callee's return type is not annotated @returnParam, then the return value must be normal.
    • The return value of a @returnParam call is restricted as tightly as the tightest parameter passed. If all parameters are normal, the return value is treated as normal also.
  • @leave: The lifetime of the object is presumed limited to the scope of the caller to this function, and no additional reference to this object may be retained past the end of this function.
    • This object may never be copied to an outer scope except by returning.
    • This object may be copied to another variable in this scope or a contained scope, but only if said variable is also @leave (or tighter).
    • This object may be copied to a member of an aggregate, but only if said member or said aggregate is also @leave (or tighter).
    • If the callee's return type is annotated @returnParam, the return value may be some @leave variable in the current scope only (not a contained one). (Note that only a @leave variable, even as an aggregate, can contain a @leave object.)
    • A @leave value may be passed as a parameter to a function, but only if that parameter is @leave.
      • If any parameter passed to a @returnParam function is @leave, then the return value is automatically a @leave value (unless another parameter passed is @scoped, which overrides this).
  • @scoped: The lifetime of the object is presumed limited to the scope of this variable.
    • This object may never be copied to an outer scope, even by returning.
    • This object may be copied to another variable in this scope or a contained scope, but only if said variable is also @scoped.
    • This object may be copied to a member of an aggregate, but only if said member or said aggregate is also @scoped (or tighter).
    • A @scoped variable cannot be returned under any circumstances.
    • A @scoped value may be passed as a parameter to a function, but only if that parameter is @leave.
      • If any parameter passed to a @returnParam function is @scoped, then the return value is automatically a @scoped value.

Note that these limitations apply only to objects (specifically, to their localstates) and to the specific aggregate members that contain them.

typedef mytype : struct(n : int, f : def(int):int)
...
var a : @scoped mytype = ...
var b : mytype = ...

a = b // allowed, tightens access
b = a // not allowed, loosens
b.f = a.f // not allowed, loosens
b.n = a.n // allowed, value members are not treated as scoped

The following function determines which transform on an int produces the lower value, returning the applicable @leave-qualified closure. If the transforms are equal, a new closure is returned instead. This function isn't necessarily useful, but illustrates the possibility of returning either a new or an existing object, which wouldn't otherwise be possible unless the parameters were non-@leave.

def decide(a : @leave def(int):int, b : @leave def(int):int, n : int) :
	@returnParam def(int):int
{
	val xa = a(n)
	val xb = b(n)
	(xa < xb) ? a :
	(xb < xa) ? b : def(q : int) : int { return n * q }
}