Cifl

From HalfgeekKB

Cifl (or cifl, but never CIFL unless required by naming convention) is a proposed programming language designed for a source-to-source compiler to C, intended to implement modern programming language features on top of the general platform provided by C.

Name

Like in so many other projects, "cifl" is an acronym-inspired name that does not in fact stand for anything.

  • Jocular expansions of the name are encouraged. Among infinite options are "Cifl Is For Lovers", "Crescent-Ish Fresh Language", "C-based Improved-Fun Language", "C Isn't For Lightweights".
  • It is always grammatically correct to refer to cifl as "the cifl language", because officially the "l" doesn't stand for anything, including "language".
    • However, just "cifl" is preferred for brevity.
  • "Cifl" is pronounced like "sniffle" but with the "n" removed.
  • The name is admittedly inspired by The Sifl and Olly Show, but the language has nothing of real relevance in common with the show or its characters.

The project originated as "Cthis" (pronounced "see this"). A new name was chosen since, while the language provides several C++-like extensions, the this keyword is not borrowed.

Straight C

See also Cifl/Protocifl.

The basic conventions are for writing code that is operationally identical to C but somewhat different in form. Several of these changes are expressly for the purpose of relieving the parser of determining whether an identifier-like word is an identifier or a type.

Using newly reserved identifiers

  • An identifier in backticks is treated as the same identifier without the backticks. If a word is reserved in Cifl but not in C itself, this allows it to be used as an identifier. For example, val is a Cifl keyword, but `val` is an identifier that converts to val in C.
    • In straight C mode, words that are reserved in C are not allowed even in backticks (e.g. `if`). An extended mode will allow arbitrary strings as keywords.

Array subscripts and type parameters

There are areas of Cifl where type parameters are possible. Type parameters are given between square brackets [].

To accommodate this change, array indices are given in parens (), as if they were function parameters. Since functions and arrays share a namespace, this doesn't introduce any new conflicts; indexing into an array just looks more like a function call now.

Built-in operators

  • A pointer type is specified using the notation ref[TYPE]; for example, char** becomes ref[ref[char]].
    • So far, dereferencing a pointer still uses unary *.
    • Constant pointers: The declaration val p : ref[TYPE] is for a pointer that cannot be mutated (C: TYPE* const p). For a pointer to a value that is itself immutable, put the @val annotation on the referenced type inside the ref: var p : ref[@val TYPE] (C: const TYPE* p).
      • Another example: For C, const char**. For us, ref[ref[@val char]].
  • A cast uses function-like notation; for example, (char)12 becomes char(12).
    • Since functions and types share a lexical namespace, this does not introduce any new conflicts.
  • sizeof requires differentiation between types and expressions; the size of the first element of a C array int a[10]; is sizeof(a(0)), but the size of the type is sizeof[int].
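
As a rough illustration of the mappings above, the following C is what a translator might plausibly emit for a few Cifl expressions. The function names are invented for this sketch, and the Cifl forms appear only in comments.

```c
#include <stddef.h>

/* Cifl: sizeof(a(0)), the size of an expression (first array element). */
size_t first_element_size(void) {
    int a[10] = {0};
    return sizeof(a[0]);
}

/* Cifl: sizeof[int], the size of a type. */
size_t int_type_size(void) {
    return sizeof(int);
}

/* Cifl: char(v), a function-style cast emitted as a C cast. */
char to_char(int v) {
    return (char)v;
}

/* Cifl: val p : ref[@val char] = s;  ->  C: const char *const p = s;
 * Both the pointer (val) and the pointee (@val) are immutable here. */
char first_char(const char *s) {
    const char *const p = s;
    return *p; /* dereferencing still uses unary * */
}
```

Because the type form and the expression form of sizeof are lexically distinct in Cifl, the translator never has to decide whether the operand names a type.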

Flow control

  • unless and until are negated aliases for if and while, respectively.
  • The conditionals for if/unless and while/until may omit the parentheses if the meaning is unambiguous.
  • switch works mostly as before, with some extensions:
    • Multi-case: case a, b, c is the same as case a: case b: case c.
    • Implicit break: case x { statements; } is the same as case x: { statements; } break;.
      • Note that case a: case b: case c { statements; } is the same as case a, b, c { statements; } and case a: case b: case c: { statements; } break;; i.e., these notations can be combined.
    • continue explicitly drops to the next case. Use continue LABEL to step the current loop.
    • It is an error not to explicitly exit from a switch case (e.g. by break, continue, return). Automatic fall-through is only provided for cases containing no code.
  • Loop labels are added for break and continue, but in straight C mode they are only valid for the structures they would have affected without the labels. This is provided so that it is still possible to continue a loop from a switch inside that loop.
  • Some construct should be equivalent to do { STATEMENT; break; } while(true); i.e., something that runs the block once but starts over on continue rather than breaking. A do without a while clause would work, but would be more complicated to parse. A different clause, something like once would also work but would tie up another keyword.
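
The switch extensions can be sketched in C as follows. The Cifl input in the comment is hypothetical (including the brace form of default, which this page does not specify), and classify is a made-up name.

```c
/* Cifl (hypothetical input):
 *   switch x {
 *       case 1, 2 { r += 10; }
 *       case 3    { r += 1; continue; }   // explicit drop to the next case
 *       case 4    { r += 100; }
 *       default   { r = -1; }
 *   }
 */
int classify(int x) {
    int r = 0;
    switch (x) {
        case 1: case 2: { r += 10; } break;   /* multi-case, implicit break */
        case 3: { r += 1; }                   /* continue: no break emitted */
        /* fall through */
        case 4: { r += 100; } break;
        default: { r = -1; } break;
    }
    return r;
}
```

With this translation, classify(3) runs the bodies of both case 3 and case 4, while every other case exits the switch after its own block.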

Predefined constants

  • null[TYPE] produces a null reference that has the specified type for the purposes of type inference. null (without the type parameter) is more like C NULL and will assign properly to a variable with an explicit or already inferred type.
  • true and false are pre-set to their <stdbool.h> meanings.

Types

Builtin

  • The type modifiers signed and unsigned are not used, and the words short and long are not used as modifiers (but are used as types).

The following mappings are used, even if the associated headers aren't included:

	Cifl		C
Native types
	char		char
	scchar		signed char (1)
	ucchar		unsigned char (1)
	short		short int
	int		int
	long		long int
	llong		long long int
	ushort		unsigned short
	uint		unsigned int
	ulong		unsigned long
	ullong		unsigned long long
	float		float
	double		double
	ldouble		long double
From stdbool.h
	bool		<same>
From stdint.h
	int#_t		<same>
	uint#_t		<same>
	int_least#_t	<same>
	uint_least#_t	<same>
	int_fast#_t	<same>
	uint_fast#_t	<same>
	intmax_t	<same>
	uintmax_t	<same>
	intptr_t	<same>
	uintptr_t	<same>
Other normalized types
	ubyte		uint8_t
	sbyte		int8_t
	byte		uint8_t
  1. Sign aliases to char are named [su]cchar for "(un)signed C char". The name "uchar" will be reserved for Unicode text. When specifying binary data, use the byte type instead, if possible.
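
A translator could realize part of this table as plain C typedefs, as in the sketch below. Only a subset is shown: ushort, uint, and ulong are left out because some C libraries already define those names in system headers. next_byte is a made-up helper.

```c
#include <stdint.h>

typedef signed char        scchar;
typedef unsigned char      ucchar;
typedef long long int      llong;
typedef unsigned long long ullong;
typedef long double        ldouble;
typedef uint8_t            ubyte;
typedef int8_t             sbyte;
typedef uint8_t            byte;

/* byte behaves like any unsigned 8-bit value and wraps around at 256. */
byte next_byte(byte b) {
    return (byte)(b + 1);
}
```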

Structural types

Unions and structures have a somewhat different syntax than C, looking a little more like function definitions.

struct

An example from Wikipedia, translated:

struct account(account_number: int, first_name: ref[char],
	last_name: ref[char], balance: double);

// Or, equivalently,
struct account (
	account_number: int,
	first_name: ref[char],
	last_name: ref[char],
	balance: double
);

Some examples of its use are below:

// Initializers use a function-call-style notation rather than array-style.
var a = account(12345, "John", "Smith", 12.04);
// Or, if the context is explicitly typed, tupled expressions can be used.
var a : account = (12345, "John", "Smith", 12.04);

// Values can be out of order using named arguments
// (syntax pending)
var a = account(12345, balance: 12.04, "John", "Smith");

// Members are accessed in the same way as in C.
printf("%s %s\n", a.first_name, a.last_name);
a.account_number = 67890;

// However, we also have an lvalue syntax for decomposing the value into its
// members.
var account(acct, fnam, lnam, bal) = a;
// Equivalent to:
var acct = a.account_number;
var fnam = a.first_name;
var lnam = a.last_name;
var bal = a.balance;

// Anonymous structs can still be used as types.
def distance(p1: struct(x:double,y:double),
	p2: struct(x:double,y:double)) : double
{
	var dx = p2.x - p1.x, dy = p2.y - p1.y;
	return sqrt(dx*dx + dy*dy);
}

// Since the return type is explicit, the tupled expression syntax works here
// as well.
def midpoint(p1: struct(x:double,y:double),
	p2: struct(x:double,y:double)) : struct(x:double,y:double)
{
	return ((p1.x+p2.x)/2.0, (p1.y+p2.y)/2.0);
}

// The above function could replace the named parameters with decomposition
// parameters like so:
def midpoint_alt(struct(x1:double,y1:double), struct(x2:double,y2:double))
	: struct(x:double, y:double)
{
	return ((x1+x2)/2.0, (y1+y2)/2.0);
}
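
One way the anonymous struct examples might come out in C is sketched here. It assumes the translator gives each distinct anonymous struct shape a generated name; GEN_struct_x_y and sum_of_members are hypothetical names.

```c
/* Hypothetical generated name for the anonymous struct(x:double, y:double). */
typedef struct {
    double x;
    double y;
} GEN_struct_x_y;

/* Cifl: return ((p1.x+p2.x)/2.0, (p1.y+p2.y)/2.0); */
GEN_struct_x_y midpoint(GEN_struct_x_y p1, GEN_struct_x_y p2) {
    return (GEN_struct_x_y){ (p1.x + p2.x) / 2.0, (p1.y + p2.y) / 2.0 };
}

/* Decomposition into locals becomes one initialization per member. */
double sum_of_members(GEN_struct_x_y p) {
    double px = p.x;
    double py = p.y;
    return px + py;
}
```

A shared generated name matters in C because two separately written anonymous structs are distinct, incompatible types even when their members match.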

As with functions (hopefully), it's possible to skip naming parameters by using the keyword _. These values can still be assigned positionally, but cannot be accessed individually (without being reassigned to an equivalent structure with the same parameter named).

Use the @bits(count) annotation on a parameter to use it as a bit field.

Unlike in C itself, structs are allowed to have default values for some or all fields. A struct can be initialized with a default for a field by using the reserved key-identifier _ in the place of the argument, or by omitting the argument from the end.

// Default coordinates to 0, 0, 0
struct d3 (
	x: double = 0,
	y: double = 0,
	z: double = 0
);

var a = d3(9, 8, 7);	// All fields specified
var b = d3(3, _, 5);	// -> (3, 0, 5)
var c = d3(_, _, 9);	// -> (0, 0, 9)
var d = d3(_, 4);	// -> (_, 4, _) -> (0, 4, 0)
var e = d3();		// -> (_, _, _) -> (0, 0, 0)

If a struct has any defaulted fields, the default value settings are implemented using a default-setting function.

enum {
	d3_field_x = 1,
	d3_field_y,
	d3_field_z
};
typedef struct {
	double x;
	double y;
	double z;
} d3;

static void d3_set_default(d3 * s, size_t field) {
	switch(field) {
		case d3_field_x: s->x = 0; break;
		case d3_field_y: s->y = 0; break;
		case d3_field_z: s->z = 0; break;
	}
}

d3 a = { 9, 8, 7 };	// All fields specified
d3 b = { .x = 3, .z = 5 };
d3_set_default(&b, d3_field_y);
d3 c = { .z = 9 };
d3_set_default(&c, d3_field_x);
d3_set_default(&c, d3_field_y);
d3 d = { .y = 4 };
d3_set_default(&d, d3_field_x);
d3_set_default(&d, d3_field_z);
d3 e;
d3_set_default(&e, d3_field_x);
d3_set_default(&e, d3_field_y);
d3_set_default(&e, d3_field_z);

Default initialization is guaranteed to happen in the order in which the field appears in the struct. Therefore, a default expression may refer to any previous defaulted fields and any non-defaulted fields (before or after), but not to later defaulted fields.

struct Fooz (
	p : int,
	q : int,
	// r can refer to p and q, but not s.
	r : int = p * q,
	// s can refer to p, q, and r.
	s : int = q * r
);

var a = Fooz(1,2,3,4);
var b = Fooz(5,6,7);	// -> (5,6,7,_) -> (5,6,7,6*7)
var c = Fooz(8,9,_,10);	// -> (8,9,8*9,10)
var d = Fooz(11,12);	// -> (11,12,_,_) -> (11,12,11*12,12*11*12)

// C translation
enum {
	Fooz_field_r = 1,
	Fooz_field_s
};
typedef struct {
	int p;
	int q;
	int r;
	int s;
} Fooz;

static void Fooz_set_default(Fooz * s, size_t field) {
	switch(field) {
		case Fooz_field_r: s->r = s->p * s->q; break;
		case Fooz_field_s: s->s = s->q * s->r; break;
	}
}

Fooz a = { 1, 2, 3, 4 };
Fooz b = { 5, 6, 7 };
Fooz_set_default(&b, Fooz_field_s);
Fooz c = { .p = 8, .q = 9, .s = 10 };
Fooz_set_default(&c, Fooz_field_r);
Fooz d = { 11, 12 };
Fooz_set_default(&d, Fooz_field_r);
Fooz_set_default(&d, Fooz_field_s);

Like with a function, a ? can be passed for one or more parameters to partially apply it. This actually turns the struct value pseudo-function into a very real closure.

var a = Fooz(1, 2, ?, 4);
// Equivalent to
var a : def(int):Fooz = def(r){ return Fooz(1, 2, r, 4); };

Omitted parameters, however, are always interpreted as being default, so all parameters for partial application must be explicitly ?. For example, the following lines have very different meanings:

var a = Fooz(1, 2, ?, ?); // Type is def(int,int):Fooz
var a = Fooz(1, 2, ?); // Type is def(int):Fooz, s has default value
var a = Fooz(1, 2, _, ?); // Type is def(int):Fooz, r has default value
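
A partially applied struct constructor might translate to C roughly as follows, using the run/context pair that this page describes for @context function types. All GEN_ names and make_partial_fooz are hypothetical, and the context is assumed to be owned by the caller.

```c
typedef struct { int p; int q; int r; int s; } Fooz;

/* Captured arguments of Fooz(1, 2, ?, 4). */
typedef struct { int p; int q; int s; } GEN_Fooz_partial_context;

/* Contexted function type def(int):Fooz. */
typedef struct {
    Fooz (*run)(void *context, int r);
    void *context;
} GEN_def_int_Fooz;

/* Fills the open slot r and reads everything else from the context. */
static Fooz GEN_Fooz_partial_run(void *context, int r) {
    const GEN_Fooz_partial_context *c = context;
    return (Fooz){ c->p, c->q, r, c->s };
}

/* Cifl: var a = Fooz(1, 2, ?, 4);
 * The context must outlive the returned closure value. */
GEN_def_int_Fooz make_partial_fooz(GEN_Fooz_partial_context *context) {
    return (GEN_def_int_Fooz){ GEN_Fooz_partial_run, context };
}
```

Calling a.run(a.context, 3) on the result then yields the same value as Fooz(1, 2, 3, 4).
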

union

There are two major types of union in Cifl:

  • Automatically tagged unions, which we'll call unions or tagged unions
  • Raw C unions, which we'll call raw unions

A raw union is exactly equivalent to its C counterpart. Its type is declared similarly to a struct, but the specialized keyword union.raw is used.

union.raw someVariantType(intVal: int, doubleVal: double, strVal: ref[char]);
// C: typedef union { int intVal; double doubleVal; char* strVal; } someVariantType;

A value of a union type can be expressed by using the member name as a pseudo-method with its value as a parameter.

var uv = someVariantType.doubleVal(12.34);
// C: someVariantType uv; uv.doubleVal = 12.34;

And of course an existing variable can be reassigned.

uv.doubleVal = 56.78;
// or
uv = someVariantType.doubleVal(56.78);

Use of raw unions is discouraged for any purpose other than interoperating with C.

A tagged union can use named types as parameters.

Primitive types can be used, but explicit casts will be necessary since some values have multiple interpretations. Note, for example, that size_t and unsigned long int are never interchangeable, even on platforms that define them identically; the members are distinguished by their types' qualified names.

typedef str : ref[cchar];

union SomeVariantType(int, double, str);

// The union type cannot be inferred here
var a : SomeVariantType = int(37); // or var a = SomeVariantType(int(37));
var b : SomeVariantType = str("Hello"); // or var b = SomeVariantType(str("Hello"));

Usually struct types descriptive of their purpose should be substituted.

struct IntVal(value : int);
struct DoubleVal(value : double);
struct StrVal(value : ref[char]);

union SomeVariantType(IntVal, DoubleVal, StrVal);

// The union type cannot be inferred here
var a : SomeVariantType = IntVal(37); // or var a = SomeVariantType(IntVal(37));
var b : SomeVariantType = StrVal("Hello"); // or var b = SomeVariantType(StrVal("Hello"));

Alternatively, the syntax will automatically generate structure types if a parameter list is given.

union SomeVariantType(
	intVal(int),
	doubleVal(double),
	strVal(ref[char])
);

// The types generated are SomeVariantType.intVal, SomeVariantType.doubleVal,
// SomeVariantType.strVal

// Inferred type of these is the variant type
var a = SomeVariantType.intVal(37);
var b = SomeVariantType.strVal("Hello");

One advantage of the non-inlined syntax is treatment of a value as an actual union type. Parameters of non-inlined types can be transferred automatically between union variables, so a super-union of a union can actually be treated as a supertype to a certain degree.

union SomeVariantType(IntVal, DoubleVal, StrVal);
union SomeNarrowerVariantType(StrVal, IntVal);
...
var a : SomeNarrowerVariantType = (...);
var b : SomeVariantType = a; // Always works

A match statement similar to the switch statement converts a union value back to its component in a type-safe manner. Members are referenced by their type, which is just the name of the member for an auto-generated member. The scope of type names favors the auto members first, then reverts to the normal type resolution for the surrounding context.

Parameter names can be provided to decompose the value if desired, and the key identifier _ (not `_`) can stand in for parameters to be ignored.

A guard clause starting with if or unless can be used to supplement the discriminator.

match uv {
	case intVal(i) { /* i is the int value */ }
	case doubleVal(d) if d > 0.0 { /* d is the positive double value */ }
	case doubleVal(d) { /* d is the nonpositive double value */ }
	case strVal(s) { /* s is the char pointer */ }
}
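
In C, a tagged union and its match statement could be lowered to a tag enum plus a switch, as in this sketch. The SVT_ tag names, the as member, and the helper functions are assumptions made for illustration.

```c
typedef enum { SVT_intVal, SVT_doubleVal, SVT_strVal } SVT_tag;

typedef struct {
    SVT_tag tag;
    union { int intVal; double doubleVal; char *strVal; } as;
} SomeVariantType;

/* Cifl: SomeVariantType.doubleVal(d) */
SomeVariantType SomeVariantType_doubleVal(double d) {
    SomeVariantType v;
    v.tag = SVT_doubleVal;
    v.as.doubleVal = d;
    return v;
}

/* Mirrors the match above; the guard becomes an if inside the case. */
const char *describe(SomeVariantType uv) {
    switch (uv.tag) {
        case SVT_intVal:
            return "int";
        case SVT_doubleVal:
            if (uv.as.doubleVal > 0.0) return "positive double";
            return "nonpositive double";
        case SVT_strVal:
            return "string";
    }
    return "unknown tag";
}
```

Since the guarded case and the unguarded case share a tag, the translator has to merge them into one C case and test the guards in source order.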

Once again, as broadly as possible, unions are treated as sets. Matching can be used to safely perform narrowing conversions:

match somev {
	// Note: 
	case SomeNarrowerVariantType(narrowv) {
		/* narrowv contains the StrVal or the IntVal */
	}
	case DoubleVal(d) { /* d is the double value */ }
}

Some mechanism probably based on this will be used to handle and forward voluntary exceptions.

package Cifl.lang {
	struct ErrNumber(errno : int);
}

def mallocEx[T](size : size_t) : union(returned(ref[T]), thrown(ErrNumber)) {
	val buf = malloc(size);
	if(buf) return ref[T](buf); // Short for return ?.returned(ref[T](buf))
	throw ErrNumber(ENOMEM); // Short for return ?.thrown(ErrNumber(ENOMEM))
}

Declarations

  • A variable declaration begins with the keyword var and the name is followed by a type notation, an assignment, or both.
    • The type notation is only optional if the type of the assigned expression can be inferred.
    • The C keywords register, volatile, and static may be specified as @-annotations between var and the identifier.
    • The C keywords auto and extern cannot be written explicitly; they apply only where C would use them by default.
  • A constant declaration is similar except that the keyword val is used and the assignment is not optional.
var n : type = value; // type n = value;
var n = value; // INFERREDTYPE n = value;
var n : type; // type n;
var @register n : type = value; // register type n = value;
val n : type = value; // const type n = value;
val n = value; // const INFERREDTYPE n = value;

var x, y; // means: var x; var y;
var x, y = value; // means: var x = value; var y = value;
var x, y : type; // means var x : type; var y : type;
var x, y : type = value; // means var x : type = value; var y : type = value;

var(x, y, z) = (1, 2, 3); // means var x = 1; var y = 2; var z = 3;
// The following requires the specified function to be defined with a
// non-opaque type defined for ldiv_t, and for ldiv_t's return order to be
// quotient-first.
var(q, r) = Cifl.C.stdlib.ldiv(num, denom);
// means (one possibility):
//   var q, r : long;
//   { val __result : struct(_1 : long, _2 : long)
//       = Cifl.C.stdlib.ldiv(num, denom);
//     q = __result._1; r = __result._2;
//   }
  • A function definition consists of the keyword def, optional annotations, the function name, a parameter list, a following return type notation, and then the function body.
    • Possible annotations include @static and @inline.
def @static divideDoubleByInt(a: double, b: int): double {
  return a / double(b);
}
  • typedef is reordered as typedef alias1, alias2, ... : type;.
    • Since ref is part of the type, there is no direct equivalent to C's typedef TYPE DIRECT, *REF. Do typedef DIRECT : TYPE; typedef REF : ref[TYPE]; instead.
  • A structure or union definition is made up of val and/or var declarations similar to the above, but type notations are required and assignments are omitted.
    • To specify bit fields, use the @bits annotation, such as var @bits(4) n : int;.
  • An enum is essentially identical to that in C, but with no tag.
  • Like C, a structure, union, or enum can be specified inline anywhere that a type is required.
  • A function pointer type is specified with the keyword def, similar to specifying a function but omitting all of the identifiers.
    • An actual C function pointer should be specified with the annotation @native. For example, a pointer to mktime() from time.h, whose signature is time_t mktime ( struct tm * timeptr ), would have the type def @native (ref[struct tm]):time_t.
      • Note that a function type is not specified as a ref. Function pointers are documented as different than ordinary pointers, and for one thing it's never guaranteed that a function pointer can be cast to e.g. a void* and back without some damage. Even though there's an address in there somewhere, it makes more sense to treat a function pointer as a value.
    • A function type with the annotation @context denotes, by convention, not a function directly but a structure containing a function pointer and a void pointer; a void pointer argument is prepended to the function pointer's argument list as a place to pass context. For example, a def @context (double,int) : ref[char] is roughly the C structure struct { char* (*run)(void*,double,int); void* context; }, and calling a value fn of this type as fn(12.5, 29) results in the equivalent C fn.run(fn.context, 12.5, 29). This provides a facility for supporting closures.
    • Unless it is perfectly obvious that @native is implied, if neither annotation is present then @context is assumed.
    • If a native function is passed to a context parameter, the translator shall bridge the gap by auto-writing a wrapper. Note that a native function can be auto-converted to a contexted function, but not vice versa.
#include <stdio.h>

typedef struct {
	char* (*run)(void*, double, int);
	void* context;
} contexted_function_type;

typedef struct {
	char* (*nativefunction)(double, int);
} native_function_wrap_type;

// Mutable storage, so the function below can return it as a plain char*.
static char output[] = "hello, mundo";

char* example_native_function(double d, int i) {
	printf("%f * %d = %f\n", d, i, d * i);
	return output;
}

static char* wrapper_call_wrapped_function(void* vpwrapped, double d, int i) {
	native_function_wrap_type * pwrapped = (native_function_wrap_type*)vpwrapped;
	return pwrapped->nativefunction(d, i);
}

void example_call_contexted_function(contexted_function_type fn) {
	fn.run(fn.context, 12.5, 29);
}

void example_pass_contexted_function() {
	native_function_wrap_type wrapped = { example_native_function };
	contexted_function_type contexted = { wrapper_call_wrapped_function, &wrapped };
	example_call_contexted_function(contexted);
}

Mild extensions

Text vs. binary string literals

Strings that contain only literals for 0x00-0x7F are simultaneously fine for use as either binary information or UTF-8 text.

If a string contains an \xHH escape for a value from 0x80-0xFF, it is marked as containing binary data.

If a string contains literal text with codepoints above U+7F, or if it contains any 4, 6, or 8-digit hex escapes (\uHHHH, \XHHHHHH, \UHHHHHHHH), it is marked as containing text data and will be translated to UTF-8 octets when converted to C.

If a string meets both of the above conditions, a warning will be produced and any \xHH escapes will be treated as the equivalent \u00HH escape, resulting in a text string.

High-level distinctions may be made between byte and char types, but this is not currently manifest.
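
The classification rules above can be sketched as a small C scanner over the body of a literal as written in the source. The enum and function names are hypothetical, and the sketch only classifies; it performs no translation to UTF-8.

```c
#include <stddef.h>

typedef enum {
    LIT_NEUTRAL,           /* only 0x00-0x7F: fine as binary or text */
    LIT_BINARY,            /* has a \xHH escape with value >= 0x80 */
    LIT_TEXT,              /* has non-ASCII text or a \u, \X, \U escape */
    LIT_TEXT_WITH_WARNING  /* both: \xHH is reinterpreted as \u00HH */
} literal_kind;

static int hex_digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

literal_kind classify_literal(const char *body) {
    int binary = 0, text = 0;
    for (size_t i = 0; body[i] != '\0'; ++i) {
        unsigned char c = (unsigned char)body[i];
        if (c >= 0x80) { text = 1; continue; }   /* literal non-ASCII text */
        if (c != '\\' || body[i + 1] == '\0') continue;
        char kind = body[++i];                   /* character after the backslash */
        if (kind == 'u' || kind == 'U' || kind == 'X') {
            text = 1;
        } else if (kind == 'x') {
            int hi = hex_digit(body[i + 1]);
            int lo = (hi < 0) ? -1 : hex_digit(body[i + 2]);
            if (hi >= 0 && lo >= 0) {
                if (hi * 16 + lo >= 0x80) binary = 1;
                i += 2;                          /* skip both hex digits */
            }
        }
    }
    if (binary && text) return LIT_TEXT_WITH_WARNING;
    if (text) return LIT_TEXT;
    if (binary) return LIT_BINARY;
    return LIT_NEUTRAL;
}
```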

Sugar for specific C conventions

Labeled loops

Unless a function is specifically marked for extended compatibility (e.g. with an annotation like @cRawGoto), the traditional goto will be disabled, and (non-case) statement labels will be meaningless, if not illegal, except for labeling a loop. break and continue can address the loop by these labels.

OUTER: for(x = 0; x < X_LENGTH; ++x) {
	some_code();
	INNER: for(y = 0; y < Y_LENGTH; ++y) {
		if(some_condition_for_next_x())
			continue OUTER;
		else if(some_condition_for_all_done())
			break OUTER;
		
		process(x,y);
		// implicit continue INNER
	}
	some_more_code();
}

Additionally, break and continue are redefined as aliases for the new phrases goto.end and goto.next, respectively. A third jump similar to continue but directed to after the conditional is defined as goto.start (the meaning is modeled on Perl's redo).

Provided that there is no try-style code intermixed, a loop like the one above is translated into C such as the following:

for(x = 0; x < X_LENGTH; ++x) {
	OUTER__start:
	some_code();
	for(y = 0; y < Y_LENGTH; ++y) {
		INNER__start:
		if(some_condition_for_next_x())
			goto OUTER__next;
		else if(some_condition_for_all_done())
			goto OUTER__end;
		
		// break = goto.end = break INNER = goto.end INNER
		//	-> break
		// continue = goto.next = continue INNER = goto.next INNER
		//	-> continue
		// goto.start = goto.start INNER
		//	-> goto INNER__start
		// break OUTER = goto.end OUTER
		//	-> goto OUTER__end
		// continue OUTER = goto.next OUTER
		//	-> goto OUTER__next
		// goto.start OUTER
		//	-> goto OUTER__start
		
		process(x,y);
		// implicit continue INNER
		INNER__next: ;
	}
	INNER__end:
	some_more_code();
	OUTER__next: ;
}
OUTER__end: ;

Note that start, end, and next are to be lexed as identifiers, not keywords. Therefore, 'goto' ('.' identifier)? identifier is grammatically correct, but only these identifiers are allowed semantically.
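
A compact, compilable instance of the same translation scheme is given below. count_pairs and its parameters are invented for the example; only the two labels that are actually targeted are emitted.

```c
/* Counts (x, y) pairs with y <= x, stopping entirely once x == stop_x. */
int count_pairs(int x_len, int y_len, int stop_x) {
    int processed = 0;
    for (int x = 0; x < x_len; ++x) {
        for (int y = 0; y < y_len; ++y) {
            if (y > x) goto OUTER__next;        /* Cifl: continue OUTER */
            if (x == stop_x) goto OUTER__end;   /* Cifl: break OUTER */
            ++processed;
        }
        OUTER__next: ;   /* a label must be followed by a statement */
    }
    OUTER__end:
    return processed;
}
```

Placing OUTER__next at the very end of the loop body lets the for statement's own increment run, so the jump behaves exactly like a continue of the outer loop.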

Exit cases

The last non-null statements within any block may be special case EXIT statements. These statements combine some aspects of Perl's continue block and the catch and finally blocks of other languages. These define a framework of actions to be taken before the block is exited, depending on the reason.

  • The case break statement is run when the block is exited by breaking, but note that return, throw, and break or continue to an outer loop each carries an implicit break. It is comparable to a finally block in other languages.
  • The case continue statement is run if this block is the statement part of a loop and the condition is about to be rechecked; i.e., it is the target of a continue in the current loop, including the implicit variety. It is comparable to the third (increment) expression in the definition of a for loop.
  • The case return [NAME] (alias: case NAME) statement is run if this block is being exited by an interior return. The NAME parameter is optional (and is disallowed inside a void function); if provided, the name is an alias for the pending return value within the statement. If return is omitted, the name is not optional.
  • The case throw [NAME] (alias: catch [NAME]) statement is run if this block is being exited by an interior throw. The NAME parameter is optional; if provided, the name is an alias for any exception information provided.

Each of these cases may have a guard expression, an optional if or unless clause to only allow the code to run under some circumstances. catch and possibly case return should also support some or all of the matching syntax used with tagged unions for type discriminators.

Unlike in other languages, these statements are not subsequent to the block to which they apply; instead, they are the last non-null statements inside the block. Some comparisons:

// C
for(INIT; COND; INCR) {
	STMT;
}
// Cifl, as a while
INIT; while(COND) {
	STMT;
	case continue { INCR; }
}
// JavaScript 1.5 (Netscape extensions)
try {
	AAAA;
	BBBB;
	CCCC;
}
catch(e if e instanceof ExType1) { CATCH1(e); }
catch(e if somefn(e) == 4) { CATCH2(e); }
catch(e) { CATCH3(e); }
finally { FFFF; }
// Cifl
{
	AAAA;
	BBBB;
	CCCC;
	// Union matching syntax
	catch extype1(e) { CATCH1(e); }
	// Using guard expression
	catch e if somefn(e) == 4 { CATCH2(e); }
	// Catch-all
	catch e { CATCH3(e); }
	case break { FFFF; }
}

Implementation: Deferred jumps

To support automatic destruction and the exit cases specified above, blocks will need a mechanism to execute code before being exited.

C itself doesn't have any equivalent to try-finally, so any break, continue, goto, or return (and non-local jumps, but those are discouraged here anyway) is non-interceptible by normal means.

Since we're defining a higher-level language, we can simply replace all of those local jumps with versions that invoke cleanup code first.

def foo() : int {
	OUTER: while outer_cond() {
		{
			of_init();
			INNER: while inner_cond() {
				switch inner_choose() {
					case 1 { break; } // switch
					case 2 { break INNER; }
					case 3 { continue; } // INNER
					case 4 { goto.start; } // INNER
					case 5 { break OUTER; }
					case 6 { continue OUTER; }
					case 7 { goto.start OUTER; }
					case 8 { return inner_get_value(); }
				}
			}
			case break {
				of_cleanup();
			}
		}
	}
	return 0;
}

In this example, some of the control flow commands should trigger the cleanup code, and some shouldn't. In particular, cases 1-4 shouldn't, and cases 5-8 should.

The proposed answer to this situation is

  • To implement 1-4 as normal
  • To have each of 5-8 stash information regarding their final destinations, jump to the exit case to run cleanup, and have the exit case code redirect using the stashed information.
// Hypothetical support structure and values
typedef struct {
	int type;
	int address;
	int return_value;
} GEN_jump_data_int;

#define NONE 0
#define BREAK 1
#define CONTINUE 2
#define START 3
#define RETURN 4

#define NONE_ADDRESS (-1)
#define OUTER_ADDRESS 1
#define INNER_ADDRESS 2

#define JUMP_USING(v, t, a, r, l)	{ v = (GEN_jump_data_int){t, a, r}; goto l; }
#define RETURN_FROM(v, a, r, l)		JUMP_USING(v, RETURN, a, r, l)

#define JUMP_NR_USING(v, t, a, l)	JUMP_USING(v, t, a, (v).return_value, l)
#define BREAK_FROM(v, a, l)		JUMP_NR_USING(v, BREAK, a, l)
#define CONTINUE_FROM(v, a, l)		JUMP_NR_USING(v, CONTINUE, a, l)
#define START_FROM(v, a, l)		JUMP_NR_USING(v, START, a, l)

int foo()
{
	GEN_jump_data_int __jd = { NONE, NONE_ADDRESS, 0 };
	while(outer_cond()) {
		OUTER__start:
		of_init();
		// beginning of enclosing block
		while(inner_cond()) {
			INNER__start:
			switch(inner_choose()) {
				case 1: break; // switch
				case 2: goto INNER__end;
				case 3: continue; // INNER
				case 4: goto INNER__start; // INNER
				case 5: BREAK_FROM(__jd, OUTER_ADDRESS, CASE_BREAK0);
				case 6: CONTINUE_FROM(__jd, OUTER_ADDRESS, CASE_BREAK0);
				case 7: START_FROM(__jd, OUTER_ADDRESS, CASE_BREAK0);
				case 8: RETURN_FROM(__jd, OUTER_ADDRESS, inner_get_value(), CASE_BREAK0);
			}
			INNER__next: ;
		}
		INNER__end:
		// case break
		CASE_BREAK0:
		of_cleanup();
		// end case break code
		
		// handle post-exit redirects
		switch(__jd.address) {
			case NONE_ADDRESS:
				// No jump
				break;
			case OUTER_ADDRESS:
				__jd.address = NONE_ADDRESS;
				switch(__jd.type) {
					case BREAK: goto OUTER__end;
					case CONTINUE: goto OUTER__next;
					case START: goto OUTER__start;
					case RETURN: return __jd.return_value;
				}
				break;
			default:
				// Presume that the jump address is positive but below this
				// one
				// Automatically break to serve a higher jump
				goto OUTER__end;
		}
		OUTER__next: ;
	}
	OUTER__end:
	return 0;
}

Now, in this case, the conditionals for __jd.address aren't strictly necessary and could be optimized out. But in cases where there are nested try blocks, they turn out to be useful. The system is fairly simple:

  • Deeper nested blocks have higher addresses than shallow ones.
  • Addresses < 0 mean that no jump is requested.
  • The jump address is never higher than the current address; that's a logical error.
  • If the jump address is lower than this one, automatically break from the current location.
  • If the jump address is equal to this one, execute the jump specified in the type.

If the "try" block is embedded in another one, the jumps it executes may need to be deferred as well.

Deferred jumps and automatic tagged unions could be combined to produce a cooperative exceptions mechanism.
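
The stash-then-redirect idea can be exercised with a reduced, compilable sketch. Here a hypothetical script array stands in for the control flow choices (0 = fall through, 1 = continue OUTER, 2 = break OUTER, 3 = return 42), and a counter stands in for of_cleanup(). All names are assumptions of the sketch.

```c
enum { JUMP_NONE, JUMP_BREAK, JUMP_CONTINUE, JUMP_RETURN };
enum { ADDRESS_NONE = -1, ADDRESS_OUTER = 1 };

typedef struct { int type; int address; int return_value; } jump_data;

/* Returns the number of iterations that ran to completion, or 42 on the
 * scripted return. *cleanups counts how often the exit case ran. */
int run_script(const int *script, int len, int *cleanups) {
    jump_data jd = { JUMP_NONE, ADDRESS_NONE, 0 };
    int completed = 0;
    for (int i = 0; i < len; ++i) {
        /* Block with a `case break` exit case begins here. */
        switch (script[i]) {
            case 1: jd = (jump_data){ JUMP_CONTINUE, ADDRESS_OUTER, 0 }; goto CASE_BREAK0;
            case 2: jd = (jump_data){ JUMP_BREAK, ADDRESS_OUTER, 0 }; goto CASE_BREAK0;
            case 3: jd = (jump_data){ JUMP_RETURN, ADDRESS_OUTER, 42 }; goto CASE_BREAK0;
            default: break;   /* normal exit also reaches the cleanup */
        }
    CASE_BREAK0:
        ++*cleanups;          /* stands in for of_cleanup() */
        if (jd.address == ADDRESS_OUTER) {
            int type = jd.type;
            jd.address = ADDRESS_NONE;   /* the request has been served */
            if (type == JUMP_BREAK) goto OUTER__end;
            if (type == JUMP_CONTINUE) goto OUTER__next;
            if (type == JUMP_RETURN) return jd.return_value;
        }
        ++completed;          /* code after the block */
    OUTER__next: ;
    }
OUTER__end:
    return completed;
}
```

Every path out of the block passes through CASE_BREAK0 first, so the cleanup count always equals the number of times the block was entered.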

Exceptions

TODO.

We would want the exception mechanism to be added sugar onto something like the following. The major differences would be:

  • The union type is implicitly created, possibly by an annotated return type from malloc_ex, like @throws(int) ref[T]
  • return and throw would implicitly construct the returned and thrown members, respectively
  • The return value from malloc_ex[T] would behave as a ref[T] rather than having to be matched, but the calling context is required to have either exhaustive catch exit cases or a compatible @throws on the function itself; whichever of these applies is automatically activated if the function returns a thrown value
    • This allows an exception to cooperatively "rise" up through contexts
union malloc_return[T] (
	val returned : ref[T],
	val thrown : int
);

def malloc_ex[T](size : size_t) : malloc_return[T] {
	var r = ref[T](malloc(size));
	if r
		return malloc_return[T].returned(r);
	else
		return malloc_return[T].thrown(ENOMEM);
}

def use_malloc() : void {
	match malloc_ex[char](123) {
		case returned(buf) {
			// use buf
			case break { free(buf); }
		}
		case thrown(+ENOMEM) {
			// warn of out-of-memory
		}
		case thrown {
			// warn unknown error
		}
	}
}
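
For comparison, a hand-written C counterpart of the unsugared version might look like this. The type parameter is erased to void*, and the tag names and the int return convention of use_malloc are assumptions of the sketch.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

typedef enum { MALLOC_returned, MALLOC_thrown } malloc_return_tag;

typedef struct {
    malloc_return_tag tag;
    union { void *returned; int thrown; } as;
} malloc_return;

malloc_return malloc_ex(size_t size) {
    malloc_return result;
    void *r = malloc(size);
    if (r) {
        result.tag = MALLOC_returned;
        result.as.returned = r;
    } else {
        result.tag = MALLOC_thrown;
        result.as.thrown = ENOMEM;
    }
    return result;
}

/* Returns 0 on success, or the thrown error number. */
int use_malloc(size_t size) {
    malloc_return m = malloc_ex(size);
    switch (m.tag) {
        case MALLOC_returned:
            memset(m.as.returned, 0, size);   /* use buf */
            free(m.as.returned);              /* case break { free(buf); } */
            return 0;
        case MALLOC_thrown:
            return m.as.thrown;
    }
    return -1;   /* unreachable for a well-formed tag */
}
```

Note that malloc(0) may legitimately return a null pointer, which this sketch would report as a thrown ENOMEM.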

Functions

Default parameters

Default parameters have an implementation that is in sync with that of default values for struct fields.

  • A function is defined with zero or more default values.
  • A default value is specified as a pseudo-assignment on the argument itself.
  • The expression in the assignment may reference any non-defaulted parameter or any default parameter that precedes it in the argument list.

A basic implementation would be to first implement the function as it would be without any default parameters, then provide a second function that accepts a pointer to a struct containing the same parameters, which itself has the default parameters implemented.
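
The two-function scheme just described might look like the following in C. The Cifl signature in the comment is hypothetical, and the has_factor flag is one assumed way of recording that an argument was omitted.

```c
/* Cifl (hypothetical): def scaled(base: int, factor: int = base + 1): int */

/* 1. The function as it would be without any default parameters. */
int scaled(int base, int factor) {
    return base * factor;
}

/* 2. Argument struct for calls that may omit defaulted parameters. */
typedef struct {
    int base;
    int has_factor;   /* 0 means "use the default" */
    int factor;
} scaled_args;

/* Fills in defaults in declaration order, then forwards to scaled(). */
int scaled_with_defaults(const scaled_args *args) {
    int factor = args->has_factor ? args->factor : args->base + 1;
    return scaled(args->base, factor);
}
```

Calls that supply every argument can go straight to scaled(), so only calls that rely on a default pay for the extra struct.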


TODO

Named parameters

Pending syntax: (argumentname: value)

TODO

Partial application

Pending syntax: a reserved identifier _ marks the argument left open, e.g. (a, b, _, d)(c)

TODO

Typed varargs

This would paper syntax over final arguments of (size_t, ref[?]) or (int, ref[?]), turning the count and reference into an apparent bounded array. The sugar would work on, e.g., the args of main(int, char**).
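
What the sugar might stand for underneath can be sketched in C as a small view structure. The names str_view, str_view_from_args and str_view_get are hypothetical, and the bounds-checked getter is only one guess at what "bounded" could mean here.

```c
#include <stddef.h>

/* Hypothetical bounded view over a (count, pointer) argument pair. */
typedef struct {
	size_t count;
	char **items;
} str_view;

/* Builds the view from main-style arguments. A non-positive count or a
 * null pointer yields an empty view. */
static str_view str_view_from_args(int argc, char **argv) {
	str_view v;
	v.count = (argc > 0 && argv) ? (size_t)argc : 0;
	v.items = argv;
	return v;
}

/* Bounds-checked access: returns NULL for any index outside the view. */
static char *str_view_get(str_view v, size_t i) {
	return (i < v.count) ? v.items[i] : NULL;
}
```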

TODO

Synchronous closures

Even without any sort of automatic memory management for the associated structures, it is possible to pass a closure to be run by another function, as long as that function does not retain a pointer to the closure's context afterward.

def call_a_few_times(clos : def(size_t):void, count : size_t) : void {
	var i : size_t;
	for(i = 0; i < count; ++i) {
		clos(i);
	}
}

def example_caller(s : int) : void {
	var n = s;
	printf("Initial n value: %d\n", n);
	call_a_few_times(def(ctr) {
		printf("%d * %d = %d\n", n, ctr, n * ctr);
		n *= ctr;
	}, 3);
	printf("Final n value: %d\n", n);
}

This might translate as:

// The type generated for a closure always has this layout by convention:
// - The first member is a function pointer for a signature the same as that
// of the closure itself, but with an added first void* parameter for passing
// a context.
// - The second member is a void* to pass as a context to the function.
// This pair of values combines to make an unmanaged closure. (A managed
// closure would have to manage memory for the context and perhaps the closure
// structure itself.)
typedef struct {
	void (*run)(void*, size_t);
	void* context;
} GEN_clos_type;

// An unmanaged closure is passed by value, not by reference.
void call_a_few_times(GEN_clos_type clos, size_t count) {
	size_t i;
	for(i = 0; i < count; ++i) {
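		// A closure struct is not callable in C; invoke its function pointer
		// and hand it the context explicitly.
		clos.run(clos.context, i);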
	}
}

// The type generated for the closure's context consists of the local
// variables that are closed by the closure. The caller is rewritten so that
// the variables are allocated and accessed inside a structure; the inner
// function is rewritten to accept a pointer to that structure as its context
// and replace its closed variables with the members of the structure.
typedef struct {
	int n;
} GEN_example_caller__ctx;

// The inner function rewritten to exist at the file level.
static void GEN_example_caller__anon(void* _ctx, size_t ctr) {
	// The signature contains a void* to match the signature of the closure
	// member, but in actuality only a specific type is accepted. This isn't
	// technically typesafe, but if this function is only accessed via the
	// closure mechanism, this is moot.
	GEN_example_caller__ctx* ctx = (GEN_example_caller__ctx*)_ctx;
	
	// Note that instances of n have changed to ctx->n.
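	// ctr is a size_t, so it is printed with %zu and narrowed to int for the
	// arithmetic.
	printf("%d * %zu = %d\n", ctx->n, ctr, ctx->n * (int)ctr);
	ctx->n *= (int)ctr;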
}

void example_caller(int s) {
	// The caller replaces its own variables with members of a structure if
	// they are closed by a closure.
	GEN_example_caller__ctx ctx = { s };
	
	// Note that instances of n have changed to ctx.n.
	printf("Initial n value: %d\n", ctx.n);
	// The closure is replaced with the tuple of function and context pointers.
	call_a_few_times(
		(GEN_clos_type){ GEN_example_caller__anon, &ctx }, 3);
	// Since the context itself is passed by reference, changes made by the
	// function are reflected here.
	printf("Final n value: %d\n", ctx.n);
}

Note that this mechanism alone would not suffice for callbacks for asynchronous processes; the closure structure may be sufficient if passed by value, but the context would need to be dynamically allocated to have the necessary lifetime, and provisions would have to be made for its deallocation at some point.
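
To make the lifetime problem concrete, here is a sketch of what a closure with a dynamically allocated context might look like in C. Everything in it is an assumption for illustration: the managed_clos layout, the extra release member, and the counter_* names are not part of the proposal.

```c
#include <stdlib.h>

/* Like the unmanaged closure, plus a hypothetical hook for freeing the
 * context once the last user is done with it. */
typedef struct {
	void (*run)(void *context, size_t arg);
	void (*release)(void *context);
	void *context;
} managed_clos;

typedef struct {
	int n;
} counter_ctx;

static void counter_run(void *context, size_t arg) {
	counter_ctx *ctx = context;
	ctx->n *= (int)arg;
}

static void counter_release(void *context) {
	free(context);
}

/* Returns a closure whose context outlives this call. All members are NULL
 * if the allocation fails. */
static managed_clos make_counter(int start) {
	managed_clos c = { NULL, NULL, NULL };
	counter_ctx *ctx = malloc(sizeof *ctx);
	if (ctx) {
		ctx->n = start;
		c.run = counter_run;
		c.release = counter_release;
		c.context = ctx;
	}
	return c;
}
```

The struct itself can still be passed by value; the open question is who calls release and when, which is where disposal or counting would come in.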

Any general specification for garbage collection should remedy this.

Algebraic-style data types

C's unions provide crude polymorphism to the language. Cifl will add sugar to make them more useful.

General tagged unions

An automatic tagged union can be defined. Tentative syntax example:

typedef TuType : union {
    foo : FooType;
    bar : BarType;
    baz : BazType;
};

A possible translation of this:

// A type defined by the translator to set a uniform width for tags. Should be
// a user option.
typedef uint16_t CTHIS_tag_t;

#define TU_ENTRY(type) struct { CTHIS_tag_t tag; type value; }

#define _GEN___TuType_TAG_foo 1
#define _GEN___TuType_TAG_bar 2
#define _GEN___TuType_TAG_baz 3
#define _GEN___TuType_COUNT 3

typedef union {
	CTHIS_tag_t tag;
	TU_ENTRY(FooType) foo;
	TU_ENTRY(BarType) bar;
	TU_ENTRY(BazType) baz;
} TuType;

Assignments are made by calling an extension method on the type itself corresponding to one of the possible members.

var q = TuType.foo(someFooValue);
var r = TuType.bar(someBarValue);
var s = TuType.baz(someBazValue);

A possible translation:

#define TU_VALUE(uniontype, member, value) { .member = { \
	_GEN___ ## uniontype ## _TAG_ ## member, \
	value } }
TuType q = TU_VALUE(TuType, foo, someFooValue);
TuType r = TU_VALUE(TuType, bar, someBarValue);
TuType s = TU_VALUE(TuType, baz, someBazValue);

The value is extracted using a simple pattern-matching syntax:

match tt {
	case foo(v) {
		/* v is the FooType value */
	}
	case bar(v) {
		/* v is the BarType value */
	}
	case baz(v) {
		/* v is the BazType value */
	}
}

which predictably translates to something like

switch(tt.tag) {
	case _GEN___TuType_TAG_foo: {
		/* Substitute tt.foo.value for v */
	}; break;
	case _GEN___TuType_TAG_bar: {
		/* Substitute tt.bar.value for v */
	}; break;
	case _GEN___TuType_TAG_baz: {
		/* Substitute tt.baz.value for v */
	}; break;
}

Each member can be retrieved and interacted with as a Maybe of its type, which is set or empty depending on whether that member of the union is set.

var x = q.baz; // Inferred type: Maybe[BazType]
var cond = s.baz.defined;

If the engine is aware that a tagged union is incorporating one or more others, it shall attempt to combine them instead of stacking tags. The corresponding tag numbers will differ, so the translation will use the enum symbols as the basis for tagging.

typedef Tt1 : union @tagged {
  a : AlphaType;
  b : BravoType;
};
typedef Tt2 : union @tagged {
  c : CharlieType;
  d : DeltaType;
};
typedef BigType : union @tagged {
  one : Tt1;
  two : Tt2;
  e : EchoType;
};

var tooh : Tt2;
var b : BigType;
// ... tooh is set somewhere ...
b = tooh;

A possible translation:

#define _GEN___Tt1_TAG_a 1
#define _GEN___Tt1_TAG_b 2
#define _GEN___Tt1_COUNT 2
typedef union {
	CTHIS_tag_t tag;
	TU_ENTRY(AlphaType) a;
	TU_ENTRY(BravoType) b;
} Tt1;

#define _GEN___Tt2_TAG_c 1
#define _GEN___Tt2_TAG_d 2
#define _GEN___Tt2_COUNT 2
typedef union {
	CTHIS_tag_t tag;
	TU_ENTRY(CharlieType) c;
	TU_ENTRY(DeltaType) d;
} Tt2;

#define _GEN___BigType_TAG_one 1
#define _GEN___BigType_TAG_two ((_GEN___BigType_TAG_one) + (_GEN___Tt1_COUNT))
#define _GEN___BigType_TAG_e ((_GEN___BigType_TAG_two) + (_GEN___Tt2_COUNT))
typedef union {
	CTHIS_tag_t tag;
	Tt1 one;
	Tt2 two;
	TU_ENTRY(EchoType) e;
} BigType;

Tt2 tooh;
BigType b;
// ... tooh is set somewhere ...
// First, the data itself is copied.
b.two = tooh;
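// Then, since the tag is probably wrong, it's corrected. Tt2's own tags
// start at 1, so the offset is one less than BigType's first tag for "two";
// adding the full tag value would make Tt2's last member collide with "e".
// (If the thing was empty, leave it empty.)
b.two.tag += (b.two.tag == 0) ? 0 : ((_GEN___BigType_TAG_two) - 1);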
// Done, and nicely.
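
The tag arithmetic is easy to get off by one, so here is a small self-contained version with concrete payload types that can be compiled and checked. It is a sketch only: the payload types (int, char, long, double, float), the enum names and the big_from_tt2 helper are stand-ins, and enum constants are used in place of the #define symbols.

```c
#include <stdint.h>

typedef uint16_t tag_t;
#define ENTRY(type) struct { tag_t tag; type value; }

enum { TT1_a = 1, TT1_b = 2, TT1_COUNT = 2 };
typedef union { tag_t tag; ENTRY(int) a; ENTRY(char) b; } Tt1;

enum { TT2_c = 1, TT2_d = 2, TT2_COUNT = 2 };
typedef union { tag_t tag; ENTRY(long) c; ENTRY(double) d; } Tt2;

/* Flattened tags: Tt1 occupies 1..2, Tt2 occupies 3..4, and e is 5. */
enum {
	BIG_one = 1,
	BIG_two = BIG_one + TT1_COUNT,
	BIG_e = BIG_two + TT2_COUNT
};
typedef union { tag_t tag; Tt1 one; Tt2 two; ENTRY(float) e; } BigType;

/* Copies a Tt2 into a BigType and rebases its tag. An empty value (tag 0)
 * stays empty. */
static BigType big_from_tt2(Tt2 src) {
	BigType b = { 0 };
	b.two = src;
	if (b.two.tag != 0)
		b.two.tag = (tag_t)(b.two.tag + (BIG_two - 1));
	return b;
}
```

Since every member begins with a tag_t, the rebased tag can be read back through b.tag regardless of which member was written.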

Either

Sys.Either[A,B] might be defined as

typedef Either : union[A,B] @tagged {
    left : A;
    right : B;
};

Maybe

Sys.Maybe[TYPE] is more of an interface type than an actual structure. If it needs to be defined as a structure, the following would work:

struct {
  bool defined;
  TYPE value;
}

Unlike the tagged union, the value is set by any assignment of type TYPE. To clear the value, the pseudo-value Sys.Nothing is assigned.

m = v;
  // m = { true, v };
m = Nothing;
  // m.defined = false;

The Maybe cannot be read directly as a TYPE. Instead, the value is coerced out using the match syntax or foreach.

match m {
  case value(v) {
     /* use v */
  }
  default {
     /* value not set */
  }
}

for v : m {
  /* use v */
  /* runs once if set, zero times otherwise */
}

An extension method flatMap is defined

def flatMap[B](fn: def(TYPE):B):Maybe[B]

that accepts a contexted function. If the Maybe is defined, the output is the result of that function run on the value as a defined Maybe. If the Maybe is not defined, the output is an undefined Maybe.
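
In C terms, and for one fixed pair of types, this might come out roughly as follows. The maybe_int and maybe_double structs follow the structure form given above; the closure struct, the helper names and the scale_by example are assumptions made for this sketch.

```c
#include <stdbool.h>

typedef struct { bool defined; int value; } maybe_int;
typedef struct { bool defined; double value; } maybe_double;

/* A contexted function from int to double, in the unmanaged closure layout. */
typedef struct {
	double (*run)(void *context, int arg);
	void *context;
} int_to_double_clos;

static maybe_int maybe_int_of(int v) {
	return (maybe_int){ true, v };
}

static maybe_int maybe_int_nothing(void) {
	return (maybe_int){ false, 0 };
}

/* If m is defined, runs fn on its value and wraps the result as a defined
 * Maybe; otherwise returns an undefined Maybe without calling fn. */
static maybe_double maybe_int_flat_map(maybe_int m, int_to_double_clos fn) {
	maybe_double out = { false, 0.0 };
	if (m.defined) {
		out.defined = true;
		out.value = fn.run(fn.context, m.value);
	}
	return out;
}

/* Example contexted function: multiplies by the double stored in context. */
static double scale_by(void *context, int arg) {
	return *(const double *)context * arg;
}
```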

Disposables

I believe that this language extension might not be worth making without some provisions for automatic disposal of disposable resources. A prime example of this would be heap memory allocation, but the idiom would also apply to file pointers (which need to be closed) and basically anything that would have a destructor in C++.

Ur example:

static inline int fmt_notice(char* dst, size_t size,
	const char * restrict message, int line)
{
	return snprintf(dst, size, "Notice (%d): %s", line, message);
}

// Presume C99 snprintf, which uses this idiom for string length
static inline size_t len_notice(const char * restrict message, int line)
{
	return 1 + fmt_notice(NULL, 0, message, line);
}

// In production code, malloc null checks should appear at "**".
// In cifl-based code, the same task will be performed using exception-like
// mechanisms.

char* get_notice(const char * restrict message, int line)
{
	char* buf = NULL;
	
	size_t size = len_notice(message, line);
	
	buf = malloc(size);
	// **
	fmt_notice(buf, size, message, line);
	
	return buf;
}

void print_example_notice() {
	char* notice = get_notice("Something happened", 123);
	// **
	printf("%s\n", notice);
	
	// If this condition is true, congratulations. You have a memory leak.
	if(some_condition())
		return;
	
	// If we get here, the memory is freed correctly.
	free(notice);
}

As noted here, one of the most common problems with manual memory allocation is the profusion of cases where early exits accidentally prevent disposal. Some languages have exit hooks for dealing with this; Java has finally and C# further sugars it to using. Similarly, cifl has exit cases.

def print_example_notice {
	var notice = get_notice("Something happened", 123);
	// **
	printf("%s\n", notice);

	// No leak even if this is true	
	if some_condition()
		return;
	
	case break { free(notice); }
}
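
One plausible lowering of the exit case into C is the familiar goto-cleanup idiom: every early return inside the scope becomes a jump to a single label where the case break body runs. The sketch below assumes exactly that. The exit_case label, the some_condition parameter and the dispose_count counter are illustrative only, and get_notice is condensed from the version above.

```c
#include <stdio.h>
#include <stdlib.h>

static int dispose_count = 0; /* instrumentation for this example only */

static char *get_notice(const char *message, int line) {
	int len = snprintf(NULL, 0, "Notice (%d): %s", line, message);
	if (len < 0)
		return NULL;
	char *buf = malloc((size_t)len + 1);
	if (buf)
		snprintf(buf, (size_t)len + 1, "Notice (%d): %s", line, message);
	return buf;
}

static void print_example_notice(int some_condition) {
	char *notice = get_notice("Something happened", 123);
	if (!notice)
		return; /* nothing was acquired, so nothing to dispose */
	printf("%s\n", notice);

	if (some_condition)
		goto exit_case; /* was: return; */

	/* ... any further work in the scope ... */

exit_case:
	/* Body of "case break", reached on every path out of the scope. */
	free(notice);
	++dispose_count;
}
```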

However, there are many kinds of resources that require disposal, and not all of them use a simple call to free(). I think that it would be handy for these resources to come with an accompanying closure that gets called when the resource goes out of scope.

Let's start by wrapping malloc() to produce a disposal closure.

struct ExampleCharArray(
	data : ref[char],
	dispose : def()
);

def get_ExampleCharArray(size : size_t) : ExampleCharArray {
	var d = ref[char]( malloc(size) );
	// **
	return ExampleCharArray(d, def() {
		// Code to prevent multiple destructions is not necessary in this case,
		// but might be in others
		if d {
			free(d);
			d = NULL;
		}
	});
}

Then, change the rest to use it.

def get_notice(val message : @restrict @read ref[char], line : int)
	: ExampleCharArray
{
	var size = len_notice(message, line);
	
	val buf = get_ExampleCharArray(size);
	fmt_notice(buf.data, size, message, line);
	
	return buf;
}

def print_example_notice {
	var notice = get_notice("Something happened", 123);
	printf("%s\n", notice.data);

	if some_condition()
		return;
	
	case break { notice.dispose(); }
}

This might still be a little too explicit. We shouldn't have to address data and dispose, and we shouldn't have to specify the break case. The last function should look more like:

def print_example_notice {
	var notice = get_notice("Something happened", 123);
	printf("%s\n", notice); // not notice.data

	if some_condition()
		return;
	// notice.dispose() on break case implicit
}

With a @disposable annotation, the supporting functions might become:

def getCharMemory(size : size_t) : @disposable ref[char] {
	var d = ref[char]( malloc(size) );
	// **
	// This @disposable attribute attaches a void closure containing the
	// procedure for disposing the value.
	return @disposable(def() {
		if d {
			free(d);
			d = NULL;
		}
	}) d;
}

def get_notice(val message : @restrict @read ref[char], line : int)
	: @disposable ref[char]
{
	var size = len_notice(message, line);
	
	// getCharMemory creates a disposable value. This disposable value is by
	// default disposed as soon as it is reassigned (if it is a var) or its
	// enclosing scope ends.
	// The value is *not* disposed, however, if the value is *passed up*
	// successfully as a @disposable, in which case the passee takes on the
	// responsibility instead.
	// To *pass up* refers to making the value available to an outer context,
	// such as by returning it or by assigning it to a wider-scoped variable.
	// The destination type must be @disposable for this transfer to happen.
	val buf = getCharMemory(size);
	
	// The disposability of a value cannot be passed into a function as a
	// parameter. Here, the data portion is automatically extracted but the
	// requirement to dispose remains with the caller.
	fmt_notice(buf, size, message, line);
	
	// As mentioned previously, if this is successful then the disposal becomes
	// the responsibility of the caller.
	return buf;
}

def print_example_notice {
	// get_notice returns a disposable value. This function does not pass the
	// value up as @disposable (or return anything at all), so it is
	// responsible for disposing of it.
	var notice = get_notice("Something happened", 123);
	printf("%s\n", notice);

	if some_condition()
		return;
	
	// An implicit case break runs the dispose closure.
}

Reference counting

The above example in particular, which deals with dynamically allocated memory, is also a prime candidate for some sort of garbage collection that is not (entirely) based on scope. An important usage of malloc() is to create non-static structures with lifetimes beyond their declaring scopes. Because of this, it's less straightforward to determine a good time to destroy such structures.

Let's simulate this with a different but similar structure to the disposable, in particular containing a different type of closure:

struct RefCountExample(
	data : ref[char],
	doCount : def(bool):bool
);

// A multithreaded version would interlock this function or use an atomic
// integer
def singleThreadDestroyCounter(destroy : def()) {
	var count = 1;
	return def(incNotDec: bool):bool {
		if(count > 0) {
			if incNotDec {
				++count;
			}
			else {
				--count;
				if count == 0 {
					destroy();
					return false;
				}
			}
			return true;
		}
		return false;
	};
}

def getCharMemory(size : size_t) : RefCountExample {
	var d = ref[char]( malloc(size) );
	// **
	return RefCountExample(d, singleThreadDestroyCounter(def() { free(d); }));
}

.doCount(true) and .doCount(false) would then increment and decrement the reference count, respectively. Hopefully the usage is easy to imagine. Here's how it might look as part of the language, with some additional functions showing its use:

def getMemory[T](size : size_t) : @counted ref[T] {
	var d = ref[T]( malloc(size) );
	// **
	return @counted(def() { free(d); }) d;
}

def getMemCopy[T](mem : @restrict @read ref[T], count : size_t)
	: @counted ref[T]
{
	val dst = getMemory[T](count);
	// **
	
	// When cast to, or passed to an argument that expects, the payload type
	// without the @counted annotation, the uncounted payload of the counter is
	// automatically unpacked and used.
	memcpy(ref[void](dst), mem, count);
	
	// Because the return type is @counted, the ref count on dst is left at 1
	// instead of decremented, so the caller takes over the reference.
	return dst;
}

def getStrCopy(str : @restrict @read ref[char]) : @counted ref[char] {
	// This doesn't touch the ref count of the string it makes and returns.
	return getMemCopy[char](str, strlen(str) + 1);
}

// ...

def getCountedVsprintf(format : @restrict @read ref[char], args : va_list)
	: @counted ref[char]
{
	// This really only works with C99
	
	var count : int;
	{
		// Use a copy of the va_list to get the count
		var args2 : va_list;
		va_copy(args2, args);
		count = vsnprintf(NULL, 0, format, args2);
		va_end(args2);
	}
	assert(count >= 0);
	val size : size_t = count + 1;
	val dst = getMemory[char](size);
	// **
	
	// Passing the counted ref to an uncounted argument unpacks it
	vsnprintf(dst, size, format, args);
	
	return dst;
}

def getCountedSprintf(format: @restrict @read ref[char], ...)
	: @counted ref[char]
{
	var args : va_list;
	va_start(args, format);
	val r = getCountedVsprintf(format, args);
	return r;
	case break { va_end(args); }
}

Much as with COM and XPCOM,

  • The count would be incremented to indicate an additional held reference or decremented to release such a reference.
  • The holder of a reference is responsible for releasing it when it is no longer necessary.
  • The caller of a function that returns a counted thing implicitly owns one reference to it.
  • A function returning a counted thing should own exactly one reference to it at the time of the return; that reference becomes owned by the caller.
  • When passing a counted thing as an in parameter, the caller does not change the count. The callee increments the count if the thing must be available after the end of the call; otherwise it does not change the count.
  • An out parameter has the same semantics as a return value.
  • For a two-way parameter, the caller's reference becomes owned by the callee, after which the parameter is the same as an out parameter that has already been set to the input by the callee.
    • In particular, if the thing is not destroyed and/or replaced by the callee, it is largely as if the caller's reference were never transferred.

And more that I haven't checked:

  • If a counted thing is copied to a wider scope, that scope takes over the originating scope's reference.
    • If it is not determinable at compile time that the value will definitely be copied up, the counting mechanism can be used instead.
  • The payload can be unpacked from the counted thing. This would either implement or substitute for weak references.
    • If we implement the counted type as @counted TYPE which auto-unpacks a value when necessary, simply copying or passing the value to a variable of type TYPE unqualified by @counted should work.
  • If a variable holding a counted thing is reassigned (or deassigned), anything that would happen to it at the end of the scope must happen at the point of the change instead.
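
As a point of reference, the ownership rules listed above map onto a very small amount of single-threaded C. The counted struct and the counted_* function names below are assumptions for this sketch, not a proposed runtime.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
	size_t count;            /* number of references currently held */
	void (*destroy)(void *); /* runs on the payload when count hits zero */
	void *payload;
} counted;

/* The creator owns the one initial reference. */
static counted *counted_new(void *payload, void (*destroy)(void *)) {
	counted *c = malloc(sizeof *c);
	if (!c)
		return NULL;
	c->count = 1;
	c->destroy = destroy;
	c->payload = payload;
	return c;
}

/* Take an additional reference. */
static void counted_retain(counted *c) {
	++c->count;
}

/* Give up one reference. Returns 1 if that destroyed the payload. */
static int counted_release(counted *c) {
	if (--c->count > 0)
		return 0;
	c->destroy(c->payload);
	free(c);
	return 1;
}

/* Returns a counted copy of s; the caller owns the single reference. */
static counted *counted_strdup(const char *s) {
	size_t size = strlen(s) + 1;
	char *copy = malloc(size);
	if (!copy)
		return NULL;
	memcpy(copy, s, size);
	counted *c = counted_new(copy, free);
	if (!c)
		free(copy);
	return c;
}
```

Reading c->payload directly corresponds to unpacking the payload without touching the count.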

Identifier mangling

A system for mapping arbitrary strings into the value space of standard C identifiers will be used for the purposes of:

  • Allowing the use of invalid characters in identifiers, such as `identifier containing spaces`.
  • Allowing the use of C reserved words as identifiers, such as `if`.
  • Working around previously defined identifiers or macro names, such as __FILE__.
  • Inserting structural information into identifiers for namespacing purposes, such as encoding top.middle.bottom in the same way as `top`.`middle`.`bottom` but not the same way as `top.middle.bottom`.

The possibilities are many, including

  • For any character outside [A-Za-z0-9], substitute the shortest of _XX, _uXXXX, or _xXXXXXX encoding the character's codepoint in 2, 4, or 6 lowercase hex digits, or specify __ for a namespace separator.
  • Adapt punycode to treat [A-Za-z0-9] as pass-through characters. A table of substitutable characters starts at 0, for the namespace separator, followed by all characters [\x00-z]&&[^A-Za-z0-9] in order, which is a small mapping, followed by all characters above 'z', whose place in the table is equal to their codepoints minus a fixed offset.

Example: One such schema

The code below demonstrates a mangling mechanism by encoding into C some typedefs of native C types into the theoretical Cifl.`builtin-types` namespace. The encoding is as follows:

  • Each namespace component is transformed so that it contains no characters outside [A-Za-z0-9_].
    • Letters and numbers are passed literally.
    • _ is escaped as __.
    • The space is escaped as _s.
    • Other printable characters in ASCII are encoded as _X, where X is some ASCII letter, according to a table in the code.
    • Non-printable characters (other than U+20) and characters outside of ASCII are encoded as _NXXXX where N is some digit 1-6 and XXXX is a sequence of N lower-case hex digits.
      • XXXX must be the narrowest possible representation of the codepoint that is at least one digit (for example, U+8A is encoded _28a but never _4008a or _600008a).
    • If the namespace component leads with a digit 0-9, it is prefixed with _0.
      • _0 decodes to the empty string; it is only valid in this context.
  • Each transformed namespace component except the last is prefixed with its encoded length as a decimal number in its shortest representation with at least one digit. The component "0" encodes the empty string and is the only component that may begin with "0".
    • e.g. "Cifl" -> "4Cifl"
    • e.g. "" -> "0"
    • e.g. "abcde-fghij" -> "abcde_mfghij" -> "12abcde_mfghij"
  • The components are concatenated together and prefixed with a legal C identifier; in this case, _c1M9_.
    • This prefix is required to turn the digit-led string into a C identifier.
    • This prefix also exempts the identifier from potentially conflicting with reserved words and existing identifiers if the final count prefix is elided (see below); e.g. encoding "if" as "_c1M9_if" makes it a valid identifier.

This is not the most compact encoding possible, and its lexical space is significantly larger than its value space, but the encoding is reversible and simple to implement. Reversing the process:

For the whole name:

  • Remove the identifier prefix (_c1M9_).
  • Repeat:
    • If taking lead 0, accept a component "" and redo.
    • If taking lead [1-9][0-9]*, parse as a decimal integer N. Take the following N characters and parse as a component and redo.
    • Else, take remainder of input, parse as a component, end.

For each component parsed above:

  • If lead is _0:
    • If lead is _0[0-9], skip _0.
    • Otherwise, reject.
  • Repeat until no characters remain:
    • If taking lead [A-Za-z0-9]+, accept characters, redo.
    • If taking lead _, proceed to the following. Else, reject.
    • If taking lead [1-6], parse as a decimal integer N, take the following N characters X. If X does not match ^[1-9a-f][0-9a-f]*$, reject.
      • Parse X as a hexadecimal integer. If this value is not a valid Unicode codepoint or if it is 0x20 or any printable character below 0x80, reject.
      • Accept the codepoint as a character.
    • If taking lead [A-Za-z_] where the character is an encoded form in the encoding table, accept its decoded counterpart as a character.
    • Otherwise, reject.
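
The per-component decoding steps above can be sketched in C as follows. This is a minimal version under a few assumptions: the function and table names are made up, the output is an array of code points rather than an encoded string, and "valid Unicode codepoint" is taken to mean no greater than 0x10FFFF. The two tables mirror the %short_mangles table of the Perl script.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Escape letters (the character after '_') and what each decodes to, in
 * matching order. */
static const char ENC[] = "bHdPSDBTpmcMJjlegqaxyXYVWzZk_sAOt";
static const char DEC[] = "!#$%'\"`*+-,.:;<=>?@()[]{}/\\^_ &|~";

static int is_alnum_ascii(char c) {
	return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
		(c >= '0' && c <= '9');
}

/* Decodes one mangled component of len characters into out (capacity cap).
 * Returns the number of code points written, or -1 if rejected. */
static long demangle_component(const char *in, size_t len,
	uint32_t *out, size_t cap)
{
	size_t i = 0, n = 0;
	if (len >= 2 && in[0] == '_' && in[1] == '0') {
		if (len < 3 || in[2] < '0' || in[2] > '9')
			return -1;
		i = 2; /* skip the _0 that protects a leading digit */
	}
	while (i < len) {
		uint32_t cp;
		if (is_alnum_ascii(in[i])) {
			cp = (unsigned char)in[i++];
		} else if (in[i] == '_' && i + 1 < len) {
			char e = in[i + 1];
			i += 2;
			if (e >= '1' && e <= '6') {
				size_t digits = (size_t)(e - '0');
				if (len - i < digits || in[i] == '0')
					return -1; /* truncated, or not the narrowest form */
				cp = 0;
				for (size_t k = 0; k < digits; ++k) {
					char h = in[i++];
					if (h >= '0' && h <= '9')
						cp = cp * 16 + (uint32_t)(h - '0');
					else if (h >= 'a' && h <= 'f')
						cp = cp * 16 + (uint32_t)(h - 'a' + 10);
					else
						return -1;
				}
				if (cp > 0x10FFFF || (cp >= 0x20 && cp < 0x7F))
					return -1; /* out of range, or has a short form */
			} else {
				const char *p = e ? strchr(ENC, e) : NULL;
				if (!p)
					return -1;
				cp = (unsigned char)DEC[p - ENC];
			}
		} else {
			return -1;
		}
		if (n >= cap)
			return -1;
		out[n++] = cp;
	}
	return (long)n;
}
```

Splitting the whole name into components (stripping the prefix and reading the decimal length prefixes) would sit in front of this and is left out to keep the sketch short.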


#! /usr/bin/perl

use 5.010;
use warnings;
use strict;
use Carp;

my %short_mangles = (
	'!' => 'b',
	'#' => 'H',
	'$' => 'd',
	'%' => 'P',

	"'" => 'S',
	'"' => 'D',
	'`' => 'B',

	'*' => 'T',
	'+' => 'p',
	'-' => 'm',

	',' => 'c',
	'.' => 'M',
	
	':' => 'J',
	';' => 'j',
	
	'<' => 'l',
	'=' => 'e',
	'>' => 'g',

	'?' => 'q',
	'@' => 'a',

	'(' => 'x',
	')' => 'y',
	'[' => 'X',
	']' => 'Y',
	'{' => 'V',
	'}' => 'W',

	'/' => 'z',
	'\\' => 'Z',

	'^' => 'k',
	'_' => '_',
	' ' => 's',

	'&' => 'A',
	'|' => 'O',
	'~' => 't',
);

sub mangle_char {
	my $char = shift;

	my $short_mangle = $short_mangles{$char};

	if(defined $short_mangle) {
		return "_$short_mangle";
	}
	else {
		my $n = ord($char);
		my $x = sprintf('%x', $n);
		return '_' . length($x) . $x;
	}
};

sub mangle_all {
	my @names = (@_);
	my @out = ("_c1M9_");
	for(@names) {
		s/([^A-Za-z0-9])/mangle_char($1)/esg;
		s/^([0-9])/_0$1/;
	}
	my $last = pop @names;
	push @out, (map { length($_) . $_ } @names), $last;
	return join("", @out);
};

my @mappings = (
	cchar => 'char',
	scchar => 'signed char',
	ucchar => 'unsigned char',
	short => 'short',
	ushort => 'unsigned short',
	int => 'int',
	uint => 'unsigned int',
	long => 'long',
	ulong => 'unsigned long',
	longlong => 'long long',
	ulonglong => 'unsigned long long',
);

for(8,16,32,64) {
	push @mappings, "int$_" => "int${_}_t";
	push @mappings, "uint$_" => "uint${_}_t";
}

my @ns = ("Cifl", "builtin-types");

my $hdrid = "HDR_" . mangle_all(@ns);

say "";
say "#ifndef $hdrid";
say "#define $hdrid";
say "";
say "#include <stdint.h>";
say "";

{
	my @m = @mappings;
	while(@m) {
		my $name = shift(@m);
		my $mapping = shift(@m);
		say "typedef $mapping " . mangle_all(@ns,$name) . ";"
	}
}

say "";
say "#endif // $hdrid";

Namespaces

Cifl.lang

TODO

extern {
	namespace Cifl.lang {
		typedef void : Cifl.c.unknown.void;
		typedef bool : @header("stdbool") Cifl.c.unknown.bool;
		typedef cchar : Cifl.c.unknown.char;
		typedef scchar : Cifl.c.unknown.`signed char`;
		typedef ucchar : Cifl.c.unknown.`unsigned char`;
		typedef short : Cifl.c.unknown.short;
		typedef ushort : Cifl.c.unknown.`unsigned short`;
		typedef int : Cifl.c.unknown.int;
		typedef uint : Cifl.c.unknown.`unsigned int`;
		typedef long : Cifl.c.unknown.long;
		typedef ulong : Cifl.c.unknown.`unsigned long`;
		typedef llong : Cifl.c.unknown.`long long`;
		typedef ullong : Cifl.c.unknown.`unsigned long long`;
		typedef float : Cifl.c.unknown.float;
		typedef double : Cifl.c.unknown.double;
		typedef ldouble : Cifl.c.unknown.`long double`; 
		
		typedef size_t : Cifl.c.unknown.size_t;
 
		typedef int8 : @header("stdint") Cifl.c.unknown.int8_t;
		typedef int16 : @header("stdint") Cifl.c.unknown.int16_t;
		typedef int32 : @header("stdint") Cifl.c.unknown.int32_t;
		typedef int64 : @header("stdint") Cifl.c.unknown.int64_t;
		typedef uint8 : @header("stdint") Cifl.c.unknown.uint8_t;
		typedef uint16 : @header("stdint") Cifl.c.unknown.uint16_t;
		typedef uint32 : @header("stdint") Cifl.c.unknown.uint32_t;
		typedef uint64 : @header("stdint") Cifl.c.unknown.uint64_t;
 
		typedef byte : uint8;
 
		val null : ref[void] = @header("stddef") Cifl.c.unknown.NULL;
	}
}

Externs and imports

NOTE: After thinking about this for a while, I don't think that extern headers should necessarily be part of the language. Instead, the typical Cifl source compile will result in a C source, to be compiled in C, and a serialized external symbols database file from which a Cifl compiler can determine the existence of other Cifl modules. An extern file would also be used to make existing C externs addressable.

There's no need to make it fully human-readable, I think, because it's not typically going to be something written by a human. Rather, it should be fairly readily machine-readable and should be a reflection of metadata structures or objects useful to the compiler. A trans-infrastructure format such as JSON would be nice for more general information, such as declarations for primitives and the standard C library, but implementation-specific conversions might not be a bad thing, either. For example, it would be nice to be able to automatically write a list of library dependencies based on what functions and namespaces are used, and this would be possible by annotating declarations to require a given library (along with a given header). But these libraries and their nomenclature may vary across platforms, so some flexibility is in order.

So, the parts of the following that deal with an actual syntax are obsolete, but similar structures are likely to show up in the extern file.

Extern headers

Extern header files tell the cifl processor what external entities are available to be imported.

Rather than selecting which of these files to include or exclude, an entire body of these files is expected to be available and individual items from the body could be included using the import declarations in the source files.

The point of the extern headers is to tell Cifl how to treat certain names. Cifl doesn't read C headers, so it knows only what it's told explicitly about the C environment. In truth, this is only a contract of how Cifl behaves with a name. For example, Cifl has no real concept of macros, but you can declare a macro in a C header and extern to it as if it were a fully typed function. (This may be a way that generic/templated functionality is implemented.)

Example

The following mockup shows types that would be automatically available via the Cifl.lang namespace, as well as a series of functions available from (GNU) string.h.

  • The @cRename annotation indicates the actual C name of the item being imported as an identifier. By default, the C name of an item is its fully qualified name encoded into a mangled name. The name given must be either
    • a legal C identifier (e.g. foo), or
    • one or more of the C type keywords void, char, short, int, long, float, double, signed, unsigned, _Bool, or _Complex separated only by single spaces and with no leading or trailing spaces (e.g. unsigned long int).
      • As with any Cifl identifier that isn't legal on its own, the multi-word identifier is enclosed in backticks.
      • While any of the words may appear in any order, the hosting C environment ultimately determines the validity of the combination; sequences that have no legal meaning in C, such as unsigned void long char, are treated as legal but will (probably) cause the resulting C program not to compile. The compiler should probably warn about sequences that look very wrong.
      • Externs such as these could be implemented using C macros; for example, the strcpy declaration in this example might be implemented using #define _CIFL_4Cifl1C6stringstrcpy strcpy, where the name is the mangled form of Cifl.C.string.strcpy.
      • Typedef "externs" could optionally be realized through actual typedefs instead; the declaration for ulong in this example might be implemented by typedef unsigned long _CIFL_4Cifl4langulong;.
  • The @cHeader annotation indicates that, for a certain item to function, a C header in which it is defined must be included.
extern {
	namespace Cifl.lang {
		typedef @cRename(void) void;
		typedef @cRename(_Bool) bool;
		typedef @cRename(char) cchar;
		typedef @cRename(`signed char`) scchar;
		typedef @cRename(`unsigned char`) ucchar;
		typedef @cRename(short) short;
		typedef @cRename(`unsigned short`) ushort;
		typedef @cRename(int) int;
		typedef @cRename(`unsigned int`) uint;
		typedef @cRename(long) long;
		typedef @cRename(`unsigned long`) ulong;
		typedef @cRename(`long long`) llong;
		typedef @cRename(`unsigned long long`) ullong;
		typedef @cRename(float) float;
		typedef @cRename(double) double;
		typedef @cRename(`long double`) ldouble; 
		typedef @cHeader("stddef") @cRename(size_t) size_t;
 
		typedef @cHeader("stdint") @cRename(int8_t) int8;
		typedef @cHeader("stdint") @cRename(int16_t) int16;
		typedef @cHeader("stdint") @cRename(int32_t) int32;
		typedef @cHeader("stdint") @cRename(int64_t) int64;
		typedef @cHeader("stdint") @cRename(uint8_t) uint8;
		typedef @cHeader("stdint") @cRename(uint16_t) uint16;
		typedef @cHeader("stdint") @cRename(uint32_t) uint32;
		typedef @cHeader("stdint") @cRename(uint64_t) uint64;
 
		typedef byte : uint8;
 
		val @cHeader("stddef") @cRename(NULL) null : ref[void];
	}
	namespace Cifl.C.string {
		typedef @cHeader("string") @cRename(locale_t) locale_t ;
		def @cHeader("string") @cRename(basename) basename(filename : ref[@val cchar]) : ref[@val cchar] ;
		def @cHeader("string") @cRename(bcmp) bcmp(s1 : ref[@val void], s2 : ref[@val void], n : size_t) : int ;
		def @cHeader("string") @cRename(bcopy) bcopy(src : ref[@val void], dest : ref[void], len : size_t) : void ;
		def @cHeader("string") @cRename(bzero) bzero(dest : ref[void], len : size_t) : void ;
		def @cHeader("string") @cRename(ffs) ffs(i : int) : int ;
		def @cHeader("string") @cRename(ffsl) ffsl(l : long) : int ;
		def @cHeader("string") @cRename(ffsll) ffsll(ll : llong) : int ;
		def @cHeader("string") @cRename(index) index(s : ref[@val cchar], c : int) : ref[@val cchar] ;
		def @cHeader("string") @cRename(memccpy) memccpy(dest : @restrict ref[void], src : @restrict ref[@val void], c : int, n : size_t) : ref[void] ;
		def @cHeader("string") @cRename(memchr) memchr(s : ref[@val void], c : int, n : size_t) : ref[@val void] ;
		def @cHeader("string") @cRename(memchr) memchr(s : ref[void], c : int, n : size_t) : ref[void] ;
		def @cHeader("string") @cRename(memcmp) memcmp(s1 : ref[@val void], s2 : ref[@val void], n : size_t) : int ;
		def @cHeader("string") @cRename(memcpy) memcpy(dest : @restrict ref[void], src : @restrict ref[@val void], len : size_t) : ref[void] ;
		def @cHeader("string") @cRename(memfrob) memfrob(s : ref[void], n : size_t) : ref[void] ;
		def @cHeader("string") @cRename(memmem) memmem(haystack : ref[@val void], haystacklen : size_t, needle : ref[@val void], needlelen : size_t) : ref[void] ;
		def @cHeader("string") @cRename(memmove) memmove(dest : ref[void], src : ref[@val void], len : size_t) : ref[void] ;
		def @cHeader("string") @cRename(mempcpy) mempcpy(dest : @restrict ref[void], src : @restrict ref[@val void], len : size_t) : ref[void] ;
		def @cHeader("string") @cRename(memrchr) memrchr(s : ref[@val void], c : int, n : size_t) : ref[@val void] ;
		def @cHeader("string") @cRename(memrchr) memrchr(s : ref[void], c : int, n : size_t) : ref[void] ;
		def @cHeader("string") @cRename(memset) memset(dest : ref[void], ch : int, len : size_t) : ref[void] ;
		def @cHeader("string") @cRename(rawmemchr) rawmemchr(s : ref[@val void], c : int) : ref[@val void] ;
		def @cHeader("string") @cRename(rawmemchr) rawmemchr(s : ref[void], c : int) : ref[void] ;
		def @cHeader("string") @cRename(rindex) rindex(s : ref[@val cchar], c : int) : ref[@val cchar] ;
		def @cHeader("string") @cRename(rindex) rindex(s : ref[cchar], c : int) : ref[cchar] ;
		def @cHeader("string") @cRename(stpcpy) stpcpy(dest : @restrict ref[cchar], src : @restrict ref[@val cchar]) : ref[cchar] ;
		def @cHeader("string") @cRename(stpncpy) stpncpy(dest : ref[cchar], src : ref[@val cchar], n : size_t) : ref[cchar] ;
		def @cHeader("string") @cRename(strcasecmp) strcasecmp(s1 : ref[@val cchar], s2 : ref[@val cchar]) : int ;
		def @cHeader("string") @cRename(strcasecmp_l) strcasecmp_l(s1 : ref[@val cchar], s2 : ref[@val cchar], loc : locale_t) : int ;
		def @cHeader("string") @cRename(strcasestr) strcasestr(haystack : ref[@val cchar], needle : ref[@val cchar]) : ref[@val cchar] ;
		def @cHeader("string") @cRename(strcasestr) strcasestr(haystack : ref[cchar], needle : ref[@val cchar]) : ref[cchar] ;
		def @cHeader("string") @cRename(strcat) strcat(dest : @restrict ref[cchar], src : @restrict ref[@val cchar]) : ref[cchar] ;
		def @cHeader("string") @cRename(strchr) strchr(s : ref[@val cchar], c : int) : ref[@val cchar] ;
		def @cHeader("string") @cRename(strchr) strchr(s : ref[cchar], c : int) : ref[cchar] ;
		def @cHeader("string") @cRename(strchrnul) strchrnul(s : ref[@val cchar], c : int) : ref[@val cchar] ;
		def @cHeader("string") @cRename(strchrnul) strchrnul(s : ref[cchar], c : int) : ref[cchar] ;
		def @cHeader("string") @cRename(strcmp) strcmp(s1 : ref[@val cchar], s2 : ref[@val cchar]) : int ;
		def @cHeader("string") @cRename(strcoll) strcoll(s1 : ref[@val cchar], s2 : ref[@val cchar]) : int ;
		def @cHeader("string") @cRename(strcoll_l) strcoll_l(s1 : ref[@val cchar], s2 : ref[@val cchar], l : locale_t) : int ;
		def @cHeader("string") @cRename(strcpy) strcpy(dest : @restrict ref[cchar], src : @restrict ref[@val cchar]) : ref[cchar] ;
		def @cHeader("string") @cRename(strcspn) strcspn(s : ref[@val cchar], reject : ref[@val cchar]) : size_t ;
		def @cHeader("string") @cRename(strdup) strdup(s : ref[@val cchar]) : ref[cchar] ;
		def @cHeader("string") @cRename(strerror) strerror(errnum : int) : ref[cchar] ;
		def @cHeader("string") @cRename(strerror_l) strerror_l(errnum : int, l : locale_t) : ref[cchar] ;
		def @cHeader("string") @cRename(strerror_r) strerror_r(errnum : int, buf : ref[cchar], buflen : size_t) : ref[cchar] ;
		def @cHeader("string") @cRename(strfry) strfry(string : ref[cchar]) : ref[cchar] ;
		def @cHeader("string") @cRename(strlen) strlen(s : ref[@val cchar]) : size_t ;
		def @cHeader("string") @cRename(strncasecmp) strncasecmp(s1 : ref[@val cchar], s2 : ref[@val cchar], n : size_t) : int ;
		def @cHeader("string") @cRename(strncasecmp_l) strncasecmp_l(s1 : ref[@val cchar], s2 : ref[@val cchar], n : size_t, loc : locale_t) : int ;
		def @cHeader("string") @cRename(strncat) strncat(dest : @restrict ref[cchar], src : @restrict ref[@val cchar], len : size_t) : ref[cchar] ;
		def @cHeader("string") @cRename(strncmp) strncmp(s1 : ref[@val cchar], s2 : ref[@val cchar], n : size_t) : int ;
		def @cHeader("string") @cRename(strncpy) strncpy(dest : @restrict ref[cchar], src : @restrict ref[@val cchar], len : size_t) : ref[cchar] ;
		def @cHeader("string") @cRename(strndup) strndup(string : ref[@val cchar], n : size_t) : ref[cchar] ;
		def @cHeader("string") @cRename(strnlen) strnlen(string : ref[@val cchar], maxlen : size_t) : size_t ;
		def @cHeader("string") @cRename(strpbrk) strpbrk(s : ref[@val cchar], accept : ref[@val cchar]) : ref[@val cchar] ;
		def @cHeader("string") @cRename(strpbrk) strpbrk(s : ref[cchar], accept : ref[@val cchar]) : ref[cchar] ;
		def @cHeader("string") @cRename(strrchr) strrchr(s : ref[@val cchar], c : int) : ref[@val cchar] ;
		def @cHeader("string") @cRename(strrchr) strrchr(s : ref[cchar], c : int) : ref[cchar] ;
		def @cHeader("string") @cRename(strsep) strsep(stringp : @restrict ref[ref[cchar]], delim : @restrict ref[@val cchar]) : ref[cchar] ;
		def @cHeader("string") @cRename(strsignal) strsignal(sig : int) : ref[cchar] ;
		def @cHeader("string") @cRename(strspn) strspn(s : ref[@val cchar], accept : ref[@val cchar]) : size_t ;
		def @cHeader("string") @cRename(strstr) strstr(haystack : ref[@val cchar], needle : ref[@val cchar]) : ref[@val cchar] ;
		def @cHeader("string") @cRename(strstr) strstr(haystack : ref[cchar], needle : ref[@val cchar]) : ref[cchar] ;
		def @cHeader("string") @cRename(strtok) strtok(s : @restrict ref[cchar], delim : @restrict ref[@val cchar]) : ref[cchar] ;
		def @cHeader("string") @cRename(strtok_r) strtok_r(s : @restrict ref[cchar], delim : @restrict ref[@val cchar], save_ptr : @restrict ref[ref[cchar]]) : ref[cchar] ;
		def @cHeader("string") @cRename(strverscmp) strverscmp(s1 : ref[@val cchar], s2 : ref[@val cchar]) : int ;
		def @cHeader("string") @cRename(strxfrm) strxfrm(dest : @restrict ref[cchar], src : @restrict ref[@val cchar], n : size_t) : size_t ;
		def @cHeader("string") @cRename(strxfrm_l) strxfrm_l(dest : ref[cchar], src : ref[@val cchar], n : size_t, l : locale_t) : size_t ;
	}
}

Auto-generating

See Cifl/Extern generator.

Modeling

See Cifl/Modeling.

Built-in polymorphism

a(b) (as an rvalue)
  • If a is a function, identical to C.
  • If a is an array, as C a[b].
  • If a is a primitive type, as C (a)b.
  • If a is a structure type, as C (a){ b }.
  • If a is a qualified union member X.Y, as C99 (X){ .Y = b }.
a(b) = z
  • If a is an array, as C a[b] = z.
    • a must be one-dimensional. b must be an index (rvalue). z must be assignable to a's component type.
  • If a is a struct type, as b = a(z).<a's first member>.
    • a must be a struct type with one or more members. z must be assignable to a variable of type a. b must be an lvalue that can be assigned a value of the type of a's first member.
    • Could be implemented by mapping the name to a struct member rather than performing assignments.
a(b,c) = z
  • If a is an array, as C a[b][c] = z.
  • If a is a struct type, as { var tmp = a(z); b = tmp.<a's first member>; c = tmp.<a's second member>; }
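
As a rough illustration of the rules above, the following C99 sketch shows what each form might lower to. The types Wrapper and Number and all function names are hypothetical and exist only for this example; only the C forms named in the list come from the rules themselves.

```c
#include <stddef.h>

/* Hypothetical types used only for illustration. */
typedef struct { int x; } Wrapper;
typedef union { int i; double d; } Number;

static int twice(int v) { return 2 * v; }

/* a is a function: twice(b) stays twice(b). */
int call_form(int b) { return twice(b); }

/* a is an array: a(b) becomes a[b]. */
int index_form(const int *a, size_t b) { return a[b]; }

/* a is a primitive type: int(b) becomes (int)b. */
int cast_form(double b) { return (int)b; }

/* a is a structure type: Wrapper(b) becomes (Wrapper){ b }. */
Wrapper struct_form(int b) { return (Wrapper){ b }; }

/* a is a qualified union member: Number.d(b) becomes (Number){ .d = b }. */
Number union_form(double b) { return (Number){ .d = b }; }

/* Lvalue form with an array: a(b) = z becomes a[b] = z. */
void store_form(int *a, size_t b, int z) { a[b] = z; }

/* Lvalue form with a struct type: Wrapper(b) = z becomes b = z.x,
   so b receives the first member of the Wrapper value. */
void destructure_form(int *b, Wrapper z) { *b = z.x; }
```

The compound literals in struct_form and union_form require C99 or later.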

Genericity

Type erasure

Full templating may not be practical for code that has already been compiled and/or translated. Fully reifying templates at compile time may also be undesirable when there are many different instantiations, since each one adds to the size of the generated code.

Type erasure, i.e. using a single supertype-based implementation for all subtypes, is the solution Java uses for backward compatibility. With type erasure, all type safety is enforced at compile time only, so it adds no work at run time. One often-cited drawback is that run-time type information is lost, but since our target is C that information is generally not present anyway.

A good example from the C library itself is the pair of functions in stdlib.h that search and sort arrays, bsearch() and qsort(). Their C signatures would look roughly like the following in Cifl (adapted from C99 sec 7.20.5, param names changed for clarity):

def bsearch(
	key : @val ref[void],
	array : @val ref[void],
	array_length : size_t,
	array_element_size : size_t,
	compar : def @raw (@val ref[void], @val ref[void]) : int
) : ref[void] { ... }

def qsort(
	array : @val ref[void],
	array_length : size_t,
	array_element_size : size_t,
	compar : def @raw (@val ref[void], @val ref[void]) : int
) : void { ... }

The spec states that array points to an array of array_length elements, each of which is array_element_size bytes. key, where present, points to an object that may be equal to an element of array. compar accepts two pointers to elements of array (for qsort()) or a pointer to key and a pointer to one element of array (for bsearch()).
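
For comparison, here is a minimal sketch of how these two functions are typically called from plain C. The helper names compare_ints, sort_ints and find_int are made up for this example; qsort() and bsearch() are the standard stdlib.h functions.

```c
#include <stdlib.h>

/* Comparator in the erased form: both arguments arrive as const void *. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sorts an int array in place; the element size is passed by hand. */
void sort_ints(int *array, size_t array_length)
{
    qsort(array, array_length, sizeof array[0], compare_ints);
}

/* Returns a pointer to an element equal to key, or NULL if none exists.
   The array must already be sorted with the same comparator. */
int *find_int(int key, const int *array, size_t array_length)
{
    return bsearch(&key, array, array_length, sizeof array[0], compare_ints);
}
```

Nothing in these calls ties the element size or the comparator to the array's element type: passing sizeof(long) or a comparator written for double would still compile.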

Since all of these things refer to the same element type, it would be nice to have Cifl type-check them for us. Refer now to these rewritten definitions:

def bsearch[T](
	key : @val ref[T],
	array : @val ref[T],
	array_length : size_t,
	array_element_size : size_t = sizeof[T],
	compar : def @raw (@val ref[T], @val ref[T]) : int
) : ref[T] { ... }

def qsort[T](
	array : @val ref[T],
	array_length : size_t,
	array_element_size : size_t = sizeof[T],
	compar : def @raw (@val ref[T], @val ref[T]) : int
) : void { ... }

When used properly, these functions should in general produce exactly the same code as before, but compilation will fail if the types are not consistent, even where the same code would compile in raw C.
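
One way to picture the effect in C is a macro that stamps out a typed entry point around the erased qsort(). This is only a sketch of a possible lowering, not a description of what a Cifl translator would actually emit; DEFINE_TYPED_QSORT, compare_doubles and qsort_double are hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical lowering of qsort[T]. The macro emits an adapter with the
   erased signature plus a typed entry point that supplies sizeof(T).
   Passing a comparator that does not take two const T * arguments to the
   adapter is diagnosed by the C compiler. */
#define DEFINE_TYPED_QSORT(T, name, typed_compar)                   \
    static int name##_adapter(const void *a, const void *b)         \
    {                                                               \
        return typed_compar((const T *)a, (const T *)b);            \
    }                                                               \
    void name(T *array, size_t array_length)                        \
    {                                                               \
        qsort(array, array_length, sizeof(T), name##_adapter);      \
    }

/* A comparator written against the element type, not const void *. */
static int compare_doubles(const double *a, const double *b)
{
    return (*a > *b) - (*a < *b);
}

/* Instantiation for T = double. */
DEFINE_TYPED_QSORT(double, qsort_double, compare_doubles)
```

Unlike true erasure, this sketch adds one small adapter per instantiation; it is shown only to make the compile-time check and the defaulted element size concrete.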

For type erasure to work, every type substituted for a type parameter must share a common supertype that the single implementation can operate on. C doesn't provide a great deal of built-in type polymorphism, and Cifl embraces this to a degree.

For purposes of type erasure, a void pointer (ref[void]) is a supertype of any data (non-function) pointer type. Data pointer types are the most natural targets since they are the most freely converted. bsearch() and qsort() above are an example of this usage.

If the candidate types include anything other than data pointers, a union must be used to enumerate them, resulting in semi-generic code (generic code whose instantiation possibilities are finite).

union SomeNumberType(int, long, double);

// With no generics, any combination of types can be added.
// The return value may also be any type in the union.
def addAnyTwo(a : SomeNumberType, b : SomeNumberType) : SomeNumberType { ... }

// With generics, it's possible to fix the types of the arguments
// and the result to be the same.
def addSameTypedTwo[T <: SomeNumberType](a : T, b : T) : T { ... }
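
In C terms, such a union might be represented as a tagged union. The sketch below is an assumption about one possible representation, not a settled design; SomeNumber, NumberTag and the function names are invented for the example, and the choice to return a double from add_any_two is arbitrary.

```c
/* Hypothetical C representation of union SomeNumberType(int, long, double). */
typedef enum { NUM_INT, NUM_LONG, NUM_DOUBLE } NumberTag;

typedef struct {
    NumberTag tag;
    union { int i; long l; double d; } as;
} SomeNumber;

static double to_double(SomeNumber n)
{
    switch (n.tag) {
    case NUM_INT:  return (double)n.as.i;
    case NUM_LONG: return (double)n.as.l;
    default:       return n.as.d;
    }
}

/* addAnyTwo: any mix of alternatives is accepted.
   This sketch always answers with the double alternative. */
SomeNumber add_any_two(SomeNumber a, SomeNumber b)
{
    SomeNumber r = { .tag = NUM_DOUBLE, .as.d = to_double(a) + to_double(b) };
    return r;
}

/* addSameTypedTwo: assumes the compile-time check has already guaranteed
   that a and b hold the same alternative, so the result keeps that tag. */
SomeNumber add_same_typed_two(SomeNumber a, SomeNumber b)
{
    SomeNumber r = { .tag = a.tag };
    switch (a.tag) {
    case NUM_INT:    r.as.i = a.as.i + b.as.i; break;
    case NUM_LONG:   r.as.l = a.as.l + b.as.l; break;
    case NUM_DOUBLE: r.as.d = a.as.d + b.as.d; break;
    }
    return r;
}
```

In the generic version the matching-tag guarantee would come from the type checker; plain C cannot express it, which is why add_same_typed_two simply trusts its caller here.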

Interface polymorphism could be useful here.

Objects and interfaces

Cifl will implement an object system that is in many ways simpler than the more familiar ones.

  • A closure is conceptually a pair consisting of an implementation function and a context.
    • In C, this is likely to be implemented as a structure containing a void* (the context) and a function pointer (the implementation) that accepts the context as its first parameter.
    • If the closure's context is counted or disposable, the closure itself might be as well, or it may simply be understood that, since the function pointer is never disposable, the context pointer itself should be counted instead.
  • An object is an aggregate of closures.
    • An object is likely to be counted and to own a reference to each of its closures.
    • Because the methods are implemented as closures,
      • There is no single "this" object. The context of the object's methods is the lexical context where they were instantiated.
      • The thing called the "object" is a table of closures which do not necessarily depend on it. A method can usually be arbitrarily detached to be used as a closure or even aggregated into other objects.
  • A trait is a set of method signatures declared in the same place.
    • A trait may be fully unimplemented, like a Java interface.
    • A trait may also be partially or fully implemented, like a Scala trait.
    • An object does not simply claim to implement a trait; when implementing methods, it supplies the qualified names (using import where handy) of the interface methods. From there, the system determines whether this is sufficient.
    • Two traits may be composed into a third trait. If a method name conflicts between the two, the code must explicitly either rename (to keep exposed) or exclude one of the implementations; the other then fulfills the contract for both traits being combined.
      • This does not establish a parent-child relationship. In fact, any object that implements the first two traits (and any renamed methods) also implements the third.
      • Since conflicts are explicitly resolved, no resolution order is necessary.
    • A trait may be reduced to a subset of its methods.
      • For example, a trait A may be reduced to a trait B by specifying which methods to keep (or, equivalently, which to omit). Any A is automatically a B.
    • A parameter expecting an object specifies its type by trait. Any object implementing all of the methods in the trait, no matter in which way or order those methods are derived, implements the trait itself.
      • For example, assuming no renames, say trait X aggregates A and B, trait Y aggregates C and D, and trait Z aggregates A, B, C, and D directly (not by way of X or Y). In an inheritance-based system, a Z is not an X or a Y, but in this composition-based system, it is both an X (because it implements A and B) and a Y (because it implements C and D).
  • An object that properly implements all methods of a trait can be used as an "object" of that trait.
    • Behind the scenes, if the object is not already in the expected structure/layout, the program may copy the applicable closure pointers into an empty structure. Another possibility is that an object is an array of closures with a different lookup table depending on the trait being used, which would involve less up-front copying but more work on each (new) member access.
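
The closure and object model described above can be sketched in C roughly as follows. Everything here is illustrative: IntClosure, Counter, Readable and the function names are hypothetical, all methods are forced to one signature for brevity, and reference counting of contexts is left out entirely.

```c
#include <stddef.h>

/* A closure: a context pointer plus a function that receives it first. */
typedef struct {
    void *context;
    int (*invoke)(void *context, int argument);
} IntClosure;

int call_closure(IntClosure c, int argument)
{
    return c.invoke(c.context, argument);
}

/* Hypothetical trait with two methods, laid out as a table of closures. */
typedef struct {
    IntClosure add;
    IntClosure get;
} Counter;

/* Implementations see only the captured context, never the Counter table. */
static int counter_add(void *context, int amount)
{
    int *total = context;
    *total += amount;
    return *total;
}

static int counter_get(void *context, int unused)
{
    (void)unused;
    return *(int *)context;
}

/* Builds an object whose methods both close over the same int. */
Counter make_counter(int *storage)
{
    Counter c = {
        .add = { storage, counter_add },
        .get = { storage, counter_get },
    };
    return c;
}

/* A reduced trait keeps a subset of the methods; converting to it
   copies the applicable closures into the smaller layout. */
typedef struct { IntClosure get; } Readable;

Readable counter_as_readable(Counter c)
{
    Readable r = { c.get };
    return r;
}
```

Because each method carries its own context, a closure such as the add member can be copied out of the table and called on its own, which matches the point that methods can be detached from the object.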

Syntax for this is very much pending.