<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://halfgeek.org/wiki/index.php?action=history&amp;feed=atom&amp;title=Graut%2FSerialized_S-expressions</id>
	<title>Graut/Serialized S-expressions - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://halfgeek.org/wiki/index.php?action=history&amp;feed=atom&amp;title=Graut%2FSerialized_S-expressions"/>
	<link rel="alternate" type="text/html" href="https://halfgeek.org/wiki/index.php?title=Graut/Serialized_S-expressions&amp;action=history"/>
	<updated>2026-05-28T12:05:27Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.34.0</generator>
	<entry>
		<id>https://halfgeek.org/wiki/index.php?title=Graut/Serialized_S-expressions&amp;diff=2252&amp;oldid=prev</id>
		<title>Psmay: Created page with &quot;:&#039;&#039;This is a reworking of a similar scheme discussed in Cifl/Protocifl/Name-encoded_trees is one possibility for a byte-string representation. That encoding is designed sp...&quot;</title>
		<link rel="alternate" type="text/html" href="https://halfgeek.org/wiki/index.php?title=Graut/Serialized_S-expressions&amp;diff=2252&amp;oldid=prev"/>
		<updated>2015-02-11T16:11:45Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;:&amp;#039;&amp;#039;This is a reworking of a similar scheme discussed in &lt;a href=&quot;/wiki/Cifl/Protocifl/Name-encoded_trees&quot; title=&quot;Cifl/Protocifl/Name-encoded trees&quot;&gt;Cifl/Protocifl/Name-encoded_trees&lt;/a&gt; is one possibility for a byte-string representation. That encoding is designed sp...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;:&amp;#039;&amp;#039;This is a reworking of a similar scheme discussed in [[Cifl/Protocifl/Name-encoded_trees]] is one possibility for a byte-string representation. That encoding is designed specifically for S-expressions with a list at the root and only byte strings and other lists as elements.&lt;br /&gt;
&lt;br /&gt;
==Synopsis==&lt;br /&gt;
&lt;br /&gt;
For various purposes, it will be useful to convert an S-expression into an unambiguous string of bytes.&lt;br /&gt;
&lt;br /&gt;
A simple base encoding provides for the serialization of three kinds of element:&lt;br /&gt;
&lt;br /&gt;
* A &amp;#039;&amp;#039;&amp;#039;list&amp;#039;&amp;#039;&amp;#039;, which is a sequence of zero or more other elements&lt;br /&gt;
* An &amp;#039;&amp;#039;&amp;#039;atom&amp;#039;&amp;#039;&amp;#039;, which in this instance is a string that contains zero or more bytes and where the value 0x00 is a valid byte and not a terminator&lt;br /&gt;
* A &amp;#039;&amp;#039;&amp;#039;deser&amp;#039;&amp;#039;&amp;#039; (or &amp;#039;&amp;#039;&amp;#039;deserialization operator&amp;#039;&amp;#039;&amp;#039;), which is a cue for a deserializer to treat the following list specially.&lt;br /&gt;
&lt;br /&gt;
==An abstract alphabet==&lt;br /&gt;
&lt;br /&gt;
The serialization uses an alphabet of 261 possible symbols. A symbol is any byte or one of the following non-byte values:&lt;br /&gt;
&lt;br /&gt;
* An atom terminator, ‡ in this description&lt;br /&gt;
* A list starter, ↓ in this description&lt;br /&gt;
* An list terminator, ↑ in this description&lt;br /&gt;
* An abbreviation for ↑↓, ↕ in this description&lt;br /&gt;
* A deser operator, € in this description&lt;br /&gt;
&lt;br /&gt;
Note that these symbols are abstract and not intended to have any association with their Unicode values.&lt;br /&gt;
&lt;br /&gt;
==Canonical serialization/deserialization==&lt;br /&gt;
&lt;br /&gt;
Consider the list&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
(alpha #(+ 4 8) mu #(= 1 5) omega)&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We do not wish to serialize unresolved function calls, so those calls are forced. The result is something like&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
(&lt;br /&gt;
  alpha&lt;br /&gt;
  #( $((SYSTEM number) &amp;quot;parse-canonical&amp;quot;) &amp;quot;12&amp;quot; &amp;quot;1&amp;quot; )&lt;br /&gt;
  mu&lt;br /&gt;
  #( $((SYSTEM boolean) &amp;quot;parse-canonical&amp;quot;) &amp;quot;false&amp;quot; )&lt;br /&gt;
  omega&lt;br /&gt;
)&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The values &amp;quot;alpha&amp;quot;, &amp;quot;mu&amp;quot;, and &amp;quot;omega&amp;quot; evaluate to atoms. The other two items are neither lists nor atoms but extended data type values—for the types (SYSTEM number) and (SYSTEM boolean), respectively.&lt;br /&gt;
&lt;br /&gt;
Consider the possibility that each type has a &amp;quot;parse-canonical&amp;quot; function (this name may or may not be fictional) that accepts a sequence of zero or more simple values, where a simple value is an atom or a list of zero or more simple values. The function ideally rejects as invalid any input that is not exactly of a unique form (for example, accepting &amp;quot;false&amp;quot; but rejecting &amp;quot;FALSE&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
The human-readable serialization of an extended data type value is a call to its type&amp;#039;s &amp;quot;parse-canonical&amp;quot; function. Unfortunately, this only substitutes more unresolved function calls for the previous ones (and adds a layer of variable resolution on top).&lt;br /&gt;
&lt;br /&gt;
For the machine-readable serialization, we might introduce a fictional sigil, which we&amp;#039;ll call &amp;quot;¢&amp;quot;, that cannot represent function calls in general but can represent a type&amp;#039;s &amp;quot;parse-canonical&amp;quot; call specifically. Then, the above might be represented&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
(&lt;br /&gt;
  alpha&lt;br /&gt;
  ¢( (SYSTEM number) &amp;quot;12&amp;quot; &amp;quot;1&amp;quot; )&lt;br /&gt;
  mu&lt;br /&gt;
  ¢( (SYSTEM boolean) &amp;quot;false&amp;quot; )&lt;br /&gt;
  omega&lt;br /&gt;
)&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is sufficient for us to proceed.&lt;br /&gt;
&lt;br /&gt;
==Serialization==&lt;br /&gt;
&lt;br /&gt;
This pseudocode covers initial serialization. &lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
function serialize_list(stream, list)&lt;br /&gt;
  output ↓ to stream&lt;br /&gt;
  for element in list&lt;br /&gt;
    if element is a list&lt;br /&gt;
      serialize_list(stream, element)&lt;br /&gt;
    else if element is an atom&lt;br /&gt;
      output bytes of element to stream&lt;br /&gt;
      output ‡ to stream&lt;br /&gt;
    else # element is extended data type value&lt;br /&gt;
      output € to stream&lt;br /&gt;
      let t = list expressing data type name of element&lt;br /&gt;
      # e.g. t = [&amp;quot;SYSTEM&amp;quot;, &amp;quot;number&amp;quot;]&lt;br /&gt;
      let p = list expressing deser parameters for element&lt;br /&gt;
      # e.g. p = [&amp;quot;12&amp;quot;, &amp;quot;1&amp;quot;]&lt;br /&gt;
      let d = copy of p with t prepended as a single value&lt;br /&gt;
      # e.g. d = [[&amp;quot;SYSTEM&amp;quot;, &amp;quot;number&amp;quot;], &amp;quot;12&amp;quot;, &amp;quot;1&amp;quot;]&lt;br /&gt;
      serialize_list(stream, d)&lt;br /&gt;
  output ↑ to stream&lt;br /&gt;
&lt;br /&gt;
interface deser_marker&lt;br /&gt;
  property parameters is an appendable list&lt;br /&gt;
&lt;br /&gt;
function deserialize_list(stream)&lt;br /&gt;
  let stack be a mutable stack&lt;br /&gt;
  push [] onto stack&lt;br /&gt;
&lt;br /&gt;
  while stream can be read&lt;br /&gt;
    let current_atom = &amp;quot;&amp;quot;&lt;br /&gt;
    read symbol from stream&lt;br /&gt;
&lt;br /&gt;
    while symbol is a byte&lt;br /&gt;
      append symbol to current_atom&lt;br /&gt;
      read symbol from stream&lt;br /&gt;
&lt;br /&gt;
    if symbol is ‡&lt;br /&gt;
      let t = item at top of stack&lt;br /&gt;
      append current_atom to t&lt;br /&gt;
    else if symbol is ↓&lt;br /&gt;
      let l = []&lt;br /&gt;
      let t = item at top of stack&lt;br /&gt;
      append l to t&lt;br /&gt;
      push l onto stack&lt;br /&gt;
    else if symbol is ↑&lt;br /&gt;
      pop stack&lt;br /&gt;
      assert that stack is not empty (otherwise, too many ascents)&lt;br /&gt;
    else if symbol is €&lt;br /&gt;
      read symbol from stream; assert that it is ↓&lt;br /&gt;
      let d = new deser_marker&lt;br /&gt;
      let t = item at top of stack&lt;br /&gt;
      append d to t&lt;br /&gt;
      push d.parameters onto stack&lt;br /&gt;
&lt;br /&gt;
  pop stack&lt;br /&gt;
  let all = top of stack&lt;br /&gt;
  pop stack&lt;br /&gt;
  assert that stack is empty&lt;br /&gt;
  assert that all contains exactly one element&lt;br /&gt;
  let l = all[0]&lt;br /&gt;
  assert that l is a list&lt;br /&gt;
  return l&lt;br /&gt;
  # Traverse result to resolve deser markers as needed    &lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, consider this list:&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
(hello #(int 12) fizz #(false) (a &amp;quot;&amp;quot; () b c))&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As before, the first thing to do is resolve the function calls, inserting the bogus ¢ sigil as needed:&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
(hello ¢((SYSTEM number) &amp;quot;12&amp;quot; &amp;quot;1&amp;quot;) fizz ¢((SYSTEM boolean) &amp;quot;false&amp;quot;) (a &amp;quot;&amp;quot; () b c))&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following is equivalent; some of the implicit quotes have been added:&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
(&amp;quot;hello&amp;quot; ¢((SYSTEM number) &amp;quot;12&amp;quot; &amp;quot;1&amp;quot;) &amp;quot;fizz&amp;quot; ¢((SYSTEM boolean) &amp;quot;false&amp;quot;) (&amp;quot;a&amp;quot; &amp;quot;&amp;quot; () &amp;quot;b&amp;quot; &amp;quot;c&amp;quot;))&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Serialization would go something like:&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;nowiki&amp;gt;&lt;br /&gt;
  # The root&lt;br /&gt;
  ( &amp;quot;hello&amp;quot; ¢ ( ( SYSTEM  number ) &amp;quot;12&amp;quot; &amp;quot;1&amp;quot; ) &amp;quot;fizz&amp;quot; ¢ ( ( SYSTEM  boolean ) &amp;quot;false&amp;quot; ) ( &amp;quot;a&amp;quot; &amp;quot;&amp;quot; ( ) &amp;quot;b&amp;quot; &amp;quot;c&amp;quot; ) )&lt;br /&gt;
  # Convert byte strings to terminated form&lt;br /&gt;
  (  hello‡ ¢ ( ( SYSTEM‡ number‡ ) 12‡  1‡ )  fizz‡ ¢ ( ( SYSTEM‡ boolean‡ ) false‡ ) (  a‡  ‡ ( )  b‡  c‡ ) )&lt;br /&gt;
  # Convert ascents and descents&lt;br /&gt;
  ↓  hello‡ ¢ ↓ ↓ SYSTEM‡ number‡ ↑ 12‡  1‡ ↑  fizz‡ ¢ ↓ ↓ SYSTEM‡ boolean‡ ↑ false‡ ↑ ↓  a‡  ‡ ↓ ↑  b‡  c‡ ↑ ↑&lt;br /&gt;
  # Convert the construction indicator&lt;br /&gt;
  ↓  hello‡ € ↓ ↓ SYSTEM‡ number‡ ↑ 12‡  1‡ ↑  fizz‡ € ↓ ↓ SYSTEM‡ boolean‡ ↑ false‡ ↑ ↓  a‡  ‡ ↓ ↑  b‡  c‡ ↑ ↑&lt;br /&gt;
  # The same displayed without spaces&lt;br /&gt;
  ↓hello‡€↓↓SYSTEM‡number‡↑12‡1‡↑fizz‡€↓↓SYSTEM‡boolean‡↑false‡↑↓a‡‡↓↑b‡c‡↑↑&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Compression==&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;nowiki&amp;gt;&lt;br /&gt;
function compress_serialized_list(symbols)&lt;br /&gt;
  with symbols&lt;br /&gt;
    # deser implicitly descends twice&lt;br /&gt;
    replace all instances of €↓↓ with €&lt;br /&gt;
    # a non-empty byte string implicitly ends at the next non-byte&lt;br /&gt;
    remove all instances of ‡ that immediately follow a byte and immediately precede ↑, ↓, or €&lt;br /&gt;
    # the root is implicitly a list&lt;br /&gt;
    remove one ↓ at beginning&lt;br /&gt;
    # the list is implicitly fully closed&lt;br /&gt;
    remove all ↑ at end&lt;br /&gt;
    # adjacent lists are partly abbreviated&lt;br /&gt;
    replace ↑↓ with ↕&lt;br /&gt;
  return symbols&lt;br /&gt;
&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;nowiki&amp;gt;&lt;br /&gt;
# (changes are called out with angle brackets)&lt;br /&gt;
&lt;br /&gt;
# Replace €↓↓ with €&lt;br /&gt;
# (constructor implicitly starts with two descents)&lt;br /&gt;
↓hello‡&amp;lt;€↓↓&amp;gt;SYSTEM‡number‡↑12‡1‡↑fizz‡&amp;lt;€↓↓&amp;gt;SYSTEM‡boolean‡↑false‡↑↓a‡‡↓↑b‡c‡↑↑&lt;br /&gt;
↓hello‡&amp;lt;€  &amp;gt;SYSTEM‡number‡↑12‡1‡↑fizz‡&amp;lt;€  &amp;gt;SYSTEM‡boolean‡↑false‡↑↓a‡‡↓↑b‡c‡↑↑&lt;br /&gt;
↓hello‡€SYSTEM‡number‡↑12‡1‡↑fizz‡€SYSTEM‡boolean‡↑false‡↑↓a‡‡↓↑b‡c‡↑↑&lt;br /&gt;
&lt;br /&gt;
# Wherever ‡ follows a byte and precedes [↑↓€], omit it as implicit.&lt;br /&gt;
# (Note that it must actually follow a byte—this doesn&amp;#039;t apply with an empty string.)&lt;br /&gt;
↓hello&amp;lt;‡&amp;gt;€SYSTEM‡number&amp;lt;‡&amp;gt;↑12‡1&amp;lt;‡&amp;gt;↑fizz&amp;lt;‡&amp;gt;€SYSTEM‡boolean&amp;lt;‡&amp;gt;↑false&amp;lt;‡&amp;gt;↑↓a‡‡↓↑b‡c&amp;lt;‡&amp;gt;↑↑&lt;br /&gt;
↓hello&amp;lt; &amp;gt;€SYSTEM‡number&amp;lt; &amp;gt;↑12‡1&amp;lt; &amp;gt;↑fizz&amp;lt; &amp;gt;€SYSTEM‡boolean&amp;lt; &amp;gt;↑false&amp;lt; &amp;gt;↑↓a‡‡↓↑b‡c&amp;lt; &amp;gt;↑↑&lt;br /&gt;
↓hello€SYSTEM‡number↑12‡1↑fizz€SYSTEM‡boolean↑false↑↓a‡‡↓↑b‡c↑↑&lt;br /&gt;
&lt;br /&gt;
# Remove initial ↓ and final ↑+&lt;br /&gt;
# (root is implicitly a list and is implicitly closed at any depth at the end)&lt;br /&gt;
&amp;lt;↓&amp;gt;hello€SYSTEM‡number↑12‡1↑fizz€SYSTEM‡boolean↑false↑↓a‡‡↓↑b‡c&amp;lt;↑↑&amp;gt;&lt;br /&gt;
&amp;lt; &amp;gt;hello€SYSTEM‡number↑12‡1↑fizz€SYSTEM‡boolean↑false↑↓a‡‡↓↑b‡c&amp;lt;  &amp;gt;&lt;br /&gt;
hello€SYSTEM‡number↑12‡1↑fizz€SYSTEM‡boolean↑false↑↓a‡‡↓↑b‡c&lt;br /&gt;
&lt;br /&gt;
# Replace ↑↓ with ↕&lt;br /&gt;
hello€SYSTEM‡number↑12‡1↑fizz€SYSTEM‡boolean↑false&amp;lt;↑↓&amp;gt;a‡‡↓↑b‡c&lt;br /&gt;
hello€SYSTEM‡number↑12‡1↑fizz€SYSTEM‡boolean↑false&amp;lt;↕ &amp;gt;a‡‡↓↑b‡c&lt;br /&gt;
hello€SYSTEM‡number↑12‡1↑fizz€SYSTEM‡boolean↑false↕a‡‡↓↑b‡c&amp;lt;/nowiki&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To encode exclusively with characters in the set [A-Za-z0-9_], this algorithm is provided:&lt;br /&gt;
&lt;br /&gt;
Assign these values to the elements of the alphabet:&lt;br /&gt;
&lt;br /&gt;
* A byte as its numeric value (0 .. 255)&lt;br /&gt;
* ‡ as 357 (i.e., 260 + &amp;#039;a&amp;#039;)&lt;br /&gt;
* ↓ as 358&lt;br /&gt;
* ↑ as 359&lt;br /&gt;
* ↕ as 360&lt;br /&gt;
* € as 361&lt;br /&gt;
&lt;br /&gt;
Replace any value outside [A-Za-z_] with its long form, where the long form is a digit [0-9] followed by an uppercase letter [A-Z] and is determined as follows:&lt;br /&gt;
&lt;br /&gt;
* The value cannot be in [A-Za-z_]; these cannot be encoded in long form.&lt;br /&gt;
* Add 250 - &amp;#039;0&amp;#039;; i.e., add 202. (This maps &amp;#039;0&amp;#039; to the coded value 250.)&lt;br /&gt;
* Take this value modulo 260.&lt;br /&gt;
* Let the digit index and the letter index be the value modulo 10 and the value integer-divided by 10, respectively.&lt;br /&gt;
* The digit and the letter are &amp;#039;0&amp;#039; + the digit index and &amp;#039;A&amp;#039; + the letter index, respectively.&lt;br /&gt;
&lt;br /&gt;
Because of the second step, the digits [0-9] are encoded as themselves followed by &amp;#039;Z&amp;#039; (i.e. &amp;#039;0Z&amp;#039; .. &amp;#039;9Z&amp;#039;).&lt;br /&gt;
&lt;br /&gt;
The values assigned to the special symbols would be the long forms of lowercase letters if those were allowed as input.&lt;br /&gt;
&lt;br /&gt;
This result can be compressed slightly: Replace ([0-9])Z with \1 unless it is followed by [A-Z]. A digit in the input that is not followed by [A-Z] can simply imply the &amp;#039;Z&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Naturally, decompression reverses compression:&lt;br /&gt;
&lt;br /&gt;
* Replace all ↕ with ↑↓.&lt;br /&gt;
* Add a ↓ at the beginning.&lt;br /&gt;
* Add a ↑ at the end.&lt;br /&gt;
** This is needed for the following step to work in all cases.&lt;br /&gt;
* Insert a ‡ after any byte value that is followed by ↑, ↓, or €.&lt;br /&gt;
* Replace all € with €↓↓.&lt;br /&gt;
* Finally, add an additional ↑ at the end for each remaining open list.&lt;/div&gt;</summary>
		<author><name>Psmay</name></author>
		
	</entry>
</feed>