Text and Characters
Unicode text handling with Text and individual Char values.
Overview
Text represents Unicode strings and Char represents individual Unicode code points. Text is opaque in the verifier except for equality, literal identity, and trusted library contracts such as Text.size and Text.concat. Char is represented by its Unicode code point, so simple code-point comparisons and digit-range predicates can be verified.
// Basic Text type usage in contracts
module {
// Text equality in postconditions
public func identity(s : Text) : Text
ensures result == s;
{
s
};
// Text parameter selection
public func choose(cond : Bool, a : Text, b : Text) : Text
ensures cond ==> result == a;
ensures (not cond) ==> result == b;
{
if (cond) { a } else { b }
};
// Text reflexivity - every text equals itself
public func selfEqual(s : Text) : Bool
ensures result == true;
{
s == s
};
}
Text Literals
Text literals use double quotes and support Unicode characters and escape sequences:
let greeting : Text = "Hello, World!";
let unicode : Text = "Unicode: \u{1F600}"; // Emoji
let escaped : Text = "Line 1\nLine 2"; // Newline
let empty : Text = ""; // Empty string
Escape Sequences
| Escape | Meaning |
|---|---|
\n | Newline |
\r | Carriage return |
\t | Tab |
\\ | Backslash |
\" | Double quote |
\u{XXXX} | Unicode code point (hex) |
Unicode escapes must be valid code points. Surrogate values (U+D800 to U+DFFF) are rejected.
Character Literals
Character literals use single quotes and represent exactly one Unicode code point:
let letter : Char = 'A';
let digit : Char = '7';
let emoji : Char = '\u{2764}'; // Heart symbol
// Basic Char type usage
module {
// Character equality
public func compareChar(a : Char, b : Char) : Bool
ensures result == (a == b);
{
a == b
};
// Character reflexivity
public func charSelfEqual(c : Char) : Bool
ensures result == true;
{
c == c
};
// Select character based on condition
public func selectChar(first : Bool, a : Char, b : Char) : Char
ensures first ==> result == a;
ensures (not first) ==> result == b;
{
if (first) { a } else { b }
};
}
Character literals must contain exactly one code point. Empty ('') or multi-character ('ab') literals are syntax errors.
Text Concatenation
The # operator concatenates text values:
// Text concatenation with the # operator
module {
// Concatenate two texts
pure func concat(a : Text, b : Text) : Text {
a # b
};
// Check if text matches concatenation
public func testConcat() : Text
ensures result == concat("hello", "world");
{
"hello" # "world"
};
// Build greeting message
pure func greet(name : Text) : Text {
"Hello, " # name # "!"
};
}
Concatenation can be used in pure functions and contracts. Raw a # b is an opaque symbolic operation, but the core Text module exposes trusted contracts for facts such as Text.concat(a, b) preserving Text.size(a) + Text.size(b).
Empty String Comparison
You can compare text against the empty string literal:
// Empty string literal and comparison
module {
// Check if text is empty
pure func isEmpty(s : Text) : Bool {
s == ""
};
// Test empty string detection
public func testEmpty() : Bool
ensures result == true;
{
isEmpty("")
};
// Test non-empty string
public func testNonEmpty() : Bool
ensures result == false;
{
isEmpty("hello")
};
}
Verification Model
Text and Char use different verification models:
- Text literals are interned to unique integer IDs - each distinct string gets a unique ID
- Char literals are encoded as their Unicode code point value
- Equality works correctly - same strings compare equal, different strings compare unequal
- Library size facts come from trusted core contracts such as
Text.size,Text.fromChar,Text.fromArray, andText.concat - Raw concatenation is an opaque function - the verifier knows
a # bis consistent but cannot analyze string content
This means you can verify:
- Text equality and inequality
- Size facts promised by the core Text library
- Passing text through functions unchanged
- Text stored in variants and collections
- Char code-point comparisons, such as
c >= '0' and c <= '9'
But you cannot verify:
- String length/content facts that are not exposed by a library contract
- Substring relationships
- Locale, normalization, grapheme cluster, or substring relationships
Text in Variants
Text is commonly used as an error payload in variant types:
// Text as variant payload for error handling
persistent actor {
type Result = {
#ok : Nat;
#err : Text;
};
transient var lastResult : Result = #err("not initialized");
public func setError(msg : Text) : async ()
modifies lastResult
ensures lastResult == #err(msg);
{
lastResult := #err(msg);
};
public func setOk(val : Nat) : async ()
modifies lastResult
ensures lastResult == #ok(val);
{
lastResult := #ok(val);
};
public func process(r : Result) : async Nat
ensures result >= 0;
{
switch (r) {
case (#ok(n)) { n };
case (#err(msg)) { 0 };
};
};
}
Text in Spec Collections
Text and Char are spec-immutable primitives, so they can be used as elements in proof-only ghost collections:
// Text as a proof-only spec collection element
module {
public ghost func rememberName(names : Set<Text>, name : Text) : Set<Text>
ensures Set.contains(name, result);
{
Set.union(names, Set.singleton(name))
};
public ghost func hasName(names : Set<Text>, name : Text) : Bool
ensures result == Set.contains(name, names);
{
Set.contains(name, names)
};
}
Text Methods
The Text type provides these methods at runtime:
| Method | Return Type | Description |
|---|---|---|
size() | Nat | Number of Unicode code points |
chars() | Iter<Char> | Iterator over characters |
These methods work at runtime. Prefer the core Text functions in verified contracts when you need explicit size facts. for-in loops over text.chars() are supported when the iterator's next() method is contracted, but the verifier still treats text contents as opaque. See loops for the contracted-iterator rule.
Character Conversions
Characters can be converted to and from their Unicode code point values:
import Char "mo:core/Char";
let c : Char = 'A';
let codepoint : Nat32 = Char.toNat32(c); // 65
let back : Char = Char.fromNat32(65); // 'A'
In verification, these conversions are identity operations since Char is already represented as its code point value. Char.fromNat32 requires the input to be a valid Unicode scalar value, excluding the surrogate range.
Limitations
No String Content Verification
The verifier cannot derive arbitrary string content, length, or ordering facts from raw text methods. Use the core Text library when a trusted contract provides the fact you need. These contracts cannot be verified as written:
// Cannot verify - verifier doesn't know string length
public func isShort(s : Text) : Bool
ensures result == (s.size() < 10); // Not verifiable
{ s.size() < 10 }
// Cannot verify - verifier doesn't know ordering
public func isLess(a : Text, b : Text) : Bool
ensures result == (a < b); // Not verifiable
{ a < b }
For-In Over Text
Iterating over text characters with for-in is supported as long as the
iterator's next() method has a type contract:
public func count(t : Text) : async Nat {
for (c in t.chars()) {
let _ = c;
};
0
}
Immutability
Text values are immutable. You cannot modify a string in place - operations like concatenation return new Text values.
Common Patterns
Error Messages in Results
type Result<T> = { #ok : T; #err : Text };
public func validate(x : Nat) : Result<Nat>
ensures x > 0 ==> result == #ok(x);
ensures x == 0 ==> result == #err("must be positive");
{
if (x > 0) { #ok(x) } else { #err("must be positive") }
}
Text Field in Actor
persistent actor {
var owner : Text = "admin";
public func getOwner() : async Text
reads owner
ensures result == owner;
{
owner
};
public func setOwner(newOwner : Text) : async ()
modifies owner
ensures owner == newOwner;
{
owner := newOwner;
};
}
Pure Text Helpers
pure func formatPair(a : Text, b : Text) : Text {
"(" # a # ", " # b # ")"
};
public func test() : Text
ensures result == formatPair("x", "y");
{
formatPair("x", "y")
}
Summary
Textrepresents Unicode strings;Charrepresents single Unicode code points- Both types support equality-based verification via uninterpreted integer encoding
- Text concatenation uses the
#operator and works in pure functions and contracts - Text and Char are spec-immutable and can be used in ghost collections
- The verifier cannot reason about string content, length, or ordering unless a trusted library contract exposes the fact
for-inloops overtext.chars()are supported, but contents remain opaque- Common uses include error messages in variants and identity tracking in contracts