10. Code Identity & The Verified-to-Binary Gap

1. Introduction: The "Double-Compiler" Problem

In most formally verified languages, there is a dangerous gap between the code that is verified and the code that actually runs.

Case A: Verification happens on a mathematical model (e.g., TLA+, Coq) that was written by hand. If the model doesn't match the implementation, the proof is worthless.
Case B: Verification happens on the source code, but the compiler's optimization or code generation passes introduce bugs that weren't in the source.

Sector9 addresses this through Unified Code Identity. It uses the same frontend to parse and type-check the code for both the verifier (Viper/TLA+) and the runtime (Wasm). However, this creates a new challenge: Runtime Projection. Verification code often contains "Ghost" logic (proof-only variables and lemmas) that must be completely erased from the production binary without changing the program's behavior.

This research analyzes how Sector9 ensures this gap is closed using src/pipeline/pipeline.ml and src/pipeline/typed_phase_check.ml.

2. The Runtime Projection Pipeline

Sector9 implements a strict "Fail-Closed" projection pipeline. When the compiler prepares a canister for deployment, it performs a series of Erasure Transformations.

Ghost Erasure

The system identifies any declaration marked with the ghost keyword. These are recursively pruned from the AST.

let standalone_runtime_surface_entry_of_dec (dec : Syntax.dec) : (...) option =
  let phase = dec.Source.note.Syntax.note_phase in
  if phase = Some Syntax.GhostPhase || CompUnit.is_ghost_only_scaffolding_dec dec then
    None
  else
    (* ... export to runtime surface *)

This ensures that the "Runtime Surface" of a module—the set of functions and variables accessible to other canisters—contains zero traces of the verification logic.

Unit-Call Erasure

A common pattern in Sector9 is calling a lemma (a ghost function that returns unit) to help the verifier prove a property.

ghost func check_balance(b : Nat) { ... };
// ...
check_balance(this.balance); // Purely for verification

The projection engine in pipeline.ml implements erase_runtime_proof_unit_calls_exp. It replaces any call to a unit-returning ghost function with a TupE [] (a no-op unit value). This ensures that proofs have zero runtime overhead.

3. Legality Checking: Preventing Proof Leakage

A critical security risk is "Proof Leakage": what if a runtime function accidentally depends on a ghost variable?

ghost var secret_key = ...;
public func get_key() { return secret_key; }; // SHOULD BE ILLEGAL

Sector9 solves this in typed_phase_check.ml using a Proof Dependency Graph.

The Dependency Graph

The compiler builds a graph where:

Nodes are declarations (func, var, type).
Edges represent "Uses" (e.g., Function A calls Function B).
Edges are tagged as RuntimeEdgeContext or VerificationEdgeContext.

The reject_direct_verify_roots pass performs a reachability analysis. It starts from the "Runtime Roots" (public actor methods) and ensures they never traverse an edge into a Ghost or Verification-only node.

let roots_from_graph (...) =
  (* ... *)
  if not is_runtime_root then
    acc
  else
    match resolve_path Region_set.empty decl_at with
    | Some (dep_detail, dep_path) ->
        (* Found a path from runtime to ghost! REJECT BUILD *)

This keeps proof-only logic isolated from runtime roots: runtime behavior must not depend on ghost declarations, verification-only assertions, or signature-only proof stubs.

4. CHASH: Runtime Code Identity

Sector9 currently uses CHASH (Content Hash) for three shipped purposes: CLI-facing actor/module identity (--chash, --chash-json, --chash-tree, --chash-all, --chash-verify), compiler-internal module identity for token shared-boundary wire labels, and defining-module identity for stable token compatibility.

Runtime-Projection Hashing

Runtime CHASH follows the same projected-root shape used by runtime projection and standalone runtime preflight. It filters proof-only declarations and non-runtime assertions before hashing runtime declarations:

Erase ghost declarations and ghost-only scaffolding.
Erase non-runtime static/proof assertions.
Keep runtime assertions and entry_requires guards, because they emit runtime checks.
Hash the remaining runtime-relevant actor/module declarations.

This means many proof-only edits do not affect runtime CHASH. It does not currently mean that the compiled Wasm hash is byte-identical, that a canister ID is derived from CHASH, or that a shipped immutable-upgrade gate exists. Those remain design goals, not current product behavior.

5. The Trusted Compiler Path

While Sector9 ensures the source and binary are semantically aligned, it still trusts the OCaml-to-Wasm generator.

The Gap: If the compiler backend's IR-to-Wasm pass has a bug (e.g., incorrect register allocation for a Nat64), the proof in Viper might say the code is safe, but the Wasm might overflow.
Mitigation: Sector9 uses the established Motoko-derived compiler backend, which has been battle-tested on the IC for years. The TLA+ lane (Subject 4) can separately check protocol-level transition catalogs and semantic facts, but it does not prove the Wasm backend correct.

6. Evaluation & Comparison

vs. Solidity (Source-Verify Gap)

In Solidity, the "Source" (Solidity) is very different from the "Binary" (EVM Bytecode). Verifying Solidity source doesn't protect against solc compiler bugs (like the infamous delegatecall bugs).

Sector9: By having the verifier and compiler share the same frontend and projection engine, Sector9 significantly reduces the "Mismatch Surface."

vs. Dafny (Code Generation)

Dafny compiles to C#, Java, or Go.

Dafny: Often suffers from "Target Language Semantic Mismatch." For example, Dafny's Int (infinite) might be mapped to a 64-bit integer in the target, leading to overflows.
Sector9: Maps Sector9 Int to Z3's Int and Nat32 to partial-axiom Viper calls (Subject 9). This keeps the mathematical model and the Wasm runtime aligned where the current verifier has axioms.

vs. F* (Verified Extraction)

F* can "extract" verified OCaml or C code through a verified extraction process.

F:* Highest security but extremely difficult to use.
Sector9: Aims for a "Pragmatic Verified Extraction." It doesn't prove the compiler itself is correct, but it uses Fail-Closed Projection to ensure the compiler cannot silently ignore the boundaries between proofs and code.

7. Conclusion

Sector9’s unified source path reduces model/code drift by sharing frontend and projection machinery between verification and runtime lanes. Runtime projection, typed legality checks, and CHASH together make proof/runtime boundaries explicit, but the compiler backend and projection policy remain part of the trusted base.

Strengths:

Isolation: Proof dependency graph prevents ghost logic from affecting runtime behavior.
Identity: Runtime CHASH ignores many proof-only edits and tags token shared-boundary wire labels.
Efficiency: Automated erasure ensures zero runtime overhead for formal specifications.

Weaknesses:

Compiler Trust: The backend (IR-to-Wasm) remains in the TCB and is not itself verified.
Complexity: The dual-path (Verification vs. Runtime) in the compiler pipeline increases the risk of "Projection Bugs" where the wrong nodes are erased.

1. Introduction: The "Double-Compiler" Problem​

2. The Runtime Projection Pipeline​

Ghost Erasure​

Unit-Call Erasure​

3. Legality Checking: Preventing Proof Leakage​

The Dependency Graph​

4. CHASH: Runtime Code Identity​

Runtime-Projection Hashing​

5. The Trusted Compiler Path​

6. Evaluation & Comparison​

vs. Solidity (Source-Verify Gap)​

vs. Dafny (Code Generation)​

vs. F* (Verified Extraction)​

7. Conclusion​