Regression Prevention

Sector9 uses golden file testing to catch unintended changes to verification behavior. This ensures that translator modifications, solver updates, or code changes do not silently break existing verification guarantees.

How Regression Testing Works

The regression testing system has three layers of protection:

Semantic validation: Tests must behave according to their naming suffix (_sat, _unsat, _reject, _noperm)
Golden file comparison: Actual outputs must match stored reference files
CI enforcement: Failed tests block merges to protected branches

When any layer fails, the system reports the discrepancy and prevents accidental regressions from reaching production.

Test Naming Conventions

Test files encode their expected outcome through suffixes:

Suffix	Expected Behavior	Regression If...
`_sat`	Verification succeeds	Verification fails
`_unsat`	Verification fails	Verification succeeds
`_reject`	Translation rejected	Translation succeeds
`_noperm`	Permission error	No permission error

A test named arithmetic_add_sat.mo must verify successfully. If it fails verification, that is a semantic regression regardless of golden file status.

Detecting Regressions

Run the Viper test suite to check for regressions:

./test-viper.sh

The output shows both semantic and golden file status:

=== level1 ===
✅ [m] arithmetic_add_sat.mo - verified OK
✅ [m] array_bounds_unsat.mo - postcondition might not hold
❌ [!] changed_test_sat.mo - Postcondition might not hold
✅ [ ] new_test_sat.mo - verified OK

Understanding the Output

Each line has two status indicators:

Indicator	Position	Meaning
`✅` / `❌`	First	Semantic status (matches expected outcome)
`[m]` / `[!]` / `[ ]`	Second	Golden file status (matches stored reference)

Regression scenarios:

❌ [m] - Semantic failure, golden matches: The test now produces wrong outcome but identical output. This indicates a logic bug.
❌ [!] - Semantic failure, golden mismatch: The test outcome and output both changed. Investigate the cause.
✅ [!] - Semantic pass, golden mismatch: Output changed but test still passes. This may be intentional or an early warning.

Investigating Regressions

When you see a mismatch, compare the actual output against the golden:

# Run a specific test
cd test/viper/level1
../../run.sh -v my_test_sat.mo

# Compare outputs
diff ok/my_test_sat.vpr _out/my_test_sat.vpr
diff ok/my_test_sat.silicon _out/my_test_sat.silicon

Common Regression Causes

Translator changes: Modifications to src/viper/*.ml that change Viper code generation. Use ./test-viper.sh -a for the full Viper translator lane when accepting intentional output changes.

Solver updates: New Silicon or Z3 versions that produce different outputs or counterexamples.

Prelude changes: Modifications to the verification prelude affecting all tests.

Type system changes: Changes to Sector9 type checking that affect translation.

Intentional vs Accidental Changes

Not all mismatches are regressions. Some changes are intentional:

Intentional changes (accept the new golden):

Bug fixes that change Viper output
Improved error messages
Translation optimizations
New feature implementations

Accidental regressions (do not accept, fix the bug):

Semantic failures (❌ indicator)
Unexpected output changes
Changes unrelated to your modifications

Decision Workflow

1. Run tests, see mismatch
2. Check semantic status (✅ or ❌)
   - If ❌: This is a regression. Fix the bug.
   - If ✅: Continue to step 3.
3. Review the diff
   - Expected based on your changes? Accept.
   - Unexpected? Investigate before accepting.

Accepting Changes

When a change is intentional, update the golden files:

# Update a single test
./update_goldens.sh test/viper/level1/my_test_sat.mo

# Update all tests in a directory
cd test/viper/level1
../../run.sh -a -v *.mo

The -a flag copies actual outputs to the ok/ directory as new reference files.

Adding Regression Tests

When you fix a bug, add a test that would have caught it:

Create a test file that reproduces the bug:

// test/viper/bugs/issue_123_field_shadow_sat.mo
// Regression test for issue #123: parameter shadowing actor field

persistent actor {
  var x : Int = 0;

  public func buggy(x : Int) : async Int
    ensures result == x + 1;
  {
    x + 1
  };
}

Run the test to verify it passes with the fix:

./test-viper.sh
# Output: ✅ [ ] issue_123_field_shadow_sat.mo - verified OK

Accept the golden files:

./update_goldens.sh test/viper/bugs/issue_123_field_shadow_sat.mo

Commit both the test and its goldens:

git add test/viper/bugs/issue_123_field_shadow_sat.mo
git add test/viper/bugs/ok/issue_123_field_shadow_sat.*
git commit -m "Add regression test for issue #123"

The bugs/ Directory

The test/viper/bugs/ directory contains regression tests for specific bugs and edge cases. Tests here are named descriptively:

test/viper/bugs/
├── arrays_test02_array_update_sat.mo
├── contracts_bug01_param_field_shadow_sat.mo
├── effects_test29_result_field_name_bug_sat.mo
├── ghost_test13_loop_inv_order_bug_sat.mo
└── ok/
    └── ... (golden files)

CI Integration

The test suite runs in CI and blocks merges on failures:

# Example GitHub Actions workflow
name: Tests
on: [push, pull_request]

jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
      - name: Run Viper tests
        run: ./test-viper.sh
        env:
          JOBS: 4
          VERIFY_TIMEOUT_MS: 120000
          XDG_CACHE_HOME: /tmp/sector9

Exit Codes

The test runner returns non-zero exit codes on failures:

Exit Code	Meaning
`0`	All tests pass
`1`	One or more tests failed

CI systems detect the non-zero exit and fail the build.

Environment Variables

Control test behavior with environment variables:

Variable	Default	Purpose
`JOBS`	`12`	Parallel test jobs
`VERIFY_TIMEOUT_MS`	`120000`	Per-test verification timeout
`SLOW_VERIFY_TIMEOUT_MS`	`300000`	Per-test timeout for files marked `// SLOW`
`SLOW_TEST_THRESHOLD`	`60`	Seconds before test marked slow
`XDG_CACHE_HOME`	`/tmp/sector9`	Cache directory

JOBS=4 VERIFY_TIMEOUT_MS=300000 ./test-viper.sh

Output Normalization

Golden files are normalized to prevent false positives from environment differences:

Memory addresses replaced with placeholders
Version strings removed
Timestamps stripped
Stack traces removed
Temporary paths normalized

This normalization happens automatically during comparison, ensuring tests pass consistently across different machines.

Best Practices

Run tests before committing: Catch regressions early. Use the focused runner for the code you touched: ./test-viper.sh -a for Viper translator changes, ./test-mo.sh for language/frontend changes, and ./test-pic.sh for runtime .pic.ts checks.

Review diffs before accepting: Never blindly accept golden changes.

Add regression tests for bugs: Prevent the same bug from recurring.

Keep tests focused: One concept per test file.

Use descriptive names: issue_123_field_shadow_sat.mo is better than test1_sat.mo.

Commit goldens with tests: Keep tests and their expected outputs together.

Summary

Regression tests compare actual outputs against stored golden files
Semantic validation ensures tests behave according to their suffix
Run ./test-viper.sh to check for regressions
Use ./update_goldens.sh to accept intentional changes
Add regression tests to test/viper/bugs/ for fixed bugs
CI blocks merges when tests fail
Review diffs carefully before accepting changes
Use ./test-viper.sh -a --no-cache-for <test> or scripts/clear-vcache-for-test.sh when a single cached verifier result looks stale

How Regression Testing Works​

Test Naming Conventions​

Detecting Regressions​

Understanding the Output​

Investigating Regressions​

Common Regression Causes​

Intentional vs Accidental Changes​

Decision Workflow​

Accepting Changes​

Adding Regression Tests​

The bugs/ Directory​

CI Integration​

Exit Codes​

Environment Variables​

Output Normalization​

Best Practices​

Summary​