Skip to main content

Regression Prevention

Sector9 uses golden file testing to catch unintended changes to verification behavior. This ensures that translator modifications, solver updates, or code changes do not silently break existing verification guarantees.

How Regression Testing Works

The regression testing system has three layers of protection:

  1. Semantic validation: Tests must behave according to their naming suffix (_sat, _unsat, _reject, _noperm)
  2. Golden file comparison: Actual outputs must match stored reference files
  3. CI enforcement: Failed tests block merges to protected branches

When any layer fails, the system reports the discrepancy and prevents accidental regressions from reaching production.

Test Naming Conventions

Test files encode their expected outcome through suffixes:

SuffixExpected BehaviorRegression If...
_satVerification succeedsVerification fails
_unsatVerification failsVerification succeeds
_rejectTranslation rejectedTranslation succeeds
_nopermPermission errorNo permission error

A test named arithmetic_add_sat.mo must verify successfully. If it fails verification, that is a semantic regression regardless of golden file status.

Detecting Regressions

Run the Viper test suite to check for regressions:

./test-viper.sh

The output shows both semantic and golden file status:

=== level1 ===
✅ [m] arithmetic_add_sat.mo - verified OK
✅ [m] array_bounds_unsat.mo - postcondition might not hold
❌ [!] changed_test_sat.mo - Postcondition might not hold
✅ [ ] new_test_sat.mo - verified OK

Understanding the Output

Each line has two status indicators:

IndicatorPositionMeaning
/ FirstSemantic status (matches expected outcome)
[m] / [!] / [ ]SecondGolden file status (matches stored reference)

Regression scenarios:

  • ❌ [m] - Semantic failure, golden matches: The test now produces wrong outcome but identical output. This indicates a logic bug.
  • ❌ [!] - Semantic failure, golden mismatch: The test outcome and output both changed. Investigate the cause.
  • ✅ [!] - Semantic pass, golden mismatch: Output changed but test still passes. This may be intentional or an early warning.

Investigating Regressions

When you see a mismatch, compare the actual output against the golden:

# Run a specific test
cd test/viper/level1
../../run.sh -v my_test_sat.mo

# Compare outputs
diff ok/my_test_sat.vpr _out/my_test_sat.vpr
diff ok/my_test_sat.silicon _out/my_test_sat.silicon

Common Regression Causes

Translator changes: Modifications to src/viper/*.ml that change Viper code generation. Use ./test-viper.sh -a for the full Viper translator lane when accepting intentional output changes.

Solver updates: New Silicon or Z3 versions that produce different outputs or counterexamples.

Prelude changes: Modifications to the verification prelude affecting all tests.

Type system changes: Changes to Sector9 type checking that affect translation.

Intentional vs Accidental Changes

Not all mismatches are regressions. Some changes are intentional:

Intentional changes (accept the new golden):

  • Bug fixes that change Viper output
  • Improved error messages
  • Translation optimizations
  • New feature implementations

Accidental regressions (do not accept, fix the bug):

  • Semantic failures ( indicator)
  • Unexpected output changes
  • Changes unrelated to your modifications

Decision Workflow

1. Run tests, see mismatch
2. Check semantic status (✅ or ❌)
- If ❌: This is a regression. Fix the bug.
- If ✅: Continue to step 3.
3. Review the diff
- Expected based on your changes? Accept.
- Unexpected? Investigate before accepting.

Accepting Changes

When a change is intentional, update the golden files:

# Update a single test
./update_goldens.sh test/viper/level1/my_test_sat.mo

# Update all tests in a directory
cd test/viper/level1
../../run.sh -a -v *.mo

The -a flag copies actual outputs to the ok/ directory as new reference files.

Adding Regression Tests

When you fix a bug, add a test that would have caught it:

  1. Create a test file that reproduces the bug:
// test/viper/bugs/issue_123_field_shadow_sat.mo
// Regression test for issue #123: parameter shadowing actor field

persistent actor {
var x : Int = 0;

public func buggy(x : Int) : async Int
ensures result == x + 1;
{
x + 1
};
}
  1. Run the test to verify it passes with the fix:
./test-viper.sh
# Output: ✅ [ ] issue_123_field_shadow_sat.mo - verified OK
  1. Accept the golden files:
./update_goldens.sh test/viper/bugs/issue_123_field_shadow_sat.mo
  1. Commit both the test and its goldens:
git add test/viper/bugs/issue_123_field_shadow_sat.mo
git add test/viper/bugs/ok/issue_123_field_shadow_sat.*
git commit -m "Add regression test for issue #123"

The bugs/ Directory

The test/viper/bugs/ directory contains regression tests for specific bugs and edge cases. Tests here are named descriptively:

test/viper/bugs/
├── arrays_test02_array_update_sat.mo
├── contracts_bug01_param_field_shadow_sat.mo
├── effects_test29_result_field_name_bug_sat.mo
├── ghost_test13_loop_inv_order_bug_sat.mo
└── ok/
└── ... (golden files)

CI Integration

The test suite runs in CI and blocks merges on failures:

# Example GitHub Actions workflow
name: Tests
on: [push, pull_request]

jobs:
verify:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v5
- name: Run Viper tests
run: ./test-viper.sh
env:
JOBS: 4
VERIFY_TIMEOUT_MS: 120000
XDG_CACHE_HOME: /tmp/sector9

Exit Codes

The test runner returns non-zero exit codes on failures:

Exit CodeMeaning
0All tests pass
1One or more tests failed

CI systems detect the non-zero exit and fail the build.

Environment Variables

Control test behavior with environment variables:

VariableDefaultPurpose
JOBS12Parallel test jobs
VERIFY_TIMEOUT_MS120000Per-test verification timeout
SLOW_VERIFY_TIMEOUT_MS300000Per-test timeout for files marked // SLOW
SLOW_TEST_THRESHOLD60Seconds before test marked slow
XDG_CACHE_HOME/tmp/sector9Cache directory
JOBS=4 VERIFY_TIMEOUT_MS=300000 ./test-viper.sh

Output Normalization

Golden files are normalized to prevent false positives from environment differences:

  • Memory addresses replaced with placeholders
  • Version strings removed
  • Timestamps stripped
  • Stack traces removed
  • Temporary paths normalized

This normalization happens automatically during comparison, ensuring tests pass consistently across different machines.

Best Practices

Run tests before committing: Catch regressions early. Use the focused runner for the code you touched: ./test-viper.sh -a for Viper translator changes, ./test-mo.sh for language/frontend changes, and ./test-pic.sh for runtime .pic.ts checks.

Review diffs before accepting: Never blindly accept golden changes.

Add regression tests for bugs: Prevent the same bug from recurring.

Keep tests focused: One concept per test file.

Use descriptive names: issue_123_field_shadow_sat.mo is better than test1_sat.mo.

Commit goldens with tests: Keep tests and their expected outputs together.

Summary

  • Regression tests compare actual outputs against stored golden files
  • Semantic validation ensures tests behave according to their suffix
  • Run ./test-viper.sh to check for regressions
  • Use ./update_goldens.sh to accept intentional changes
  • Add regression tests to test/viper/bugs/ for fixed bugs
  • CI blocks merges when tests fail
  • Review diffs carefully before accepting changes
  • Use ./test-viper.sh -a --no-cache-for <test> or scripts/clear-vcache-for-test.sh when a single cached verifier result looks stale