Regression Prevention
Sector9 uses golden file testing to catch unintended changes to verification behavior. This ensures that translator modifications, solver updates, or code changes do not silently break existing verification guarantees.
How Regression Testing Works
The regression testing system has three layers of protection:
- Semantic validation: Tests must behave according to their naming suffix (
_sat,_unsat,_reject,_noperm) - Golden file comparison: Actual outputs must match stored reference files
- CI enforcement: Failed tests block merges to protected branches
When any layer fails, the system reports the discrepancy and prevents accidental regressions from reaching production.
Test Naming Conventions
Test files encode their expected outcome through suffixes:
| Suffix | Expected Behavior | Regression If... |
|---|---|---|
_sat | Verification succeeds | Verification fails |
_unsat | Verification fails | Verification succeeds |
_reject | Translation rejected | Translation succeeds |
_noperm | Permission error | No permission error |
A test named arithmetic_add_sat.mo must verify successfully. If it fails verification, that is a semantic regression regardless of golden file status.
Detecting Regressions
Run the Viper test suite to check for regressions:
./test-viper.sh
The output shows both semantic and golden file status:
=== level1 ===
✅ [m] arithmetic_add_sat.mo - verified OK
✅ [m] array_bounds_unsat.mo - postcondition might not hold
❌ [!] changed_test_sat.mo - Postcondition might not hold
✅ [ ] new_test_sat.mo - verified OK
Understanding the Output
Each line has two status indicators:
| Indicator | Position | Meaning |
|---|---|---|
✅ / ❌ | First | Semantic status (matches expected outcome) |
[m] / [!] / [ ] | Second | Golden file status (matches stored reference) |
Regression scenarios:
❌ [m]- Semantic failure, golden matches: The test now produces wrong outcome but identical output. This indicates a logic bug.❌ [!]- Semantic failure, golden mismatch: The test outcome and output both changed. Investigate the cause.✅ [!]- Semantic pass, golden mismatch: Output changed but test still passes. This may be intentional or an early warning.
Investigating Regressions
When you see a mismatch, compare the actual output against the golden:
# Run a specific test
cd test/viper/level1
../../run.sh -v my_test_sat.mo
# Compare outputs
diff ok/my_test_sat.vpr _out/my_test_sat.vpr
diff ok/my_test_sat.silicon _out/my_test_sat.silicon
Common Regression Causes
Translator changes: Modifications to src/viper/*.ml that change Viper code generation. Use ./test-viper.sh -a for the full Viper translator lane when accepting intentional output changes.
Solver updates: New Silicon or Z3 versions that produce different outputs or counterexamples.
Prelude changes: Modifications to the verification prelude affecting all tests.
Type system changes: Changes to Sector9 type checking that affect translation.
Intentional vs Accidental Changes
Not all mismatches are regressions. Some changes are intentional:
Intentional changes (accept the new golden):
- Bug fixes that change Viper output
- Improved error messages
- Translation optimizations
- New feature implementations
Accidental regressions (do not accept, fix the bug):
- Semantic failures (
❌indicator) - Unexpected output changes
- Changes unrelated to your modifications
Decision Workflow
1. Run tests, see mismatch
2. Check semantic status (✅ or ❌)
- If ❌: This is a regression. Fix the bug.
- If ✅: Continue to step 3.
3. Review the diff
- Expected based on your changes? Accept.
- Unexpected? Investigate before accepting.
Accepting Changes
When a change is intentional, update the golden files:
# Update a single test
./update_goldens.sh test/viper/level1/my_test_sat.mo
# Update all tests in a directory
cd test/viper/level1
../../run.sh -a -v *.mo
The -a flag copies actual outputs to the ok/ directory as new reference files.
Adding Regression Tests
When you fix a bug, add a test that would have caught it:
- Create a test file that reproduces the bug:
// test/viper/bugs/issue_123_field_shadow_sat.mo
// Regression test for issue #123: parameter shadowing actor field
persistent actor {
var x : Int = 0;
public func buggy(x : Int) : async Int
ensures result == x + 1;
{
x + 1
};
}
- Run the test to verify it passes with the fix:
./test-viper.sh
# Output: ✅ [ ] issue_123_field_shadow_sat.mo - verified OK
- Accept the golden files:
./update_goldens.sh test/viper/bugs/issue_123_field_shadow_sat.mo
- Commit both the test and its goldens:
git add test/viper/bugs/issue_123_field_shadow_sat.mo
git add test/viper/bugs/ok/issue_123_field_shadow_sat.*
git commit -m "Add regression test for issue #123"
The bugs/ Directory
The test/viper/bugs/ directory contains regression tests for specific bugs and edge cases. Tests here are named descriptively:
test/viper/bugs/
├── arrays_test02_array_update_sat.mo
├── contracts_bug01_param_field_shadow_sat.mo
├── effects_test29_result_field_name_bug_sat.mo
├── ghost_test13_loop_inv_order_bug_sat.mo
└── ok/
└── ... (golden files)
CI Integration
The test suite runs in CI and blocks merges on failures:
# Example GitHub Actions workflow
name: Tests
on: [push, pull_request]
jobs:
verify:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v5
- name: Run Viper tests
run: ./test-viper.sh
env:
JOBS: 4
VERIFY_TIMEOUT_MS: 120000
XDG_CACHE_HOME: /tmp/sector9
Exit Codes
The test runner returns non-zero exit codes on failures:
| Exit Code | Meaning |
|---|---|
0 | All tests pass |
1 | One or more tests failed |
CI systems detect the non-zero exit and fail the build.
Environment Variables
Control test behavior with environment variables:
| Variable | Default | Purpose |
|---|---|---|
JOBS | 12 | Parallel test jobs |
VERIFY_TIMEOUT_MS | 120000 | Per-test verification timeout |
SLOW_VERIFY_TIMEOUT_MS | 300000 | Per-test timeout for files marked // SLOW |
SLOW_TEST_THRESHOLD | 60 | Seconds before test marked slow |
XDG_CACHE_HOME | /tmp/sector9 | Cache directory |
JOBS=4 VERIFY_TIMEOUT_MS=300000 ./test-viper.sh
Output Normalization
Golden files are normalized to prevent false positives from environment differences:
- Memory addresses replaced with placeholders
- Version strings removed
- Timestamps stripped
- Stack traces removed
- Temporary paths normalized
This normalization happens automatically during comparison, ensuring tests pass consistently across different machines.
Best Practices
Run tests before committing: Catch regressions early. Use the focused runner
for the code you touched: ./test-viper.sh -a for Viper translator changes,
./test-mo.sh for language/frontend changes, and ./test-pic.sh for runtime
.pic.ts checks.
Review diffs before accepting: Never blindly accept golden changes.
Add regression tests for bugs: Prevent the same bug from recurring.
Keep tests focused: One concept per test file.
Use descriptive names: issue_123_field_shadow_sat.mo is better than test1_sat.mo.
Commit goldens with tests: Keep tests and their expected outputs together.
Summary
- Regression tests compare actual outputs against stored golden files
- Semantic validation ensures tests behave according to their suffix
- Run
./test-viper.shto check for regressions - Use
./update_goldens.shto accept intentional changes - Add regression tests to
test/viper/bugs/for fixed bugs - CI blocks merges when tests fail
- Review diffs carefully before accepting changes
- Use
./test-viper.sh -a --no-cache-for <test>orscripts/clear-vcache-for-test.shwhen a single cached verifier result looks stale