Emergency Playbook

This document provides response procedures for emergency scenarios. It is intended for guardians, governance operators, and anyone monitoring the protocol.

Trigger / response matrix

| Trigger | Severity | Immediate action | Investigation | Recovery |
| --- | --- | --- | --- | --- |
| Rate drop breaker fires | High | Protocol auto-pauses. No action needed. | Check Aztec rollup for slashing events. Run refreshAttesterState() for affected attesters. | Verify exchange rate is accurate after accounting. Guardian unpauses SafetyModule, then Core and Vault. |
| Queue ratio breaker fires | Medium | Protocol auto-pauses. | Check if a large withdrawal request caused the spike or if total assets dropped (slashing). | If queue pressure is legitimate, unpause and let rebalance unstake to cover. If caused by a bug, investigate before unpausing. |
| Accounting staleness breaker fires | Medium | Protocol auto-pauses. | Check why updateAccounting() hasn't been called. Possible causes: rebalance stuck mid-cycle, gas prices too high, no callers. | Call updateAccounting() (permissionless). If rebalance is stuck, guardian calls forceRebalanceReset() first. Then unpause. |
| Suspected exploit in progress | Critical | Guardian calls emergencyPauseAll() on OllaGovernance (pauses Core + Vault in one tx). | Assess scope: which contracts are affected, what funds are at risk, is the attacker still active. | Do not unpause until root cause is identified. Prepare upgrade if contract fix is needed. |
| Aztec rollup incident | High | Guardian pauses Core and Vault. | Monitor Aztec status channels. Check attester states via refreshAttesterState(). | Wait for rollup recovery. Verify staked balances match expectations. Unpause when rollup is stable. |
| Rebalance stuck mid-cycle | Low | Guardian calls forceRebalanceReset(). | Check which step failed and why (gas, external call revert). | Reset clears the state machine. Cooldown restarts. Next rebalance will retry. No funds are lost. |
| Key compromise (guardian) | High | Governance revokes GUARDIAN_ROLE from compromised key and grants to new address (timelocked). | Assess if the compromised guardian performed any malicious pauses or resets. | Guardian can only pause; no fund loss is possible. Availability may still be disrupted if the attacker keeps pausing. |
| Key compromise (governance) | Critical | Community alert. No on-chain mitigation if governance multisig is fully compromised. | Governance controls upgrades and fee parameters. Full compromise means full protocol risk. | This is a catastrophic scenario. Timelock delay is the only mitigation: users have a window to exit before malicious proposals execute. |

Response procedures

Circuit breaker triggered

When any circuit breaker fires, the SafetyModule emits CircuitBreakerTriggered(reason) with one of:

  • BreakerReason.RateDrop
  • BreakerReason.QueueRatio
  • BreakerReason.AccountingStale

Step 1: Identify the cause

# Check which breaker fired (look for CircuitBreakerTriggered events)
cast logs --from-block <block> --address <SafetyModule> "CircuitBreakerTriggered(uint8)"
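
The reason argument is logged as a uint8, so it has to be mapped back to a name. A minimal sketch, assuming the enum members are declared in the order listed above (RateDrop = 0, QueueRatio = 1, AccountingStale = 2). The helper name breaker_reason_name is illustrative, and the numbering should be confirmed against the SafetyModule source before relying on it:

```shell
# Map the uint8 `reason` value from CircuitBreakerTriggered to a name.
# ASSUMPTION: enum order matches the list above (RateDrop=0, QueueRatio=1,
# AccountingStale=2). Confirm against the deployed SafetyModule source.
breaker_reason_name() {
  case "$1" in
    0) echo "RateDrop" ;;
    1) echo "QueueRatio" ;;
    2) echo "AccountingStale" ;;
    *) echo "Unknown($1)" ;;
  esac
}

# Usage: pass the decoded decimal value from the event, e.g.
#   breaker_reason_name 2
```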

Step 2: Investigate

For rate drop: Check if attesters were slashed on the rollup. Call refreshAttesterState() with affected attester addresses to update the protocol's view of staked balances.

For queue ratio: Check OllaVault.pendingWithdrawalAssets() vs OllaCore.totalAssets(). Determine if the ratio is transient (large single withdrawal) or systemic.
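
The queue ratio check can be sketched as a small helper. This assumes both getters take no arguments and return uint256, and that OLLA_VAULT, OLLA_CORE, and RPC_URL are environment variables you set yourself; none of these names come from the contracts. The on-chain breaker may compute the ratio differently, so treat the result as an approximation to compare against maxQueueRatioBps:

```shell
# Queue ratio in basis points: pending * 10000 / total.
# awk uses floating point, so the result can be off by one bps for very
# large wei values. Adequate for monitoring, not for exact accounting.
queue_ratio_bps() {
  awk -v p="$1" -v t="$2" 'BEGIN {
    if (t + 0 == 0) { print "n/a"; exit }
    printf "%d\n", (p * 10000) / t
  }'
}

# Read both values on-chain and print the ratio.
# ASSUMPTION: zero-argument uint256 getters; OLLA_VAULT, OLLA_CORE and
# RPC_URL are placeholder environment variables.
# `awk '{print $1}'` drops the "[1e18]"-style suffix cast may append.
read_queue_ratio_bps() {
  pending=$(cast call "$OLLA_VAULT" "pendingWithdrawalAssets()(uint256)" --rpc-url "$RPC_URL" | awk '{print $1}')
  total=$(cast call "$OLLA_CORE" "totalAssets()(uint256)" --rpc-url "$RPC_URL" | awk '{print $1}')
  queue_ratio_bps "$pending" "$total"
}
```

Sampling read_queue_ratio_bps a few times over several blocks helps separate a transient spike from a systemic one.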

For accounting staleness: Check lastAccountingTimestamp on the SafetyModule. If rebalance is stuck (step != Done), use forceRebalanceReset() first, then call updateAccounting().
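
To see how stale accounting actually is, the timestamp can be turned into an age in seconds. A minimal sketch, assuming lastAccountingTimestamp is exposed as a public uint256 getter; SAFETY_MODULE and RPC_URL are placeholder environment variables, and the helper name is illustrative:

```shell
# Seconds elapsed since the last accounting update.
# $1 = last accounting timestamp (unix seconds)
# $2 = current time (optional, defaults to the local clock)
accounting_age_seconds() {
  last_ts="$1"
  now="${2:-$(date +%s)}"
  echo $(( now - last_ts ))
}

# Usage (ASSUMPTION: public getter lastAccountingTimestamp()):
#   last=$(cast call "$SAFETY_MODULE" "lastAccountingTimestamp()(uint256)" \
#     --rpc-url "$RPC_URL" | awk '{print $1}')
#   accounting_age_seconds "$last"
```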

Step 3: Verify state before unpausing

Before unpausing, confirm:

  • Exchange rate reflects current on-chain reality.
  • No ongoing exploit or attack.
  • Attester states are up-to-date (refreshAttesterState called for all active attesters).
  • Accounting has been updated.

Step 4: Unpause

Unpause in order:

  1. SafetyModule.unpause() (guardian)
  2. OllaCore.unpause() (guardian)
  3. OllaVault.unpause() (guardian)

Or use OllaGovernance.emergencyUnpauseAll() (governance admin) to unpause Core and Vault in one transaction. SafetyModule must be unpaused separately.
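
The ordered unpause can be scripted so that a failed transaction stops the sequence. A sketch, assuming each contract exposes a zero-argument unpause() as named above; SAFETY_MODULE, OLLA_CORE, OLLA_VAULT, RPC_URL, GUARDIAN_KEY, and DRY_RUN are placeholder environment variables introduced for this example:

```shell
# Unpause SafetyModule, then Core, then Vault, stopping at the first failure.
# Defaults to a dry run that only prints the commands; set DRY_RUN=0 to send.
unpause_in_order() {
  for target in "$SAFETY_MODULE" "$OLLA_CORE" "$OLLA_VAULT"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "cast send $target \"unpause()\""
    else
      cast send "$target" "unpause()" \
        --rpc-url "$RPC_URL" --private-key "$GUARDIAN_KEY" || return 1
    fi
  done
}
```

Defaulting to a dry run is a deliberate choice here: it lets the operator review the exact targets before any transaction is signed.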

Suspected exploit

Step 1: Pause immediately

# Governance admin (fastest, pauses Core + Vault in one tx)
cast send <OllaGovernance> "emergencyPauseAll()" --private-key <gov_key>

# Also pause SafetyModule separately
cast send <SafetyModule> "pause()" --private-key <guardian_key>

Step 2: Assess damage

  • Check token balances of all protocol contracts.
  • Check for unexpected Transfer, Approval, or role change events.
  • Verify proxy implementations haven't been changed.
  • Check if any governance proposals are pending in the timelock.
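
For the proxy implementation check, the current implementation address can be read directly from the ERC-1967 implementation slot and compared with the last known-good address. A sketch, assuming the proxies follow ERC-1967 as the Upgraded event in the monitoring section suggests; RPC_URL is a placeholder environment variable and the helper names are illustrative:

```shell
# ERC-1967 implementation slot:
# bytes32(uint256(keccak256("eip1967.proxy.implementation")) - 1)
IMPL_SLOT=0x360894a13ba1a3210667c828492db98dca3e2076cc3735a920a3ca505d382bbc

# Extract the 20-byte address (last 40 hex chars) from a 32-byte storage word.
slot_to_address() {
  printf '0x%s\n' "$(printf '%s' "$1" | tail -c 40)"
}

# current_implementation <proxy>: read the slot with `cast storage`.
current_implementation() {
  slot_to_address "$(cast storage "$1" "$IMPL_SLOT" --rpc-url "$RPC_URL")"
}

# Usage: compare against the address recorded at the last governance upgrade.
#   current_implementation "$OLLA_CORE"
```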

Step 3: Prepare response

If a contract upgrade is needed:

  1. Develop and test the fix.
  2. Deploy new implementation.
  3. Schedule upgrade through governance timelock.
  4. Wait for timelock delay.
  5. Execute upgrade.
  6. Verify fix.
  7. Unpause.

Force rebalance reset

Use when a rebalance cycle is stuck (e.g., an external call reverts repeatedly).

cast send <OllaCore> "forceRebalanceReset()" --private-key <guardian_key>

What this does:

  • Resets the rebalance state machine to Done.
  • Sets lastRebalanceTimestamp to current time (enforces cooldown before next cycle).

What is NOT lost:

  • Unharvested rewards remain on the rollup. The next cycle's harvest step will claim them.
  • Partial unstakes are tracked on-chain by the Aztec rollup. refreshAttesterState() will pick them up.
  • Queued withdrawal requests remain in OllaVault storage and are picked up by the next rebalance.

What is discarded:

  • Any in-progress step computations (e.g., partially calculated stake/unstake amounts).

Monitoring recommendations

Events to watch

| Event | Contract | Indicates |
| --- | --- | --- |
| CircuitBreakerTriggered(reason) | SafetyModule | Automatic pause. Investigate immediately. |
| Paused() / Unpaused() | OllaCore, OllaVault, SafetyModule | Manual or automatic pause / resume. |
| RebalanceReset() | OllaCore | Guardian reset a stuck rebalance. |
| AccountingUpdated(...) | OllaCore | Successful accounting cycle. |
| NegativeRewardsPeriod(grossRewardsSigned) | OllaCore | Slashing exceeded rewards in a reporting window. |
| WithdrawalAdjusted(id, original, adjusted) | OllaVault | A queued redemption was paid out below assetsExpected because slashing reduced the rate after the request. |
| FeesMinted(treasuryShares, providerShares) | OllaVault | Protocol fees distributed. |
| Upgraded(implementation) (ERC-1967) | Any UUPS proxy | A governance upgrade landed. The Butler logs every Upgraded event in the governance log. |

Health checks

Run periodically to detect issues before they trigger breakers:

  1. Accounting freshness: Time since last AccountingUpdated event. The Butler exposes olla_butler_accounting_staleness_seconds and alerts on > 24h.
  2. Queue pressure: OllaVault.pendingWithdrawalAssets() / OllaCore.totalAssets(). Compare against maxQueueRatioBps.
  3. Buffer level: Track OllaVault.bufferedAssets() against OllaVault.pendingWithdrawalAssets(). The Butler exposes olla_butler_buffer_utilization_pct and warns below 20%.
  4. Rebalance state: olla_butler_rebalance_overdue is set to 1 when the on-chain cooldown has elapsed but the state machine is not at Done for 10+ minutes (typical signal of a stuck cycle).
  5. Attester health: olla_butler_attester_slashing_loss, olla_butler_rollup_attester_zombie_count, olla_butler_attester_cached_vs_rollup_drift, and olla_butler_attester_refresh_needed_count cover the common failure modes. The full alerting ruleset lives in the Butler repo's monitoring/alerts.yml.
  6. Recent activity: Query the Butler's /events (operational) and /governance (config / upgrade / pause) endpoints for a quick human-readable view of recent activity.
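
Some of these checks can be run ad hoc against the Butler's metrics output. A rough sketch, assuming the Butler serves Prometheus text-format metrics without trailing timestamps; the /metrics path and the BUTLER_URL variable are assumptions, since only /events and /governance are named above. The thresholds mirror the ones stated in the list and do not replace the ruleset in monitoring/alerts.yml:

```shell
# Read Prometheus text-format metrics on stdin and print one line per
# condition from the list above:
#   accounting staleness > 24h (86400 s), rebalance overdue == 1,
#   buffer utilization < 20%.
# ASSUMPTION: the sample value is the last field on each line.
check_butler_metrics() {
  awk '
    /^#/ { next }
    ($1 ~ /^olla_butler_accounting_staleness_seconds/) && ($NF > 86400) {
      print "ALERT accounting stale: " $NF "s"
    }
    ($1 ~ /^olla_butler_rebalance_overdue/) && ($NF == 1) {
      print "ALERT rebalance overdue"
    }
    ($1 ~ /^olla_butler_buffer_utilization_pct/) && ($NF < 20) {
      print "WARN buffer utilization low: " $NF "%"
    }
  '
}

# Usage (the /metrics path is an assumption):
#   curl -s "$BUTLER_URL/metrics" | check_butler_metrics
```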