When multiple users — or multiple CI jobs — share a Lager Box, locks prevent two callers from clobbering each other. Lager provides two locking mechanisms:
  1. Automatic test / admin lock: lager python and the box-mutating admin commands (lager install, lager uninstall, lager update, lager install-wheel) reserve the box while the command (or its box-mutating step) runs.
  2. User lock: lager boxes lock explicitly reserves a box until you unlock it.

Automatic test lock

Every lager python <runnable> invocation automatically acquires the box lock at start and releases it at end. This includes failures, Ctrl+C, crashes, and signal-killed runs: the lock is released through a finally block, a signal handler, an atexit safety net, and, in the worst case, a server-side TTL reap.

Which commands auto-lock

| Command | Lock window | Why |
| --- | --- | --- |
| lager python | Full test run (acquire → heartbeat → release) | Canonical test runner. |
| lager install | The setup_and_deploy_box.sh step (the part that restarts the container) | A container restart mid-test would kill the test outright. |
| lager uninstall | Container teardown, image wipe, ~/box and /etc/lager removal | Same reason: destructive on-box mutation. |
| lager update | Container stop → image rebuild → restart → health check | The container restart is the test-clobbering action. The read-only probe and fetch steps are deliberately outside the lock. |
| lager install-wheel | The pip install invocation inside the container | pip install mutates the container's Python environment; a concurrent test could race on imports. |
Read-only commands (lager hello, lager boxes list, lager boxes lock / unlock themselves, status and dry-run paths, etc.) do not acquire the auto-lock. This is intentionally narrower than the v0.13.0–v0.13.3 behavior, which placed a lock overridable with --force-command on every CLI command. See Backward compatibility below for that history and its removal in v0.13.4.

The lock identity is CI-aware, so concurrent test runs in CI exclude each other correctly. Holder formats:
| Environment | Holder string |
| --- | --- |
| Dev (your machine) | OS user (same as lager defaults --user) |
| GitHub Actions | ci:github:<repo>#<run>-<attempt>/<job>@<runner>:<pid> |
| Drone | ci:drone:<repo>#<build>:<pid>@<host> |
| GitLab CI | ci:gitlab:<project>#<pipeline>/<job>:<pid>@<host> |
| Bitbucket Pipelines | ci:bitbucket:<repo>#<build>:<pid>@<host> |
| Jenkins | ci:jenkins:<tag>:<pid>@<host> |
| Generic CI fallback | ci:generic:<host>:<pid> |
The :pid (and @runner / @host) suffix guarantees that two parallel matrix items in the same workflow run get distinct holder strings.
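As a rough illustration of how such holder strings could be assembled, here is a minimal sketch covering only the GitHub Actions, generic-CI, and dev rows of the table. The function name lock_holder is hypothetical, and the mapping to GITHUB_ACTIONS, GITHUB_REPOSITORY, GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, GITHUB_JOB, RUNNER_NAME, and CI is an assumption; the real CLI may read different variables.

```python
import getpass
from typing import Mapping, Optional


def lock_holder(env: Mapping[str, str], pid: int, host: str,
                user: Optional[str] = None) -> str:
    """Hypothetical holder-string builder following the formats in the table."""
    if env.get("GITHUB_ACTIONS") == "true":
        # ci:github:<repo>#<run>-<attempt>/<job>@<runner>:<pid>
        # (the env-var names used here are assumptions, not confirmed Lager behavior)
        repo = env.get("GITHUB_REPOSITORY", "")
        run = env.get("GITHUB_RUN_ID", "")
        attempt = env.get("GITHUB_RUN_ATTEMPT", "1")
        job = env.get("GITHUB_JOB", "")
        runner = env.get("RUNNER_NAME", host)
        return f"ci:github:{repo}#{run}-{attempt}/{job}@{runner}:{pid}"
    if env.get("CI"):
        # Generic fallback: ci:generic:<host>:<pid>
        return f"ci:generic:{host}:{pid}"
    # Dev machine: the plain OS user name.
    return user if user is not None else getpass.getuser()
```

Because the pid is always part of a CI holder, two matrix items on the same runner still produce different strings, which is the property the paragraph above relies on.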

Collision behavior

When lager python tries to acquire a lock that another holder owns:
  • On dev: prints an error and exits 1 immediately (no waiting).
  • In CI: waits up to LAGER_LOCK_WAIT seconds (default 1800, i.e. 30 min), polling every 2s, and only fails if the wait elapses. This lets matrix jobs queue against the same self-hosted box.
If you have already locked the box as yourself with lager boxes lock before running lager python, the CLI sees the lock as already yours and does not release it on exit, so your explicit reservation survives the test.
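Under stated assumptions, the collision policy and the already-yours rule fit in one polling loop. In this sketch, current_holder and try_lock are hypothetical stand-ins for reading and writing the box's lock (try_lock returns False if another caller won the race), and clock and sleep are injectable only so the logic can be tested without real waiting. The 2-second poll comes from the text; wait_seconds=0 models the dev fail-fast policy and 1800 the CI default.

```python
import time
from typing import Callable, Optional


class LockBusy(Exception):
    """Raised when another holder still owns the box after the wait budget."""


def acquire_box_lock(
    current_holder: Callable[[], Optional[str]],
    try_lock: Callable[[str], bool],
    me: str,
    wait_seconds: float,
    poll_interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if this call took the lock (caller must release it on exit),
    or False if `me` already held it (e.g. an explicit lager boxes lock)."""
    deadline = clock() + wait_seconds
    while True:
        holder = current_holder()
        if holder == me:
            return False  # already ours: leave the reservation in place on exit
        if holder is None and try_lock(me):
            return True   # we won; release later from a finally block
        if clock() >= deadline:  # wait_seconds=0 fails on the first collision
            raise LockBusy(f"box is locked by {holder or 'another caller'}")
        sleep(poll_interval)
```

The boolean return is what lets the caller skip the release when the lock was already held, matching the behavior described above.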

TTL & heartbeat

Each test lock is written with ttl_seconds: 1800 and refreshed every 60 seconds by a background heartbeat thread inside the CLI. The TTL is not a cap on test runtime — as long as the heartbeat keeps refreshing last_heartbeat, the lock stays valid indefinitely. What the TTL actually bounds is the worst-case stale-lock dwell time after a CLI crash. If your laptop loses network or the CI runner is hard-killed, the box reaps the lock once last_heartbeat + ttl_seconds falls in the past, so another caller waits at most one TTL.
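A minimal sketch of both halves: a reap predicate that treats ttl_seconds: null (None) as never expiring, and a heartbeat thread that refreshes until stopped. The names is_reapable and Heartbeat are hypothetical; the refresh callable stands in for whatever call updates last_heartbeat on the box.

```python
import threading
from datetime import datetime, timedelta
from typing import Callable, Optional


def is_reapable(last_heartbeat: datetime, ttl_seconds: Optional[int],
                now: datetime) -> bool:
    """True once last_heartbeat + ttl_seconds is in the past; None never expires."""
    if ttl_seconds is None:
        return False
    return last_heartbeat + timedelta(seconds=ttl_seconds) < now


class Heartbeat:
    """Calls `refresh` every `interval` seconds on a daemon thread until stopped."""

    def __init__(self, refresh: Callable[[], None], interval: float = 60.0) -> None:
        self._refresh = refresh
        self._interval = interval
        self._stop = threading.Event()
        # daemon=True: the thread dies with the process, like the CLI heartbeat.
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Event.wait returns False on timeout (time to refresh), True once stop() runs.
        while not self._stop.wait(self._interval):
            self._refresh()

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```

As long as refreshes keep landing, last_heartbeat never falls more than one interval behind, so is_reapable stays False however long the test runs; the TTL only matters once the refreshes stop.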

--detach keeps the lock

lager python script.py --detach acquires the lock with ttl_seconds: null (no auto-expiry) because the heartbeat thread dies with the CLI. The detached script keeps running on the box, but the lock must be released manually:
lager python long_test.py --box lab-box --detach
# Box 'lab-box' locked for detached run; release with: lager boxes unlock --box lab-box

# ... later, after the script finishes on the box:
lager boxes unlock --box lab-box
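The detach decision can be summarized as a small planning function. LockPlan and plan_test_lock are hypothetical names used only for illustration; the 1800-second default is the TTL from the section above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LockPlan:
    ttl_seconds: Optional[int]  # None = no auto-expiry
    heartbeat: bool             # refresh last_heartbeat while the CLI is alive
    release_on_exit: bool       # release from the CLI's finally/signal/atexit path


def plan_test_lock(detach: bool, ttl_seconds: Optional[int] = 1800) -> LockPlan:
    """Hypothetical summary of how --detach changes the test lock."""
    if detach:
        # The CLI exits while the script keeps running, so nothing refreshes the
        # lock; a finite TTL would let the box reap it mid-run. Hold it instead
        # until someone runs lager boxes unlock.
        return LockPlan(ttl_seconds=None, heartbeat=False, release_on_exit=False)
    return LockPlan(ttl_seconds=ttl_seconds, heartbeat=True, release_on_exit=True)
```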

Escape hatches

| Env var | Effect |
| --- | --- |
| LAGER_AUTO_LOCK_DISABLE=1 | Skip auto-lock entirely. The command still checks for someone else's user lock but does not acquire one. |
| LAGER_LOCK_WAIT=<seconds> | Override the collision wait time. 0 = fail-fast (dev default), large value = patient queue (CI default). |
| LAGER_LOCK_HOLDER=<string> | Override the holder identity. Useful when you intentionally want two jobs to share a single reservation. |
| LAGER_LOCK_TTL=<seconds> | Override the TTL the CLI writes. LAGER_LOCK_TTL=none = eternal (the caller must run lager boxes unlock). |
| LAGER_LOCK_HEARTBEAT=<seconds> | Override the heartbeat refresh interval (default 60 s). |
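One way to read the table is as a settings resolver with the documented defaults. This is a sketch, not the CLI's code: LockSettings and lock_settings are hypothetical names, the in_ci flag is assumed to come from whatever CI detection produces the holder string, and treating only the exact value "1" as "disable" is an assumption.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LockSettings:
    enabled: bool
    wait_seconds: float
    holder_override: Optional[str]
    ttl_seconds: Optional[int]  # None = eternal lock
    heartbeat_seconds: float


def lock_settings(env: Mapping[str, str], in_ci: bool) -> LockSettings:
    """Hypothetical parser for the LAGER_* lock variables in the table above."""
    ttl_raw = env.get("LAGER_LOCK_TTL")
    ttl: Optional[int]
    if ttl_raw is None:
        ttl = 1800
    elif ttl_raw.strip().lower() == "none":
        ttl = None  # eternal: someone must run lager boxes unlock
    else:
        ttl = int(ttl_raw)

    wait_raw = env.get("LAGER_LOCK_WAIT")
    # Documented defaults: dev fails fast (0), CI queues for 1800 s.
    wait = float(wait_raw) if wait_raw is not None else (1800.0 if in_ci else 0.0)

    hb_raw = env.get("LAGER_LOCK_HEARTBEAT")
    heartbeat = float(hb_raw) if hb_raw is not None else 60.0

    return LockSettings(
        enabled=env.get("LAGER_AUTO_LOCK_DISABLE") != "1",
        wait_seconds=wait,
        holder_override=env.get("LAGER_LOCK_HOLDER"),
        ttl_seconds=ttl,
        heartbeat_seconds=heartbeat,
    )
```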

User lock

A user lock is an explicit, persistent reservation you place on a box. Unlike the automatic test lock, user locks never expire — you must manually unlock when you’re done. Use cases:
  • Reserving a box for an extended debugging session.
  • Preventing others from using a box during maintenance.
  • Claiming a box when you’re not actively running a command.

lager boxes lock

lager boxes lock --box NAME
Options:
  • --box (required) — name of the box to lock.
  • --user — username to lock as (useful when running inside Docker where the user would otherwise be root).
Example:
lager boxes lock --box lab-box

# Output:
Box 'lab-box' is locked by alice
If the box is already locked by another user:
Error: Box 'lab-box' is already locked by bob (since 2026-03-20T13:00:00Z)

lager boxes unlock

lager boxes unlock --box NAME [--force]
Options:
  • --box (required) — name of the box to unlock.
  • --force — force unlock even if the box was locked by another user (use this to clear a stale lager boxes lock left by a teammate).
Examples:
# Unlock your own lock
lager boxes unlock --box lab-box

# Force unlock a box locked by someone else
lager boxes unlock --box lab-box --force
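To make the lock/unlock rules concrete, here is an in-memory model of the semantics described above: no expiry, only the holder can unlock, and --force overrides that. BoxLocks and its methods are hypothetical and do not represent the box server. The lock messages mirror the example output above; the unlock error text and the silent no-op when unlocking an unlocked box are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class UserLock:
    user: str
    since: datetime  # assumed to be UTC


class BoxLocks:
    """In-memory model of lager boxes lock / unlock semantics (illustrative only)."""

    def __init__(self) -> None:
        self._locks: Dict[str, UserLock] = {}

    def lock(self, box: str, user: str, now: Optional[datetime] = None) -> str:
        held = self._locks.get(box)
        if held is not None and held.user != user:
            since = held.since.strftime("%Y-%m-%dT%H:%M:%SZ")
            raise PermissionError(
                f"Box '{box}' is already locked by {held.user} (since {since})")
        if held is None:
            # User locks carry no TTL: the entry stays until someone unlocks it.
            self._locks[box] = UserLock(user, now or datetime.now(timezone.utc))
        return f"Box '{box}' is locked by {user}"

    def unlock(self, box: str, user: str, force: bool = False) -> None:
        held = self._locks.get(box)
        if held is None:
            return  # assumed: unlocking an unlocked box is a no-op
        if held.user != user and not force:
            raise PermissionError(f"Box '{box}' is locked by {held.user}")
        del self._locks[box]
```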

Management operations skip the lock

The following lager python flags are management operations on already-running processes and intentionally skip both the lock check and the auto-acquire:
  • lager python --kill <ID>
  • lager python --kill-all
  • lager python --reattach <ID>
  • lager python --continue <ID>
  • lager python --console <ID>
This is what lets you Ctrl+C a hung detached script and immediately --kill it without first having to fight an unrelated user lock.
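Combining the "Which commands auto-lock" table with these exemptions gives a simple predicate. This is a hypothetical helper (takes_auto_lock, AUTO_LOCK_COMMANDS, and MANAGEMENT_FLAGS are illustrative names). It models only the acquire decision, not the separate read-only check for someone else's user lock.

```python
from typing import Iterable

# The five commands that take the automatic lock (see the table above).
AUTO_LOCK_COMMANDS = frozenset({"python", "install", "uninstall", "update", "install-wheel"})
# lager python flags that manage already-running processes and skip locking.
MANAGEMENT_FLAGS = frozenset({"--kill", "--kill-all", "--reattach", "--continue", "--console"})


def takes_auto_lock(command: str, args: Iterable[str]) -> bool:
    """Hypothetical: should this lager invocation acquire the box lock?"""
    if command not in AUTO_LOCK_COMMANDS:
        return False  # read-only commands (hello, boxes list, ...) never auto-lock
    if command == "python" and any(arg in MANAGEMENT_FLAGS for arg in args):
        return False  # --kill, --reattach, etc. act on existing processes
    return True
```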

lager boxes shows lock holders

When boxes are locked, lager boxes shows an extra column:
 name        ip               version   status    locked by
=====================================================================
 lab-box-1   100.x.x.1        0.24.0    current   alice
 lab-box-2   100.x.x.2        0.24.0    current   github lager run 9182 job test on runner-3
 lab-box-3   100.x.x.3        0.24.0    current
CI holders are formatted human-readably (e.g. github lager run 9182 job test on runner-3) rather than printed as raw colon-delimited strings.
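For the GitHub case, the human-readable form can be derived from the holder format in the holder table. The regular expression and the display_holder name below are assumptions for illustration. They keep repo, run, job, and runner, drop the attempt and pid (as the example row does), and leave any other holder untouched.

```python
import re

# ci:github:<repo>#<run>-<attempt>/<job>@<runner>:<pid>
_GITHUB_HOLDER = re.compile(
    r"^ci:github:(?P<repo>[^#]+)#(?P<run>\d+)-(?P<attempt>\d+)"
    r"/(?P<job>[^@]+)@(?P<runner>.+):(?P<pid>\d+)$"
)


def display_holder(holder: str) -> str:
    """Hypothetical formatter for the 'locked by' column."""
    m = _GITHUB_HOLDER.match(holder)
    if m:
        return f"github {m['repo']} run {m['run']} job {m['job']} on {m['runner']}"
    return holder  # user names and other formats are shown as-is
```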

CI workflow example

The always-on auto-lock + CI auto-wait combination means a CI matrix job needs no special invocation:
# .github/workflows/integration-tests.yml
jobs:
  hardware-tests:
    strategy:
      matrix:
        suite: [power, communication, debug]
    runs-on: [self-hosted, lager-bench]
    steps:
      - uses: actions/checkout@v4
      - run: pip install lager-cli
      - run: lager python test/api/${{ matrix.suite }} --box lab-box
The three matrix items each get a unique holder (the .../hardware-tests@<runner>:<pid> suffix differs per item) and POST /lock. Whichever loses the race polls every 2 s for up to 30 minutes until the winner releases the box, then takes the lock. No lager boxes lock call is needed.

Backward compatibility

  • lager boxes lock and lager boxes unlock behave exactly as before. The CLI now sends holder_type: "user" + ttl_seconds: null on the wire, but legacy clients (e.g. older CLIs against the new box server) get the same eternal-lock behavior automatically because the server treats a payload with neither field as legacy and applies the same defaults.
  • _check_box_lock (the read-only lock check that already gates every command in resolve_and_validate_box) is unchanged.
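The legacy rule in the first bullet amounts to server-side defaulting. In this sketch, normalize_lock_request is a hypothetical name. Only holder_type and ttl_seconds come from the text; any other payload fields are passed through untouched and their names are assumptions.

```python
from typing import Any, Dict


def normalize_lock_request(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical server-side defaulting for incoming lock requests."""
    out = dict(payload)
    if "holder_type" not in out and "ttl_seconds" not in out:
        # Legacy client: same defaults a new CLI sends for lager boxes lock,
        # i.e. an eternal user lock.
        out["holder_type"] = "user"
        out["ttl_seconds"] = None
    return out
```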

How this differs from v0.13.0 – v0.13.3 (removed in v0.13.4)

v0.13.0 added an ephemeral “command-in-progress” lock that fired on every CLI command via a shared decorator, gated by a --force-command flag. v0.13.4 removed it because three corner cases were unfixable in that design:
| Corner case behind the v0.13.4 removal | How the current design avoids it |
| --- | --- |
| "Supply commands never released the lock" | The auto-lock is attached to only 5 commands (python, install, uninstall, update, install-wheel), not every CLI surface. Supply commands and the like never touch it, so there is no decorator-on-everything to leak from. |
| "Long-running commands blocked all other commands on the same box" | Only those 5 commands contend for the auto-lock; status, list, and other read-only paths are unaffected. For genuinely concurrent test runs, dev fails fast in under 5 s and CI queues (default 1800 s, i.e. 30 min, configurable with LAGER_LOCK_WAIT). That is the intended policy. |
| "Detached processes left stale locks" | --detach is an explicit opt-in to a long-lived hold (ttl_seconds: null is intentional). Non-detached runs have a heartbeat plus TTL reap, so an abnormal CLI exit self-recovers in ≤ TTL + grace. |
--force-command is gone. Collision policy is structured (fail-fast in dev, queue in CI) and the existing lager boxes lock --force is the escape hatch when you genuinely need to override.