Pynenc models every task invocation as a finite-state machine. The file pynenc/invocation/status.py declares the invocation states, the allowed transitions between them, ownership rules, and dedicated recovery states.
That is more structure than a simple pending/running/done task table. The
extra machinery is there for a concrete reason: pynenc needs the same task
lifecycle to work across in-memory tests, SQLite, Redis, MongoDB, and other
backends without making correctness depend on one storage engine’s locking
model.
The diagram below is a dated copy of the state graph generated from the
implementation. The current generated SVG in the core repository lives at
docs/_static/invocation_state_machine.svg.
With pynenc 0.2.3 installed, the diagram can be regenerated from the live code:
pynenc status render --format svg
This article uses that graph to explain two design choices:
- What does the state machine buy us?
- Why do PENDING and PENDING_RECOVERY exist as separate states instead of one PENDING status with a recovering flag?
The short answer: the state machine follows directly from pynenc’s goal of backend portability, and the recovery states are where that design becomes most visible.
The states, briefly
The status graph starts when a task call is registered, moves through runner
ownership while it is being executed, and ends in a final status such as
SUCCESS, FAILED, or CONCURRENCY_CONTROLLED_FINAL.
The main flow is familiar:
START
  -> REGISTERED
    -> PENDING
      -> RUNNING
        -> SUCCESS
        -> FAILED
        -> RETRY -> PENDING
The rest of the graph handles real worker-pool behavior:
REGISTERED
  -> CONCURRENCY_CONTROLLED -> REROUTED
  -> CONCURRENCY_CONTROLLED_FINAL
PENDING
  -> PENDING_RECOVERY -> REROUTED
  -> REROUTED
  -> KILLED -> REROUTED
RUNNING
  -> PAUSED -> RESUMED
  -> RUNNING_RECOVERY -> REROUTED
  -> KILLED -> REROUTED
RESUMED
  -> PAUSED
  -> RETRY -> PENDING
  -> SUCCESS
  -> FAILED
Some names look redundant at first: REROUTED, PENDING_RECOVERY, RUNNING_RECOVERY, and the two concurrency-control states. They exist because
they have different transition or ownership rules. In pynenc, a different rule
gets a different state.
CONCURRENCY_CONTROLLED is temporary: the invocation was blocked by a concurrency rule and will be rerouted to try again. CONCURRENCY_CONTROLLED_FINAL is terminal: the invocation was blocked and configured not to retry; its caller receives a final status with no result stored in the state backend.
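The listings above can be written down as a plain adjacency map and checked mechanically. The following is a minimal sketch built only from the transitions shown in this article, not read from pynenc itself; the REROUTED -> PENDING edge is taken from the example history later in the article, and the helper names are hypothetical.

```python
from collections import deque

# Transitions as listed in this article (an assumption, not pynenc's real table).
# REROUTED -> PENDING comes from the example history later in the article.
TRANSITIONS: dict[str, set[str]] = {
    "START": {"REGISTERED"},
    "REGISTERED": {"PENDING", "CONCURRENCY_CONTROLLED", "CONCURRENCY_CONTROLLED_FINAL"},
    "CONCURRENCY_CONTROLLED": {"REROUTED"},
    "PENDING": {"RUNNING", "PENDING_RECOVERY", "REROUTED", "KILLED"},
    "PENDING_RECOVERY": {"REROUTED"},
    "RUNNING": {"SUCCESS", "FAILED", "RETRY", "PAUSED", "RUNNING_RECOVERY", "KILLED"},
    "RUNNING_RECOVERY": {"REROUTED"},
    "PAUSED": {"RESUMED"},
    "RESUMED": {"PAUSED", "RETRY", "SUCCESS", "FAILED"},
    "RETRY": {"PENDING"},
    "KILLED": {"REROUTED"},
    "REROUTED": {"PENDING"},
    "SUCCESS": set(),
    "FAILED": set(),
    "CONCURRENCY_CONTROLLED_FINAL": set(),
}


def final_statuses(graph: dict[str, set[str]]) -> set[str]:
    """Statuses with no outgoing transition."""
    return {status for status, targets in graph.items() if not targets}


def reachable_from(graph: dict[str, set[str]], start: str) -> set[str]:
    """Breadth-first search over the transition graph."""
    seen = {start}
    queue = deque([start])
    while queue:
        for target in graph[queue.popleft()]:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


print(sorted(final_statuses(TRANSITIONS)))
print(reachable_from(TRANSITIONS, "START") == set(TRANSITIONS))
```

In this reading of the graph, the only statuses without an outgoing edge are the three final ones named earlier, and every status is reachable from START.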
Why the state machine exists
Pynenc is designed to run the same task code on different backends:
- in-memory backends for unit tests,
- SQLite for single-host services and local runs,
- Redis or MongoDB plugins,
- RabbitMQ on the broker side,
- and future plugin backends.
A framework tied to one database can design around that database’s strongest primitive. A backend-agnostic framework has to assume less. Some backends have transactions. Some have atomic updates for specific data shapes. Some are useful mainly for local development.
Pynenc cannot make correctness depend on every backend exposing the same lock or transaction model. Instead, it pushes the lifecycle rules into the framework: the legal states, ownership requirements, and recovery transitions are declared in one table.
That table is _CONFIG in status.py.
Every state change goes through validate_transition, validate_ownership, and
compute_new_owner. The backend stores status records; the framework decides
whether the next status is allowed.
That is the tradeoff. The backend contract stays small, and the lifecycle stays consistent while the storage implementation changes underneath it.
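To make the division of labor concrete, here is a toy version of a table-driven status change. It is a sketch under assumptions, not pynenc's code: the three function names echo the ones mentioned above, but their signatures, the `Definition` class, the `acquires_ownership` field, the choice that SUCCESS releases ownership, and the four-status table are all invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Status(Enum):
    REGISTERED = auto()
    PENDING = auto()
    RUNNING = auto()
    SUCCESS = auto()


@dataclass(frozen=True)
class Definition:
    allowed_transitions: frozenset = frozenset()
    acquires_ownership: bool = False  # hypothetical: entering this status claims the invocation
    releases_ownership: bool = False  # entering this status clears the owner


# Toy table; the real one is _CONFIG in status.py and is not reproduced here.
TABLE = {
    Status.REGISTERED: Definition(frozenset({Status.PENDING})),
    Status.PENDING: Definition(frozenset({Status.RUNNING}), acquires_ownership=True),
    Status.RUNNING: Definition(frozenset({Status.SUCCESS})),
    Status.SUCCESS: Definition(releases_ownership=True),
}


class InvalidChange(Exception):
    """Raised when the table rejects a status change."""


def validate_transition(current: Status, new: Status) -> None:
    if new not in TABLE[current].allowed_transitions:
        raise InvalidChange(f"{current.name} -> {new.name} is not allowed")


def validate_ownership(owner: Optional[str], actor: str) -> None:
    # Toy rule: once an invocation has an owner, only that owner may change it.
    if owner is not None and owner != actor:
        raise InvalidChange(f"{actor} does not own this invocation")


def compute_new_owner(new: Status, owner: Optional[str], actor: str) -> Optional[str]:
    definition = TABLE[new]
    if definition.releases_ownership:
        return None
    if definition.acquires_ownership:
        return actor
    return owner


def change_status(current: Status, owner: Optional[str], new: Status, actor: str):
    """Run the three checks; the storage backend would only persist the result."""
    validate_transition(current, new)
    validate_ownership(owner, actor)
    return new, compute_new_owner(new, owner, actor)


status, owner = change_status(Status.REGISTERED, None, Status.PENDING, actor="runner-a")
print(status.name, owner)
try:
    change_status(status, owner, Status.RUNNING, actor="runner-b")
except InvalidChange as exc:
    print("rejected:", exc)
```

Nothing in the sketch touches storage. A backend in this model only needs to read and write a (status, owner) pair, which is the small contract the paragraph above describes.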
Why PENDING_RECOVERY is a state
Consider one invocation:
- A runner picks it up from the broker. The status moves to PENDING, and the runner becomes the owner.
- The runner either starts executing it, moving it to RUNNING, or disappears before execution starts.
In the second case, the invocation is stuck in PENDING with an owner that no
longer exists. Other runners are not allowed to touch it, because PENDING
requires ownership validation.
One possible design is a flag on the status record:
status = PENDING
recovering = true
Pynenc does not do that. It uses a separate state:
InvocationStatus.PENDING_RECOVERY: StatusDefinition(
    allowed_transitions=frozenset({InvocationStatus.REROUTED}),
    releases_ownership=True,
    overrides_ownership=True,
)
Those three properties are the point.
overrides_ownership=True means recovery is the only path where a non-owner can
touch the invocation. If a runner breaks ownership, the status history says so.
allowed_transitions={REROUTED} means recovery does not jump straight to
execution. The invocation re-enters the normal queue path, then a healthy runner
can pick it up.
releases_ownership=True removes the dead owner from the record, making the
invocation claimable again.
The same pattern exists for RUNNING_RECOVERY, which handles invocations whose
owner disappeared during execution.
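A recovery pass built on these three properties could look roughly like the sketch below. It is illustrative only: how pynenc detects a dead runner is not covered in this article, so the function simply takes a set of live runner IDs as input, and `Record`, `RECOVERY_FOR`, and `recover_orphans` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    status: str
    owner: Optional[str]
    history: list = field(default_factory=list)

    def move(self, status: str, owner: Optional[str]) -> None:
        self.status, self.owner = status, owner
        self.history.append((status, owner))


# Recovery status for each status that can be left with a dead owner.
RECOVERY_FOR = {"PENDING": "PENDING_RECOVERY", "RUNNING": "RUNNING_RECOVERY"}


def recover_orphans(records, live_runners) -> int:
    """Reroute records whose owner is no longer alive; return how many were recovered."""
    recovered = 0
    for record in records:
        recovery_status = RECOVERY_FOR.get(record.status)
        if recovery_status is None or record.owner in live_runners:
            continue
        # overrides_ownership: the caller of this function is not the owner.
        # releases_ownership: the dead owner is dropped from the record.
        record.move(recovery_status, None)
        # allowed_transitions={REROUTED}: back to the queue, never straight to RUNNING.
        record.move("REROUTED", None)
        recovered += 1
    return recovered


records = [
    Record("PENDING", "runner-a"),   # owner is gone
    Record("RUNNING", "runner-b"),   # owner is alive, left untouched
    Record("RUNNING", "runner-c"),   # owner is gone
]
print(recover_orphans(records, live_runners={"runner-b"}))
print(records[0].history)
```

Each recovered record passes through its recovery status before REROUTED, so the override is written into the history rather than applied silently.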
Why not a flag?
A flag looks smaller, but it spreads the rule across the system.
Every code path that reads PENDING would also need to ask whether recovery is
active. Every backend would need to serialize the flag. Every test around
PENDING would need the extra matrix case. The lifecycle would be partly in the
status and partly in a boolean field.
As a state, recovery is handled by the same mechanism as everything else. The allowed-transition table answers the question in one place.
It also gives operators a better audit trail. This sequence tells a clear story:
REGISTERED
-> PENDING(owner=A)
-> PENDING_RECOVERY(no owner)
-> REROUTED
-> PENDING(owner=B)
-> RUNNING(owner=B)
-> SUCCESS
The history says that runner A stopped responding, recovery released the
owner, and runner B completed the work. That is much easier to inspect than a
status row where a boolean quietly changed at some point.
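Because recovery is a status, that story can be extracted from the history with a few lines of code. The sketch below replays the sequence above against a small transition table and lists the owners that recovery dropped. The tuple layout, the `None` owners for statuses shown without one, and the function names are assumptions made for the example.

```python
# Allowed transitions for the statuses in the history above, as listed in this article.
ALLOWED = {
    "REGISTERED": {"PENDING"},
    "PENDING": {"RUNNING", "PENDING_RECOVERY"},
    "PENDING_RECOVERY": {"REROUTED"},
    "REROUTED": {"PENDING"},
    "RUNNING": {"SUCCESS"},
    "SUCCESS": set(),
}

# (status, owner) pairs; owners not shown in the article are assumed to be None.
HISTORY = [
    ("REGISTERED", None),
    ("PENDING", "A"),
    ("PENDING_RECOVERY", None),
    ("REROUTED", None),
    ("PENDING", "B"),
    ("RUNNING", "B"),
    ("SUCCESS", None),
]


def first_illegal_step(history):
    """Return the first (previous, next) status pair the table rejects, or None."""
    for (prev, _), (nxt, _) in zip(history, history[1:]):
        if nxt not in ALLOWED.get(prev, set()):
            return prev, nxt
    return None


def recovered_owners(history):
    """Owners that were dropped when the invocation entered a *_RECOVERY status."""
    dropped = []
    for (_, prev_owner), (status, _) in zip(history, history[1:]):
        if status.endswith("_RECOVERY") and prev_owner is not None:
            dropped.append(prev_owner)
    return dropped


print(first_illegal_step(HISTORY))
print(recovered_owners(HISTORY))
```

With a boolean flag, the second function would need a separate change log for the flag itself; here the status history already carries that information.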
What the state machine costs
This is not free.
There is more code than a flag: one StatusDefinition per status, transition
sets, ownership rules, and tests around invalid state changes. There is also
more terminology for users to learn. The first time somebody sees
RUNNING_RECOVERY in a timeline, they need to know that it means a runner died
and another service noticed.
The benefit is specific: the correctness of the task lifecycle does not depend on every backend exposing the same primitives. The framework has one source of truth for transitions, and the documentation diagram is generated from that source.
Takeaways
Strict state machines are useful when a system needs to run on more than one backend. They keep the backend interface small and put lifecycle rules in a place that can be tested directly.
Special cases deserve first-class states when they have different rules.
PENDING_RECOVERY is not just PENDING with another field. It can override
ownership, release ownership, and only transition back through REROUTED. That
is a real state.
The audit trail is part of the API. Every recovery state in the timeline is one more fact an operator can inspect when something goes wrong.
Pynenc is on PyPI as pynenc, and the
source is at github.com/pynenc/pynenc.
The status configuration discussed here lives in
pynenc/invocation/status.py.