Pynenc models every task invocation as a finite-state machine. The file
pynenc/invocation/status.py declares the invocation states, the allowed
transitions between them, ownership rules, and dedicated recovery states.
That is more structure than a simple pending/running/done task table. The
extra machinery is there for a concrete reason: pynenc needs the same task
lifecycle to work across in-memory tests, SQLite, Redis, MongoDB, and other
backends without making correctness depend on one storage engine’s locking
model.
The interesting design choice is not just the diagram. It is the combination of
status and ownership. PENDING is not the same as RUNNING: it marks the time
after a runner has picked an invocation from the broker, but before task code
starts executing. That small gap is where backend and broker races tend to show
up, so pynenc gives it a state of its own.
The diagram below is a dated copy of the state graph generated from the
implementation. The current generated SVG in the core repository lives at
docs/_static/invocation_state_machine.svg.
With pynenc 0.2.3 installed, the diagram can be regenerated from the live code:
pynenc status render --format svg
This article uses that graph to explain three design choices:
- What does the state machine buy us?
- Why is PENDING a separate owned state before RUNNING?
- Why does recovery get explicit states such as PENDING_RECOVERY and RUNNING_RECOVERY instead of being hidden inside PENDING or RUNNING?
The short answer: the state machine is a feature that follows from pynenc’s backend portability goal. The ownership states are where that design becomes visible.
The states, briefly
The status graph starts when a task call is registered, moves through runner
ownership while it is being executed, and ends in a final status such as
SUCCESS, FAILED, or CONCURRENCY_CONTROLLED_FINAL.
The main flow is familiar:
    START
      -> REGISTERED
      -> PENDING
      -> RUNNING
           -> SUCCESS
           -> FAILED
           -> RETRY -> PENDING
The rest of the graph handles real worker-pool behavior:
    REGISTERED
      -> CONCURRENCY_CONTROLLED -> REROUTED
      -> CONCURRENCY_CONTROLLED_FINAL
    PENDING
      -> PENDING_RECOVERY -> REROUTED
      -> REROUTED
      -> KILLED -> REROUTED
    RUNNING
      -> SUCCESS
      -> FAILED
      -> RETRY -> PENDING
      -> PAUSED -> RUNNING
      -> RUNNING_RECOVERY -> REROUTED
      -> KILLED -> REROUTED
PAUSED is the runner-owned state used while an invocation is waiting on a
sub-invocation result; once the dependency resolves, the process gets SIGCONT
and the status goes back to RUNNING.
Some names look redundant at first: REROUTED, PENDING_RECOVERY, RUNNING_RECOVERY, and the two concurrency-control states. They exist because
they have different transition or ownership rules. In pynenc, a different rule
gets a different state.
CONCURRENCY_CONTROLLED is temporary: the invocation was blocked by a concurrency rule and will be rerouted to try again. CONCURRENCY_CONTROLLED_FINAL
is terminal: the invocation was blocked and is configured not to retry, so its caller receives a final status with no result stored in the state backend.
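The listed edges fit in a plain lookup table. The sketch below is a minimal, hypothetical model of that idea, not pynenc's InvocationStatus or _CONFIG: the Status enum and the first_invalid_step and is_terminal helpers are illustrative names, and the table holds only the edges drawn above plus REROUTED -> PENDING, the path a rerouted invocation takes back into the queue. Terminal states map to an empty set.

```python
from __future__ import annotations

from enum import Enum, auto


class Status(Enum):
    """Hypothetical stand-in for pynenc's InvocationStatus (names from this article)."""
    START = auto()
    REGISTERED = auto()
    PENDING = auto()
    RUNNING = auto()
    PAUSED = auto()
    RETRY = auto()
    SUCCESS = auto()
    FAILED = auto()
    KILLED = auto()
    REROUTED = auto()
    PENDING_RECOVERY = auto()
    RUNNING_RECOVERY = auto()
    CONCURRENCY_CONTROLLED = auto()
    CONCURRENCY_CONTROLLED_FINAL = auto()


S = Status

# Only the edges drawn in the article; terminal states map to an empty set.
TRANSITIONS: dict[Status, frozenset[Status]] = {
    S.START: frozenset({S.REGISTERED}),
    S.REGISTERED: frozenset({S.PENDING, S.CONCURRENCY_CONTROLLED,
                             S.CONCURRENCY_CONTROLLED_FINAL}),
    S.PENDING: frozenset({S.RUNNING, S.KILLED, S.REROUTED, S.PENDING_RECOVERY}),
    S.RUNNING: frozenset({S.SUCCESS, S.FAILED, S.RETRY, S.PAUSED,
                          S.RUNNING_RECOVERY, S.KILLED}),
    S.PAUSED: frozenset({S.RUNNING}),
    S.RETRY: frozenset({S.PENDING}),
    S.REROUTED: frozenset({S.PENDING}),
    S.CONCURRENCY_CONTROLLED: frozenset({S.REROUTED}),
    S.PENDING_RECOVERY: frozenset({S.REROUTED}),
    S.RUNNING_RECOVERY: frozenset({S.REROUTED}),
    S.KILLED: frozenset({S.REROUTED}),
    S.SUCCESS: frozenset(),
    S.FAILED: frozenset(),
    S.CONCURRENCY_CONTROLLED_FINAL: frozenset(),
}


def first_invalid_step(path: list[Status]) -> tuple[Status, Status] | None:
    """Return the first (src, dst) pair missing from the table, or None if legal."""
    for src, dst in zip(path, path[1:]):
        if dst not in TRANSITIONS[src]:
            return src, dst
    return None


def is_terminal(status: Status) -> bool:
    """A status with no outgoing edges ends the invocation's lifecycle."""
    return not TRANSITIONS[status]
```

Skipping the claim step is caught immediately: first_invalid_step([S.REGISTERED, S.RUNNING]) returns (Status.REGISTERED, Status.RUNNING).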
Why the state machine exists
Pynenc is designed to run the same task code on different backends:
- in-memory backends for unit tests,
- SQLite for single-host services and local runs,
- Redis or MongoDB plugins,
- RabbitMQ on the broker side,
- and future plugin backends.
A framework tied to one database can design around that database’s strongest primitive. A backend-agnostic framework has to assume less. Some backends have transactions. Some have atomic updates for specific data shapes. Some are useful mainly for local development.
Pynenc cannot make correctness depend on every backend exposing the same lock or transaction model. Instead, it pushes the lifecycle rules into the framework: the legal states, ownership requirements, and recovery transitions are declared in one table.
That table is _CONFIG in status.py.
Every state change goes through validate_transition, validate_ownership, and
compute_new_owner. The backend stores status records; the framework decides
whether the next status is allowed.
That is the tradeoff. The backend contract stays small, and the lifecycle stays consistent while the storage implementation changes underneath it.
The backend is still expected to make status transitions atomic. The framework cannot turn a broken storage implementation into a correct distributed lock. But it can make each important moment explicit enough to validate, log, recover, and debug.
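A minimal sketch of that pipeline follows. The three function names come from the article, but their signatures and bodies here are assumptions, as are StatusDefinition's defaults, the reduced CONFIG table (plain strings instead of enum members), the apply_transition wrapper, and the InvalidTransition and OwnershipError exceptions. pynenc's real code may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class StatusDefinition:
    # Field names follow the article; the defaults are assumptions.
    allowed_transitions: frozenset[str] = field(default_factory=frozenset)
    requires_ownership: bool = False
    acquires_ownership: bool = False
    releases_ownership: bool = False
    overrides_ownership: bool = False


# A reduced, hypothetical table covering only the states used below.
CONFIG: dict[str, StatusDefinition] = {
    "REGISTERED": StatusDefinition(frozenset({"PENDING"})),
    "PENDING": StatusDefinition(
        frozenset({"RUNNING", "REROUTED", "PENDING_RECOVERY"}),
        requires_ownership=True,
        acquires_ownership=True,
    ),
    "RUNNING": StatusDefinition(frozenset({"SUCCESS"}), requires_ownership=True),
    "PENDING_RECOVERY": StatusDefinition(
        frozenset({"REROUTED"}),
        releases_ownership=True,
        overrides_ownership=True,
    ),
    "REROUTED": StatusDefinition(frozenset({"PENDING"})),
    "SUCCESS": StatusDefinition(),
}


class InvalidTransition(Exception):
    pass


class OwnershipError(Exception):
    pass


def validate_transition(current: str, new: str) -> None:
    if new not in CONFIG[current].allowed_transitions:
        raise InvalidTransition(f"{current} -> {new} is not allowed")


def validate_ownership(current: str, new: str, owner: str | None, runner: str) -> None:
    # Leaving an owned state needs the owner, unless the target overrides ownership.
    if (
        CONFIG[current].requires_ownership
        and not CONFIG[new].overrides_ownership
        and owner != runner
    ):
        raise OwnershipError(f"{runner} does not own this invocation (owner={owner})")


def compute_new_owner(new: str, owner: str | None, runner: str) -> str | None:
    if CONFIG[new].acquires_ownership:
        return runner  # entering the state claims the invocation
    if CONFIG[new].releases_ownership:
        return None  # entering the state clears the previous owner
    return owner


def apply_transition(current: str, new: str, owner: str | None,
                     runner: str) -> tuple[str, str | None]:
    """Run the three checks in order; a backend would persist the result atomically."""
    validate_transition(current, new)
    validate_ownership(current, new, owner, runner)
    return new, compute_new_owner(new, owner, runner)
```

In this arrangement the backend only has to store the returned (status, owner) pair atomically; every rule lives in the table.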
Why PENDING exists before RUNNING
When a runner takes work from the broker, pynenc does not immediately say the
task is running. It first moves the invocation to PENDING and records the
runner as the owner.
That state means: this runner has claimed the invocation, but task code has not started yet.
That distinction matters because brokers and storage backends are plugins. A backend can be strong enough to make claim operations atomic, or weak enough that two runners may briefly believe they can claim the same invocation. Pynenc does not pretend that uncertainty disappears. It records the claim as a state transition and gives it ownership rules.
PENDING has two properties that make the design work:
    InvocationStatus.PENDING: StatusDefinition(
        allowed_transitions=frozenset({
            InvocationStatus.RUNNING,
            InvocationStatus.KILLED,
            InvocationStatus.REROUTED,
            InvocationStatus.PENDING_RECOVERY,
        }),
        requires_ownership=True,
        acquires_ownership=True,
    )
acquires_ownership=True means entering PENDING records which runner claimed
the invocation. requires_ownership=True means later transitions out of
PENDING must come from that same runner, unless the next state is an explicit
recovery state that overrides ownership.
So the path to execution has two checks:
- The runner retrieves an invocation and tries to move it to PENDING.
- Before task code runs, that same runner tries to move it from PENDING to RUNNING.
If another runner already moved the invocation forward, the second transition is not valid anymore. If another runner tries to move an invocation it does not own, ownership validation rejects it. That does not remove every possible bug in an uncertain plugin system, but it makes the race visible at a precise point: claiming the invocation, starting execution, or recovering a dead owner.
This is the practical reason for having a richer status system. The state machine is not only documentation. It gives pynenc a place to attach ownership rules, concurrency checks, history records, and recovery behavior while still supporting backends with different guarantees.
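The claim race can be simulated with a toy backend. In this sketch, all names are hypothetical: a single threading.Lock stands in for whatever atomic primitive a real backend provides, and the rule sets are cut down to the edges the race needs. Several threads try to claim the same invocation at the same moment.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass

# Hypothetical, reduced rules: only the edges this race needs.
ALLOWED: dict[str, set[str]] = {
    "REGISTERED": {"PENDING"},
    "PENDING": {"RUNNING"},
    "RUNNING": {"SUCCESS"},
}
REQUIRES_OWNERSHIP = {"PENDING", "RUNNING"}
ACQUIRES_OWNERSHIP = {"PENDING"}


@dataclass
class Record:
    status: str = "REGISTERED"
    owner: str | None = None


class InMemoryStore:
    """Toy backend: one lock makes each read-validate-write step atomic."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._records: dict[str, Record] = {}

    def register(self, inv_id: str) -> None:
        with self._lock:
            self._records[inv_id] = Record()

    def transition(self, inv_id: str, new: str, runner: str) -> bool:
        """Return True if the transition was applied, False if it was rejected."""
        with self._lock:
            rec = self._records[inv_id]
            if new not in ALLOWED.get(rec.status, set()):
                return False  # not valid from the current status, e.g. already claimed
            if rec.status in REQUIRES_OWNERSHIP and rec.owner != runner:
                return False  # leaving an owned state requires being the owner
            if new in ACQUIRES_OWNERSHIP:
                rec.owner = runner
            rec.status = new
            return True

    def snapshot(self, inv_id: str) -> Record:
        with self._lock:
            rec = self._records[inv_id]
            return Record(rec.status, rec.owner)


def race_to_claim(store: InMemoryStore, inv_id: str, runners: list[str]) -> dict[str, bool]:
    """Let several runner threads try REGISTERED -> PENDING at the same time."""
    results: dict[str, bool] = {}
    barrier = threading.Barrier(len(runners))

    def claim(runner: str) -> None:
        barrier.wait()  # release all threads together to provoke the race
        results[runner] = store.transition(inv_id, "PENDING", runner)

    threads = [threading.Thread(target=claim, args=(r,)) for r in runners]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Exactly one claim succeeds. The others fail because PENDING is not reachable from PENDING, and a losing runner that then tries PENDING -> RUNNING is rejected by the ownership check.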
Why PENDING_RECOVERY is a state
Consider one invocation:
- A runner picks it up from the broker. The status moves to PENDING, and the runner becomes the owner.
- The runner either starts executing it, moving it to RUNNING, or disappears before execution starts.
In the second case, the invocation is stuck in PENDING with an owner that no
longer exists. Other runners are not allowed to touch it, because PENDING
requires ownership validation.
At that point recovery needs different rules from normal pending execution. A non-owner must be allowed to intervene, the dead owner must be released, and the invocation should return to the queue path rather than jump straight to execution.
Pynenc makes those different rules explicit with a separate state:
    InvocationStatus.PENDING_RECOVERY: StatusDefinition(
        allowed_transitions=frozenset({InvocationStatus.REROUTED}),
        releases_ownership=True,
        overrides_ownership=True,
    )
Those three properties are the point.
overrides_ownership=True means recovery is the only path where a non-owner can
touch the invocation. Whenever ownership is overridden, the recovery state is
recorded in the status history.
allowed_transitions={REROUTED} means recovery does not jump straight to
execution. The invocation re-enters the normal queue path, then a healthy runner
can pick it up.
releases_ownership=True removes the dead owner from the record, making the
invocation claimable again.
The same pattern exists for RUNNING_RECOVERY, which handles invocations whose
owner disappeared during execution.
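A recovery sweep along these lines can be sketched as follows. How pynenc detects a dead owner is not covered here, so the live_runners set is an assumption, as are the Rule and Invocation types and the move and recover_dead_owners helpers. RUNNING_RECOVERY is given the same flags as PENDING_RECOVERY, following the statement that the same pattern applies.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """Hypothetical per-status rule; flag names follow the article's StatusDefinition."""
    allowed: frozenset[str]
    requires_ownership: bool = False
    releases_ownership: bool = False
    overrides_ownership: bool = False


RULES: dict[str, Rule] = {
    "PENDING": Rule(frozenset({"RUNNING", "KILLED", "REROUTED", "PENDING_RECOVERY"}),
                    requires_ownership=True),
    "RUNNING": Rule(frozenset({"SUCCESS", "FAILED", "RETRY", "PAUSED",
                               "RUNNING_RECOVERY", "KILLED"}),
                    requires_ownership=True),
    "PENDING_RECOVERY": Rule(frozenset({"REROUTED"}),
                             releases_ownership=True, overrides_ownership=True),
    "RUNNING_RECOVERY": Rule(frozenset({"REROUTED"}),
                             releases_ownership=True, overrides_ownership=True),
    "REROUTED": Rule(frozenset({"PENDING"})),
}
RECOVERY_FOR = {"PENDING": "PENDING_RECOVERY", "RUNNING": "RUNNING_RECOVERY"}


@dataclass
class Invocation:
    inv_id: str
    status: str
    owner: str | None
    history: list[str] = field(default_factory=list)


def move(inv: Invocation, new: str, actor: str) -> None:
    """Apply one transition, enforcing the allowed set and the ownership rules."""
    rule = RULES[inv.status]
    if new not in rule.allowed:
        raise ValueError(f"{inv.status} -> {new} is not allowed")
    target = RULES.get(new)
    overrides = target is not None and target.overrides_ownership
    if rule.requires_ownership and inv.owner != actor and not overrides:
        raise PermissionError(f"{actor} does not own {inv.inv_id}")
    if target is not None and target.releases_ownership:
        inv.owner = None
    inv.status = new
    inv.history.append(new)


def recover_dead_owners(invocations: list[Invocation], live_runners: set[str],
                        actor: str = "recovery-service") -> list[str]:
    """Reroute owned invocations whose owner is not in live_runners."""
    recovered = []
    for inv in invocations:
        if inv.status in RECOVERY_FOR and inv.owner not in live_runners:
            move(inv, RECOVERY_FOR[inv.status], actor)  # overrides and releases ownership
            move(inv, "REROUTED", actor)                # re-enters the normal queue path
            recovered.append(inv.inv_id)
    return recovered
```

A non-owner calling move(inv, "REROUTED", ...) directly on a PENDING invocation raises PermissionError: in this model the only way around ownership is through a recovery state.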
Why make recovery explicit?
Keeping recovery inside the normal PENDING or RUNNING path would make those
states carry two different meanings: ordinary execution, and recovery from a
dead owner.
Those meanings have different ownership rules. Ordinary PENDING is owned by a
runner, so other runners should not transition it. PENDING_RECOVERY is the
controlled exception: a recovery service can override ownership, release the
dead owner, and reroute the invocation.
As a state, recovery is handled by the same mechanism as everything else. The allowed-transition table answers the question in one place, and tests can assert the recovery path directly.
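For example, the recovery rules can be pinned down with a few unit tests. This is a hypothetical test module written against a small excerpt of a transition table, not pynenc's own test suite; it can be run with python -m unittest.

```python
from __future__ import annotations

import unittest

# Hypothetical excerpt of a transition table and its ownership-overriding states.
ALLOWED = {
    "PENDING": frozenset({"RUNNING", "KILLED", "REROUTED", "PENDING_RECOVERY"}),
    "RUNNING": frozenset({"SUCCESS", "FAILED", "RUNNING_RECOVERY"}),
    "PENDING_RECOVERY": frozenset({"REROUTED"}),
    "RUNNING_RECOVERY": frozenset({"REROUTED"}),
    "REROUTED": frozenset({"PENDING"}),
}
OVERRIDES_OWNERSHIP = frozenset({"PENDING_RECOVERY", "RUNNING_RECOVERY"})


def is_legal(path: list[str]) -> bool:
    """True if every consecutive pair in path is an allowed transition."""
    return all(dst in ALLOWED.get(src, frozenset()) for src, dst in zip(path, path[1:]))


class RecoveryPathTest(unittest.TestCase):
    def test_dead_pending_owner_is_rerouted_then_reclaimed(self) -> None:
        path = ["PENDING", "PENDING_RECOVERY", "REROUTED", "PENDING", "RUNNING", "SUCCESS"]
        self.assertTrue(is_legal(path))

    def test_recovery_cannot_jump_straight_to_execution(self) -> None:
        self.assertFalse(is_legal(["PENDING", "PENDING_RECOVERY", "RUNNING"]))
        self.assertFalse(is_legal(["RUNNING", "RUNNING_RECOVERY", "RUNNING"]))

    def test_ownership_overriding_states_only_lead_to_reroute(self) -> None:
        for state in OVERRIDES_OWNERSHIP:
            self.assertEqual(ALLOWED[state], frozenset({"REROUTED"}))
```

The last test is an invariant rather than a single path: any state allowed to override ownership may only hand the invocation back to the queue.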
It also gives operators a better audit trail. This sequence tells a clear story:
    REGISTERED
      -> PENDING(owner=A)
      -> PENDING_RECOVERY(no owner)
      -> REROUTED
      -> PENDING(owner=B)
      -> RUNNING(owner=B)
      -> SUCCESS
The history says that runner A stopped responding, recovery released the
owner, and runner B completed the work. That is much easier to inspect than a
status row where recovery was handled as hidden metadata or special-case code.
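If each transition is stored as a (status, owner) entry, the operator's questions become short queries over the history. A minimal sketch, assuming an append-only list of entries per invocation; Entry, render, recovered_owners, and last_owner are hypothetical names, not pynenc's API.

```python
from __future__ import annotations

from dataclasses import dataclass

RECOVERY_STATES = frozenset({"PENDING_RECOVERY", "RUNNING_RECOVERY"})


@dataclass(frozen=True)
class Entry:
    """One hypothetical history record: the new status and its owner afterwards."""
    status: str
    owner: str | None = None

    def __str__(self) -> str:
        return f"{self.status}(owner={self.owner})" if self.owner else self.status


def render(history: list[Entry]) -> str:
    """Format a history as an indented timeline, one transition per line."""
    head, *rest = history
    return "\n".join([str(head)] + [f"  -> {entry}" for entry in rest])


def recovered_owners(history: list[Entry]) -> list[str]:
    """Owners that lost the invocation to a recovery state, in order."""
    return [
        prev.owner
        for prev, cur in zip(history, history[1:])
        if cur.status in RECOVERY_STATES and prev.owner is not None
    ]


def last_owner(history: list[Entry]) -> str | None:
    """The most recent runner that owned the invocation, if any."""
    owners = [entry.owner for entry in history if entry.owner is not None]
    return owners[-1] if owners else None
```

For the timeline above, recovered_owners reports runner A and last_owner reports runner B.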
What the state machine costs
This is not free.
There is more code than a short status list would need: one StatusDefinition per
status, transition sets, ownership rules, and tests around invalid state
changes. There is also more terminology for users to learn. The first time
somebody sees RUNNING_RECOVERY in a timeline, they need to know that it means
a runner died and another service noticed.
The benefit is specific: the correctness of the task lifecycle does not depend on every backend exposing the same primitives. The framework has one source of truth for transitions, and the documentation diagram is generated from that source.
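Generating the diagram from the table is what keeps the two from drifting apart. As a hypothetical sketch, not the code behind pynenc status render, a transition table can be turned into Graphviz DOT text, with states that have no outgoing edges drawn as terminal:

```python
from __future__ import annotations

# Hypothetical excerpt of a transition table (state names from the article).
TRANSITIONS: dict[str, frozenset[str]] = {
    "REGISTERED": frozenset({"PENDING", "CONCURRENCY_CONTROLLED",
                             "CONCURRENCY_CONTROLLED_FINAL"}),
    "CONCURRENCY_CONTROLLED": frozenset({"REROUTED"}),
    "PENDING": frozenset({"RUNNING", "KILLED", "REROUTED", "PENDING_RECOVERY"}),
    "PENDING_RECOVERY": frozenset({"REROUTED"}),
    "REROUTED": frozenset({"PENDING"}),
    "KILLED": frozenset({"REROUTED"}),
    "RUNNING": frozenset({"SUCCESS", "FAILED"}),
    "SUCCESS": frozenset(),
    "FAILED": frozenset(),
    "CONCURRENCY_CONTROLLED_FINAL": frozenset(),
}


def to_dot(transitions: dict[str, frozenset[str]], name: str = "invocation_status") -> str:
    """Render a transition table as Graphviz DOT; sorting keeps the output diff-friendly."""
    states = set(transitions) | {dst for dsts in transitions.values() for dst in dsts}
    lines = [f"digraph {name} {{"]
    for state in sorted(states):
        if not transitions.get(state):
            lines.append(f"    {state} [shape=doublecircle];")  # terminal state
    for src in sorted(transitions):
        for dst in sorted(transitions[src]):
            lines.append(f"    {src} -> {dst};")
    lines.append("}")
    return "\n".join(lines)
```

Feeding the output to Graphviz, for example with dot -Tsvg states.dot -o states.svg, produces an SVG diagram; because the text is derived from the table, a changed rule shows up as a changed diagram.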
Takeaways
Strict state machines are useful when a system needs to run on more than one backend. They keep the backend interface small and put lifecycle rules in a place that can be tested directly.
Special cases deserve first-class states when they have different rules.
PENDING_RECOVERY is not ordinary PENDING. It can override ownership,
release ownership, and only transition back through REROUTED. That is a real
state.
The audit trail is part of the API. Every recovery state in the timeline is one more fact an operator can inspect when something goes wrong.
Pynenc is on PyPI as pynenc, and the
source is at github.com/pynenc/pynenc.
The status configuration discussed here lives in
pynenc/invocation/status.py.