API reference¶

Generated from the docstrings of the public retina package.

`retina` ¶

Retina — turn camera streams into event streams.

A small, model-agnostic, hardware-neutral library for the Signal -> Event layer: one level above object detection (Supervision gives you boxes; Retina gives you "person entered the dock and dwelled 31s"), and one level below domain judgment.

Quickstart (3 lines, any model):

```python
from retina import Retina, Zone, ZoneRule, YoloDetector
from retina.sources import video_frames

dock = Zone("dock", [(0.1, 0.1), (0.9, 0.1), (0.9, 0.9), (0.1, 0.9)], normalized=True)
cam = Retina(
    source_id="cam_01",
    detector=YoloDetector("yolo11n.pt", classes={"person"}),
    rules=[ZoneRule(dock, classes={"person"}, dwell_s=30)],
)
for event in cam.run(video_frames("dock.mp4")):
    print(event.to_json())
```

Compose models like n8n / LCEL (no GUI):

```python
from retina import IoUTracker, JsonlSink  # in addition to the quickstart imports
pipe = YoloDetector("yolo11n.pt") | IoUTracker() | ZoneRule(dock) | JsonlSink("e.jsonl")
```

`CallableDetector` ¶

Bases: Pipeable

Wrap a plain function as a Detector, optionally filtering classes / confidence. Lets you plug any model in one line.

`CountRule` ¶

Bases: _RuleBase

Emits count.threshold when the number of tracked objects (optionally restricted to a zone and/or to given classes) crosses threshold. Edge-triggered: fires once when the predicate flips true, and re-arms when it goes false.
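The edge-triggered behaviour can be illustrated with a minimal standalone sketch. It is not Retina's implementation: the `EdgeTrigger` class is hypothetical, and it assumes that "crosses threshold" means the count reaching or exceeding the threshold.

```python
class EdgeTrigger:
    """Fire once when a predicate flips to true; re-arm when it goes false."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self._armed = True  # ready to fire on the next true predicate

    def update(self, count: int) -> bool:
        # Assumption: the predicate is "count >= threshold".
        active = count >= self.threshold
        if active and self._armed:
            self._armed = False  # fire once, then stay quiet while still true
            return True
        if not active:
            self._armed = True   # predicate went false: re-arm
        return False


trigger = EdgeTrigger(threshold=3)
fired = [trigger.update(n) for n in [1, 3, 4, 2, 3]]
print(fired)
```

The rule fires on the second and fifth counts only: the third count is still above the threshold, but the trigger has not re-armed yet.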

`Detection` `dataclass` ¶

One object found in one frame.

`from_supervision(detections, class_names=None)` `classmethod` ¶

Ingest a Roboflow Supervision sv.Detections → list[Detection].

Supervision is the de facto interop format that 20+ CV libraries convert into, so anyone already using it can pipe straight into Retina's event layer. Retina never imports supervision; the object is read by duck typing: .xyxy (Nx4 [x1, y1, x2, y2]), .confidence (N or None), .class_id (N or None), .data (dict, may hold "class_name").

Label resolution, per row i: prefer data["class_name"][i]; else map class_id[i] through class_names (dict or list); else str(class_id); else "". Missing confidence falls back to the Detection default.
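The label-resolution order above can be sketched as a small duck-typed helper. This is an illustration only, not the library's code: `resolve_label` is a hypothetical name, and a `SimpleNamespace` stands in for an `sv.Detections` object.

```python
from types import SimpleNamespace


def resolve_label(dets, i, class_names=None):
    """Resolve the label for row i, following the documented preference order."""
    data = getattr(dets, "data", None) or {}
    names = data.get("class_name")
    if names is not None:
        return str(names[i])                 # 1. data["class_name"][i]
    class_ids = getattr(dets, "class_id", None)
    if class_ids is None:
        return ""                            # 4. nothing to go on
    cid = int(class_ids[i])
    if isinstance(class_names, dict) and cid in class_names:
        return class_names[cid]              # 2. dict lookup
    if isinstance(class_names, (list, tuple)) and 0 <= cid < len(class_names):
        return class_names[cid]              # 2. list lookup
    return str(cid)                          # 3. fall back to the numeric id


# Stand-in for a Supervision object: only the attributes read above are needed.
dets = SimpleNamespace(class_id=[0, 2], data={})
labels = [resolve_label(dets, i, class_names={0: "person"}) for i in range(2)]
print(labels)
```

Row 0 resolves through `class_names`; row 1 has no mapping, so it falls back to the stringified class id.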

`Detector` ¶

Bases: Protocol

Any object/callable that turns a frame into detections.

A frame is an HxWx3 uint8 numpy array (or whatever your detector accepts — Retina just passes it through).

`DetectorNode` ¶

Bases: Node

Run a detector on the frame image; fill frame.detections.

`DinoV2Embedder` ¶

Bases: Pipeable

Frozen DINOv2 per-object embedder — the first real vec producer.

Callable enricher: for each track it crops frame.image[y1:y2, x1:x2], runs DINOv2 over all crops in one batched forward pass, and attaches the L2-normalized embedding as track.user["vec"] = Vec(...).to_dict(). From there WorldState.from_frame lifts it onto entity.vec.

size picks the backbone: small (dim 384, default), base (768), large (1024). device="auto" selects mps → cuda → cpu. Set bgr=True for OpenCV frames (cv2 is BGR); synthetic / RGB frames keep the default False. Crop boxes are clamped to the image bounds, and any crop that is empty after clamping is skipped.

`EnricherNode` ¶

Bases: Node

Run a function on the frame and merge its result into frame.user.

The seam for a VLM describe, a classifier, or a V-JEPA novelty score. fn takes the Frame and returns a dict (merged into frame.user) or any value (stored under key).
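As a rough sketch of the merge rule just described (a hypothetical helper, not the node's actual code; the default key name `"enrich"` is an assumption made for illustration):

```python
def merge_enrichment(user, result, key="enrich"):
    """Merge an enricher result into a frame's user dict."""
    if isinstance(result, dict):
        user.update(result)   # dict results are merged key by key
    else:
        user[key] = result    # any other value is stored under `key`
    return user


user = {}
merge_enrichment(user, {"caption": "forklift near dock"})  # e.g. a VLM describe
merge_enrichment(user, 0.87, key="novelty")                # e.g. a scalar score
print(user)
```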

`Entity` `dataclass` ¶

One thing present in the scene: a symbolic core (+ optional latent vec).

Two distinct, optional position channels coexist:

- bbox: an image-space axis-aligned box in pixels (the vision path: a detector/tracker output).
- locus: a metric position in a world/scene coordinate frame (the units and frame are defined by the producer, e.g. metres in a room/map frame). This is the typed home for field / non-bbox signals (CSI, radar, lidar, GPS) whose state is a point in space, not a pixel box.

An entity may carry either, both, or neither; locus is not a reprojection of bbox.

`Event` `dataclass` ¶

One thing that happened. Serializes to the minimal JWT-style form.

`to_dict()` ¶

Flat dict, null/empty fields omitted, custom ext merged in.

`EventType` ¶

The closed primitive vocabulary for 0.1 (see SPEC.md).

`Frame` `dataclass` ¶

Append-only enrichment unit flowing through the pipeline.

Stages attach to it: the detector fills detections, the tracker fills tracks, the rules fill events. user is an open extension slot.

`GateNode` ¶

Bases: Node

Drop the frame (skip everything downstream) when the gate says don't look.

`GroundingDinoDetector` ¶

Bases: Pipeable

Open-vocabulary detection from a text prompt via Grounding DINO (HF transformers). pip install 'trio-retina[grounding]'. Detects any classes you name — no training. Heavy (torch); not imported unless instantiated.

`IoUTracker` ¶

Bases: Pipeable

Greedy IoU association — small, deterministic, zero extra deps.

A detection matches the highest-IoU live track of the same class above iou_threshold. Tracks survive max_missed frames of occlusion and become confirmed after min_hits hits (so transient noise never fires events).
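Greedy IoU association can be sketched in a few lines of plain Python. This is a simplified illustration rather than IoUTracker's code: it assumes each track is used at most once per frame, processes detections in input order, and uses an arbitrary default of 0.3 for `iou_threshold`; `max_missed` and `min_hits` bookkeeping is left out.

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_match(detections, tracks, iou_threshold=0.3):
    """Assign each detection to the best unused same-class track above the threshold.

    detections: list of (label, box); tracks: dict of track_id -> (label, box).
    Returns {detection_index: track_id}; unmatched detections are absent.
    """
    matches, used = {}, set()
    for d_idx, (d_label, d_box) in enumerate(detections):
        best_id, best_iou = None, iou_threshold
        for t_id, (t_label, t_box) in tracks.items():
            if t_id in used or t_label != d_label:
                continue  # track already taken, or a different class
            score = iou(d_box, t_box)
            if score > best_iou:
                best_id, best_iou = t_id, score
        if best_id is not None:
            matches[d_idx] = best_id
            used.add(best_id)
    return matches


tracks = {1: ("person", [0, 0, 10, 10]), 2: ("person", [20, 20, 30, 30])}
detections = [
    ("person", [1, 1, 11, 11]),
    ("person", [21, 20, 31, 30]),
    ("truck", [0, 0, 10, 10]),  # overlaps track 1 but has a different class
]
matches = greedy_match(detections, tracks)
print(matches)
```

The truck detection stays unmatched even though it overlaps track 1 perfectly, because association is restricted to tracks of the same class.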

`JsonlSink` ¶

Bases: _SinkPipeable

Append events to a JSONL file as they arrive (streaming).

`Line` `dataclass` ¶

A directed tripwire a->b. Crossing direction is reported relative to it.

`scaled(size)` ¶

The (a, b) endpoints in pixel coords; scale once per frame and reuse.

`LineRule` ¶

Bases: _RuleBase

Emits line.cross when a track's centroid crosses the tripwire. dir is a_to_b or b_to_a, according to the side the track moved toward.

Requires tracked input (each track carries an id and prev_centroid), per the standard — line.cross is meaningless without object identity.

min_frames (default 1) is a jitter debounce, like Supervision's LineZone.minimum_crossing_threshold. With min_frames=1 the rule is stateless and emits the instant the prev→curr centroid segment intersects the line (the original behavior). With min_frames > 1, a crossing is pending once the segment intersects, and is confirmed and emitted only after the track has stayed continuously on the new side for min_frames frames (including the crossing frame). If the track returns to the original side before then, the crossing is discarded as jitter and nothing is emitted. The event fires on the frame the crossing is confirmed, carrying the direction of the original crossing (and that frame's t / box).
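The min_frames debounce can be sketched for a single track as follows. This is a simplified illustration, not LineRule's code: it reduces each frame to the side of an infinite line the centroid is on (+1 or -1) instead of testing the prev→curr segment against the finite tripwire, and the helper names are hypothetical.

```python
def side_of_line(a, b, p):
    """Side of the directed line a->b that p lies on: +1 or -1 (on-line counts as -1)."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return 1 if cross > 0 else -1


def debounce_crossings(sides, min_frames=1):
    """Return (frame_index, old_side, new_side) for each confirmed crossing.

    sides: non-empty per-frame list of +1 / -1 for one track.
    """
    confirmed = []
    stable = sides[0]  # side the track is currently considered to be on
    pending = None     # (old_side, new_side) awaiting confirmation
    run = 0            # consecutive frames on the new side, crossing frame included
    for i in range(1, len(sides)):
        side = sides[i]
        if pending is None:
            if side != stable:
                pending, run = (stable, side), 1
        elif side == pending[1]:
            run += 1
        else:
            pending, run = None, 0  # back on the original side: discard as jitter
        if pending is not None and run >= min_frames:
            confirmed.append((i, pending[0], pending[1]))
            stable, pending, run = pending[1], None, 0
    return confirmed


immediate = debounce_crossings([-1, -1, 1, 1], min_frames=1)
debounced = debounce_crossings([-1, 1, -1, 1, 1, 1], min_frames=3)
print(immediate, debounced)
```

With min_frames=1 the crossing is reported on the frame it happens. With min_frames=3 the one-frame excursion at index 1 is dropped, and the second crossing (index 3) is reported at index 5, once the track has spent three consecutive frames on the new side.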

`MotionGate` ¶

Look only when the frame changed from the previous one (mean abs diff).

`Node` ¶

Bases: Pipeable

A pipeline step: Frame -> Frame (or None to drop the frame).

`NorfairTracker` ¶

Bases: Pipeable

Norfair adapter — pure-Python Kalman tracking with re-association, better ID stability through occlusion than IoUTracker. pip install 'trio-retina[norfair]'.

Surfaces only tracks detected this frame (coasting/occluded ones are kept internally for re-association but not returned, so occupancy/dwell stay honest).

`Pipeable` ¶

Mixin giving | composition. Subclasses implement to_node().

`Pipeline` ¶

A linear chain of nodes. Each frame flows through every node in order; a node returning None drops the frame (the rest of the chain is skipped).
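A toy version of this chain, including `|` composition and the drop-on-None rule, might look like the sketch below. `Step` and `Chain` are hypothetical stand-ins for Pipeable and Pipeline, and frames are plain dicts here purely for illustration.

```python
class Step:
    """Stand-in for a pipeable step wrapping a frame -> frame (or None) function."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        return Chain([self, other])


class Chain:
    """Stand-in for a linear pipeline of steps."""

    def __init__(self, steps):
        self.steps = list(steps)

    def __or__(self, other):
        return Chain(self.steps + [other])

    def process(self, frame):
        for step in self.steps:
            frame = step.fn(frame)
            if frame is None:  # a step dropped the frame: skip the rest
                return None
        return frame


seen = []  # records every frame that reaches the last step


def gate(frame):
    return frame if frame["motion"] else None


def detect(frame):
    return {**frame, "detections": ["person"]}


def record(frame):
    seen.append(frame)
    return frame


pipe = Step(gate) | Step(detect) | Step(record)
kept = pipe.process({"motion": True})
dropped = pipe.process({"motion": False})
print(kept, dropped, len(seen))
```

The second frame is dropped by the gate, so neither the detector nor the recording step runs for it.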

`process(image, t, *, frame_num=None)` ¶

Run one (image, timestamp) through the chain; return the enriched Frame.

frame_num defaults to an internal monotonic counter; pass the true source frame index if you have it (a Pipeline is single-stream/stateful).

`run(frames)` ¶

Stream events from an iterable of (image, timestamp) pairs.

`run_states(frames)` ¶

Stream a WorldState snapshot per frame — the assembled-state channel (entities + relations + scene), alongside run()'s event stream.

`Relation` `dataclass` ¶

A typed, directed relation between two entities (subj -predicate-> obj).

family is an optional coarse grouping (spatial / social / functional …) above the specific predicate.

`Retina` ¶

Sugar over Pipeline for the common detector -> tracker -> rules case.

`RuleNode` ¶

Bases: Node

Run an event rule over the tracks; append to frame.events.

`SinkNode` ¶

Bases: Node

Emit each event on the frame to a sink (jsonl/webhook/kafka/...).

`Track` `dataclass` ¶

A detected object followed across frames.

bbox is the tracker's current box; det_bbox preserves the raw detector box (they differ once a Kalman/DCF tracker predicts). user is an open extension slot for downstream code.

`TrackerNode` ¶

Bases: Node

Give detections identity over time; fill frame.tracks.

`VJepa2Embedder` ¶

Bases: Pipeable

Frozen V-JEPA 2 scene-level embedder — the first real scene producer.

V-JEPA 2 is a self-supervised video encoder, so this is not a per-frame op: it keeps a rolling buffer of the last clip_len frame images and, once full, runs V-JEPA 2 over the whole clip, mean-pools the patch/temporal tokens to a single vector, and attaches it as frame.user["scene"] = Vec(...).to_dict(). WorldState.from_frame then lifts it onto ws.scene — symmetric with how DinoV2Embedder fills entity.vec. Before the buffer fills, the frame passes through untouched (no scene yet). The buffer slides by one frame thereafter, so every frame from clip_len on carries a fresh scene latent.

clip_len is the number of frames per clip (default 16). device="auto" selects mps → cuda → cpu. normalize=True L2-normalizes the pooled vector. Set bgr=True for OpenCV frames (cv2 is BGR); synthetic / RGB frames keep the default False.

Needs the extra (pulls torch + transformers + pillow, downloads V-JEPA 2 weights): pip install 'trio-retina[vjepa]'.
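The rolling-clip behaviour described above can be sketched without any model. This is an illustration under assumptions, not the embedder's code: `RollingClip` and `mean_pool` are hypothetical names, integers stand in for frame images, and lists of floats stand in for encoder tokens.

```python
from collections import deque


class RollingClip:
    """Keep the last clip_len frames; expose a full clip once enough have arrived."""

    def __init__(self, clip_len=16):
        self.buffer = deque(maxlen=clip_len)  # oldest frame falls off automatically

    def push(self, image):
        """Add a frame; return the current clip if the buffer is full, else None."""
        self.buffer.append(image)
        if len(self.buffer) < self.buffer.maxlen:
            return None  # not enough frames yet: no scene vector for this frame
        return list(self.buffer)


def mean_pool(tokens):
    """Average a list of equal-length token vectors into a single vector."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


clip = RollingClip(clip_len=3)
clips = [clip.push(i) for i in range(5)]
pooled = mean_pool([[1.0, 2.0], [3.0, 4.0]])
print(clips, pooled)
```

The first clip_len - 1 frames produce no clip; after that the window slides by one frame, so every later frame gets a fresh clip.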

`Vec` `dataclass` ¶

A model-tagged latent. Small vectors are carried inline in values; large or re-embeddable ones are carried by reference in ref. Always tagged {model, dim}.

`VlmDetector` ¶

Bases: Pipeable

Use ANY vision-language model as a detector.

You pass a client(image, prompt) -> iterable of dicts, where each dict has label, box = [x1, y1, x2, y2] (pixels), and optional score. VlmDetector just maps that into Detections — so Qwen-VL, Gemini, GPT-4o, Claude, or a local VLM all plug in behind the same seam. The client is yours (an OpenAI-compatible call, an HTTP request, etc.); keep grounding/JSON parsing there. A VLM can also be used as an EnricherNode/event source — see docs.

`WebhookSink` ¶

Bases: _SinkPipeable

POST each event as JSON to a URL. Uses urllib (stdlib) — no requests dep.

Only http/https URLs are accepted. The URL may come from a workflow.json (a trusted operator input — see SECURITY.md), but we still reject other schemes (file://, ftp://, …) so a stray config can't make urllib read a local file or hit an unexpected protocol.
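A scheme check of this kind can be written with the standard library alone. The helper below is a hypothetical sketch of the idea, not WebhookSink's actual validation code.

```python
from urllib.parse import urlparse


def check_webhook_url(url: str) -> str:
    """Return url unchanged if its scheme is http or https; raise ValueError otherwise."""
    scheme = urlparse(url).scheme.lower()
    if scheme not in ("http", "https"):
        raise ValueError(
            f"unsupported URL scheme {scheme!r}; only http/https are allowed"
        )
    return url


ok = check_webhook_url("https://example.com/hook")

try:
    check_webhook_url("file:///etc/passwd")
    rejected = False
except ValueError:
    rejected = True

print(ok, rejected)
```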

`WorldState` `dataclass` ¶

The assembled snapshot: entities present, their relations, scene latent.

scene is the home for a whole-field / scene-level latent — a model-tagged Vec describing the frame as a whole, with no bounding box (e.g. a V-JEPA scene vector, or a CSI channel latent for an RF/field measurement of the whole room). It is the natural slot for any signal whose state is global to the scene rather than anchored to one detected object.

`from_frame(frame)` `classmethod` ¶

Assemble a WorldState from a Frame: each track becomes an entity.

Maps the symbolic core (id/type/bbox/conf) straight off the track; if a per-object latent was attached upstream (in track.user["vec"] as a dict), it rides along as the entity's vec. A scene-level latent (e.g. a frozen V-JEPA scene encoder) attaches symmetrically: if frame.user["scene"] is a dict, it lifts onto ws.scene. Relations default empty — filled by a higher stage (a relation extractor).

`to_dict()` ¶

Minimal dict, null/empty fields omitted — the smallest is {src, t}.

`WorldStateNode` ¶

Bases: Node

Assemble a WorldState snapshot from the frame's tracks and store it on frame.user[key], so the state channel flows through the same composable pipeline as events. Read it off frame.user or via Pipeline.run_states().

`YoloDetector` ¶

Bases: Pipeable

Optional Ultralytics YOLO adapter. pip install 'trio-retina[yolo]'.

Loads any Ultralytics weights — YOLOv5/8/9/10/11/12, YOLO-World, RT-DETR — so swapping models is just a different weights string. Not imported unless you instantiate it, so the base package stays light.

`Zone` `dataclass` ¶

A polygonal region of interest.

`scaled(size)` ¶

The polygon in pixel coords. For a normalized zone, multiply by the frame size; compute this once per frame and reuse across objects.
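To make the scaling concrete, here is a minimal sketch of normalized-to-pixel conversion followed by a point-in-polygon test. It is not the library's code: the helper names are hypothetical, and it assumes size is given as (width, height).

```python
def scale_polygon(points, size):
    """Scale normalized (x, y) points in [0, 1] to pixel coords.

    Assumption: size is (width, height).
    """
    w, h = size
    return [(x * w, y * h) for x, y in points]


def point_in_polygon(pt, polygon):
    """Ray-casting test: True if pt lies strictly inside the polygon."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


normalized = [(0.1, 0.1), (0.9, 0.1), (0.9, 0.9), (0.1, 0.9)]
pixels = scale_polygon(normalized, (640, 480))  # scale once per frame...
hits = [point_in_polygon(p, pixels) for p in [(320, 240), (10, 10)]]  # ...reuse per object
print(hits)
```

Scaling once and reusing the pixel polygon for every object in the frame avoids repeating the multiplication per track.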

`ZoneRule` ¶

Bases: _RuleBase

Emits zone.enter on entry, zone.exit on departure, and zone.dwell once a track has stayed inside for dwell_s seconds (fires once per visit).

exit_grace_s keeps a track logically inside until it has been out-of-zone or absent for that long (rides out detection blips / id flicker without a spurious exit; the exit dur is measured to the last frame seen inside). anchor picks the body-point tested against the polygon: center (default, the centroid), feet (bottom-center of bbox), or head (top-center).
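The three anchor choices can be sketched as a small helper. This is an illustration, not ZoneRule's code: `anchor_point` is a hypothetical name, and it assumes [x1, y1, x2, y2] pixel boxes with the image y axis growing downward.

```python
def anchor_point(bbox, anchor="center"):
    """Return the (x, y) point of a box that is tested against the zone polygon."""
    x1, y1, x2, y2 = bbox
    cx = (x1 + x2) / 2
    if anchor == "center":
        return (cx, (y1 + y2) / 2)
    if anchor == "feet":
        return (cx, y2)  # bottom-center: y grows downward, so y2 is the bottom edge
    if anchor == "head":
        return (cx, y1)  # top-center
    raise ValueError(f"unknown anchor {anchor!r}")


box = [10, 20, 30, 60]
points = {name: anchor_point(box, name) for name in ("center", "feet", "head")}
print(points)
```

For a person standing at a zone boundary, the feet anchor usually reflects floor position better than the box center, which sits roughly at waist height.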

`event_f1(pred, ref, **kw)` ¶

Precision / recall / F1 between predicted and reference events.

`load_schema()` ¶

The formal JSON Schema (draft 2020-12) for retina.event.

`match_events(pred, ref, *, time_tol=2.0, keys=('type', 'zone', 'dir'))` ¶

Greedy nearest-in-time matching. Returns (tp, fp, fn).

A predicted event matches an unused reference event with identical keys and the smallest |t_pred - t_ref| within time_tol.
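The matching rule and the resulting precision / recall / F1 can be sketched as below. This is a simplified illustration, not the library's implementation: events are assumed to be plain dicts with a numeric "t" field, predictions are processed in input order, and both function names are hypothetical.

```python
def greedy_match_events(pred, ref, *, time_tol=2.0, keys=("type", "zone", "dir")):
    """Greedy nearest-in-time matching; returns (tp, fp, fn)."""
    used = set()
    tp = 0
    for p in pred:
        best, best_dt = None, None
        for j, r in enumerate(ref):
            if j in used:
                continue  # each reference event matches at most once
            if any(p.get(k) != r.get(k) for k in keys):
                continue  # keys must be identical
            dt = abs(p["t"] - r["t"])
            if dt <= time_tol and (best_dt is None or dt < best_dt):
                best, best_dt = j, dt
        if best is not None:
            used.add(best)
            tp += 1
    return tp, len(pred) - tp, len(ref) - tp


def precision_recall_f1(tp, fp, fn):
    """Standard precision / recall / F1, with 0.0 for empty denominators."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


pred = [
    {"type": "zone.enter", "zone": "dock", "t": 10.0},
    {"type": "zone.dwell", "zone": "dock", "t": 41.0},
    {"type": "line.cross", "dir": "a_to_b", "t": 50.0},
]
ref = [
    {"type": "zone.enter", "zone": "dock", "t": 11.0},
    {"type": "zone.dwell", "zone": "dock", "t": 45.0},  # 4 s away: outside time_tol
]
counts = greedy_match_events(pred, ref)
scores = precision_recall_f1(*counts)
print(counts, scores)
```

Only the zone.enter pair matches: the dwell events have identical keys but are 4 s apart, and the line crossing has no reference counterpart.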

`register_node(type_name, builder)` ¶

Register a custom node type for declarative from_json workflows.

The registry is a module global, so a registration is process-wide.

`sample_events()` ¶

Return a filesystem path to the bundled sample retina.event JSONL.

The file ships inside the package (retina/_assets/sample_events.jsonl), so this works offline the instant the wheel is installed, with no network and no licensing risk. It is the five-event synthetic dock scene from retina demo (count threshold → zone enter → dwell → line cross → exit), handy for trying the event format, validate(), or the CLI:

```shell
retina validate "$(python -c 'import retina; print(retina.sample_events())')"
```

Returns the path as a str. The path is stable for the life of the process; treat the file as read-only (it lives inside the install).

`sample_video(*, force=False)` ¶

Return a path to a small sample video clip, cached per-user.

What this is. A synthetic clip — deterministic moving shapes (a couple of coloured rectangles drifting across a dark background) — generated once with OpenCV and cached under ~/.cache/trio-retina/. It exists to exercise the video-source plumbing end to end with zero network and zero third-party-footage licensing risk: video_frames(retina.sample_video()), retina run workflow.json "$(... sample_video ...)", frame striding, EOF handling, and so on.

What this is NOT. It is not real-world footage, so a real object detector (YoloDetector) will find no people/vehicles in it — there are none. For the YOLO-on-real-footage path, point Retina at your own clip (video_frames("your.mp4")); the synthetic clip only verifies the wiring.

Writing the clip needs OpenCV (the [video] extra). The first call writes and caches it; later calls return the cached path immediately. Pass force=True to regenerate.

Raises RuntimeError with a clear [video] hint if OpenCV is missing and the clip is not already cached.

`to_jsonl(events, path)` ¶

Write events to a JSONL file. Returns the count written.

`to_node(x)` ¶

Coerce a step into a Node: pass Nodes through, auto-wrap pipeable domain objects (detector/tracker/rule/sink) via their to_node().

`validate(event)` ¶

Return a list of problems (empty = valid). Accepts an Event or a dict.

