inference and long-running agents
Agent and inference lab.
focus
Wexpro Labs works on the systems around models: lower-cost inference below them, agents that keep working above them, and local tools that make existing agent workflows cheaper and more reliable.
01 compress context
02 serve models
03 run agents
04 ship probes
products
01 Context Compression
A local background token saver for AI agents reading JSON, JSONL, CSV, and TSV.
02 inference
Lower latency and cost at the runtime layer.
03 long-running agents
Agents that keep state, resume work, and finish multi-step jobs.
04 experiments
Applied probes that turn model capability into working behavior.