inference and long-running agents

Agent and inference lab.

Wexpro Labs works on the systems around models: lower-cost inference below them, agents that keep working above them, and local tools that make existing agent workflows cheaper and more reliable.

01 compress context
02 serve models
03 run agents
04 ship probes
  • 01

    context compression

    A local background tool that reduces token usage for AI agents reading JSON, JSONL, CSV, and TSV.

  • 02

    inference

    Lower latency and cost at the runtime layer.

  • 03

    long-running agents

    Agents that keep state, resume work, and finish multi-step jobs.

  • 04

    experiments

    Applied probes that turn model capability into working behavior.