🚀 ContaineRL - Containerize your RL Environments and Agents

TL;DR: How I built ContaineRL, a lightweight toolkit to package RL environments and agents as reproducible, containerized services, exposing them via gRPC with a clean Python API and CLI. It enables modular, language-agnostic RL pipelines, supports scalable deployment, and enforces strong engineering practices (type checking, testing, CI) without sacrificing developer ergonomics.
Introduction
Modern reinforcement learning workflows increasingly span multiple systems: simulators, training pipelines, evaluation services, and deployment infrastructure. In practice, RL environments and agents are often tightly coupled to a single codebase, Python runtime, or execution context, making them difficult to reuse, scale, or integrate into production systems.
ContaineRL was designed to address this gap.
The goal of the project is simple but ambitious: treat RL environments and agents as first-class, containerized services, with:
- clean, strongly-typed interfaces
- reproducible execution
- minimal assumptions about the training stack
- easy local and CI-driven workflows
Instead of embedding environments and agents directly into training loops, ContaineRL allows them to be exposed over gRPC, packaged in Docker images, and orchestrated like any other microservice.
This approach enables:
- decoupled training and simulation
- scalable experimentation
- cross-language interoperability
- production-grade deployment patterns for RL systems
How ContaineRL Improves Quality of Life
Most RL tooling assumes a monolithic setup:
- the environment lives in the same process as the agent
- the simulator shares memory with the learner
- scaling requires bespoke infrastructure glue
This breaks down quickly when:
- environments are heavy or simulator-bound
- agents must run remotely (or on different hardware)
- multiple learners need to interact with the same environment
- CI, reproducibility, or security boundaries matter
ContaineRL reframes the problem: An environment or an agent is just a service with a well-defined contract.
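As a rough illustration of what that contract looks like in Python terms (this sketch is illustrative only; ContaineRL's actual interfaces are the Gymnasium Env API and the CRLAgent class shown later):

```python
from typing import Any, Protocol

# Illustrative sketch of the "service contract" idea, not ContaineRL's real
# interface definitions (those are the Gymnasium Env API and CRLAgent below).

class EnvironmentContract(Protocol):
    def reset(self, *, seed=None, options=None) -> tuple[Any, dict]: ...
    def step(self, action: Any) -> tuple[Any, float, bool, bool, dict]: ...

class AgentContract(Protocol):
    def get_action(self, observation: Any) -> Any: ...
```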
By containerizing these components:
- environments become reproducible artifacts
- agents can be swapped, scaled, or versioned independently
- training systems can remain lightweight and flexible
This design is particularly useful in:
- distributed RL
- simulation-heavy domains
- benchmarking and evaluation pipelines
- research-to-production transitions
Architecture Overview
At its core, ContaineRL provides two symmetric abstractions:
- Containerized Environments
- Containerized Agents
Both are exposed via gRPC, with serialization handled through msgpack-compatible types, and both are managed through the same lifecycle primitives.
The system is composed of three main layers:
- Python API - minimal abstractions to wrap environments and agents
- Transport Layer - gRPC interfaces for interaction
- CLI Tooling - build, run, test, and manage containers
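To make the transport layer's "msgpack-compatible types" concrete, here is a generic sketch of the idea; it is illustrative only, not ContaineRL's actual wire code:

```python
import msgpack
import numpy as np

# Illustration only (not ContaineRL's serialization code): observations are
# reduced to plain, msgpack-friendly types before crossing the gRPC boundary.
obs = {"observation": np.array([0.1, -0.3, 0.02, 0.5])}

payload = msgpack.packb(
    {key: value.tolist() for key, value in obs.items()},  # numpy array -> list
    use_bin_type=True,
)
decoded = msgpack.unpackb(payload, raw=False)
print(decoded["observation"])  # plain Python floats, ready to rebuild an array
```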
Exposing an Environment as a Service
Any Gymnasium-compatible environment can be wrapped and exposed with minimal boilerplate.
```python
import gymnasium as gym

from containerl import create_environment_server


class Environment(gym.Env):
    def __init__(self, render_mode: str, env_name: str):
        self._env = gym.make(env_name, render_mode=render_mode)
        self.observation_space = gym.spaces.Dict({
            "observation": self._env.observation_space
        })
        self.action_space = self._env.action_space

    def reset(self, *, seed=None, options=None):
        obs, info = self._env.reset(seed=seed, options=options)
        return {"observation": obs}, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self._env.step(action)
        return {"observation": obs}, float(reward), terminated, truncated, info


if __name__ == "__main__":
    create_environment_server(Environment)
```
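Before building an image, the wrapper can be exercised in-process like any other Gymnasium environment. A minimal sketch, assuming CartPole-v1 is available (the environment name is just an example):

```python
# Quick local sanity check of the wrapper before containerizing it
# (CartPole-v1 is only an example environment name).
env = Environment(render_mode="rgb_array", env_name="CartPole-v1")

obs, info = env.reset(seed=42)
assert "observation" in obs

action = env.action_space.sample()
obs, reward, terminated, truncated, info = env.step(action)
print(obs["observation"], reward, terminated, truncated)
```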
Once containerized, this environment:
- listens on a fixed gRPC port
- accepts initialization arguments remotely
- can be run locally, in CI, or on a cluster
The environment process is fully isolated, making execution deterministic and reproducible.
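Because the service speaks the standard Gymnasium contract, the training side does not need to care where the environment actually runs. The driver below is a sketch of that decoupling; the client class and address in the commented lines are hypothetical placeholders, not ContaineRL's documented client API:

```python
# This loop relies only on the Gymnasium Env API, so it works the same whether
# the environment lives in-process or inside a remote container.
def run_episode(env, policy) -> float:
    total_reward = 0.0
    obs, info = env.reset(seed=0)
    terminated = truncated = False
    while not (terminated or truncated):
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
    return total_reward

# Hypothetical usage against a containerized environment (class name and
# address are illustrative only, not ContaineRL's documented API):
# env = RemoteEnvironmentClient("localhost:50051")
# print(run_episode(env, policy=lambda obs: env.action_space.sample()))
```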
Exposing an Agent as a Service
Agents follow the same philosophy. Any policy or controller implementing the CRLAgent interface can be served as a container.
```python
import numpy as np
from gymnasium import spaces

from containerl import CRLAgent, create_agent_server


class Agent(CRLAgent):
    def __init__(self, target: float, gain: float):
        self.target = target
        self.gain = gain
        self.observation_space = spaces.Dict({
            "state": spaces.Box(0, 100, shape=(1,))
        })
        self.action_space = spaces.Box(0, 10, shape=(1,), dtype=np.float32)

    def get_action(self, observation):
        return np.clip(
            self.gain * (self.target - observation["state"]),
            0, 10
        )


if __name__ == "__main__":
    create_agent_server(Agent)
```
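The controller logic can likewise be sanity-checked in-process before it is served. A minimal sketch using only the class above (the numbers are arbitrary):

```python
import numpy as np

# Sanity-check the proportional controller before containerizing it
# (target, gain, and state values are arbitrary illustrations).
agent = Agent(target=50.0, gain=0.5)
action = agent.get_action({"state": np.array([40.0], dtype=np.float32)})
print(action)  # 0.5 * (50 - 40) = 5.0, within the [0, 10] clip range
```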
This makes it possible to:
- deploy agents independently of training code
- run agents on specialized hardware
- test policies via standardized client interfaces
CLI-Driven Workflow
ContaineRL includes a CLI (containerl-cli) to manage the full lifecycle of containerized components.
Build an image:

```bash
uv run containerl-cli build ./examples/gymnasium/environments/atari/ \
  -n my-image -t v1
```

Run one or more containers:

```bash
uv run containerl-cli run my-image:v1 --count 3
```

Test connectivity and correctness:

```bash
uv run containerl-cli build-run-test ./examples/gymnasium/environments/atari/
```
This enables:
- fast local iteration
- reproducible CI checks
- automated validation of containers before deployment
Key Takeaways
- ContaineRL decouples RL components cleanly, enabling modular and scalable systems.
- Containerized environments and agents become reusable, versioned artifacts.
- gRPC-based interfaces allow language-agnostic integration.
- Strong engineering discipline ensures reliability without hurting developer velocity.
This project reflects my broader approach to RL systems: treat learning components as deployable, testable infrastructure, not just Python objects inside a training loop.
References
- 👨🏽‍💻 GitHub Code: https://github.com/alexpalms/containerl
- 📦 PyPI Package: https://pypi.org/project/containerl/