๐Ÿ‹ ContaineRL - Containerize your RL Environments and Agents

Nov 1, 2025 · Alessandro Palmas · 4 min read

TL;DR: How I built ContaineRL, a lightweight toolkit to package RL environments and agents as reproducible, containerized services, exposing them via gRPC with a clean Python API and CLI. It enables modular, language-agnostic RL pipelines, supports scalable deployment, and enforces strong engineering practices (type checking, testing, CI) without sacrificing developer ergonomics.

Introduction

Modern reinforcement learning workflows increasingly span multiple systems: simulators, training pipelines, evaluation services, and deployment infrastructure. In practice, RL environments and agents are often tightly coupled to a single codebase, Python runtime, or execution context, making them difficult to reuse, scale, or integrate into production systems.

ContaineRL was designed to address this gap.

The goal of the project is simple but ambitious: treat RL environments and agents as first-class, containerized services, with:

  • clean, strongly-typed interfaces
  • reproducible execution
  • minimal assumptions about the training stack
  • easy local and CI-driven workflows

Instead of embedding environments and agents directly into training loops, ContaineRL allows them to be exposed over gRPC, packaged in Docker images, and orchestrated like any other microservice.

This approach enables:

  • decoupled training and simulation
  • scalable experimentation
  • cross-language interoperability
  • production-grade deployment patterns for RL systems

How ContaineRL Improves Quality of Life

Most RL tooling assumes a monolithic setup:

  • the environment lives in the same process as the agent
  • the simulator shares memory with the learner
  • scaling requires bespoke infrastructure glue

This breaks down quickly when:

  • environments are heavy or simulator-bound
  • agents must run remotely (or on different hardware)
  • multiple learners need to interact with the same environment
  • CI, reproducibility, or security boundaries matter

ContaineRL reframes the problem: An environment or an agent is just a service with a well-defined contract.
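
To make that contract concrete, here is a minimal sketch of the two interface shapes as Python protocols. The names EnvironmentContract and AgentContract are illustrative only, inferred from the examples later in this post; ContaineRL's actual base classes may differ.

from typing import Any, Protocol

class EnvironmentContract(Protocol):
    # Gymnasium-style surface, mirroring the environment example below
    def reset(self, *, seed=None, options=None) -> tuple[dict, dict]: ...
    def step(self, action: Any) -> tuple[dict, float, bool, bool, dict]: ...

class AgentContract(Protocol):
    # Mirrors the CRLAgent example below
    def get_action(self, observation: dict) -> Any: ...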

By containerizing these components:

  • environments become reproducible artifacts
  • agents can be swapped, scaled, or versioned independently
  • training systems can remain lightweight and flexible

This design is particularly useful in:

  • distributed RL
  • simulation-heavy domains
  • benchmarking and evaluation pipelines
  • research-to-production transitions

Architecture Overview

At its core, ContaineRL provides two symmetric abstractions:

  • Containerized Environments
  • Containerized Agents

Both are exposed via gRPC, with serialization handled through msgpack-compatible types, and both are managed through the same lifecycle primitives.
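
To illustrate what "msgpack-compatible" means in practice, the sketch below round-trips a NumPy observation through msgpack by converting arrays to plain lists. This is only a conceptual illustration; ContaineRL's actual wire format is internal to the transport layer.

import msgpack
import numpy as np

obs = {"observation": np.array([0.1, -0.2], dtype=np.float32)}

# NumPy arrays are not natively msgpack-serializable, so convert to lists
payload = msgpack.packb({k: v.tolist() for k, v in obs.items()})

# The receiving side rebuilds the arrays from the decoded lists
decoded = msgpack.unpackb(payload)
restored = {k: np.asarray(v, dtype=np.float32) for k, v in decoded.items()}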

The system is composed of three main layers:

  • Python API - minimal abstractions to wrap environments and agents
  • Transport Layer - gRPC interfaces for interaction
  • CLI Tooling - build, run, test, and manage containers

Exposing an Environment as a Service

Any Gymnasium-compatible environment can be wrapped and exposed with minimal boilerplate:

import gymnasium as gym
from containerl import create_environment_server

class Environment(gym.Env):
    """Thin wrapper exposing any Gymnasium environment behind a Dict observation."""

    def __init__(self, render_mode: str, env_name: str):
        # Initialization arguments are supplied remotely by the client
        self._env = gym.make(env_name, render_mode=render_mode)
        self.observation_space = gym.spaces.Dict({
            "observation": self._env.observation_space
        })
        self.action_space = self._env.action_space

    def reset(self, *, seed=None, options=None):
        obs, info = self._env.reset(seed=seed, options=options)
        return {"observation": obs}, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self._env.step(action)
        return {"observation": obs}, float(reward), terminated, truncated, info

if __name__ == "__main__":
    # Serve the environment over gRPC
    create_environment_server(Environment)

Once containerized, this environment:

  • listens on a fixed gRPC port
  • accepts initialization arguments remotely
  • can be run locally, in CI, or on a cluster

The environment process is fully isolated, making execution deterministic and reproducible.
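
From a training process, interacting with the service could look roughly like the loop below. The RemoteEnvironment class, its constructor arguments, and the port number are hypothetical placeholders for illustration, not ContaineRL's actual client API.

from my_rl_client import RemoteEnvironment  # hypothetical import, for illustration

# Hypothetical client wrapper around the gRPC channel; names and
# arguments are placeholders, not ContaineRL's real API.
env = RemoteEnvironment(
    host="localhost",
    port=50051,  # assumed gRPC port
    env_kwargs={"env_name": "CartPole-v1", "render_mode": "rgb_array"},
)

obs, info = env.reset(seed=0)
for _ in range(100):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()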

Exposing an Agent as a Service

Agents follow the same philosophy. Any policy or controller implementing the CRLAgent interface can be served as a container:

import numpy as np
from gymnasium import spaces
from containerl import CRLAgent, create_agent_server

class Agent(CRLAgent):
    """Simple proportional controller driving a scalar state toward a target."""

    def __init__(self, target: float, gain: float):
        self.target = target
        self.gain = gain
        self.observation_space = spaces.Dict({
            "state": spaces.Box(0, 100, shape=(1,))
        })
        self.action_space = spaces.Box(0, 10, shape=(1,), dtype=np.float32)

    def get_action(self, observation):
        # Proportional control, clipped to the action space bounds
        return np.clip(
            self.gain * (self.target - observation["state"]),
            0, 10
        )

if __name__ == "__main__":
    # Serve the agent over gRPC
    create_agent_server(Agent)

This makes it possible to:

  • deploy agents independently of training code
  • run agents on specialized hardware
  • test policies via standardized client interfaces (see the sketch below)
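
Before containerizing, the same policy can also be sanity-checked in-process. The check below is plain Python rather than a ContaineRL API, and assumes the Agent class defined above.

import numpy as np

agent = Agent(target=50.0, gain=0.5)
obs = {"state": np.array([40.0], dtype=np.float32)}
action = agent.get_action(obs)

# Proportional response: 0.5 * (50 - 40) = 5.0, within the [0, 10] bounds
assert agent.action_space.contains(action)
assert float(action[0]) == 5.0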

CLI-Driven Workflow

ContaineRL includes a CLI (containerl-cli) to manage the full lifecycle of containerized components.

Build an image:

uv run containerl-cli build ./examples/gymnasium/environments/atari/ \
  -n my-image -t v1

Run one or more containers:

uv run containerl-cli run my-image:v1 --count 3

Test connectivity and correctness:

uv run containerl-cli build-run-test ./examples/gymnasium/environments/atari/

This enables:

  • fast local iteration
  • reproducible CI checks
  • automated validation of containers before deployment (see the CI sketch below)
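
As an example, a CI job can simply shell out to the CLI and fail on a non-zero exit code. The snippet below is one possible pytest wiring using only the command shown above, not something ContaineRL ships.

import subprocess

def test_environment_container():
    # Build, run, and validate the example image in one step;
    # check=True raises on failure, failing the test
    subprocess.run(
        ["uv", "run", "containerl-cli", "build-run-test",
         "./examples/gymnasium/environments/atari/"],
        check=True,
    )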

Key Takeaways

  • ContaineRL decouples RL components cleanly, enabling modular and scalable systems.
  • Containerized environments and agents become reusable, versioned artifacts.
  • gRPC-based interfaces allow language-agnostic integration.
  • Strong engineering discipline ensures reliability without hurting developer velocity.

This project reflects my broader approach to RL systems: treat learning components as deployable, testable infrastructure, not just Python objects inside a training loop.
