System Overview¶
This repository contains a multiplayer data collection framework for Minecraft. It uses scripted Mineflayer bots that engage in diverse, collaborative multiplayer scenarios. The collected data consists of the official Minecraft graphics (observations) for every player, annotated with the corresponding actions.
Data Collection Workflow¶
The sole entry points to the data collection workflow are the run.sh/run_evals.sh scripts, which generate the training and eval datasets. They generate Docker Compose files tying together the system components (Controller Bot, Camera Bot, Minecraft Server Plugin, Spectator Bot), execute the Docker Compose instances in parallel and in isolation, run the postprocessing scripts on the host to produce the aligned video-action episodes, and transform the collected data into the training and eval dataset formats.
Controller¶
The Controller Bot is a JavaScript program built on top of Mineflayer. It connects to the Minecraft Server and drives the behavior of the player. To ensure collaboration, it communicates with the controller instances of other players connected to the same server. It features a set of high-level, reusable game play primitives and a modular system of various episode types focusing on different aspects of the game. See Controller for more details.
The controller is responsible for recording the actions of the playing bot, saving them to disk as JSON files. Below is the full list of actions it records:
| Action key | Type | Description |
|---|---|---|
| forward | bool/sustained | Player moving forward (W). |
| back | bool/sustained | Player moving backward (S). |
| left | bool/sustained | Player strafing left (A). |
| right | bool/sustained | Player strafing right (D). |
| jump | bool/sustained | Player jumping. |
| sprint | bool/sustained | Player sprinting. |
| sneak | bool/sustained | Player sneaking. |
| camera | vec2f/sustained | Change in player camera orientation (yaw, pitch). |
| attack | bool/once | Player attacks. |
| use | bool/once | Player uses / interacts with the environment. |
| mount | bool/once | Player mounts an entity/vehicle. |
| dismount | bool/once | Player dismounts. |
| place_block | bool/once | Player places a block using the currently selected item. |
| place_entity | bool/once | Player places an entity item. |
| mine | bool/sustained | Player mining a block. |
| hotbar.1 | bool/once | Player selects hotbar slot 1. |
| hotbar.2 | bool/once | Player selects hotbar slot 2. |
| hotbar.3 | bool/once | Player selects hotbar slot 3. |
| hotbar.4 | bool/once | Player selects hotbar slot 4. |
| hotbar.5 | bool/once | Player selects hotbar slot 5. |
| hotbar.6 | bool/once | Player selects hotbar slot 6. |
| hotbar.7 | bool/once | Player selects hotbar slot 7. |
| hotbar.8 | bool/once | Player selects hotbar slot 8. |
| hotbar.9 | bool/once | Player selects hotbar slot 9. |
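To make the action table concrete, here is a minimal sketch of what a single recorded action step could look like as JSON. The exact field names and file layout used by the act_recorder are not documented here, so this schema is an assumption for illustration only.

```python
import json

# Hypothetical shape of one recorded action step; the real act_recorder
# schema may differ (this is an illustrative assumption, not the spec).
step = {
    "timestamp": 1700000000.05,  # wall-clock time of the tick, in seconds
    "forward": True,             # sustained bool: key held during this tick
    "jump": False,
    "camera": [1.5, -0.25],      # sustained vec2f: (yaw, pitch) change
    "attack": False,             # one-off bool: fired this tick or not
    "hotbar.3": True,            # one-off: hotbar slot 3 selected this tick
}

# At the 20 FPS recording rate, consecutive steps are 1/20 = 0.05 s apart.
line = json.dumps(step)
```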
Camera¶
The Camera Bot is the official Minecraft Java Client running headlessly on an Xvfb virtual display. We enable GPU-accelerated graphics via vglrun -d "egl". It connects to the server and pairs up with that player's corresponding Controller Bot, so that the two processes are logically a single player. Through the Minecraft Server Plugin, the camera bot at all times shares the first-person perspective of its controller bot.
The Camera Bot captures graphics using ffmpeg with the -f x11grab option, which grabs the X11 virtual display. We enable NVENC hardware encoding via -c:v h264_nvenc. SolarisEngine aligns the video and actions together in postprocessing to form a final episode. Both the controller (actions) and camera (video) bots record at 20 FPS. We save videos at 1280×720, though this is flexible.
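As a sketch, the capture command described above could be assembled as follows. The display number, resolution, and output path here are assumptions for illustration; only the named flags (x11grab, wall-clock timestamps, NVENC) come from the text.

```python
# Illustrative assembly of the ffmpeg capture command line described above.
# Display, size, and output filename are assumed values, not the project's.
def build_capture_cmd(display=":99", size="1280x720", fps=20, out="raw.mkv"):
    return [
        "ffmpeg",
        "-f", "x11grab",                      # grab the Xvfb virtual display
        "-framerate", str(fps),               # match the 20 FPS action rate
        "-video_size", size,                  # e.g. 1280x720
        "-use_wallclock_as_timestamps", "1",  # absolute per-frame timestamps
        "-i", display,
        "-copyts",                            # keep input timestamps on output
        "-vsync", "0",                        # no frame duplication/dropping
        "-bf", "0",                           # no B-frames
        "-c:v", "h264_nvenc",                 # NVENC hardware encoding
        out,
    ]

cmd = build_capture_cmd()
```

Note that -use_wallclock_as_timestamps is an input option, so it must appear before -i; the alignment-critical flags are discussed further in Postprocessing.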
Minecraft Server Plugin¶
SolarisEngine works with a standard Minecraft 1.21 Paper server that it augments with a custom server-side plugin. The plugin provides controls to pair controller bots with their corresponding camera bots by continuously synchronizing their character states. It replays all actions, positions, camera movements, and GUI elements, allowing the controller complete control over the player while accurately capturing its perspective with a real Minecraft client. It keeps the camera bot invisible to all players.
Spectator Bot¶
The spectator bot is another Mineflayer bot (making it a total of 3 bots constituting a single logical player). It always stays in Spectator mode and follows its controller bot. This extra bot exists only to observe both the controller and the camera at once, and is used internally by the plugin to synchronize block-breaking animations.
Postprocessing¶
After all the controller and camera processes finish, SolarisEngine cuts a player's single, raw camera recording into episodes according to the episode action JSON files produced by the controller. The postprocessing script process_recordings.py uses ffprobe to match frames to their actions based on per-frame wall-clock timestamps. Note this is only possible because we tell ffmpeg to write absolute timestamps during recording via -use_wallclock_as_timestamps 1 -copyts -vsync 0 -bf 0; by default, ffmpeg does not write absolute timestamps, which are crucial for perfect frame-action alignment.
An episode always consists of N actions and N observations. The observation at index t corresponds to the frame recorded at a wall-clock time at or after the action at index t,
ensuring correct causality (observations come after actions).
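The causal alignment rule above can be sketched as a small function: for each action timestamp, pick the first frame recorded at or after it. This is a minimal sketch of the rule, not the actual process_recordings.py implementation.

```python
import bisect

def align(frame_ts, action_ts):
    """For each action timestamp, return the index of the first frame
    recorded at or after it (observation follows action), or None if
    no such frame exists. frame_ts must be sorted wall-clock times."""
    indices = []
    for t in action_ts:
        i = bisect.bisect_left(frame_ts, t)  # first frame with ts >= t
        indices.append(i if i < len(frame_ts) else None)
    return indices

# Frames at 20 FPS (0.05 s apart); actions land between or on frame times.
frames = [0.00, 0.05, 0.10]
actions = [0.01, 0.05, 0.20]
pairing = align(frames, actions)  # -> [1, 1, None]
```

The last action has no frame at or after it, so it would be dropped rather than paired with an earlier (acausal) frame.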
Datasets Preparation¶
As a last step, the engine transforms the collected data into the training and evaluation dataset formats that the Solaris model code expects. Although the final format of the training and eval datasets is the same, the procedure for obtaining them differs by dataset type.
Training¶
The prepare_train_dataset.py script validates and transforms the output of SolarisEngine into the final dataset format, and split_train_test.py splits the dataset folder into train/ and test/ subfolders. The two optional scripts detect_water_episodes_batch.py and filter_dataset.py detect episodes where either Alpha or Bravo is underwater, by analyzing the oxygen bar HUD, and exclude them from the train dataset. Lastly, the optional script annotate_video_batch.py stitches the videos of all players into one and overlays them with visualized actions.
It is a helpful debugging tool for seeing how well all bots behave in an episode and for verifying that their actions are properly aligned with the observations.
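The train/test split could look like the following sketch over episode folder names. The ratio, seeding, and selection criteria of the real split_train_test.py are assumptions here, chosen only to illustrate the shape of the step.

```python
import random

# Hedged sketch of splitting episode folders into train/test sets.
# The actual split_train_test.py may use a different ratio or strategy.
def split_episodes(episodes, test_fraction=0.1, seed=0):
    eps = sorted(episodes)
    random.Random(seed).shuffle(eps)       # deterministic shuffle
    n_test = max(1, int(len(eps) * test_fraction))
    return eps[n_test:], eps[:n_test]      # (train, test)

train, test = split_episodes([f"episode_{i:03d}" for i in range(10)])
```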
Eval¶
The prepare_eval_datasets.py script validates and transforms the output of SolarisEngine into the final dataset format.
Docker¶
SolarisEngine uses Docker and Docker Compose to manage its components. The controller bot, camera bot, spectator bot, and Minecraft server are separate Docker containers.
Each controller bot is accompanied by an additional act_recorder Python process, running in a separate Docker container, that writes its actions to disk.
All in all, for two players that makes 2 * 4 + 1 = 9 long-running Docker containers in total. They are bundled by Docker Compose into an instance, which allows them to run in isolation.
A Docker Compose instance also has two additional procedural Docker containers, plugin_starter and prep_data,
that run at startup to set up the Minecraft server and the server-side plugin.
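The container bookkeeping above generalizes to any number of players: four long-running containers per player plus one shared Minecraft server, and two one-shot setup containers per instance. A small helper makes the arithmetic explicit (the function name is ours, for illustration):

```python
# Container counts per Docker Compose instance, per the description above:
# 4 long-running containers per player (controller, camera, spectator,
# act_recorder) + 1 shared Minecraft server, plus 2 one-shot setup
# containers (plugin_starter, prep_data).
def containers_per_instance(num_players):
    long_running = 4 * num_players + 1
    procedural = 2
    return long_running, procedural

counts = containers_per_instance(2)  # -> (9, 2)
```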
The outer layer of Python scripts, generate_compose.py and orchestrate.py, generates a configurable number of such Docker Compose instances and executes them in parallel, enabling data collection at scale.
The camera bot has a dedicated Docker image, solaris-engine-camera, configured with a Java runtime and the official Minecraft Java client running headlessly.
It renders on the GPU and requires the host machine to have one in order to sustain proper Minecraft rendering FPS.
The controller bot, spectator bot, and act_recorder Docker containers all share the solaris-engine-base Docker image, which has both Node and Python environments set up.
The Minecraft server uses the publicly available itzg/minecraft-server Docker image.
All postprocessing happens on the host inside the conda environment created from the env.yaml file.
Third-party Dependencies¶
Mineflayer¶
The Controller uses a forked version of Mineflayer with the following modifications:
The Mineflayer API exposes the most recently applied camera action in its physics module, and its event system is extended to emit events for one-off semantic actions such as attacking, using, placing, and hotbar changes.
The bot correctly looks at the face of the block when placing a new block.
Camera smoothing is added to all non-Pathfinder look commands.
See the full list of changes here.
Mineflayer-Pathfinder¶
The Controller uses a forked version of Mineflayer-Pathfinder plugin with the following modifications:
Improved looking when digging.
Extended scaffolding items.
See the full list of changes here.
Mineflayer-Prismarine-Viewer¶
The Controller implements its action recording on top of a forked version of Prismarine-Viewer, with the following modifications:
Graphics recording is disabled, since it is handled by the dedicated camera process.
It receives actions from the Mineflayer physics plugin and sends them over the network to the separate act_recorder process, which saves them as JSON files on disk.
See the full list of changes here.