System Overview
===============

This repository contains a multiplayer data collection framework for Minecraft. It uses programmed bots based on `Mineflayer `_ that engage in diverse, collaborative, multiplayer scenarios. The data it collects is the official Minecraft graphics (observations) for every player, annotated with their corresponding actions.

Data Collection Workflow
------------------------

The sole entry point to the data collection workflow is the :ref:`run.sh `/:ref:`run_evals.sh ` scripts that generate the training and eval datasets. They:

- generate :ref:`Docker Compose files ` tying together the system components: :ref:`Controller Bot `, :ref:`Camera Bot `, :ref:`Minecraft Server Plugin `, and :ref:`Spectator Bot `,
- execute the Docker Compose instances in parallel, each in isolation,
- run the :ref:`postprocessing scripts ` on the host to produce the aligned video-action episodes, and
- :ref:`transform ` the collected data into the training and eval dataset formats.

.. _controller:

Controller
----------

The Controller Bot is a JavaScript program built on top of Mineflayer. It connects to the Minecraft Server and drives the behavior of the player. To ensure collaboration, it communicates with the controller instances of other players connected to the same server. It features a set of high-level, reusable gameplay primitives and a modular system of episode types, each focusing on a different aspect of the game. See :doc:`controller` for more details.

The controller is responsible for recording the actions of the playing bot. It saves them to disk as JSON files. Below is the list of all actions it records:

.. list-table::
   :header-rows: 1
   :widths: 18 16 66

   * - Action key
     - Type
     - Description
   * - forward
     - bool/sustained
     - Player moving forward (W).
   * - back
     - bool/sustained
     - Player moving backward (S).
   * - left
     - bool/sustained
     - Player strafing left (A).
   * - right
     - bool/sustained
     - Player strafing right (D).
   * - jump
     - bool/sustained
     - Player jumping.
   * - sprint
     - bool/sustained
     - Player sprinting.
   * - sneak
     - bool/sustained
     - Player sneaking.
   * - camera
     - vec2f/sustained
     - Change in player camera orientation (yaw, pitch).
   * - attack
     - bool/once
     - Player attacks.
   * - use
     - bool/once
     - Player uses / interacts with the environment.
   * - mount
     - bool/once
     - Player mounts an entity/vehicle.
   * - dismount
     - bool/once
     - Player dismounts.
   * - place_block
     - bool/once
     - Player places a block using the currently selected item.
   * - place_entity
     - bool/once
     - Player places an entity item.
   * - mine
     - bool/sustained
     - Player mining a block.
   * - hotbar.1
     - bool/once
     - Player selects hotbar slot 1.
   * - hotbar.2
     - bool/once
     - Player selects hotbar slot 2.
   * - hotbar.3
     - bool/once
     - Player selects hotbar slot 3.
   * - hotbar.4
     - bool/once
     - Player selects hotbar slot 4.
   * - hotbar.5
     - bool/once
     - Player selects hotbar slot 5.
   * - hotbar.6
     - bool/once
     - Player selects hotbar slot 6.
   * - hotbar.7
     - bool/once
     - Player selects hotbar slot 7.
   * - hotbar.8
     - bool/once
     - Player selects hotbar slot 8.
   * - hotbar.9
     - bool/once
     - Player selects hotbar slot 9.

.. _camera:

Camera
------

The Camera Bot is the official Minecraft Java Client running headlessly on an ``Xvfb`` virtual display. We enable GPU-accelerated graphics via ``vglrun -d "egl"``. It connects to the server and pairs up with the corresponding Controller Bot of that player, so that the two processes logically form a single player. Through the :ref:`Minecraft Server Plugin `, the camera bot at all times shares the first-person perspective of its controller bot.

The Camera Bot captures graphics using ``ffmpeg`` with the ``-f x11grab`` option, which grabs the X11 virtual display. We enable NVENC hardware encoding via ``-c:v h264_nvenc``. ``SolarisEngine`` aligns the video and actions in postprocessing to form a final episode. Both the controller (actions) and camera (video) bots record at ``20`` FPS. We save videos at ``1280×720``, though this is configurable.

.. _minecraft-server-plugin:

Minecraft Server Plugin
-----------------------

``SolarisEngine`` works with a standard Minecraft 1.21 Paper server that it augments with a custom server-side plugin. The plugin pairs controller bots with their corresponding camera bots by continuously synchronizing their character states. It replays all actions, positions, camera movements, and GUI elements, giving the controller complete control over the player while accurately capturing its perspective with a real Minecraft client. It also keeps the camera bot invisible to all players.

.. _spectator-bot:

Spectator Bot
-------------

The spectator bot is another Mineflayer bot (bringing the total to three bots constituting a single logical player). It always stays in Spectator mode and follows its controller bot. This extra bot exists only to observe both the controller and the camera at once and is used internally by the plugin to synchronize block-breaking animations.

.. _postprocessing:

Postprocessing
--------------

After all the controller and camera processes finish, ``SolarisEngine`` cuts each player's single raw camera output into episodes, according to the episode action JSON files produced by the controller. The postprocessing script :ref:`process_recordings.py ` uses ``ffprobe`` to extract the frames corresponding to the actions based on per-frame wallclock timestamps. Note that this is only possible because we tell ``ffmpeg`` to write absolute timestamps during recording via ``-use_wallclock_as_timestamps 1 -copyts -vsync 0 -bf 0``; by default, ffmpeg does not write the absolute timestamps that are crucial for perfect frame-action alignment.

An episode always consists of ``N`` actions and ``N`` observations.
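As a simplified illustration of this timestamp matching (not the actual ``process_recordings.py`` implementation; the timestamp lists are assumed inputs), each action can be paired with the first frame whose wallclock timestamp is at or after the action's own:

```python
# Sketch of timestamp-based frame-action alignment. Hypothetical data
# layout: plain lists of wallclock timestamps in seconds, sorted.
import bisect

def align_frames_to_actions(action_ts, frame_ts):
    """For each action timestamp, return the index of the first frame
    recorded at or after it, or None if no such frame exists."""
    indices = []
    for t in action_ts:
        i = bisect.bisect_left(frame_ts, t)
        indices.append(i if i < len(frame_ts) else None)
    return indices

# Example: actions 50 ms apart (20 FPS), frames slightly delayed.
actions = [0.000, 0.050, 0.100]
frames = [0.010, 0.060, 0.115, 0.160]
print(align_frames_to_actions(actions, frames))  # [0, 1, 2]
```

Binary search over the sorted frame timestamps keeps the matching cheap even for long recordings, and a ``None`` result flags an action that fell after the final captured frame.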
The observation at index ``t`` corresponds to the frame recorded at a wall-clock time at or after the action at index ``t``, ensuring correct causality (observations come after actions).

.. _datasets-preparation:

Datasets Preparation
--------------------

As a last step, the engine transforms the collected data into the training and evaluation dataset formats the `Solaris `_ model code expects. Although the final format of the training and eval datasets is the same, the procedure for obtaining them differs by dataset type.

Training
~~~~~~~~

The :ref:`prepare_train_dataset.py ` script validates and transforms the output of ``SolarisEngine`` into the final dataset format, and :ref:`split_train_test.py ` splits the dataset folder into ``train/`` and ``test/`` subfolders.

The two optional scripts :ref:`detect_water_episodes_batch.py ` and :ref:`filter_dataset.py ` detect episodes where either Alpha or Bravo is underwater by analyzing the oxygen bar HUD, and exclude them from the train dataset.

Lastly, the optional script :ref:`annotate_video_batch.py ` stitches the videos of all players into one and overlays them with visualized actions. It is a helpful debugging tool for checking how well the bots behave in an episode and whether their actions are properly aligned with the observations.

Eval
~~~~

The :ref:`prepare_eval_datasets.py ` script validates and transforms the output of ``SolarisEngine`` into the final dataset format.

.. _docker:

Docker
------

``SolarisEngine`` uses Docker and Docker Compose to manage its components. The controller bot, camera bot, spectator bot, and Minecraft server are separate Docker containers. The controller bot additionally has an ``act_recorder`` Python process for writing actions to disk, which runs in its own Docker container. All in all, for two players, that is ``2 * 4 + 1 = 9`` long-running Docker containers. They are bundled with Docker Compose into an instance, which allows them to run in isolation.
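The container arithmetic generalizes to any number of players; as a small sketch:

```python
# Sketch of the per-instance long-running container count: each player
# contributes 4 containers, and one Minecraft server is shared by all.
def long_running_containers(num_players: int) -> int:
    per_player = 4  # controller + camera + spectator + act_recorder
    shared = 1      # the Minecraft server
    return num_players * per_player + shared

print(long_running_containers(2))  # 9, matching 2 * 4 + 1 = 9
```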
A Docker Compose instance also has two additional procedural Docker containers, ``plugin_starter`` and ``prep_data``, that run at startup to set up the Minecraft server and the server-side plugin. The outer layer of Python scripts, :ref:`generate_compose.py ` and :ref:`orchestrate.py `, generates a configurable number of such Docker Compose instances and executes them in parallel, enabling data collection at scale.

The camera bot has a dedicated Docker image, ``solaris-engine-camera``, configured with a Java runtime and the official Minecraft Java client running headlessly. It renders on the GPU and requires the host machine to have one to achieve proper Minecraft rendering FPS. The controller bot, spectator bot, and ``act_recorder`` Docker containers all share the ``solaris-engine-base`` Docker image, which has both Node and Python environments set up. The Minecraft server uses the publicly available ``itzg/minecraft-server`` Docker image.

All postprocessing happens on the host inside the conda environment created from the `env.yaml `_ file.

Third-party Dependencies
------------------------

Mineflayer
~~~~~~~~~~

The Controller uses a `forked version `_ of Mineflayer with the following modifications:

- The Mineflayer API exposes access to the most recently applied camera action in its physics module, and its event system is extended to emit events on one-off semantic actions such as attacking, using, placing, and hotbar changes.
- The bot correctly looks at the face of the block when placing a new block.
- Camera smoothing is added to all non-Pathfinder look commands.

See the full list of changes `here `_.

Mineflayer-Pathfinder
~~~~~~~~~~~~~~~~~~~~~

The Controller uses a `forked version `_ of the Mineflayer-Pathfinder plugin with the following modifications:

- Improved looking when digging.
- Extended scaffolding items.

See the full list of changes `here `_.
Mineflayer-Prismarine-Viewer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Controller implements its action recording based on a `forked version `_ of Prismarine-Viewer, modified in the following ways:

- It disables all graphics recording, because that is handled by the dedicated camera process.
- It receives actions from the Mineflayer physics plugin and sends them over the network to the separate ``act_recorder`` process, which saves them as JSON files on disk.

See the full list of changes `here `_.
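To illustrate the recorder's role, here is a hypothetical sketch of persisting received action records as JSON lines; the field names and file layout are assumptions for illustration, not the actual ``act_recorder`` format:

```python
# Hypothetical sketch of an act_recorder-style sink: serialize each
# received action record and append it to a file as one JSON line.
# The record fields below are assumptions, not the real schema.
import io
import json

def record_action(action: dict, out_file) -> None:
    # One JSON object per line keeps the recorder append-only and
    # crash-tolerant: a truncated final line can simply be discarded.
    out_file.write(json.dumps(action) + "\n")

# Usage with an in-memory buffer standing in for a file on disk.
buf = io.StringIO()
record_action({"ts": 0.05, "forward": True, "camera": [1.5, -0.2]}, buf)
print(buf.getvalue())
```

In the real system the records arrive over the network from the forked Prismarine-Viewer; only the persistence step is sketched here.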