SHARD Format – A New Take on Minecraft Region Data

Modern game modes with sophisticated concepts have specific needs when it comes to Minecraft region data. Regions have to be small to keep storage costs low, fast to migrate for regular Minecraft version updates, and they should support all new features while remaining extensible enough to allow for customization. As if that weren't enough, immutability and other properties of modern cloud computing would also be beneficial for creating reliable setups.

While the Minecraft Region Data Format (Anvil) is perfectly fine for vanilla Minecraft servers that utilize most if not all of Minecraft's default mechanics, it is not a great fit for custom game modes, lobbies, and single, independent structures or snippets of worlds that have to be stitched together.

Then there are various schematic formats focused on providing immutable snippets, but they fall short when it comes to representing entire (possibly huge) worlds that have to be loaded efficiently and with a minimum amount of resources. Schematics were simply never designed to hold entire worlds; their import process is rather slow, and their data structures are not optimized for the throughput necessary to import Minecraft worlds covering billions of blocks.

For our game mode Clashlands, we developed our custom region format, which should finally blur the line between what is a schematic and what is a world. We named it SHARD (Shard Highly Augmented Region Data) and in this post, we want to elaborate on the shortcomings of the existing formats, our development process, what makes the SHARD format superior, the actual format, and how we use it within JustChunks.

We've put in a lot of work to consider every aspect of modern Minecraft region data and want to know your opinion on our format. Depending on the feedback, we may also release it with a permissive license, so it can be used within your future projects as well.

ConfigPositions – A Fusion of Configuration and Level Data
Managing configuration data that corresponds to different locations in a level has always been a hassle in Minecraft. Level and configuration data can get out of sync, or the configuration data was set incorrectly from the beginning. Read more about how ConfigPositions make that easier.

We've also developed and written about a custom configuration approach called "ConfigPositions". These configs can be embedded into SHARD files natively.

Developing the SHARD Format

The SHARD (Shard Highly Augmented Region Data) format holds information regarding Minecraft levels that can be either embedded into existing worlds or loaded as an independent world. It holds auxiliary information that adds meaning to specific positions within the stored area. Only basic information that could have an impact on the visual appearance of the level is stored, omitting technical information that only makes sense for persistent Minecraft worlds.

We developed this format at JustChunks to replace any other Minecraft world representation we previously used and took inspiration from Hypixel's Slime format as well as the established Sponge Schematic format. The existing options missed some parts that were necessary for our setup, especially the embedded configuration. We wanted SHARDs to be self-sufficient so that they could be used as standalone sources for levels, including visuals as well as configuration.

Prior to the development of SHARD format v0, we had an unofficial version modeled more closely on the Sponge Schematic format. Our levels eventually grew too big to be read in a reasonable time, and they also consumed way too much memory. That is why we completely scrapped JSON and declared human readability a non-goal for our next increment, as those two factors were limiting our pursuit of greater performance.

Comparison to Existing Formats

The SHARD format was developed to combine the strengths of existing formats and to merge conventional level data with more sophisticated configuration approaches. Therefore, it may be interesting to compare our SHARD format to the existing, more established formats available.

Minecraft's Region Format (Anvil)

Minecraft's Region Format is designed to support vanilla Minecraft gameplay and stores a lot of data for those mechanics. The data is spread across many files, and each chunk is compressed individually (with zlib or gzip). This is bad for the overall size (a much greater compression ratio could be reached by compressing across the whole file) and also involves a lot of unnecessary operations if the whole world is to be loaded anyway. Additionally, redundant data may accumulate over time, and the world becomes bigger and bigger with no visual differences.

The original Minecraft region and world format (Anvil) contains loads of superfluous data for temporary server instances that do not use most of Vanilla's features.

Our SHARD format has little to no extra data and is a single file. It is optionally compressed with zstd and can be extended with custom configuration data. The SHARD can be loaded either as a world or as a schematic. And it is always as small as it can be, considering the visual elements that need to be present. We only use SNBT to store the individual palettes of blocks, tile entities, and regular entities.

Hypixel's Slime Format

Hypixel's Slime Region Format does a lot of things right. It's a binary format, cuts a lot of unnecessary data, uses modern compression, and can be loaded as a world. It can be serialized and deserialized promptly and only uses NBT to store the palettes and entities (like we do). There are, however, a few things that we think can be improved and that we've focused on for the SHARD format. Quite a few of those things may be fixed in future iterations of the format and could not be predicted when the format was created. Our observations are based on the published version of the format:

  • (Tile-)Entities are not part of the chunk definitions: If entities and tile entities were part of the chunk definitions, they could be loaded at the moment the individual chunk is processed. This is great because the chunk won't have to be loaded again (for schematics), and it is also nice for worlds, as each chunk can be constructed in its entirety before moving on to the next chunk.
  • Chunks are stored from top to bottom with a fixed height: Modern Minecraft worlds can have dynamic heights, which is not supported by the Slime Region Format. The format also does not use sections and instead encodes the entire chunk in a single block. This wastes the potential for optimizations. Most chunks (even in conventional worlds) have a lot of empty sections, which the SHARD format can just skip with a single bit. And even the in-memory representation can be optimized this way.
  • Biomes are 2D and therefore cannot differ along the y-axis: Minecraft versions 1.18+ have three-dimensional biomes. This means that the biome on the ground can be different from the biome on an island floating above it. This allows map builders to be more creative with biomes and color shading. The SHARD format has full support for 3D biomes and is flexible enough to support any combination of biomes.
  • Can only be loaded as a world, not as a schematic: Slime files can only be used and loaded as worlds, not as schematics. This limits their utility as they cannot be used to assemble complex worlds or to store only sections that should be embedded into bigger worlds. The format is optimized only to be used as worlds and does not fit into the concept of schematics or tiny structures.
  • Components are compressed individually: The individual components of the Slime format are compressed individually. This not only makes the serialization/deserialization more complex but also reduces the possible compression ratio. The SHARD format compresses the entire file at the end, which has the advantage that compression is optional and that we can compress across all data, resulting in optimal results. On top of that, we can stream our data asynchronously through multiple layers and perform compression/decompression on the fly.
  • Configuration/Extra Data can only be added globally: Auxiliary data can only be attached globally to the Slime file and is written with NBT tags. The SHARD format, on the other hand, uses a very minimal binary format and is able to store configuration and extra data for any specific position within the SHARD. This allows for better compression (smaller memory footprint) and more descriptive configuration that is merged with the visual blocks and regions within the contained structure.
  • Does not store the data version: The Minecraft data version is not saved and cannot be used to efficiently migrate the contained data. The version of the NBT data has to be stored outside the Slime file and needs to be kept in sync. With SHARDs, this data is stored reliably at the beginning so that all the following data can be interpreted and migrated according to the source and target version.
  • Stores blocks and biomes by their numeric ID: All blocks and biomes are stored with the dynamic numeric IDs that were used before the flattening happened. Not only are those numeric IDs no longer used, but we can now also add our own custom biomes and other dynamic data through registries. This is not possible with the Slime format. The SHARD format has fully-fledged registry support and only uses temporary IDs with a palette of registry keys to get the best of both worlds.
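The palette approach from the last point can be sketched as follows. This is a hedged illustration, not the actual SHARD encoding: each distinct registry key gets a temporary numeric ID on serialization, and the block array only stores those IDs.

```kotlin
// Illustrative sketch of a palette of registry keys with temporary IDs.
// Names and return types are assumptions, not the real SHARD schema.
fun buildPalette(blocks: List<String>): Pair<List<String>, List<UInt>> {
    val palette = mutableListOf<String>()
    val indexOf = mutableMapOf<String, UInt>()
    val indices = blocks.map { state ->
        indexOf.getOrPut(state) {
            // First time we see this registry key: append it to the palette
            // and hand out the next temporary ID.
            palette += state
            (palette.size - 1).toUInt()
        }
    }
    return palette to indices
}
```

Because the IDs are only valid within one file, the palette stays small even for huge worlds, and renames in newer Minecraft versions only require touching the palette, not the block array.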

Sponge's Schematic Format

Sponge's Schematic format is currently in its third iteration. It supports 3D biomes and can hold any entities, biomes, and blocks as well as custom metadata. It is based on (binary) JSON and stores blocks and biomes without any sections. That means both are stored as one long continuous array, which is cumbersome for massive schematics (they can be hard to load because of their sheer size), and schematics cannot be loaded as worlds.

⚠️
This part of the article originally stated that Sponge's Schematic Format does not allow custom configuration. As pointed out by PierreSchwang, this is not the case. Any metadata can be stored within the corresponding object of the specification, allowing for custom configuration.

Our format was originally based on the Sponge Schematic Format, but we had to switch away from it, as the runtime performance just became unbearable for huge worlds. There are faster implementations available, but one problem is the reliance on the correct order of fields. As the format is fundamentally JSON, the order cannot be enforced. Even if the order is wrong, files can still be read, just not with optimal performance.

These underlying design problems led us to abandon JSON and develop our own binary format.

Original Schematic Format (WorldEdit, MCEdit, Schematica)

The original Schematic Format is based on numeric block IDs and a fixed block palette. It also uses numeric sub-IDs and does not support three-dimensional biomes. It is therefore not possible to represent the data of more recent Minecraft versions accurately. Those schematics cannot be loaded as worlds, and the format is completely NBT-based, which is not ideal for compression. In general, the original schematic format should not be used for any modern projects.

Design Principles

During the development of the SHARD format, we agreed on some key design principles that were important for determining the direction in which we developed the format. Since we have specific requirements for the capabilities and behavior of our worlds, levels, and objects, we abstracted them and extracted some high-level principles from them. These principles have emerged in particular from the shortcomings of the existing formats and are therefore attempts to address those inherent problems at the root.

Atomicity

SHARDs consist of a single, self-contained file with a common state. This simplifies portability, versioning, and storage within cloud-based infrastructures. Having only a single file and therefore only a single source of truth leads to atomicity. Either the entire world is loaded or nothing is loaded. This prevents faulty or corrupted states, and we can be sure that if the SHARD is loaded, it is exactly in the state that we expect it to be.

Immutability

SHARDs are meant to be only read from and write operations always overwrite the entire SHARD. Therefore, SHARDs are immutable by design. This is also reflected in the data format and is especially beneficial for containerized workloads. For lobby worlds that cannot be permanently altered by players, writing only wastes resources and is lost on a restart anyway. By having the SHARDs immutable, we can be sure that a SHARD will always stay the same, no matter what happens on the server or whether the world has been modified temporarily.

Extendability

Having a well-defined, rigid format often comes at the cost of extendability. We've made sure that the SHARD format leaves enough room for domain-specific configuration, metadata, and content. For that, we invented another construct: the ConfigPosition. This is a tuple of a relative location within the SHARD, a dynamic type, and a KV store for auxiliary data corresponding to this configuration entry. The format supports up to 2^32 = 4,294,967,296 distinct ConfigPositions, which is more than enough space for any configuration needs.
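Conceptually, such a tuple could look like the following sketch. The field names and the use of strings for the type and the KV store are illustrative assumptions, not the actual SHARD schema:

```kotlin
// Hedged sketch of a ConfigPosition: a relative location, a dynamic type,
// and a KV store for auxiliary data. Names are illustrative only.
data class ConfigPosition(
    val x: Int, val y: Int, val z: Int,   // position relative to the SHARD origin
    val type: String,                      // dynamic type key, e.g. "spawn"
    val data: Map<String, String>,         // auxiliary key-value data
)

// Example: a hypothetical team spawn point with extra metadata.
val spawn = ConfigPosition(12, 64, -3, "spawn", mapOf("team" to "red"))
```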

Those ConfigPositions are meant to fuse map configuration with the blocks/entities/biomes themselves. Configuration is often handled in separate config files. With ConfigPositions, we merged the configuration into the level data. In the build worlds, those are represented by text displays that are then converted by the integration to ConfigPositions. When the SHARD is loaded as a world, the specific plugin parses those positions and links them with their respective gameplay significance.

The palettes are also independent of the vanilla registries and can therefore support modded/custom content and custom data fields for the individual blocks as well. The meaning of the SHARDs is interpreted by the individual integrations. As long as an integration can understand the specific palette entries, it is completely fine to add custom entries that may not be part of the official vanilla registries.

Simplicity

The Minecraft Region Format (Anvil) contains loads of data that we don't need for lobbies, maps, and other use cases for the SHARD format. This includes stuff like advancements, structure information, points of interest, statistics, time inhabited in specific chunks, last time a chunk was updated, weather ... it just does not end. All this data is justified in the context of vanilla Minecraft gameplay but becomes completely redundant for modern Minecraft servers with more sophisticated concepts that don't use the default Minecraft mechanics.

During clan wars, the map may be damaged, but all damage should be only temporary. Therefore it is easier to work with templates instead of working with mutable worlds that need to be rolled back after each war.

The SHARD format is reduced to the bare minimum to represent the map visually. Everything not related to the appearance of the map is optional, and everything is focused on custom gameplay mechanics. This makes the SHARD format straightforward to understand and gives all available knobs significance within the context of representing a map or structure within Minecraft.

To simplify the recognition of SHARD files, we've added our custom signature at the start of each file so that they can be properly differentiated from other files. That's also why we use custom file extensions, namely .shard and, for compressed SHARDs, .shard.zst, to make it very transparent what the content of a SHARD is.
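A signature check like this is cheap to perform before parsing anything else. The concrete signature bytes below are an assumption for illustration; the real signature is defined in the specification:

```kotlin
// Hedged sketch: recognize SHARD files by a magic signature at the start.
// The actual signature bytes are an assumption, chosen here as ASCII "SHARD".
val SHARD_SIGNATURE = byteArrayOf(0x53, 0x48, 0x41, 0x52, 0x44)

fun looksLikeShard(header: ByteArray): Boolean =
    header.size >= SHARD_SIGNATURE.size &&
        SHARD_SIGNATURE.indices.all { header[it] == SHARD_SIGNATURE[it] }
```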

Flexibility

Perhaps the aspect that stands out the most is that SHARDs can be loaded as worlds and schematics, without changing the file itself. As outlined above: SHARDs just hold level data and configuration. Whether that information is loaded without anything else (world) or on top of existing structures (schematic) is up to the user. While loaded as a world, the SHARD format supplies all the necessary visual information and creates the necessary world configuration from sensible defaults, but it is also customizable to set things like the world name, PvP status, mob spawning, and other properties.

The configuration is completely usage-agnostic and therefore not opinionated. Whether the SHARD format is used to represent a lobby, a Bedwars map, a clan base, or the template for a plot, they all fit into the SHARD format. This is also reflected in the sheer size the format can theoretically support: up to 2^32 sections of 16^3 blocks. That means that absolutely all custom worlds should be able to be stored in the SHARD format.

It is also forward-compatible: By using Mojang's DataFixerUpper (DFU), SHARDs can be upgraded to any more recent Minecraft version if the migration is present in the runtime environment. The version used to store the SHARD is stored along the content, so it's straightforward to track what migrations have to be performed on the palettes and entity information.

Efficiency

Speed is important in modern gaming. Players don't want to wait for their maps to be assembled, minigames require frequent loading and deserialization of map data, and maps have to be regularly updated for new features. Therefore, efficiency and performance were a major concern during the development of the SHARD format. The unofficial pre-version of the SHARD format suffered from choosing JSON to structure the data, especially with huge worlds (~3000x3000 blocks). Therefore, we chose to implement our own binary format for SHARDs v0.

The result is blazingly fast and very space-efficient. The smallest possible SHARD is only 73 bytes (33 bytes if compressed), and everything from the bottom up was built with performance in mind. Serialization and deserialization are done with async I/O and native channels, the data is distributed in sections to enable concurrency and allow for "empty" data to be culled. The sections are prefixed by a bitmask that reduces the space consumption and parsing of empty sections to just a single bit. Also, the in-memory representation uses a singleton to represent empty sections to improve the runtime memory usage and equality check performance.
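The section bitmask idea can be sketched like this. Bit order and naming are illustrative assumptions, not the actual SHARD layout; the point is that an empty section costs a single bit instead of a serialized body:

```kotlin
// Hedged sketch: a bitmask prefix where each set bit marks a non-empty
// section. Empty sections are skipped entirely in the payload.
fun sectionBitmask(nonEmpty: List<Boolean>): ByteArray {
    val mask = ByteArray((nonEmpty.size + 7) / 8)
    nonEmpty.forEachIndexed { i, present ->
        if (present) mask[i / 8] = (mask[i / 8].toInt() or (1 shl (i % 8))).toByte()
    }
    return mask
}

fun isNonEmpty(mask: ByteArray, index: Int): Boolean =
    (mask[index / 8].toInt() shr (index % 8)) and 1 == 1
```

On deserialization, a reader would only allocate real sections for set bits and substitute the empty-section singleton for the rest.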

object EmptyShardSection : ShardSection {
    override val bounds: Bounds = completeBounds
    override val blocks: List<UInt> = List(completeBounds.size) { 0u }
    override val biomes: List<UInt> = List(completeBounds.size) { 0u }
    override val tileEntities: Set<TileEntityData> = setOf()
    override val entities: Set<EntityData> = setOf()
    override val complete: Boolean = true
}

EmptyShardSection is the singleton instance that we use to improve the footprint of the in-memory representation of SHARDs as well as to make the distinction of empty sections easier while serializing a SHARD.

Compression is entirely optional, and it can be beneficial for the read performance of very small SHARDs to disable it, but the default is zstd. Other compression algorithms can be used, and the compression is not part of the file format itself. This is reflected by the file extension: .shard for uncompressed SHARDs and .shard.zst for SHARDs that have been compressed with zstd.

Although we support updating Minecraft data ad hoc, we skip migrations altogether if the stored Minecraft data version is equal to the data version detected within the runtime. This saves a lot of time, so we migrate our SHARDs whenever a new Minecraft version is released instead of relying on the ad-hoc migration. This means we only have to execute the migration once, not every time the map is loaded.

Non-Goals

We consider certain aspects as out of the scope of the SHARD format. Those are things that don't fit well with the rest of the format, are a hindrance to the development of other functionality, make things unnecessarily complex or just come with too much of a maintenance burden. That's why we're confident they will never be a part of the SHARD format.

Vanilla Minecraft World Replacement

The SHARD format is not meant to replace normal Minecraft (vanilla) worlds. We don't try to mimic or adapt normal Minecraft mechanics, introduce permanent mutability or auxiliary data like statistics, advancements, and structures. Therefore, the SHARD format may not be a good fit for all servers that want to offer more vanilla-oriented gameplay. This includes servers that rely on the default implementations of advancements, statistics, or other data that can be normally found in Minecraft worlds.

Legacy Version Support

It's also not a goal for the SHARD format to support Minecraft versions before "the Flattening" (Version 1.13). The palettes of more recent versions are optimized for the (S)NBT representations of blocks and the previous versions referred to block types by a numeric ID and another numeric sub-ID. Supporting those legacy formats would incur a lot of complexities on our data schema as well as the code for transformations. Therefore, we will always only support the latest version of Minecraft's internal data scheme, but provide upgrade paths.

Human Readability

As the SHARD format is binary, human readability cannot be achieved. Instead, we rely on tooling to allow users to inspect the contents of SHARDs and their metadata. SHARDs are not meant to be edited by hand, as their schema could easily be messed up by manual modification. Therefore, the loss of human readability in the raw format is not that bad. The same applies to debugging, which is why we don't add any debug data to SHARDs.

Transformation

One of the key advantages of the SHARD format is the ability to perform swift and efficient transformations on the data. Transformations within real Minecraft worlds have to account for various locks (concurrency), multiple distributed data locations, states, packets, and other components because the world is already loaded. SHARDs, on the other hand, can be transformed purely mathematically and data-driven. This is time and space-efficient and allows us to first get the SHARDs in the desired shape before we apply them to the Minecraft worlds.

Merge (Layering)

A very convenient operation we can perform on SHARDs is the layering or merge. This means overlaying multiple SHARDs in a specific order and with different offsets to create a new combined SHARD that can be integrated into the Minecraft server in a single step. The resulting SHARD is optimized to only be as big as strictly necessary, and (now) empty sections are culled from the final result. The transformation supports an arbitrary number of SHARDs to be merged on top of one another with positive and negative offsets. The SHARD is grown to support all blocks at their respective offsets, and entities can be purged on lower layers or just added to the existing entities.
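The core of the layering operation can be sketched with a sparse map standing in for the real section-based storage. Names and types are illustrative assumptions; the real implementation works on sections and also handles entity purging:

```kotlin
// Hedged sketch of the merge (layering) transformation: layers are applied
// bottom-up at their offsets, and later layers overwrite earlier ones.
// Absent positions represent empty space, so culling falls out naturally.
data class Layer(
    val blocks: Map<Triple<Int, Int, Int>, String>,  // sparse block storage
    val offset: Triple<Int, Int, Int>,               // may be negative
)

fun merge(layers: List<Layer>): Map<Triple<Int, Int, Int>, String> {
    val result = mutableMapOf<Triple<Int, Int, Int>, String>()
    for (layer in layers) {
        val (ox, oy, oz) = layer.offset
        for ((pos, block) in layer.blocks) {
            result[Triple(pos.first + ox, pos.second + oy, pos.third + oz)] = block
        }
    }
    return result
}
```

Because the result is built purely from data, the combined SHARD can be shrunk to its minimal bounds before it ever touches the Minecraft server.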

Clan islands are merged with the dynamic base of the individual clans, before being loaded within Bukkit. This saves a lot of time that would otherwise be spent on updating block states and executing physics updates.

Rotation

Sometimes the orientation of SHARDs does not match the orientation that they should have in the world. SHARDs can be rotated around all three axes in steps of 90 degrees. While other angles would also be possible, this is outside the scope of SHARDs, as that would require some kind of interpolation, and we focus on clean transformations. The blocks in the palette are also rotated accordingly (if possible for this angle) and the dimensions of the SHARD are adjusted to match the new orientation.
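A single 90° step around the vertical axis boils down to a coordinate swap plus a reflection. The rotation direction below is an illustrative choice; note how the x and z dimensions trade places afterwards:

```kotlin
// Hedged sketch: rotate a block position by one 90° step around the
// vertical (y) axis. sizeZ is the depth of the un-rotated SHARD, needed
// to keep the result inside non-negative coordinates.
fun rotate90(x: Int, y: Int, z: Int, sizeZ: Int): Triple<Int, Int, Int> =
    Triple(sizeZ - 1 - z, y, x)
```

Applying the step four times returns every position to where it started, which is a handy property to test against.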

No-Op

The No-Op transformation just returns the original SHARD. This is useful if the real transformations should be temporarily disabled or skipped without adjusting the whole code. This way, the real transformation is just replaced with the No-Op transformation and everything still works as expected.
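With transformations modeled as a common interface, the No-Op is simply the identity. The interface below is an illustrative stand-in, not our actual API:

```kotlin
// Hedged sketch: transformations as a functional interface over a
// simplified stand-in for the real Shard type. Names are illustrative.
fun interface ShardTransformation {
    fun apply(shard: List<String>): List<String>
}

// The No-Op transformation returns its input unchanged.
val noOp = ShardTransformation { it }
```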

Integration

Without integration into the Minecraft ecosystem, SHARDs would not serve any purpose. Through the use of Mojang's DataFixerUpper (DFU), the palettes and entities within SHARDs can already be migrated ad-hoc to any desired Minecraft version. After they have been migrated to the requested version, they may be loaded into the Minecraft server: Either as an independent world or as a part of an existing world (schematic).

As a World

Loading SHARDs as worlds is done through a patch in our Paper fork Cardboard. SHARD worlds are loaded as natively as possible and are injected into the heart of the Minecraft server. The world loading is highly optimized, and all chunks are kept loaded all the time. The chunks outside the defined area are replaced with a special empty placeholder chunk that is neither ticked nor sent to the players. This improves the server performance and is also beneficial for the traffic and client performance of each player.

Gigantic worlds like the main world of Clashlands cannot be pasted into existing worlds in an acceptable duration of time. Instead, they are loaded natively and the data is converted into real proto chunks that are then sent to the clients.

We've also made sure that SHARDs can be loaded as the main world and are compatible with any third-party plugins (except for those that access the world folder on their own). Loading worlds in this native way allows us to stay compatible with the existing API and only offer extensions on top of it without having to establish a new ecosystem. SHARDs can also be used alongside normal vanilla worlds and are therefore completely compatible with most workloads.

As a Schematic

Embedding and extracting a SHARD as a schematic (that means as a piece of an existing world) happens through optimized integrations. Those integrations either scan the world for its contents and store the discovered information in our SHARD format, or they take the existing information from the SHARD and apply (overwrite) the content in the world. While iterating over the contents, it is possible to specify transformations and filters that influence how or if specific contents are applied to the world or SHARD.

Some SHARDs blur the lines between what is a schematic and what is a world. In this scenario, the clan bases are more like schematics, but since they are merged into the template, they are loaded as worlds.

The SHARDs can be inserted at any coordinate, and all data within the SHARD is transposed to that coordinate. This also applies to config positions and entities. It can be modified whether existing blocks and entities will be purged before inserting the SHARD or if only air will be replaced and the entities will be added alongside the existing inhabitants of the world. This can be controlled by the use of filters.

The Specification

We've strictly defined the specifications for the SHARD format so that we can create compatible implementations of the format involving different languages, environments, and technologies. The specification is publicly available on our GitHub repository and includes all the details about the binary structure of a SHARD file and the constraints that SHARD files need to satisfy.

Versioning

To add new functionality and allow for optimizations or adjustments, the concept of different versions or iterations was considered from the very start of the specification. Versions are meant to be improvements to previous increments, so there should not be any reason to prefer an old version over a newer version.

Scheme

A new version will be published each time a breaking change (that requires adjustments to either the serialization, deserialization, or integration) is necessary. They are neither forward- nor backward-compatible, but the SHARD format library never drops old versions and will try to deserialize each SHARD in its specified version. The version is always incremented to the next integer.

Optimizations (that do not change the underlying data) do not trigger version bumps. As long as old, unoptimized implementations are still able to serialize and deserialize SHARDs, optimizations can happen at any time. This could, for example, be an optimization to the system calls involved in loading the file or some internal caching that speeds up the read process.

History

We're currently still at version 0. It was released on the 17th of June, 2023, and still supports everything that we require from the format. It is likely that we'll bump the version as soon as we need to support new features or when Minecraft adds new properties to regions that we need to account for. There are also various optimization possibilities that I'll further outline later in this article. If we could solve one of those underlying problems, this would also result in the creation of a new, enhanced version of the specification.

The Implementation

Since we've developed the SHARD format primarily to embed this information within our Minecraft server, our implementation needs to be compatible with the ecosystem. We use our own Paper fork, Cardboard, to run our servers, so the implementation has to integrate with our Java-based technologies. We switched to exclusively using Kotlin for our JVM development more than a year ago, so Kotlin was the obvious choice for our implementation.

About the Ecosystem

The Kotlin implementation of the SHARD format is very efficient, fully documented, and thoroughly tested as well as benchmarked. SHARDs are involved in all of our games, so it's important that they are robust, can be adjusted and maintained, and can be easily understood by all of our developers. We're using much of Kotlin's syntactic sugar and the latest Java features and APIs.

We're using ByteBuffers and asynchronous high-performance channels from Java's nio package to read and write the SHARD files. The buffers are pooled and reused as much as possible and if we want to apply compression, the channel is then pipelined through the compression/decompression step. This maximizes the performance and uses low-level optimizations whenever possible, without having to write our own JNI (Java Native Interface) methods.
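As a rough illustration of such a pool, consider the following simplified sketch. It is not our actual implementation (which also deals with direct buffers and channel pipelining), but it mirrors the reuse semantics described above:

```kotlin
import java.nio.ByteBuffer

// Hedged sketch of a ByteBuffer pool: buffers handed back via give() are
// reused when their capacity suffices; otherwise a fresh buffer is
// allocated. Simplified and not thread-safe, unlike the real pool.
class BufferPool {
    private val free = ArrayDeque<ByteBuffer>()

    /** Return a buffer to the pool for later reuse. */
    fun give(buffer: ByteBuffer) {
        free.addLast(buffer)
    }

    /** Get a cleared buffer of at least [capacity] bytes, limited to [capacity]. */
    operator fun get(capacity: Int): ByteBuffer {
        val reused = free.firstOrNull { it.capacity() >= capacity }
            ?: return ByteBuffer.allocate(capacity)
        free.remove(reused)
        reused.clear()
        reused.limit(capacity)
        return reused
    }
}
```

Pooling avoids repeated allocations (and the associated GC pressure) when many SHARDs are serialized in quick succession.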

All types, properties, and functions are documented in the codebase. This is especially important, as we use the SHARD library in all our projects and therefore new developers need to be able to quickly find all relevant information.

More about the data structures can be found in the specification. Kotlin offers some types to accurately match our specifications in a way that Java does not. We're using UInt, UByte and other unsigned variants for example. Additionally, we're working with Kotlin's object keyword to create singleton instances that represent empty sections to further cut down on the in-memory space consumption and make comparisons faster.

The code is then tested and benchmarked through Kotest on top of JUnit. This allows us to write our assertions and data preparations in an easier way. We can also make use of the descriptive 'should' test style that we utilize throughout our applications. The coverage is then calculated by Kover, a modern replacement for JaCoCo.

We try to cover at least all individual units but also try to test for combinations of units, like serializing and deserializing an entire SHARD to see whether the information changed. This mimics real-world usages of the SHARD system and gives us more confidence in the robustness and correctness of the results of the SHARD format.

context("BufferPool.get(Capacity)") {
    should("not get existing ByteBuffer of smaller size") {
        // given
        val pool = BufferPool()
        val suppliedBuffer = ByteBuffer.allocate(6)
        suppliedBuffer.put(1)
        pool.give(suppliedBuffer)

        // when
        val buffer = pool[8]

        // then
        buffer.limit() shouldBe 8
        buffer.capacity() shouldBe 8
        buffer.position() shouldBe 0
        buffer shouldNotBe suppliedBuffer
        buffer.clear()
    }

    // …
}

An example test within our implementation that verifies one aspect of our own ByteBuffer pooling utility. This example uses the 'should' syntax from Kotest and the original context contains many additional tests.

But we don't want to rely on our gut feeling alone. We perform an extensive analysis of all important vitals of our SHARD format with SonarQube. That's how we can be sure that each increment of the implementation matches our quality standards and can be embedded within our libraries.

SonarQube includes many rules regarding security, reliability, maintainability, and test coverage. This is especially helpful, as we aim for continuous improvement and it's important to have metrics that allow us to track our progress in this endeavor.

Merge Requests are also checked against our quality gates and we require all new code to be tested. That is essential to avoid the build-up of any technical debt in the application. The rules also all come with their own rationale and explanation so that new developers can further understand the implications of their code.

The quality dashboard of shard-format within our instance of SonarQube. Everything is clean and the coverage is looking excellent.

The documentation is written in KDoc and processed through Dokka. This allows us to generate beautiful HTML overview pages, and the output can also be packaged automatically into its own jar, which can then be used to fetch KDoc information in our other projects. The syntax is very similar to regular JavaDoc, but references, links, and other logical elements can be embedded a little more easily.

In our pipeline and locally, we execute benchmarks on our code through kotlinx-benchmark, which is based on Java Microbenchmark Harness (JMH). That's how we can check whether new changes made the conversion faster or had a negative impact. Through IntelliJ IDEA's Profiler, we can also generate flame graphs and other insights to check for bottlenecks in the process.

Finally, we perform linting through ktlint with the JetBrains styles. This makes the code more readable and maintainable. It is run within the pipeline and on every commit through a git commit pre-hook, effectively enforcing unified styling throughout the entire codebase.

Common Implementation

The shard-format library contains common utilities and interfaces that provide a stable foundation for the platform-specific integrations. We use no NBT types within the library and all data fields are platform-independent. This has the advantage that our shard-format library can be used with any server platform, regardless of the underlying data classes. Therefore, shard-format could be used on Paper, Forge, Minestom, or any other server implementation.

SHARDs are split up into sections to improve the culling of empty data and to improve compression. This also allows us to asynchronously serialize/deserialize SHARDs. In this image, each color represents a different section. The stone is the only full section, while all other colors are incomplete with pink only consisting of a single block.

That is why shard-format handles SHARDs from a data-driven perspective. SHARDs are data. Their meaning is only created in the context of the specific integration. While one integration may view a SHARD as a schematic, another integration would consider the SHARD to be a world. Integrations can also add custom transformations and converters.

One aspect, that all integrations need, is the serialization and deserialization of SHARDs. We support the serialization/deserialization from/to files, channels, and streams. While all boil down to the channel implementation, files and streams are wrapped into optimized channels that make use of the specifics of their origin. The file-based channel, for example, uses optimized buffers and IO flags to speed up the writing and reading.

All IO-related functions reside within the ShardDataHandler. This handler is a Kotlin object. The Java counterpart would be a class with only static methods. And indeed: By using some Kotlin annotations we achieve complete compatibility with Java. The ShardDataHandler also offers a few useful constants for typical file extensions in the SHARD context (like .shard and .shard.zst).
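
To illustrate the idea, here is a hedged Java sketch of such a static utility; only the two file extensions are taken from the text above, everything else (class and method names) is ours:

```java
/** Sketch of a static utility mirroring the ShardDataHandler extension constants. */
public final class ShardExtensions {

    public static final String EXTENSION_PLAIN = ".shard";
    public static final String EXTENSION_COMPRESSED = ".shard.zst";

    private ShardExtensions() { }   // no instances, static methods only

    /** Decides from the file name whether the stream has to be piped through zstd first. */
    public static boolean isCompressed(String fileName) {
        return fileName.endsWith(EXTENSION_COMPRESSED);
    }
}
```

In Kotlin, the same shape is expressed as an `object` with `@JvmStatic` members, which is what makes the handler feel identical from Java call sites.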

SHARDs are currently always deserialized to MemoryShard, which is a full in-memory representation of said SHARD. We intentionally made Shard an interface though, as the format would also allow for streamed implementations. Those implementations would then only deserialize the data on demand, cutting down on the space consumption even further.

Additionally, our library adds utilities for merging/layering multiple SHARDs with each other. Layering blocks and biomes in real Minecraft worlds is very slow and often incurs a lot of block updates. That's why we try to merge those blocks before loading or embedding the SHARDs so that we can use optimized algorithms that can be executed asynchronously.

Our clan war maps for Clashlands are merged from five independent SHARDs. The base layer is the map itself (its base design), then both clan skins are layered with specific offsets and a filter to only replace air blocks; finally, both clan areas are layered with their offsets and without any filters (replacing all blocks there).

We support an arbitrary amount of layers and each layer can be of any size. The layers can additionally be configured with the following properties:

  • Shift: The offset to the origin (0, 0, 0) of the base layer. All three axes can assume any integer number and negative offsets effectively shift the origin of the resulting SHARD.
  • BlockMask: A predicate to filter which existing blocks (of lower layers) can be replaced by this layer. The default is to replace all blocks.
  • BiomeMask: A predicate to filter which existing biomes (of lower layers) can be replaced by this layer. The default is to replace all biomes.
  • PurgeConfigPositions: Whether ConfigPositions of lower layers, that are within the area of this layer, will be purged.
  • PurgeEntities: Whether entities of lower layers, that are within the area of this layer, will be purged.
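
The effect of a BlockMask can be sketched like this (a simplified Java model with materials as plain strings; the real implementation works on palette indices and supports all of the properties above):

```java
import java.util.function.Predicate;

/** Sketch of layering with a block mask, as used when merging clan skins over a base map. */
public final class Layering {

    /**
     * Merges {@code layer} over {@code base} in place. A block from the layer is only
     * written where the mask accepts the existing block (the default mask accepts
     * everything). A {@code null} layer entry means "no block at this position".
     */
    public static void mergeLayer(String[] base, String[] layer, Predicate<String> blockMask) {
        for (int i = 0; i < base.length; i++) {
            if (layer[i] != null && blockMask.test(base[i])) {
                base[i] = layer[i];
            }
        }
    }
}
```

With the mask `b -> b.equals("air")`, a clan skin only fills in empty space and never overwrites the base design, which is exactly the behavior described for our clan war maps.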

Merging is difficult because the SHARD sections can be offset from each other. Therefore, we need to carefully evaluate the block grid and take the information for the resulting SHARD from different sections in each layer. As the layers are also of varying sizes and some sections are incomplete (smaller than 16x16x16), there are a lot of cases to consider.
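
The coordinate bookkeeping boils down to floor division and floor modulo, which also behave correctly for negative coordinates. A small sketch (Java; the names are ours):

```java
/** Sketch of mapping an absolute block coordinate onto the 16-block section grid. */
public final class SectionGrid {

    public static final int SECTION_LENGTH = 16;

    /** Index of the section containing the coordinate (floorDiv handles negatives correctly). */
    public static int sectionIndex(int coordinate) {
        return Math.floorDiv(coordinate, SECTION_LENGTH);
    }

    /** Offset of the coordinate within its section, always in [0, 15]. */
    public static int localOffset(int coordinate) {
        return Math.floorMod(coordinate, SECTION_LENGTH);
    }
}
```

Using plain `/` and `%` here would be a subtle bug: for coordinate -1 they would yield section 0, offset -1, while the floor variants correctly yield section -1, offset 15.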

The result of the operation is a new SHARD that contains only the merged blocks and biomes. This object has no references to the layers that were used in its creation and it can be used like any other SHARD. In fact, it is also a normal MemoryShard.

And then there is the transformation system. Transformations can be applied to SHARDs to modify their content. A typical example is a ReplaceTransformation that replaces all occurrences of one material with another, or a FlipTransformation that flips all blocks along some axis.
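
A ReplaceTransformation is particularly cheap on a palette-based format, because only the palette has to be touched, not every block. A simplified sketch (Java, with materials as strings standing in for real palette entries):

```java
/** Sketch of a ReplaceTransformation operating on a block palette. */
public final class ReplaceTransformation {

    private final String from;
    private final String to;

    public ReplaceTransformation(String from, String to) {
        this.from = from;
        this.to = to;
    }

    /**
     * Replacing in the palette changes every occurrence in one pass over the palette,
     * without visiting any of the (potentially millions of) individual blocks.
     */
    public String[] apply(String[] palette) {
        String[] result = palette.clone();
        for (int i = 0; i < result.length; i++) {
            if (result[i].equals(from)) {
                result[i] = to;
            }
        }
        return result;
    }
}
```
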

When rotating orientational blocks like stairs, levers, buttons, trapdoors, and many others, it is important to also rotate them around their own axis. Normal blocks can just be rotated around the center of the SHARD.

The shard-format library only implements the general interface and very simple transformations that can be performed without any knowledge of the details of materials. That's why RotationInformation is only implemented within our server platform. A rotation requires the rotation of the block palette indices, but also the rotation of the materials themselves. Stairs, for example, need to face in a different direction after the rotation.

There are different integrations for using SHARDs as schematics and for using them as worlds. In the following paragraphs, I'll outline how both of these integrations work and what their fundamental difference is.

Integration in Our Plugin Library (Schematics)

SHARDs may be loaded as schematics and be extracted or embedded from/to existing worlds. These snippets often contain individual structures, logical elements, or modules that are dynamically inserted and replaced during gameplay. That's why they cannot be merged into the underlying SHARD, as the world is already loaded at this point and players are already present.

To extract SHARDs from selections within existing worlds, we take a snapshot of the selection and copy all information asynchronously into a new MemoryShard. We write each section individually and serialize the NBT data from (tile-) entities into SNBT. There is the possibility to apply masks while scanning the contents so that specific blocks, biomes, or entities can be omitted (falling back to the defaults).

Before writing each section, we evaluate whether the resulting section would be empty and replace it with the optimized empty singleton in this case. This comes with some additional cost while creating the SHARD, but significantly speeds up the pasting of said SHARDs, as whole sections may be skipped.

SHARDs are placed by iterating over the individual sections, translating their relative coordinates into the absolute coordinates of the world, and applying the necessary changes, one by one. There might be more efficient ways to do this, but that is the current implementation. We would especially like to edit whole regions and send those changes as a single update to our players.

Extraction and embedding are both asynchronous and return a CompletableFuture to watch the progress of these operations. We measure the remaining time in a tick (SHARDs are processed last within each tick) and try to squeeze in as many changes as possible without creating server lag. That means SHARDs are placed faster if there's more time to spare in a tick.
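
The budgeting logic can be sketched as follows (Java; the clock is injected so the loop stays testable, and all names are ours):

```java
import java.util.Queue;
import java.util.function.LongSupplier;

/** Sketch of squeezing queued block changes into the remaining time of a tick. */
public final class TickBudgetPaster {

    /**
     * Applies queued changes until the time budget (in nanoseconds) is used up.
     * Returns how many changes were applied; the rest stay queued for the next tick.
     */
    public static int drainWithinBudget(Queue<Runnable> changes, long budgetNanos, LongSupplier clock) {
        long start = clock.getAsLong();
        int applied = 0;
        while (!changes.isEmpty() && clock.getAsLong() - start < budgetNanos) {
            changes.poll().run();   // apply one block/biome change
            applied++;
        }
        return applied;
    }
}
```

In production the supplier would be `System::nanoTime` and the budget would be whatever is left of the tick after all other work; a tick with more spare time therefore pastes more of the SHARD.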

The integration for SHARD creation and pasting is within our library (and not Cardboard, our server software), as code within the libraries is significantly easier to maintain. We don't need to create patches and account for merge conflicts, and we just use the basic Bukkit methods anyway. Therefore, we added this integration to our plugin library, so that all plugins can use it.

Integration in Cardboard (Worlds)

As outlined above, it is also possible to load SHARDs as independent worlds. And that is the primary reason that we created this system in the first place. We didn't just want to load a normal world and then slowly replace its content, but instead dynamically load a SHARD natively as a world.

This is implemented through a patch in Cardboard. It adds the necessary wrappers to use any SHARD as a ServerLevel. The normal logic of world loading is skipped entirely, so there is no chunk population, generation, or any other normal step of loading a chunk. Instead, we skip right to the final ProtoChunk definition of each requested chunk.

Being able to load worlds natively within the server is beneficial for performance and maintainability. We don't need to inject any code through reflection or bytecode manipulation, but can instead just intercept the world loading with normal patches.

Chunks are loaded on demand (like with normal worlds) and the underlying SHARD sections are only requested once the corresponding chunk is requested. SHARDs may be loaded with an offset (shifting the coordinates), but only in multiples of the length of one section (16 blocks), so that the grids still align. This allows us to perform various optimizations.

If a chunk only contains empty sections or is outside of the underlying SHARD, we use an optimized ProtoChunk that is never ticked, uses the same reference for all instances, and does not cause any unnecessary traffic. A side effect of this is that blocks cannot be modified in these chunks; neither by players nor by plugins. This is actually an advantage for us, however, as it keeps the worlds clean.

Saving is completely disabled. SHARDs are meant to represent volatile instances of immutable templates and therefore, the idea of saving back entire worlds of SHARDs is out of the scope of our project. If fixed regions need to be saved though, that can be handled by extracting them as schematics.

Once chunks have been requested, they are kept loaded through chunk tickets, to speed up teleports and other operations within the world. This also allows us to keep persistent references to entities and tick the world as a whole, not caring about any chunk loading/unloading. Additionally, all chunks can be loaded on startup, if that is desired.

The levels are currently always backed by a MemoryShard. In the future, we want to explore the idea of a streamed SHARD implementation further, as worlds would especially profit from this, drastically cutting down on memory consumption and loading time even further.

Sometimes it is necessary to keep a fixed (but empty) main world. That is because the main world is special in Minecraft servers. SHARD worlds can also be easily used as main worlds, but the main world cannot (easily) be unloaded or replaced at runtime. Therefore, we have to keep an empty main world, if we want to change the worlds dynamically. We've added a special config option for this in Cardboard and use a thoroughly optimized, empty non-ticking main world, based on the SHARD levels.

How We Use SHARDs at JustChunks

Originally, the SHARD format was developed to speed up the loading times of clan wars. Since there were five individual (dynamic) schematics that needed to be merged, loading previously took multiple minutes. With the SHARD format, this went down to as little as a few milliseconds.

We store our Dungeon Explorer rooms within structured worlds to be able to easily change them and view all of them in one place. We've then added special ConfigPositions to export/update all of those rooms automatically with a single command.

After this initial success, we've since adopted the SHARD format for all world loading on our server. JustChunks offers lots of dynamic content with worlds that can be manipulated by the player either temporarily or permanently. By using the same format for all world loading and schematic needs, we can focus on optimizing this single bottleneck and can rely on consistent behavior.

We currently use the format for these scenarios:

  • Lobby worlds (Like for our hub server or the Dungeon Explorer hub)
  • Dynamic (but building-restricted) worlds (Clashlands main world)
  • Build worlds (Clashlands clan islands)
  • Empty worlds (Clashlands clan war and clan island main worlds)
  • Module schematics (Dungeon Explorer rooms, Clashlands castles and structures)
  • Player-generated schematics (Clashlands clan bases)

The sizes of these SHARDs vary from tiny (3x3x3 blocks), in the case of structures, to gigantic (1600x180x1400 blocks), in the case of the Clashlands main world. That's why it's important that the algorithms of the integrations scale well and that the format can support very large worlds. The theoretical limit of the specification is 2,147,483,647 sections, which would contain 8,796,093,018,112 blocks, assuming all sections were complete. That's more than enough room for all use cases.
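
The limit follows directly from the section layout — 2,147,483,647 sections times 16³ blocks per complete section:

```java
/** The theoretical upper bound of the specification, derived from the section layout. */
public final class ShardLimits {

    public static final int MAX_SECTIONS = Integer.MAX_VALUE;       // section count is a signed 32-bit int
    public static final long BLOCKS_PER_SECTION = 16L * 16L * 16L;  // a complete section is 16x16x16

    /** Upper bound on blocks if every section were complete. */
    public static long maxBlocks() {
        return MAX_SECTIONS * BLOCKS_PER_SECTION;   // 2,147,483,647 * 4,096
    }
}
```
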

SHARDs are either loaded and created dynamically (as is the case with clan islands, for example) or they are loaded on startup (hub server worlds). To further illustrate the dynamic case, we'll discuss how clan island servers are provisioned:

Our clan island servers wait in idle, watching for changes through the Agones SDK (we've created our own Kotlin client). They don't let anyone connect and only have a non-ticking main world loaded. Once a specific label (agones.dev/sdk-clan-identifier) is attached, we inspect the UUID value. The corresponding clan information is then loaded and the SHARDs are requested from the object storage.

We merge the clan base into the selected clan island skin and then load the result as an independent world. Once that process is complete, we allow anyone to connect to the server.

Real-World Analysis With Production Data

To verify the reliability and performance of our SHARD format, we've conducted some tests and analyses. Speed was one of the primary factors that led to the development of our SHARD format, so we want to make sure that it satisfies our requirements and is as reliable as we need it to be.

While loading (worlds) and pasting (schematics) of SHARDs is also important, we will only focus on the capabilities of our format and the shard-format library that serializes and deserializes the SHARDs from/to the binary format. The performance of the integrations will be discussed in future articles and offers its own set of improvement possibilities and important metrics to consider.

All tests were performed on my local machine (i9-9900x, 64 GB DDR4, Samsung NVMe M.2 SSD 970 EVO Plus 1TB). The files are stored within an EXT4 filesystem (default) and all zstd compression was done with the default settings (level 1). No other applications were used during the benchmarks, but as it is a graphical OS (Arch Linux), interference cannot be ruled out.

The tests were performed with kotlinx-benchmark (which internally uses JMH) and at least 5 warmup runs were executed with 4 JVM forks. The shard-format library was compiled on OpenJDK 21.0.3+9-LTS. The same JDK was used to run the benchmarks. Each dot in the scatter plots represents a single sample run.

Our first tests are about the deserialization of SHARD files. Those files are compressed with zstd and are read from the local filesystem. The benchmark time includes the disk I/O, as well as the decompression from zstd, and therefore is a representative comparison to a real-world scenario.

These metrics have a confidence interval of [452.9, 457.8] ms and are therefore very precise, considering the length of the individual operations.

The Clashlands Main World is especially huge with ~ 400 million blocks and a decompressed size of more than 500 megabytes. This shows that even gigantic maps can be loaded in a reasonable amount of time. Ideally, those types of worlds would be streamed in the future, which would cut down their initial loading time even further. To illustrate a normal hub map, we've also deserialized the Dungeon Explorer Camp World:

The confidence interval is [23.7, 24.5] and the error margin is therefore less than 0.4 ms. The few outliers might be explained by some background processes that used the SSD during the benchmark.

The results are visibly reliable and consistent, and the average load time is less than half of the available time of a single server tick. That allows us to dynamically load worlds from SHARDs very quickly. As there are no references to any Minecraft objects, this can also be done completely asynchronously. Therefore, the SHARD format is well-equipped to handle most use cases effortlessly.

While reading (deserialization) is the more important operation, as it is performed far more frequently, the SHARD format can also boast excellent write (serialization) performance. In our benchmarks, we wrote a MemoryShard to the filesystem, while also compressing it with zstd.

The serialization of this SHARD was benchmarked with a confidence interval of [471.6, 479.0] ms. With the 95th percentile at 493.5 ms, we still need less than half a second.

The performance of serializing, compressing, and storing SHARDs is almost identical to the opposite operation. SHARDs can be stored very efficiently and considering the latency of the SSD and the filesystem, the margin between the fastest execution and the 95th percentile is very small.

We've also done a benchmark for the serialization of the Dungeon Explorer Camp and the margin is even smaller. Writing can also be done asynchronously, so this is just a little bonus for us. Even the Dungeon Explorer Camp World is comparatively big. Tiny SHARDs like Dungeon Explorer rooms are serialized even faster.

The results for this benchmark have a confidence interval of [26.6, 26.7] with an error margin of less than 0.08. The results are so reliable, that there is only about a millisecond difference between the fastest and the slowest execution.

Another important factor is the compression ratio of SHARDs, as this allows for faster file transfers, easier storage, and reduced storage costs. The SHARD format is compression-agnostic: it does not rely on any specific compression algorithm, or on compression in general.

Nevertheless, we developed the format with compressibility in mind. To be easily compressible, a format should encourage the emergence of patterns within the data, as those can be extracted and used to compress the data. That increases the overall compression ratio and results in smaller files for the same amount of data that needs to be saved.

We use zstd whenever we need to save a SHARD file, as the compression is fast and very space-efficient. When SHARDs are read or written, we stream the resulting bytes through zstd and work with the processed byte stream. To measure our compression ratio, we compared the original and compressed file sizes of different, typical SHARDs that we use daily. This includes 250+ Dungeon Explorer rooms, 30 Clashlands clan bases, the Dungeon Explorer Camp, the Hub lobby, and 3 Clashlands clan war maps.
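
The streaming setup can be sketched as follows. Note that this uses the JDK's built-in Deflater streams as a stand-in, since zstd itself is not part of the standard library; in production we pipe the bytes through a zstd stream instead:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Streaming-compression sketch; production uses zstd, the JDK's Deflater stands in here. */
public final class StreamCompression {

    public static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(sink)) {
            out.write(raw);   // bytes are compressed as they stream through
        }
        return sink.toByteArray();
    }

    public static byte[] decompress(byte[] compressed) throws IOException {
        try (InflaterInputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes();
        }
    }

    /** Compression ratio as used in our measurements: 1 - compressed/original. */
    public static double ratio(long originalSize, long compressedSize) {
        return 1.0 - (double) compressedSize / originalSize;
    }
}
```

Because the section layout encourages long runs of identical palette indices, repetitive data like this compresses extremely well, which is what the scatter plot below reflects.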

The x-axis is scaled logarithmically so that the differences in size can be easily distinguished. It can be seen that the compression ratio grows slightly as the uncompressed size gets bigger.

The scatter plot shows that we're able to achieve compression ratios well above 90% for all of the measured SHARDs. The compression ratio also gets better for larger SHARDs, which is probably because of the file header, metadata, and mandatory data that is always present, regardless of the size of the SHARD.

The Clashlands Main World (which is more than 500 MB in size) has a compression ratio of more than 99%. That means it is less than 5 MB after compression. We're very happy with these results, as most SHARDs are tiny after compression. Even our hub worlds are less than a megabyte.

How We Store the Data

Managing those SHARDs efficiently and accessing them in a unified manner is essential to unleashing their full potential. Our infrastructure is scheduled and managed through Kubernetes and therefore, all of our workloads are containerized. This means that we have to somehow solve the provisioning of SHARD files for the individual containers.

A solution that makes this very easy is object storage. That is a service that allows storing objects and requesting those objects at a later time. There is some way to reference an object, and the physical storage of the objects is handled by the service. Therefore, the application that uses the object storage does not have to worry about any of the details.

An especially popular type of object storage is Amazon S3. If you're not familiar with this technology, here's a short explanation: S3 stands for Simple Storage Service and offers a standardized set of operations that can be performed by the clients. The service is centered around individual namespaces or repositories called buckets. It's possible to specify permissions for each individual user and bucket. Notable examples of operations are s3:PutObject, s3:DeleteObject and s3:GetObject.

While S3 is used to refer to the technology and schema, it was originally the name of the storage hosting offered as a part of AWS (Amazon Web Services). We don't use AWS S3 directly, but instead use SeaweedFS, hosted within our own cluster. SeaweedFS is compatible with the S3 API and therefore, we're free to switch to any other S3-compatible hosting whenever it is necessary.

We decided to do this because of the reduced latency and to cut down on storage costs, as our servers all come with their own hard drives anyway. And since switching is that easy with S3, we are flexible to switch to proper hosting if the need arises. Since we only reference the S3 paths in MongoDB, we just need to change the hostname and access keys in order to target another S3 storage.

SeaweedFS comes with its own Filer web interface. This allows us to view and download all stored files within their S3 buckets. Those files are referenced in the corresponding MongoDB documents and requested on demand.

Our SeaweedFS cluster runs with 3 volume servers. Each volume server hosts multiple logical volumes and those volumes are replicated (redundantly stored) on at least two different physical nodes at different geographical locations. Backups are created once a day and stored on Amazon AWS with Velero. They are kept for two weeks.

The volumes are assigned to specific buckets and are abstracted away behind the filers. Those services can be used to automatically spread stored objects on the available volumes. But we only interact with the filers indirectly through the S3 adapters of SeaweedFS. So we send them a compliant S3 request and they communicate with the filers in the backend.

We configure our applications with their individual S3 access keys and secrets keys, the hostname to reach the S3 instance, and their individual bucket names. With this information, they can assemble appropriate requests to store new objects and request objects on the fly.

Optimization Possibilities

Even though we've gone to great lengths to optimize the SHARD format as much as possible, there are still a few problems we were not yet able to solve. Some of them are tricky, and there is no obvious solution, others have solutions that come with their own drawbacks. We will try to solve those problems in future iterations of the SHARD format and strive to optimize it even further! Below is a list of optimization possibilities that we currently track:

3D Biomes Save Unnecessary Data

Minecraft 1.18 introduced 3D biomes. This means that biomes are no longer guaranteed to be the same across all y levels, but instead can differ at different heights. The SHARD format supports this information, but there is a small overhead in how we save that data. We save biome information for every block, although only one value per 64 blocks would be necessary. This is because biomes are saved for 4x4x4 cubes.
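
The overhead is easy to quantify (a Java sketch; the constants follow from the 4x4x4 cube layout described above):

```java
/** Biome storage overhead: Minecraft stores one biome per 4x4x4 cube, we store one per block. */
public final class BiomeOverhead {

    public static final int BIOME_CELL_EDGE = 4;   // biome cells are 4x4x4 blocks

    /** Number of biome entries Minecraft itself needs for a full 16x16x16 section. */
    public static long cellsPerSection() {
        int cellsPerAxis = 16 / BIOME_CELL_EDGE;
        return (long) cellsPerAxis * cellsPerAxis * cellsPerAxis;
    }

    /** Factor by which per-block biome storage exceeds the per-cube minimum. */
    public static long overheadFactor() {
        return (16L * 16L * 16L) / cellsPerSection();   // 4096 blocks vs. 64 cells
    }
}
```
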

We chose to probe and store the biome for each block anyway, as the offsets when pasting SHARDs as schematics or loading them as worlds can cause the biome grid to shift. Therefore, the more accurate data can help to interpolate which biome should be where. If we find a better-suited algorithm, we could cut down on the biome data within the stored SHARDs and therefore reduce our memory footprint.

Originally introduced in the nether update, 3D biomes have quickly been adopted in custom world-building to make use of the different properties of the biomes. It's also possible to add custom biomes in the registries so that they can be used within SHARDs.

Bounds for Sections Could Be Saved As Bytes

Bounds for individual sections are saved as int, if the sections are not complete (16x16x16). This is often the case for the edge sections towards the upper bounds of the SHARD, as the total dimension of the SHARD is not a multiple of 16 for all axes. In these cases, we are wasting nine bytes per incomplete section, as the maximum size of a section is 16, which can comfortably fit into a byte. In fact, we could even store the bounds in 12 bits (3 nibbles) if we wanted to squash as much data as possible.

The reason why we still store the bounds as ints is that the conversion from byte to int would incur some cost for each section that we deserialize/serialize. Unless we find a more performant way to convert the types (or have more thorough benchmarks of the impact), we continue to store them as int. That way, we can use the same type as the in-memory representation of the bound axes.
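
Should we ever go down that road, the 12-bit variant could look like this (a hypothetical Java sketch; the format currently stores plain ints):

```java
/** Sketch of packing the three section bounds (1..16 each) into 12 bits. */
public final class BoundsPacking {

    /** Stores each axis as (value - 1), which fits a nibble since the maximum is 16. */
    public static int pack(int x, int y, int z) {
        return ((x - 1) << 8) | ((y - 1) << 4) | (z - 1);
    }

    public static int unpackX(int packed) { return ((packed >> 8) & 0xF) + 1; }
    public static int unpackY(int packed) { return ((packed >> 4) & 0xF) + 1; }
    public static int unpackZ(int packed) { return (packed & 0xF) + 1; }
}
```

The shifts and masks per section are exactly the conversion cost mentioned above, which is why we would want benchmarks before committing to this layout.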

NBT Data Could Be Expressed in Binary

We currently save all NBT data as SNBT (so the stringified form). That takes more space than the binary version of the same information in NBT would take. By serializing to binary instead, we could save both space and processing time. Considering there's possibly a lot of NBT data (entities, block entities), this could be a considerable optimization for the SHARD format.

We made this decision to make migrations across different data versions easier to write and to be able to debug problems with the stored data more easily. But our format is already binary, so it would be possible (and probably desirable) to represent NBT compounds in binary as well. It is, however, crucial that the interfaces remain intuitive and that efficient encoders/decoders are used for the serialization and deserialization of NBT.

Light Data Could Be Cached

To speed up world loading within the Minecraft server, we could pre-populate light data (sky and block light) and use this data to load the world. This only takes a small amount of additional space (which we could even make optional) but could possibly improve the world loading performance drastically. In theory, this could even be merged into the world when embedding SHARDs as schematics, rendering most (if not all) light updates redundant.

Including light data in SHARDs increases the complexity of most transformation operations and especially the merging of SHARDs. To reliably modify light data during those operations, we would need to implement details about how light data is populated. That is a huge maintenance burden. Alternatively, we could just drop any light data once a modification has been applied to the SHARD.

Lighting is resource-intensive to calculate. Therefore, it could be beneficial to calculate all lighting beforehand and then only inject those mappings into the integration.

Future of the Format

We've put a lot of time and effort into our SHARD format. It has been in use by us for months and we've fixed every bug that we could find. Now that the initial version can be considered stable, there are some options for the future of the format that we're currently considering.

One option would be to release the format to the public and work on getting the format supported more broadly. For example in FAWE (FastAsyncWorldEdit), in Paper/a public fork, or within pages like PlanetMinecraft. Our reference library could be used to reliably serialize and deserialize SHARDs of all sizes and the individual platforms would add their own integrations.

By making our reference library open-source, other people could also make changes to enhance it and a bigger community may improve the format faster and discover more bugs. We've already released the entire specification, so that wouldn't be such a big step anymore.

Another option would be to create ports for other languages, like Rust and JavaScript. Having a Rust crate would allow us to perform some operations on SHARDs even faster. Since our shard-format library doesn't rely on any Minecraft-internal objects, it would certainly be feasible to create a clean library crate for Rust.

Having a JavaScript port of shard-format would allow us to visualize SHARDs (for example in a canvas element) so that SHARDs could be easily displayed on websites and could maybe even be analyzed in dashboards. That would make the SHARD format even more versatile.

There are so many things we still can do and want to explore with our SHARD format. We're very proud of everything that we've already achieved and how much this format helped us evolve our internal processes, infrastructure, and game flows. And we're excited about everything that the future holds for us.

Want To Know More?

We're constantly working on exciting stuff like this and would love for you to take part in the development of JustChunks. If you're interested in more JustChunks-related development or want to get in touch with the community, feel free to read about our server news here or hop on our Discord server!

Join the JustChunks Discord server!
Discover Minecraft from a different side and experience spectacular server concepts! | 263 members

Our Discord server for JustChunks. Join our community, discuss our development, or propose changes to features. If you'd like to talk with us, that is the place to go!