
Oxen v0.25.0 Migration

Today we released Oxen v0.25.0 🎉, which comes with a few performance optimizations, including how we traverse the Merkle Tree to find files and folders. The main improvement is how quickly the oxen-server can load individual files. Unfortunately, these speed optimizations require a migration for all local repositories.

~ TLDR ~ Just run the following commands to update Oxen and your local repository:

# Update Oxen
brew tap Oxen-AI/oxen
brew update
brew install oxen

# Update your repository
cd /path/to/your/oxen/repo
oxen migrate up add_child_counts_to_nodes .

# continue committing, pushing, pulling as usual

That’s it! If you don’t care about the low-level details, feel free to tune out now. If you do, read on, and we also have a series of posts explaining the underlying data structures if you want to dive deeper 🤓

The Motivation

We noticed that loading data frames with images was becoming slow on our public web hub. For example, the ImageNet training dataset has 1.1 million files in the images/train folder. When we load a data frame referencing all of these images, we make an HTTP request for each individual image, and these requests got slower and slower as the images' parent directory grew.

When we peeled back the layers, we found that each request to load a file was taking ~1-2 seconds, which overloaded the server when rendering a data frame like the one above. Under the hood, we were loading pointers to every single child of the directory just to fetch one file. Even though each entry is just a pointer, this got expensive for a directory like the ImageNet one: every request unnecessarily read a list of over a million file paths from disk into memory.

The Fix

We needed these pointers solely to get the number of children in the directory, so the natural fix is reflected in the name of the migration: add_child_counts_to_nodes. The migration iterates over the Merkle Tree and adds a child count field to the nodes, so that we can read the number of children in a directory without loading all of its pointers. As a result, the images in this data frame now load almost instantaneously. Easy fixes for the win 🎉
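To make the idea concrete, here is a minimal toy model in Python. It is not Oxen's actual Rust implementation or on-disk format, and the names `ToyNodeDb`, `DirRecord`, `migrate_add_child_counts`, and `child_count` are hypothetical. It assumes a directory's small metadata record is cheap to read while its list of child pointers is expensive, and it counts how many pointers each operation pulls into memory.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DirRecord:
    """Small per-directory metadata record (hypothetical layout)."""
    path: str
    num_children: Optional[int] = None  # the field the migration fills in


class ToyNodeDb:
    """Toy stand-in for a node database: metadata records are cheap to read,
    child-pointer lists are expensive, so we tally pointers as they load."""

    def __init__(self) -> None:
        self.records: dict[str, DirRecord] = {}
        self._children: dict[str, list[str]] = {}
        self.pointers_loaded = 0

    def add_dir(self, path: str, children: list[str]) -> None:
        self.records[path] = DirRecord(path)
        self._children[path] = list(children)

    def load_children(self, path: str) -> list[str]:
        # The expensive operation: every child pointer comes into memory.
        kids = self._children[path]
        self.pointers_loaded += len(kids)
        return kids


def migrate_add_child_counts(db: ToyNodeDb) -> None:
    """One-time pass: read each directory's children once and store the count."""
    for path, record in db.records.items():
        if record.num_children is None:  # idempotent: skip migrated records
            record.num_children = len(db.load_children(path))


def child_count(db: ToyNodeDb, path: str) -> int:
    """Return a directory's child count, reading only metadata when possible."""
    record = db.records[path]
    if record.num_children is None:
        # Unmigrated record: fall back to loading every child pointer.
        return len(db.load_children(path))
    return record.num_children


if __name__ == "__main__":
    db = ToyNodeDb()
    db.add_dir("images/train", [f"img_{i}.jpg" for i in range(1000)])
    print(child_count(db, "images/train"), db.pointers_loaded)  # 1000 1000
    migrate_add_child_counts(db)
    db.pointers_loaded = 0
    print(child_count(db, "images/train"), db.pointers_loaded)  # 1000 0
```

Before the migration, asking for the size of a 1,000-entry directory loads all 1,000 pointers; afterwards the same call loads none. The one-time pass pays the full cost once per directory, so that every later request only touches the small metadata record.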

We are working on making data migrations easier in Oxen, but for now, don't forget to install the new version of Oxen and run:

oxen migrate up add_child_counts_to_nodes .

If you want to learn more about the underlying Merkle Tree structure, or specifically how we traverse it to reach file nodes, check out our Merkle Tree 101 and VNodes posts.

To learn more about Merkle Trees in general:

🌲 Merkle Tree 101 | Oxen.ai

To learn about some of the internals of Oxen.ai's Merkle Tree:

🌲 Merkle Tree VNodes | Oxen.ai

Best & Moo,
Oxen.ai Herd 🤝 🐂