# Design of the Versioned HDF5 Library

In a previous post, we introduced the Versioned HDF5 library and described some of its features. In this post, we'll go into detail on how the underlying design of the library works on a technical level.

# Performance of the Versioned HDF5 Library

In several industry and science applications, a filesystem-like storage model such as HDF5 is the more appropriate solution for manipulating large amounts of data. However, suppose that data changes over time. In that case, it's not obvious how to track those different versions, since HDF5 is a binary format and is not well suited for traditional version control systems and tools.

In a previous post, we introduced the Versioned HDF5 library, which implements a mechanism for storing binary data sets in a versioned way that feels natural to users of other version control systems, and described some of its features. In this post, we'll show some of the performance analysis we did while developing the library, hopefully making the case that reading and writing versioned HDF5 files can be done with a nice, intuitive API while being as efficient as possible. The tests presented here show that using the Versioned HDF5 library results in reduced disk space usage, and further reductions in this area can be achieved with the use of HDF5/h5py-provided compression algorithms. That only comes at a cost of <10x file writing speed.

# PyTorch-Ignite: training and evaluating neural networks flexibly and transparently

Authors: Victor Fomin (Quansight), Sylvain Desroziers (IFPEN, France)
This post is a general introduction of PyTorch-Ignite. It intends to give a brief but illustrative overview of what PyTorch-Ignite can offer for Deep Learning enthusiasts, professionals and researchers. Following the same philosophy as PyTorch, PyTorch-Ignite aims to keep it simple, flexible and extensible but performant and scalable.

# Traitlets - an introduction & use in Jupyter configuration management

You have probably seen Traitlets in applications, you likely even use it. The package has nearly 5 million downloads on conda-forge alone.

## But, what is Traitlets ?

In this post we'll answer this question along with where Traitlets came from, its applications, and a bit of history.

# IPython reproducible builds

Starting with IPython 7.16.1 (released in June 2020), you should be able to recreate the sdist (.tar.gz) and wheel (.whl), and get byte for byte identical result to the wheels published on PyPI. This is a critical step toward being able to trust your computing platforms, and a key component to improve efficiency of build and packaging platforms. It also potentially impacts fast conda environment creation for users. The following goes into some reasons for why you should care.

# Introducing Versioned HDF5

The problem of storing and manipulating large amounts of data is a challenge in many scientific computing and industry applications. One of the standard data models for this is HDF5, an open technology that implements a hierarchical structure (similar to a file-system structure) for storing large amounts of possibly heterogeneous data within a single file. Data in an HDF5 file is organized into groups and datasets; you can think about these as the folders and files in your local file system, respectively. You can also optionally store metadata associated with each item in a file, which makes this a self-describing and powerful data storage model.

# Designing with and for developers

Open source is notorious for lack of design presence, enough so that my search to prove this fact has turned up nearly nothing. There’s many ways that such a gap in community might manifest, but one that I never anticipated was working with developers that had never interacted with a designer before.

A quick note for context: I’m writing this as a UX/UI designer working with open source projects for a little over a year. Because there are so many ways design processes can happen (enough to warrant its own blog post), this post is not intended to discuss design process deeply. My goal here is to pass on some of what I’ve learned that helps me design in this unusual space in hopes that it can help someone else. This post might seem most relevant for designers, but I think this experience could be helpful for developers as well.

# Quansight Labs: what I learned in my first 3 months

I joined Quansight at the beginning of April, splitting my time between PyTorch (as part of a larger Quansight team) and contributing to Quansight Labs supported community-driven projects in the Python scientific and data science software stack, primarily to NumPy. I have found my next home; the people, the projects, and the atmosphere are an all around win-win for me and (I hope) for the projects to which I contribute.