Read Our Blog

Welcoming Tania Allard as Quansight Labs co-director

Tania Allard.

Today I'm incredibly excited to welcome Tania Allard to Quansight as Co-Director of Quansight Labs. Tania (GitHub, Twitter, personal site) is a well-known and prolific PyData community member. In the past few years she has been involved as a conference organizer (JupyterCon, SciPy, PyJamas, PyCon UK, PyCon LatAm, JuliaCon and more), as a community builder (PyLadies, NumFOCUS, RForwards), as a contributor to Matplotlib and Jupyter, and as a regular speaker and mentor. She also brings relevant experience in both industry and academia - she joins us from Microsoft where she was a senior developer advocate, and has a PhD in computational modelling.

Read more…

Develop a JupyterLab Winter Theme

JupyterLab 3.0 is about to be released and provides many improvements to the extension system. Theming is a way to extend JupyterLab and benefits from those improvements.

While theming is often disregarded as a purely cosmetic endeavour, it can greatly improve software. Theming can be great help for accessibility, and the Jupyter team pays attention to making the default appearance accessibility-aware by using sufficient contrast. For users with a high visual acuity you may also choose to increase the information density.

Theming can also be a great way to improve communication by increasing or decreasing emphasis of the user interface, which can be of use for teaching or presenting. Theming may also help with security, for example, by having a clear distinction between staging and production.

Finally Theming can be a great way to express oneself, for example, by using a branded version of software that fits well into a context, or expressing one's artistic preferences or opinions.

In the following blog post, we will show you step-by-step how you can develop a custom theme for JupyterLab, distribute it, and take the example of the jupyterlab-theme-winter theme we release today to celebrate the end of 2020.

Read more…

A second CZI grant for NumPy and OpenBLAS

I am happy to announce that NumPy and OpenBLAS have once again been awarded a grant from the Chan Zuckerberg Initiative through Cycle 3 of the Essential Open Source Software for Science (EOSS) program. This new grant totaling $140,000 will fund part of our efforts to improve usability and sustainability in both projects and is excellent news for the scientific computing community, which will certainly benefit from this work downstream.

Read more…

Introduction to Design in Open Source

This blog post is a conversation. Portions lead by Tim George are marked with TG, and those lead by Isabela Presedo-Floyd are marked with IPF.

TG: When I speak with other designers, one common theme I see concerning why they chose this career path is they want to make a difference in the world. We design because we imagine a better world and we want to help make it real. Part of the reason we design as a career is we're unable to go through life without designing; we're always thinking about how things are and how they could be better. This ethos also exists in many open-source communities. It seems like it ought to be an ideal match.

So what's the disconnect? I'm still exploring that myself, but after a few years in open source I want to share my observations, experiences, and hope for a stronger collaboration between design and development. I don't think I have a complete solution, and some days I'm not even sure I grasp the entire problem. What I hope is to say that which often goes unsaid in these spaces: design and development skills in open source coexist precariously.

Read more…

Querying multiple backends with Ibis

In our recent Ibis post, we discussed querying & retrieving data using a familiar Pandas-like interface. That discussion focused on the fluent API that Ibis provides to query structure from a SQLite database—in particular, using a single specific backend. In this post, we'll explore Ibis's ability to answer questions about data using two different Ibis backends.

import ibis.omniscidb, dask, intake, sqlalchemy, pandas, pyarrow as arrow, altair, h5py as hdf5

Ibis in the scientific Python ecosystem

Before we delve into the technical details of using Ibis, we'll consider Ibis in the greater historical context of the scientific Python ecosystem. It was started by Wes McKinney, the creator of Pandas, as way to query information on the Hadoop distributed file system and PySpark. More backends were added later as Ibis became a general tool for data queries.

Throughout the rest of this post, we'll highlight the ability of Ibis to generically prescribe query expressions across different data storage systems.

Read more…

Manylinux1 is obsolete, manylinux2010 is almost EOL, what is next?

The basic installation format for users who install packages via pip is the wheel format. Wheel names are composed of four parts: a package-name-and-version tag (which can be further broken down), a Python tag, an ABI tag, and a platform tag. More information on the tags can be found in PEP 425. So a package like NumPy will be available on PyPI as numpy-1.19.2-cp36-cp36m-win_amd64.whl for 64-bit windows and numpy-1.19.2-cp36-cp36m-macosx_10_9_x86_64.whl for macOS. Note that only the plaform tag win_amd64 or macosx_10_9_x86_64 differs.

But what about Linux? There is no single, vendor controlled, "Linux platform" e.g., Ubuntu, RedHat, Fedora, Debian, FreeBSD all package software at slightly different versions. What most Linux distributions do have in common is the glibc runtime library, and a smattering of various additional system libraries. So it is possible to define a least common denominator (LCD) of software expected to be on a Linux platform (exceptions apply, e.g. non-glibc distributions).

The decision to converge on a LCD common platform gave birth to the manylinux1 standard. Going back to our example, numpy is available as numpy-1.19.2-cp36-cp36m-manylinux1_x86_64.whl.

The first manylinux standard, manylinux1, was based on CentOS5 which has been obsolete since March 2017. The subsequent manylinux2010 standard is based on CentOS6, which will hit end-of-life in December 2020. The manylinux2014 standard still has some breathing room. Based on CentOS7, it will reach end-of-life in July 2024.

So what is next for manylinux, and what manylinux should users and package maintainers use?

Read more…

Performance of the Versioned HDF5 Library

In several industry and science applications, a filesystem-like storage model such as HDF5 is the more appropriate solution for manipulating large amounts of data. However, suppose that data changes over time. In that case, it's not obvious how to track those different versions, since HDF5 is a binary format and is not well suited for traditional version control systems and tools.

In a previous post, we introduced the Versioned HDF5 library, which implements a mechanism for storing binary data sets in a versioned way that feels natural to users of other version control systems, and described some of its features. In this post, we'll show some of the performance analysis we did while developing the library, hopefully making the case that reading and writing versioned HDF5 files can be done with a nice, intuitive API while being as efficient as possible. The tests presented here show that using the Versioned HDF5 library results in reduced disk space usage, and further reductions in this area can be achieved with the use of HDF5/h5py-provided compression algorithms. That only comes at a cost of <10x file writing speed.

Read more…

PyTorch-Ignite: training and evaluating neural networks flexibly and transparently

Authors: Victor Fomin (Quansight), Sylvain Desroziers (IFPEN, France)
This post is a general introduction of PyTorch-Ignite. It intends to give a brief but illustrative overview of what PyTorch-Ignite can offer for Deep Learning enthusiasts, professionals and researchers. Following the same philosophy as PyTorch, PyTorch-Ignite aims to keep it simple, flexible and extensible but performant and scalable.

Read more…