The Labs team shares their news, updates, and knowledge

Posts, articles and tutorials

Complications of having all these resources and methods to manage them.

There are lots of accessibility resources

A recap of the first year of work on enabling support for the free-threaded build of CPython in community packages.

The first year of free-threaded Python

Highlights the work done to improve developer experience at SciPy, specifically on supporting Intel oneAPI/MSVC and spin

Enhancing Developer Experience at SciPy - Intel oneAPI/MSVC Support and Migrating to spin

Yes, list comprehensions can belong in SQL (!)

Mastering DuckDB when you're used to pandas or Polars: part 2

Presenting our 2024 annual report! Read about our open source project and community highlights, initiatives, and work culture.

Quansight Labs Annual Report 2024: Year of focus and execution

Mastering DuckDB when you're used to pandas or Polars: part 1

Analysis of PEP 517 build backends used in 8000 top PyPI packages

PEP 517 build system popularity

In 2022 we were awarded a CZI EOSS grant for conda-forge. The proposal, co-submitted by Quansight Labs and QuantStack, targeted three areas: maintaining and improving conda-forge's infrastructure, creating a new maintainer's dashboard, and implementing OCI-based mirroring for the packages. This work has now concluded and we would like to publish a summary of what we achieved!


Two years of contributions to conda-forge: work done during our CZI EOSS 5 grant

Our work for the napari project resulted in multiple beneficial side effects for the conda packaging ecosystem.

From napari to the world: how we generalized the `conda/constructor` stack for distributing Python applications

The story of the first shared library to make it into the world of low level code that lies beneath SciPy's surface.

libsf_error_state: SciPy's first shared library

Support everything, depend on (almost) nothing

Universal dataframe support with the Arrow PyCapsule Interface + Narwhals

Implementing LAPACK routines for numerical computation in web applications

LAPACK in your web browser: high-performance linear algebra with stdlib

Event resources are here! With October over and the end of another year galloping our way, the Scientific Python Accessibility Events have come and gone in a flurry of discussion, ideas, and, of course, many many questions on how accessibility fits into our work.

Practicing Accessibility: Scientific Python Accessibility Events in Summary

A closer look at non-elementary group-by aggregations

The Polars vs pandas difference nobody is talking about

Lessons learned during my three-month internship in 2024 at Quansight on adding accessibility testing to Bokeh.

Testing Bokeh's Accessibility: A Web Developer's Experience with a Python Data Library

Introducing a new data type for NumPy that provides cross-platform support for quadruple precision.

Numpy QuadDType: Quadruple Precision for Everyone

Increasing ease-of-use of Polars plugins by improving an existing tutorial.

Polars Plugins: let's make them easier to use

My work focused on extending support for COOrdinate sparse arrays in SciPy to n-dimensions.

Multi-dimensional Sparse Arrays in SciPy

Extending SciPy's integration facilities for multidimensional and array-valued integrands.

Multidimensional integration in SciPy

I am happy to announce two upcoming public events focused on helping the scientific Python ecosystem develop their accessibility skills before the new year.

Announcing Scientific Python Accessibility Events

A small showcase of accessibility improvements made to the PyData Sphinx Theme, Fall 2023-Spring 2024

Towards Inclusive Documentation: the PyData Sphinx Theme, Before and After Accessibility Fixes

Making the documentation more user-friendly, and how benchmarks were integrated into PyData/Sparse.

Pydata/Sparse: Maintenance and docs overhaul

Meet the interns joining us for our third-annual summer internship.

Introducing the 2024 Labs Internship Program Cohort

In this blog post, I describe my experience as a first-time contributor to NumPy and talk about the story behind `np.top_k`.

The convoluted story behind `np.top_k`

An overview of the ongoing efforts to improve and roll out support for free-threaded CPython throughout the Python open source ecosystem

Free-threaded CPython is ready to experiment with!

An overview of the different options available for working with sparse arrays in Python

An overview of the Sparse Array Ecosystem for Python

Why you probably shouldn't be using `df.resample('M')`

And how your Python library can become dataframe-agnostic too

How Narwhals and scikit-lego came together to achieve dataframe-agnosticism

An overview of the dataframe landscape, and solution to the "we only support pandas" problem

Dataframe interoperability - what has been achieved, and what comes next?

The journey of writing string ufuncs and creating the np.strings namespace for NumPy 2.0

Writing fast string ufuncs for NumPy 2.0

Presenting our 2023 annual report! Read about our open source project and community highlights, initiatives, and work culture.

Quansight Labs Annual Report 2023: Building for Ecosystem-wide Impact and Sustainability

What are those words on the bottom of your video screen and where do they come from? Captioning’s normalization in the past several decades may seem like it would render those questions moot, but understanding more about captions means making more informed decisions about when, how, and why we make sure information is accessible.

Captioning: A Newcomer’s Guide

A quick overview of the new Numba engine in DataFrame.apply

Unlocking C-level performance in pandas.DataFrame.apply with Numba

We are excited to spread the news about the improvements that have been taking place in CuPy, where 18 interpolation and more than 100 signal processing parallel GPU APIs are now available as part of an EOSS4 CZI grant.

Improving the interpolation and signal processing capabilities of CuPy

Moving SciPy to Meson meant finding a different Fortran compiler on Windows, which was particularly tricky to pull off for conda-forge. This blog tells the story about how things looked pretty grim for the Python 3.12 release, and how things ended up working out just in the nick of time.

The 'eu' in eucatastrophe – Why SciPy builds for Python 3.12 on Windows are a minor miracle


My work was focused on improving NumPy support in Numba, with focus on the polynomial package.

Adding support for polynomials to Numba

A journey through NumPy's Python API from a maintenance perspective.

Refining NumPy's Python API for its 2.0 release

SymPy's documentation has received many significant improvements over the past two years thanks to funding by the Chan Zuckerberg Initiative.

Improving SymPy's Documentation

Doctesting for PyData Libraries

An introduction to the utility of Hypothesis in SymPy

Integrating Hypothesis into SymPy

How can SciPy use the Array API Standard to achieve array library interoperability?

The Array API Standard in SciPy

A summary of my contributions to the Code-Generator project and PyTorch-Ignite ecosystem in the past few months as Quansight Labs intern and my learnings in the process.

Bridging Data Science Tools with PyTorch-Ignite's Code-Generator and Nebari

In this blog post, we share how scikit-learn enabled support for the Array API Standard.

Array API Support in scikit-learn

In this post I'm sharing my experience of traveling to the US for PyCon US 2023

PyCon US 2023 - An action-packed week

In the following blogpost, we will explore the newly added feature in Numba: Dynamic exception support. We will discuss the previous limitations and explain how Numba was enhanced to handle runtime exceptions.

Numba Dynamic Exceptions

Presenting our first annual report! Read about our project achievements, community initiatives, and work culture.

Quansight Labs Annual Report 2022: Celebrating Growth and Sustainability in Open Source

Potential solutions for pain points when dealing with native code; what needs unifying in the Python packaging space, and how should that be approached? 

Python packaging & workflows - where to next?

A blog post about working on the PyTorch-Ignite project during an internship at Quansight

Sangho's Internship at Quansight with PyTorch-Ignite project

Surbhi Sharma shares her exciting experience working as an intern at Quansight Labs and contributing to condacolab, a tool that lets you deploy a Miniconda installation easily on Google Colab notebooks. This enables you to use conda or mamba to install new packages on any Colab session.

Conda on Colaboratory

Kulsoom Zahra learns about accessibility and fixes a part of the JupyterLab interface (that used to break when zoomed in) during her summer 2022 internship at Quansight Labs.

Zoom zoom zoom! Improving Accessibility in JupyterLab

accessible-pygments hosts curated WCAG-compliant themes for all your syntax highlighting needs.

Making pygments accessible

In this blogpost, I share my experience as a Google Season of Docs 2022 technical writer working on updating the Editor user documentation.

The new Spyder Editor documentation under the spotlights!

Learning from awesome mentors and contributing to pandas open source

Close Encounter with pandas and the Jedis of open source

We are delighted to share details about new grants to support the sustainability of SciPy, conda-forge, and CuPy

Quansight Labs awarded three CZI EOSS Cycle 5 Grants

The Nebari CLI consists of various commands the user needs to run to initialize, deploy, configure, and update Nebari.

Developing a Typer CLI for Nebari

Quansight Labs is delighted to welcome its second cohort of 6 interns, who will work on a variety of open source projects and tasks

Introducing the 2022 Interns Cohort

Announcing the SciPy 2022 Accessibility Awareness Efforts

SciPy 2022 Accessibility Awareness Programs

A non-exhaustive but totally honest checklist for accessibility review

Checking for accessibility: thoughts and a checklist!

The development story of a developer command-line interface (CLI) for the SciPy project, with examples

The evolution of the SciPy developer CLI

In our weekly show and tell we got real about "why can writing blog posts be so hard?" and collaboratively wrote up this blog post about what we learned from the discussion.

Why is writing blog posts hard?

How we can use the Python Array API Standard with the fundamental libraries in the PyData ecosystem along with CuPy for making GPUs accessible to the users of these libraries

Making GPUs accessible to the PyData Ecosystem via the Array API Standard.

The Chan Zuckerberg Initiative has funded efforts to make the Jupyter ecosystem, starting with JupyterLab, more accessible. As a part of these increased efforts, the team will be providing a periodically updated list of what is currently being worked on and what is coming soon.

Jupyter accessibility efforts have a roadmap!

Grayskull is an automatic conda recipe generator, with a focus on conda-forge.

Conda and Grayskull, the Masters of Software Packaging

This is a companion post to the official release of IPython 8.0. We hope it will help you apply best practices and have an easier time maintaining your projects or helping others.

IPython 8.0, Lessons learned maintaining software

A lot of us showed up for the code, but hung around for the community. We'll continue this post talking about the monthly Jupyter community calls, and how they help all jovyans, Project Jupyter's pet name for their developers and users, stay connected.

A year of Jupyter community calls

In this post, we aim to articulate that vision and suggest a path to making it concrete, focusing on three libraries at the core of the PyData ecosystem: SciPy, scikit-learn and scikit-image. 

A vision for extensibility to GPU & distributed support for SciPy, scikit-learn, scikit-image and beyond

My work was mainly focused on providing performance benchmarks for NumPy in realistic situations. The target was to show the world that NumPy is efficient in handling quasi real-life situations too.

NumPy Benchmarking

In the next lines, I'll try to capture my experience at Quansight Labs as an intern working on the cuDF implementation of the dataframe interchange protocol. cuDF is a dataframe library very much like pandas which operates on the GPU in order to benefit from its computing power.

Dataframe interchange protocol: cuDF implementation

An exploration of a method to call C++ library functions from Numba-compiled functions, i.e. Python functions that are decorated with `numba.jit(nopython=True)`

An efficient method of calling C++ functions from numba using clang++/ctypes/rbc

Let's dive into the technicalities involved in array libraries interoperability and understand the protocols making this a reality.

Array Libraries Interoperability

In this blog post I talk about the projects and my work during my internship at Quansight Labs. My efforts were geared towards re-engineering CI/CD pipelines for SciPy to make them more efficient to use with GitHub Actions.

Re-Engineering CI/CD pipelines for SciPy

The work I briefly describe in this blog post is the implementation of the dataframe interchange protocol into Vaex which I was working on through the three month period as a Quansight Labs Intern.

Dataframe interchange protocol and Vaex

Low-code contribution workshops offer a way for folks to join a project's contributors list without the technical overhead. In this blog we present a workshop format that relies solely on the GitHub web interface.

Low-code contributions through GitHub

How we ended up with a discussion about where issues with JupyterLab’s built-in code editors fit on the list of immediate accessibility priorities.

Not a checklist: different accessibility needs in JupyterLab

A small team started working on making NumPy more accessible, specifically its website and documentation. They weren’t experts in accessible technology. In fact, they feared that they didn’t know enough. Yet they strongly believed that accessible technology is a right every person should access and that NumPy could be accessible too.

Making Numpy Accessible: Guidelines and Tools

We are thrilled to announce that the team at Quansight Labs has been awarded five EOSS Cycle 4 grants to work on several projects within the PyData ecosystem. This post will introduce the successful grantees and their objectives for these two-year long grants.

CZI EOSS4 Grants at Quansight Labs

Instead of going through all the complications involved in renting or acquiring dedicated hardware, setting up credentials and monitoring costs of a benchmarking suite, we hoped we could use the same free cloud resources normally used for CI tests. Ideally, GitHub Actions.

Is GitHub Actions suitable for running benchmarks?

SciPy now builds with Meson on Linux, and the full test suite passes! This is a pretty exciting milestone, and good news for SciPy maintainers and contributors - they can look forward to much faster builds and a more pleasant development experience.

Moving SciPy to the Meson build system

PyTorch-Ignite is a practical solution, a high-level library from the PyTorch ecosystem for training neural networks. It is designed to simplify workflow development while maintaining maximum control, flexibility, and reproducibility. PyTorch-Ignite feels like a natural extension to PyTorch.

Introducing PyTorch-Ignite's Code Generator v0.2.0

In this blogpost we present pyflyby, a project and an extension to IPython and JupyterLab, that, among many things, automatically inserts imports and tidies Python files and notebooks.

Pyflyby: Improving Efficiency of Jupyter Interactive Sessions

Writing agnostic distributed code that supports different platforms, hardware configurations (GPUs, TPUs) and communication frameworks is tedious. In this blog, we will discuss how PyTorch-Ignite solves this problem with minimal code change.

Distributed Training Made Easy with PyTorch-Ignite

With the support of a team member with prior experience auditing for accessibility, we pinpointed specific ways in which JupyterLab lacked support for accessibility broken up by WCAG 2.1 standards.

Putting out the fire: Where do we start with accessibility in JupyterLab?

Today I want to look into a topic that has not evolved much since, and I believe could use an upgrade: accessing interactive documentation when in a Jupyter session, and what it could become. At the end I'll link to my current prototype if you are adventurous.

Rethinking Jupyter Interactive Documentation

As a community manager in the Spyder team, I have been looking for ways of involving more users in the community and making Spyder useful for a larger number of people. With this, a new idea came: Education. For the past months, we have been wondering with the team whether Spyder could also serve as a teaching-learning platform, especially in this era where remote instruction has become necessary.

A step towards educating with Spyder

For contributors to the PyTorch codebase, one of the most commonly encountered C++ classes is TensorIterator. Recently, however, the interface has changed significantly. This post describes how to use the current interface as of April 2021.

PyTorch TensorIterator Internals - 2021 Update

Starting from Numba 0.53, Numba will ship with an enhanced version of the @guvectorize decorator. Similar to the @vectorize decorator, @guvectorize now has two modes of operation: Eager, or decoration-time compilation, and Lazy, or call-time compilation.

Enhancements to Numba's guvectorize decorator
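
To make the eager versus lazy distinction concrete, here is a minimal pure-Python sketch. It is not Numba's implementation and does not use Numba at all: `toy_jit` and `Dispatcher` are hypothetical names, and "compilation" is simulated by caching the function and logging when that happened.

```python
from typing import Callable, Dict, List, Optional, Tuple


class Dispatcher:
    """Toy stand-in for a JIT dispatcher (hypothetical, not Numba's class).

    It only records *when* a signature gets "compiled".
    """

    def __init__(self, func: Callable, signatures: Optional[List[tuple]] = None):
        self.func = func
        self.compiled: Dict[Tuple[type, ...], Callable] = {}
        self.log: List[tuple] = []
        # Eager mode: signatures are given up front, so compile at decoration time.
        for sig in signatures or []:
            self._compile(tuple(sig), when="decoration")

    def _compile(self, sig: Tuple[type, ...], when: str) -> None:
        # A real JIT would generate machine code here; we just cache the function.
        self.compiled[sig] = self.func
        self.log.append((when, sig))

    def __call__(self, *args):
        sig = tuple(type(a) for a in args)
        if sig not in self.compiled:
            # Lazy mode: the first call with these argument types triggers compilation.
            self._compile(sig, when="call")
        return self.compiled[sig](*args)


def toy_jit(signatures: Optional[List[tuple]] = None):
    """Decorator factory: pass signatures for eager mode, omit them for lazy mode."""
    def decorator(func: Callable) -> Dispatcher:
        return Dispatcher(func, signatures)
    return decorator


@toy_jit(signatures=[(int, int)])
def add_eager(a, b):
    return a + b


@toy_jit()
def add_lazy(a, b):
    return a + b
```

After decoration, `add_eager.log` already holds one entry tagged `"decoration"`, while `add_lazy.log` stays empty until the first call, at which point an entry tagged `"call"` is recorded for the argument types used.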

This blog post contains the raw output of the 30-minute brainstorm (only cleaned up for textual issues) and my annotations on it (in italics) which capture some of the discussion during the session and links and context that may be helpful.

Python packaging in 2021 - pain points and bright spots

In this blog post we will focus specifically on recent improvements that have been made to SciPy's interpolation functions. A recent NumFOCUS small development grant awarded to the SciPy developers allowed a dedicated effort to fix existing bugs in boundary handling and improve the documentation of the behavior of these modes.

Making SciPy's Image Interpolation Consistent and Well Documented

Today I'm incredibly excited to welcome Tania Allard to Quansight as Co-Director of Quansight Labs. Tania is a well-known and prolific PyData community member. In the past few years she has been involved as a conference organizer (JupyterCon, SciPy, PyJamas, PyCon UK, PyCon LatAm, JuliaCon and more), as a community builder (PyLadies, NumFOCUS, RForwards), as a contributor to Matplotlib and Jupyter, and as a regular speaker and mentor. She also brings relevant experience in both industry and academia - she joins us from Microsoft where she was a senior developer advocate, and has a PhD in computational modelling.

Welcoming Tania Allard as Quansight Labs co-director

In the following blog post, we will show you step-by-step how you can develop a custom theme for JupyterLab, distribute it, and take the example of the jupyterlab-theme-winter theme we release today to celebrate the end of 2020.

Develop a JupyterLab Winter Theme

This new grant totaling $140,000 will fund part of our efforts to improve usability and sustainability in both projects and is excellent news for the scientific computing community, which will certainly benefit from this work downstream.

A second CZI grant for NumPy and OpenBLAS

This blog post is a conversation between two contributors, sharing their observations, experiences, and their hope for a stronger collaboration between design and development in open source.

Introduction to Design in Open Source

We'll highlight the ability of Ibis to generically prescribe query expressions across different data storage systems.

Querying multiple backends with Ibis

The basic installation format for users who install packages via pip is the wheel format. But what about Linux? The first manylinux standard, manylinux1, was based on CentOS5 which has been obsolete since March 2017. So what is next for manylinux, and what manylinux should users and package maintainers use?

Manylinux1 is obsolete, manylinux2010 is almost EOL, what is next?

In this post, I will detail some of the open source work that I have done recently, both as part of my open source consulting, and as part of my work on SymPy for Quansight Labs.

Quansight Labs Work Update for September, 2019

In this post, we'll go into detail on how the underlying design of the library works on a technical level. Versioned HDF5 is a library that wraps h5py and offers a versioned abstraction for HDF5 groups and datasets.

Design of the Versioned HDF5 Library

In this post, we'll show some of the performance analysis we did while developing the library, hopefully making the case that reading and writing versioned HDF5 files can be done with a nice, intuitive API while being as efficient as possible. The tests presented here show that using the Versioned HDF5 library results in reduced disk space usage, and further reductions in this area can be achieved with the use of HDF5/h5py-provided compression algorithms.

Performance of the Versioned HDF5 Library

You have probably seen Traitlets in applications, you likely even use it. The package has nearly 5 million downloads on conda-forge alone. But, what is Traitlets? In this post we'll answer this question along with where Traitlets came from, its applications, and a bit of history.

Traitlets - an introduction & use in Jupyter configuration management

Starting with IPython 7.16.1 (released in June 2020), you should be able to recreate the sdist (.tar.gz) and wheel (.whl), and get byte for byte identical result to the wheels published on PyPI. This is a critical step toward being able to trust your computing platforms, and a key component to improve efficiency of build and packaging platforms. It also potentially impacts fast conda environment creation for users. The following goes into some reasons for why you should care.

IPython reproducible builds
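
As a rough illustration of what "byte for byte identical" means in practice, the sketch below compares two artifacts by their SHA-256 digests. It uses only the standard library; the helper names `sha256_of` and `is_byte_identical` are hypothetical, and this is not presented as the procedure the IPython project itself uses.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large sdists or wheels need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_byte_identical(local_artifact, published_artifact):
    """True when both files hash to the same digest, i.e. the rebuild matched."""
    return sha256_of(local_artifact) == sha256_of(published_artifact)
```

A locally rebuilt wheel and the copy downloaded from PyPI could be passed to `is_byte_identical`; any difference in timestamps, file ordering, or compression inside the archive would make the digests differ.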

HDF5 is an open technology that implements a hierarchical structure (similar to a file-system structure) for storing large amounts of possibly heterogeneous data within a single binary file. Because of this, using regular version control tools (such as git) may prove difficult. The Versioned HDF5 library is a versioned abstraction on top of h5py that allows you to keep a record of which changes occurred to your HDF5 files, and enables you to recover previous versions of these files.