Quansight Labs (old posts, page 7)

Dataframe interchange protocol and Vaex

This blog post briefly describes my work implementing the dataframe interchange protocol in Vaex, which I did during a three-month internship at Quansight Labs.

The dataframe protocol will enable data interchange between different dataframe libraries, for example cuDF, Vaex, Koalas, and Pandas. Of these, Vaex is the library for which the implementation of the dataframe protocol was attempted. Vaex is a high-performance Python library for lazy, out-of-core DataFrames.

Connection between dataframe libraries with dataframe protocol

About | What is all that?

Today there are many different dataframe libraries available in Python. There are also many libraries that consume dataframes, for example plotting libraries. In most cases these accept only a Pandas dataframe, so users often have to convert between dataframe types before they can use the functionality of a specific plotting library. It would be extremely cool to be able to use plotting libraries on any kind of dataframe, would it not?
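To make the idea concrete, here is a toy sketch in plain Python. It borrows only the `__dataframe__` entry-point name from the protocol; the classes `ToyInterchange`, `ColumnFrame` and `RowFrame` and the function `column_means` are hypothetical, and columns are handed over as plain lists, whereas the real protocol describes columns through memory buffers. It is not the Vaex implementation, just an illustration of how a consumer can work with any dataframe that exposes the same entry point.

```python
from typing import Any, Dict, List, Sequence


class ToyInterchange:
    """Simplified stand-in for an interchange object (hypothetical)."""

    def __init__(self, columns: Dict[str, Sequence[Any]]) -> None:
        self._columns = {name: list(values) for name, values in columns.items()}

    def column_names(self) -> List[str]:
        return list(self._columns)

    def get_column_by_name(self, name: str) -> List[Any]:
        # Simplification: a plain list instead of a buffer-backed column.
        return self._columns[name]


class ColumnFrame:
    """Hypothetical library A: stores data column by column."""

    def __init__(self, data: Dict[str, Sequence[Any]]) -> None:
        self._data = data

    def __dataframe__(self, allow_copy: bool = True) -> ToyInterchange:
        return ToyInterchange(self._data)


class RowFrame:
    """Hypothetical library B: stores data as a list of row dicts."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self._rows = rows

    def __dataframe__(self, allow_copy: bool = True) -> ToyInterchange:
        names = list(self._rows[0]) if self._rows else []
        return ToyInterchange({n: [row[n] for row in self._rows] for n in names})


def column_means(df: Any) -> Dict[str, float]:
    """Stand-in for a consumer such as a plotting library.

    It never checks which library `df` comes from; it only relies on
    the shared `__dataframe__` entry point.
    """
    exchanged = df.__dataframe__()
    means = {}
    for name in exchanged.column_names():
        values = exchanged.get_column_by_name(name)
        means[name] = sum(values) / len(values)
    return means


frame_a = ColumnFrame({"x": [1, 2, 3], "y": [10, 20, 30]})
frame_b = RowFrame([{"x": 1, "y": 10}, {"x": 2, "y": 20}, {"x": 3, "y": 30}])
means_a = column_means(frame_a)
means_b = column_means(frame_b)
```

Both frames store their data differently, yet the consumer handles them identically, which is the kind of conversion-free workflow the protocol aims for.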

Low-code contributions through GitHub

Isabela Presedo-Floyd Mars Lee Melissa Weber Mendonça Tony Fast

2021-09-28

Healthy, inclusive communities are critical to impactful open source projects. A challenge for established projects is that their history and implicit technical debt raise the barrier to contributing to significant portions of the code base. Literacy in a large code base develops over time through incremental contributions, and we'll discuss a format that can help people begin this journey.

At Quansight Labs, we are motivated to provide opportunities for new contributors to experience open source community work regardless of their software literacy. Community workshops are a common format for onboarding, but sometimes the outcome can be less than satisfactory for participants and organizers. In these workshops, participants have to overcome implicit challenges, like learning Git or setting up a development environment, before they can contribute to a project's revision history.

Our goal with the following low-code workshop is to offer a way for folks to join a project's contributors list without the technical overhead. To achieve this we'll discuss a format that relies solely on the GitHub web interface.

Not a checklist: different accessibility needs in JupyterLab

Isabela Presedo-Floyd

2021-09-14

JupyterLab Accessibility Journey Part 3

In a pandemic, the template joke-starter “x and y walk into a bar” seems like a stretch from my reality. So let’s try this remote version:

Two community members with accessibility knowledge enter a virtual meeting room to talk about JupyterLab. They’ve both updated themselves on GitHub issues ahead of time. They’ve both identified major problems with the interface. They both get ready to express to the rest of the community what is indisputably, one hundred percent for-sure the biggest accessibility blocker in JupyterLab for users. Here it is, the moment of truth!

And they each say totally different things.

Making Numpy Accessible: Guidelines and Tools

Mars Lee

2021-09-06

A large eye is placed over two bar graphs. Two silhouettes of heads are also overlaid on the bar graphs.

Header illustration by author, Mars Lee

Numpy is now foundational to Python scientific computing. Our efforts reach millions of developers each month. As our user base grows, we recognize that we are neglecting the disabled community by not having our website and documentation up to modern accessibility standards.

CZI EOSS4 Grants at Quansight Labs

Thomas Fan Aaron Meurer Melissa Weber Mendonça Tania Allard Matthias Bussonnier Isabela Presedo-Floyd Ralf Gommers Travis Oliphant

2021-08-31

Here, at Quansight Labs, our goal is to work on sustaining the future of Open Source. We make sure we can live up to that goal by spending a significant amount of time working on impactful and critical infrastructure and projects within the Scientific Ecosystem.

As such, our goals align with those of the Chan Zuckerberg Initiative and, in particular, the Essential Open Source Software for Science (EOSS) program that supports tools essential to biomedical research via funds for software maintenance, growth, development, and community engagement.

The Chan Zuckerberg Initiative was founded in 2015 to help solve some of society’s toughest challenges, from eradicating disease and improving education to addressing the needs of our local communities. Their mission is to build a more inclusive, just, and healthy future for everyone.

Today, we are thrilled to announce that the team at Quansight Labs has been awarded five EOSS Cycle 4 grants to work on several projects within the PyData ecosystem. This post will introduce the successful grantees and their objectives for these two-year grants.

Is GitHub Actions suitable for running benchmarks?

Jaime Rodríguez-Guerra

2021-08-18

Benchmarking software is a tricky business. For robust results, you need dedicated hardware that only runs the benchmarking suite under controlled conditions. No other processes! No OS updates! Nothing else! Even then, you might find out that CPU throttling, thermal regulation and other issues can introduce noise in your measurements.

So, how are we even trying to do it on a CI provider like GitHub Actions? Every job runs in a separate VM instance with frequent updates and shared resources. It looks like it would just be a very expensive random number generator.

Well, it turns out that there is a sensible way to do it: relative benchmarking. And we know it works because we have been collecting stability data points for several weeks.
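As a rough illustration of the idea, the sketch below assumes that relative benchmarking means timing the old and the new version of some code back to back on the same machine, in the same job, and reporting only their ratio. The function names, the workloads and the `THRESHOLD` value are hypothetical and chosen for illustration; this is not the setup used in the post.

```python
import statistics
import time
from typing import Callable


def median_seconds(func: Callable[[], object], repeats: int = 7) -> float:
    """Median wall-clock time of `func` over a few repeats."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def relative_ratio(
    baseline: Callable[[], object],
    candidate: Callable[[], object],
    rounds: int = 5,
) -> float:
    """Time both versions in alternating rounds and return candidate / baseline.

    Alternating means that noise from the shared VM tends to hit both
    versions alike, so it partly cancels out in the ratio.
    """
    ratios = []
    for _ in range(rounds):
        base = median_seconds(baseline)
        cand = median_seconds(candidate)
        # Guard against a zero reading from a coarse timer.
        ratios.append(cand / max(base, 1e-9))
    return statistics.median(ratios)


def old_version() -> int:
    # Hypothetical baseline workload.
    return sum(range(10_000))


def new_version() -> int:
    # Hypothetical candidate that does about 50x more work.
    return sum(range(500_000))


THRESHOLD = 1.5  # hypothetical: flag anything more than 1.5x slower
ratio = relative_ratio(old_version, new_version)
regressed = ratio > THRESHOLD
```

The absolute timings will differ from one CI run to the next, but the ratio is the number that gets compared against the threshold.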

Moving SciPy to the Meson build system

Ralf Gommers

2021-07-25

Let's start with an announcement: SciPy now builds with Meson on Linux, and the full test suite passes!

This is a pretty exciting milestone, and good news for SciPy maintainers and contributors: they can look forward to much faster builds and a more pleasant development experience. So how fast is it? Currently the build takes about 1min 50s (a ~4x improvement) on my 3-year-old 12-core Intel CPU (i9-7920X @ 2.90GHz):

Profiling result of a parallel build (12 jobs) of SciPy with Meson. Visualization created with ninjatracing and Perfetto.

As you can see from the tracing results, building a single C++ file (bsr.cxx, which is one of SciPy's sparse matrix formats) takes over 90 seconds. So the 1min 50s build time is close to optimal: the only ways to improve it are major surgery on that C++ code, or buying a faster CPU.

Introducing PyTorch-Ignite's Code Generator v0.2.0

Victor Fomin

2021-07-16

Authors: Jeff Yang, Taras Savchyn, Priyansi, Victor Fomin

Along with the PyTorch-Ignite 0.4.5 release, we are excited to announce the new release of the web application for generating PyTorch-Ignite's training pipelines. This blog post is an overview of the key features and updates of the Code Generator v0.2.0 project release.

Pyflyby: Improving Efficiency of Jupyter Interactive Sessions

Matthias Bussonnier Aaron Meurer

2021-07-07

Few things hinder productivity more than interruption. A notification, random realization, or unrelated error can derail one's train of thought when deep in a complex analysis – a frustrating experience.

In the software development context, forgetting an import statement in an interactive Jupyter session is such an experience. This can be especially frustrating when using typical abbreviations, like np, pd, plt, where the meaning is obvious to the human reader, but not to the computer. The time-to-first-plot and the ability to quickly clean up one's notebook afterward are critical to an enjoyable and efficient workflow.

In this blog post we present pyflyby, a project and an extension to IPython and JupyterLab that, among other things, automatically inserts imports and tidies Python files and notebooks.
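The general idea of inserting missing imports can be sketched with the standard library alone. The snippet below is a toy and not pyflyby's implementation or API: the `KNOWN_IMPORTS` table and the `missing_imports` and `tidy` functions are hypothetical, and the analysis ignores scoping and statement order.

```python
import ast
import builtins
from typing import List

# Hypothetical alias table; a real tool would use a much larger, configurable one.
KNOWN_IMPORTS = {
    "np": "import numpy as np",
    "pd": "import pandas as pd",
    "plt": "import matplotlib.pyplot as plt",
    "math": "import math",
}


def missing_imports(source: str) -> List[str]:
    """Return import lines for known names that are used but never defined."""
    tree = ast.parse(source)
    defined = set(dir(builtins))
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.arg):
            defined.add(node.arg)
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.add(node.id)
    # Sorted so that the result is deterministic.
    return sorted(KNOWN_IMPORTS[name] for name in used - defined if name in KNOWN_IMPORTS)


def tidy(source: str) -> str:
    """Prepend any missing known imports to the source text."""
    lines = missing_imports(source)
    return "\n".join(lines + [source]) if lines else source


cell = "x = np.arange(3)\nplt.plot(x)"
fixed_cell = tidy(cell)
```

Here `fixed_cell` is the original cell with the two import lines placed in front of it; the source is only parsed, never executed, so numpy and matplotlib do not need to be installed to try it.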

Distributed Training Made Easy with PyTorch-Ignite

Victor Fomin

2021-06-28

Authors: François Cokelaer, Priyansi, Sylvain Desroziers, Victor Fomin

Writing agnostic distributed code that supports different platforms, hardware configurations (GPUs, TPUs) and communication frameworks is tedious. In this blog post, we will discuss how PyTorch-Ignite solves this problem with minimal code change.
