Read Our Blog

The evolution of the SciPy developer CLI

🤔 What is a command-line interface (CLI)?

Imagine a massive system with many tools and functionalities, where each one requires a specific command or input from the user. A CLI is designed for exactly this situation. Like a catalog or menu, it lists all the available options, helping the user navigate a complex system.
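To make the "menu" idea concrete, here is a minimal sketch of a CLI with two subcommands, built with Python's standard `argparse` module. The program name `dev`, the `build` and `test` subcommands, and the `describe` helper are all hypothetical names chosen for illustration; this is not the SciPy developer CLI itself.

```python
import argparse


def build_parser():
    # Top-level parser: acts as the "menu" listing available subcommands.
    parser = argparse.ArgumentParser(
        prog="dev", description="Hypothetical developer CLI (illustration only)"
    )
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Each subcommand is one entry in the menu, with its own options.
    build = subparsers.add_parser("build", help="Compile the project")
    build.add_argument("--jobs", type=int, default=1, help="Number of parallel jobs")

    test = subparsers.add_parser("test", help="Run the test suite")
    test.add_argument("--verbose", action="store_true", help="Show detailed output")

    return parser


def describe(argv):
    # Parse a list of command-line tokens and report what would be done.
    args = build_parser().parse_args(argv)
    if args.command == "build":
        return f"building with {args.jobs} job(s)"
    return "testing (verbose)" if args.verbose else "testing"
```

Calling `describe(["build", "--jobs", "4"])` parses the tokens as if they had been typed in a terminal, and `build_parser().print_help()` prints the generated list of subcommands.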

CLI example

Now that we understand what a CLI is, how about we dive into the world of SciPy?

Read more…

Why is writing blog posts hard?

We write code. We write issues. We write documentation. We write notes to ourselves, messages to each other, and guidelines to unite teams across projects.

Day in and day out, our remote work and open source lives are driven by written communication. But blog posts are one kind of writing that eludes our regular practice. In our weekly show and tell, we got real about "why can writing blog posts be so hard?" and collaboratively wrote up this blog post about what we learned from the discussion.

Read more…

Making GPUs accessible to the PyData Ecosystem via the Array API Standard.

GPUs have become an essential part of the scientific computing stack. With advances in GPU technology and the ease of accessing a GPU in the cloud or on-premises, it is in the best interest of the PyData community to spend time and effort on making GPUs accessible to users of PyData libraries. A typical user in the PyData ecosystem is quite familiar with the APIs of libraries like SciPy, scikit-learn, and scikit-image, and at the moment these libraries are largely limited to single-threaded operations on CPU (with exceptions, such as linear algebra functions and the scikit-learn functionality that uses OpenMP under the hood). In this blog post, I will talk about how we can use the Python Array API Standard with the fundamental libraries in the PyData ecosystem, along with CuPy, to make GPUs accessible to the users of these libraries. With the introduction of that standard by the Consortium for Python Data API Standards and its adoption mechanism in NEP 47, it is now possible to write code that is portable between NumPy and other array/tensor libraries that adopt the Array API Standard. We will also discuss the workflow and challenges involved in actually achieving such portability.
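The portability idea can be sketched as follows: a function asks its input array for its namespace through the standard's `__array_namespace__` method, then uses only functions from that namespace. The helper names `get_namespace` and `standardize` are hypothetical, and the example is exercised here with NumPy only. Whether a given version of NumPy or CuPy exposes `__array_namespace__` on its main array type is an assumption that should be checked against the installed release.

```python
import numpy as np


def get_namespace(x):
    # Arrays that implement the Array API Standard expose their namespace.
    if hasattr(x, "__array_namespace__"):
        return x.__array_namespace__()
    # Assumption: anything without the method is treated as a plain NumPy array.
    return np


def standardize(x):
    # Only functions defined by the standard (mean, std) are used, so the same
    # code can run on any array type whose namespace follows the standard.
    xp = get_namespace(x)
    return (x - xp.mean(x)) / xp.std(x)


data = np.asarray([1.0, 2.0, 3.0])
result = standardize(data)
```

Because `standardize` never refers to `np` directly, passing an array from another compliant library would, under the assumption above, keep the computation in that library (and on that library's device).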

Read more…

Jupyter accessibility efforts have a roadmap!

Really? Tell me more.

The Chan Zuckerberg Initiative has funded efforts to make the Jupyter ecosystem, starting with JupyterLab, more accessible (as announced in a prior Jupyter blog post about grants in the ecosystem). You can read the full grant proposal for Jupyter accessibility, the proposal summary, or a GitHub Project list of the grant's milestones to get a sense of the grant's scope.

Read more…

IPython 8.0, Lessons learned maintaining software

This is a companion post to the official release of IPython 8.0 that describes what we learned from this large new major IPython release. We hope it will help you apply best practices and have an easier time maintaining your projects or helping others. We'll focus on the many patterns that made it easier for us to make IPython 8.0 what it is with minimal time involved.

Read more…

A year of Jupyter community calls

A framing for open source is that the software and code are kernels of community. The code, and its abstractions, unite developers and their patrons; a struggle for growing/evolving open communities is to make sure these groups remain connected. A lot of us showed up for the code, but hung around for the community. We'll continue this post talking about the monthly Jupyter community calls, and how they help all jovyans, Project Jupyter's pet name for their developers and users, stay connected.

Read more…

A vision for extensibility to GPU & distributed support for SciPy, scikit-learn, scikit-image and beyond

Over the years, array computing in Python has evolved to support distributed arrays, GPU arrays, and various other kinds of arrays that work with specialized hardware, carry additional metadata, or use different internal memory representations. The foundational library for array computing in the PyData ecosystem is NumPy. But NumPy alone is a CPU-only library - and a single-threaded one at that - and in a world where it's possible to get a GPU or a CPU with a large core count in the cloud cheaply or even for free in a matter of seconds, that may not seem like enough. For the past couple of years, a lot of thought and effort has been spent on devising mechanisms to tackle this problem and to evolve the ecosystem gradually towards a state where PyData libraries can run on a GPU, as well as in distributed mode across multiple GPUs.

We feel like a shared vision has emerged, in bits and pieces. In this post, we aim to articulate that vision and suggest a path to making it concrete, focusing on three libraries at the core of the PyData ecosystem: SciPy, scikit-learn and scikit-image. We are also happy to share that AMD has recognized the value of this vision, and is partnering with Quansight Labs to help make it a reality.

Read more…

NumPy Benchmarking

In this blog post, I'll be talking about my journey at Quansight. I want to share everything I was involved in and accomplished, the issues I faced, and, most importantly, the awesome life hacks I learned during this period.

First of all, I'd like to express my gratitude to the whole team for allowing me to be a part of such a great group. My work was mainly focused on providing performance benchmarks for NumPy in realistic situations. The goal was to show the world that NumPy is efficient in handling quasi-real-life situations too.

The primary technical outcome of my work is available in the NumPy documentation.

A word cloud with themes, open-source projects and people mentioned throughout the blog post. Each is stylized using a different font, most of them calligraphical.

Read more…