Read Our Blog

Designing with and for developers

Open source is notorious for lack of design presence, enough so that my search to prove this fact has turned up nearly nothing. There’s many ways that such a gap in community might manifest, but one that I never anticipated was working with developers that had never interacted with a designer before.

A quick note for context: I’m writing this as a UX/UI designer working with open source projects for a little over a year. Because there are so many ways design processes can happen (enough to warrant its own blog post), this post is not intended to discuss design process deeply. My goal here is to pass on some of what I’ve learned that helps me design in this unusual space in hopes that it can help someone else. This post might seem most relevant for designers, but I think this experience could be helpful for developers as well.

Read more…

Quansight Labs: what I learned in my first 3 months

I joined Quansight at the beginning of April, splitting my time between PyTorch (as part of a larger Quansight team) and contributing to Quansight Labs supported community-driven projects in the Python scientific and data science software stack, primarily to NumPy. I have found my next home; the people, the projects, and the atmosphere are an all around win-win for me and (I hope) for the projects to which I contribute.

Read more…

Learn NixOS by turning a Raspberry Pi into a Wireless Router

I recently moved, and my new place has a relatively small footprint. (Yes, I moved during the COVID-19 pandemic. And yes, it was crazy.) I quickly realized that was going to need a wireless router of some sort, or more formally, a wireless access point (WAP). Using my Ubuntu laptop's "wireless hotspot" capability was a nice temporary solution, but it had a few serious drawbacks.

Read more…

Writing docs is not just writing docs

I joined the Spyder team almost two years ago, and I never thought I was going to end up working on docs. Six months ago I started a project with CAM Gerlach and Carlos Cordoba to improve Spyder’s documentation. At first, I didn’t actually understand how important docs are for software, especially for open source projects. However, during all this time I’ve learned how documentation has a huge impact on the open-source community and I’ve been thankful to have been able to do this. But, from the beginning, I asked myself “why am I the ‘right person’ for this?”

Read more…

Creating a Portable Python Environment from Imports

Python environments provide sandboxes in which packages can be added. Conda helps us deal with the requirements and dependencies of those packages. Occasionally we find ourselves working in a constrained remote machine which can make development challenging. Suppose we wanted to take our exact dev environment on the remote machine and recreate it on our local machine. While conda relieves the package dependency challenge, it can be hard to reproduce the exact same environment.

Read more…

Ibis: an idiomatic flavor of SQL for Python programmers

Ibis is a mature open-source project that has been in development for about 5 years; it currently has about 1350 stars on Github. It provides an interface to SQL for Python programmers and bridges the gap between remote storage & execution systems. These features provide authors the ability to:

  1. write backend-independent SQL expressions in Python);
  2. access different database connections (eg. SQLite, OmniSci, Pandas); and
  3. confirm visually their SQL queries with directed acyclic graphs (DAGs).

Highlights of the Ibis 1.3 release

Ibis 1.3 was just released, after 8 months of development work, with 104 new commits from 16 unique contributors. What is new? In this blog post we will discuss some important features in this new version!

First, if you are new to the Ibis framework world, you can check this blog post I wrote last year, with some introductory information about it.

Some highlighted features of this new version are:

  • Addition of a PySpark backend
  • Improvement of geospatial support
  • Addition of JSON, JSONB and UUID data types
  • Initial support for Python 3.8 added and support for Python 3.5 dropped
  • Added new backends and geospatial methods to the documentation
  • Renamed the mapd backend to omniscidb

Thanking the people behind Spyder 4

After more than three years in development and more than 5000 commits from 60 authors around the world, Spyder 4 finally saw the light on December 5, 2019! I decided to wait until now to write a blogpost about it because shortly after the initial release, we found several critical performance issues and some regressions with respect to Spyder 3, most of which are fixed now in version 4.1.2, released on April 3rd 2020.

Read more…

Introducing ndindex, a Python library for manipulating indices of ndarrays

One of the most important features of NumPy arrays is their indexing semantics. By "indexing" I mean anything that happens inside square brackets, for example, a[4::-1, 0, ..., [0, 1], np.newaxis]. NumPy's index semantics are very expressive and powerful, and this is one of the reasons the library is so popular.

Index objects can be represented and manipulated directly. For example, the above index is (slice(4, None, -1), 0, Ellipsis, [0, 1], None). If you are any author of a library that tries to replicate NumPy array semantics, you will have to work with these objects. However, they are often difficult to work with:

  • The different types that are valid indices for NumPy arrays do not have a uniform API. Most of the types are also standard Python types, such as tuple, list, int, and None, which are usually unrelated to indexing.

  • Those objects that are specific to indexes, such as slice and Ellipsis do not make any assumptions about their underlying semantics. For example, Python lets you create slice(None, None, 0) or slice(0, 0.5) even though a[::0] and a[0:0.5] would be always be an IndexError on a NumPy array.

  • Some index objects, such as slice, list, and ndarray are not hashable.

  • NumPy itself does not offer much in the way of helper functions to work with these objects.

These limitations may be annoying, but are easy enough to live with. The real challenge when working with indices comes when you try to manipulate them. Slices in particular are challenging to work with because the rich meaning of slice semantics. Writing formulas for even very simple things is a real challenge with slices. slice(start, stop, step) (corresponding to a[start:stop:step]) has fundamentally different meaning depending on whether start,stop, or step are negative, nonnegative, or None. As an example, take a[4:-2:-2], where a is a one-dimensional array. This slices every other element from the third element to the second from the last. What will the shape of this sliced array be? The answer is (0,) if the original shape is less than 1 or greater than 5, and (1,) otherwise.

Code that manipulates slices will tend to have a lot of if/else chains for these different cases. And due to 0-based indexing, half-open semantics, wraparound behavior, clipping, and step logic, the formulas are often quite difficult to write down.

Read more…

PyTorch TensorIterator Internals

PyTorch is one of the leading frameworks for deep learning. Its core data structure is Tensor, a multi-dimensional array implementation with many advanced features like auto-differentiation. PyTorch is a massive codebase (approx. a million lines of C++, Python and CUDA code), and having a method for iterating over tensors in a very efficient manner that is independent of data type, dimension, striding and hardware is a critical feature that can lead to a very massive simplification of the codebase and make distributed development much faster and smoother. The TensorIterator C++ class within PyTorch is a complex yet useful class that is used for iterating over the elements of a tensor over any dimension and implicitly parallelizing various operations in a device independent manner.

It does this through a C++ API that is independent of type and device of the tensor, freeing the programmer of having to worry about the datatype or device when writing iteration logic for PyTorch tensors. For those coming from the NumPy universe, NpyIter is a close cousin of TensorIterator.

This post is a deep dive into how TensorIterator works, and is an essential part of learning to contribute to the PyTorch codebase since iterations over tensors in the C++ codebase are extremely commonplace. This post is aimed at someone who wants to contribute to PyTorch, and you should at least be familiar with some of the basic terminologies of the PyTorch codebase that can be found in Edward Yang's excellent blog post on PyTorch internals. Although TensorIterator can be used for both CPUs and accelerators, this post has been written keeping in mind usage on the CPU. Although there can be some dissimilarities between the two, the overall concepts are the same.

Read more…