# Read Our Blog

## Introducing ndindex, a Python library for manipulating indices of ndarrays

One of the most important features of NumPy arrays is their indexing semantics. By "indexing" I mean anything that happens inside square brackets, for example, `a[4::-1, 0, ..., [0, 1], np.newaxis]`. NumPy's index semantics are very expressive and powerful, and this is one of the reasons the library is so popular.

Index objects can be represented and manipulated directly. For example, the above index is `(slice(4, None, -1), 0, Ellipsis, [0, 1], None)`. If you are the author of a library that tries to replicate NumPy array semantics, you will have to work with these objects. However, they are often difficult to work with:

• The different types that are valid indices for NumPy arrays do not have a uniform API. Most of the types are also standard Python types, such as `tuple`, `list`, `int`, and `None`, which are usually unrelated to indexing.

• Objects that are specific to indexing, such as `slice` and `Ellipsis`, do not make any assumptions about their underlying semantics. For example, Python lets you create `slice(None, None, 0)` or `slice(0, 0.5)` even though `a[::0]` and `a[0:0.5]` would always raise an error on a NumPy array.

• Some index objects, such as `slice` (prior to Python 3.12), `list`, and `ndarray`, are not hashable.

• NumPy itself does not offer much in the way of helper functions to work with these objects.
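These points are easy to see from plain Python, without NumPy (`np.newaxis` is just an alias for `None`). The sketch below uses a small hypothetical helper class, `ShowIndex`, whose `__getitem__` simply returns whatever index object Python builds from the bracket syntax.

```python
class ShowIndex:
    """Hypothetical helper: return the raw index object Python builds for obj[...]."""

    def __getitem__(self, index):
        return index


show = ShowIndex()
index = show[4::-1, 0, ..., [0, 1], None]  # None is what np.newaxis refers to
print(index)  # (slice(4, None, -1), 0, Ellipsis, [0, 1], None)

# Python happily builds slices that NumPy would reject when used for indexing.
bad_step = slice(None, None, 0)
float_stop = slice(0, 0.5)

# The tuple contains a list, so it cannot be hashed (e.g. used as a dict key).
try:
    hash(index)
    hashable = True
except TypeError:
    hashable = False
print(hashable)  # False
```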

These limitations may be annoying, but they are easy enough to live with. The real challenge comes when you try to manipulate indices. Slices in particular are difficult because of the rich meaning of slice semantics: writing formulas for even very simple things is hard. `slice(start, stop, step)` (corresponding to `a[start:stop:step]`) has fundamentally different meanings depending on whether `start`, `stop`, or `step` are negative, nonnegative, or `None`. As an example, take `a[4:-2:-2]`, where `a` is a one-dimensional array. This slices every other element, stepping backwards from the fifth element (index 4) and stopping before the second-to-last element. What will the shape of this sliced array be? The answer is `(0,)` if the length of `a` is less than 1 or greater than 5, and `(1,)` otherwise.
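Because NumPy's basic slicing along a single axis follows the same rules as Python sequences, the claimed shapes can be checked with plain lists. This is a quick empirical check, not a derivation:

```python
# Length of a[4:-2:-2] for one-dimensional inputs of length 0 through 9.
lengths = {n: len(list(range(n))[4:-2:-2]) for n in range(10)}
for n, length in lengths.items():
    print(f"len(a) == {n}: a[4:-2:-2] has shape ({length},)")
```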

Code that manipulates slices tends to have a lot of `if`/`else` chains for these different cases. And because of 0-based indexing, half-open semantics, wraparound behavior, clipping, and step logic, the formulas themselves are often quite difficult to write down.
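The case analysis can be made concrete with a hand-written sketch of the length computation for one axis of known length `n`. The name `slice_length` is hypothetical; the function mirrors the normalization Python itself performs (defaults, wraparound, clipping, step direction), so it can be cross-checked against the built-in `slice.indices()`.

```python
def slice_length(start, stop, step, n):
    """Number of elements a[start:stop:step] selects on an axis of length n."""
    if step is None:
        step = 1
    if step == 0:
        raise ValueError("slice step cannot be zero")

    def normalize(i, default):
        if i is None:
            return default
        if i < 0:
            i += n  # wraparound: negative indices count from the end
            if i < 0:
                # clip below the start of the axis
                i = -1 if step < 0 else 0
        elif i >= n:
            # clip past the end of the axis
            i = n - 1 if step < 0 else n
        return i

    if step > 0:
        start, stop = normalize(start, 0), normalize(stop, n)
        # ceil((stop - start) / step), or 0 if the range is empty
        return max(0, (stop - start + step - 1) // step)
    # Negative step: defaults run from the last element to "before index 0".
    start, stop = normalize(start, n - 1), normalize(stop, -1)
    return max(0, (start - stop - step - 1) // -step)


print([slice_length(4, -2, -2, n) for n in range(8)])  # [0, 1, 1, 1, 1, 1, 0, 0]
```

Every branch depends on `n`, which is why the shape of `a[4:-2:-2]` has a piecewise answer when the length of `a` is not fixed.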

## PyTorch TensorIterator Internals

The history section of this post is still relevant, but `TensorIterator`'s interface has changed significantly. For an update on the new API, please check out this new blog post.

PyTorch is one of the leading frameworks for deep learning. Its core data structure is `Tensor`, a multi-dimensional array implementation with many advanced features like automatic differentiation. PyTorch is a massive codebase (approximately a million lines of C++, Python, and CUDA code). A way to iterate over tensors efficiently, independent of data type, dimension, striding, and hardware, is therefore a critical feature: it can greatly simplify the codebase and make distributed development faster and smoother. The `TensorIterator` C++ class within PyTorch is a complex yet useful class that iterates over the elements of a tensor along any dimension and implicitly parallelizes various operations in a device-independent manner.

It does this through a C++ API that is independent of the type and device of the tensor, freeing the programmer from having to worry about the data type or device when writing iteration logic for PyTorch tensors. For those coming from the NumPy universe, `NpyIter` is a close cousin of `TensorIterator`.
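To give a feel for the bookkeeping that an iterator like `TensorIterator` hides, here is a toy Python sketch. It is not PyTorch code and ignores data types, devices, and parallelism: it shows a single loop that walks any number of dimensions with arbitrary strides, and an elementwise operation built on top of it. The function names `strided_offsets` and `elementwise` are hypothetical.

```python
import operator
from itertools import product


def strided_offsets(shape, strides):
    """Yield the flat offset (in elements) of every element, in row-major order.

    The same loop handles any number of dimensions and any strides.
    """
    for index in product(*(range(n) for n in shape)):
        yield sum(i * s for i, s in zip(index, strides))


def elementwise(fn, out, out_strides, a, a_strides, b, b_strides, shape):
    """Apply fn to two strided inputs and write into a strided output buffer."""
    offsets = zip(
        strided_offsets(shape, out_strides),
        strided_offsets(shape, a_strides),
        strided_offsets(shape, b_strides),
    )
    for o, x, y in offsets:
        out[o] = fn(a[x], b[y])


# `a` is a contiguous 2x3 array; `b` is a transposed view of a 3x2 buffer.
a = [0, 1, 2, 3, 4, 5]        # logical [[0, 1, 2], [3, 4, 5]], strides (3, 1)
b = [10, 20, 30, 40, 50, 60]  # logical [[10, 30, 50], [20, 40, 60]], strides (1, 2)
out = [0] * 6
elementwise(operator.add, out, (3, 1), a, (3, 1), b, (1, 2), (2, 3))
print(out)  # [10, 31, 52, 23, 44, 65]
```

In this model, broadcasting can be expressed by giving an input a stride of 0 along the broadcast dimension, so the kernel function `fn` never needs to know about layout.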

This post is a deep dive into how `TensorIterator` works. Understanding it is an essential part of learning to contribute to the PyTorch codebase, since iteration over tensors is extremely common in the C++ code. The post is aimed at someone who wants to contribute to PyTorch, and you should be familiar with at least some of the basic terminology of the PyTorch codebase, which can be found in Edward Yang's excellent blog post on PyTorch internals. Although `TensorIterator` can be used on both CPUs and accelerators, this post focuses on usage on the CPU. There are some differences between the two, but the overall concepts are the same.

## Documentation as a way to build Community

As a long-time user of and participant in open-source communities, I've always known that documentation is far from a solved problem. At least, that's the impression we get from many developers: "writing docs is boring"; "it's a chore, nobody likes to do it". I have come to realize I'm one of those rare people who like to write both code and documentation.

Nobody will argue against documentation. For an open-source software project, documentation is clearly the public face of the project. The docs influence how people interact with the software and with the community. They set the tone for inclusiveness, how people communicate, and what users and contributors can do. Looking at the results of a “NumPy Tutorial” search on any search engine also gives an idea of the demand for this kind of content: it is even possible to find documentation about how to read the NumPy documentation!

I started working at Quansight in January, and I have been doing work related to the NumPy CZI Grant. As a former mathematics professor, I found this an interesting project, both because of its potential impact on the NumPy (and larger) community and because of its relevance to me, as I love writing educational material and documentation. Having official high-level documentation written using up-to-date content and techniques should bring more users (and developers/contributors) into the NumPy community.

So, if everybody agrees on its importance, why is it so hard to write good documentation?

## uarray: GSoC Participation

I'm pleased to announce that `uarray` is participating in GSoC '20 as a sub-organization under the umbrella of the Python Software Foundation. Our ideas page is up here; take a look and see if you (or someone you know) are interested in participating, either as a student or as a mentor.

Prasun Anand, Peter Bell, and I will be mentoring, and we plan to take a maximum of two students unless more community mentors show up.

Prospective students have already opened quite a few pull requests to qualify, some even going as far as to begin the work described in the idea they plan to work on.

We're quite excited by the number of students who have shown an interest in participating, and we look forward to seeing excellent applications! What's more exciting, though, are some of the first contributions from people not currently at Quansight, in the true spirit of open-source software!

## What have we been doing so far? 🤔

### Research 📚

A lot of behind-the-scenes work has been taking place on PyData/Sparse: not so much in terms of code, but in terms of research and community/team building. I've more or less decided to use the structure and the research behind the Tensor Algebra Compiler, the work of Fredrik Kjolstad and his collaborators at MIT. 🙇🏻‍♂️ To this end, I've read/watched the following talks and papers:

## My Unexpected Dive into Open-Source Python

Header illustration by author, Mars Lee

I'm very happy to announce that I have joined Quansight as a front-end developer and designer! How I joined was a happy coincidence: the intersection of my skills and the open-source community's expanded vision.

## Creating the ultimate terminal experience in Spyder 4 with Spyder-Terminal

The Spyder-Terminal project is revitalized! The new 0.3.0 version adds numerous features that improve the user experience and enhance compatibility with the latest Spyder 4 release, thanks in part to improvements made in the xterm.js project.

## metadsl PyData talk

PyData NYC just ended, and I thought it would be good to collect my thoughts on `metadsl` based on the many conversations I had there about it. This is a rather long post, so if you are just looking for some code, here is a Binder link for my talk. Also, here is the talk I gave on the same topic in Austin a month or so later:

### What is `metadsl`?

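```python
from __future__ import annotations

import metadsl


class Number(metadsl.Expression):
    def __add__(self, other: Number) -> Number:
        ...

    @classmethod
    def from_int(cls, i: int) -> Number:
        ...


# The excerpt omits the enclosing rule; `add_zero` is a hypothetical name.
# Each yield reads as a rewrite: 0 + y -> y and y + 0 -> y.
def add_zero(y: Number):
    yield Number.from_int(0) + y, y
    yield y + Number.from_int(0), y
```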
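metadsl's own rule machinery is not shown in the excerpt, so here is a minimal, library-free sketch of the same idea: a tiny expression tree plus a bottom-up pass that applies the two rewrites from the snippet above (0 + y → y and y + 0 → y). The classes `Int` and `Add` and the function `simplify_add_zero` are hypothetical illustrations, not metadsl's API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Int:
    value: int


@dataclass(frozen=True)
class Add:
    left: Expr
    right: Expr


Expr = Union[Int, Add, str]  # a str stands for a symbolic variable such as "y"


def simplify_add_zero(expr: Expr) -> Expr:
    """Rewrite children first, then apply 0 + y -> y and y + 0 -> y."""
    if isinstance(expr, Add):
        left = simplify_add_zero(expr.left)
        right = simplify_add_zero(expr.right)
        if left == Int(0):
            return right
        if right == Int(0):
            return left
        return Add(left, right)
    return expr


print(simplify_add_zero(Add(Int(0), Add("y", Int(0)))))  # y
```

Keeping the expression tree separate from the rewrite rules is what lets new simplifications be added without touching the classes themselves.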

## Variable Explorer improvements in Spyder 4

Spyder 4 will be released very soon with lots of interesting new features that you'll want to check out, reflecting years of effort by the team to improve the user experience. In this post, we will be talking about the improvements made to the Variable Explorer.

These include the brand-new Object Explorer for inspecting arbitrary Python variables, full support for MultiIndex dataframes with multiple dimensions, the ability to filter and search for variables by name and type, and much more.

Several of these improvements were made possible by integrating the work of two other projects: code from gtabview was used to implement the multi-dimensional pandas indexes, while objbrowser was the foundation of the new Object Explorer.