I joined the Spyder team almost two years ago, and I never thought I was going to end up working on docs. Six months ago I started a project with CAM Gerlach and Carlos Cordoba to improve Spyder’s documentation. At first, I didn’t actually understand how important docs are for software, especially for open source projects. However, during all this time I’ve learned how documentation has a huge impact on the open-source community and I’ve been thankful to have been able to do this. But, from the beginning, I asked myself “why am I the ‘right person’ for this?”
Python environments provide sandboxes in which packages can be added. Conda helps us deal with the requirements and dependencies of those packages. Occasionally we find ourselves working on a constrained remote machine, which can make development challenging. Suppose we wanted to take our exact dev environment on the remote machine and recreate it on our local machine. While conda relieves the package dependency challenge, it can be hard to reproduce the exact same environment.
Ibis is a mature open-source project that has been in development for about 5 years; it currently has about 1350 stars on GitHub. It provides an interface to SQL for Python programmers and bridges the gap between remote storage and execution systems. These features provide authors the ability to:
Ibis 1.3 was just released, after 8 months of development work, with 104 new commits from 16 unique contributors. What is new? In this blog post we will discuss some important features in this new version!
First, if you are new to the Ibis framework world, you can check this blog post I wrote last year, with some introductory information about it.
Some highlighted features of this new version are:
- Addition of a
- Improvement of geospatial support
- Addition of
- Initial support for Python 3.8 added and support for
- Added new backends and geospatial methods to the documentation
- Renamed the
After more than three years in development and more than 5000 commits from 60 authors around the world, Spyder 4 finally saw the light on December 5, 2019! I decided to wait until now to write a blog post about it because, shortly after the initial release, we found several critical performance issues and some regressions with respect to Spyder 3. Most of these are now fixed in version 4.1.2, released on April 3, 2020.
One of the most important features of NumPy arrays is their indexing
semantics. By "indexing" I mean anything that happens inside square brackets,
such as a[4::-1, 0, ..., [0, 1], np.newaxis]. NumPy's index semantics
are very expressive and powerful, and this is one of the reasons the library
is so popular.
Index objects can be represented and manipulated directly. For example, the
above index is
(slice(4, None, -1), 0, Ellipsis, [0, 1], None). If you are
the author of a library that tries to replicate NumPy array semantics, you
will have to work with these objects. However, they are often difficult to
work with:
- The different types that are valid indices for NumPy arrays do not have a uniform API. Most of the types are also standard Python types, such as None, which are usually unrelated to indexing.
- Those objects that are specific to indexes, such as Ellipsis, do not make any assumptions about their underlying semantics. For example, Python lets you create slice(None, None, 0) or slice(0, 0.5) even though a[0:0.5] would always raise an error on a NumPy array.
- Some index objects, such as ndarray, are not hashable.
- NumPy itself does not offer much in the way of helper functions to work with these objects.
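The points above can be seen in a small sketch. It uses only the standard library: `IndexCapture`, `try_index`, and `is_hashable` are hypothetical helpers introduced here for illustration, and a plain Python list stands in for an array, so the exact exception types are those of built-in lists and are not a statement about NumPy's behavior.

```python
class IndexCapture:
    """Hypothetical helper: hands back whatever Python passes to __getitem__."""

    def __getitem__(self, index):
        return index


capture = IndexCapture()

# np.newaxis is an alias for None, so None is used to avoid importing NumPy.
index = capture[4::-1, 0, ..., [0, 1], None]
print(index)

# Python performs no validation when a slice object is created.
zero_step = slice(None, None, 0)
float_stop = slice(0, 0.5)


def try_index(seq, idx):
    """Return seq[idx], or the name of the exception raised while indexing."""
    try:
        return seq[idx]
    except (TypeError, ValueError, IndexError) as exc:
        return type(exc).__name__


data = list(range(10))
zero_step_result = try_index(data, zero_step)
float_stop_result = try_index(data, float_stop)
print(zero_step_result, float_stop_result)


def is_hashable(obj):
    """Report whether hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# A list used as an index cannot serve as a dictionary key or set member.
list_index_hashable = is_hashable([0, 1])
print(list_index_hashable)
```

The invalid slices are only rejected at the moment they are used, which is why code that stores or passes index objects around cannot rely on them being meaningful.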
These limitations may be annoying, but are easy enough to live with. The real
challenge when working with indices comes when you try to manipulate them.
Slices in particular are challenging to work with because of the rich meaning of
slice semantics. Writing formulas for even very simple things is a real
challenge with slices.
slice(start, stop, step) (corresponding to
a[start:stop:step]) has a fundamentally different meaning depending on whether
start, stop, and step are negative, nonnegative, or
None. As an example,
a is a one-dimensional array. This slices every
other element from the third element to the second from the last. What will
the shape of this sliced array be? The answer is
(0,) if the original shape
is less than 1 or greater than 5, and
Code that manipulates slices will tend to have a lot of
else chains for
these different cases. And due to 0-based indexing, half-open semantics,
wraparound behavior, clipping, and step logic, the formulas are often quite
difficult to write down.
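To make the case analysis concrete, here is a minimal sketch that lets Python do the work through the built-in `slice.indices` method. The helper `sliced_length` and the slice `slice(4, -2, -2)` are choices made here for illustration; a list is used in place of a one-dimensional array, which follows the same slicing rules for plain slices.

```python
def sliced_length(s, n):
    """Length of seq[s] for a sequence of length n (hypothetical helper).

    slice.indices(n) resolves None, negative values, and clipping into
    concrete (start, stop, step) integers for a sequence of length n.
    """
    start, stop, step = s.indices(n)
    return len(range(start, stop, step))


# An illustrative slice mixing a positive start, a negative stop,
# and a negative step.
example = slice(4, -2, -2)

for n in range(8):
    # Cross-check the formula against actually slicing a list of length n.
    assert sliced_length(example, n) == len(list(range(n))[example])
    print(n, (sliced_length(example, n),))

lengths = [sliced_length(example, n) for n in range(8)]
resolved = example.indices(5)
print(resolved)
```

For this particular slice the result is empty for length 0 and for every length above 5, and has one element otherwise. Writing that rule down directly, without `slice.indices`, requires handling each sign combination of start, stop, and step separately.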
PyTorch is one of the leading frameworks for deep learning. Its core data
structure is Tensor, a multi-dimensional array implementation with many
advanced features like auto-differentiation. PyTorch is a massive
codebase (approx. a million lines of
C++, Python and CUDA code), and having a method for iterating over tensors in a
very efficient manner that is independent of data type, dimension, striding and
hardware is a critical feature that can lead to a massive simplification
of the codebase and make distributed development much faster and smoother. The
TensorIterator C++ class within PyTorch is a complex yet useful class that is used for
iterating over the elements of a tensor over any dimension and implicitly
parallelizing various operations in a device-independent manner.
It does this through a C++ API that is independent of type and device of the
tensor, freeing the programmer from having to worry about the datatype or device
when writing iteration logic for PyTorch tensors. For those coming from
NumPy, NpyIter is a close cousin of TensorIterator.
This post is a deep dive into how
TensorIterator works, and is an essential
part of learning to contribute to the PyTorch codebase since iterations over
tensors in the C++ codebase are extremely commonplace. This post is aimed at
someone who wants to contribute to PyTorch, and you should at least be familiar
with some of the basic terminology of the PyTorch codebase that can be found
in Edward Yang's excellent blog post
on PyTorch internals. Although
TensorIterator can be used for both CPUs and
accelerators, this post has been written keeping in mind usage on the CPU.
Although there can be some dissimilarities between the two, the overall
concepts are the same.
As a longtime user of and participant in open source communities, I've always known that documentation is far from being a solved problem. At least, that's the impression we get from many developers: "writing docs is boring"; "it's a chore, nobody likes to do it". I have come to realize I'm one of those rare people who likes to write both code and documentation.
Nobody will argue against documentation. It is clear that for an open-source software project, documentation is the public face of the project. The docs influence how people interact with the software and with the community. They set the tone for inclusiveness, how people communicate, and what users and contributors can do. Looking at the results of a “NumPy Tutorial” search on any search engine also gives an idea of the demand for this kind of content: it is possible to find documentation about how to read the NumPy documentation!
I started working at Quansight in January, and I have begun doing work related to the NumPy CZI Grant. As a former professor in mathematics, this seemed like an interesting project both because of its potential impact on the NumPy (and larger) community and because of its relevance to me, as I love writing educational material and documentation. Having official high-level documentation written using up-to-date content and techniques will certainly mean more users (and developers/contributors) are involved in the NumPy community.
So, if everybody agrees on its importance, why is it so hard to write good documentation?
I'm pleased to announce that
uarray is participating in GSoC '20 as a sub-organization under the umbrella of the Python Software Foundation. Our ideas page is up here. Go take a look and see if you (or someone you know) are interested in participating, either as a student or as a mentor.
Prasun Anand, Peter Bell, and I will be mentoring, and we plan to take a maximum of two students, unless more community mentors show up.
We're quite excited by the number of students who have shown an interest in participating, and we look forward to seeing excellent applications! What's more exciting, though, are some of the first contributions from people not currently at Quansight, in the true spirit of open-source software!
What have we been doing so far? 🤔
A lot of behind-the-scenes work has been taking place on PyData/Sparse. Not so much in terms of code, more in terms of research and community/team building. I've more or less decided to use the structure and the research behind the Tensor Algebra Compiler, the work of Fredrik Kjolstad and his collaborators at MIT. 🙇🏻‍♂️ To this end, I've read/watched the following talks and papers: