Labs update and April highlights

It has been an exciting first month for me at Quansight Labs. It's a good time for a summary of what we worked on in April and what is coming next.

Progress on array computing libraries

I'd call our first bucket of activities "innovation". The most prominent projects in this bucket are XND, uarray, metadsl, python-moa, Remote Backend Compiler and arrayviews. XND is an umbrella name for a set of related array computing libraries: xnd, ndtypes, gumath, and xndtools.

Hameer Abbasi made some major steps forward with uarray: the backend and coercion semantics are now largely worked out, there is good documentation, and the unumpy package (which currently has numpy, XND and PyTorch backends) is progressing well. This blog post gives a good overview of the motivation for uarray and its main concepts.

Saul Shanabrook and Chris Ostrouchov worked out how best to put metadsl and python-moa together: metadsl can be used to create the API for python-moa, which greatly simplifies the latter's code base. Chris also wrote an interesting blog post explaining the MoA principles.

Read more…

What's New in SymPy 1.4

Since November 2018, I have been working at Quansight, under the heading of Quansight Labs. Quansight Labs is a public-benefit division of Quansight. It provides a home for a "PyData Core Team" which consists of developers, community managers, designers, and documentation writers who build open-source technology and grow open-source communities around all aspects of the AI and Data Science workflow. As a part of this, I am able to spend a fraction of my time working on SymPy. SymPy, for those who do not know, is a symbolic mathematics library written in pure Python. I am the lead maintainer of SymPy.

SymPy 1.4 was released on April 9, 2019. In this post, I'd like to go over some of the highlights of this release. The full release notes can be found on the SymPy wiki.

To update to SymPy 1.4, use

conda install sympy

or if you prefer to use pip

pip install -U sympy
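
After either command, it can be worth confirming which version actually ended up in the active environment. The following is a minimal sketch using only the standard library (Python 3.8 or later); the helper name `installed_version` is made up for this illustration and is not part of SymPy, conda, or pip.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent.

    `installed_version` is a hypothetical helper written for this example.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version found in the current environment, or None if SymPy
# is not installed there.
print("sympy:", installed_version("sympy"))
```

If the printed version is older than expected, the command was probably run against a different environment than the one Python is using.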

The SymPy 1.4 release contains over 500 changes across 38 different submodules, so I will not go over every change, only a few of the main highlights. A total of 104 people contributed to this release, 66 of them for the first time.

While I did not personally work on any of the changes listed below (my work for this release tended to be more invisible, behind-the-scenes fixes), I did do the release itself.

Read more…

uarray: A Generic Override Framework for Methods

uarray is an override framework for methods in Python. In the scientific Python ecosystem, and in other similar places, there has been one recurring problem: similar tools for the same job exist, but they don't conform to a single, well-defined API. uarray tries to solve this problem in general, but also for the scientific Python ecosystem in particular, by defining APIs independent of their implementations.

Array Libraries in the Scientific Python Ecosystem

When SciPy was created, and Numeric and Numarray unified into NumPy, it jump-started Python's data science community. The ecosystem grew quickly: Academics started moving to SciPy, and the Scikits that popped up made the transition all the more smooth.

However, the scientific Python community also shifted during that time: GPUs and distributed computing emerged. Also, there were old ideas that couldn't really be used with NumPy's API, such as sparse arrays. To solve these problems, various libraries emerged:

  • Dask, for distributed NumPy
  • CuPy, for NumPy on Nvidia-branded GPUs
  • PyData/Sparse, a project started to make sparse arrays conform to the NumPy API
  • Xnd, which extends the type system and the universal function concept found in NumPy

Read more…

MOA: a theory for composable and verifiable tensor computations

Python-moa (mathematics of arrays) is an approach to a high-level tensor compiler that is based on the work of Lenore Mullin and her dissertation. A high-level compiler is necessary because there are many optimizations that a low-level compiler such as gcc will miss. It is trying to solve many of the same problems as other technologies such as the taco compiler and the xla compiler. However, it takes a very different approach from the others, guided by the following principles:

  1. What is the shape? Everything has a shape: scalars, vectors, arrays, operations, and functions.
  2. What are the given indices and operations required to produce a given index in the result?
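
The two questions above can be illustrated with a small hand-worked case: the transpose of a sum of two matrices. This is a sketch in plain Python with nested lists, not python-moa's API; the function names are invented for this example, and the inputs are assumed to be non-empty rectangular arrays. The point is that the result's shape and each result element can be stated directly in terms of the inputs, without building the intermediate sum.

```python
def shape(a):
    """Shape of a non-empty, rectangular nested-list array, as a tuple."""
    dims = []
    while isinstance(a, list):
        dims.append(len(a))
        a = a[0]
    return tuple(dims)


def transpose_of_sum_shape(shape_a, shape_b):
    """Question 1: the shape of transpose(a + b), from the input shapes alone."""
    if shape_a != shape_b:
        raise ValueError(f"shape mismatch: {shape_a} vs {shape_b}")
    rows, cols = shape_a
    return (cols, rows)


def transpose_of_sum_at(a, b, i, j):
    """Question 2: result[i][j] needs only a[j][i] and b[j][i]."""
    return a[j][i] + b[j][i]


a = [[1, 2, 3], [4, 5, 6]]
b = [[10, 20, 30], [40, 50, 60]]

n_rows, n_cols = transpose_of_sum_shape(shape(a), shape(b))
result = [
    [transpose_of_sum_at(a, b, i, j) for j in range(n_cols)]
    for i in range(n_rows)
]
print(result)  # [[11, 44], [22, 55], [33, 66]]
```

Because each output element is expressed through input indices only, the intermediate array `a + b` never has to be materialized.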

Having a compiler that is guided by these principles allows for high-level reductions that other compilers will miss and allows for optimization of algorithms as a whole. Keep in mind that MOA is NOT a compiler. It is a theory that guides compiler development. Since python-moa is based on this theory, we get unique properties that other compilers cannot guarantee:

Read more…

Thoughts on joining Quansight Labs

In his blog post welcoming me, Travis set out his vision for pushing forward the Python ecosystem for scientific computing and data science, and how to fund it. In this post I'll add my own perspectives to that. Given Quansight Labs' purpose, it seems fitting to start with how I see things as a community member and organizer.

A community perspective

The SciPy and PyData ecosystems have experienced massive growth in recent years, and this is likely to continue in the near future. As a maintainer, that feels very gratifying. At the same time it brings up worries. Core projects struggle to keep up with the growth in the number of users. Funded development can help with this, if done right. Some of the things I would like to see from companies that participate in the ecosystem:

  • Explain innovations they're working on to the community and solicit input, at an early stage. Developing something away from the spotlight and then unveiling it as the "next big thing" once it's done usually leads to either corporate-driven projects (if users adopt it) or a short life span.
  • Participate in a sustainable way. This means, for example, contributing in a way that lowers, or at least doesn't increase, the overall effort required for maintenance. Only sending pull requests with new features doesn't achieve that. Solving maintenance pain points or helping with code review does.
  • Operate transparently. Develop in the open, plan in the open, be clear about directions and motivations.

Read more…