uarray is a generic override framework for objects and methods in Python. Since my last uarray blog post, there have been plenty of developments: changes to the API and reductions in the overhead of the protocol. Let’s begin with a walk-through of the current feature set and API, and then move on to current developments and how uarray compares to __array_function__. For further details on the API and the latest developments, please see the API page for uarray. The examples there are doctested, so they will always be current.
Other array objects
NumPy is a simple, rectangular, dense, and in-memory data store. This is great for some applications, but it doesn't cover every use case. The following array objects, all available today, offer different features and cater to different audiences.
- Dask is one of the most popular ones. It allows distributed and chunked computation.
- CuPy is another popular one, and allows GPU computation.
- PyData/Sparse is slowly gaining popularity, and is a sparse, in-memory data store.
- XArray includes named dimensions.
- Xnd is another effort to re-write and modernise the NumPy API, and includes support for GPU arrays and ragged arrays.
- Another effort (although with no Python wrapper, only data marshalling) is xtensor.
Some of these objects can be composed. Namely, Dask both expects and exports the NumPy API, whereas XArray expects the NumPy API. This makes interesting combinations possible, such as distributed sparse or GPU arrays, or even labelled distributed sparse or CPU/GPU arrays.
Also, there are many other libraries (a popular one being scikit-learn) that need a back-end mechanism in order to be able to support different kinds of array objects. Finally, there is a desire to see SciPy itself gain support for other array objects.
__array_function__ and its limitations
One of my motivations for working on uarray was the set of limitations in the __array_function__ protocol, defined in this proposal (NEP 18). The limitations are as follows:
- It can only dispatch on array objects.
- Consequently, it can only dispatch on functions that accept array objects.
- It has no mechanism for conversion and coercion.
- Since it conflates arrays and backends, only a single backend type per array object is possible.
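To make the first two limitations concrete, here is a minimal sketch, assuming NumPy 1.17 or later (where the protocol is enabled by default). `MyArray` is a hypothetical class invented for this illustration; it is not part of NumPy or uarray.

```python
import numpy as np


class MyArray:
    """Hypothetical duck array, invented purely for this illustration."""

    def __init__(self, data):
        self.data = list(data)

    def __array_function__(self, func, types, args, kwargs):
        # NumPy calls this hook because a MyArray instance appears
        # among the arguments of the function being called.
        if func is np.sum:
            return sum(self.data)
        # Anything else is unsupported; NumPy raises TypeError if no
        # argument provides an implementation.
        return NotImplemented


# Dispatch works: the call has an array argument to dispatch on.
total = np.sum(MyArray([1, 2, 3]))

# No dispatch is possible: np.ones takes only a shape, so there is no
# argument carrying __array_function__ and NumPy's own version runs.
created = np.ones((2, 2))
```

Here `total` is computed by `MyArray`, while `created` is a plain `numpy.ndarray` no matter which array library the caller had in mind. Array-creation functions are the typical case that falls outside what the protocol can reach by default.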
uarray: The solution?
With that out of the way, let's explore uarray, a library that hopes to resolve these issues. Although the original motivation was NumPy and array computing, the library itself is meant to be a generic multiple-dispatch mechanism.
# Enable __array_function__ for NumPy < 1.17.0
# (%env sets the variable in the kernel itself; !export would only affect a throwaway subshell)
%env NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1

import uarray as ua
import numpy as np
In uarray, the fundamental building block is a multimethod. Multimethods have a number of nice properties, such as automatic dispatch based on backends. It is important to note here that multimethods will be written by API authors, rather than implementors. Here's how we define a multimethod in