It has been an exciting first month for me at Quansight Labs. It's a good time for a summary of what we worked on in April and what is coming next.
Progress on array computing libraries
Our first bucket of activities I'd call "innovation". The most prominent
projects in this bucket are XND, uarray, metadsl, python-moa and
Remote Backend Compiler.
XND is an umbrella name for a set of related array computing libraries, including xnd and ndtypes.
Hameer Abbasi made some major steps forward with
uarray: the backend and
coercion semantics are now largely worked out, there is
good documentation, and the
unumpy package is progressing well. This blog post
gives a good overview of the motivation for
uarray and its main concepts.
Saul Shanabrook and Chris Ostrouchov worked out how
metadsl can be used to create the API for
python-moa, which simplifies the code base of the latter a lot. Chris
also wrote an interesting blog post
explaining the MoA principles.
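Mathematics of Arrays (MoA) describes array operations through a shape function ρ (rho), an indexing function ψ (psi) and a layout function γ (gamma) that maps an index to a memory offset. The following is a simplified, hypothetical illustration of those three ideas for a dense row-major array stored as a `(shape, flat_data)` pair; it is not python-moa's API, and the function names are chosen only to mirror the Greek letters. It needs Python 3.8+ for `math.prod`.

```python
from math import prod


def rho(array):
    """Shape function ρ: an array here is a (shape, flat_data) pair."""
    shape, _ = array
    return shape


def gamma_row(index, shape):
    """Row-major layout function γ: flat offset of a full index into `shape`."""
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same length")
    offset = 0
    for i, s in zip(index, shape):
        if not 0 <= i < s:
            raise IndexError(f"index {i} out of range for axis of length {s}")
        offset = offset * s + i
    return offset


def psi(index, array):
    """Indexing function ψ: a (possibly partial) index selects a subarray."""
    shape, data = array
    k = len(index)
    if k > len(shape):
        raise IndexError("too many indices")
    sub_shape = shape[k:]
    size = prod(sub_shape)  # prod(()) == 1, so a full index yields a 0-d array
    start = gamma_row(index, shape[:k]) * size
    return (sub_shape, data[start:start + size])


A = ((2, 3), list(range(6)))  # a 2x3 array holding 0..5
```

For this `A`, `psi((1,), A)` selects the second row, `((3,), [3, 4, 5])`, and `psi((1, 2), A)` selects a single element as a 0-d array. Because every operation reduces to shape arithmetic plus γ, a library built on these principles can reason about and simplify index expressions symbolically before touching any data.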
The work on XND over the last month consisted mostly of "under the hood"
improvements and fixes in
ndtypes by Stefan Krah. We did create
a new xnd-benchmarks repository
and had some interesting discussions on performance. One thing I learned is that
XND has automatic multithreading and has very similar performance to NumPy + MKL
for basic arithmetic operations (at least for array sizes above ~1e4 elements; the
overhead for small arrays is larger). The
xnd.array interface, which is a higher-level
interface than
xnd.xnd and can be used similarly to
numpy, is taking
shape as well. One user-visible new feature worth mentioning is that xnd containers
can now be serialized and pickled.
Work on PyData core projects
Most people in the team are maintainers of or contributors to one or more core projects in the PyData or SciPy stacks. Helping maintain and evolve those projects is our second bucket of activities.
Gonzalo Pena-Castellanos is working full-time on Spyder, with guidance from Carlos Cordoba. Together they have been working very hard to get the first beta of Spyder 4 ready. Some exciting new features are also in the works; however, Gonzalo will be blogging about those soon, so I won't steal his thunder.
Ivan Ogasawara is spending some time each week on maintenance of Ibis. If you're a Pandas or scikit-learn user and need to interact with SQL databases or HDFS/Spark, Ibis is worth looking into.
I myself have enjoyed having a little more bandwidth for NumPy and SciPy.
On the technical front, this allowed me to contribute to the design discussion
about an addition
to NEP 18 (the
__array_function__ override mechanism),
do the numpydoc 0.9 release, deal
with several build issues, and review a number of PRs (the one
allowing the BLAS and LAPACK link order to be specified
was particularly nice). On the organizational front, I fixed the description
of how donations are handled on numpy.org, finalized the
Tidelift agreement for NumPy, helped NumPy and SciPy get accepted for the
Google Season of Docs program,
and did everything needed to finalize the fiscal sponsorship agreement between
SciPy and NumFOCUS.
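For readers unfamiliar with NEP 18: it lets a non-NumPy array type intercept calls to NumPy's public functions by defining an `__array_function__` method. The sketch below follows the pattern described in the NEP; `WrappedArray`, `HANDLED_FUNCTIONS` and `implements` are illustrative names, not part of NumPy. It assumes NumPy 1.17 or later, where the protocol is enabled by default.

```python
import numpy as np

# Registry mapping NumPy functions to our own implementations.
HANDLED_FUNCTIONS = {}


def implements(np_function):
    """Register an implementation of `np_function` for WrappedArray."""

    def decorator(func):
        HANDLED_FUNCTIONS[np_function] = func
        return func

    return decorator


class WrappedArray:
    """Hypothetical duck array that stores its data in a NumPy array."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Defer (NumPy then raises TypeError) for functions we don't support
        # or for argument types we don't know how to handle.
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented
        if not all(issubclass(t, WrappedArray) for t in types):
            return NotImplemented
        return HANDLED_FUNCTIONS[func](*args, **kwargs)


@implements(np.sum)
def _sum(arr, axis=None):
    # Unwrap, compute with NumPy, and return a plain result.
    return np.sum(arr.data, axis=axis)


@implements(np.concatenate)
def _concatenate(arrays, axis=0):
    # Concatenate the underlying data and re-wrap the result.
    return WrappedArray(np.concatenate([a.data for a in arrays], axis=axis))
```

Calling `np.sum(WrappedArray([1, 2, 3]))` now routes to `_sum`, and `np.concatenate` returns a `WrappedArray`; any NumPy function not in the registry, such as `np.mean`, raises `TypeError` instead of silently converting the object to an ndarray.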
Jupyter and JupyterLab improvements
Jupyter is a key part of the PyData ecosystem. It extends well beyond that though, so I'm giving it its own bucket here. At Quansight we have a number of Jupyter core developers and contributors. Ian Rose, Saul Shanabrook, Grant Nestor and others have been very busy with both maintenance tasks and adding new features to Jupyter and JupyterLab.
JupyterLab is about to get support for printing (not inside the notebook, but the old-fashioned
Ctrl-P variant). This pull request
by Saul has nice screenshots showing the feature in action for whole notebooks,
images, the JSON viewer and the inspector.
Ian worked on the third alpha release of JupyterLab 1.0, on testing and CI infrastructure, and other general maintenance tasks. He also improved PDF preview in JupyterLab, so it now works as expected in Firefox and Chrome (at least).
Other interesting features are in progress and will make their way into the main repositories soon.
Starting to shape Labs
There is a lot of work to do to figure out for ourselves exactly what Labs will be, and then to communicate that clearly to the outside world. We have a rough idea (see my first blog post and Travis' blog post), but there's a long way to go from there to having a compelling elevator pitch, a website that tells our story well, people and projects organized, a funding stream, and more.
One of the first things we did was start this blog, to communicate about the technical work we're doing. We're also going through the roadmaps published at quansight.com/projects, to ensure they're up to date and to make clear that they cover community-driven projects for which Quansight aims to obtain industry support.
Both DE Shaw and OmniSci are supporting a significant amount of work on JupyterLab, which highlights how important Jupyter and JupyterLab have become in the data science ecosystem. DE Shaw is also supporting work on projects like Dask, Numba and XND that is just getting started. OmniSci supports work on Ibis and Remote Backend Compiler. Finally, Quansight is working with Cal Poly (one of the Jupyter lead institutes, together with UC Berkeley) to execute on the Project Jupyter roadmap for JupyterLab.
TDK is sponsoring the Spyder work I talked about above. Supporting both general maintenance for the Spyder 4 release and some interesting new features is an important contribution that helps the many engineers and scientists that use Spyder as their main development and data science interface.
The above is direct funding from companies for work on open source projects.
Quansight also offers open-source support and consulting, as well as training
around the PyData stack. Revenue from those activities is also used to
fund the efforts of Quansight Labs. To learn more about those offerings,
contact Travis or me.
Besides funding from companies, we are also applying for grants. So far we have submitted two proposals to the NSF and three to NASA, on topics ranging from JupyterLab extensions for high performance computing to improving Xarray's array backend system. For most of these proposals we expect a verdict in the next 1-2 months. In April we got a rejection from the NSF for a proposal titled "Accelerated Development of the Scientific Python Ecosystem", which we wrote together with NumFOCUS and Columbia, with the latter as lead institution (thanks go especially to Andreas Mueller and Andy Terrel for a lot of the hard work on that proposal). The discussions triggered by that rejection have been very useful and generated a number of new ideas and contacts to follow up on in the coming months.
One idea that came up more than once is to clearly express the needs of these projects in public, ideally in fundable chunks with an effort estimate attached, and then to approach both funding bodies and companies with that. This is likely to be more effective than responding to solicitations that may not be a perfect match. Quansight Labs is well positioned to either participate in or help lead such a process, and in particular to work with companies that rely on the PyData stack.
However we look for funding, it will be important to be clear in our messaging
and transparent with the community about how we do it. I will be
actively soliciting feedback on this as well, both via blog posts like this one
(please email me if you have ideas, questions or
concerns!) and in person.
Finally, we are finalizing and signing a preferred partnership with NumFOCUS, under which 5% of Quansight Labs funds for projects referred from NumFOCUS will go to NumFOCUS to sustain its efforts. NumFOCUS is an important foundation of the PyData ecosystem, and we would like to help keep it on a sound financial footing and help it grow further.