Quansight Labs Dask Update
Published August 27, 2019
This post provides an update on some recent Dask-related activities the Quansight Labs team has been working on.
Dask community work order
Through a community work order (CWO) with the D. E. Shaw group, the Quansight Labs team has been able to dedicate developer time towards bug fixes and feature requests for Dask. This work has touched on several portions of the Dask codebase, but generally have centered around using Dask Arrays with the distributed scheduler.For instance, performance improvement in how Dask handles large graphs were made in
dask PR #4918 and
distributed PR #2594, while several Dask Array bug fixes and support for asymmetric overlapping computations were included in
dask PR #5256,
dask PR #5151, and
dask PR #4863.
This CWO approach allows institutions, in this case the D. E. Shaw group, to fund development in community-driven open source projects while still respecting how these projects make decisions. To learn more about funding projects through CWOs, check out this post by Ralf Gommers which discusses the CWO model in detail.
In addition to working on Dask directly, we've also been engaging the broader scientific Python community through attending conferences. In particular, developers at Quansight, Anaconda, and NVIDIA organized a Dask sprint at the SciPy 2019 conference. Throughout the two-day sprint, core developers were able to help sprint attendees become more familiar with the Dask codebase, set up local development environments, and even get new pull requests from sprinters submitted, reviewed, and merged into Dask! I personally had a great time at SciPy and hope to see lots of familiar faces again next year.
Likewise, in July I spoke about Dask at the Data-Driven Wisconsin 2019 conference. Following my talk, I had several engaging discussions with other conference attendees and it was really gratifying to see all the enthusiasm surrounding Dask. The materials presented at the talk can be found on GitHub and a live, interactive version is available on mybinder.org.
Maintenance and development
Finally, there's been a push for a more coordinated effort towards project maintenance and development by core Dask maintainers at Quansight, Anaconda, and NVIDIA. As part of this effort, we spend a portion of our work week on day-to-day project maintenance tasks (e.g. responding on issues, reviewing pull requests, fixing CI systems, etc.) as well as working on contributions that require significant amounts of time or expertise to implement (e.g. large-scale refactoring, adding new features, writing documentation, etc.). Today, Dask users typically get a quicker response from a core maintainer when opening an issue or pull request, in part, because of these efforts. I, and perhaps other core maintainers, hope to write more about this process in the future.