Community-driven open source and funded development

Quansight Labs is an experiment for us in a way. One of our main aims is to channel more resources into community-driven PyData projects, to keep them healthy and accelerate their development. And do so in a way that projects themselves stay in charge.

This post explains one method we're starting to use for this. I'm writing it to be transparent with projects, the wider community and potential funders about what we're starting to do. As well as to explicitly solicit feedback on this method.

Community work orders

If you talk to someone about supporting an open source project, in particular a well-known one that they rely on (e.g. NumPy, Jupyter, Pandas), they're often willing to listen and help. What you quickly learn though is that they want to know in some detail what will be done with the funds provided. This is true not only for companies, but also for individuals. In addition, companies will likely want a written agreement and some form of reporting about the progress of the work. To meet this need we came up with community work orders (CWOs) - agreements that outline what work will be done on a project (implementing new features, release management, improving documentation, etc.) and outlining a reporting mechanism. What makes a CWO different from a consulting contract? Key differences are:

  1. It must be work that is done on the open source project itself (and not e.g. on a plugin for it, or a customization for the client).
  2. The developers must have a reasonable amount of freedom to decide what to work on and what the technical approach will be, within the broad scope of the agreement.
  3. Deliverables cannot be guaranteed to end up in a project; instead the funder gets the promise of a best effort of implementation and working with the community.

Respecting community processes

Point 3 above is particularly important: we must respect how open source projects make decisions. If the project maintainers decide that they don't want to include a particular change or new feature, that's their decision to make. Any code change proposed as part of work on a CWO has to go through the same review process as any other change, and be accepted on its merits. The argument "but someone paid for this" isn't particularly strong, nor is one that reviewers should have to care about. Now of course we don't expect it to be common for work to be rejected. An important part of the Quansight value proposition is that because we understand how open source works and many of our developers are maintainers and contributors of the open source projects already, we propose work that the community already has interest in and we open the discussion about any major code change early to avoid issues.

Read more…

Measuring API usage for popular numerical and scientific libraries

Developers of open source software often have a difficult time understanding how others utilize their libraries. Having better data of when and how functions are being used has many benefits. Some of these are:

  • better API design
  • determining whether or not a feature can be deprecated or removed.
  • more instructive tutorials
  • understanding the adoption of new features

Python Namespace Inspection

We wrote a general tool python-api-inspect to analyze any function/attribute call within a given set of namespaces in a repository. This work was heavily inspired by a blog post on inspecting method usage with Google BigQuery for pandas, NumPy, and SciPy. The previously mentioned work used regular expressions to search for method usage. The primary issue with this approach is that it cannot handle import numpy.random as rand; rand.random(...) unless additional regular expressions are constructed for each case and will result in false positives. Additionally, BigQuery is not a free resource. Thus, this approach is not general enough and does not scale well with the number of libraries that we would like to inspect function and attribute usage.

A more robust approach is to inspect the Python abstract syntax tree (AST). Python comes with a performant method from the ast module ast.parse(...) for constructing a Python AST from source code. A node visitor is used to traverse the AST and record import statements, and function/attribute calls. This allows us to catch any absolute namespace reference. The following are cases that python-api-inspect catches:

Read more…

Spyder 4.0 takes a big step closer with the release of Beta 2!

It has been almost two months since I joined Quansight in April, to start working on Spyder maintenance and development. So far, it has been a very exciting and rewarding journey under the guidance of long time Spyder maintainer Carlos Córdoba. This is the first of a series of blog posts we will be writing to showcase updates on the development of Spyder, new planned features and news on the road to Spyder 4.0 and beyond.

First off, I would like to give a warm welcome to Edgar Margffoy, who recently joined Quansight and will be working with the Spyder team to take its development even further. Edgar has been a core Spyder developer for more than two years now, and we are very excited to have his (almost) full-time commitment to the project.

Spyder 4.0 Beta 2 released!

Since August 2018, when the first beta of the 4.x series was released, the Spyder development team has been working hard on our next release. Over the past year, we've implemented the long awaited full-interface dark theme; overhauled our entire code completion and linting architecture to use the Language Server Protocol, opening the door to supporting many other languages in the future; added a new Plots pane to view and manage the figures generated by your code; and numerous other feature enhancements, bug fixes and internal improvements.

Dark theme

A full-interface dark theme has been a long awaited feature, and is enabled by default in Spyder 4. You can still select the light theme under Preferences > Appearance by either choosing a light-background syntax-highlighting scheme, or changing Interface theme to Light.

Screenshot of the Spyder main window with default panes, with the dark theme applied across the entire interface

Pretty, right :-) ?

Read more…

Labs update and April highlights

It has been an exciting first month for me at Quansight Labs. It's a good time for a summary of what we worked on in April and what is coming next.

Progress on array computing libraries

Our first bucket of activities I'd call "innovation". The most prominent projects in this bucket are XND, uarray, metadsl, python-moa, Remote Backend Compiler and arrayviews. XND is an umbrella name for a set of related array computing libraries: xnd, ndtypes, gumath, and xndtools.

Hameer Abbasi made some major steps forward with uarray: the backend and coercion semantics are now largely worked out, there is good documentation, and the unumpy package (which currently has numpy, XND and PyTorch backends) is progressing well. This blog post gives a good overview of the motivation for uarray and its main concepts.

Saul Shanabrook and Chris Ostrouchov worked out how best to put metadsl and python-moa together: metadsl can be used to create the API for python-moa to simplify the code base of the latter a lot. Chris also wrote an interesting blog post explaining the MoA principles.

Read more…

What's New in SymPy 1.4

As of November, 2018, I have been working at Quansight, under the heading of Quansight Labs. Quansight Labs is a public-benefit division of Quansight. It provides a home for a "PyData Core Team" which consists of developers, community managers, designers, and documentation writers who build open-source technology and grow open-source communities around all aspects of the AI and Data Science workflow. As a part of this, I am able to spend a fraction of my time working on SymPy. SymPy, for those who do not know, is a symbolic mathematics library written in pure Python. I am the lead maintainer of SymPy.

SymPy 1.4 was released on April 9, 2019. In this post, I'd like to go over some of the highlights for this release. The full release notes for the release can be found on the SymPy wiki.

To update to SymPy 1.4, use

conda install sympy

or if you prefer to use pip

pip install -U sympy

The SymPy 1.4 release contains over 500 changes from 38 different submodules, so I will not be going over every change, but only a few of the main highlights. A total of 104 people contributed to this release, of whom 66 contributed for the first time for this release.

While I did not personally work on any of the changes listed below (my work for this release tended to be more invisible, behind the scenes fixes), I did do the release itself.

Read more…

uarray: A Generic Override Framework for Methods

uarray: A Generic Override Framework for Methods

uarray is an override framework for methods in Python. In the scientific Python ecosystem, and in other similar places, there has been one recurring problem: That similar tools to do a job have existed, but don't conform to a single, well-defined API. uarray tries to solve this problem in general, but also for the scientific Python ecosystem in particular, by defining APIs independent of their implementations.

Array Libraries in the Scientific Python Ecosystem

When SciPy was created, and Numeric and Numarray unified into NumPy, it jump-started Python's data science community. The ecosystem grew quickly: Academics started moving to SciPy, and the Scikits that popped up made the transition all the more smooth.

However, the scientific Python community also shifted during that time: GPUs and distributed computing emerged. Also, there were old ideas that couldn't really be used with NumPy's API, such as sparse arrays. To solve these problems, various libraries emerged:

  • Dask, for distributed NumPy
  • CuPy, for NumPy on Nvidia-branded GPUs.
  • PyData/Sparse, a project started to make sparse arrays conform to the NumPy API
  • Xnd, which extends the type system and the universal function concept found in NumPy

Read more…

MOA: a theory for composable and verifiable tensor computations

Python-moa (mathematics of arrays) is an approach to a high level tensor compiler that is based on the work of Lenore Mullin and her dissertation. A high level compiler is necessary because there are many optimizations that a low level compiler such as gcc will miss. It is trying to solve many of the same problems as other technologies such as the taco compiler and the xla compiler. However, it takes a much different approach than others guided by the following principles.

  1. What is the shape? Everything has a shape. scalars, vectors, arrays, operations, and functions.
  2. What are the given indicies and operations required to produce a given index in the result?

Having a compiler that is guided upon these principles allows for high level reductions that other compilers will miss and allows for optimization of algorithms as a whole. Keep in mind that MOA is NOT a compiler. It is a theory that guides compiler development. Since python-moa is based on theory we get unique properties that other compilers cannot guarantee:

Read more…

Thoughts on joining Quansight Labs

In his blog post welcoming me, Travis set out his vision for pushing forward the Python ecosystem for scientific computing and data science, and how to fund it. In this post I'll add my own perspectives to that. Given that Quansight Labs' purpose, it seems fitting to start with how I see things as a community member and organizer.

A community perspective

The SciPy and PyData ecosystems have experienced massive growth over the past years, and this is likely to continue in the near future. As a maintainer, that feels very gratifying. At the same time it brings up worries. Core projects struggle to keep up with the growth in number of users. Funded development can help with this, if done right. Some of the things I would like to see from companies that participate in the ecosystem:

  • Explain innovations they're working on to the community and solicit input, at an early stage. Developing something away from the spotlight and then unveiling it as the "next big thing" once it's done usually leads to either corporate-driven projects (if users adopt it) or a short life span.
  • Participate in a sustainable way. This means for example to contribute in a way that lowers, or at least doesn't increase, the overall effort required for maintenance. Only sending pull requests with new features doesn't achieve that. Solving maintenance pain points or helping with code review does.
  • Operate transparently. Develop in the open, plan in the open, be clear about directions and motivations.

Read more…