Read Our Blog

TDK-Micronas partners with Quansight to sponsor Spyder

TDK-Micronas is sponsoring Spyder development efforts through Quansight Labs. This will enable the development of some features that have been requested by our users, as well as new features that will help TDK develop custom Spyder plugins to complement their Automatic Test Equipment (ATE) in the development of their Application Specific Integrated Circuits (ASICs).

At this point it may be useful to clarify the role of Quansight Labs in Spyder's development and its relationship with TDK. To quote Ralf Gommers (director of Quansight Labs):

"We're an R&D lab for open source development of core technologies around data science and scientific computing in Python. And focused on growing communities around those technologies. That's how I see it for Spyder as well: Quansight Labs enables developers to be employed to work on Spyder, and helps with connecting them to developers of other projects in similar situations. Labs should be an enabler to let the Spyder project, its community and individual developers grow. And Labs provides mechanisms to attract and coordinate funding. Of course the project is still independent. If there are other funding sources, e.g. donations from individuals to Spyder via OpenCollective, all the better."

Multiple Projects aka Workspaces

In its current state Spyder can only handle one active project at a time. Although in the past we had basic support for workspaces, it was never a fully functional feature, so to ease development and simplify the user experience, we decided to remove it in the 3.x series.

Read more…

metadsl: A Framework for Domain Specific Languages in Python

Hello, my name is Saul Shanabrook and for the past year or so I have been at Quansight exploring the array computing ecosystem. This started with working on the xnd project, a set of low-level primitives to help build cross-platform NumPy-like APIs, and continued with exploring Lenore Mullin's work on a mathematics of arrays. After spending quite a bit of time working on an integrated solution built on these concepts, I decided to step back to try to generalize and simplify the core concepts. The trickiest part was not actually compiling mathematical descriptions of array operations in Python, but figuring out how to make it useful to existing users. To do this, we need to meet users where they are, which is with the APIs they are already familiar with, like NumPy. The goal of metadsl is to make it easier to tackle parts of this problem separately so that we can collaborate on tackling it together.

Libraries for Scientific Computing

Much of the recent rise of Python's popularity is due to its usage for scientific computing and machine learning. This work is built on different frameworks, like Pandas, NumPy, TensorFlow, and scikit-learn. Each of these is meant to be used from Python, but has its own concepts and abstractions to learn on top of the core language, so we can look at them as Domain Specific Languages (DSLs). As the ecosystem has matured, we are now demanding more flexibility in how these languages are executed. Dask gives us a way to write Pandas or NumPy and execute it across many cores or computers, Ibis allows us to write Pandas but on a SQL database, with CuPy we can execute NumPy on our GPU, and with Numba we can optimize our NumPy expression on a CPU or GPU. These projects prove that it is possible to write optimizing compilers that target varying hardware paradigms for existing Python numeric APIs. However, this isn't straightforward, and these projects' success is a testament to the perseverance and ingenuity of the authors. We need to make it easy to add reusable optimizations to libraries like these, so that we can support the latest hardware and compiler optimizations from Python. metadsl is meant to be a place to come together to build a framework for DSLs in Python. It provides a way to separate the user experience from the specifics of execution, to enable consistency and flexibility for users. In this post, I will go through an example of creating a very basic DSL. It will not use the metadsl library, but will be created in the same style as metadsl to illustrate its basic principles.
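To make the idea of separating what the user writes from how it is executed a little more concrete, here is a minimal sketch in plain Python. It is not metadsl's API: the names Expr, Const, Add, Mul, wrap, simplify and evaluate are hypothetical and exist only for this illustration. The user-facing operators build an expression tree, and a rewrite pass and an evaluator are applied to that tree afterwards.

```python
from dataclasses import dataclass


class Expr:
    """Base class: operators build an expression tree instead of computing."""

    def __add__(self, other):
        return Add(self, wrap(other))

    def __mul__(self, other):
        return Mul(self, wrap(other))


@dataclass(frozen=True)
class Const(Expr):
    value: float


@dataclass(frozen=True)
class Add(Expr):
    left: Expr
    right: Expr


@dataclass(frozen=True)
class Mul(Expr):
    left: Expr
    right: Expr


def wrap(value):
    """Turn plain Python numbers into Const nodes; leave expressions alone."""
    return value if isinstance(value, Expr) else Const(value)


def simplify(expr):
    """A reusable rewrite pass (hypothetical): x * 1 -> x and x + 0 -> x."""
    if isinstance(expr, (Add, Mul)):
        left, right = simplify(expr.left), simplify(expr.right)
        if isinstance(expr, Mul) and right == Const(1):
            return left
        if isinstance(expr, Add) and right == Const(0):
            return left
        return type(expr)(left, right)
    return expr


def evaluate(expr):
    """One possible execution strategy: direct interpretation of the tree."""
    if isinstance(expr, Const):
        return expr.value
    if isinstance(expr, Add):
        return evaluate(expr.left) + evaluate(expr.right)
    if isinstance(expr, Mul):
        return evaluate(expr.left) * evaluate(expr.right)
    raise TypeError(f"unknown expression node: {expr!r}")


tree = (Const(2) + 0) * 1          # builds a tree, computes nothing yet
simplified = simplify(tree)        # Const(value=2)
result = evaluate((Const(2) + 3) * 4)
```

Because the tree is plain data, a different evaluate function (for example one that emits code for another backend) could be swapped in without changing how the user writes the expression.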

Community-driven open source and funded development

Quansight Labs is an experiment for us in a way. One of our main aims is to channel more resources into community-driven PyData projects, to keep them healthy and accelerate their development. And do so in a way that projects themselves stay in charge.

This post explains one method we're starting to use for this. I'm writing it to be transparent with projects, the wider community and potential funders about what we're starting to do. As well as to explicitly solicit feedback on this method.

Community work orders

If you talk to someone about supporting an open source project, in particular a well-known one that they rely on (e.g. NumPy, Jupyter, Pandas), they're often willing to listen and help. What you quickly learn though is that they want to know in some detail what will be done with the funds provided. This is true not only for companies, but also for individuals. In addition, companies will likely want a written agreement and some form of reporting about the progress of the work. To meet this need we came up with community work orders (CWOs) - agreements that outline what work will be done on a project (implementing new features, release management, improving documentation, etc.) and define a reporting mechanism. What makes a CWO different from a consulting contract? Key differences are:

  1. It must be work that is done on the open source project itself (and not e.g. on a plugin for it, or a customization for the client).
  2. The developers must have a reasonable amount of freedom to decide what to work on and what the technical approach will be, within the broad scope of the agreement.
  3. Deliverables cannot be guaranteed to end up in a project; instead the funder gets the promise of a best effort of implementation and working with the community.

Respecting community processes

Point 3 above is particularly important: we must respect how open source projects make decisions. If the project maintainers decide that they don't want to include a particular change or new feature, that's their decision to make. Any code change proposed as part of work on a CWO has to go through the same review process as any other change, and be accepted on its merits. The argument "but someone paid for this" isn't particularly strong, nor is it one that reviewers should have to care about. Now of course we don't expect it to be common for work to be rejected. An important part of the Quansight value proposition is that, because we understand how open source works and many of our developers are already maintainers of and contributors to these open source projects, we propose work that the community already has interest in, and we open the discussion about any major code change early to avoid issues.

Read more…

Measuring API usage for popular numerical and scientific libraries

Developers of open source software often have a difficult time understanding how others use their libraries. Having better data on when and how functions are being used has many benefits. Some of these are:

  • better API design
  • determining whether or not a feature can be deprecated or removed
  • more instructive tutorials
  • understanding the adoption of new features

Python Namespace Inspection

We wrote a general tool, python-api-inspect, to analyze any function/attribute call within a given set of namespaces in a repository. This work was heavily inspired by a blog post on inspecting method usage with Google BigQuery for pandas, NumPy, and SciPy. That earlier work used regular expressions to search for method usage. The primary issue with this approach is that it cannot handle cases such as import numpy.random as rand; rand.random(...) unless additional regular expressions are constructed for each case, and it results in false positives. Additionally, BigQuery is not a free resource. Thus, this approach is not general enough and does not scale well with the number of libraries whose function and attribute usage we would like to inspect.

A more robust approach is to inspect the Python abstract syntax tree (AST). Python's ast module provides a performant function, ast.parse(...), for constructing a Python AST from source code. A node visitor is used to traverse the AST and record import statements and function/attribute calls. This allows us to catch any absolute namespace reference. The following are cases that python-api-inspect catches:
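As a rough illustration of the AST approach described above, the sketch below uses only the standard library's ast module. It is not the python-api-inspect implementation: the class NamespaceUsageVisitor and the helper count_usage are hypothetical names, and the sketch only counts calls (not bare attribute access) and only resolves simple module-level aliases.

```python
import ast
from collections import Counter


class NamespaceUsageVisitor(ast.NodeVisitor):
    """Record calls that resolve to one of the given top-level namespaces."""

    def __init__(self, namespaces):
        self.namespaces = set(namespaces)
        self.aliases = {}       # local name -> absolute dotted path
        self.usage = Counter()  # absolute dotted path -> number of calls

    def visit_Import(self, node):
        for alias in node.names:
            top = alias.name.split(".")[0]
            if top not in self.namespaces:
                continue
            if alias.asname:
                # "import numpy.random as rand" binds rand -> numpy.random
                self.aliases[alias.asname] = alias.name
            else:
                # "import numpy.random" binds only the top-level name "numpy"
                self.aliases[top] = top

    def visit_ImportFrom(self, node):
        # Relative imports (level > 0) are ignored in this sketch.
        if not node.module or node.level != 0:
            return
        if node.module.split(".")[0] not in self.namespaces:
            return
        for alias in node.names:
            local_name = alias.asname or alias.name
            self.aliases[local_name] = f"{node.module}.{alias.name}"

    def _resolve(self, node):
        """Turn an attribute chain such as rand.random into a dotted path."""
        parts = []
        while isinstance(node, ast.Attribute):
            parts.append(node.attr)
            node = node.value
        if isinstance(node, ast.Name) and node.id in self.aliases:
            parts.append(self.aliases[node.id])
            return ".".join(reversed(parts))
        return None

    def visit_Call(self, node):
        path = self._resolve(node.func)
        if path is not None:
            self.usage[path] += 1
        self.generic_visit(node)  # also visit nested calls in the arguments


def count_usage(source, namespaces):
    visitor = NamespaceUsageVisitor(namespaces)
    visitor.visit(ast.parse(source))
    return dict(visitor.usage)


example = """
import numpy.random as rand
from numpy import linalg as la
import numpy as np

rand.random(3)
la.norm(np.ones(3))
print("not counted")
"""

usage = count_usage(example, {"numpy"})
```

The aliased call rand.random(...) is recorded under its absolute name numpy.random.random, which is the case the regular-expression approach struggles with.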

Read more…

Spyder 4.0 takes a big step closer with the release of Beta 2!

It has been almost two months since I joined Quansight in April, to start working on Spyder maintenance and development. So far, it has been a very exciting and rewarding journey under the guidance of long time Spyder maintainer Carlos Córdoba. This is the first of a series of blog posts we will be writing to showcase updates on the development of Spyder, new planned features and news on the road to Spyder 4.0 and beyond.

First off, I would like to give a warm welcome to Edgar Margffoy, who recently joined Quansight and will be working with the Spyder team to take its development even further. Edgar has been a core Spyder developer for more than two years now, and we are very excited to have his (almost) full-time commitment to the project.

Spyder 4.0 Beta 2 released!

Since August 2018, when the first beta of the 4.x series was released, the Spyder development team has been working hard on our next release. Over the past year, we've implemented the long-awaited full-interface dark theme; overhauled our entire code completion and linting architecture to use the Language Server Protocol, opening the door to supporting many other languages in the future; added a new Plots pane to view and manage the figures generated by your code; and made numerous other feature enhancements, bug fixes and internal improvements.

Dark theme

A full-interface dark theme has been a long-awaited feature, and is enabled by default in Spyder 4. You can still select the light theme under Preferences > Appearance by either choosing a light-background syntax-highlighting scheme, or changing Interface theme to Light.

Screenshot of the Spyder main window with default panes, with the dark theme applied across the entire interface.

Pretty, right :-) ?

Read more…

Labs update and April highlights

It has been an exciting first month for me at Quansight Labs. It's a good time for a summary of what we worked on in April and what is coming next.

Progress on array computing libraries

Our first bucket of activities I'd call "innovation". The most prominent projects in this bucket are XND, uarray, metadsl, python-moa, Remote Backend Compiler and arrayviews. XND is an umbrella name for a set of related array computing libraries: xnd, ndtypes, gumath, and xndtools.

Hameer Abbasi made some major steps forward with uarray: the backend and coercion semantics are now largely worked out, there is good documentation, and the unumpy package (which currently has numpy, XND and PyTorch backends) is progressing well. This blog post gives a good overview of the motivation for uarray and its main concepts.

Saul Shanabrook and Chris Ostrouchov worked out how best to put metadsl and python-moa together: metadsl can be used to create the API for python-moa to simplify the code base of the latter a lot. Chris also wrote an interesting blog post explaining the MoA principles.

Read more…

What's New in SymPy 1.4

Since November 2018, I have been working at Quansight, under the heading of Quansight Labs. Quansight Labs is a public-benefit division of Quansight. It provides a home for a "PyData Core Team" which consists of developers, community managers, designers, and documentation writers who build open-source technology and grow open-source communities around all aspects of the AI and Data Science workflow. As a part of this, I am able to spend a fraction of my time working on SymPy. SymPy, for those who do not know, is a symbolic mathematics library written in pure Python. I am the lead maintainer of SymPy.

SymPy 1.4 was released on April 9, 2019. In this post, I'd like to go over some of the highlights for this release. The full release notes for the release can be found on the SymPy wiki.

To update to SymPy 1.4, use

conda install sympy

or if you prefer to use pip

pip install -U sympy

The SymPy 1.4 release contains over 500 changes from 38 different submodules, so I will not be going over every change, but only a few of the main highlights. A total of 104 people contributed to this release, 66 of them for the first time.

While I did not personally work on any of the changes listed below (my work for this release tended to be more invisible, behind the scenes fixes), I did do the release itself.

Read more…

uarray: A Generic Override Framework for Methods

uarray is an override framework for methods in Python. In the scientific Python ecosystem, and in other similar places, there has been one recurring problem: similar tools for doing a job exist, but they don't conform to a single, well-defined API. uarray tries to solve this problem in general, but also for the scientific Python ecosystem in particular, by defining APIs independent of their implementations.

Array Libraries in the Scientific Python Ecosystem

When SciPy was created, and Numeric and Numarray unified into NumPy, it jump-started Python's data science community. The ecosystem grew quickly: Academics started moving to SciPy, and the Scikits that popped up made the transition all the more smooth.

However, the scientific Python community also shifted during that time: GPUs and distributed computing emerged. Also, there were old ideas that couldn't really be used with NumPy's API, such as sparse arrays. To solve these problems, various libraries emerged:

  • Dask, for distributed NumPy
  • CuPy, for NumPy on Nvidia-branded GPUs
  • PyData/Sparse, a project started to make sparse arrays conform to the NumPy API
  • Xnd, which extends the type system and the universal function concept found in NumPy

Read more…

MOA: a theory for composable and verifiable tensor computations

Python-moa (mathematics of arrays) is an approach to a high-level tensor compiler that is based on the work of Lenore Mullin and her dissertation. A high-level compiler is necessary because there are many optimizations that a low-level compiler such as gcc will miss. It is trying to solve many of the same problems as other technologies such as the taco compiler and the xla compiler. However, it takes a very different approach from the others, guided by the following principles.

  1. What is the shape? Everything has a shape: scalars, vectors, arrays, operations, and functions.
  2. What are the given indices and operations required to produce a given index in the result?
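A toy example may help make these two questions concrete. The sketch below is not python-moa's API and does not use Mullin's notation; the function names are hypothetical and nested lists stand in for arrays. For the expression transpose(A + B), the shape of the result is derived from the input shapes alone, and each result element is computed directly from the input elements it depends on, so no intermediate A + B array is ever built.

```python
def add_shape(a_shape, b_shape):
    """Principle 1: elementwise addition requires and preserves the shape."""
    if a_shape != b_shape:
        raise ValueError(f"shape mismatch: {a_shape} vs {b_shape}")
    return a_shape


def transpose_shape(shape):
    """Principle 1: transposition reverses the shape."""
    return tuple(reversed(shape))


def transpose_add_element(a, b, index):
    """Principle 2: result[i][j] of transpose(a + b) is a[j][i] + b[j][i]."""
    i, j = index
    return a[j][i] + b[j][i]


def transpose_add(a, b):
    """Evaluate transpose(a + b) for 2-D nested lists without an intermediate."""
    a_shape = (len(a), len(a[0]))
    b_shape = (len(b), len(b[0]))
    rows, cols = transpose_shape(add_shape(a_shape, b_shape))
    return [
        [transpose_add_element(a, b, (i, j)) for j in range(cols)]
        for i in range(rows)
    ]


a = [[1, 2, 3], [4, 5, 6]]
b = [[10, 20, 30], [40, 50, 60]]
result = transpose_add(a, b)   # [[11, 44], [22, 55], [33, 66]]
```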

Having a compiler that is guided by these principles allows for high-level reductions that other compilers will miss and allows for optimization of algorithms as a whole. Keep in mind that MOA is NOT a compiler. It is a theory that guides compiler development. Since python-moa is based on theory, we get unique properties that other compilers cannot guarantee:

Read more…

Thoughts on joining Quansight Labs

In his blog post welcoming me, Travis set out his vision for pushing forward the Python ecosystem for scientific computing and data science, and how to fund it. In this post I'll add my own perspectives to that. Given Quansight Labs' purpose, it seems fitting to start with how I see things as a community member and organizer.

A community perspective

The SciPy and PyData ecosystems have experienced massive growth over the past years, and this is likely to continue in the near future. As a maintainer, that feels very gratifying. At the same time it brings up worries. Core projects struggle to keep up with the growth in number of users. Funded development can help with this, if done right. Some of the things I would like to see from companies that participate in the ecosystem:

  • Explain innovations they're working on to the community and solicit input, at an early stage. Developing something away from the spotlight and then unveiling it as the "next big thing" once it's done usually leads to either corporate-driven projects (if users adopt it) or a short life span.
  • Participate in a sustainable way. This means, for example, contributing in a way that lowers, or at least doesn't increase, the overall effort required for maintenance. Only sending pull requests with new features doesn't achieve that. Solving maintenance pain points or helping with code review does.
  • Operate transparently. Develop in the open, plan in the open, be clear about directions and motivations.

Read more…