Variable Explorer improvements in Spyder 4

Spyder 4 will be released very soon with lots of interesting new features that you'll want to check out, reflecting years of effort by the team to improve the user experience. In this post, we will be talking about the improvements made to the Variable Explorer.

These include the brand new Object Explorer for inspecting arbitrary Python variables, full support for MultiIndex dataframes with multiple dimensions, the ability to filter and search for variables by name and type, and much more.

It is important to mention that several of the above improvements were made possible through integrating the work of two other projects. Code from gtabview was used to implement the multi-dimensional Pandas indexes, while objbrowser was the foundation of the new Object Explorer.

Read more…

A new grant for NumPy and OpenBLAS!

I'm very pleased to announce that NumPy and OpenBLAS just received a $195,000 grant from the Chan Zuckerberg Initiative, through its Essential Open Source Software for Science (EOSS) program! This is good news for both projects, and I'm particularly excited about the types of activities we'll be undertaking, about what this will mean in terms of growing the community, and about being part of the first round of funded projects of this visionary program.

The program

The press release gives a high level overview of the program, and the grantee website lists the 32 successful applications. Other projects that got funded include SciPy and Matplotlib (it's the very first significant funding for both projects!), Pandas, Zarr, scikit-image, JupyterHub, and Bioconda - we're in good company!

Nicholas Sofroniew and Dario Taraborelli, two of the people driving the EOSS program, wrote a blog post that's well worth reading about the motivations for starting this program and the 42 projects that applied and got funded: The Invisible Foundations of Biomedicine.

Read more…

File management improvements in Spyder 4

Version 4.0 of Spyder—a powerful Python IDE designed for scientists, engineers and data analysts—is almost ready! It has been in the making for well over two years, and it contains lots of interesting new features. We will focus on the Files pane in this post, where we've made several improvements to the interface and file management tools.

Simplified interface

In order to simplify the Files pane's interface, the columns corresponding to size and kind are hidden by default. To change which columns are shown, use the top-right pane menu or right-click the header directly.

Pane Menu

Read more…

uarray: Attempting to move the ecosystem forward

There comes a time in every project when most technological hurdles have been overcome and its adoption becomes a social problem. I believe uarray and unumpy reached such a state a month ago.

I then proceeded, along with Ralf Gommers and Peter Bell, to write NumPy Enhancement Proposal 31, or NEP-31. This generated a lot of excellent feedback on the structure and nuances of the proposal, which you can read both on the pull request and in the mailing list discussion. That feedback led to a lot of restructuring of the NEP's contents, but very little change to the actual proposal. I take full responsibility for this: I have a bad tendency to assume everyone knows what I'm thinking. Thankfully, I'm not alone in this: it's a known psychological phenomenon.

Read more…

Quansight Labs Work Update for September, 2019

Since November 2018, I have been working at Quansight. Quansight is a new startup founded by the same people who started Anaconda, which aims to connect companies and open source communities, and offers consulting, training, support and mentoring services. I work under the heading of Quansight Labs. Quansight Labs is a public-benefit division of Quansight. It provides a home for a "PyData Core Team" which consists of developers, community managers, designers, and documentation writers who build open-source technology and grow open-source communities around all aspects of the AI and Data Science workflow.

My work at Quansight is split between doing open source consulting for various companies, and working on SymPy. SymPy, for those who do not know, is a symbolic mathematics library written in pure Python. I am the lead maintainer of SymPy.

In this post, I will detail some of the open source work that I have done recently, both as part of my open source consulting, and as part of my work on SymPy for Quansight Labs.

Bounds Checking in Numba

As part of work on a client project, I have been contributing code to the Numba project. Numba is a just-in-time compiler for Python. It lets you write native Python code and, with the use of a simple @jit decorator, the code is automatically sped up using LLVM. This can result in code that is up to 1000x faster in some cases:
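As a minimal sketch of what using the decorator looks like (the function name `sum_of_squares` is made up for illustration, and the fallback branch is only an assumption so the snippet still runs where Numba is not installed, in which case there is no speedup):

```python
try:
    from numba import jit
except ImportError:
    # Fallback for environments without Numba: a no-op decorator.
    # The code still runs, just without any compilation or speedup.
    def jit(*args, **kwargs):
        def decorator(func):
            return func
        return decorator


@jit(nopython=True)
def sum_of_squares(n):
    # A plain Python loop; with Numba installed it is compiled through
    # LLVM the first time the function is called.
    total = 0
    for i in range(n):
        total += i * i
    return total


print(sum_of_squares(10))  # 285
```

The first call includes compilation time, so any speedup only shows up on later calls or on large inputs. How large it is depends heavily on the code being compiled.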

Read more…

Ruby wrappers for the XND project

Introduction

Lack of stable and reliable scientific computing software has been a persistent problem for the Ruby community, making it hard for enthusiastic Ruby developers to use Ruby in everything from their web applications to their data analysis projects. One of the most important components of any successful scientific software stack is a well maintained and flexible array computation library that can act as a fast and simple way of storing in-memory data and interfacing it with various fast and battle-tested libraries like LAPACK and BLAS.

Various projects have attempted to make such libraries in the past (and some are still thriving and maintained). Some of the notable ones are numo, nmatrix, and more recently, numruby. These projects attempt to provide a simple Ruby-like API for creating and manipulating arrays of various types. All of them are able to easily interface with libraries like ATLAS, FFTW and LAPACK.

However, all of the above projects fall short in two major aspects:

  • Lack of extensibility to adapt to modern use cases (read Machine Learning).
  • Lack of a critical mass of developers to maintain a robust and fast array library.

The first problem is mainly due to the fact that they do not support very robust type systems. The available data types are limited and are hard to extend to more complex uses. Modern use cases like Machine Learning require a more robust type system (i.e. defining array shapes of arbitrary dimension on multiple devices), as has been demonstrated by the tensor implementations of various frameworks like TensorFlow and PyTorch.

The second problem is due to the fact that all of the aforementioned projects are community efforts that are maintained part-time by developers simply out of a sense of purpose and passion. Sustaining such complex projects for extended periods of time without expectation of any support is simply unfeasible even for the most driven engineers.

This is where the XND project comes in. It aims to build a common library that is able to meet the needs of the various data analysis and machine learning frameworks that have had to build their own array objects and programming languages. It is built with the premise of extending arrays with new types and various device types (CPUs, GPUs etc.) without loss of performance and ease of use.

Read more…

Quansight Labs Dask Update

This post provides an update on some recent Dask-related activities the Quansight Labs team has been working on.

Dask community work order

Through a community work order (CWO) with the D. E. Shaw group, the Quansight Labs team has been able to dedicate developer time towards bug fixes and feature requests for Dask. This work has touched on several portions of the Dask codebase, but has generally centered around using Dask Arrays with the distributed scheduler.

Read more…

Spyder 4.0 beta4: Kite integration is here

Kite is sponsoring the work discussed in this blog post, and in addition supports Spyder 4.0 development through a Quansight Labs Community Work Order.

As part of our next release, we are proud to announce an additional completion client for Spyder: Kite. Kite is a novel completion client that uses machine learning techniques to find and predict the best autocompletion for a given text. Additionally, it collects improved documentation for compiled packages (e.g., Matplotlib, NumPy, and SciPy) that cannot be obtained easily by using traditional code analysis packages such as Jedi.

Read more…

Quansight presence at SciPy'19

Yesterday the SciPy'19 conference ended. It was a lot of fun, and very productive. You can really feel that there's a lot of energy in the community, and that it's growing and maturing. This post is just a quick update to summarize Quansight's presence and contributions, as well as some of the more interesting things I noticed.

A few highlights

The "Open Source Communities" track, which had a strong emphasis on topics like burnout, diversity and sustainability, as well as the keynotes by Stuart Geiger ("The Invisible Work of Maintaining and Sustaining Open-Source Software") and Carol Willing ("Jupyter: Always Open for Learning and Discovery") showed that many more people and projects are paying more attention to and evolving their thinking on the human and organizational aspects of open source.

I did not go to many technical talks, but did make sure to catch Matt Rocklin's talk "Refactoring the SciPy Ecosystem for Heterogeneous Computing". Matt clearly explained some key issues and opportunities around the state of array computing libraries in Python - I highly recommend watching this talk.

Abigail Cabunoc Mayes' talk "Work Open, Lead Open (#WOLO) for Sustainability" was fascinating - it made me rethink the governance models and roles we use for our projects, and I worked on some of her concrete suggestions during the sprints.

Read more…

Ibis: Python data analysis productivity framework

Ibis is a library for data analysis tasks that provides a pandas-like API. Operations such as creating filters, adding columns, and applying math operations are performed lazily: they are only registered in memory, not executed. When you request the result of an expression, Ibis compiles it and sends a request to the remote server (remote storage and execution systems such as Hadoop components or SQL databases). Its goal is to simplify analytical workflows and make you more productive.
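To make the lazy idea concrete, here is a toy sketch in plain Python. It is not the Ibis API: the `LazyTable` class, its methods, and the `trips` table are all hypothetical, and the only point is to show operations being recorded first and compiled to SQL later.

```python
class LazyTable:
    """Hypothetical stand-in for a lazily evaluated table expression."""

    def __init__(self, name, columns=("*",), filters=()):
        self.name = name
        self.columns = tuple(columns)
        self.filters = tuple(filters)

    def filter(self, condition):
        # Only record the condition; nothing is executed here.
        return LazyTable(self.name, self.columns, self.filters + (condition,))

    def select(self, *columns):
        # Only record the projection; nothing is executed here.
        return LazyTable(self.name, columns, self.filters)

    def compile(self):
        # Turn the recorded operations into a single SQL string.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.name}"
        if self.filters:
            sql += " WHERE " + " AND ".join(self.filters)
        return sql


expr = LazyTable("trips").filter("distance > 10").select("id", "distance")
print(expr.compile())  # SELECT id, distance FROM trips WHERE distance > 10
```

In this sketch, no query exists until `compile()` is called. A real backend would then send the compiled statement to the remote system for execution.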

Ibis was created by Wes McKinney and is mainly maintained by Phillip Cloud and Krisztián Szűcs. Also, recently, I was invited to become a maintainer of the Ibis repository!

Maybe you are thinking: "why should I use Ibis?". Well, if you have any of the following issues, you should probably consider using Ibis in your analytical workflow!

  • if you need to get data from a SQL database but you don't know much about SQL ...
  • if you create SQL statements manually using strings and have a lot of IFs in your code that compose specific parts of your SQL code (this can be pretty hard to maintain and it makes your code pretty ugly) ...
  • if you need to handle data with a big volume ...