Today I'm incredibly excited to welcome Tania Allard to Quansight as Co-Director of Quansight Labs. Tania (GitHub, Twitter, personal site) is a well-known and prolific PyData community member. In the past few years she has been involved as a conference organizer (JupyterCon, SciPy, PyJamas, PyCon UK, PyCon LatAm, JuliaCon and more), as a community builder (PyLadies, NumFOCUS, RForwards), as a contributor to Matplotlib and Jupyter, and as a regular speaker and mentor. She also brings relevant experience in both industry and academia - she joins us from Microsoft where she was a senior developer advocate, and has a PhD in computational modelling.
Read Our Blog
JupyterLab 3.0 is about to be released and provides many improvements to the extension system. Theming is a way to extend JupyterLab and benefits from those improvements.
While theming is often disregarded as a purely cosmetic endeavour, it can greatly improve software. Theming can be great help for accessibility, and the Jupyter team pays attention to making the default appearance accessibility-aware by using sufficient contrast. For users with a high visual acuity you may also choose to increase the information density.
Theming can also be a great way to improve communication by increasing or decreasing emphasis of the user interface, which can be of use for teaching or presenting. Theming may also help with security, for example, by having a clear distinction between staging and production.
Finally Theming can be a great way to express oneself, for example, by using a branded version of software that fits well into a context, or expressing one's artistic preferences or opinions.
In the following blog post, we will show you step-by-step how you can develop a custom theme for JupyterLab, distribute it, and take the example of the jupyterlab-theme-winter theme we release today to celebrate the end of 2020.
I am happy to announce that NumPy and OpenBLAS have once again been awarded a grant from the Chan Zuckerberg Initiative through Cycle 3 of the Essential Open Source Software for Science (EOSS) program. This new grant totaling $140,000 will fund part of our efforts to improve usability and sustainability in both projects and is excellent news for the scientific computing community, which will certainly benefit from this work downstream.
This blog post is a conversation. Portions lead by Tim George are marked with TG, and those lead by Isabela Presedo-Floyd are marked with IPF.
TG: When I speak with other designers, one common theme I see concerning why they chose this career path is they want to make a difference in the world. We design because we imagine a better world and we want to help make it real. Part of the reason we design as a career is we're unable to go through life without designing; we're always thinking about how things are and how they could be better. This ethos also exists in many open-source communities. It seems like it ought to be an ideal match.
So what's the disconnect? I'm still exploring that myself, but after a few years in open source I want to share my observations, experiences, and hope for a stronger collaboration between design and development. I don't think I have a complete solution, and some days I'm not even sure I grasp the entire problem. What I hope is to say that which often goes unsaid in these spaces: design and development skills in open source coexist precariously.
In our recent Ibis post, we discussed querying & retrieving data using a familiar Pandas-like interface. That discussion focused on the fluent API that Ibis provides to query structure from a SQLite database—in particular, using a single specific backend. In this post, we'll explore Ibis's ability to answer questions about data using two different Ibis backends.
import ibis.omniscidb, dask, intake, sqlalchemy, pandas, pyarrow as arrow, altair, h5py as hdf5
Ibis in the scientific Python ecosystem
Before we delve into the technical details of using Ibis, we'll consider Ibis in the greater historical context of the scientific Python ecosystem. It was started by Wes McKinney, the creator of Pandas, as way to query information on the Hadoop distributed file system and PySpark. More backends were added later as Ibis became a general tool for data queries.
Throughout the rest of this post, we'll highlight the ability of Ibis to generically prescribe query expressions across different data storage systems.
The basic installation format for users who install packages via
the wheel format. Wheel names are composed of four parts: a
package-name-and-version tag (which can be further broken down), a Python tag,
an ABI tag, and a platform tag. More information on the tags can be found in
PEP 425. So a package like NumPy
will be available on PyPI as
numpy-1.19.2-cp36-cp36m-win_amd64.whl for 64-bit
numpy-1.19.2-cp36-cp36m-macosx_10_9_x86_64.whl for macOS. Note
that only the plaform tag
But what about Linux? There is no single, vendor controlled, "Linux platform" e.g., Ubuntu, RedHat, Fedora, Debian, FreeBSD all package software at slightly different versions. What most Linux distributions do have in common is the glibc runtime library, and a smattering of various additional system libraries. So it is possible to define a least common denominator (LCD) of software expected to be on a Linux platform (exceptions apply, e.g. non-glibc distributions).
The decision to converge on a LCD common platform gave birth to the
manylinux1 standard. Going back
to our example, numpy is available as
The first manylinux standard, manylinux1, was based on CentOS5 which has been obsolete since March 2017. The subsequent manylinux2010 standard is based on CentOS6, which will hit end-of-life in December 2020. The manylinux2014 standard still has some breathing room. Based on CentOS7, it will reach end-of-life in July 2024.
So what is next for
manylinux, and what
manylinux should users and package
In several industry and science applications, a filesystem-like storage model such as HDF5 is the more appropriate solution for manipulating large amounts of data. However, suppose that data changes over time. In that case, it's not obvious how to track those different versions, since HDF5 is a binary format and is not well suited for traditional version control systems and tools.
In a previous post, we introduced the Versioned HDF5 library, which implements a mechanism for storing binary data sets in a versioned way that feels natural to users of other version control systems, and described some of its features. In this post, we'll show some of the performance analysis we did while developing the library, hopefully making the case that reading and writing versioned HDF5 files can be done with a nice, intuitive API while being as efficient as possible. The tests presented here show that using the Versioned HDF5 library results in reduced disk space usage, and further reductions in this area can be achieved with the use of HDF5/h5py-provided compression algorithms. That only comes at a cost of <10x file writing speed.