Introducing “Python Libraries You Might Not Know” (PLYMNK), a new series dedicated to raising awareness of awesome projects and their maintainers in the Python ecosystem. I’ve often run across a library and wondered, “How have I not heard of this before?” My goal is for PLYMNK to evoke this question from even the most seasoned Python developers.
To kick this off, I’m shouting out a library that enables many of the performant Python libraries that data scientists, machine learning engineers, and scientists use every day: CIBuildWheel.
Each PLYMNK post will answer 3 questions:
- What is it?
- How to use it?
- Who maintains it?
While many developers have heard of or used it, I think CIBuildWheel is one of the more underrated projects in the Python ecosystem, given how much of that ecosystem it enables. Its user base is impressive and includes Matplotlib, NumPy, and scikit-learn.
What is CIBuildWheel?
CIBuildWheel is a blessing for any Python developer who works with platform Python builds. Its main purpose is to let Continuous Integration/Continuous Deployment (CI/CD) systems like GitHub Actions or CircleCI build a platform library across various operating systems (Linux, macOS, Windows), compilers, and Python versions.
What does this mean for daily users of Python?
If you’ve ever developed in Python, you have most likely benefited from what CIBuildWheel provides to the libraries you use. Much of Python’s appeal is how easy it is to get started in interactive environments across many platforms. Schools often teach Python because of the quick, interactive feedback it gives students. However, this interactivity, which comes largely from Python being an interpreted language, has a drawback: many operations are slow compared with compiled code.
To work around this, developers use native extensions. Python extensions can be written in many languages: C, C++, Fortran, Rust, and more. With extensions, expensive computations can be offloaded to native code that runs significantly faster than Python. I won’t enumerate every way to build extensions, but some popular ones are Cython, pybind11, and cffi.
Python distributes packages in a format known as “wheels”. Essentially a zip file with some special content, a wheel contains everything needed to install and use a Python library in a project, including any compiled native extensions.
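To make the “zip file with special content” point concrete, here is a small sketch that uses only the standard library’s zipfile module. It builds a toy wheel in memory (the package name toypkg is made up, and its RECORD file is left empty for brevity), then reads it back as an ordinary zip archive and parses the tags from its .dist-info/WHEEL file. The read_wheel_info helper is my own, not part of any packaging tool, and it should also accept the bytes of a real wheel downloaded from PyPI, though it is not a full wheel validator.

```python
import io
import zipfile
from typing import Dict, List, Tuple


def build_toy_wheel() -> bytes:
    """Build an in-memory zip laid out like a tiny pure Python wheel.

    "toypkg" is a made-up package. A real RECORD file lists a hash and size
    for every file; it is left empty here to keep the toy short.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("toypkg/__init__.py", "def hello():\n    return 'hi'\n")
        zf.writestr("toypkg-0.1.0.dist-info/METADATA",
                    "Metadata-Version: 2.1\nName: toypkg\nVersion: 0.1.0\n")
        zf.writestr("toypkg-0.1.0.dist-info/WHEEL",
                    "Wheel-Version: 1.0\nRoot-Is-Purelib: true\nTag: py3-none-any\n")
        zf.writestr("toypkg-0.1.0.dist-info/RECORD", "")
    return buf.getvalue()


def read_wheel_info(data: bytes) -> Tuple[List[str], Dict[str, List[str]]]:
    """Open wheel bytes as a plain zip; return its file list and WHEEL metadata.

    Metadata values are collected into lists because keys such as Tag may repeat.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        files = zf.namelist()
        wheel_file = next(n for n in files if n.endswith(".dist-info/WHEEL"))
        meta: Dict[str, List[str]] = {}
        for line in zf.read(wheel_file).decode("utf-8").splitlines():
            key, _, value = line.partition(": ")
            meta.setdefault(key, []).append(value)
    return files, meta
```

For a wheel on disk, passing pathlib.Path(...).read_bytes() to read_wheel_info works the same way, since nothing about a wheel requires special tooling to open.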
So wait, how does CIBuildWheel help with this?
CIBuildWheel takes a Python project with a native extension, builds the extension for each specified platform (hence “platform build”), and creates a wheel for each individual platform. To fully understand how CIBuildWheel benefits everyday Python developers, it is important to understand the difference between “pure Python” and “platform Python” builds.
Pure Python vs Platform Python
What is a platform library? In Python, there are essentially two types of builds:
- Platform libraries
- Pure Python libraries
You can tell whether a Python package is a pure Python or platform library when you pip install it. For example, NumPy is a platform library because it depends on C (and Fortran!) extensions that make it much faster.
When installing NumPy on an Intel-based Mac with Python 3.8, you will see:
```console
❯ pip install numpy
Collecting numpy
  Downloading numpy-1.23.4-cp38-cp38-macosx_10_9_x86_64.whl (18.1 MB)
```
The piece to examine above is the package string (the wheel filename): numpy-1.23.4-cp38-cp38-macosx_10_9_x86_64.whl. Let’s break it down:
- numpy: The name of the package
- 1.23.4: The version of the package
- cp38: This package is compatible with CPython (not IronPython or PyPy) version 3.8
- cp38 (the second one): This package depends on the Application Binary Interface (ABI) of CPython 3.8
- macosx_10_9: This package is compatible with macOS 10.9 and later
- x86_64: Built for 64-bit x86 CPUs (i.e., Intel/AMD, not ARM)
- .whl: Signifies that this is a Python wheel
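The same breakdown can be done mechanically. Below is a minimal sketch of a wheel filename parser, assuming the standard layout {name}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl; the function and field names are my own. For real tooling, the packaging library already provides packaging.utils.parse_wheel_filename, which also validates and normalizes the fields.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    name: str
    version: str
    build: Optional[str]  # optional build tag, rarely used
    python_tag: str       # e.g. cp38 (CPython 3.8) or py3 (any Python 3)
    abi_tag: str          # e.g. cp38, or none for pure Python
    platform_tag: str     # e.g. macosx_10_9_x86_64, or any


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into its dash-separated fields."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    # Dashes inside names/versions are escaped to underscores in wheel
    # filenames, so splitting on "-" is safe.
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        name, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return WheelName(name, version, build, py, abi, plat)
```

For example, parse_wheel_filename("numpy-1.23.4-cp38-cp38-macosx_10_9_x86_64.whl").platform_tag returns "macosx_10_9_x86_64", matching the last items in the list above.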
Let’s contrast this with a pure Python library like Tabulate on the same system:
```console
❯ pip install tabulate
Collecting tabulate
  Downloading tabulate-0.9.0-py3-none-any.whl (35 kB)
```
Again, let’s break down this package string:
- tabulate: The name of the package
- 0.9.0: The version of the package
- py3: This package is compatible with any Python 3 implementation
- none: This package does not depend on the ABI of a specific Python version (because it contains only Python code)
- any: This package runs on any platform that supports Python 3 (macOS/Linux/Windows, ARM/x86, etc.)
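Put side by side, the two breakdowns suggest a rule of thumb: a wheel whose ABI tag is none and whose platform tag is any contains no compiled code. The sketch below encodes that tag-based heuristic (the helper names are hypothetical, not from any library). It also expands “compressed” tag sets such as py2.py3, which some pure Python wheels use to declare support for several Python versions at once.

```python
from itertools import product
from typing import Set, Tuple

Tag = Tuple[str, str, str]  # (python tag, abi tag, platform tag)


def wheel_tags(filename: str) -> Set[Tag]:
    """Return every (python, abi, platform) triple a wheel filename declares.

    The last three dash-separated fields are always the tags, with or without
    an optional build tag. Each field may be a dot-separated compressed set,
    such as py2.py3, which expands to one triple per combination.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    py, abi, plat = filename[: -len(".whl")].split("-")[-3:]
    return set(product(py.split("."), abi.split("."), plat.split(".")))


def is_pure_python(filename: str) -> bool:
    """Heuristic: pure Python means no ABI dependency and any platform."""
    return all(abi == "none" and plat == "any" for _, abi, plat in wheel_tags(filename))
```

With these, is_pure_python("tabulate-0.9.0-py3-none-any.whl") is True and is_pure_python("numpy-1.23.4-cp38-cp38-macosx_10_9_x86_64.whl") is False.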
From the breakdown above, we can see that the wheel pip installed for NumPy is specific to my Intel-based (x86) Mac (macOS) and Python 3.8. This wheel will not install (or work) on a different platform (like ARM) or a different Python version (like 3.9).
In contrast, I could take the wheel file for Tabulate and install it with no issues on any of the following, because it is a pure Python package:
- A Linux server running Python 3.7
- A Windows laptop running Python 3.9
- A Jetson Nano (ARM) or a Mac laptop with Apple silicon (ARM)
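Here is a deliberately simplified model of the check pip performs before choosing a wheel: the interpreter supports some set of tags, and a wheel is installable if it declares at least one of them. The supported_tags helper and the exact platform strings are assumptions for illustration. Real pip derives this list from packaging.tags.sys_tags(), which also knows, for example, that a macosx_10_9 wheel runs on newer macOS releases and which manylinux variants a Linux system accepts.

```python
from itertools import product
from typing import Set, Tuple

Tag = Tuple[str, str, str]  # (python tag, abi tag, platform tag)


def supported_tags(py_version: str, platform: str) -> Set[Tag]:
    """A toy set of tags a CPython interpreter might accept.

    py_version is e.g. "38" for Python 3.8; platform is one exact platform
    tag. Real pip accepts many more tags, in priority order.
    """
    cp = f"cp{py_version}"
    return {
        (cp, cp, platform),         # compiled for exactly this CPython and platform
        (cp, "none", "any"),        # CPython-specific but pure Python
        ("py3", "none", platform),  # platform-specific, no ABI dependency
        ("py3", "none", "any"),     # pure Python, runs anywhere
    }


def is_compatible(filename: str, supported: Set[Tag]) -> bool:
    """A wheel is installable if any tag it declares is supported."""
    py, abi, plat = filename[: -len(".whl")].split("-")[-3:]
    declared = set(product(py.split("."), abi.split("."), plat.split(".")))
    return not declared.isdisjoint(supported)
```

Under this model, the NumPy wheel above is compatible with supported_tags("38", "macosx_10_9_x86_64") but not with supported_tags("39", "macosx_10_9_x86_64") or an arm64 platform tag, while the Tabulate wheel is compatible with all of them.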
So why is CIBuildWheel important?
CIBuildWheel makes it possible for Python library developers to produce platform builds for a crazy number of operating systems, CPU architectures, Python versions, and more. It is used both to test these builds in many environments (CI) and to deploy them to PyPI (CD).
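To get a feel for why automation matters here, the toy sketch below expands a build matrix of Python versions times platforms and filters it with glob patterns. The identifiers imitate the style cibuildwheel uses for its build selectors (for example cp38-manylinux_x86_64), and the build/skip arguments are loosely modeled on its CIBW_BUILD and CIBW_SKIP options. The version and platform lists are assumptions, and this is not cibuildwheel’s actual implementation.

```python
from fnmatch import fnmatchcase
from itertools import product
from typing import List

# Hypothetical matrix inputs; identifiers imitate cibuildwheel's
# "<python tag>-<platform>" build-selector style, e.g. cp38-manylinux_x86_64.
PYTHONS = ["cp38", "cp39", "cp310", "cp311"]
PLATFORMS = [
    "manylinux_x86_64",
    "manylinux_aarch64",
    "macosx_x86_64",
    "macosx_arm64",
    "win_amd64",
]


def build_matrix(pythons: List[str], platforms: List[str],
                 build: str = "*", skip: str = "") -> List[str]:
    """Expand every (python, platform) pair, then filter with glob patterns.

    `build` and `skip` are space-separated glob patterns, loosely modeled on
    cibuildwheel's CIBW_BUILD and CIBW_SKIP options (a sketch, not its code).
    """
    identifiers = [f"{py}-{plat}" for py, plat in product(pythons, platforms)]
    selected = [i for i in identifiers
                if any(fnmatchcase(i, pattern) for pattern in build.split())]
    return [i for i in selected
            if not any(fnmatchcase(i, pattern) for pattern in skip.split())]
```

Even this short list yields 20 builds, and skip="*-win_amd64 *-manylinux_aarch64" trims it to 12. Each one is a separate compile-and-test job, which is exactly the bookkeeping CIBuildWheel takes off your hands.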
For everyday Python developers, the combination of pip, PyPI, and CIBuildWheel lets you use native extensions, and get their performance boost, without even knowing that the library you are using contains native code. Unless, of course, you look at the wheel filenames pip downloads, like I do.
How do I use it?
First, just like with that second piece of cake, ask yourself whether you need it. The CI/CD process for Python platform builds involves a great deal of complexity.[1] While CIBuildWheel makes this easier (dare I say possible), if a pure Python library can do the job, you’re much better off with one, since it is much easier to keep compatible across platforms.
However, if you have the need for speed, or simply want to try it out, the easiest place to start is GitHub Actions.
Below is an example workflow from the CIBuildWheel GitHub page:
```yaml
name: Build

on: [push, pull_request]

jobs:
  build_wheels:
    name: Build wheels on ${{ matrix.os }}
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-20.04, windows-2019, macOS-11]

    steps:
      - uses: actions/checkout@v3

      # Used to host cibuildwheel
      - uses: actions/setup-python@v3

      - name: Install cibuildwheel
        run: python -m pip install cibuildwheel==2.11.2

      - name: Build wheels
        run: python -m cibuildwheel --output-dir wheelhouse
        # to supply options, put them in 'env', like:
        # env:
        #   CIBW_SOME_OPTION: value

      - uses: actions/upload-artifact@v3
        with:
          path: ./wheelhouse/*.whl
```
The workflow above will:
- Check out the GitHub repository containing the workflow
- Install Python and CIBuildWheel
- Build wheels for the package on Linux (Ubuntu), Windows, and macOS
- Upload the built wheels (with native code) as an artifact for download
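Once the workflow finishes and you download the uploaded artifact, a quick sanity check is to group the wheels by platform tag and confirm that every operating system in the matrix produced something. The sketch below assumes the wheels have been unzipped into a local wheelhouse directory; the function name is hypothetical.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def summarize_wheelhouse(wheelhouse: str) -> Dict[str, List[str]]:
    """Group the .whl files in a directory by their platform tag."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for wheel in sorted(Path(wheelhouse).glob("*.whl")):
        # The platform tag is always the last dash-separated field of the name
        # (Path.stem drops only the final ".whl" suffix).
        platform_tag = wheel.stem.split("-")[-1]
        groups[platform_tag].append(wheel.name)
    return dict(groups)
```

Calling summarize_wheelhouse("wheelhouse") returns a dict mapping each platform tag to the wheel filenames built for it, so a missing key means that platform produced no wheels.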
For a more complex example that includes different compilers and options, see the example I built for the SmartRedis library, which wraps a C++ Redis client with pybind11 to expose a Python interface.
For even more help and better usage examples, see the CIBuildWheel documentation.
Who maintains it?
Go give them some love, and next time you make a plot or load in a CSV, think about all the work that went into making that operation fast as hell.