Patching the libwebp vulnerability across the Python ecosystem

Published 2023-10-25 by Seth Larson

This critical role would not be possible without funding from the OpenSSF Alpha-Omega Project. Massive thank-you to Alpha-Omega for investing in the security of the Python ecosystem!

Vulnerabilities in extremely widely used software components, like CVE-2023-4863 affecting libwebp, have shown the far-reaching effects that vulnerabilities in bundled open source software can have. libwebp was bundled with an uncountable number of software installations, including iOS, all browsers, all Electron apps, and more.

Python's ecosystem of packages is no different: many projects relied on libwebp for processing images, and due to the simple nature of the vulnerability it is likely that many usages of those libraries were also unsafe. To learn about mobilizing an entire upstream open source software ecosystem to patch, I set out to do just that for libwebp and documented the experience.

I also wanted to learn about ways that consumers of packages can do their part without putting additional burden on maintainers during a time of mass-patching or mitigating vulnerabilities which have active exploits. This work started shortly after the CVE was associated with libwebp but wasn't published until now to give projects a chance to patch their releases.

Mitigating a vulnerability on this scale requires the following steps:

  • Determine projects which bundle a vulnerable version of libwebp
  • Contact each project which hasn't already patched
  • Wait for releases to be published
  • Notify users about the vulnerable bundled component

Python wheels that include binary extensions tend to bundle shared libraries that they depend on in order to work automatically when installed. There are tools that make this process of bundling easier like auditwheel, delvewheel, and delocate.
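Since a wheel is a zip archive, bundled libraries can be spotted by listing its contents. The following is a minimal sketch rather than a description of how auditwheel, delvewheel, or delocate work internally; the helper name `bundled_libraries` and the file names in the fake wheel are made up for illustration.

```python
import io
import re
import zipfile

# Matches file names that look like shared or static libraries,
# including versioned names such as "libwebp.so.7.1.8".
SHARED_LIB_RE = re.compile(r"\.(a|so|dylib|dll)(\.[^/]+)?$")


def bundled_libraries(wheel_file):
    """Return paths inside a wheel (a zip archive) that look like libraries."""
    with zipfile.ZipFile(wheel_file) as wheel:
        return sorted(name for name in wheel.namelist() if SHARED_LIB_RE.search(name))


# Build a tiny fake wheel in memory so the sketch runs without downloads.
# Every file name below is hypothetical.
fake_wheel = io.BytesIO()
with zipfile.ZipFile(fake_wheel, "w") as wheel:
    wheel.writestr("example/__init__.py", "")
    wheel.writestr("example/_imaging.cpython-311-x86_64-linux-gnu.so", b"")
    wheel.writestr("example.libs/libwebp-2fd3cdca.so.7.1.8", b"")
fake_wheel.seek(0)

libs = bundled_libraries(fake_wheel)
print(libs)
```

Note that this simple pattern also picks up the package's own compiled extension module, so the result still needs a human to decide which files are third-party libraries.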

Finding projects with vulnerable libwebp

The initial difficulty of this problem is that, unlike Python package dependencies, shared libraries bundled in wheels aren't included in any Python packaging metadata.

In order to find libwebp shared libraries inside of Python packages on PyPI via a queryable interface I downloaded Tom Forbes' dataset on PyPI locally and queried using DuckDB. Thanks to Tom for putting this together and helping me get started with the dataset.

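-- ANY_VALUE keeps one example path per project, since DuckDB requires
-- non-grouped columns to be aggregated.
SELECT project_name, ANY_VALUE(path) AS path
FROM '*.parquet'
WHERE regexp_matches(path, '/(lib)?webp[^/]*\.(a|so|dylib|dll)')
GROUP BY project_name;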
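To see what the pattern in the query above does and does not match, here is a small Python sketch that applies the same regular expression to a handful of rows. The project names and paths are hypothetical stand-ins for the dataset, and `re.search` is used on the assumption that DuckDB's `regexp_matches` succeeds on a partial match.

```python
import re

# Same pattern as the DuckDB query above.
WEBP_RE = re.compile(r"/(lib)?webp[^/]*\.(a|so|dylib|dll)")

# Hypothetical (project_name, path) rows standing in for the dataset.
rows = [
    ("example-imaging", "packages/example-imaging/example_imaging.libs/libwebp-2fd3cdca.so.7.1.8"),
    ("example-imaging", "packages/example-imaging/example_imaging/__init__.py"),
    ("example-video", "packages/example-video/example_video/.dylibs/libwebpmux.3.dylib"),
    ("example-docs", "packages/example-docs/docs/webp-support.md"),
]

# Collect each project once if any of its files matches the pattern.
projects = sorted({name for name, path in rows if WEBP_RE.search(path)})
print(projects)
```

The documentation file mentioning "webp" is skipped because its extension isn't one of the library extensions, while both the Linux and macOS style library names are caught.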

From examining the output of this query and combining the result with a dataset about downloads, the following projects (among others) get highlighted:

Project       Downloads/month
------------- ---------------
Pillow        70,053,401
opencv-python 21,714,182*
pyproj        8,106,066
Fiona         6,514,863
rasterio      1,953,666

* Sum of downloads for all opencv-*-python flavors.

Across all the projects, total monthly downloads exceed 100,000,000, and that's a lot of downloads! To figure out the relative magnitude compared to other bundled shared libraries, I ran the following query, which attempts to normalize names of shared libraries so they can be grouped together more easily:

SELECT
  regexp_replace(
    regexp_replace(
      regexp_extract(path, '([^/]+)$'),
      '\.[0-9\.]*[0-9]', '.X'
    ),
    '-[a-f0-9]{8}\.so', '.so'
  ) AS lib,
  COUNT(DISTINCT project_name) AS projects,
  LIST(project_name)
FROM '*.parquet'
WHERE regexp_matches(path, '/[^/]*\.(a|so|dylib|dll)(\.[^/]+)?$')
GROUP BY lib
ORDER BY projects DESC
LIMIT 1000;
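The normalization in the query above can be easier to follow when run on a few names. This is a rough Python sketch of the same two substitutions, not the query itself; the function name `normalize_lib_name` and the sample rows are hypothetical, and `count=1` assumes DuckDB's `regexp_replace` only replaces the first match unless told otherwise.

```python
import re
from collections import defaultdict


def normalize_lib_name(path):
    """Collapse version numbers and 8-character hashes in a library file name."""
    name = path.rsplit("/", 1)[-1]  # keep only the file name
    # Replace the first run of version digits with ".X".
    name = re.sub(r"\.[0-9\.]*[0-9]", ".X", name, count=1)
    # Drop a hash suffix such as "-a34b3233" that sits right before ".so".
    name = re.sub(r"-[a-f0-9]{8}\.so", ".so", name, count=1)
    return name


# Hypothetical (project_name, path) rows standing in for the dataset.
rows = [
    ("proj-a", "proj_a.libs/libz-1a2b3c4d.so.1.2.11"),
    ("proj-b", "proj_b.libs/libz.so.1"),
    ("proj-b", "proj_b.libs/libgomp-a34b3233.so.1.0.0"),
]

# Group distinct projects by normalized library name.
projects_by_lib = defaultdict(set)
for project, path in rows:
    projects_by_lib[normalize_lib_name(path)].add(project)

counts = {lib: len(projects) for lib, projects in projects_by_lib.items()}
print(counts)
```

Both spellings of zlib end up under `libz.so.X`, which is what lets the per-library project counts be compared at all.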

With some manual labeling and data massaging I was able to end up with this table:

Bundled Library               Name                    Min Projects
----------------------------- ----------------------- ------------
libgcc_s.so.X                 GCC Runtime             920
libgomp.so.X                  GNU OpenMP              747
libstdc++.so.X                GNU C++                 527
libz.so.X                     zlib                    487
libgfortran.so.X              libgfortran             374
libquadmath.so.X              GCC Quad Precision Math 372
libcrypto.so.X / libssl.so.X  OpenSSL (or others)     341
liblzma.so.X                  Xz Utils                235
libbz2.so.X                   Bzip2                   200
libselinux.so.X               SE Linux                189

Compare the above values to libwebp, which was bundled by around 50 projects. Note that the last column is "min projects": putting this table together was a manual task that I didn't want to spend too much time on, so the counts are lower bounds and the difference in magnitude is at least this large. The counts above cover all extensions (.so, .dll, .a, and .dylib), not only .so files.

So I had a list of projects which were likely to be impacted. At this point there was not much information about exactly which libwebp functions were impacted, only that the vulnerability could be triggered by loading a maliciously crafted image. This meant I didn't have the full information available to know whether a project's usage was definitely vulnerable. This highlights the importance of having as much information as possible in the upstream CVE to avoid churn.

Reaching out to each project and waiting for releases

I saw that Pillow had quickly updated their own wheels, so I didn't have to reach out to the maintainers of that project. For other projects, however, I had to manually find the security contact information and then reach out. Finding security contact information was difficult for most of the projects, as they didn't have a SECURITY.md file in their repositories and some had no contact information listed in package metadata at all.

Looking at OpenSSF Scorecard data for the top 5,000 projects on PyPI by downloads, there hasn't been a noticeable change in the percentage of projects which have a defined security policy on GitHub (19% in July 2022, 22% today).

If maintainers plan to handle some security reports I recommend having a short security policy that references how to get in contact, usually via GitHub Security Advisories on the repository or via email.

This was the message template that I used when initially contacting maintainers. I've included it here because I've since learned a few things I want to change about how I reached out to folks:

Hello,

I'm Seth Larson, Security Developer-in-Residence at the PSF. I'm emailing you because there was no documented security policy for (project), if this isn't the correct channel to contact for these types of issues let me know.

I am contacting you because I believe (project) may be vulnerable to CVE-2023-4863 due to bundling a vulnerable version of libwebp in project wheels. v1.3.2 of libwebp fixes the vulnerability. CVE-2023-4863 is on CISA Known Exploited Vulnerabilities list meaning there are active exploits in the wild.

My recommendation to fix the vulnerability:

  • Create a new patch release of (project) with the fixed version of libwebp.
  • In the changelog, mention that the fix upgrades the version of libwebp to not be vulnerable to CVE-2023-4863.

You don't need a new CVE for this fix since the vulnerability exists in a bundled component. After that I can submit an advisory to the PyPA Advisory database on your behalf. Let me know if you have questions.

I received replies from each project that I reached out to. Many folks were appreciative of the tap on the shoulder to upgrade their dependency:

"The Python community is fortunate to have you on the beat."

💜

I did however want to note the following through this process:

  • This report added stress to maintainers, likely due to the known exploited status of this vulnerability. In the future there may be a better way of phrasing this.
  • Introducing myself as Security Developer-in-Residence might have come off as "I'm telling you what to do" and that wasn't my intention. This role isn't to boss anyone around. My hope is that the merits of my suggestions, along with concurrent time investment into solving the problem together, will make the process easier.

There were also a few reasons why a patch-and-release was difficult for some projects, among them having no access to proper build platforms and being blocked on dependencies that were bundling libwebp such as SDL.

Notifying users about the vulnerable bundled component

Because libwebp isn't listed in any packaging metadata, it's not currently possible for vulnerability detection tooling to alert on insecure versions of libwebp. Additional work is needed before vulnerability detection tooling can catch these cases.

For this task, I added entries in the PyPA Advisory database so tools like pip-audit will be able to detect vulnerable bundling of libwebp until there's a standard for encoding bundled projects into Python packaging metadata.

# Use pip freeze to see we have an insecure
# version of Pillow (10.0.0) installed.
$ python -m pip freeze
...
Pillow==10.0.0

# Install pip-audit and run it against the current environment:
$ python -m pip install pip-audit
$ pip-audit

Found 3 known vulnerabilities in 1 package
Name   Version ID                  Fix Versions
------ ------- ------------------- ------------
pillow 10.0.0  PYSEC-2023-175      10.0.1
pillow 10.0.0  GHSA-j7hp-h8jx-5ppr 10.0.1
pillow 10.0.0  GHSA-56pw-mpj4-fxww 10.0.1

# Upgrade Pillow, then see that pip-audit is happy now!
$ python -m pip install --upgrade Pillow
$ pip-audit

No known vulnerabilities found

You could configure pip-audit to use the OSV database as a source. For Pillow that would work as expected, because OSV also uses GitHub Security Advisories (GHSA) as a source, but not all projects I contacted created a GitHub Security Advisory. For those cases a PYSEC vulnerability ID was needed.
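To make the role of an advisory entry concrete, here is a deliberately simplified sketch of the kind of comparison an auditing tool performs. It is not how pip-audit is implemented: the `ADVISORIES` table, `parse_version`, and `audit` are hypothetical names, the advisory ID and fix version are copied from the pip-audit output above, and every version below the fix is assumed to be affected.

```python
# Advisory data copied from the pip-audit output above. A real tool would
# load entries from an advisory database rather than hard-coding them.
ADVISORIES = {
    "pillow": [{"id": "PYSEC-2023-175", "fixed": "10.0.1"}],
}


def parse_version(version):
    """Naive parser that only handles plain dotted versions such as '10.0.1'."""
    return tuple(int(part) for part in version.split("."))


def audit(installed):
    """Return (name, version, advisory id, fix version) for vulnerable packages."""
    findings = []
    for name, version in installed.items():
        # Advisory keys are lowercase, so normalize the installed name.
        for advisory in ADVISORIES.get(name.lower(), []):
            if parse_version(version) < parse_version(advisory["fixed"]):
                findings.append((name, version, advisory["id"], advisory["fixed"]))
    return findings


before = audit({"Pillow": "10.0.0"})
after = audit({"Pillow": "10.0.1"})
print(before)
print(after)
```

The point of the sketch is that the match happens on the Python package name and version, which is why an advisory has to exist for the package itself: nothing in the installed metadata mentions libwebp.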

Bundling, debundling, and software repositories

Thinking about bundling generally, there are a few different types of software repositories that each have their own behavior:

  • Arbitrary (Debian, Red Hat, Conda)
  • Ecosystem-specific (PyPI, NPM, RubyGems)
  • Applications (Dockerhub, Quay, App Stores)

The Arbitrary software repositories can debundle shared libraries much more easily than the others because they're capable of installing and managing arbitrary software files rather than only files from a certain ecosystem. The libwebp vulnerability tended to affect applications and ecosystem-specific software packages. Patching an arbitrary software repository only requires patching in one place and ensuring dependent packages aren't restricting its use.

It's interesting to compare Conda to PyPI, as both are known for their Python packaging ecosystems; however, it's much more likely for "upstream" development of Python packages to land in PyPI and then be redistributed via Conda.

What I learned

Overall I learned a lot from this exercise:

  • Detecting bundled shared libraries is imperfect. We need a method to identify and version them, and that method needs to be automatic for broad adoption by Python packages.
  • Downstream users may need to install from source if upstream isn't able to patch directly. Currently, there's no way to "unbundle" a wheel or to patch it again with an up-to-date shared library from the system.
  • Contacting maintainers is a manual effort, and many projects don't have a security policy.

That's all for this week! 👋 If you're interested in more you can read next week's report or last week's report.

Thanks for reading!
— Seth

This work is licensed under CC BY-SA 4.0

