
The weird state of the PyPI ecosystem in 2025

There are more lessons to be learned from my Pipreqs dependency confusion

Intro

Just over two years ago, I got sidetracked into a weird dependency confusion story while trying to put together a requirements.txt file for an internship challenge web app I had slapped together in a few days. The full story is available here.

TL;DR

Like every great professional, I just googled my way out of this. Some wise people advocated using some third-party tool called pipreqs instead of the tried-and-true pip freeze, ’cause who needs virtual environments, am I right?

Well, I installed it, ran it, and got the wrong package in my requirements.txt.

error.png

More importantly, it looked like I was not alone in my misery:

pipreqs_issues_2023.png

My mind started racing: was there something wrong with pipreqs? Could this behavior be exploited?

Yep, there is an issue

Weird world of PyPI packages

Remember how you wanted to install the jwt package using pip install jwt, only to find out that it is actually called pyjwt on PyPI and you had installed some crap instead?

Well, there is technically no requirement for a package’s name on PyPI and the names of the modules it exports to be the same. Nothing stops you from publishing a PyPI package called my_package_python with the following structure:

```
my_package_python/
├── pyproject.toml
├── README.md
├── my_package/
│   ├── __init__.py
│   └── code.py
└── tests/
    └── test_main.py
```

Once installed, it is imported in actual Python code under the module name, not the PyPI package name:

```python
from my_package import code
```

In fact, a lot of popular projects follow this structure, since the cool names were already taken on PyPI.

CVE-2023-31543

The main advertised feature of pipreqs is “smart” resolution of the packages imported by the project code. How is it done? Well, it just looks up every name from the import statements on PyPI. See the issue here?

Given our example package above, pipreqs will actually try to look up and add the my_package package from PyPI, not my_package_python.

Nothing happens if there is no such package on PyPI. But if someone creates a malicious my_package on PyPI, pipreqs will happily add it to your requirements.txt file ;)

Together with the pipreqs developers, we scrambled together a small fix that was supposed to prevent the remote lookup of package names that were already installed locally, and called it a day.

Did it get any better?

Fast-forward two years. I got curious and visited the pipreqs issues page. I wish I hadn’t. It looks like the fix did not help at all; the issue persists in one form or another:

pipreqs_issues_2025.png

Is Pipreqs still used?

Unfortunately, the pipreqs suggestion is firmly stuck in a 2015 Stack Overflow answer.

google_results.png

stackoverflow.png

To my unpleasant surprise, it is now also forever stuck as a viable option somewhere in the back neurons of LLMs:

gpt_results.png

“Only includes libraries actually imported in the code” — no shit, Sherlock!

According to pypistats.org data, the tool exploded in popularity in early 2023 and then dropped dramatically around the time I found the issue and submitted the CVE. That was most probably because it failed to do its job properly and produce a working requirements.txt; I don’t think my finding a critical vulnerability did anything to make the public stop using it.

Nonetheless, pipreqs still seems to be gaining some traction: as of 2025, its monthly downloads have more than doubled (500k → 1.25M).

pipreqs_downloads.png

Well, my PoC dependency confusion packages should at least be long forgotten, right? Wait… WTF???

My dependency confusion packages were not, in fact, forgotten

Monthly downloads have increased fivefold since I published the package in 2023. I currently happen to poison ~50,000 installations a month with my “Gotcha” print message…

jwt_downloads.png

Who is to blame here? It could not have been pipreqs, for sure: its download count dropped by a factor of 5 when I published the research and still hasn’t recovered. Was I fighting windmills this whole time?

I think the answer might be really simple. There were no code vulnerabilities behind this surge in downloads. It was people confusing the package names all along. I have personally fallen victim to the “pip install jwt instead of pip install pyjwt” trick at least a dozen times already.

Python really needs a complete redesign of import naming.

It won’t get any better. For now, at least

You see, the Python community does acknowledge the problem in numerous PEPs, including PEP 708, but retrofitting a fix into a system used by millions is obviously not an easy job to pull off.

image.png

Conclusion

I am fighting windmills indeed. Go grab pip freeze or something. Don’t use third-party tools to generate your dependencies.
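The reason pip freeze sidesteps the whole problem is that it lists what is actually installed, using each package’s own metadata, instead of guessing package names from import statements. A rough approximation of that idea with `importlib.metadata` is sketched below; `frozen_requirements` is a hypothetical helper for illustration, not a replacement for pip freeze.

```python
from importlib.metadata import distributions


def frozen_requirements() -> list[str]:
    """Pin every installed distribution as 'Name==version', roughly like pip freeze.

    Illustration only: unlike pip freeze, this ignores editable installs,
    direct URL / VCS references and the packages pip freeze hides unless --all is given.
    """
    pins = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip broken installs with missing metadata
            pins.add(f"{name}=={version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    print("\n".join(frozen_requirements()))
```

Run inside the project’s virtual environment, this yields the packages you really installed (pyjwt, not jwt), whatever their import names happen to be.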

This post is licensed under CC BY 4.0 by the author.