How many Python packages are correctly versioned?

DDD
Published: 2024-10-04 06:11:29

The other day, while looking into a database of vulnerabilities in Python packages, I realized that some of the package versions in it could not be easily parsed and compared with other version strings, because they did not follow the standards for Python versioning: either the old PEP 440 or the Version Specifiers specification that superseded it. So I started wondering how common this is. How many packages on the Python Package Index actually have valid versions?

The obvious answer was: go check. So I created a new virtual environment, installed requests, and wrote a multiprocessing script to query the PyPI API for every version string used by every package. It took a few hours even running on all cores, but by the end I had retrieved 6,057,703 version strings from 545,018 packages, stored in a neat SQLite database. You can find it on Kaggle.
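The collection step could look roughly like the sketch below. It is not the script used for this post: it swaps requests for the standard library's urllib, leaves out multiprocessing, and the function names and table schema are made up for illustration. It assumes PyPI's JSON API at https://pypi.org/pypi/<name>/json, whose response contains a releases mapping keyed by version string.

```python
import json
import sqlite3
import urllib.request

# PyPI JSON API endpoint for a single project.
PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def extract_versions(payload):
    """Return every version string listed in a PyPI JSON API response."""
    return sorted(payload.get("releases", {}))


def fetch_versions(name):
    """Download the metadata for one package (needs network access)."""
    url = PYPI_JSON_URL.format(name=name)
    with urllib.request.urlopen(url, timeout=30) as response:
        return extract_versions(json.load(response))


def store_versions(conn, name, versions):
    """Insert one row per (package, version) pair; the schema is hypothetical."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS versions (package TEXT, version TEXT)"
    )
    conn.executemany(
        "INSERT INTO versions (package, version) VALUES (?, ?)",
        [(name, version) for version in versions],
    )
    conn.commit()
```

Calling store_versions(conn, "requests", fetch_versions("requests")) on a sqlite3 connection would record every release of a single package. Covering the whole index means repeating that for each project name, which is where running the downloads in parallel pays off.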

Next came parsing. I found two libraries that promised to validate a version string for compliance:

  • pepver: "PEP-440 version parsing, interpretation and manipulation"
  • parver: "parver allows parsing and manipulation of PEP 440 version numbers"

To be fair, both libraries still target PEP 440, which has since been superseded by the Version Specifiers specification. I will keep that in mind, especially when looking at the strings marked as non-compliant.

After another couple of hours of intense multiprocessing, I had updated my database with two boolean columns indicating whether each string parsed successfully with each of the two packages (also on Kaggle).
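As a rough illustration of that step, the sketch below adds one 0/1 column per validator to a table of version strings. The two check functions are deliberately simplified placeholders, not the real pepver or parver parsers, and the table and column names are assumptions. In practice each check would wrap the corresponding library's parser and return False when parsing fails.

```python
import re
import sqlite3


def strict_check(version):
    """Placeholder validator: accepts only dotted numbers such as 1.2.3."""
    return re.fullmatch(r"[0-9]+(\.[0-9]+)*", version) is not None


def lenient_check(version):
    """Placeholder validator: also accepts a trailing .devN or .postN."""
    pattern = r"[0-9]+(\.[0-9]+)*(\.(dev|post)[0-9]+)?"
    return re.fullmatch(pattern, version) is not None


def flag_versions(conn, checks):
    """Add one integer column per validator and fill it for every row.

    `checks` maps a column name to a callable; column names are trusted
    here because they are interpolated directly into the SQL statement.
    """
    rows = conn.execute("SELECT rowid, version FROM versions").fetchall()
    for column, check in checks.items():
        conn.execute(f"ALTER TABLE versions ADD COLUMN {column} INTEGER")
        conn.executemany(
            f"UPDATE versions SET {column} = ? WHERE rowid = ?",
            [(int(check(version)), rowid) for rowid, version in rows],
        )
    conn.commit()
```

Keeping one column per library makes it easy to query later for the strings on which the two validators disagree.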

The results

For a quick summary of my findings:

  • out of 6,057,703 version strings, 5,542 (0.09%) were found defective;

  • out of 545,018 packages, 1,285 (0.24%) had at least one defective version string.
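Figures like these can be pulled out of the database with a single aggregate query. The sketch below assumes a hypothetical versions table with a package column and an is_valid column holding 0 or 1; the real schema may differ.

```python
import sqlite3

SUMMARY_SQL = """
SELECT
    COUNT(*),
    SUM(is_valid = 0),
    COUNT(DISTINCT package),
    COUNT(DISTINCT CASE WHEN is_valid = 0 THEN package END)
FROM versions
"""


def summarize(conn):
    """Count defective version strings and the packages that contain them."""
    n_versions, bad_versions, n_packages, bad_packages = conn.execute(
        SUMMARY_SQL
    ).fetchone()
    return {
        "bad_versions": bad_versions,
        "bad_versions_pct": 100 * bad_versions / n_versions,
        "bad_packages": bad_packages,
        "bad_packages_pct": 100 * bad_packages / n_packages,
    }
```

The percentages quoted above follow from the raw counts: 100 * 5,542 / 6,057,703 rounds to 0.09, and 100 * 1,285 / 545,018 rounds to 0.24.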

So overall the state of the repository seems pretty healthy! The version strings rejected by both libraries are of all kinds. Some use suffixes in a non-standard way but otherwise follow the semantic versioning paradigm, while others are just commit hashes or strings of words and numbers.

The cases where the two libraries disagree are more interesting. These are the ones that pepver does not validate but parver does:


0.0.2.R
0.0.2.R3
0.0.2.R4
0.0.2.R5
0.0.2.R6
0.0.2.R7


In this case, I would say pepver is in the wrong. Per PEP 440 and the current versioning rules, r is an accepted spelling of the post-release tag (normalized to post), and letters are case-insensitive. So 0.0.2.R3 effectively normalizes to 0.0.2.post3 and is perfectly legal.
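To make that normalization concrete, here is a minimal sketch that rewrites the alternative post-release spellings (post, rev or r, in any case) at the end of a version string. It is a hypothetical helper, not the logic of either library, and it ignores epochs, pre-releases, dev releases and local versions.

```python
import re

# A digit, then an optional separator, a post-release spelling, another
# optional separator and an optional number, all at the end of the string.
_POST_RE = re.compile(r"(?<=[0-9])[-_.]?(?:post|rev|r)[-_.]?([0-9]*)$")


def normalize_post(version):
    """Rewrite a trailing post-release tag to the canonical .postN form."""

    def canonical(match):
        # A missing number is treated as 0.
        number = int(match.group(1) or 0)
        return f".post{number}"

    # Lower-casing first makes the match case-insensitive.
    return _POST_RE.sub(canonical, version.strip().lower(), count=1)


print(normalize_post("0.0.2.R3"))  # 0.0.2.post3
print(normalize_post("0.0.2.R"))   # 0.0.2.post0
```

Strings without a trailing post-release tag, such as 1.0, pass through unchanged.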

Meanwhile, here is a random sample of versions that pepver admits but parver does not:


0.0.1dev-20141025
1.5.0-dev-618
0.3.4.dev.20180830
1.15.0-dev-1552
1.4.0-dev-510
0.0.9.dev-20121012
0.2dev-20101203
0.3.4.dev.20180905
1.15.0-dev-1606
0.2.1dev-20110627
1.12.0-dev-1379
1.1.1-dev-275
1.3.1-dev-427


What they all have in common is another number (occasionally a date) after the dev suffix, set off by a separator. This is indeed also wrong, at least going by the wording of the specification, which describes an optional separator before dev but none between dev and the number that follows. So again parver seems right.

Anyway, that pretty much satisfied my original curiosity, and reassured me that in the vast majority of cases the standard methods of parsing and comparing versions will be sufficient. Even among the non-standard versions it is often fairly easy to identify an order, as the deviations are minimal. Still, it is useful to be aware of all the quirks of the official versioning scheme, and to know when we can or cannot rely on it.
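For the non-standard strings, one crude way to recover an order is to compare only the numbers they contain. The helper below is a hypothetical fallback, not something either library provides, and it ignores what the suffixes mean: a dev build would sort after the matching final release, whereas the specification puts it before.

```python
import re


def fallback_key(version):
    """Crude sort key: every run of digits, in order, as a tuple of ints."""
    return tuple(int(chunk) for chunk in re.findall(r"[0-9]+", version))


nightlies = ["1.15.0-dev-1606", "1.5.0-dev-618", "1.15.0-dev-1552"]
print(sorted(nightlies, key=fallback_key))
# ['1.5.0-dev-618', '1.15.0-dev-1552', '1.15.0-dev-1606']
```

Comparing integers rather than raw text is what puts 1.5.0 before 1.15.0 here; a plain string sort would get that wrong.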

source:dev.to