Bump chardet from 5.2.0 to 7.0.1 by dependabot[bot] · Pull Request #1644 · cms-dev/cms

dependabot · 2026-03-05T13:44:41Z

Bumps chardet from 5.2.0 to 7.0.1.

Release notes

7.0.1

Fixes

Fixed false UTF-7 detection of SHA-1 git hashes (#324, fixing #323) — requirements files with VCS pins (e.g., +4bafdea3...) were misdetected as UTF-7, breaking tools like tox

Fixed _SINGLE_LANG_MAP missing aliases for single-language encoding lookup (e.g., big5 → big5hkscs)

Fixed PyPy TypeError in UTF-7 codec handling
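
To illustrate why a hash-like token can be mistaken for UTF-7 (the first fix above): in UTF-7, `+` opens a base64-encoded run that an optional `-` closes, and every hexadecimal digit is also a valid base64 character. The sketch below is only a structural check written for this note, not chardet's actual heuristic; `has_utf7_shift` and the `example-pkg` name are hypothetical.

```python
import re

# '+' followed by one or more base64 characters, optionally closed by '-'.
UTF7_SHIFT = re.compile(rb"\+[A-Za-z0-9+/]+-?")


def has_utf7_shift(data: bytes) -> bool:
    """Return True if data contains something shaped like a UTF-7 shifted run."""
    return UTF7_SHIFT.search(data) is not None


# A genuine UTF-7 run: "+AKM-" encodes U+00A3 (the pound sign).
genuine = b"+AKM-1"

# Hypothetical pin using the abbreviated hash fragment quoted in the fix note;
# the package name is made up for illustration.
pin = b"example-pkg+4bafdea3"

print(genuine.decode("utf-7"))   # decodes as real UTF-7
print(has_utf7_shift(genuine))   # shaped like a shifted run
print(has_utf7_shift(pin))       # the hash fragment has the same shape
```

Because both inputs match the same pattern, a detector that relies on shape alone cannot tell them apart; it needs some extra signal to reject the hash.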

Improvements

Retrained bigram models — 24 previously failing test cases now pass

Updated language equivalences for mutual intelligibility (Slovak/Czech, East Slavic + Bulgarian, Malay/Indonesian, Scandinavian languages)

New Contributors

@rembish made their first contribution — both reporting the UTF-7 false detection issue and submitting the fix! (#323, #324)

7.0.0

Ground-up, MIT-licensed rewrite of chardet. Same package name, same public API — drop-in replacement for chardet 5.x/6.x. Just way faster and more accurate!

Highlights:

MIT license (previous versions were LGPL)

96.8% accuracy on 2,179 test files (+2.3pp vs chardet 6.0.0, +7.7pp vs charset-normalizer)

41x faster than chardet 6.0.0 with mypyc (28x pure Python), 7.5x faster than charset-normalizer

Language detection for every result (90.5% accuracy across 49 languages)

99 encodings across six eras (MODERN_WEB, LEGACY_ISO, LEGACY_MAC, LEGACY_REGIONAL, DOS, MAINFRAME)

12-stage detection pipeline — BOM, UTF-16/32 patterns, escape sequences, binary detection, markup charset, ASCII, UTF-8 validation, byte validity, CJK gating, structural probing, statistical scoring, post-processing

Bigram frequency models trained on CulturaX multilingual corpus data for all supported language/encoding pairs

Optional mypyc compilation — 1.49x additional speedup on CPython

Thread-safe detect() and detect_all() with no measurable overhead; scales on free-threaded Python 3.13t+

Negligible import memory (96 B)

Zero runtime dependencies
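
The staged design named in the highlights can be pictured as a list of cheap checks tried in order, where the first stage that returns an answer wins. The sketch below covers only three of the twelve listed stages (BOM, ASCII, UTF-8 validation) and is an illustration under that assumption, not chardet's implementation; all function names are hypothetical.

```python
import codecs

# UTF-32 BOMs are listed before UTF-16 because the UTF-32-LE BOM
# (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def stage_bom(data):
    """Return an encoding name if the data starts with a known byte order mark."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def stage_ascii(data):
    """Return 'ascii' if every byte is below 0x80."""
    return "ascii" if data.isascii() else None


def stage_utf8(data):
    """Return 'utf-8' if the data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return "utf-8"


STAGES = [stage_bom, stage_ascii, stage_utf8]


def run_pipeline(data):
    """Try each stage in order; the first non-None result is the answer."""
    for stage in STAGES:
        result = stage(data)
        if result is not None:
            return result
    return None  # a fuller pipeline would fall through to statistical scoring
```

Ordering the stages from cheapest and most certain to most expensive means most inputs are resolved before any statistical model is consulted.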

Breaking changes vs 6.0.0:

detect() and detect_all() now default to encoding_era=EncodingEra.ALL (6.0.0 defaulted to MODERN_WEB)

Internal architecture is completely different (probers replaced by pipeline stages). Only the public API is preserved.

LanguageFilter is accepted but ignored (deprecation warning emitted)

chunk_size is accepted but ignored (deprecation warning emitted)
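
Since the default era changed between 6.0.0 and 7.0.0, and 5.x has no such parameter at all, a caller that wants stable behaviour across versions may prefer to pass `encoding_era` explicitly, but only when the installed `detect()` accepts it. The helper below is a minimal sketch of that idea; `detect_with_era` is a hypothetical name, and it takes the detector as an argument so it does not depend on any particular chardet version.

```python
import inspect


def detect_with_era(detect, data, era=None):
    """Call detect(data), forwarding encoding_era only when it is supported."""
    try:
        params = inspect.signature(detect).parameters
    except (TypeError, ValueError):
        # Some compiled callables expose no signature; use the plain call.
        params = {}
    if era is not None and "encoding_era" in params:
        return detect(data, encoding_era=era)
    return detect(data)


# Hypothetical usage. Assumes chardet is installed and, for 6.0+, that
# EncodingEra is importable from the top-level package (an assumption):
#
#   import chardet
#   era = getattr(chardet, "EncodingEra", None)
#   result = detect_with_era(
#       chardet.detect, raw_bytes,
#       era=era.MODERN_WEB if era is not None else None,
#   )
```

On a version without the parameter the helper silently falls back to the plain call, so the same code path works before and after the upgrade.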

6.0.0.post1

Fixed version number in chardet/version.py still being set to 6.0.0dev0. Otherwise identical to 6.0.0.

6.0.0

Features

Unified single-byte charset detection: Instead of only having trained language models for a handful of languages (Bulgarian, Greek, Hebrew, Hungarian, Russian, Thai, Turkish) and relying on special-case Latin1Prober and MacRomanProber heuristics for Western encodings, chardet now treats all single-byte charsets the same way: every encoding gets proper language-specific bigram models trained on CulturaX corpus data. This means chardet can now accurately detect both the encoding and the language for all supported single-byte encodings.

38 new languages: Arabic, Belarusian, Breton, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Farsi, Finnish, French, German, Icelandic, Indonesian, Irish, Italian, Kazakh, Latvian, Lithuanian, Macedonian, Malay, Maltese, Norwegian, Polish, Portuguese, Romanian, Scottish Gaelic, Serbian, Slovak, Slovene, Spanish, Swedish, Tajik, Ukrainian, Vietnamese, and Welsh. Existing models for Bulgarian, Greek, Hebrew, Hungarian, Russian, Thai, and Turkish were also retrained with the new pipeline.

EncodingEra filtering: The new encoding_era parameter to detect() takes an EncodingEra flag enum (MODERN_WEB, LEGACY_ISO, LEGACY_MAC, LEGACY_REGIONAL, DOS, MAINFRAME, ALL), letting callers restrict detection to encodings from a specific era. detect() and detect_all() default to MODERN_WEB. The new MODERN_WEB default should drastically improve accuracy for users who are not working with legacy data. The tiers are:

MODERN_WEB: UTF-8/16/32, Windows-125x, CP874, CJK multi-byte (widely used on the web)

... (truncated)
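
The idea behind language-specific bigram models, as described in the 6.0.0 features above, can be shown with a toy example: decode the bytes with each candidate encoding and keep the decoding whose adjacent character pairs look most like the target language. This is a deliberately tiny sketch with a one-sentence training text; it does not reproduce chardet's trained models or its scoring, and all names are hypothetical.

```python
import math
from collections import Counter


def train_bigrams(text):
    """Relative frequencies of adjacent character pairs in a training text."""
    counts = Counter(zip(text, text[1:]))
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def score(text, model, floor=1e-6):
    """Mean log-probability of the text's bigrams; unseen pairs get a floor."""
    pairs = list(zip(text, text[1:]))
    if not pairs:
        return float("-inf")
    return sum(math.log(model.get(p, floor)) for p in pairs) / len(pairs)


def pick_encoding(data, candidates, model):
    """Decode data with each candidate and return the best-scoring encoding."""
    best, best_score = None, float("-inf")
    for enc in candidates:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue  # bytes are not valid in this encoding
        s = score(text, model)
        if s > best_score:
            best, best_score = enc, s
    return best


# Toy Russian model and a short sample encoded as Windows-1251.
russian = train_bigrams("привет мир это простой пример текста на русском языке")
raw = "привет мир".encode("cp1251")

print(pick_encoding(raw, ["koi8_r", "cp1251"], russian))
```

Both encodings decode the bytes without error, so byte validity alone cannot separate them; only the bigram score distinguishes plausible Russian from the mis-decoded text.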

Changelog

Sourced from chardet's changelog.

Changelog

7.0.0 (2026-03-02)

Ground-up, MIT-licensed rewrite of chardet. Same package name, same public API — drop-in replacement for chardet 5.x/6.x.

6.0.0 (2026-02-22)

Features:

Unified single-byte charset detection with proper language-specific bigram models for all single-byte encodings (replaces Latin1Prober and MacRomanProber heuristics)

... (truncated)

Commits

330e41e docs: update benchmark numbers for expanded test suite (2,510 files)
83eb965 fix: remove unused cached_specs and add version mismatch diagnostic
b5ef193 feat: skip venv creation when full cache exists for detector
d98e26a fix: use project_root parameter instead of pip_args[0] in _resolve_version_wi...
5a85c25 feat: add helpers for venv-less version/tag resolution and cache checking
f4917a3 Remove plans
06ae339 Use package name in cache filenames and enrich display labels
90fff1d Fix precommit hook failures
611fc0b Bump coverage requirements up to 95% since we have 100%
cc21964 Add separate lint job back
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [chardet](https://github.com/chardet/chardet) from 5.2.0 to 7.0.1.
- [Release notes](https://github.com/chardet/chardet/releases)
- [Changelog](https://github.com/chardet/chardet/blob/main/docs/changelog.rst)
- [Commits](chardet/chardet@5.2.0...7.0.1)

---
updated-dependencies:
- dependency-name: chardet
  dependency-version: 7.0.1
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

codecov · 2026-03-05T13:51:37Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 54.72%. Comparing base (b98e44b) to head (fe095d5).
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1644      +/-   ##
==========================================
- Coverage   54.74%   54.72%   -0.02%     
==========================================
  Files         335      335              
  Lines       27400    27400              
==========================================
- Hits        15000    14995       -5     
- Misses      12400    12405       +5

Flag	Coverage Δ
functionaltests	`0.00% <ø> (ø)`
unittests	`54.72% <ø> (-0.02%)`	⬇️


dependabot bot added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update python code) labels on Mar 5, 2026

dependabot bot mentioned this pull request Mar 5, 2026

Bump chardet from 5.2.0 to 7.0.0 #1643

Closed

dependabot[bot] wants to merge 1 commit into main from dependabot/pip/chardet-7.0.1
