Security: v3.0.1 hardening#10
Merged
Merged
Conversation
`Host::tld()` built a PCRE pattern from `'.' . $tld` after escaping only
dots. Every other PCRE metacharacter (`* + ? | ^ $ ( ) [ ] { } \`) passed
through unescaped, and `$tld` is reachable from a public setter, so
malicious or surprising input could change the regex shape, trigger
PCRE compile errors, or open a small ReDoS surface.
The regex was unnecessary: `$tld` is the rightmost suffix of `$host`, so
`str_ends_with` + `substr` removes it cleanly without building any
pattern. Behavior for every existing test case is unchanged.
`strpos($host, 'http://') !== false` matches the literal anywhere in the string, not at position 0, so an input like `evil.example.com/?u=http://x` was treated as already-schemed and the `http://` prefix was not prepended before `parse_url`. The intent was "does the input start with a scheme?", so use `str_starts_with`.
- `Host::tld()` accepts regex metacharacters without crashing. - `HostParser` only treats a scheme as present when the input *starts* with `http://`/`https://`, not when those literals appear elsewhere in a URL's path/query. - `HostParser` still rejects input without an extractable host. - Sanity-check the substring-TLD case (`a.b.c.compass.com`) survives the regex→string-op swap in `Host::tld()`.
Standalone guide covering installation, secure PSL fetch, caching, worked examples, full API surface, error-handling model, and security notes. README will be slimmed in a follow-up commit and link here.
Keep README as a landing page: tagline, requirements, installation, 10-line quick start, and links to USAGE/SECURITY/CHANGELOG. The full usage guide now lives in docs/USAGE.md.
- CHANGELOG: add 3.0.1 entry with Security/Deprecated/Docs subsections. - SECURITY: mark v3.0.0 as deprecated, add a "Known insecure versions" subsection pointing v3.0.0 users at v3.0.1. No exploit detail is published. See the CHANGELOG for the high-level description of the two fixes.
Bump composer.json to 3.0.1. Final commit before tagging.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two defects in the host/TLD parsing path are fixed and v3.0.0 is marked deprecated. No public API changes; all existing tests still pass.
Fixes
Host::tld()— replace dynamic-regex suffix removal withstr_ends_with+substr. The old code escaped only.characters before passing the resulting pattern throughpreg_replace, so any other PCRE metacharacter (* + ? | ^ $ ( ) [ ] { } \) in the supplied TLD landed in a hand-built pattern unescaped — andtld()is a public setter, so a caller could feed it arbitrary input. Result: PCRE compile errors, unintended match behavior, small ReDoS surface. The regex was unnecessary; the suffix is just.<tld>at the end of the host.HostParser— replacestrpos(...) !== falsescheme detection withstr_starts_with. The old check matchedhttp:///https://anywhere in the input, so URLs likeevil.example.com/?u=http://xwere treated as already-schemed and skipped the prefix needed byparse_url.Other changes
tests/Unit/SecurityTest.php— regression coverage for both findings plus sanity checks.docs/USAGE.md— full usage guide, secure PSL fetch, caching, API reference, error handling.SECURITY.md— supported-versions table updated; new "Known insecure versions" subsection naming v3.0.0.CHANGELOG.md—[3.0.1]entry withSecurity/Deprecated/Docssubsections.composer.json— bump to3.0.1.Deprecation
v3.0.0 is deprecated. Consumers should upgrade to v3.0.1.
Test plan
./vendor/bin/phpcs— clean./vendor/bin/phpstan analyse—[OK] No errors./vendor/bin/pest— 36 passing, 4 pre-existing IDN failures unchanged fromorigin/3.xbaseline (these are aparse_url-corrupts-UTF-8 issue separate from this PR)validate('evil.example.com/?u=http://x')->toString()returnsevil.example.com; existingcompass.comsubstring case still resolves correctlyNotes for the maintainer
v3.0.1(annotated) and publish a GitHub Release with notes mirroring the CHANGELOG.parse_urlcorrupting UTF-8 multi-byte hosts) — that's a correctness bug that shipped in v3.0.0 but is not what this PR is about.