Fix GH-21738: undefined behavior in isxdigit() callsites with non-ASCII bytes #21861

Open
lacatoire wants to merge 2 commits into php:master from lacatoire:fix/21738-url-decode-isxdigit-ub
@lacatoire lacatoire commented Apr 24, 2026

Cast the byte to unsigned char before passing it to isxdigit().

<ctype.h> functions require their argument to be representable as unsigned char (0-255) or EOF; passing a sign-extended char for high-bit bytes like \x80 is undefined behavior.
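A minimal sketch of the cast pattern described above. The helper names `is_hex_char` and `is_percent_escape` are hypothetical and are not taken from php-src; they only illustrate converting the byte to unsigned char before it reaches isxdigit().

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not part of php-src): hex-digit test for a plain char. */
static int is_hex_char(char c)
{
    /* Convert to unsigned char first, so a byte >= 0x80 is passed as
     * 128..255 rather than as a negative int on signed-char platforms. */
    return isxdigit((unsigned char) c) != 0;
}

/* Hypothetical helper: returns 1 if s starts with a "%XX" escape, else 0. */
static int is_percent_escape(const char *s, size_t len)
{
    return len >= 3 && s[0] == '%' && is_hex_char(s[1]) && is_hex_char(s[2]);
}
```

With the cast in place, input such as "%\x80\x80" is simply rejected as a non-escape instead of reaching isxdigit() with a negative argument.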

Callsites fixed:

  • php_url_decode_ex and php_raw_url_decode_ex in ext/standard/url.c
  • quoted_printable_decode in ext/standard/quot_print.c
  • php_stripcslashes in ext/standard/string.c

Fixes GH-21738

The isxdigit() family requires its argument to be representable as
unsigned char (0-255) or EOF. Converting a signed char value holding a
high-bit byte (e.g. 0x80) to int produces a negative number (-128),
which triggers undefined behavior and, on some libc implementations
(e.g. NetBSD), can lead to out-of-bounds reads through the internal
character classification table.

Same signed-char UB as in url_decode: isxdigit() is called on byte
values read through char pointers (str_in in quoted_printable_decode,
source in php_stripcslashes), which sign-extend to a negative int on
signed-char platforms when the byte has its high bit set.
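To make the arithmetic concrete, here is a small sketch of the value isxdigit() would receive with and without the cast. The function names are hypothetical, and `signed char` is used explicitly to model a platform where plain char is signed.

```c
/* Hypothetical helper: the int that a <ctype.h> function would receive
 * if the byte were passed without a cast (plain integer promotion). */
static int value_without_cast(signed char c)
{
    return c; /* sign-extends: a high-bit byte becomes negative */
}

/* Hypothetical helper: the int received when the byte is first
 * converted to unsigned char, as done in this PR. */
static int value_with_cast(signed char c)
{
    return (unsigned char) c; /* always in the range 0..UCHAR_MAX */
}
```

For the byte 0x80, stored as -128 in a signed char, the first helper yields -128 and the second yields 128, which is a valid argument for isxdigit().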
@lacatoire lacatoire changed the title from "Fix GH-21738: undefined behavior in url_decode with non-ASCII bytes" to "Fix GH-21738: undefined behavior in isxdigit() callsites with non-ASCII bytes" Apr 24, 2026
Development

Successfully merging this pull request may close these issues.

UB in php_url_decode_ex

1 participant