Add opt-in URL replacement mode for faster, safer database migrations #232
swissspidy wants to merge 3 commits into main
Code Review
This pull request introduces a 'smart URL mode' for the search-replace command, which optimizes performance by automatically skipping columns unlikely to contain URLs, such as numeric, date, or status-related fields. It also adds validation to prevent malformed URL replacements that could cause issues in PHP 8.0+. My feedback highlights that the current regex validation for URLs might be overly restrictive and could lead to false positives for legitimate URLs containing certain characters in their path or query parameters; I recommend exploring a more robust validation approach.
if ( preg_match( '/[;,\s\t\r\n]/', $new, $matches ) ) {
    WP_CLI::error( sprintf( "The replacement string contains characters that are invalid in a URL (e.g., '%s'). This can cause fatal errors in PHP 8.0+.", $matches[0] ) );
}
The regex used for validation is slightly restrictive and might flag valid URLs that happen to contain these characters in the path or query parameters. While it prevents fatal errors, it may cause false positives for legitimate URL replacements. Consider if a more robust URL validation approach is possible.
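One possible direction for the narrower check suggested above, as a rough sketch only. The helper name `find_unsafe_replacement_char` is hypothetical and not part of the PR. It assumes that `;` and `,` only matter when they appear in the host or path of the replacement (the part that could end up in a cookie path), while whitespace and control characters are rejected anywhere because they are never valid in a raw URL. Whether that assumption holds for every WordPress setup would need to be confirmed.

```php
<?php
/**
 * Hypothetical helper (not part of the PR): return the first character in
 * $replacement that looks unsafe, or null if none is found.
 */
function find_unsafe_replacement_char( string $replacement ): ?string {
    // Whitespace and control characters are never valid in a raw URL,
    // so reject them wherever they appear.
    if ( preg_match( '/[\s\x00-\x1F\x7F]/', $replacement, $matches ) ) {
        return $matches[0];
    }

    $parts = parse_url( $replacement );

    // Assumption: only host + path can leak into a cookie path. If the
    // string does not parse with a host (e.g. "http;//example.com"),
    // fall back to checking the whole string.
    $target = ( is_array( $parts ) && isset( $parts['host'] ) )
        ? $parts['host'] . ( $parts['path'] ?? '' )
        : $replacement;

    if ( preg_match( '/[;,]/', $target, $matches ) ) {
        return $matches[0];
    }

    return null;
}
```

With this shape, a replacement such as `https://example.com/a?list=1,2;3` would pass, while `http;//example.com` would still be rejected.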
Pull request overview
Adds an opt-in URL-focused mode to wp search-replace intended to speed up common URL migrations and prevent dangerous malformed URL replacements.
Changes:
- Introduces --type=url (smart URL mode) plus --analyze-tables for schema-based column skipping.
- Adds a new Non_URL_Columns helper to define static and dynamic “non-URL” column detection.
- Expands Behat coverage with a new search-replace-url.feature suite and adds a malformed-replacement scenario.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/WP_CLI/SearchReplace/Non_URL_Columns.php | New helper for static core skip list + datatype/pattern-based column skipping. |
| src/Search_Replace_Command.php | Wires --type=url / --analyze-tables into command flow, adds validation + skip merging and table analysis logic. |
| features/search-replace.feature | Adds a scenario asserting malformed replacement prevention behavior. |
| features/search-replace-url.feature | Adds URL-mode BDD suite covering skipping behavior, multisite, analysis mode, and serialization. |
'link_rating',
'link_updated',
'link_rel',
'link_rss',
get_core_columns() includes to_ping, pinged, and link_rss as “non-URL” columns, but these fields can store URLs in WordPress core (ping URLs and RSS URLs). In --type=url mode this would skip legitimate URL replacements. These columns should be removed from the non-URL skip list (or otherwise handled so URL-containing values are not skipped).
'link_rss',
/**
 * Get the list of columns that never contain URLs in WordPress core tables.
 *
 * @return string[] List of column names to skip.
 */
public static function get_core_columns() {
    return array(
        // wp_posts table - Status, type, and metadata columns
        'ID',
        'post_author',
        'post_date',
        'post_date_gmt',
        'post_status',
        'comment_status',
        'ping_status',
        'post_password',
        // Note: post_name is a slug (not a full URL) in normal WordPress usage.
        // In rare edge cases (e.g. imports) it may contain URL-like strings, but we
        // still treat it as non-URL for search/replace to keep this optimization simple.
        'post_name',
        'to_ping',
        'pinged',
        'post_modified',
        'post_modified_gmt',
        'post_parent',
        'menu_order',
        'post_type',
        'post_mime_type',
        'comment_count',

        // wp_postmeta table
        'meta_id',
        'post_id',

        // wp_comments table - IDs, status, type, and dates
        'comment_ID',
        'comment_post_ID',
        'comment_date',
        'comment_date_gmt',
        'comment_karma',
        'comment_approved',
        'comment_type',
        'comment_parent',
        'user_id',

        // wp_commentmeta table
        'comment_id',

        // wp_users table - User metadata and status
        'user_login',
        'user_pass',
        'user_nicename',
        'user_email',
        'user_registered',
        'user_status',
        'display_name',

        // wp_usermeta table
        'umeta_id',

        // wp_terms table
        'term_id',
        'slug',
        'term_group',

        // wp_term_taxonomy table
        'term_taxonomy_id',
        'taxonomy',
        'parent',
        'count',

        // wp_term_relationships table
        'object_id',
        'term_order',

        // wp_options table
        'option_id',
        'autoload',

        // wp_links table
get_core_columns() is documented as “columns … in WordPress core tables”, but the values are unqualified column names (e.g. ID, parent, count, email). Because --skip-columns applies globally unless table-qualified, enabling --type=url will also skip columns with these names in custom/plugin tables, which can lead to missed replacements outside core. Consider returning table-qualified names for core tables (e.g. wp_posts.ID) or adjusting the implementation/docs to make the scope explicit and avoid skipping non-core columns unintentionally.
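A minimal sketch of what table-qualified output could look like, assuming the skip list were reorganised as a map of table name to column names. The function name `qualify_core_columns`, the map shape, and the `wp_` default prefix are all hypothetical; whether --skip-columns accepts the `table.column` form is taken from the comment above and is not verified here.

```php
<?php
/**
 * Hypothetical helper (not part of the PR): turn a table => columns map
 * into a flat list of "prefix + table.column" names.
 *
 * @param array<string, string[]> $columns_by_table Unprefixed table name => column names.
 * @param string                  $prefix           Table prefix (assumed default: "wp_").
 * @return string[] Table-qualified column names.
 */
function qualify_core_columns( array $columns_by_table, string $prefix = 'wp_' ): array {
    $qualified = array();

    foreach ( $columns_by_table as $table => $columns ) {
        foreach ( $columns as $column ) {
            // e.g. "posts" + "ID" => "wp_posts.ID"
            $qualified[] = $prefix . $table . '.' . $column;
        }
    }

    return $qualified;
}

// Usage: only core-table columns are named, so a plugin table with its
// own "count" or "parent" column would not be matched by these entries.
$skip = qualify_core_columns(
    array(
        'posts'         => array( 'ID', 'post_author' ),
        'term_taxonomy' => array( 'parent', 'count' ),
    )
);
```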
$is_url_mode    = 'url' === Utils\get_flag_value( $assoc_args, 'type' );
$analyze_tables = Utils\get_flag_value( $assoc_args, 'analyze-tables', false );
If the user passes --type=<type> with an unsupported value (anything other than url), the command silently falls back to normal behavior. Given the help text lists valid options, it would be clearer to error out when --type is set but unrecognized, to avoid users thinking they’re in an optimized/validated mode when they aren’t.
- $is_url_mode    = 'url' === Utils\get_flag_value( $assoc_args, 'type' );
- $analyze_tables = Utils\get_flag_value( $assoc_args, 'analyze-tables', false );
+ $type           = Utils\get_flag_value( $assoc_args, 'type', null );
+ $is_url_mode    = 'url' === $type;
+ $analyze_tables = Utils\get_flag_value( $assoc_args, 'analyze-tables', false );
+ if ( null !== $type && ! $is_url_mode ) {
+     WP_CLI::error( sprintf( "Invalid value for --type: '%s'. Supported values: url.", $type ) );
+ }
// Issue #231: Validate replacement URL for illegal cookie path characters.
// We do not strictly validate the search string, as users often search for non-schemed domains (e.g. 'example.com').
if ( preg_match( '/[;,\s\t\r\n]/', $new, $matches ) ) {
    WP_CLI::error( sprintf( "The replacement string contains characters that are invalid in a URL (e.g., '%s'). This can cause fatal errors in PHP 8.0+.", $matches[0] ) );
The URL-mode validation error text says the character is “invalid in a URL”, but the rationale given is preventing PHP 8+ setcookie() failures (cookie-path invalid characters). Also, interpolating $matches[0] directly can produce hard-to-read output for whitespace/control characters (e.g. newline). Consider updating the message to reference cookie path / WordPress breakage, and formatting the matched character in an escaped/printable form (e.g. JSON-escaped or showing its ASCII code).
- WP_CLI::error( sprintf( "The replacement string contains characters that are invalid in a URL (e.g., '%s'). This can cause fatal errors in PHP 8.0+.", $matches[0] ) );
+ $invalid_character = json_encode( $matches[0] );
+ if ( false === $invalid_character ) {
+     $invalid_character = sprintf( 'ASCII %d', ord( $matches[0] ) );
+ }
+ WP_CLI::error(
+     sprintf(
+         'The replacement string contains a character that is invalid in a cookie path used by WordPress cookies (%s). This can break WordPress and cause setcookie() failures in PHP 8.0+.',
+         $invalid_character
+     )
+ );
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Why
URL migrations (like moving from a staging to a production domain) are by far the most common use case for wp search-replace. However, standard search-replace has two main pain points for URLs:
- It scans columns that never contain URLs (post_type, post_status, user_pass, etc.), making large database migrations painfully slow.
- A malformed replacement (e.g. http;//example.com) creates malformed entries in wp_options (siteurl, home). In PHP 8.0+, this causes an immediate fatal ValueError across the entire site when WordPress attempts to use setcookie().
What this does
This PR introduces a dedicated, opt-in URL mode via the new --type=url flag (addressing the request in #186).
- When --type=url is active, the command automatically skips ~75 WordPress core columns known not to contain URLs. For large databases, this can reduce scan times from ~30 minutes to under 10 minutes (as noted in "Smarter replacement checks for columns", #194).
- An --analyze-tables flag that can be used alongside --type=url. This dynamically inspects the MySQL table schemas to skip non-text columns (integers, enums, dates, blobs) and columns matching common patterns (like *_id or *_count) in custom plugin tables.
- In URL mode, the replacement string is checked for invalid characters (;, ,, spaces, etc.). If detected, it immediately aborts with a helpful error message before modifying the database.
Architecture & Difference from #209
This builds upon the excellent groundwork laid out in PR #209, but makes crucial architectural changes to address the maintainer feedback in that thread:
- […] http://. It preserves standard WP-CLI behavior entirely. Users must explicitly opt in. This ensures backward compatibility.
- Allows non-schemed search strings (e.g. example.com) by removing the strict FILTER_VALIDATE_URL requirement on the search string, making it much more user-friendly for standard migration workflows.
- Replaces the --smart-url flag with a generic --type=<type> flag, paving the way for future profiles such as --type=email or --type=username.
Testing
Includes a comprehensive 600+ line BDD feature test suite (search-replace-url.feature) covering multisite setups, serialized data, dynamic datatype skips, and validation errors.
Fixes #194
Fixes #186
Fixes #231
Supersedes #209
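For readers unfamiliar with the schema-based skipping that --analyze-tables is described as doing, here is a rough sketch of the decision for a single column. It is not the PR's implementation: the function name `is_probably_non_url_column` is hypothetical, and the exact type list and name patterns are assumptions based only on the categories named in the description (integers, enums, dates, blobs, *_id, *_count).

```php
<?php
/**
 * Hypothetical helper (not the PR's code): guess whether a column can be
 * skipped in URL mode, given its name and its MySQL column type string.
 */
function is_probably_non_url_column( string $column_name, string $column_type ): bool {
    // Reduce e.g. "bigint(20) unsigned" or "enum('a','b')" to the bare
    // type keyword by cutting at the first non-letter character.
    $base_type = strtolower( (string) preg_replace( '/[^a-z].*$/i', '', trim( $column_type ) ) );

    // Assumed list covering integers, dates, enums and blobs.
    $non_text_types = array(
        'tinyint', 'smallint', 'mediumint', 'int', 'bigint',
        'date', 'datetime', 'timestamp', 'time', 'year',
        'enum',
        'tinyblob', 'blob', 'mediumblob', 'longblob',
    );

    if ( in_array( $base_type, $non_text_types, true ) ) {
        return true;
    }

    // Name-based patterns: columns ending in _id or _count.
    return 1 === preg_match( '/(_id|_count)$/i', $column_name );
}
```

A text column such as `meta_value` of type `longtext` would still be scanned, while `order_id` or a `bigint(20) unsigned` counter would be skipped.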