Skip to content

Add opt-in URL replacement mode for faster, safer database migrations#232

Draft
swissspidy wants to merge 3 commits intomainfrom
fix/search-replace-url
Draft

Add opt-in URL replacement mode for faster, safer database migrations#232
swissspidy wants to merge 3 commits intomainfrom
fix/search-replace-url

Conversation

@swissspidy
Copy link
Copy Markdown
Member

Why

URL migrations (like moving from a staging to a production domain) are by far the most common use case for wp search-replace. However, standard search-replace has two main pain points for URLs:

  1. Performance (Smarter replacement checks for columns #194): It wastes massive amounts of time scanning core columns that never contain URLs (like post_type, post_status, user_pass, etc.), making large database migrations painfully slow.
  2. Safety (wp search-replace silently writes malformed URLs causing fatal error on next page load #231): It does not validate replacement strings. A simple typo in the replacement URL (e.g. http;//example.com) creates malformed entries in wp_options (siteurl, home). In PHP 8.0+, this causes an immediate fatal ValueError across the entire site when WordPress attempts to use setcookie().

What this does

This PR introduces a dedicated, opt-in URL mode via the new --type=url flag (addressing the request in #186).

  1. Performance Optimization: When --type=url is active, the command automatically skips ~75 WordPress core columns known not to contain URLs. For large databases, this can reduce scan times from ~30 minutes to under 10 minutes (as noted in Smarter replacement checks for columns #194).
  2. Advanced Table Analysis: Adds an --analyze-tables flag that can be used alongside --type=url. This dynamically inspects the MySQL table schemas to skip non-text columns (integers, enums, dates, blobs) and columns matching common patterns (like *_id or *_count) in custom plugin tables.
  3. Fatal Error Prevention: In URL mode, the command validates the replacement string against illegal cookie-path characters (;, ,, spaces, etc.). If detected, it immediately aborts with a helpful error message before modifying the database.

Architecture & Difference from #209

This builds upon the excellent groundwork laid out in PR #209, but makes crucial architectural changes to address the maintainer feedback in that thread:

  • No auto-detection: The command does not automatically force URL mode just because a search string starts with http://. It preserves standard WP-CLI behavior entirely. Users must explicitly opt-in. This ensures backward compatibility.
  • Relaxed Search Validation: It allows users to search for domains without protocols (e.g. example.com) by removing the strict FILTER_VALIDATE_URL requirement on the search string, making it much more user-friendly for standard migration workflows.
  • Future-proof API: Replaced the specific --smart-url flag with a generic --type=<type> flag, paving the way for future profiles such as --type=email or --type=username.

Testing

Includes a comprehensive 600+ line BDD feature test suite (search-replace-url.feature) covering multisite setups, serialized data, dynamic datatype skips, and validation errors.


Fixes #194
Fixes #186
Fixes #231
Supersedes #209

@swissspidy swissspidy added the command:search-replace Related to 'search-replace' command label Apr 29, 2026
@github-actions github-actions Bot added the scope:testing Related to testing label Apr 29, 2026
@swissspidy swissspidy requested a review from Copilot April 29, 2026 18:52
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a 'smart URL mode' for the search-replace command, which optimizes performance by automatically skipping columns unlikely to contain URLs, such as numeric, date, or status-related fields. It also adds validation to prevent malformed URL replacements that could cause issues in PHP 8.0+. My feedback highlights that the current regex validation for URLs might be overly restrictive and could lead to false positives for legitimate URLs containing certain characters in their path or query parameters; I recommend exploring a more robust validation approach.

Comment on lines +357 to +359
if ( preg_match( '/[;,\s\t\r\n]/', $new, $matches ) ) {
WP_CLI::error( sprintf( "The replacement string contains characters that are invalid in a URL (e.g., '%s'). This can cause fatal errors in PHP 8.0+.", $matches[0] ) );
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The regex used for validation is slightly restrictive and might flag valid URLs that happen to contain these characters in the path or query parameters. While it prevents fatal errors, it may cause false positives for legitimate URL replacements. Consider if a more robust URL validation approach is possible.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an opt-in URL-focused mode to wp search-replace intended to speed up common URL migrations and prevent dangerous malformed URL replacements.

Changes:

  • Introduces --type=url (smart URL mode) plus --analyze-tables for schema-based column skipping.
  • Adds a new Non_URL_Columns helper to define static and dynamic “non-URL” column detection.
  • Expands Behat coverage with a new search-replace-url.feature suite and adds a malformed-replacement scenario.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File Description
src/WP_CLI/SearchReplace/Non_URL_Columns.php New helper for static core skip list + datatype/pattern-based column skipping.
src/Search_Replace_Command.php Wires --type=url / --analyze-tables into command flow, adds validation + skip merging and table analysis logic.
features/search-replace.feature Adds a scenario asserting malformed replacement prevention behavior.
features/search-replace-url.feature Adds URL-mode BDD suite covering skipping behavior, multisite, analysis mode, and serialization.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

'link_rating',
'link_updated',
'link_rel',
'link_rss',
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_core_columns() includes to_ping, pinged, and link_rss as “non-URL” columns, but these fields can store URLs in WordPress core (ping URLs and RSS URLs). In --type=url mode this would skip legitimate URL replacements. These columns should be removed from the non-URL skip list (or otherwise handled so URL-containing values are not skipped).

Suggested change
'link_rss',

Copilot uses AI. Check for mistakes.
Comment on lines +87 to +166
/**
* Get the list of columns that never contain URLs in WordPress core tables.
*
* @return string[] List of column names to skip.
*/
public static function get_core_columns() {
return array(
// wp_posts table - Status, type, and metadata columns
'ID',
'post_author',
'post_date',
'post_date_gmt',
'post_status',
'comment_status',
'ping_status',
'post_password',
// Note: post_name is a slug (not a full URL) in normal WordPress usage.
// In rare edge cases (e.g. imports) it may contain URL-like strings, but we
// still treat it as non-URL for search/replace to keep this optimization simple.
'post_name',
'to_ping',
'pinged',
'post_modified',
'post_modified_gmt',
'post_parent',
'menu_order',
'post_type',
'post_mime_type',
'comment_count',

// wp_postmeta table
'meta_id',
'post_id',

// wp_comments table - IDs, status, type, and dates
'comment_ID',
'comment_post_ID',
'comment_date',
'comment_date_gmt',
'comment_karma',
'comment_approved',
'comment_type',
'comment_parent',
'user_id',

// wp_commentmeta table
'comment_id',

// wp_users table - User metadata and status
'user_login',
'user_pass',
'user_nicename',
'user_email',
'user_registered',
'user_status',
'display_name',

// wp_usermeta table
'umeta_id',

// wp_terms table
'term_id',
'slug',
'term_group',

// wp_term_taxonomy table
'term_taxonomy_id',
'taxonomy',
'parent',
'count',

// wp_term_relationships table
'object_id',
'term_order',

// wp_options table
'option_id',
'autoload',

// wp_links table
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_core_columns() is documented as “columns … in WordPress core tables”, but the values are unqualified column names (e.g. ID, parent, count, email). Because --skip-columns applies globally unless table-qualified, enabling --type=url will also skip columns with these names in custom/plugin tables, which can lead to missed replacements outside core. Consider returning table-qualified names for core tables (e.g. wp_posts.ID) or adjusting the implementation/docs to make the scope explicit and avoid skipping non-core columns unintentionally.

Copilot uses AI. Check for mistakes.
Comment thread src/Search_Replace_Command.php
Comment on lines +347 to +349
$is_url_mode = 'url' === Utils\get_flag_value( $assoc_args, 'type' );
$analyze_tables = Utils\get_flag_value( $assoc_args, 'analyze-tables', false );

Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user passes --type=<type> with an unsupported value (anything other than url), the command silently falls back to normal behavior. Given the help text lists valid options, it would be clearer to error out when --type is set but unrecognized, to avoid users thinking they’re in an optimized/validated mode when they aren’t.

Suggested change
$is_url_mode = 'url' === Utils\get_flag_value( $assoc_args, 'type' );
$analyze_tables = Utils\get_flag_value( $assoc_args, 'analyze-tables', false );
$type = Utils\get_flag_value( $assoc_args, 'type', null );
$is_url_mode = 'url' === $type;
$analyze_tables = Utils\get_flag_value( $assoc_args, 'analyze-tables', false );
if ( null !== $type && ! $is_url_mode ) {
WP_CLI::error( sprintf( "Invalid value for --type: '%s'. Supported values: url.", $type ) );
}

Copilot uses AI. Check for mistakes.
// Issue #231: Validate replacement URL for illegal cookie path characters.
// We do not strictly validate the search string, as users often search for non-schemed domains (e.g. 'example.com').
if ( preg_match( '/[;,\s\t\r\n]/', $new, $matches ) ) {
WP_CLI::error( sprintf( "The replacement string contains characters that are invalid in a URL (e.g., '%s'). This can cause fatal errors in PHP 8.0+.", $matches[0] ) );
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The URL-mode validation error text says the character is “invalid in a URL”, but the rationale given is preventing PHP 8+ setcookie() failures (cookie-path invalid characters). Also, interpolating $matches[0] directly can produce hard-to-read output for whitespace/control characters (e.g. newline). Consider updating the message to reference cookie path / WordPress breakage, and formatting the matched character in an escaped/printable form (e.g. JSON-escaped or showing its ASCII code).

Suggested change
WP_CLI::error( sprintf( "The replacement string contains characters that are invalid in a URL (e.g., '%s'). This can cause fatal errors in PHP 8.0+.", $matches[0] ) );
$invalid_character = json_encode( $matches[0] );
if ( false === $invalid_character ) {
$invalid_character = sprintf( 'ASCII %d', ord( $matches[0] ) );
}
WP_CLI::error(
sprintf(
'The replacement string contains a character that is invalid in a cookie path used by WordPress cookies (%s). This can break WordPress and cause setcookie() failures in PHP 8.0+.',
$invalid_character
)
);

Copilot uses AI. Check for mistakes.
Comment thread features/search-replace.feature
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

command:search-replace Related to 'search-replace' command scope:testing Related to testing

Projects

None yet

2 participants