MDEV-38412 System tablespace fails to shrink due to legacy tables#4884
MDEV-38412 System tablespace fails to shrink due to legacy tables#4884Thirunarayanan merged 1 commit into11.4from
Conversation
|
|
| DBUG_RETURN(2); | ||
| } | ||
|
|
||
| static bool should_drop_garbage_table(const rec_t* rec, ulint len) |
There was a problem hiding this comment.
This is missing noexcept, and it’d be better to use size_t instead of the alias ulint. There is no comment explaining the return value and the parameters.
| @param[in] rec SYS_TABLES record containing the table name | ||
| @param[in] len length of the table name | ||
| @return true if the table should be dropped, false otherwise */ | ||
| static bool should_drop_unknown_table(const rec_t* rec, ulint len) |
There was a problem hiding this comment.
Missing noexcept. A space around the * is misplaced. Please, no [in] in new code. The @return comment should not mention individual return values; we have @retval for that.
| const byte *field= rec_get_nth_field_old(rec, DICT_FLD__SYS_TABLES__ID, &id_len); | ||
| if (dict_sys.is_sys_table(mach_read_from_8(field))) | ||
| return false; |
There was a problem hiding this comment.
The if expression is missing id_len != 8 ||.
| if (dict_table_t *table= dict_sys.load_table( | ||
| {reinterpret_cast<const char*>(pcur.old_rec), len}, | ||
| DICT_ERR_IGNORE_DROP)) |
There was a problem hiding this comment.
What if we have some entries in SYS_INDEXES but the table cannot be loaded because the entries in SYS_TABLES, SYS_COLUMNS, SYS_FIELDS are incorrect? In that case, we would fail to drop the table.
Do we really have to load a table definition into the cache? Would it suffice to unconditionally execute some simpler SQL to delete the corresponding entries from SYS_TABLES, SYS_INDEXES, SYS_FIELDS, SYS_COLUMNS, during a slow shutdown when :autoshrinkis enabled? What really matters here is a call todict_drop_index_tree(). That could be executed by row_purge_remove_clust_if_poss_low()` as part of the slow shutdown.
The basic algorithm would be like this:
- Collect the table ID from
SYS_TABLESand report the table names. (Check forSYS_TABLES.SPACE=0directly from each record.) - For each table ID, execute the SQL to
DELETE FROM SYS_… WHERE SPACE=0 AND TABLE_ID=:id. - Let the purge run into completion. It will take care of invoking
dict_drop_index_tree(), and it’s already checking forSYS_INDEXES.PAGE=FIL_NULLthere.
| if (err == DB_SUCCESS) | ||
| { | ||
| trx->commit(); | ||
| ut_ad(deleted.empty()); |
There was a problem hiding this comment.
deleted.empty() should hold irrespective of the err value.
| for (pfs_os_file_t d : deleted) | ||
| os_file_close(d); |
There was a problem hiding this comment.
There should be no handles of deleted files to close, because these tables are located in the system tablespace (fil_system.sys_space), which will not be deleted.
| sql_print_information("InnoDB: Dropping the unknown table %.*s", | ||
| static_cast<int>(len), rec); |
There was a problem hiding this comment.
int(len) is shorter and equivalent to static_cast<int>(len).
9d68cf6 to
c99072c
Compare
c99072c to
56fabb7
Compare
| using sys_callback_t = std::function<bool(table_id_t table_id, | ||
| const byte* table_name, | ||
| ulint name_len)>; |
There was a problem hiding this comment.
Let’s not make the MDEV-25861 situation worse by adding even more declarations ending in _t.
I would suggest a cleaner API:
using SysCallback= dberr_t(table_id_t, st_::span<const char>);There was a problem hiding this comment.
I initially used a raw function pointer (sysCallback*) to avoid the overhead
of std::function. However, this approach has a significant limitation like function
pointers cannot accept lambdas with captures (e.g., [&orphaned_tbl]), which are needed
in both call sites to collect results. I had to make stateless lambdas with static std::vector
orphaned_tbl.
| static dberr_t scan_system_tablespace_tables(sys_callback_t callback) noexcept | ||
| { | ||
| ut_ad(callback); |
There was a problem hiding this comment.
Why not __attribute__((nonnull))?
There was a problem hiding this comment.
using sysCallback= std::function<dberr_t(table_id_t, st_::span<const char>)>;
static dberr_t scan_system_tablespace_tables(sysCallback callback) noexcept
{
if (UNIV_UNLIKELY(!callback))
return DB_CORRUPTION;
}
std::function provides operator bool() to check if it contains a valid callable target
so avoided _attribute(nonnull)
| not contain '/' (indicating it's not a proper database/table name). | ||
| @retval DB_SUCCESS_LOCKED_REC if legacy table exists | ||
| @return error code or DB_SUCCESS */ | ||
| static dberr_t drop_all_orphaned_tables() |
There was a problem hiding this comment.
Instead of exporting the low-level function dict_drop_table_metadata(), I think that it would be more appropriate to define this function and some of its static dependencies in dict/drop.cc. This should be the non-static entry point that would be invoked from fsp_system_tablespace_truncate().
Maybe you considered this, but it got too complex.
There was a problem hiding this comment.
Yes, it makes it complex. I stick with simple approach.
56fabb7 to
509b3b2
Compare
dr-m
left a comment
There was a problem hiding this comment.
I think that we must also cover a scenario where a SYS_TABLES record is missing but we have some garbage associated with SYS_INDEXES records.
509b3b2 to
bc8f166
Compare
| field= rec_get_nth_field_old(rec, DICT_FLD__SYS_TABLES__ID, &len); | ||
|
|
||
| field_no= scan_indexes ? ulint(DICT_FLD__SYS_INDEXES__TABLE_ID) | ||
| : ulint(DICT_FLD__SYS_TABLES__ID); |
There was a problem hiding this comment.
There is no need to change the generated code. We only need the following:
static_assert(DICT_FLD__SYS_INDEXES__TABLE_ID == DICT_FLD__SYS_TABLES__ID, "");Both are the first column.
There was a problem hiding this comment.
How both are same?
enum dict_fld_sys_tables_enum {
DICT_FLD__SYS_TABLES__NAME = 0,
DICT_FLD__SYS_TABLES__DB_TRX_ID = 1,
DICT_FLD__SYS_TABLES__DB_ROLL_PTR = 2,
DICT_FLD__SYS_TABLES__ID = 3,
/* The field numbers in the SYS_INDEXES clustered index */
enum dict_fld_sys_indexes_enum {
DICT_FLD__SYS_INDEXES__TABLE_ID = 0,
DICT_FLD__SYS_INDEXES__ID = 1,
DICT_FLD__SYS_INDEXES__DB_TRX_ID = 2,
There was a problem hiding this comment.
Oh, sorry, my mistake. I forgot that unfortunately, SYS_TABLES.ID is not the primary key, but a unique secondary index. Lots of things (including the recovery and rollback of RENAME TABLE) would have been simpler in that case.
| std::unordered_set<table_id_t> orphaned_tbl; | ||
| std::unordered_set<table_id_t> valid_sys_tbl; | ||
|
|
||
| /* Step 1: Scan SYS_TABLES for legacy tables */ | ||
| dberr_t err= scan_system_tablespace_metadata(dict_sys.sys_tables, | ||
| [&](table_id_t table_id, st_::span<const char> name) -> dberr_t | ||
| { | ||
| valid_sys_tbl.insert(table_id); |
There was a problem hiding this comment.
Instead of having two hash tables, could we have just one std::unordered_map<const table_id_t,bool> where the value indicates whether the table needs to be dropped?
| err= scan_system_tablespace_metadata(dict_sys.sys_indexes, | ||
| [&](table_id_t table_id, st_::span<const char>) -> dberr_t | ||
| { | ||
| if (orphaned_tbl.count(table_id) || valid_sys_tbl.count(table_id)) | ||
| return DB_SUCCESS; |
There was a problem hiding this comment.
With my idea of std::unordered_map<const table_id_t,bool> found_tables this could become a lot simpler:
if (!found_tables.emplace({table_id, true}).second)
return DB_SUCCESS;However, is there any way that would avoid all tables to be registered? There could be billions of entries. Should the SYS_INDEXES scan be enabled by an additional parameter, so that we can avoid the excessive memory allocation?
There was a problem hiding this comment.
Sorry, I realized that we only store the table_id for all tables that are defined in the system tablespace. We should be fine. The number of such tables should be less than 10. That number won’t include the 4 hard-coded tables SYS_TABLES, SYS_INDEXES, SYS_FIELDS, SYS_COLUMNS, because their metadata is not stored in SYS_TABLES or SYS_INDEXES.
bc8f166 to
d7432eb
Compare
dr-m
left a comment
There was a problem hiding this comment.
Apparently, this branch was accidentally replaced with a fix of MDEV-39261.
d7432eb to
a68a249
Compare
dr-m
left a comment
There was a problem hiding this comment.
I implemented the idea that I suggested yesterday.
diff --git a/storage/innobase/fsp/fsp0fsp.cc b/storage/innobase/fsp/fsp0fsp.cc
index 670794244fe..111528c6fb8 100644
--- a/storage/innobase/fsp/fsp0fsp.cc
+++ b/storage/innobase/fsp/fsp0fsp.cc
@@ -5787,44 +5787,44 @@ tablespace.
@return error code or DB_SUCCESS */
static dberr_t drop_all_orphaned_tables()
{
- std::unordered_set<table_id_t> orphaned_tbl;
- std::unordered_set<table_id_t> valid_sys_tbl;
+ /* SELECT table_id,to_drop FROM SYS_TABLES WHERE SPACE=0 AND ... */
+ std::unordered_map<table_id_t,bool> system_tables;
/* Step 1: Scan SYS_TABLES for legacy tables */
dberr_t err= scan_system_tablespace_metadata(dict_sys.sys_tables,
[&](table_id_t table_id, st_::span<const char> name) -> dberr_t
{
- valid_sys_tbl.insert(table_id);
- if (!memchr(name.data(), '/', name.size()))
- {
+ const bool to_drop{!memchr(name.data(), '/', name.size())};
+ if (to_drop)
sql_print_information("InnoDB: Found an unknown table %.*s",
- (int)name.size(), name.data());
- orphaned_tbl.insert(table_id);
- }
+ int(name.size()), name.data());
+ system_tables.emplace(table_id, to_drop);
return DB_SUCCESS;
});
- if (err)
+ if (err != DB_SUCCESS)
return err;
/* Step 2: Scan SYS_INDEXES for orphaned indexes not in orphaned_tbl */
err= scan_system_tablespace_metadata(dict_sys.sys_indexes,
[&](table_id_t table_id, st_::span<const char>) -> dberr_t
{
- if (orphaned_tbl.count(table_id) || valid_sys_tbl.count(table_id))
- return DB_SUCCESS;
-
- sql_print_information("InnoDB: Found orphaned index for table_id "
- UINT64PF, table_id);
- orphaned_tbl.insert(table_id);
+ if (!system_tables.emplace(table_id, true).second)
+ sql_print_information("InnoDB: Found orphaned index for table_id "
+ UINT64PF, table_id);
return DB_SUCCESS;
});
- if (err || orphaned_tbl.empty())
+ if (err != DB_SUCCESS)
return err;
- sql_print_information("InnoDB: Dropping %zu orphaned table(s)",
- orphaned_tbl.size());
+ size_t count{0};
+ for (const auto &td : system_tables)
+ count+= td.second;
+ if (!count)
+ return DB_SUCCESS;
+
+ sql_print_information("InnoDB: Dropping %zu orphaned table(s)", count);
trx_t *trx= trx_create();
trx_start_for_ddl(trx);
@@ -5834,12 +5834,10 @@ static dberr_t drop_all_orphaned_tables()
dict_sys.lock(SRW_LOCK_CALL);
- for (table_id_t table_id : orphaned_tbl)
- {
- err= dict_drop_table_metadata(table_id, trx);
- if (err)
+ for (auto &td : system_tables)
+ if (td.second &&
+ (err= dict_drop_table_metadata(td.first, trx)) != DB_SUCCESS)
break;
- }
dict_sys.unlock();
The test innodb.sys_truncate_debug passes with this patch. Please include it.
Problem: ======= - InnoDB system tablespace fails to autoshrink when it contains legacy internal tables. These are non-user tables and internal table exist from older version. Because the current shrink logic does recognize these entries as user table, they block the defragmentation process required to reduce the tablespace size. Solution: ========= To enable successful shrinking, InnoDB has been updated to identify and remove these legacy entries during the startup: drop_all_orphaned_tables(): A new function that scans the InnoDB system tables for entries lacking the / naming convention. It triggers the removal of these table objects from InnoDB system tables and ensures the purge thread subsequently drops any associated index trees in SYS_INDEXES. scan_system_tablespace_metadata(): Scan SYS_TABLES or SYS_INDEXES and invoke callback for non-system entries in system tablespace. This function scans either SYS_TABLES or SYS_INDEXES (based on the table parameter) and invokes the provided callback for each non-system table or index found in the system tablespace (SPACE=0). dict_drop_table_metadata(): Function is to remove specific table IDs from internal system tables. fsp_system_tablespace_truncate(): If legacy tables are detected, InnoDB prioritizes their removal. To ensure data integrity and complete the shrink, a two-restart sequence may be required: 1) purge the legacy table 2) Defragment the system tablespace and shrink the system tablespace further Thanks Marko Mäkelä for the contribution of your patch in drop_orphaned_tables()
a68a249 to
0f60410
Compare
MDEV-38412 System tablespace fails to shrink due to legacy tables
Problem:
legacy internal tables. These are non-user tables and internal
table exist from older version. Because the current shrink logic
does recognize these entries as user table, they block the
defragmentation process required to reduce the tablespace size.
Solution:
To enable successful shrinking, InnoDB has been updated to
identify and remove these legacy entries during the startup:
drop_all_orphaned_tables(): A new function that scans the InnoDB
system tables for entries lacking the / naming convention.
It triggers the removal of these table objects from InnoDB system
tables and ensures the purge thread subsequently drops any
associated index trees in SYS_INDEXES.
scan_system_tablespace_tables(): Scan the records in
SYS_TABLES and invoke callback(orphaned record, user table exist check)
on non-system tables in system tablespace
dict_drop_table_metadata(): Function is to remove specific
table IDs from internal system tables.
fsp_system_tablespace_truncate(): If legacy tables are detected,
InnoDB prioritizes their removal. To ensure data integrity and
complete the shrink, a two-restart sequence may be required:
system tablespace further