Updating large tables

14 Sep

After a group is added to the Lucene index, I want to update a column that is a flag to show that the record is done (TINYINT(1) UNSIGNED NOT NULL).

This is to support incremental indexing (so that if an indexing process fails or is interrupted, it will resume with minimal redundancy).
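The flag-driven resume logic can be sketched as follows. This is a minimal illustration using SQLite for portability; the `documents` table, the `indexed` column name, and the row contents are all hypothetical stand-ins, not the actual schema:

```python
import sqlite3

# Hypothetical schema: a "done" flag (TINYINT(1)-style) that lets the
# indexer resume after a failure by skipping already-indexed rows.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE documents (
        id      INTEGER PRIMARY KEY,
        body    TEXT NOT NULL,
        indexed INTEGER NOT NULL DEFAULT 0
    )
""")
conn.executemany("INSERT INTO documents (body) VALUES (?)",
                 [("doc one",), ("doc two",), ("doc three",)])

# The indexer fetches only rows not yet flagged...
pending = conn.execute(
    "SELECT id, body FROM documents WHERE indexed = 0").fetchall()

# ...and flags each row once its group has been added to the Lucene index.
conn.executemany("UPDATE documents SET indexed = 1 WHERE id = ?",
                 [(row[0],) for row in pending])
conn.commit()

remaining = conn.execute(
    "SELECT COUNT(*) FROM documents WHERE indexed = 0").fetchone()[0]
print(remaining)  # 0 once every pending row has been flagged
```

If the process dies partway through, a restart simply re-runs the `SELECT ... WHERE indexed = 0`, so only the unflagged remainder is reprocessed.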

In this blog post I will try to outline a few strategies to minimize the impact on table availability while managing large data sets.

The fastest way to speed up the update query is to replace it with a bulk-insert operation. As callout A in Listing 1 shows, you can trick the WHILE loop into initially executing by including a meaningless SET statement right before the WHILE condition. After that, the WHILE condition is dependent on the UPDATE statement’s row count.

(…) Table and/or index rebuilds may take a significant amount of time for a large table, and will temporarily require as much as double the disk space.

If you can segment your data using, for example, sequential IDs, you can update rows incrementally in batches.
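The ID-range batching idea can be sketched like this. Again a hedged illustration using SQLite: the table name, flag column, and batch size are assumptions, but the pattern is the point, since each transaction touches only one contiguous primary-key slice, commits quickly, and holds locks briefly:

```python
import sqlite3

BATCH = 1000  # illustrative batch size; tune for your workload

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE documents (
        id      INTEGER PRIMARY KEY,
        indexed INTEGER NOT NULL DEFAULT 0
    )
""")
conn.executemany("INSERT INTO documents (id) VALUES (?)",
                 [(i,) for i in range(1, 5001)])

max_id = conn.execute("SELECT MAX(id) FROM documents").fetchone()[0]
lo = 1
while lo <= max_id:
    # Each UPDATE is bounded by a sequential ID range, so it can walk the
    # primary key instead of scanning, and the transaction stays short.
    conn.execute(
        "UPDATE documents SET indexed = 1 WHERE id BETWEEN ? AND ?",
        (lo, lo + BATCH - 1))
    conn.commit()
    lo += BATCH

done = conn.execute(
    "SELECT COUNT(*) FROM documents WHERE indexed = 1").fetchone()[0]
print(done)  # 5000
```

Committing between batches gives other sessions a chance to acquire locks on the table, which is what keeps the table available while a very large update grinds through.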