Wednesday, October 6, 2010

35 times faster with magic of awk

In any computer piece of work, 2 times faster is always a good thing, 5 times faster is pursued by perfectionist, 10 times faster is not something happens every day.

Fortunately I have a chance to make a job 35 times faster today.

Today's task is to restore a mysql database with millions rows from a mysql_dump. 
Unfortunately the dump is created by --skip-extended-insert option, it means only one row per INSERT INTO statement. Situation is getting worse that it's not possible to get another backup (with multiple rows in one insert) as the original database is gone.

Initial test shows that the restore is running at 20 rows / second, it means to completed the job, it may take  one or two days.

I don't want to wait that long, and try to write a simple awk script to convert the INSERTS into one statement.

awk '
BEGIN {i=0; batch=10;};
{
    if ($0 ~ /^INSERT INTO `.*` VALUES \(.*\);$/) {
        i++;
        if (i>1) {
            sub(/^INSERT INTO `.*` VALUES /,",",$0)
        };
        if (i>=batch) {
            i=0
        }
        else{
            sub(/;$/," ",$0);
        }
    }
        else{
            if (i>0){
            i = 0;
                        print ";"
                }
        }
    print $0
} ' /home/bmmk/tmp/test.sql

Actually I found it's a pretty simple script as soon as I decided to make my hands dirty.

The test result is pretty promising, now the restore runs at about 700 ~ 800 rows / second. I am much pleased by the result.

No comments:

Post a Comment