Monday, October 25, 2010

exclude svn folders and backup files when grepping recursively

grep is my good friend; I use it at least 50 times a day at work, mostly to search the code base. It's a great tool, better than any other text search tool I have ever used.

A common issue when searching a code base is that files under the svn folders and editor backup files show up alongside the results you are actually interested in.

A solution I used before is to pipe the result into another grep and use -v to filter out the svn results, like this:

grep -R pattern path/to/search/ |grep -v svn

It works, but it has two issues. First, it still scans every svn folder (usually that's not a problem, but it's a pain if you work on a huge code base like mine). Second, it ruins the wonderful colour highlighting that grep gives us as a free gift.

A better solution is to use grep's 'exclude' and 'exclude-dir' options, like this:

grep --exclude-dir=\.svn --exclude=\*~ pattern path/to/search/ -Rn

It's fantastic: it skips all .svn folders and all files ending in '~' (some editors use that suffix for backup files), and it keeps the colour-highlighted results.
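
To see the two exclude options at work without touching a real checkout, here is a small sketch that builds a throwaway directory (the file names and the search word "needle" are made up for the demo) and runs the same options against it. It assumes GNU grep, which is where --exclude-dir comes from.

```shell
# Build a throwaway tree: one real file, one .svn copy, one editor backup.
demo=$(mktemp -d)
mkdir -p "$demo/src/.svn"
echo 'needle here' > "$demo/src/main.c"
echo 'needle here' > "$demo/src/.svn/main.c.svn-base"
echo 'needle here' > "$demo/src/main.c~"

# -l lists matching files only; the .svn copy and the backup are skipped.
matches=$(grep -Rl --exclude-dir=.svn --exclude='*~' needle "$demo")
echo "$matches"

# Clean up the demo tree.
rm -rf "$demo"
```

Only src/main.c should be listed, even though all three files contain the search word.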


Going one step further, I added it to my .bashrc file, so it becomes the default setting and I don't need to type that much:

export GREP_OPTIONS="--exclude-dir=\.svn --exclude=\*~"
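
One caveat: as far as I know, newer GNU grep releases treat GREP_OPTIONS as obsolescent and print a warning, and recent versions ignore it altogether. A small shell function in .bashrc is one possible alternative. The sketch below uses a made-up name, cgrep, so that it does not clash with any existing grep alias; it is an assumption about how you might set this up, not something grep provides.

```shell
# Hypothetical wrapper for ~/.bashrc: same exclusions, no GREP_OPTIONS needed.
# "command grep" calls the real grep binary, bypassing functions and aliases.
cgrep() {
    command grep --exclude-dir=.svn --exclude='*~' "$@"
}
```

It is then used exactly like grep, for example: cgrep -Rn pattern path/to/search/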

Bingo! To my mind, that's the perfect solution.

Wednesday, October 6, 2010

35 times faster with the magic of awk

In any piece of computing work, 2 times faster is always a good thing, 5 times faster is what perfectionists pursue, and 10 times faster is not something that happens every day.

Fortunately, I had a chance to make a job 35 times faster today.

Today's task was to restore a MySQL database with millions of rows from a mysqldump file.
Unfortunately, the dump was created with the --skip-extended-insert option, which means there is only one row per INSERT INTO statement. To make things worse, it's not possible to get another backup (with multiple rows per insert) because the original database is gone.

An initial test showed the restore running at 20 rows / second, which means the job might take one or two days to complete.
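
As a rough check of that estimate, here is the arithmetic as a tiny shell sketch. The row count of 2,000,000 is only an assumption for illustration (the exact number is not given above, just "millions"); the rate is the 20 rows / second from the initial test.

```shell
rows=2000000   # assumed row count, for illustration only
rate=20        # rows per second measured in the initial test

seconds=$(( rows / rate ))      # total restore time in seconds
hours=$(( seconds / 3600 ))     # integer division, so this rounds down

echo "$seconds seconds, about $hours hours"
```

With these numbers the restore needs 100000 seconds, a bit more than a full day, and every additional million rows adds roughly 14 hours.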

I didn't want to wait that long, so I tried writing a simple awk script to merge the single-row INSERTs into multi-row statements (10 rows per statement).

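awk '
# Merge single-row INSERT statements into multi-row ones, batch rows at a time.
BEGIN { i = 0; batch = 10 }
{
    if ($0 ~ /^INSERT INTO `[^`]+` VALUES \(.*\);$/) {
        i++
        if (i > 1) {
            # Continuation row: replace the INSERT prefix with a comma.
            sub(/^INSERT INTO `[^`]+` VALUES /, ",")
        }
        if (i >= batch) {
            # Batch is full: keep the trailing ";" and start a new statement.
            i = 0
        } else {
            # More rows may follow: drop the trailing ";" for now.
            sub(/;$/, " ")
        }
    } else if (i > 0) {
        # A non-INSERT line ends an unfinished batch: close the statement.
        i = 0
        print ";"
    }
    print
}
# Close a batch that is still open at the end of the file.
END { if (i > 0) print ";" }
' /home/bmmk/tmp/test.sql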
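
To make the effect visible, here is a minimal sketch that runs the same merging idea on a made-up three-row dump. The function name merge_inserts, the table `t` and its values are all hypothetical, and the batch size is passed as an argument (2 here) so the batching shows up with so few rows.

```shell
# Hypothetical helper: reads a dump on stdin, writes the merged dump to stdout.
# $1 is the batch size (defaults to 10).
merge_inserts() {
    awk -v batch="${1:-10}" '
    /^INSERT INTO `[^`]+` VALUES \(.*\);$/ {
        i++
        if (i > 1) {
            sub(/^INSERT INTO `[^`]+` VALUES /, ",")
        }
        if (i >= batch) {
            i = 0
        } else {
            sub(/;$/, " ")
        }
        print
        next
    }
    i > 0 { i = 0; print ";" }
    { print }
    END { if (i > 0) print ";" }
    '
}

# Made-up sample in the one-row-per-INSERT style.
out=$(merge_inserts 2 <<'EOF'
LOCK TABLES `t` WRITE;
INSERT INTO `t` VALUES (1,'a');
INSERT INTO `t` VALUES (2,'b');
INSERT INTO `t` VALUES (3,'c');
UNLOCK TABLES;
EOF
)
printf '%s\n' "$out"
```

With a batch size of 2, rows 1 and 2 end up in a single INSERT statement and row 3 gets a statement of its own, which is closed by a lone ";" line just before UNLOCK TABLES.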

Actually, it turned out to be a pretty simple script once I decided to get my hands dirty.

The test result is pretty promising: the restore now runs at about 700 ~ 800 rows / second. I am very pleased with the result.
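
For the record, this is where the 35 in the title comes from, as a quick shell calculation. The rates are the measured ones from above; the row count is again just an assumed 2,000,000 for illustration.

```shell
old_rate=20    # rows per second before the conversion
new_rate=700   # lower end of the 700 ~ 800 range
rows=2000000   # assumed row count, for illustration only

speedup=$(( new_rate / old_rate ))
minutes=$(( rows / new_rate / 60 ))   # integer division, rounds down

echo "speedup: ${speedup}x, restore time: about $minutes minutes"
```

So even at the lower end of the range, a restore that would have taken more than a day under this assumption finishes in well under an hour.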