Friday, April 12, 2013

My 5 minute Python data generator

I needed to run some database benchmarks on a 10GB dataset. The problem: I did not have the dataset in hand, and I needed to give the customer an estimate before signing off on the contract. No problemo: enter Python, and in 5 minutes I had a flexible script that created a dataset with a table structure very similar to the client's:

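import csv
import random
import string

class SomeEntity( list ):
    """One generated row; each append() adds one column value."""
    titles = ( 'attr1', 'attr2' ) # ... for all columns
    def __init__( self ):
        super().__init__()
        # attr1: random integer in [100, 10000)
        self.append( random.randrange( 100, 10000 ) )
        # attr2: 1000 random lowercase letters and digits
        self.append( ''.join( random.choice( string.ascii_lowercase + string.digits ) for _ in range( 1000 ) ) )
        # ... for all columns

myData = [ SomeEntity() for i in range( 10000 ) ]
# Python 3: open the CSV in text mode with newline='' ('wb' only worked on Python 2)
with open( 'tmp.csv', 'w', newline='' ) as aFile:
    dest = csv.writer( aFile )
    dest.writerow( SomeEntity.titles )
    dest.writerows( myData )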
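The script above builds every row in memory before writing anything, and 10,000 rows of roughly 1KB each come to only about 10MB. To reach 10GB, a streaming variant that writes each row as soon as it is generated and stops at a target file size may be more practical. Below is a minimal sketch of that idea, assuming Python 3.6+ (for `random.choices`). The names `COLUMNS` and `write_csv` are hypothetical and not part of the original script, and the two columns simply mirror `attr1` and `attr2` above.

```python
import csv
import random
import string

ALPHABET = string.ascii_lowercase + string.digits

# Hypothetical column spec: (header, zero-argument value generator) pairs.
# Keeping each title next to its generator means the two cannot drift apart.
COLUMNS = [
    ('attr1', lambda: random.randrange(100, 10000)),
    ('attr2', lambda: ''.join(random.choices(ALPHABET, k=1000))),
    # ... one entry per column of the client's table
]

def write_csv(path, target_bytes, columns=COLUMNS):
    """Stream random rows to `path` until at least `target_bytes` are written.

    Each row is written as soon as it is generated instead of being collected
    in a list first, so memory stays flat however large the file gets.
    Returns (data_rows_written, bytes_written).
    """
    rows = 0
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        # writerow() returns the underlying write() result, i.e. the number of
        # characters written; every character here is ASCII, so that equals bytes.
        written = writer.writerow([title for title, _ in columns])
        while written < target_bytes:
            written += writer.writerow([gen() for _, gen in columns])
            rows += 1
    return rows, written
```

For example, `write_csv('tmp.csv', 10 * 1024**3)` would target about 10GB; with the two example columns each row is a little over 1,000 bytes, so that works out to a little over 10 million rows.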