Tuesday 15 June 2010

database - Linux: Compare large files




I download the .com zone file every day. It is a list of the .com domains in the world and their primary nameservers.

Sample of the zone file:

daytonohiojobs ns ns1.hostingnet
daytonohiojobs ns ns2.hostingnet
daytonohiomap ns ns1.hostingnet
daytonohiomap ns ns2.hostingnet
daytonohionews ns ns1.hostingnet
daytonohionews ns ns2.hostingnet

To save disk space, as you can see, the .com has been removed from each domain name (it's .com anyway). The same goes for the nameserver (if it ends in .com, that suffix has been removed).

This zone file is around 270,000,000 lines, 9 GB.

My goal is to monitor a specific nameserver. Every day I want a list of the domains on that specific nameserver, and a list of the new domains on that nameserver (new meaning: yesterday the domain didn't have that nameserver yet).

I wrote a Perl script to open and load "yesterday's" database, open "today's" database, loop, and compare. It takes hours and lots of memory.

What is the best way to do this?

Here is how I would do it, judging from what I know:

Have a script read the first file. For each line that corresponds to the nameserver of interest, add an entry to a hashmap.

Have the script read the second file. For each line that corresponds to the nameserver of interest, check whether the entry is in the hashmap. If it isn't, it is new. If it is, it is unchanged, so remove it from the hashmap.

At the end, any entries still left in the hashmap are domains that have been removed from that nameserver since yesterday.

This assumes that the hashmap of one particular nameserver's domains fits in memory, but on a reasonable machine and with a reasonable nameserver, that seems a reasonable assumption...
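
The steps above can be sketched roughly as follows. This is a minimal illustration in Python rather than the Perl the question mentions. It assumes every record is one line of three whitespace-separated fields (domain, ns, nameserver), as in the sample. The names matching_domains and compare_zones are made up for this example and do not come from the original post.

```python
def matching_domains(lines, nameserver):
    """Yield the domain of every '<domain> ns <nameserver>' line that
    points at the nameserver of interest (assumed record layout)."""
    wanted = nameserver.lower()
    for line in lines:
        fields = line.split()
        # Skip anything that is not a three-field NS record.
        if len(fields) != 3 or fields[1].lower() != "ns":
            continue
        if fields[2].lower() == wanted:
            yield fields[0].lower()


def compare_zones(yesterday_lines, today_lines, nameserver):
    """Return (new, unchanged, removed) domain sets for one nameserver."""
    # Pass 1: remember yesterday's domains for this nameserver only.
    pending = set(matching_domains(yesterday_lines, nameserver))
    new, unchanged = set(), set()
    # Pass 2: stream today's file and classify each matching domain.
    for domain in matching_domains(today_lines, nameserver):
        if domain in pending:
            pending.remove(domain)   # seen on both days
            unchanged.add(domain)
        elif domain not in unchanged:
            new.add(domain)          # not present yesterday
    # Whatever is still pending was not seen today.
    return new, unchanged, pending


if __name__ == "__main__":
    yesterday = [
        "daytonohiojobs ns ns1.hostingnet",
        "daytonohiomap ns ns1.hostingnet",
        "daytonohiomap ns ns2.hostingnet",
    ]
    today = [
        "daytonohiomap ns ns1.hostingnet",
        "daytonohionews ns ns1.hostingnet",
        "daytonohionews ns ns2.hostingnet",
    ]
    new, unchanged, removed = compare_zones(yesterday, today, "ns1.hostingnet")
    print("new:", sorted(new))
    print("unchanged:", sorted(unchanged))
    print("removed:", sorted(removed))
```

With real files, open file handles can be passed in place of the lists, so each 9 GB file is streamed once and only the domains of the one nameserver are held in memory. Today's full list for the nameserver is the union of the new and unchanged sets.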

linux database perl large-files
