Friday 15 June 2012

Perl - Nested loop running very slowly




I'm trying to run a program that checks each line of one file against each line of a second file to see if some of the elements match. Each file is around 200k lines.

What I've got so far looks like this:

    #!/usr/bin/perl
    #gffgenefind.pl
    use strict;
    use warnings;

    die "snp gff\n" unless @ARGV == 4;
    open( my $snp,  "<", $ARGV[0] ) or die "can't open $ARGV[0]: $!";
    open( my $gff,  "<", $ARGV[1] ) or die "can't open $ARGV[1]: $!";
    open( my $outg, ">", $ARGV[2] ) or die "can't open $ARGV[2]: $!";
    open( my $outs, ">", $ARGV[3] ) or die "can't open $ARGV[3]: $!";

    my $scaffold;
    my $site;
    my @snplines = <$snp>;
    my @gfflines = <$gff>;

    foreach my $snpline (@snplines) {
        my @arr = split( /\t/, $snpline );
        $scaffold = $arr[0];
        $site     = $arr[1];

        # Every SNP line is compared against every GFF line.
        foreach my $line (@gfflines) {
            my @arr1 = split( /\t/, $line );
            if ( $arr1[3] <= $site and $site <= $arr1[4] and $arr1[0] eq $scaffold ) {
                print $outg "$line";
                print $outs "$snpline";
            }
        }
    }

File 1 (SNP) looks like this:

    scaffold_100 10689 c 0 0 0 0 0 0

File 2 (GFF) looks like this:

    scaffold_1 phytozomev10 gene 750912 765975 . - . id=carubv10008059m.g.v1.0;name=carubv10008059m.g

Essentially, I'm looking to see if the first values match, and if the second value from the SNP file is within the range defined in the second file (in this case 750912 to 765975).

I've seen that nested loops are to be avoided, and was wondering if there's an alternative way for me to look through this data.

Thanks!

Firstly - lose the foreach loop. It reads the whole file into memory when you don't need to.

Try this instead:

    while ( my $snpline = <$snp> ) {

because that reads line by line.

Generally - mixing array indices and named variables is bad style.
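A minimal sketch of both points: reading line by line, and unpacking the fields straight into named variables with a list assignment. The sample line is the one from the question; the in-memory filehandle and the @seen array are assumptions added only so the snippet runs on its own.

```perl
use strict;
use warnings;

# In-memory filehandle standing in for the SNP file (assumption, for a
# self-contained demo; the real script opens $ARGV[0] instead).
my $snp_data = "scaffold_100\t10689\tc\t0\t0\t0\t0\t0\t0\n";
open( my $snp, "<", \$snp_data ) or die "can't open in-memory data: $!";

my @seen;    # hypothetical: collected only so the result can be inspected
while ( my $snpline = <$snp> ) {
    chomp $snpline;

    # Name the fields up front instead of indexing into @arr later.
    my ( $scaffold, $site ) = split /\t/, $snpline;
    push @seen, join( ":", $scaffold, $site );
}
close $snp;

print "$_\n" for @seen;
```

Only one line is held in memory at a time here, and the rest of the loop body can refer to $scaffold and $site by name.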

The core problem, though, is that for each line of the first file you're cycling through all of the second file.

Edit: note - because 'scaffold' isn't unique, I've amended accordingly.

This seems like a good place to use a hash. E.g.
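To put a rough number on that, here is a back-of-the-envelope calculation using the approximate line counts from the question. The variable names are made up for illustration, and the "single pass" figure assumes each file is read only once.

```perl
use strict;
use warnings;

my $snp_lines = 200_000;    # approximate, from the question
my $gff_lines = 200_000;    # approximate, from the question

# Nested loops: the inner split-and-compare runs once per pair of lines.
my $nested_comparisons = $snp_lines * $gff_lines;

# Reading each file once touches each line only once.
my $single_pass_reads = $snp_lines + $gff_lines;

print "nested loops: $nested_comparisons split-and-compare steps\n";
print "single pass:  $single_pass_reads line reads\n";
```

That is roughly forty billion inner iterations against four hundred thousand line reads, which is why the nested version crawls.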

    my %sites;
    while ( <$snp> ) {
        my ( $scaffold, $site ) = split ( /\t/ );
        # Index by scaffold first, then by site, because scaffold isn't unique.
        $sites{$scaffold}{$site}++;
    }

    while ( <$gff> ) {
        my ( $name, $tmp1, $tmp2, $range_start, $range_end ) = split ( /\t/ );
        if ( $sites{$name} ) {
            foreach my $site ( keys %{ $sites{$name} } ) {
                if ( $site >= $range_start and $site <= $range_end ) {
                    #do stuff with it;
                    print;
                }
            }
        }
    }

Hopefully you get the gist, even if it isn't exactly what you're after?
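The hash approach above can be sketched end to end as follows. This is an illustration under stated assumptions, not a drop-in replacement: the first SNP line and the GFF line come from the question, the other two SNP lines are invented test data, in-memory filehandles stand in for the real files, and matches are collected in @gff_hits and @snp_hits (hypothetical names) rather than written to the two output files.

```perl
use strict;
use warnings;

# Sample data standing in for the two input files.
my $snp_data = join "",
    "scaffold_100\t10689\tc\t0\t0\t0\t0\t0\t0\n",    # from the question
    "scaffold_1\t751000\tc\t0\t0\t0\t0\t0\t0\n",     # invented: inside the gene range
    "scaffold_1\t800000\tc\t0\t0\t0\t0\t0\t0\n";     # invented: outside the gene range
my $gff_data =
    "scaffold_1\tphytozomev10\tgene\t750912\t765975\t.\t-\t.\t"
  . "id=carubv10008059m.g.v1.0;name=carubv10008059m.g\n";

open( my $snp, "<", \$snp_data ) or die "can't open SNP data: $!";
open( my $gff, "<", \$gff_data ) or die "can't open GFF data: $!";

# Pass 1: index SNP lines as $sites{scaffold}{site} = [ lines ].
my %sites;
while ( my $snpline = <$snp> ) {
    my ( $scaffold, $site ) = split /\t/, $snpline;
    push @{ $sites{$scaffold}{$site} }, $snpline;
}

# Pass 2: for each gene, look only at SNPs on the same scaffold.
my ( @gff_hits, @snp_hits );
while ( my $gffline = <$gff> ) {
    my ( $name, undef, undef, $range_start, $range_end ) = split /\t/, $gffline;
    next unless $sites{$name};

    for my $site ( sort { $a <=> $b } keys %{ $sites{$name} } ) {
        # Inclusive bounds, matching the <= tests in the question.
        next unless $range_start <= $site && $site <= $range_end;
        for my $snpline ( @{ $sites{$name}{$site} } ) {
            push @gff_hits, $gffline;
            push @snp_hits, $snpline;
        }
    }
}

print "GFF: $_" for @gff_hits;
print "SNP: $_" for @snp_hits;
```

Storing the whole SNP line under each site keeps the output identical in content to the original script's two files. Note that each gene is still compared against every SNP site on its own scaffold, so the gain depends on how evenly the SNPs are spread across scaffolds.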

perl loops nested
