perl - Nested loop running very slowly
I'm trying to run a program that checks each line of one file against each line of a second file to see if some of the elements match. Each file is around 200k lines.
What I've got so far looks like this:
    #!/usr/bin/perl
    #gffgenefind.pl
    use strict;
    use warnings;

    die "snp gff\n" unless @ARGV == 4;
    open( my $snp,  "<", $ARGV[0] ) or die "can't open $ARGV[0]: $!";
    open( my $gff,  "<", $ARGV[1] ) or die "can't open $ARGV[1]: $!";
    open( my $outg, ">", $ARGV[2] ) or die "can't open $ARGV[2]: $!";
    open( my $outs, ">", $ARGV[3] ) or die "can't open $ARGV[3]: $!";

    my $scaffold;
    my $site;
    my @snplines = <$snp>;
    my @gfflines = <$gff>;

    foreach my $snpline (@snplines) {
        my @arr = split( /\t/, $snpline );
        $scaffold = $arr[0];
        $site     = $arr[1];

        # Every SNP line is compared against every GFF line.
        foreach my $line (@gfflines) {
            my @arr1 = split( /\t/, $line );
            if ( $arr1[3] <= $site and $site <= $arr1[4] and $arr1[0] eq $scaffold ) {
                print $outg "$line";
                print $outs "$snpline";
            }
        }
    }
File 1 (SNP) looks like this:

    scaffold_100 10689 c 0 0 0 0 0 0

File 2 (GFF) looks like this:

    scaffold_1 phytozomev10 gene 750912 765975 . - . id=carubv10008059m.g.v1.0;name=carubv10008059m.g
Essentially, I'm looking to see if the first values match, and if the second value from the SNP file is within the range defined in the second file (in this case 750912 to 765975).
I've seen that nested loops are to be avoided, and was wondering if there's an alternative way for me to look through this data.
Thanks!
Firstly - lose the foreach loop. It reads the whole file into memory, when you don't need to.
Try instead:

    while ( my $snpline = <$snp> ) {

because that reads the file line by line.
Generally - mixing array indices and named variables is also bad style.
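To illustrate both points, here is a minimal sketch that reads tab-separated input one line at a time and unpacks the fields straight into named variables. The in-memory sample data and the @seen array are assumptions for illustration only; a real script would open the SNP file instead.

```perl
use strict;
use warnings;

# Sample SNP data held in a string so the sketch is self-contained (assumption).
my $sample = "scaffold_100\t10689\tc\t0\t0\n"
           . "scaffold_1\t751000\tt\t0\t0\n";
open( my $snp, '<', \$sample ) or die "Can't open in-memory file: $!";

my @seen;
while ( my $snpline = <$snp> ) {
    chomp $snpline;

    # List assignment names the fields instead of using $arr[0], $arr[1].
    my ( $scaffold, $site ) = split /\t/, $snpline;
    push @seen, "$scaffold:$site";
}
close $snp;

print "$_\n" for @seen;
```

Only one line is held in memory at a time, and the rest of the loop body can refer to $scaffold and $site directly.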
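The core problem though is most likely that for each line of your first file, you're cycling through all of the second file. With around 200k lines in each, that is on the order of 40 billion inner iterations, each with its own split.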
Edit: note - because 'scaffold' isn't unique, I've amended accordingly.
This seems like a good place to use a hash. E.g.
    my %sites;

    # Index the SNP sites by scaffold name: $sites{scaffold}{site}
    while ( <$snp> ) {
        my ( $scaffold, $site ) = split ( /\t/ );
        $sites{$scaffold}{$site}++;
    }

    # One pass over the GFF file, looking only at sites on the same scaffold
    while ( <$gff> ) {
        my ( $name, $tmp1, $tmp2, $range_start, $range_end ) = split ( /\t/ );
        if ( $sites{$name} ) {
            foreach my $site ( keys %{ $sites{$name} } ) {
                if ( $site >= $range_start and $site <= $range_end ) {
                    #do stuff with it;
                    print;
                }
            }
        }
    }
Hopefully you get the gist, even if it isn't exactly what you're after?
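As a fuller illustration of the same idea, here is a self-contained sketch that keeps the whole SNP line alongside its site, so both the matching gene line and the matching SNP line can be written out as in the question. The sample data, the "id=example" attribute and the names %snps_by_scaffold, @gene_hits and @snp_hits are assumptions; the two arrays stand in for the $outg and $outs filehandles. It also assumes every GFF line has at least five tab-separated fields, so header or comment lines would need to be skipped first.

```perl
use strict;
use warnings;

# In-memory sample data (assumption) standing in for the two input files.
my $snp_data = join '',
    "scaffold_1\t751000\tc\t0\n",
    "scaffold_1\t10\tg\t0\n",
    "scaffold_100\t10689\tc\t0\n";
my $gff_data = "scaffold_1\tphytozomev10\tgene\t750912\t765975\t.\t-\t.\tid=example\n";

open( my $snp, '<', \$snp_data ) or die "Can't open SNP data: $!";
open( my $gff, '<', \$gff_data ) or die "Can't open GFF data: $!";

# Pass 1: group SNP lines by scaffold, keeping the site with each line.
my %snps_by_scaffold;
while ( my $snpline = <$snp> ) {
    my ( $scaffold, $site ) = split /\t/, $snpline;
    push @{ $snps_by_scaffold{$scaffold} }, [ $site, $snpline ];
}

# Pass 2: stream the GFF file once, checking only SNPs on the same scaffold.
my ( @gene_hits, @snp_hits );
while ( my $gffline = <$gff> ) {
    my ( $name, undef, undef, $range_start, $range_end ) = split /\t/, $gffline;
    for my $snp_rec ( @{ $snps_by_scaffold{$name} || [] } ) {
        my ( $site, $snpline ) = @$snp_rec;
        next unless $range_start <= $site && $site <= $range_end;
        push @gene_hits, $gffline;    # would be printed to $outg
        push @snp_hits,  $snpline;    # would be printed to $outs
    }
}

print "gene: $_" for @gene_hits;
print "snp:  $_" for @snp_hits;
```

Each file is read once, and a gene is only compared with SNPs on its own scaffold. Note that matches come out in GFF order rather than SNP order, and if a single scaffold carries a very large number of SNPs the inner loop can still be slow; sorting each scaffold's sites and searching the sorted list would be a possible refinement.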
perl loops nested