Thursday 15 April 2010

.net - C#: poor performance when manipulating strings from big txt files -




I've got a big txt file in which I have to find records and rewrite them to another file.

I made a method like this:

    // "so" is not declared in this snippet; it is the ArrayList of
    // string[] records described below.
    private ArrayList GetRelatingObjects(string[] ob, string tab)
    {
        ArrayList rel = new ArrayList();
        foreach (string[] s in so)
        {
            foreach (string x in s)
            {
                if (x.Length > 0)
                {
                    if (x[0].Equals('w'))
                    {
                        // Everything after the third comma of x, with ';' removed.
                        string xtemp = x.Substring(x.IndexOf(',') + 1);
                        xtemp = xtemp.Substring(xtemp.IndexOf(',') + 1);
                        xtemp = xtemp.Substring(xtemp.IndexOf(',') + 1).Replace(";", "");

                        // The matching field of the chosen object.
                        string obtemp = ob[0].Substring(ob[0].IndexOf(',', 3) + 1);
                        obtemp = obtemp.Substring(obtemp.IndexOf(',') + 1);
                        obtemp = obtemp.Substring(0, obtemp.IndexOf(','));

                        if (xtemp.Equals(obtemp) && (x.Substring(x.IndexOf(',', 3) + 1).Contains(tab)))
                        {
                            if (!rel.Contains(s) && !s[0].Substring(x.IndexOf(',', 3) + 1).Contains("g5zmn"))
                            {
                                rel.Add(s);
                            }
                        }
                    }
                }
            }
        }
        return rel;
    }

So the ArrayList `so` is a set of arrays of records (these arrays act as my db). Here I need to find the objects relating to a chosen object.

The problem is that when I use files over 2 MB, it's not fast enough. (I replaced the Split calls with Substring calls; they are faster, I checked it.)

But the performance is still not good enough.

Do you have any idea how I can make it faster? Most of the CPU power is lost on Substring and Replace, but I still don't know if it can be made faster.
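Each chained Substring call in the method above allocates a new intermediate string. As a rough sketch of one way to cut that down, the helper below walks the commas with IndexOf and calls Substring only once. `FieldReader` and `GetField` are hypothetical names, not part of the original code, and the sketch assumes plain comma-separated fields; a trailing ';' would still have to be trimmed separately.

```csharp
using System;

static class FieldReader
{
    // Hypothetical helper (not from the original post): returns the
    // zero-based comma-separated field of a line using a single Substring.
    public static string GetField(string line, int fieldIndex)
    {
        int start = 0;
        // Skip fieldIndex commas without creating intermediate strings.
        for (int i = 0; i < fieldIndex; i++)
        {
            int comma = line.IndexOf(',', start);
            if (comma < 0) return null; // the line has too few fields
            start = comma + 1;
        }
        int end = line.IndexOf(',', start);
        if (end < 0) end = line.Length; // the last field runs to the end of the line
        return line.Substring(start, end - start);
    }
}
```

The field taken from `ob[0]` does not change inside the loops, so it could probably also be computed once before the outer foreach instead of once per record.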

This is not a 100% solution, because I have no description of the elements you want to receive.

    // Needs: System, System.Collections.Generic, System.IO,
    // System.Text, System.Text.RegularExpressions
    private IEnumerable<string> ReadData(string filePath)
    {
        var res = new List<string>();
        var fileInfo = new FileInfo(filePath);
        if (!fileInfo.Exists)
        {
            throw new ArgumentException("No file exists at path " + filePath, "filePath");
        }
        var fileStream = fileInfo.Open(FileMode.Open, FileAccess.Read);
        var file = new StreamReader(fileStream, Encoding.UTF8);
        string lineOfText;
        while ((lineOfText = file.ReadLine()) != null)
        {
            var pattern = new Regex(@"^[\w]{2},[\w]{0,},[\w]{1,},([\w]{1,})(?:,[\w]{0,},[\w]{1,}){0,1};$");
            var match = pattern.Match(lineOfText);
            if (match.Success)
            {
                // Groups[0] is the whole matched line.
                res.Add(match.Groups[0].Value);
            }
            else
            {
                // handle lines with the wrong format
            }
        }
        return res;
    }

To explain the pattern:

    ^            anchor: start of string
    [\w]{2}      any character in "\w", exactly 2 times
    ,
    [\w]{0,}     any character in "\w", at least 0 times
    ,
    [\w]{1,}     any character in "\w", at least 1 time
    ,
    (            capture => the element for the result
    [\w]{1,}       any character in "\w", at least 1 time
    )            end capture
    (?:          non-capturing group
    ,[\w]{0,}      a comma, then any character in "\w", at least 0 times
    ,[\w]{1,}      a comma, then any character in "\w", at least 1 time
    ){0,1}       end group; at least 0, not more than 1 time
    ;$           a semicolon, then anchor: end of string
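To see the pattern in action, here is a small sketch that builds the Regex once, instead of once per line, and returns only the captured element. `PatternDemo` and `ExtractElement` are made-up names, the sample lines in the note after the code are invented for illustration, and reading `Groups[1]` assumes the capture group is the value you actually want.

```csharp
using System;
using System.Text.RegularExpressions;

static class PatternDemo
{
    // Same pattern as above, constructed once and reused for every line.
    static readonly Regex Pattern = new Regex(
        @"^[\w]{2},[\w]{0,},[\w]{1,},([\w]{1,})(?:,[\w]{0,},[\w]{1,}){0,1};$",
        RegexOptions.Compiled);

    // Returns the captured fourth field, or null when the line does not match.
    public static string ExtractElement(string line)
    {
        Match match = Pattern.Match(line);
        // Groups[1] is the capture group; Groups[0] would be the whole line.
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

With this sketch, `PatternDemo.ExtractElement("ab,12,cd,target;")` yields "target", while a line such as "abc,12,cd,target;" yields null because its first field is not exactly two characters long.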

c# .net string
