Monday 15 April 2013

Regex to parse and replace img src in C#/.NET?




Ahoy,

I have a problem, you see. I have strings like this:

<img width="594" height="392" src="/sites/it_kb/siteassets/pages/exploding%20the%20vdi%20vdesktop/vdi3.png" alt="" style="margin:5px;width:619px;height:232px" />

They are not always consistently formatted.

I need to parse strings like this and return the following:

<img width="594" height="392" src="/exploding%20the%20vdi%20vdesktop-vdi3.png" alt="" style="margin:5px;width:619px;height:232px" />

Changes:

Remove everything except the immediate directory in which the image file lies. Instead of that directory being a subdirectory, prepend it onto the file name.

So if the file is in /blabla/bla/blaaaaah/pickles/pickle.png

then I want the img src attribute to be pickles-pickle.png

Now, I've been trying to do this with regex, but after 3 hours I've discovered that I'm... awful at regex. I could be at it for weeks and I'd never get anywhere.

Thus, I'm asking this wonderful community two things:

1. How do I do this? Is regex the right answer? I need to be able to parse src attributes within img tags (whether or not they have height/width or other attributes).
2. What resources would you recommend for me to learn regex in .NET?

Now, for the problem at hand, I suppose with string.Replace I could....

1. Find the img tag, and the indexes of the surrounding '<' and '>'.
2. Find the index of 'src=' and of the ' ' (space) after it, between those two indexes.
3. Find the last index of '/' between the src and space indexes.
4. Find the second-last index of '/' between the src and space indexes.
5. Replace... er no, remove... everything before the second-last instance of '/'...
6. ...string.Replace the remaining '/' with '-'.

....I.. think that'd do it?
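The path-handling part of those steps (the last two '/' lookups and the join with '-') can be sketched roughly as below. This is only an illustration: the class and method names (`SrcPathFlattener`, `Flatten`) are made up, it assumes the src value has already been pulled out of the tag, and it keeps a leading '/' to match the desired output shown earlier.

```csharp
using System;

public static class SrcPathFlattener
{
    // Hypothetical helper: keeps only the last directory and the file name,
    // joined by '-' and prefixed with '/'.
    public static string Flatten(string src)
    {
        int lastSlash = src.LastIndexOf('/');
        if (lastSlash <= 0)
        {
            // No parent directory to fold into the file name; leave as-is.
            return src;
        }

        // Search backwards from just before the last '/'.
        // Yields -1 when the path has only one '/'.
        int secondLastSlash = src.LastIndexOf('/', lastSlash - 1);

        string directory = src.Substring(secondLastSlash + 1, lastSlash - secondLastSlash - 1);
        string fileName = src.Substring(lastSlash + 1);

        return "/" + directory + "-" + fileName;
    }
}
```

For example, `SrcPathFlattener.Flatten("/blabla/bla/blaaaaah/pickles/pickle.png")` gives `/pickles-pickle.png`. Query strings and backslashes are not handled in this sketch.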

But that's damn ugly. Regex would be much prettier, don't you think?

Any advice?

Note: I tagged this 'homework', but it's not homework. I'm volunteering work after hours to save the company 200k. This is literally the last piece of an incredibly convoluted (to me) puzzle. Of course, I don't see a penny of that 200k, but I'm doing it anyway.

To find the tag, I suggest using HtmlAgilityPack. It's safer than running a regex on an entire HTML page.

Use this to get the image nodes:

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
var imgs = doc.DocumentNode.SelectNodes("//img");

Use this to get/set the attributes:

foreach (var img in imgs)
{
    string orig = img.Attributes["src"].Value;
    // Do the replacements on orig to build the new string, newSrc.
    string newSrc = orig;
    img.SetAttributeValue("src", newSrc);
}

So, what kind of replacements should you do? I agree that using regex is much more elegant. For things like these, it's what it's for after all!

Something like this should do the trick:

string s = @"/sites/it_kb/siteassets/pages/exploding%20the%20vdi%20vdesktop/vdi3.png";
string n = Regex.Replace(s, @"(.*?)\/([^\/]*?)\/([^\/]*?)$", @"/$2-$3");
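If HtmlAgilityPack is not an option, the same path regex can also be applied to src attributes directly in a raw string. The following is only a sketch under some assumptions: the names `ImgSrcRewriter`, `FlattenPath` and `RewriteImgTags` are invented for illustration, and the img-matching pattern assumes the src value is wrapped in double quotes. It will not cope with everything real-world HTML can throw at it, which is why a parser is the safer route.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ImgSrcRewriter
{
    // The path pattern from the snippet above: $2 is the last directory,
    // $3 is the file name; everything before them is discarded.
    private static readonly Regex PathPattern =
        new Regex(@"(.*?)\/([^\/]*?)\/([^\/]*?)$");

    // Assumed pattern for a double-quoted src attribute inside an <img> tag.
    // Group 1: everything up to the opening quote, group 2: the src value,
    // group 3: the closing quote.
    private static readonly Regex ImgSrcPattern =
        new Regex(@"(<img\b[^>]*?\ssrc\s*=\s*"")([^""]*)("")", RegexOptions.IgnoreCase);

    // Paths with fewer than two '/' do not match and come back unchanged.
    public static string FlattenPath(string src)
    {
        return PathPattern.Replace(src, "/$2-$3");
    }

    // Rewrites only the src value; the other attributes are left untouched.
    public static string RewriteImgTags(string html)
    {
        return ImgSrcPattern.Replace(html,
            m => m.Groups[1].Value + FlattenPath(m.Groups[2].Value) + m.Groups[3].Value);
    }
}
```

Calling `ImgSrcRewriter.RewriteImgTags` on the img tag from the question should leave width, height, alt and style alone and change only the src to `/exploding%20the%20vdi%20vdesktop-vdi3.png`.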

Some resources you can use to learn C# regexing:

Dot Net Perls: Regex.Match

MSDN: Regex.Match method

MSDN: Regex cheat sheet

