Wednesday, 15 January 2014

php - preg_match_all regex fails when there are spaces




I'm trying to extract image URLs from HTML source code using the following regex, but it fails when the image URL has spaces in it. Example tag and code:

<img src="http://a57.foxnews.com/global.fncstatic.com/static/managed/img/entertainment/876/493/kazantsev pinkish bikini reuters.jpg?ve=1&amp;tl=1" alt="kazantsev pinkish bikini reuters.jpg" itemprop="image">

$image_regex_src_url = '/<img[^>]*'.'src=[\"|\'](.*)[\"|\']/ui';
preg_match_all($image_regex_src_url, $string, $out, PREG_PATTERN_ORDER);

This gives me the following:

http://a57.foxnews.com/global.fncstatic.com/static/managed/img/entertainment/876/493/kazantsev

Is there a way to match any character, including whitespace? Or do I have to set something in the PHP configuration?

You have several issues with your regular expression.

First, you are using the concatenation operator ('.') to join the two parts of the expression; this is not necessary. Second, you don't need the alternation operator | inside character classes. Inside a class, | is just a literal pipe character.
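To illustrate the point about | inside a character class, here is a minimal sketch. The variable names are made up for this example and are not from the question.

```php
<?php
// Inside a character class, | is a literal pipe, not alternation.
$withPipe    = '/["|\']/';  // matches ", | or '
$withoutPipe = '/["\']/';   // matches " or '

$pipeMatchesWith    = preg_match($withPipe, '|');    // 1: the pipe itself matches
$pipeMatchesWithout = preg_match($withoutPipe, '|'); // 0: no quote in the subject
$quoteMatches       = preg_match($withoutPipe, '"'); // 1: a quote still matches

var_dump($pipeMatchesWith, $pipeMatchesWithout, $quoteMatches);
```

So the class in the question would also accept a stray | as a "quote", which is almost certainly not intended.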

The dot . matches any character except a newline sequence. Since these tags are located in HTML source, they could possibly include line breaks. Either use the s (dotall) modifier, which forces the dot to match any character including line breaks, or use a negated character class, which matches any character except the ones listed.

Using the s (dotall) modifier:

$image_regex_src_url = '/<img[^>]*src=(["\'])(.*?)\1/si';

Using a negated character class [^ ]:

$image_regex_src_url = '/<img[^>]*src=(["\'])([^"\']*)\1/i';
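As a quick check, both patterns can be run against the tag from the question. This is only a sketch: the variable names ($html, $lazy, $negated and so on) are chosen for the example, and it assumes the tag sits in a plain PHP string.

```php
<?php
// Sample markup taken from the question; the src value contains spaces.
$html = '<img src="http://a57.foxnews.com/global.fncstatic.com/static/managed/img/entertainment/876/493/kazantsev pinkish bikini reuters.jpg?ve=1&amp;tl=1" alt="kazantsev pinkish bikini reuters.jpg" itemprop="image">';

$lazy    = '/<img[^>]*src=(["\'])(.*?)\1/si';     // dotall + lazy quantifier
$negated = '/<img[^>]*src=(["\'])([^"\']*)\1/i';  // negated character class

preg_match_all($lazy, $html, $lazyOut, PREG_PATTERN_ORDER);
preg_match_all($negated, $html, $negatedOut, PREG_PATTERN_ORDER);

// Group 1 is the opening quote character, group 2 is the URL.
$lazyUrls    = $lazyOut[2];
$negatedUrls = $negatedOut[2];

print_r($lazyUrls);
print_r($negatedUrls);
```

Both patterns capture the whole src value, spaces included. Note that the regex returns the raw attribute text, so &amp; is left undecoded.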

That said, it is much easier to use a parser such as DOM to grab the results.

$doc = new DOMDocument;
@$doc->loadHTML($html); // load the HTML, suppressing parser warnings

$urls = array();
foreach ($doc->getElementsByTagName('img') as $node) {
    $urls[] = $node->getAttribute('src'); // collect each image's src value
}

print_r($urls);
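If you would rather not use the @ operator, one possible variation is to collect libxml's warnings instead of suppressing them. This is a sketch under assumptions: the $html sample and the example.com URL are invented for illustration.

```php
<?php
// Hypothetical input; the src deliberately contains a space.
$html = '<p><img src="http://example.com/a b.jpg?ve=1&amp;tl=1" alt="x"></p>';

// Buffer libxml warnings about imperfect HTML instead of printing them.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

libxml_clear_errors();                  // discard the buffered warnings
libxml_use_internal_errors($previous);  // restore the earlier setting

$urls = array();
foreach ($doc->getElementsByTagName('img') as $node) {
    $urls[] = $node->getAttribute('src');
}

print_r($urls);
```

Unlike the regex approach, the parser hands back the attribute value as it understands it, so entities such as &amp; should come back decoded.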

php regex whitespace
