PHP - preg_match_all regex fails when there are spaces
I'm trying to get image URLs from HTML source code with the regex below, but it fails when the image URL has spaces in it. Example tag:
<img src="http://a57.foxnews.com/global.fncstatic.com/static/managed/img/entertainment/876/493/kazantsev pinkish bikini reuters.jpg?ve=1&tl=1" alt="kazantsev pinkish bikini reuters.jpg" itemprop="image">

My code:

$image_regex_src_url = '/<img[^>]*'.'src=[\"|\'](.*)[\"|\']/ui';
preg_match_all($image_regex_src_url, $string, $out, PREG_PATTERN_ORDER);
This gives me the following:
http://a57.foxnews.com/global.fncstatic.com/static/managed/img/entertainment/876/493/kazantsev
Is there a way to match any character, including whitespace? Or do I have to set something in the PHP configuration?
You have several issues with your regular expression.
First, you are using the concatenation operator ('.') to join two parts of the pattern; this is not necessary. Second, you don't need the alternation operator | inside a character class: there it is just a literal pipe, so [\"|\'] also matches a | character.
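To see why the pipe is unnecessary (and slightly harmful): inside a character class | is an ordinary character, so [\"|\'] accepts a double quote, a pipe or a single quote. This small sketch runs the original class and a plain ["\'] class against a made-up string that uses pipes as delimiters.

```php
<?php
// Inside [...] the | is a literal pipe, not alternation.
$withPipe    = '/src=[\"|\'](.*?)[\"|\']/';
$withoutPipe = '/src=(["\'])(.*?)\1/';

// Hypothetical input that uses pipes instead of quotes around the value.
$input = 'src=|image.jpg|';

$a = preg_match($withPipe, $input, $m1);    // the pipes are accepted as "quotes"
$b = preg_match($withoutPipe, $input, $m2); // only " or ' are accepted

var_dump($a, $b);
```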
The dot . matches any character except a newline. Because these tags come from HTML source, they may well contain line breaks. Either use the s (dotall) modifier, which makes the dot match any character including line breaks, or use a negated character class, which matches any character except the ones listed.
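A minimal illustration of the dot behaviour, using a contrived src value that contains a line break (hypothetical input, chosen only to put a newline where .*? has to cross it):

```php
<?php
// Hypothetical tag whose src value is split across two lines.
$html = "<img src=\"a\nb.jpg\">";

// Without s, . cannot cross the newline, so the closing quote is never reached.
$plain = preg_match('/src="(.*?)"/', $html, $m1);
// With s (dotall), . also matches \n.
$dotall = preg_match('/src="(.*?)"/s', $html, $m2);
// A negated class matches the newline too, because \n is not excluded by [^"].
$negated = preg_match('/src="([^"]*)"/', $html, $m3);

var_dump($plain, $dotall, $negated);
```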
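In both patterns below, (["\']) captures the opening quote, \1 requires the same quote at the end, and the URL itself is in capture group 2.
Using the s (dotall) modifier, with the lazy .*? so the match stops at the first matching quote: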
$image_regex_src_url = '/<img[^>]*src=(["\'])(.*?)\1/si';
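As a sketch, here is the dotall pattern applied to the tag from the question, keeping the question's variable names. Note that the URL is read from $out[2], not $out[1], because group 1 holds the quote character:

```php
<?php
// The tag from the question, with spaces in the URL.
$string = '<img src="http://a57.foxnews.com/global.fncstatic.com/static/managed/img/entertainment/876/493/kazantsev pinkish bikini reuters.jpg?ve=1&tl=1" alt="kazantsev pinkish bikini reuters.jpg" itemprop="image">';

$image_regex_src_url = '/<img[^>]*src=(["\'])(.*?)\1/si';
preg_match_all($image_regex_src_url, $string, $out, PREG_PATTERN_ORDER);

// $out[0]: full matches, $out[1]: quote characters, $out[2]: the URLs.
print_r($out[2]);
```

The full URL, spaces included, ends up in $out[2][0].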
Using a negated character class, [^"\']*, which stops at the first quote of either kind:
$image_regex_src_url = '/<img[^>]*src=(["\'])([^"\']*)\1/i';
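The negated-class version handles both quote styles. A quick check against two hypothetical tags, one with double quotes and one with single quotes, both with spaces in the file name:

```php
<?php
// Hypothetical input: two tags using different quote styles.
$string = '<img src="my photo.jpg" alt="a"><img class="x" src=\'other pic.png\'>';

$image_regex_src_url = '/<img[^>]*src=(["\'])([^"\']*)\1/i';
preg_match_all($image_regex_src_url, $string, $out, PREG_PATTERN_ORDER);

// Group 2 holds the URL for each tag.
print_r($out[2]);
```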
That said, it is much easier to use a parser such as DOM to grab the results:
$doc = new DOMDocument;
@$doc->loadHTML($html); // load the HTML, suppressing warnings about malformed markup
$urls = [];
foreach ($doc->getElementsByTagName('img') as $node) {
    $urls[] = $node->getAttribute('src');
}
print_r($urls);
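A self-contained version of the DOM approach, assuming the dom extension is enabled. It swaps the @ error-suppression operator for libxml_use_internal_errors(), which collects parser warnings instead of hiding them; the second img tag is a hypothetical addition:

```php
<?php
// The tag from the question plus a second, hypothetical tag with single quotes.
$html = '<p><img src="http://a57.foxnews.com/global.fncstatic.com/static/managed/img/entertainment/876/493/kazantsev pinkish bikini reuters.jpg?ve=1&tl=1" alt="kazantsev pinkish bikini reuters.jpg" itemprop="image"><img src=\'second image.png\'></p>';

$doc = new DOMDocument();
// Buffer libxml warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$urls = [];
foreach ($doc->getElementsByTagName('img') as $node) {
    $urls[] = $node->getAttribute('src');
}
print_r($urls);
```

getAttribute() returns the parsed attribute value, so spaces in the URL are preserved and the quoting rules are handled by the parser rather than by hand.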