Wednesday 15 January 2014

regex - How to parse Unicode characters in a UTF-8 HTML document with PHP




I have an HTML file generated by Google with the following head:

<!doctype html><html><head><title>ddd</title><meta http-equiv="content-type" content="text/html;charset=utf-8"> <meta http-equiv="x-ua-compatible" content="ie=edge"> <meta name="viewport" content="width=device-width,initial-scale=1,minimum-scale=1,maximum-scale=2">

and I use the following pattern to match text that contains Unicode (Chinese and special characters).

$pattern_title = '/class=\"text1t\">[\’\w\s\:\d]+/u';

I know I can use the "u" modifier to enable Unicode matching in PHP for UTF-8 documents. However, even though this is a UTF-8 document, something goes wrong here. When I run the PHP code against the online HTML page (without saving its contents on my computer), the pattern does not match because of the "u" modifier. When I remove "u", the code works but fails to match the Chinese characters. I then copied the HTML contents into a string variable in my PHP code and saved the file. Running the code with "u" on that string works fine.
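A minimal diagnostic sketch, assuming the cause is that the fetched bytes are not valid UTF-8 despite the meta tag (this is an assumption, not something confirmed above). With "u", `preg_match()` returns `false` instead of `0` when the subject contains invalid UTF-8, and `preg_last_error()` reports `PREG_BAD_UTF8_ERROR`. The helper name `diagnose_utf8_match`, the simplified pattern, and the sample strings are made up for illustration.

```php
<?php
// Hypothetical helper: reports whether a /u pattern matched, did not match,
// or failed outright because of an error such as invalid UTF-8 in the subject.
function diagnose_utf8_match(string $pattern, string $html): string
{
    $result = preg_match($pattern, $html);
    if ($result === false) {
        // false signals an error, which is different from 0 (no match).
        return preg_last_error() === PREG_BAD_UTF8_ERROR ? 'bad-utf8' : 'other-error';
    }
    return $result === 1 ? 'match' : 'no-match';
}

// Simplified version of the pattern from the question (assumption).
$pattern = '/class="text1t">[\w\s:]+/u';

// Valid UTF-8 sample (this source file itself must be saved as UTF-8).
$valid = '<span class="text1t">one 标题</span>';

// \xE9 is "é" in Latin-1, which is not a valid UTF-8 sequence on its own.
$invalid = "<span class=\"text1t\">caf\xE9</span>";

echo diagnose_utf8_match($pattern, $valid), "\n";
echo diagnose_utf8_match($pattern, $invalid), "\n";
```

If the live page reports `bad-utf8` while the saved copy reports `match`, the difference is in the bytes being fetched rather than in the pattern.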

So I have no idea how to fix the problem. There is a post on Stack Overflow about converting non-UTF-8 text to UTF-8 in PHP. I tried it, but it made no difference at all. The HTML code is generated by Google.
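For reference, a sketch of the kind of conversion step meant here, not the code from that post. It assumes the `iconv` extension is available, and the fallback encoding `ISO-8859-1` is only a guess that would have to be replaced with whatever the server really sends. The helper name `ensure_utf8` and the sample string are hypothetical.

```php
<?php
// Hypothetical helper: returns $html unchanged when it is already valid UTF-8,
// otherwise converts it from $fallback (a guess supplied by the caller).
function ensure_utf8(string $html, string $fallback = 'ISO-8859-1'): string
{
    // An empty pattern with /u matches any valid UTF-8 string
    // and fails (returns false) on invalid byte sequences.
    if (preg_match('//u', $html) === 1) {
        return $html;
    }
    $converted = iconv($fallback, 'UTF-8', $html);
    if ($converted === false) {
        throw new RuntimeException("Could not convert from $fallback to UTF-8");
    }
    return $converted;
}

// Sample input containing a Latin-1 byte (\xE9), which is invalid as UTF-8.
$raw   = "<span class=\"text1t\">caf\xE9 menu</span>";
$clean = ensure_utf8($raw);

// After conversion the /u pattern runs without a UTF-8 error.
$matched = preg_match('/class="text1t">[\w\s:]+/u', $clean, $m);
```

Converting only when the validity check fails avoids double-encoding text that is already UTF-8, which is one way a blanket conversion can appear to make no difference or make things worse.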

Any ideas? Thanks in advance.

php regex
