Thursday, 15 January 2015

Extract some XML tags of a string with PHP



I have the following function:

function translate($params) {
    // Wrap the fragment in a root element so it can be parsed as XML
    $xmldata = '<?xml version="1.0" encoding="utf-8" ?><root>' . html_entity_decode($params['data']) . '</root>';
    $lang = ucfirst(strtolower($params['lang']));
    if (simplexml_load_string($xmldata) === false) {
        // Not valid XML: return the input unchanged
        return $params['data'];
    } else {
        $langxmlobj = new SimpleXMLElement($xmldata);
        if ($langxmlobj->$lang) {
            return $langxmlobj->$lang;
        } else {
            return $params['data'];
        }
    }
}

which works great with strings like:

$params['data'] = '<English>hello</English><French>bonjour</French>';
$params['lang'] = 'english';
print translate($params);

It outputs:

hello

But when the string has other tags in it:

$params['data'] = '<English><h1>hello</h1></English><French><h1>bonjour</h1></French>';
$params['lang'] = 'english';

it doesn't output anything.

I wanted it to output:

<h1>hello</h1>, or whatever other tags are inside the language tags.

I'm pulling my hair out here; any thoughts?

Version 2:

It doesn't work when the string looks like this:

$data = '<French><li><span class="pull-right">25 gb</span>espace disque</French><English><li><span class="pull-right">25 gb</span>disk space</English> '
      . '<French><li><span class="pull-right">yes</span>php 5, mysql 5</French><English><li><span class="pull-right">yes</span>php 5, mysql 5</English> '
      . '<French><li><span class="pull-right">100</span>bases de données</French><English><li><span class="pull-right">100</span>databases</English> '
      . '<French><li><span class="pull-right">∞</span>e-mails</French><English><li><span class="pull-right">∞</span>e-mails</English>';

Your problem has two parts:

1. Load the fragment with the tags as an XML document
2. Fetch the data from the XML

Loading the data as XML

The main problem here is that this is not a valid XML fragment, but a mix of HTML fragments with some specific tags. Fortunately, DOMDocument can load (and repair) HTML. It does not load the data as UTF-8 by default, so you need to add a meta tag specifying the encoding.

$data = '<French><li><span class="pull-right">25 gb</span>espace disque</French><English><li><span class="pull-right">25 gb</span>disk space</English> '
      . '<French><li><span class="pull-right">yes</span>php 5, mysql 5</French><English><li><span class="pull-right">yes</span>php 5, mysql 5</English> '
      . '<French><li><span class="pull-right">100</span>bases de données</French><English><li><span class="pull-right">100</span>databases</English> '
      . '<French><li><span class="pull-right">∞</span>e-mails</French><English><li><span class="pull-right">∞</span>e-mails</English>';

// The meta tag tells the HTML parser that the input is UTF-8
$html_data = '<head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head> <body>' . $data . '</body>';

// Collect parser warnings (unknown tags, unclosed <li>) instead of printing them
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html_data);
$dom->formatOutput = true;
echo $dom->saveXML();

Output:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
  <body>
    <french>
      <li><span class="pull-right">25 gb</span>espace disque</li>
    </french>
    <english>
      <li><span class="pull-right">25 gb</span>disk space</li>
    </english>
    <french>
      <li><span class="pull-right">yes</span>php 5, mysql 5</li>
    </french>
    <english>
      <li><span class="pull-right">yes</span>php 5, mysql 5</li>
    </english>
    ...
  </body>
</html>

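As you can see, it keeps the language name elements but converts their names to lowercase. It also adds html and body elements if they are missing, but that is not a problem.

Fetching the data from the XML

Now that you have a DOM, you can use XPath to fetch the nodes.

One possibility is to fetch the body element and import it into SimpleXML: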

$xpath = new DOMXPath($dom);
$root = simplexml_import_dom($xpath->evaluate('/html/body')->item(0));
var_dump($root);

Output:

object(SimpleXMLElement)#4 (2) {
  ["french"]=>
  array(4) {
    [0]=>
    object(SimpleXMLElement)#3 (1) {
      ["li"]=>
      object(SimpleXMLElement)#12 (1) {
        ["span"]=>
        string(5) "25 gb"
      }
    }
    ...
  }
  ["english"]=>
  array(4) {
    [0]=>
    object(SimpleXMLElement)#5 (1) {
      ["li"]=>
      object(SimpleXMLElement)#12 (1) {
        ["span"]=>
        string(5) "25 gb"
      }
    }
    ...

Or fetch the nodes directly and save them as HTML fragments:

$xpath = new DOMXPath($dom);
$string = '';
// Select every child of each <english> element and serialize it as HTML
foreach ($xpath->evaluate('/html/body/*[name() = "english"]/*') as $node) {
    $string .= $dom->saveHTML($node);
}
echo $string;

Output:

<li> <span class="pull-right">25 gb</span>disk space</li><li> <span class="pull-right">yes</span>php 5, mysql 5</li><li> <span class="pull-right">100</span>databases</li><li> <span class="pull-right">∞</span>e-mails</li>
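The two parts can be combined into a single replacement for the original translate function. What follows is only a minimal sketch, assuming PHP 7 or later: the name translateFragment, its two-argument signature, the language-name check, and the fallback to the unchanged input are illustrative choices, not part of the original question or answer. It selects node() rather than * so that plain text inside a language element (the first example in the question) is returned as well.

```php
<?php
// Hypothetical helper (not from the original post): returns the inner HTML of
// every <$lang> element in $data, or $data unchanged if nothing matches.
function translateFragment(string $data, string $lang): string
{
    // loadHTML() lowercases element names, so match on the lowercase name.
    $name = strtolower($lang);
    // Assumption: language names are plain letters; reject anything else so
    // the value can be embedded safely in the XPath expression.
    if (!preg_match('/^[a-z]+$/', $name)) {
        return $data;
    }

    // The meta tag makes the HTML parser read the input as UTF-8.
    $html = '<head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head>'
          . '<body>' . $data . '</body>';

    // Collect warnings about unknown tags instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Serialize every child node (elements and text) of the matching elements.
    $xpath = new DOMXPath($dom);
    $result = '';
    foreach ($xpath->evaluate('/html/body/*[name() = "' . $name . '"]/node()') as $node) {
        $result .= $dom->saveHTML($node);
    }

    return $result !== '' ? $result : $data;
}

echo translateFragment('<English><h1>hello</h1></English><French><h1>bonjour</h1></French>', 'english');
```

Because the HTML parser repairs the markup, the returned fragment may differ slightly from the input, for example unclosed li elements come back closed.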

php xml simplexml
