Tuesday, 15 June 2010

PHP - get title tag value using DOMDocument




I want the value of the <title> tag for all pages of a website. I am trying to run a script on the website's domain that collects all the page links on the site and also gets the title of each linked page.

This is my code:

    $html = file_get_contents('http://xxxxxxxxx.com');
    //Create a new DOM document
    $dom = new DOMDocument;

    //Parse the HTML. The @ is used to suppress any parsing errors
    //that are thrown if the $html string isn't valid XHTML.
    @$dom->loadHTML($html);

    //Get all links. You could use any other tag name here,
    //like 'img' or 'table', to extract other tags.
    $links = $dom->getElementsByTagName('a');

    //Iterate over the extracted links and display their URLs
    foreach ($links as $link) {
        //Extract and show the link text and the "href" attribute.
        echo $link->nodeValue;
        echo $link->getAttribute('href'), '<br>';
    }

For a link such as <a href="z1.html">z2</a>, what I get is z1.html and z2. But z1.html has a title named z3. I want to get z1.html and z3, not z2. Can anyone help me?

You need to create your own custom function and call it in the appropriate places. If you need to read other tags from the pages behind the anchor tags, you just need to create another custom function for them.

The code below should help you get started:
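As a rough illustration of the kind of custom function meant here, the sketch below takes an HTML string and returns the text of its first <title> element. The name get_title_from_html is hypothetical and not part of the original answer, and it uses libxml_use_internal_errors() instead of the @ operator to keep parser warnings quiet, which is a swapped-in choice rather than the answerer's approach.

```php
<?php
// Hypothetical helper (not from the original answer): returns the text of the
// first <title> element in an HTML string, or null when there is none.
function get_title_from_html(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects an empty string
    }

    // Collect libxml parse warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length === 0) {
        return null; // page has no <title> element
    }
    return trim($titles->item(0)->textContent);
}

// Usage with an inline sample page (no network access needed):
$sample = '<html><head><title> z3 </title></head>'
        . '<body><a href="z1.html">z2</a></body></html>';
echo get_title_from_html($sample), "\n";
```

Because the helper works on a string rather than a URL, it can be called once for the start page and again for every page fetched from a link, whichever way the HTML was downloaded.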

    $html = my_curl_function('http://www.anchorartspace.org/');
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $mytag = $doc->getElementsByTagName('title');
    //Get and display what you need:
    $title = $mytag->item(0)->nodeValue;

    $links = $doc->getElementsByTagName('a');
    //Iterate over the extracted links and display their URLs
    foreach ($links as $link) {
        //Extract and show the link text and the "href" attribute.
        echo $link->nodeValue;
        echo "<br/>" . 'my anchor link : - ' . $link->getAttribute('href') . "---title--->";
        //Note: a relative href such as z1.html must be turned into a full URL before it can be fetched.
        $a_html = my_curl_function($link->getAttribute('href'));
        if ($a_html === false || $a_html === '') {
            //The page could not be fetched, so there is no title to show.
            echo '<br/>';
            continue;
        }
        $a_doc = new DOMDocument();
        @$a_doc->loadHTML($a_html);
        $a_html_title = $a_doc->getElementsByTagName('title');
        //Get and display what you need (the page may have no <title>):
        $a_html_title = $a_html_title->length > 0 ? $a_html_title->item(0)->nodeValue : '';
        echo $a_html_title;
        echo '<br/>';
    }
    echo "title: $title" . '<br/><br/>';

    //Fetch a URL with cURL and return the response body (false on failure).
    function my_curl_function($url) {
        $curl_handle = curl_init();
        curl_setopt($curl_handle, CURLOPT_URL, $url);
        curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
        curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($curl_handle, CURLOPT_USERAGENT, 'name');
        $html = curl_exec($curl_handle);
        curl_close($curl_handle);
        return $html;
    }
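One thing to watch in the loop above: each href is passed straight to my_curl_function, so a relative link like the z1.html from the question would not be fetchable on its own. A minimal sketch of one way to handle that is shown below. The function name resolve_href and the example.com URLs are assumptions made for illustration, and the logic is deliberately simplified: it does not collapse "./" or "../" segments and ignores any <base> tag.

```php
<?php
// Hypothetical helper: resolve an href found on a page against that page's URL.
// Simplified sketch: handles absolute, protocol-relative, root-relative and
// plain relative links, but does not collapse "./" or "../" segments.
function resolve_href(string $pageUrl, string $href): ?string
{
    $href = trim($href);
    // Skip empty links, in-page fragments and non-HTTP schemes.
    if ($href === '' || $href[0] === '#'
        || preg_match('/^(mailto|javascript|tel):/i', $href)) {
        return null;
    }
    if (preg_match('#^https?://#i', $href)) {
        return $href; // already absolute
    }

    $base = parse_url($pageUrl);
    if ($base === false || !isset($base['scheme'], $base['host'])) {
        return null; // the page URL itself is not absolute
    }
    $origin = $base['scheme'] . '://' . $base['host']
            . (isset($base['port']) ? ':' . $base['port'] : '');

    if (strpos($href, '//') === 0) {
        return $base['scheme'] . ':' . $href; // protocol-relative
    }
    if ($href[0] === '/') {
        return $origin . $href; // root-relative
    }

    // Relative to the directory of the current page.
    $path = $base['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href;
}

// Usage: the relative link from the question, found on a hypothetical page.
echo resolve_href('http://example.com/dir/page.html', 'z1.html'), "\n";
```

Inside the loop, the resolved URL (when it is not null) would be passed to my_curl_function in place of the raw href attribute.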

Let me know if you need more help.

php html scrape
