Thursday 15 September 2011

regex - scraping JSON with PHP -




I have done a lot of HTML scraping using XPath. Now I have to scrape JSON, and I don't know how to do that. This is the source I want to scrape:

{ "asin" : "b00dr4lyhy", "featurename" : "price_feature_div", "type" : "json", "value" : { "content" : {"price_feature_div":"<div id=\"price\" class=\"a-section a-spacing-small\">\n<table class=\"a-lineitem\">\n \n\t\t\n\t\t\n\t\t\n\t\t\t\n\t\t\t\n\t\t\t\t\n\n\n\n\t\t\t\n\t\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t \n\t\t \n\t\t\t\t \n\t\t \n\t\t\t\t \n\n\n\n\n\n\t\n<tr>\n <td class=\"a-color-secondary a-size-base a-text-right a-nowrap\">price:<\/td>\n <td class=\"a-span12\">\n <span id=\"priceblock_ourprice\" class=\"a-size-medium a-color-price\">$37.60<\/span>\n \n\n\n\n \n\n\n\n\n\n\n \n\n <span id=\"ourprice_shippingmessage\">\t\n \t\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n \n\n\t \n\t\t\n\t\t\n \n <span class=\"a-size-base a-color-base\">& <b>free shipping<\/b><\/span>\n \n \n \n\n\n\n <\/span>\n \n \n \n \n <\/td>\n<\/tr>\n\n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\n\t\t \n\t\t\t\t\t\n\t\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\n\n\n\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\t\t\t\t\n\n\n\n\n\n\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\n\n\n\n\t\t\t\n\t\t\t\n\n\t\t\n\t\n\t\n\t\n\n \n \n\t\n<\/table>\n<\/div>"} } }

I get this response from:

$url = 'http://www.amazon.com/gp/twister/ajaxv2?sid=188-4344403-7969026&ptd=outerwear&json=1&dpxajaxflag=1&scac=1&isudpflag=1&twisterview=glance&ee=2&pgid=apparel_display_on_website&sr=1-3&nodeid=1036592&rid=0q05fxgqjsa20x44djvg&parentasin=b00dr4luqy&enpre=1&qid=1413775191&dstr=size_name%2ccolor_name&auiajax=1&storeid=apparel&psc=1&asinlist=b00dr4lyhy&isflushing=2&id=b00dr4lyhy&prefetchparam=0&mtype=full&dpenvironment=softlines';

What I need is the price ($37.60).

The code I'm using, provided by Venkata, is:

$url = 'http://www.amazon.com/gp/twister/ajaxv2?sid=188-4344403-7969026&ptd=outerwear&json=1&dpxajaxflag=1&scac=1&isudpflag=1&twisterview=glance&ee=2&pgid=apparel_display_on_website&sr=1-3&nodeid=1036592&rid=0q05fxgqjsa20x44djvg&parentasin=b00dr4luqy&enpre=1&qid=1413775191&dstr=size_name%2ccolor_name&auiajax=1&storeid=apparel&psc=1&asinlist=b00dr4lyhy&isflushing=2&id=b00dr4lyhy&prefetchparam=0&mtype=full&dpenvironment=softlines';

$page = file_get_contents($url);
$decoded = json_decode($page);
$html = $decoded->value->content->price_feature_div;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// From the DOM method: getElementById() returns a single element or null
$element = $dom->getElementById("priceblock_ourprice");

// Or use XPath to extract it: query() returns a node list, so take the first item
$priceNode = $xpath->query("//*[@id='priceblock_ourprice']")->item(0);

if (!is_null($priceNode)) {
    // Echo the text of the node rather than the node object itself
    $ourPrice = $priceNode->textContent;
    echo $ourPrice;
}

I think it would be best to use a regex. What should it look like?

Extraction in PHP

$json_string = '{"asin" : "b00dr4lyhy","featurename" : "price_feature_div","type" : "json","value" : {"content" : {"price_feature_div":"<div id=\"price\" class=\"a-section a-spacing-small\">\n<table class=\"a-lineitem\">\n \n\t\t\n\t\t\n\t\t\n\t\t\t\n\t\t\t\n\t\t\t\t\n\n\n\n\t\t\t\n\t\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t \n\t\t \n\t\t\t\t \n\t\t \n\t\t\t\t \n\n\n\n\n\n\t\n<tr>\n <td class=\"a-color-secondary a-size-base a-text-right a-nowrap\">price:<\/td>\n <td class=\"a-span12\">\n <span id=\"priceblock_ourprice\" class=\"a-size-medium a-color-price\">$37.60<\/span>\n \n\n\n\n \n\n\n\n\n\n\n \n\n <span id=\"ourprice_shippingmessage\">\t\n \t\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n \n\n\t \n\t\t\n\t\t\n \n <span class=\"a-size-base a-color-base\">& <b>free shipping<\/b><\/span>\n \n \n \n\n\n\n <\/span>\n \n \n \n \n <\/td>\n<\/tr>\n\n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\n\t\t \n\t\t\t\t\t\n\t\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\n\n\n\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\t\t\t\t\n\n\n\n\n\n\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\n\n\n\n\t\t\t\n\t\t\t\n\n\t\t\n\t\n\t\n\t\n\n \n \n\t\n<\/table>\n<\/div>"}}}';

$decoded = json_decode($json_string);
$html = $decoded->value->content->price_feature_div;

// Collect libxml warnings (e.g. about the bare "&") instead of printing them
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// From the DOM method: getElementById() returns a single element or null
$priceNode = $dom->getElementById("priceblock_ourprice");

// Or use XPath to extract it with the line below
// $priceNode = $xpath->query("//*[@id='priceblock_ourprice']")->item(0);

if (!is_null($priceNode)) {
    // Echo the text of the node rather than the node object itself
    $ourPrice = trim($priceNode->textContent);
    echo $ourPrice;
}
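
Since the question asks what a regex could look like, here is a minimal sketch of that route. It assumes the markup keeps the id "priceblock_ourprice" and that the price is plain text directly inside that element. The function name extractPriceWithRegex and the shortened $sample payload are made up for illustration and are not part of the original post. Parsing with DOM/XPath is generally more tolerant of markup changes, so treat this as an alternative rather than a recommendation.

```php
<?php
// Hypothetical helper (not from the original post): decode the JSON first,
// then pull the price out of the HTML fragment with a regular expression.
function extractPriceWithRegex(string $json): ?string
{
    $decoded = json_decode($json);
    if (!isset($decoded->value->content->price_feature_div)) {
        return null; // invalid JSON or unexpected structure
    }
    $html = $decoded->value->content->price_feature_div;

    // Capture the text between the opening tag that carries the id
    // and the next "<", ignoring surrounding whitespace.
    $pattern = '/id="priceblock_ourprice"[^>]*>\s*([^<]+?)\s*</';
    if (preg_match($pattern, $html, $matches)) {
        return $matches[1];
    }
    return null;
}

// Shortened sample payload with the same nesting as the response above.
$sample = '{"value":{"content":{"price_feature_div":"<span id=\"priceblock_ourprice\" class=\"a-size-medium a-color-price\">$37.60<\/span>"}}}';
echo extractPriceWithRegex($sample), "\n";
```

Running the regex on the decoded HTML, not on the raw JSON text, avoids having to account for the escaped quotes and slashes in the pattern.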

Extraction in the frontend (I used jQuery in the solution below)

var jsonobj = { "asin" : "b00dr4lyhy", "featurename" : "price_feature_div", "type" : "json", "value" : { "content" : {"price_feature_div":"<div id=\"price\" class=\"a-section a-spacing-small\">\n<table class=\"a-lineitem\">\n \n\t\t\n\t\t\n\t\t\n\t\t\t\n\t\t\t\n\t\t\t\t\n\n\n\n\t\t\t\n\t\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t \n\t\t \n\t\t\t\t \n\t\t \n\t\t\t\t \n\n\n\n\n\n\t\n<tr>\n <td class=\"a-color-secondary a-size-base a-text-right a-nowrap\">price:<\/td>\n <td class=\"a-span12\">\n <span id=\"priceblock_ourprice\" class=\"a-size-medium a-color-price\">$37.60<\/span>\n \n\n\n\n \n\n\n\n\n\n\n \n\n <span id=\"ourprice_shippingmessage\">\t\n \t\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n \n\n\t \n\t\t\n\t\t\n \n <span class=\"a-size-base a-color-base\">& <b>free shipping<\/b><\/span>\n \n \n \n\n\n\n <\/span>\n \n \n \n \n <\/td>\n<\/tr>\n\n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\t\t \n\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\n\t\t \n\t\t\t\t\t\n\t\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\n\n\n\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\t\t\t\t\n\n\n\n\n\n\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\t\t\t\n\t\t\t\n\n\n\n\n\n\t\t\t\n\t\t\t\n\n\t\t\n\t\n\t\n\t\n\n \n \n\t\n<\/table>\n<\/div>"} } };

// Using jQuery to extract the price
var ourprice = $(jsonobj.value.content.price_feature_div).find("#priceblock_ourprice").text();
console.log(ourprice); // "$37.60", the value you can see in the browser console

Note: I found a syntax error in the "price_feature_div" HTML value (in JSON, the value should be a single-line HTML string). I noticed 2 line breaks in the HTML.
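
To illustrate the note, the sketch below shows how a raw line break inside a JSON string makes json_decode() fail, while the escaped form (a backslash followed by n) decodes fine. The helper name decodeOrExplain and both sample strings are assumptions made up for this example. json_last_error() and json_last_error_msg() are the standard PHP functions for finding out why decoding failed.

```php
<?php
// Hypothetical helper: decode JSON and report why decoding failed, if it did.
function decodeOrExplain(string $json): array
{
    $decoded = json_decode($json, true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        return ['ok' => false, 'error' => json_last_error_msg()];
    }
    return ['ok' => true, 'data' => $decoded];
}

// Double quotes: PHP turns \n into a real line break, which is not
// allowed inside a JSON string value.
$broken = "{\"html\":\"<div>\n</div>\"}";

// Single quotes: the backslash and the n stay as two characters,
// which is the valid JSON escape for a line break.
$valid = '{"html":"<div>\n<\/div>"}';

var_dump(decodeOrExplain($broken)['ok']); // bool(false)
var_dump(decodeOrExplain($valid)['ok']);  // bool(true)
```

If a pasted response contains real line breaks inside a string value, checking the error message this way is a quick first step before looking for problems in the DOM or XPath code.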

php regex json web-scraping scrape
