Wednesday, 15 August 2012

Python search pattern in string


Hello, please help me. I have been trying for a lot of days to create a script that finds a pattern.

My string is:

<meta content="" property="news_keywords"/> <meta content="tough , true - bostonians read boston herald solid reporting, whether in print or online, on issues affecting daily lives. boston herald gets people talking. our reporters second-to-none, our photographers pulitzer prize-winning , nowadays news bostonians care , respond to." property="description"/> <meta content='{"link":"http:\/\/bostonherald.com\/","type":"frontpage"}' name="parsely-page"/><meta content="" property="keywords"/> <meta content="drupal 7 (http://drupal.org)" name="generator"/> <link href="http://www.bostonherald.com/" rel="canonical"/> <link href="http://www.bostonherald.com/" rel="shortlink"/> <meta content="420" http-equiv="refresh"/> <link href="http://www.bostonherald.com/sites/default/files/images/favicon.ico" rel="shortcut icon" type="image/vnd.microsoft.icon"/> <title>boston herald | boston herald</title> <style media="all" type="text/css">@import url("http://www.bostonherald.com/modules/system/system.base.css?nd76bo"); @import url("http://www.bostonherald.com/modules/system/system.menus.css?nd76bo"); @import url("http://www.bostonherald.com/modules/system/system.messages.css?nd76bo"); @import url("http://www.bostonherald.com/modules/system/system.theme.css?nd76bo");</style> <style media="all" type="text/css">@import url("http://www.bostonherald.com/modules/aggregator/aggregator.css?nd76bo"); @import url("http://www.bostonherald.com/modules/comment/comment.css?nd76bo"); @import url("http://www.bostonherald.com/sites/all/modules/date/date_api/date.css?nd76bo"); @import url("http://www.bostonherald.com/sites/all/modules/date/date_popup/themes/datepicker.1.7.css?nd76bo"); @import url("http://www.bostonherald.com/modules/field/theme/field.css?nd76bo"); @import url("http://www.bostonherald.com/sites/all/modules/mollom/mollom.css?nd76bo"); @import url("http://www.bostonherald.com/modules/node/node.css?nd76bo"); @import url("http://www.bostonherald.com/modules/poll/poll.css?nd76bo"); 
@import url("http://www.bostonherald.com/modules/user/user.css?nd76bo"); @import url("http://www.bostonherald.com/sites/all/modules/views/css/views.css?nd76bo");</style> <style media="all" type="text/css">@import url("http://www.bostonherald.com/sites/all/modules/ctools/css/ctools.css?nd76bo"); @import url("http://www.bostonherald.com/sites/all/modules/lightbox2/css/lightbox.css?nd76bo"); @import url("http://www.bostonherald.com/sites/all/modules/panels/css/panels.css?nd76bo"); @import url("http://www.bostonherald.com/sites/all/modules/rate/rate.css?nd76bo"); @import url("http://www.bostonherald.com/sites/all/libraries/superfish/css/superfish.css?nd76bo"); @import url("http://www.bostonherald.com/sites/all/libraries/superfish/css/superfish-vertical.css?nd76bo"); @import url("http://www.bostonherald.com/sites/all/libraries/superfish/css/superfish-navbar.css?nd76bo"); @import url("http://www.bostonherald.com/sites/all/modules/views_slideshow/views_slideshow.css?nd76bo"); @import url("http://www.bostonherald.com/sites/all/modules/jcarousel/skins/default/jcarousel-default.css?nd76bo"); @import url("http://www.bostonherald.com/sites/all/modules/panels/plugins/layouts/twocol_stacked/twocol_stacked.css?nd76bo");</style> <style media="all" type="text/css">@import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/basics.css?nd76bo"); @import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/custom_blocks.css?nd76bo"); @import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/navigation.css?nd76bo"); @import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/view-story_slots.css?nd76bo"); @import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/taxonomy/taxonomy-styles.css?nd76bo"); @import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/bhr.css?nd76bo");</style> <style media="print" type="text/css">@import 
url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/print.css?nd76bo");</style> <style media="all" type="text/css">@import url("http://www.bostonherald.com/sites/all/themes/omega/alpha/css/alpha-reset.css?nd76bo"); @import url("http://www.bostonherald.com/sites/all/themes/omega/alpha/css/alpha-alpha.css?nd76bo"); @import url("http://www.bostonherald.com/sites/all/themes/omega/omega/css/formalize.css?nd76bo"); @import url("http://www.bostonherald.com/sites/all/themes/omega/omega/css/omega-branding.css?nd76bo"); @import url("http://www.bostonherald.com/sites/all/themes/omega/omega/css/omega-forms.css?nd76bo"); @import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/layout-front.css?nd76bo"); @import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/global.css?nd76bo");</style> <style media="all" type="text/css">@import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/ike-omega-alpha-default.css?nd76bo"); @import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/ike-omega-alpha-default-normal.css?nd76bo"); @import url("http://www.bostonherald.com/sites/all/themes/omega/alpha/css/grid/alpha_default/normal/alpha-default-normal-24.css?nd76bo");</style>

The pattern:

<meta content(+.?)refresh">

The string is big, and I have tried different approaches, but neither works. I don't want to save the string in a txt file.

The scripts I tried didn't work:

# try 1
import re
re.findall("<meta content(+.?)refresh">", html)

# try 2
matching = [s for s in html if "<meta content(+.?)refresh">" in s]

The comment on the question was: 'I want to grab out the section of the string that starts with "meta content" and finishes with "refresh">".'

I split the string into lines because that way ^ matches the start of each line, not of the whole string. I used ^ to match the start and $ to match the end. In fact these are not necessary, since < and > are sufficient. Note that the double quote is escaped with a backslash character before it.
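As an alternative to splitting into lines, the re.MULTILINE flag also makes ^ and $ match at the start and end of every line. The following is only a sketch: the three-line html value is a hypothetical, shortened stand-in for the real page source, and it assumes each tag sits on its own line.

```python
import re

# Hypothetical, shortened stand-in for the page source, one tag per line.
html = (
    '<meta content="" property="keywords"/>\n'
    '<meta content="420" http-equiv="refresh"/>\n'
    '<title>boston herald | boston herald</title>'
)

# With re.MULTILINE, ^ and $ anchor at every line, so no explicit
# splitlines() loop is needed. "." does not match a newline by default,
# so a match cannot spill over into the next line.
matches = re.findall(r'^<meta content.*?refresh"/>$', html, flags=re.MULTILINE)
print(matches)
```

Writing the pattern as a raw string with single quotes also avoids having to escape the double quote.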

Another key point: it is not +.? but .*? that works to grab the characters in the middle of the string.
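The difference can be checked directly. In the sketch below, tag is just the one line of interest copied from the string in the question, and error_message is a name introduced only for illustration. A + directly after ( has nothing to repeat, so the re module rejects the pattern, while .*? lazily matches any run of characters.

```python
import re

# The single tag of interest, copied from the string in the question.
tag = '<meta content="420" http-equiv="refresh"/>'

# "+" directly after "(" has nothing to repeat, so compiling fails.
error_message = None
try:
    re.compile(r'<meta content(+.?)refresh">')
except re.error as exc:
    error_message = str(exc)
print("invalid pattern: %s" % error_message)

# ".*?" means "any characters, as few as possible".
lazy = re.search(r'<meta content(.*?)refresh"', tag)
print(lazy.group(1))
```

Note also that the tag ends in refresh"/> rather than refresh">, so even a valid pattern ending in refresh"> would not match it.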

>>> import re
>>> for line in html.splitlines():
...     m = re.match("^<meta content(.*?)refresh\"/>$", line)
...     if m:
...         print(m.group(0))
...
<meta content="420" http-equiv="refresh"/>

The docs on Python regular expressions can be found here: https://docs.python.org/2/library/re.html
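If the tags are not on separate lines, as in the long string pasted in the question, the line-by-line loop will not find anything, because re.match only looks at the start of each line. A possible variation, given here as a sketch rather than as part of the original answer, is to use re.search on the whole string and replace .*? with [^>]*? so the match cannot run across several tags. The html value is again a hypothetical, shortened stand-in.

```python
import re

# Hypothetical, shortened stand-in for the page source, all on one line.
html = (
    '<meta content="" property="keywords"/> '
    '<meta content="drupal 7 (http://drupal.org)" name="generator"/> '
    '<link href="http://www.bostonherald.com/" rel="canonical"/> '
    '<meta content="420" http-equiv="refresh"/> '
    '<title>boston herald | boston herald</title>'
)

# [^>]*? cannot cross a ">" character, so the match stays inside one tag.
pattern = re.compile(r'<meta content[^>]*?refresh"/>')

m = pattern.search(html)
if m:
    print(m.group(0))
```

With plain .*? on this one-line string, the match would instead start at the first <meta content and run all the way to refresh"/>.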

python python-2.7
