Using PyQuery to Extract the Content of Specific HTML Tags

Source: 动视网 | Editor: 小采 | Date: 2020-11-27 16:33:45


Installation

sudo pip install pyquery

Example

from urllib.request import urlopen

from pyquery import PyQuery

# Fetch the page and decode the raw bytes into a Unicode string.
page = urlopen("http://www.lzu.edu.cn")
text = page.read().decode("utf-8")

doc = PyQuery(text)

# Iterating over a PyQuery selection yields raw lxml elements,
# so wrap each one in PyQuery again to use its query methods.
for event in doc('.r li'):
    event = PyQuery(event)
    # loc = event.find('.h').text()
    time = event.text()
    # name = event.find('title').text()
    # print('name: %s' % name)
    print('Name : %s' % time)
    # print('location : %s' % loc)
    print('----------------------')

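Note that the text held in event is a Unicode string (unicode in Python 2, str in Python 3). Text should be processed in memory as Unicode and encoded to a variable-width encoding such as UTF-8 only when it is stored or transmitted. How many bytes a character occupies in memory depends on the Python version and build, so it is not always two bytes.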
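The difference between a Unicode string and its UTF-8 bytes can be seen in a short sketch. It assumes Python 3, where `str` is the Unicode type and `bytes` holds encoded data; the `utf8_size` helper is a hypothetical name introduced only for this illustration.

```python
def utf8_size(text):
    """Return (number of characters, number of UTF-8 bytes) for text."""
    encoded = text.encode('utf-8')  # str -> bytes, for storage or transfer
    return len(text), len(encoded)


def roundtrip(text):
    """Encode to UTF-8 and decode back, as when writing and re-reading a file."""
    return text.encode('utf-8').decode('utf-8')


if __name__ == '__main__':
    for sample in ('abc', '名字'):
        chars, size = utf8_size(sample)
        print('%r: %d characters, %d UTF-8 bytes' % (sample, chars, size))
```

ASCII characters take one byte each in UTF-8, while the two Chinese characters here take three bytes each, which is why the character count and the byte count differ.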

Other modules can do the same job, for example the standard-library HTMLParser class:

#!/usr/bin/env python3
from html.parser import HTMLParser
from urllib.request import urlopen


class MyHTMLParser(HTMLParser):
    def __init__(self):
        super().__init__()
        # Remembers which tag was opened last, so that handle_data
        # knows how to label the text that follows.
        self._flag = ''

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples.
        if tag == 'h3' and ('class', 'event-title') in attrs:
            self._flag = 'event-title'
        if tag == 'time':
            self._flag = 'time'
        if tag == 'span' and ('class', 'event-location') in attrs:
            self._flag = 'event-location'

    def handle_data(self, data):
        if self._flag == 'event-title':
            print('Event name: %s' % data)
            self._flag = ''
        # if self._flag == 'time':
        #     print('Event time: %s' % data)
        if self._flag == 'event-location':
            print('Event location: %s' % data)
            print('-------------------')
            self._flag = ''


# feed() expects text, so decode the downloaded bytes first.
page = urlopen('https://www.python.org/events/python-events/').read().decode('utf-8')
parser = MyHTMLParser()
parser.feed(page)
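A variation on the parser above collects the results instead of printing them, which makes it easier to test offline. This is only a sketch: the `EventCollector` class, the `collect_events` helper, and the sample markup are hypothetical, and the markup merely imitates the `event-title` and `event-location` class names used above rather than reproducing the real page.

```python
from html.parser import HTMLParser

# Hypothetical markup imitating the structure the parser above looks for.
SAMPLE_HTML = """
<ul>
  <li>
    <h3 class="event-title"><a href="#">Example Conference</a></h3>
    <time>01 Jan</time>
    <span class="event-location">Example City</span>
  </li>
  <li>
    <h3 class="event-title"><a href="#">Another Meetup</a></h3>
    <time>02 Feb</time>
    <span class="event-location">Online</span>
  </li>
</ul>
"""


class EventCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self._flag = ''    # which field the next text chunk belongs to
        self.events = []   # collected {'title': ..., 'location': ...} dicts

    def handle_starttag(self, tag, attrs):
        # Split the class attribute so elements with several classes match too.
        classes = (dict(attrs).get('class') or '').split()
        if tag == 'h3' and 'event-title' in classes:
            self._flag = 'title'
        elif tag == 'span' and 'event-location' in classes:
            self._flag = 'location'

    def handle_data(self, data):
        text = data.strip()
        if not text or not self._flag:
            return  # ignore whitespace and text outside the tracked tags
        if self._flag == 'title':
            self.events.append({'title': text, 'location': None})
        elif self._flag == 'location' and self.events:
            self.events[-1]['location'] = text
        self._flag = ''


def collect_events(html):
    parser = EventCollector()
    parser.feed(html)
    return parser.events


if __name__ == '__main__':
    for event in collect_events(SAMPLE_HTML):
        print('%s (%s)' % (event['title'], event['location']))
```

Returning a list of dictionaries keeps the parsing logic separate from the output, so the same collector can feed a printout, a file, or a test.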