How to Extract a List of URLs from a Web Page in Python

Source: 动视网 | Editor: 小采 | Time: 2020-11-27 14:41:13

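This article uses a worked example to show how to extract the list of URLs in a web page with Python. The script downloads a page and uses BeautifulSoup to collect every link that points to the same site. It then requests each link and prints its HTTP status code and how long the request took. The implementation is as follows: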

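```python
# Python 3 version (the original used Python 2's urllib2 and print statements).
# Requires: pip install beautifulsoup4
import time
import urllib.request

from bs4 import BeautifulSoup

websiteurls = {}  # links already checked by earlier scanpage() calls


def scanpage(url):
    websiteurl = url
    t = time.time()
    n = 0
    html = urllib.request.urlopen(websiteurl, timeout=10).read()
    soup = BeautifulSoup(html, "html.parser")
    Upageurls = {}
    # Every <a> tag that carries an href attribute
    pageurls = soup.find_all("a", href=True)
    for links in pageurls:
        href = links.get("href")
        # Keep absolute links on the same site, skipping duplicates
        if websiteurl in href and href not in Upageurls and href not in websiteurls:
            Upageurls[href] = 0
    for links in Upageurls:
        t2 = time.time()
        try:
            # Request each link once and record its HTTP status code
            Upageurls[links] = urllib.request.urlopen(links, timeout=10).getcode()
        except Exception:
            print("connect failed")
        else:
            print(n, links, Upageurls[links])
            print(time.time() - t2)  # seconds taken to fetch this link
            n += 1
    websiteurls.update(Upageurls)  # remember these links for later scans
    print("total is " + repr(n) + " links")
    print(time.time() - t)  # total running time in seconds


scanpage("http://news.163.com/")
```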
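The script above keeps a link only if its href already contains the full site URL, so relative links such as `/world/1.html` are silently skipped. As a hedged alternative, not the author's method, the sketch below uses only the standard library: `html.parser.HTMLParser` stands in for BeautifulSoup, and `urllib.parse.urljoin` resolves relative links before comparing hosts. `LinkCollector` and `extract_links` are hypothetical names introduced here for illustration. The sketch works on an HTML string, so it needs no network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:  # value is None for a bare attribute
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Hypothetical helper: unique absolute http(s) URLs on the same host as base_url."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    base_host = urlparse(base_url).netloc
    seen = {}  # a dict keeps insertion order, so results follow page order
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)  # "/a.html" -> "http://host/a.html"
        parts = urlparse(absolute)
        # Drop other hosts and non-web schemes such as javascript: or mailto:
        if parts.scheme in ("http", "https") and parts.netloc == base_host:
            seen.setdefault(absolute, None)
    return list(seen)


if __name__ == "__main__":
    page = """
    <a href="/world/1.html">World</a>
    <a href="http://news.163.com/world/1.html">Duplicate</a>
    <a href="https://example.com/">Elsewhere</a>
    <a href="javascript:void(0)">Script</a>
    """
    for link in extract_links(page, "http://news.163.com/"):
        print(link)
```

The returned list could replace the `Upageurls` filtering step before the status-code loop. Comparing `netloc` instead of using a substring test means `https://` links to the same host are also kept.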

I hope this article is helpful for your Python programming.
