
[Recommended] An Objective-C library for parsing HTML data (scraping web page data)

Source: 动视网  Editor: 小采  Date: 2020-11-27 16:21:23


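  TFHpple is a third-party library for parsing HTML data. In my experience it does the job reasonably well, but the project has to be configured before the library can be used.

  Configuration

1. Link libxml2.tbd

2. Set the header search path so the compiler can find the libxml2 headers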
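
The two configuration steps can also be written down as build settings. This is a minimal sketch of an .xcconfig fragment, assuming the project links the system libxml2 and that its headers sit in the SDK's usr/include/libxml2 directory (the usual location on Apple SDKs). The file name is hypothetical; the same values can be entered by hand under Build Settings in Xcode.

```xcconfig
// Config.xcconfig (hypothetical file name)
// Step 1: link against the system libxml2 (equivalent to adding libxml2.tbd)
OTHER_LDFLAGS = $(inherited) -lxml2
// Step 2: let the compiler find <libxml/...> headers
HEADER_SEARCH_PATHS = $(inherited) $(SDKROOT)/usr/include/libxml2
```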

  Usage

The following page is used as an example:

http://so.gushiwen.org/guwen/book_2.aspx

1. Create a TFHpple object; data is the data returned by the website

TFHpple *htmlParser = [[TFHpple alloc] initWithHTMLData:data];

2. Use the searchWithXPathQuery: method to pick out the useful data (look up XPath separately if you are not familiar with it)

NSArray *elements = [htmlParser searchWithXPathQuery:@"//div[@class='shileft']/div[@class='bookcont']"];

This gives us the data for the Analects (论语).
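
Steps 1 and 2 can be put together roughly as follows. This is a sketch rather than the author's exact code: the function names BookContentElements and FetchBookContent are made up for illustration, and the download uses NSURLSession, which the article does not mention. Note that a plain-HTTP URL like this one may be blocked by App Transport Security unless the app explicitly allows it.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: runs the article's XPath query on already-downloaded data.
// Returns every <div class="bookcont"> that sits directly inside <div class="shileft">.
static NSArray<TFHppleElement *> *BookContentElements(NSData *htmlData) {
    TFHpple *parser = [[TFHpple alloc] initWithHTMLData:htmlData];
    return [parser searchWithXPathQuery:
            @"//div[@class='shileft']/div[@class='bookcont']"];
}

// Hypothetical helper: downloads the example page, then hands the matching
// elements to the completion block (called on a background queue).
static void FetchBookContent(void (^completion)(NSArray<TFHppleElement *> *elements)) {
    NSURL *url = [NSURL URLWithString:@"http://so.gushiwen.org/guwen/book_2.aspx"];
    NSURLSessionDataTask *task =
        [[NSURLSession sharedSession] dataTaskWithURL:url
                                    completionHandler:^(NSData *data,
                                                        NSURLResponse *response,
                                                        NSError *error) {
            if (data == nil) {
                // Network failure: report it and return an empty result.
                NSLog(@"Request failed: %@", error);
                completion(@[]);
                return;
            }
            completion(BookContentElements(data));
        }];
    [task resume];
}
```

Keeping the parsing in its own function means it can be exercised with a small inline HTML string, without touching the network.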

3. Get and analyze the elements

TFHppleElement *element = [elements objectAtIndex:i];

A TFHppleElement object exposes a number of properties; each one is described briefly below.

1. raw

@property (nonatomic, copy, readonly) NSString *raw

raw is the page data including the HTML markup


 
 
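 <div class="bookcont">
 <ul>
 <span><a href="/guwen/bookv_19.aspx">学而篇</a></span>
 <span><a href="/guwen/bookv_20.aspx">为政篇</a></span>
 <span><a href="/guwen/bookv_21.aspx">八佾篇</a></span>
 <span><a href="/guwen/bookv_22.aspx">里仁篇</a></span>
 <span><a href="/guwen/bookv_23.aspx">公冶长篇</a></span>
 <span><a href="/guwen/bookv_24.aspx">雍也篇</a></span>
 <span><a href="/guwen/bookv_25.aspx">述而篇</a></span>
 <span><a href="/guwen/bookv_26.aspx">泰伯篇</a></span>
 <span><a href="/guwen/bookv_27.aspx">子罕篇</a></span>
 <span><a href="/guwen/bookv_28.aspx">乡党篇</a></span>
 <span><a href="/guwen/bookv_29.aspx">先进篇</a></span>
 <span><a href="/guwen/bookv_30.aspx">颜渊篇</a></span>
 <span><a href="/guwen/bookv_31.aspx">子路篇</a></span>
 <span><a href="/guwen/bookv_32.aspx">宪问篇</a></span>
 <span><a href="/guwen/bookv_33.aspx">卫灵公篇</a></span>
 <span><a href="/guwen/bookv_34.aspx">季氏篇</a></span>
 <span><a href="/guwen/bookv_35.aspx">阳货篇</a></span>
 <span><a href="/guwen/bookv_36.aspx">微子篇</a></span>
 <span><a href="/guwen/bookv_37.aspx">子张篇</a></span>
 <span><a href="/guwen/bookv_38.aspx">尧曰篇</a></span>
 </ul>
 </div>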
 
 
 

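2. content is the actual page data, without the HTML markup

学而篇 为政篇 八佾篇 里仁篇 公冶长篇 雍也篇 述而篇 泰伯篇 子罕篇 乡党篇 先进篇 颜渊篇 子路篇 宪问篇 卫灵公篇 季氏篇 阳货篇 微子篇 子张篇 尧曰篇

3. tagName is the HTML tag name

The output is simply div

4. attributes holds the element's attributes

class = bookcont;

5. children holds the child nodes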

(
    {
        nodeContent = "\n ";
        nodeName = text;
    },
    {
        nodeChildArray = (
            {
                nodeContent = "\n \n ";
                nodeName = text;
            },
            {
                nodeChildArray = (
                    {
                        nodeAttributeArray = (
                            {
                                attributeName = href;
                                nodeContent = "/guwen/bookv_19.aspx";
                            }
                        );
                        nodeChildArray = (
                            {
                                nodeContent = "\U5b66\U800c\U7bc7";
                                nodeName = text;
                            }
                        );
                        nodeContent = "\U5b66\U800c\U7bc7";
                        nodeName = a;
                        raw = "<a href=\"/guwen/bookv_19.aspx\">\U5b66\U800c\U7bc7</a>";
                    }
                );
                nodeContent = "\U5b66\U800c\U7bc7";
                nodeName = span;
                raw = "<span><a href=\"/guwen/bookv_19.aspx\">\U5b66\U800c\U7bc7</a></span>";
            },
            ... (abridged: the remaining 19 span entries, /guwen/bookv_20.aspx through /guwen/bookv_38.aspx, have the same shape, each followed by a whitespace text node)
        );
        nodeContent = "\n \n \U5b66\U800c\U7bc7\n \n \U4e3a\U653f\U7bc7\n \n \U516b\U4f7e\U7bc7\n \n \U91cc\U4ec1\U7bc7\n \n \U516c\U51b6\U957f\U7bc7\n \n \U96cd\U4e5f\U7bc7\n \n \U8ff0\U800c\U7bc7\n \n \U6cf0\U4f2f\U7bc7\n \n \U5b50\U7f55\U7bc7\n \n \U4e61\U515a\U7bc7\n \n \U5148\U8fdb\U7bc7\n \n \U989c\U6e0a\U7bc7\n \n \U5b50\U8def\U7bc7\n \n \U5baa\U95ee\U7bc7\n \n \U536b\U7075\U516c\U7bc7\n \n \U5b63\U6c0f\U7bc7\n \n \U9633\U8d27\U7bc7\n \n \U5fae\U5b50\U7bc7\n \n \U5b50\U5f20\U7bc7\n \n \U5c27\U66f0\U7bc7\n \n ";
        nodeName = ul;
        raw = "<ul> ... </ul>";
    },
    {
        nodeContent = "\n ";
        nodeName = text;
    }
)

6. firstChild is the first child node

{ nodeContent = "\n "; nodeName = text;}
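
The properties listed above can be read as in the sketch below. LogElement and ChapterLinks are hypothetical helper names, and the XPath query in ChapterLinks is my own variation on the article's query, not something the author wrote. The sketch relies on TFHppleElement's objectForKey: (attribute lookup) and text (content of the first text child), both of which the library provides.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: prints the six properties discussed in this article.
static void LogElement(TFHppleElement *element) {
    NSLog(@"raw:        %@", element.raw);        // markup included
    NSLog(@"content:    %@", element.content);    // markup stripped
    NSLog(@"tagName:    %@", element.tagName);    // e.g. div
    NSLog(@"attributes: %@", element.attributes); // e.g. class = bookcont
    NSLog(@"children:   %@", element.children);   // array of TFHppleElement
    NSLog(@"firstChild: %@", element.firstChild); // often a whitespace text node
}

// Hypothetical helper: collects the title and href of every <a> found
// anywhere below a <div class="bookcont">, in document order.
static NSArray<NSDictionary<NSString *, NSString *> *> *ChapterLinks(NSData *htmlData) {
    TFHpple *parser = [[TFHpple alloc] initWithHTMLData:htmlData];
    NSArray *anchors = [parser searchWithXPathQuery:@"//div[@class='bookcont']//a"];
    NSMutableArray *links = [NSMutableArray array];
    for (TFHppleElement *anchor in anchors) {
        NSString *title = [anchor text];                 // text of the first text child
        NSString *href = [anchor objectForKey:@"href"];  // attribute value
        if (title.length > 0 && href.length > 0) {
            [links addObject:@{ @"title": title, @"href": href }];
        }
    }
    return links;
}
```

Querying the a elements directly avoids walking the whitespace text nodes that show up in children and firstChild.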

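All of the properties above relate to the HTML markup. In practice we mostly use the content property and then process the resulting NSString object.

That is how we obtain the data and turn it into the form we want. TFHppleElement is an important class, but its detailed usage is not covered here.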
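
As one possible way to process the content string, the sketch below splits it on whitespace and newlines and drops the empty pieces. ChapterTitlesFromContent is a hypothetical name, and the approach assumes that no title contains a space itself, which holds for the chapter list shown above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: turns the whitespace-separated content string of the
// bookcont element into an array of chapter titles.
static NSArray<NSString *> *ChapterTitlesFromContent(NSString *content) {
    NSCharacterSet *separators = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSArray<NSString *> *pieces = [content componentsSeparatedByCharactersInSet:separators];
    NSMutableArray<NSString *> *titles = [NSMutableArray array];
    for (NSString *piece in pieces) {
        // Consecutive separators produce empty strings; skip them.
        if (piece.length > 0) {
            [titles addObject:piece];
        }
    }
    return titles;
}
```

Called with element.content from the example page, this would be expected to yield one entry per chapter.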
