Adding line breaks when appending nodes to an XML file (libxml2)

(2012-10-30 21:30:07)
Tags: c, libxml2, linux, misc

For example, take the following sample XML file:

<?xml version="1.0"?>
<BODY>
  <filesystem>
    <filesystemKeyData>
      <filesystemName>Ext3</filesystemName>
      <versionNumber>123</versionNumber>
      <option>good</option>
    </filesystemKeyData>
    <timestampSec>456</timestampSec>
    <status>heasjdkfjaskdfjsk</status>
  </filesystem>
  <filesystem>
    <filesystemKeyData>
      <filesystemName>Ext3</filesystemName>
      <versionNumber>123</versionNumber>
      <option>good</option>
    </filesystemKeyData>
    <timestampSec>456</timestampSec>
    <status>heasjdkfjaskdfjsk</status>
  </filesystem>
</BODY>

After using the code from that article's example to append keyword child nodes at the end of each filesystem node above,
the XML file looks like this:

<?xml version="1.0"?>
<BODY>
  <filesystem>
    <filesystemKeyData>
      <filesystemName>Ext3</filesystemName>
      <versionNumber>123</versionNumber>
      <option>good</option>
    </filesystemKeyData>
    <timestampSec>456</timestampSec>
    <status>heasjdkfjaskdfjsk</status>
    <keyword1>hello</keyword1><keyword2>hello</keyword2><keyword3>hello</keyword3></filesystem>
  <filesystem>
    <filesystemKeyData>
      <filesystemName>Ext3</filesystemName>
      <versionNumber>123</versionNumber>
      <option>good</option>
    </filesystemKeyData>
    <timestampSec>456</timestampSec>
    <status>heasjdkfjaskdfjsk</status>
    <keyword1>hello</keyword1><keyword2>hello</keyword2><keyword3>hello</keyword3></filesystem>
</BODY>

Notice that the keyword nodes and </filesystem> are squeezed onto a single line, as shown below. This is not what we want.
<keyword1>hello</keyword1><keyword2>hello</keyword2><keyword3>hello</keyword3></filesystem>

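Neither calling xmlKeepBlanksDefault(0) nor setting the format parameter of xmlSaveFormatFile(...) to 1
adds a line break after the newly appended nodes.
The mailing list on the official site, www.xmlsoft.org, has very little information on this. What helped me most was
the example code in http://mail.gnome.org/archives/xml/2007-May/msg00043.html, which reads the file with
xmlReadFile when setting attributes and passes XML_PARSE_NOBLANKS as the options argument.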

So, if we read the file with xmlReadFile(...) and set its options argument to XML_PARSE_NOBLANKS, the line breaks
are added automatically.

The corrected example program is below; only two statements were changed.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <libxml/xmlmemory.h>
#include <libxml/parser.h>

/* Append three keyword child elements to the given node. */
void
parseStory (xmlDocPtr doc, xmlNodePtr cur, char *keyword)
{
  xmlNewTextChild (cur, NULL, (const xmlChar *) "keyword1", (const xmlChar *) keyword);
  xmlNewTextChild (cur, NULL, (const xmlChar *) "keyword2", (const xmlChar *) keyword);
  xmlNewTextChild (cur, NULL, (const xmlChar *) "keyword3", (const xmlChar *) keyword);
  return;
}

xmlDocPtr
parseDoc (char *docname, char *keyword)
{
  xmlDocPtr doc;
  xmlNodePtr cur;
  //doc = xmlParseFile (docname);
  /* Fix 1: ignore blank text nodes while reading the XML file */
  doc = xmlReadFile (docname, NULL, XML_PARSE_NOBLANKS);
  if (doc == NULL)
  {
      fprintf (stderr, "Document not parsed successfully. \n");
      return (NULL);
  }

  cur = xmlDocGetRootElement (doc);
  if (cur == NULL)
  {
      fprintf (stderr, "empty document\n");
      xmlFreeDoc (doc);
      return (NULL);
  }

  if (xmlStrcmp (cur->name, (const xmlChar *) "BODY"))
  {
      fprintf (stderr, "document of the wrong type, root node != BODY\n");
      xmlFreeDoc (doc);
      return (NULL);
  }

  /* Walk the children of the root and extend every filesystem element. */
  cur = cur->xmlChildrenNode;
  while (cur != NULL)
  {
      if ((!xmlStrcmp (cur->name, (const xmlChar *) "filesystem")))
      {
         parseStory (doc, cur, keyword);
      }

      cur = cur->next;
  }

  return (doc);
}

int
main (int argc, char **argv)
{
  char *docname;
  char *keyword;
  xmlDocPtr doc;
  if (argc <= 2)
  {
      printf ("Usage: %s docname, keyword\n", argv[0]);
      return (0);
  }

  docname = argv[1];
  keyword = argv[2];
  doc = parseDoc (docname, keyword);
  if (doc != NULL)
  {
      //xmlSaveFormatFile (docname, doc, 0);
      /* Fix 2: format = 1 so that the output is broken into lines */
      xmlSaveFormatFile (docname, doc, 1);
      xmlFreeDoc (doc);
  }

  return (1);
}

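Fix 1: replace xmlParseFile with xmlReadFile and set its options argument to XML_PARSE_NOBLANKS; otherwise no line break is added after the nodes.
Fix 2: set the format parameter of xmlSaveFormatFile to 1; otherwise, for an XML file opened with xmlReadFile, all nodes in the generated file end up on a single line.
Also: besides ignoring whitespace when an XML file is read, xmlKeepBlanksDefault(0) also puts indentation at the start of each line when the file is written out. With xmlKeepBlanksDefault(1), I found that the indentation at the start of each line disappears, but the line breaks are not affected.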

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
Side topic: segmentation fault when updating a node's value
The code below is a simple example of updating the values of some node elements in an XML file.

    xmlNodePtr element;
    // ...
    xmlNodePtr childrenNodePtr = element->children;
    while (childrenNodePtr != NULL)
    {
        if (childrenNodePtr->type == XML_TEXT_NODE)
        {
            xmlNodeSetContent(childrenNodePtr, (const xmlChar*)"world");
            return NORMAL_RET;
        }

        childrenNodePtr = childrenNodePtr->next;
    }


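When this code runs, it sometimes crashes with a segmentation fault inside the libxml2 API function
xmlNodeSetContent, but not every time.
The error only happens when the node's original value is one of certain strings; for example,
an original value of "zo" crashes the program.
Reading the libxml2 source shows that, before setting the node's value to the new string,
xmlNodeSetContent calls xmlFree(cur->content) to release the memory of the old
string buffer.
A fragment of xmlNodeSetContent: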

        case XML_DOCUMENT_FRAG_NODE:
        case XML_ELEMENT_NODE:
        case XML_ATTRIBUTE_NODE:
            if (cur->children != NULL) xmlFreeNodeList(cur->children);
            cur->children = xmlStringGetNodeList(cur->doc, content);
            UPDATE_LAST_CHILD_AND_PARENT(cur)
            break;
        case XML_TEXT_NODE:
        case XML_CDATA_SECTION_NODE:
        case XML_ENTITY_REF_NODE:
        case XML_ENTITY_NODE:
        case XML_PI_NODE:
        case XML_COMMENT_NODE:
            if ((cur->content != NULL) &&
                ... {
                if (!((cur->doc != NULL) && (cur->doc->dict != NULL) &&
                    (xmlDictOwns(cur->doc->dict, cur->content))))
                    xmlFree(cur->content);
            }

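Read literally, the condition above calls xmlFree only when the old string is not owned by libxml2's
dictionary buffer (cur->doc->dict). The crash nevertheless looked like a dictionary-owned string being freed:
the original string "zo" happened to be in the dictionary, and if such a string does reach xmlFree, the address
passed in points into the middle of the dictionary buffer rather than to the start of an allocated block, so free
crashes. With any other string there was no problem.

What I could not explain is why this path misbehaves while the code that deletes nodes does not.
For example, after a node is removed, xmlFreeDoc is normally what releases it, and that code
frees a string only if it is not in the dictionary buffer, using the DICT_FREE macro
for the job. I could not see why the two paths should behave differently.
The DICT_FREE macro: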

/* ... */
#define DICT_FREE(str)                        \
    if ((str) && ((!dict) ||                 \
        (xmlDictOwns(dict, (const xmlChar *)(str)) == 0)))    \
        xmlFree((char *)(str));

The code above frees the string only if it is not in the dictionary buffer.
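
The rule DICT_FREE applies (free a string only when the dictionary does not own it) can be modeled in a few lines of plain C. The sketch below is a toy: toy_pool and its functions are invented for this illustration and are not libxml2's xmlDict. It only shows why an ownership check has to come before free: a pooled string such as "zo" is an address inside a larger buffer, not something malloc ever returned.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a string dictionary: one buffer holding interned strings
 * back to back. This is NOT libxml2's xmlDict, only a model of the idea. */
typedef struct
{
  char buf[64];
  size_t used;
} toy_pool;

/* Copy `s` into the pool and return the pooled copy, or NULL if it is full. */
const char *
toy_pool_intern (toy_pool *pool, const char *s)
{
  size_t len = strlen (s) + 1;
  char *dst;

  if (len > sizeof pool->buf - pool->used)
    return NULL;
  dst = pool->buf + pool->used;
  memcpy (dst, s, len);
  pool->used += len;
  return dst;
}

/* Return 1 if `s` points into the used part of the pool buffer, else 0. */
int
toy_pool_owns (const toy_pool *pool, const char *s)
{
  uintptr_t p = (uintptr_t) s;
  uintptr_t lo = (uintptr_t) pool->buf;

  return p >= lo && p < lo + pool->used;
}

/* Same shape as DICT_FREE: free only strings the pool does not own.
 * Returns 1 if free() was called, 0 if the string was left alone. */
int
toy_free_unless_owned (const toy_pool *pool, char *s)
{
  if (s == NULL || (pool != NULL && toy_pool_owns (pool, s)))
    return 0;
  free (s);
  return 1;
}
```

Passing a pooled string to free directly, without the ownership check, would hand free an address in the middle of buf, which is the kind of failure described above.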

I wondered whether this is a bug in xmlNodeSetContent, but I found nothing about it on the mailing list
or in bugzilla.

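The development schedule left no time to dig into whether this is a libxml2 bug, so I took a workaround instead:
simply keep libxml2 from creating the dictionary buffer in the first place.
According to the official manual, xmlReadFile accepts the XML_PARSE_NODICT option,
which avoids creating the dictionary buffer. Like this:
doc = xmlReadFile(docname, NULL, XML_PARSE_NOBLANKS | XML_PARSE_NODICT);

With that change, the code at the start of this section runs without any problem.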
