加载中…
个人资料
  • 博客等级:
  • 博客积分:
  • 博客访问:
  • 关注人气:
  • 获赠金笔:0支
  • 赠出金笔:0支
  • 荣誉徽章:
正文 字体大小:

Think in C/C++ when coding LPA

(2012-04-18 17:09:42)
标签:

lpa

标签传播算法

it

分类: 编程经验--积跬步至千里
I have been struggling to code label propagation for several days. Below are some points I got through the work.
Most attention is payed to the efficiency of the data structure and the algorithm, as the graph has more than 50,000,000 edges and about 1,900,000 nodes.
POINT ONE:
choose the data structure:
1.take a array of lists to store the graph
2.take a map of <nodeid,label> to quickly got the label of any node. As map is implemented as some kind of tree.
3.use TYPEDEF to simplify data type

POINT TWO:
improve the accuracy:
1.unified exception-throw function, such as fileError() and emptyPointer().
2.declare a parameter as const if its value is not to be changed.
3.use a test.cpp to test those confused ideas or concepts. Visit www.cplusplus.com and other websites to get odds understood.
4.clearly and carefully define a functoin, its input and its output.
POINT THREE:
enhance the efficiency:
1.if without a limitation of main memory, allocate as much memory as possible. Run the code in memory instead of in disk!
2.locate the bottleneck of the code using clock(), print as much information as possible to help debugging.
3.declare a variable as static if it is initialized only once and used frequently. E.g. labelMap, rather tempLableMap(?)
4.use io library in C, rather than in C++, especially when tackling big files.
5.separate labels from the graph as a great redundance among the labels.

POINT FOUR:
instance to learn:
void updateLabel(list<datatype> adjlist, LABELMAP tempLabelMap, LABELMAP& labelMap);
Notice the tempLableMap!
When the function is called, a new tempLabelMap is duplicated, which cost UNBELIEVABLE time!
As in my case, for a map with about 1,900,000 pairs, the procedure cost about 300ms per time! And the time of collapse the stack of the function is nearly 100ms!
When rewritten as void updateLabel(list<datatype> &adjlist, LABELMAP &tempLabelMap, LABELMAP& labelMap), it only take no more than 10ms!
So the conclusion should be that when a map is a parameter of function, it duplicates the whole map in the function. The solution is to use the reference of the map instead!

ANY ONE NEED THE SOURCE CODE, PLEASE CONTACT ME!

0

阅读 收藏 喜欢 打印举报/Report
  

新浪BLOG意见反馈留言板 欢迎批评指正

新浪简介 | About Sina | 广告服务 | 联系我们 | 招聘信息 | 网站律师 | SINA English | 产品答疑

新浪公司 版权所有