Think in C/C++ when coding LPA
(2012-04-18 17:09:42)
标签:
lpa标签传播算法it |
分类: 编程经验--积跬步至千里 |
I have been struggling to code label propagation for several
days. Below are some points I got through the work.
Most attention is payed to the efficiency of the data
structure and the algorithm, as the graph has more than 50,000,000
edges and about 1,900,000 nodes.
POINT ONE:
choose the data structure:
1.take a array of lists to store the graph
2.take a map of <nodeid,label>
to quickly got the label of any node. As map is implemented as some
kind of tree.
3.use TYPEDEF to simplify data type
POINT TWO:
improve the accuracy:
1.unified exception-throw function, such as fileError() and
emptyPointer().
2.declare a parameter as const if its value is not to be
changed.
3.use a test.cpp to test those confused ideas or concepts.
Visit www.cplusplus.com and other websites to get odds
understood.
4.clearly and carefully define a functoin, its input and its
output.
POINT THREE:
enhance the efficiency:
1.if without a limitation of main memory, allocate as much
memory as possible. Run the code in memory instead of in
disk!
2.locate the bottleneck of the code using clock(), print as
much information as possible to help debugging.
3.declare a variable as static if it is initialized only once
and used frequently. E.g. labelMap, rather tempLableMap(?)
4.use io library in C, rather than in C++, especially when
tackling big files.
5.separate labels from the graph as a great redundance among
the labels.
POINT FOUR:
instance to learn:
void updateLabel(list<datatype>
adjlist, LABELMAP tempLabelMap, LABELMAP&
labelMap);
Notice the tempLableMap!
When the function is called, a new tempLabelMap is duplicated,
which cost UNBELIEVABLE time!
As in my case, for a map with about 1,900,000 pairs, the
procedure cost about 300ms per time! And the time of collapse the
stack of the function is nearly 100ms!
When rewritten as void
updateLabel(list<datatype>
&adjlist, LABELMAP &tempLabelMap,
LABELMAP& labelMap), it only take no more than
10ms!
So the conclusion should be that when a map is a parameter of
function, it duplicates the whole map in the function. The solution
is to use the reference of the map instead!
ANY ONE NEED THE SOURCE CODE, PLEASE CONTACT ME!
前一篇:about C and C++