LLVM源码阅读之五 --- StringMap
今天来认识一下这个StringMap类,它是一个Map类,把String做Key,任意对象做Value
template AllocatorTy =
MallocAllocator>
class StringMap : public StringMapImpl {
AllocatorTy Allocator;
public:
using MapEntryTy = StringMapEntry;
从StringMapImpl继承。
class StringMapImpl {
protected:
// Array of NumBuckets pointers to entries, null pointers are
holes.
// TheTable[NumBuckets] contains a sentinel value for easy
iteration. Followed
// by an array of the actual hash values as unsigned
integers.
StringMapEntryBase **TheTable =
nullptr;
unsigned NumBuckets = 0;
unsigned NumItems = 0;
unsigned NumTombstones = 0;
unsigned ItemSize;
它存了StringMapEntryBase的二级指针
/// StringMapEntryBase - Shared base class of StringMapEntry
instances.
class StringMapEntryBase {
size_t keyLength;
这个类直接把字符串存起来。
在分配内存的时候
TheTable = static_cast(safe_calloc(
NewNumBuckets + 1, sizeof(StringMapEntryBase **) +
sizeof(unsigned)));
它分配了default 16+1个StringMapEntryBase * 和 16+1个 unsigned,unsigned
用来存放hash值。
void StringMapImpl::init(unsigned InitSize) {
assert((InitSize & (InitSize - 1)) == 0
&&
"Init Size must be a power of 2 or
zero!");
unsigned NewNumBuckets = InitSize ? InitSize :
16;
NumItems = 0;
NumTombstones = 0;
TheTable = static_cast(safe_calloc(
NewNumBuckets + 1, sizeof(StringMapEntryBase **) +
sizeof(unsigned)));
// Set the member only if TheTable was successfully
allocated
NumBuckets = NewNumBuckets;
// Allocate one extra bucket, set it to look filled so the
iterators stop at
// end.
TheTable[NumBuckets] = (StringMapEntryBase *)2;
}
它会把第十七个存进2(这个Map当需要的时候会变大,最后一个2就是分断符号),告诉查找的这里结束。
(lldb) memory read -fx -s8 -c25 TheTable
0x10210eb90: 0x0000000000000000 0x0000000000000000
0x10210eba0: 0x0000000000000000 0x0000000000000000
0x10210ebb0: 0x0000000000000000 0x0000000000000000
0x10210ebc0: 0x0000000000000000 0x0000000000000000
0x10210ebd0: 0x0000000000000000 0x0000000000000000
0x10210ebe0: 0x0000000000000000 0x0000000000000000
0x10210ebf0: 0x0000000000000000 0x0000000000000000
0x10210ec00: 0x0000000000000000 0x0000000000000000
0x10210ec10: 0x0000000000000002 0x0000000000000000
0x10210ec20: 0x000000000ac5a292 0x0000000000000000
0x10210ec30: 0x0000000000000000 0x0000000000000000
0x10210ec40: 0x0000000000000000 0x0000000000000000
0x10210ec50: 0x0000000000000000
可以看到这个是存进2,然后得到的BucketNo正好是2,hash算法如下
/// The Bernstein hash function used by the DWARF accelerator
tables.
inline uint32_t djbHash(StringRef Buffer, uint32_t
H = 5381) {
for (unsigned char C : Buffer.bytes())
H = (H << 5) + H +
C;
return H;
}
于是在之后的NO2号Hash表里存下Hash值0ac5a292.
StringMap
OptionsMap; 在这个case下,每个StringMapEntry对像分配的内存为
size_t AllocSize = EntrySize + KeyLength +
1;
其中EntrySize是基类中的size_t keyLength;
8个字节加上KeyLenght+1个字节存放String值。
(llvm::cl::opt > *) this =
0x000000010280b480
所以
(lldb) memory read -fx -s8 0x000000010250a190
0x10250a190: 0x0000000000000009 0x000000010280b480
0x10250a1a0: 0x73696c2d706c6568 0x6c6c2f7463650074
0x10250a1b0: 0x0000000000000000 0x0000000000000000
0x10250a1c0: 0x727450796e690003 0x6554726f74636556
可以看到9是字符串长度,0x000000010280b480就是Value的值。
接下来的字节是字符串help-list的二进制值。
(lldb) p (char*)0x10250a1a0
(char *) $19 = 0x000000010250a1a0
"help-list"
再观察一次TheTable,发现它把这个指针也存在2号位置。
(lldb) memory read -fx -s8 -c20 TheTable
0x10210eb90: 0x0000000000000000 0x0000000000000000
0x10210eba0: 0x000000010250a190 0x0000000000000000
0x10210ebb0: 0x0000000000000000 0x0000000000000000
0x10210ebc0: 0x0000000000000000 0x0000000000000000
0x10210ebd0: 0x0000000000000000 0x0000000000000000
0x10210ebe0: 0x0000000000000000 0x0000000000000000
0x10210ebf0: 0x0000000000000000 0x0000000000000000
0x10210ec00: 0x0000000000000000 0x0000000000000000
0x10210ec10: 0x0000000000000002 0x0000000000000000
0x10210ec20: 0x000000000ac5a292 0x0000000000000000
下面是存了两个指针的情况
(lldb) memory read -fx -s8 -c25 TheTable
0x10210eb90: 0x0000000000000000 0x0000000000000000
0x10210eba0: 0x000000010250a190 0x0000000000000000
0x10210ebb0: 0x0000000000000000 0x0000000000000000
0x10210ebc0: 0x0000000000000000 0x0000000000000000
0x10210ebd0: 0x0000000000000000 0x0000000000000000
0x10210ebe0: 0x0000000000000000 0x00000001027041a0
0x10210ebf0: 0x0000000000000000 0x0000000000000000
0x10210ec00: 0x0000000000000000 0x0000000000000000
0x10210ec10: 0x0000000000000002 0x0000000000000000
0x10210ec20: 0x000000000ac5a292 0x0000000000000000
0x10210ec30: 0x0000000000000000 0x0000000000000000
0x10210ec40: 0x83a0934b00000000 0x0000000000000000
0x10210ec50: 0x0000000000000000
查找的时候,它会去这个表里先算出hash,然后看看在几号洞里,如果洞里没东西,说明不在,如果有,而hash不同,它会有往后放的逻辑。从后面的洞接着找。
这个和放是一样的,多个hash一样的字符串,会被放进相邻的bucket里面。
相邻逻辑也有意思,先放隔壁,不行就+2,不行再加3.因为有空间,所以整除,它不会循环,总能找到一个适合的空洞。
如果空间不够,它会重新扩容,并且重新更新所有的hash和洞号。
当删除一个Entry的时候,它会把它的值变成墓碑。然后当空隙+墓碑大于一定的百分比,就会重新分配空间。
LLVM源码阅读之五---StringMap
LLVM源码阅读之五 --- StringMap
今天来认识一下这个StringMap类,它是一个Map类,把String做Key,任意对象做Value
template AllocatorTy = MallocAllocator>
class StringMap : public StringMapImpl {
AllocatorTy Allocator;
public:
using MapEntryTy = StringMapEntry;
从StringMapImpl继承。
class StringMapImpl {
protected:
// Array of NumBuckets pointers to entries, null pointers are holes.
// TheTable[NumBuckets] contains a sentinel value for easy iteration. Followed
// by an array of the actual hash values as unsigned integers.
StringMapEntryBase **TheTable = nullptr;
unsigned NumBuckets = 0;
unsigned NumItems = 0;
unsigned NumTombstones = 0;
unsigned ItemSize;
它存了StringMapEntryBase的二级指针
/// StringMapEntryBase - Shared base class of StringMapEntry instances.
class StringMapEntryBase {
size_t keyLength;
这个类直接把字符串存起来。
在分配内存的时候
TheTable = static_cast(safe_calloc(
NewNumBuckets + 1, sizeof(StringMapEntryBase **) + sizeof(unsigned)));
它分配了default 16+1个StringMapEntryBase * 和 16+1个 unsigned,unsigned 用来存放hash值。
void StringMapImpl::init(unsigned InitSize) {
assert((InitSize & (InitSize - 1)) == 0 &&
"Init Size must be a power of 2 or zero!");
unsigned NewNumBuckets = InitSize ? InitSize : 16;
NumItems = 0;
NumTombstones = 0;
TheTable = static_cast(safe_calloc(
NewNumBuckets + 1, sizeof(StringMapEntryBase **) + sizeof(unsigned)));
// Set the member only if TheTable was successfully allocated
NumBuckets = NewNumBuckets;
// Allocate one extra bucket, set it to look filled so the iterators stop at
// end.
TheTable[NumBuckets] = (StringMapEntryBase *)2;
}
它会把第十七个存进2(这个Map当需要的时候会变大,最后一个2就是分断符号),告诉查找的这里结束。
(lldb) memory read -fx -s8 -c25 TheTable
0x10210eb90: 0x0000000000000000 0x0000000000000000
0x10210eba0: 0x0000000000000000 0x0000000000000000
0x10210ebb0: 0x0000000000000000 0x0000000000000000
0x10210ebc0: 0x0000000000000000 0x0000000000000000
0x10210ebd0: 0x0000000000000000 0x0000000000000000
0x10210ebe0: 0x0000000000000000 0x0000000000000000
0x10210ebf0: 0x0000000000000000 0x0000000000000000
0x10210ec00: 0x0000000000000000 0x0000000000000000
0x10210ec10: 0x0000000000000002 0x0000000000000000
0x10210ec20: 0x000000000ac5a292 0x0000000000000000
0x10210ec30: 0x0000000000000000 0x0000000000000000
0x10210ec40: 0x0000000000000000 0x0000000000000000
0x10210ec50: 0x0000000000000000
可以看到这个是存进2,然后得到的BucketNo正好是2,hash算法如下
/// The Bernstein hash function used by the DWARF accelerator tables.
inline uint32_t djbHash(StringRef Buffer, uint32_t H = 5381) {
for (unsigned char C : Buffer.bytes())
H = (H << 5) + H + C;
return H;
}
于是在之后的NO2号Hash表里存下Hash值0ac5a292.
StringMap
OptionsMap; 在这个case下,每个StringMapEntry对像分配的内存为size_t AllocSize = EntrySize + KeyLength + 1;
其中EntrySize是基类中的size_t keyLength;
8个字节加上KeyLenght+1个字节存放String值。
(llvm::cl::opt > *) this = 0x000000010280b480
所以
(lldb) memory read -fx -s8 0x000000010250a190
0x10250a190: 0x0000000000000009 0x000000010280b480
0x10250a1a0: 0x73696c2d706c6568 0x6c6c2f7463650074
0x10250a1b0: 0x0000000000000000 0x0000000000000000
0x10250a1c0: 0x727450796e690003 0x6554726f74636556
可以看到9是字符串长度,0x000000010280b480就是Value的值。
接下来的字节是字符串help-list的二进制值。
(lldb) p (char*)0x10250a1a0
(char *) $19 = 0x000000010250a1a0 "help-list"
再观察一次TheTable,发现它把这个指针也存在2号位置。
(lldb) memory read -fx -s8 -c20 TheTable
0x10210eb90: 0x0000000000000000 0x0000000000000000
0x10210eba0: 0x000000010250a190 0x0000000000000000
0x10210ebb0: 0x0000000000000000 0x0000000000000000
0x10210ebc0: 0x0000000000000000 0x0000000000000000
0x10210ebd0: 0x0000000000000000 0x0000000000000000
0x10210ebe0: 0x0000000000000000 0x0000000000000000
0x10210ebf0: 0x0000000000000000 0x0000000000000000
0x10210ec00: 0x0000000000000000 0x0000000000000000
0x10210ec10: 0x0000000000000002 0x0000000000000000
0x10210ec20: 0x000000000ac5a292 0x0000000000000000
下面是存了两个指针的情况
(lldb) memory read -fx -s8 -c25 TheTable
0x10210eb90: 0x0000000000000000 0x0000000000000000
0x10210eba0: 0x000000010250a190 0x0000000000000000
0x10210ebb0: 0x0000000000000000 0x0000000000000000
0x10210ebc0: 0x0000000000000000 0x0000000000000000
0x10210ebd0: 0x0000000000000000 0x0000000000000000
0x10210ebe0: 0x0000000000000000 0x00000001027041a0
0x10210ebf0: 0x0000000000000000 0x0000000000000000
0x10210ec00: 0x0000000000000000 0x0000000000000000
0x10210ec10: 0x0000000000000002 0x0000000000000000
0x10210ec20: 0x000000000ac5a292 0x0000000000000000
0x10210ec30: 0x0000000000000000 0x0000000000000000
0x10210ec40: 0x83a0934b00000000 0x0000000000000000
0x10210ec50: 0x0000000000000000
查找的时候,它会去这个表里先算出hash,然后看看在几号洞里,如果洞里没东西,说明不在,如果有,而hash不同,它会有往后放的逻辑。从后面的洞接着找。
这个和放是一样的,多个hash一样的字符串,会被放进相邻的bucket里面。
相邻逻辑也有意思,先放隔壁,不行就+2,不行再加3.因为有空间,所以整除,它不会循环,总能找到一个适合的空洞。
如果空间不够,它会重新扩容,并且重新更新所有的hash和洞号。
当删除一个Entry的时候,它会把它的值变成墓碑。然后当空隙+墓碑大于一定的百分比,就会重新分配空间。