LLVM源码阅读之五---StringMap

2024-10-19 10:58:53
标签: llvm stringmap

LLVM源码阅读之五 --- StringMap

今天来认识一下这个StringMap类,它是一个Map类,把String做Key,任意对象做Value

template AllocatorTy = MallocAllocator>

class StringMap : public StringMapImpl {

AllocatorTy Allocator;

public:

using MapEntryTy = StringMapEntry;

从StringMapImpl继承。

class StringMapImpl {

protected:

// Array of NumBuckets pointers to entries, null pointers are holes.

// TheTable[NumBuckets] contains a sentinel value for easy iteration. Followed

// by an array of the actual hash values as unsigned integers.

StringMapEntryBase **TheTable = nullptr;

unsigned NumBuckets = 0;

unsigned NumItems = 0;

unsigned NumTombstones = 0;

unsigned ItemSize;

它存了StringMapEntryBase的二级指针

/// StringMapEntryBase - Shared base class of StringMapEntry instances.

class StringMapEntryBase {

size_t keyLength;

这个类直接把字符串存起来。

在分配内存的时候

TheTable = static_cast(safe_calloc(

NewNumBuckets + 1, sizeof(StringMapEntryBase **) + sizeof(unsigned)));

它分配了default 16+1个StringMapEntryBase * 和 16+1个 unsigned,unsigned 用来存放hash值。

void StringMapImpl::init(unsigned InitSize) {

assert((InitSize & (InitSize - 1)) == 0 &&

"Init Size must be a power of 2 or zero!");

unsigned NewNumBuckets = InitSize ? InitSize : 16;

NumItems = 0;

NumTombstones = 0;

TheTable = static_cast(safe_calloc(

NewNumBuckets + 1, sizeof(StringMapEntryBase **) + sizeof(unsigned)));

// Set the member only if TheTable was successfully allocated

NumBuckets = NewNumBuckets;

// Allocate one extra bucket, set it to look filled so the iterators stop at

// end.

TheTable[NumBuckets] = (StringMapEntryBase *)2;

}

它会把第十七个存进2(这个Map当需要的时候会变大,最后一个2就是分断符号),告诉查找的这里结束。

(lldb) memory read -fx -s8 -c25 TheTable

0x10210eb90: 0x0000000000000000 0x0000000000000000

0x10210eba0: 0x0000000000000000 0x0000000000000000

0x10210ebb0: 0x0000000000000000 0x0000000000000000

0x10210ebc0: 0x0000000000000000 0x0000000000000000

0x10210ebd0: 0x0000000000000000 0x0000000000000000

0x10210ebe0: 0x0000000000000000 0x0000000000000000

0x10210ebf0: 0x0000000000000000 0x0000000000000000

0x10210ec00: 0x0000000000000000 0x0000000000000000

0x10210ec10: 0x0000000000000002 0x0000000000000000

0x10210ec20: 0x000000000ac5a292 0x0000000000000000

0x10210ec30: 0x0000000000000000 0x0000000000000000

0x10210ec40: 0x0000000000000000 0x0000000000000000

0x10210ec50: 0x0000000000000000

可以看到这个是存进2,然后得到的BucketNo正好是2,hash算法如下

/// The Bernstein hash function used by the DWARF accelerator tables.

inline uint32_t djbHash(StringRef Buffer, uint32_t H = 5381) {

for (unsigned char C : Buffer.bytes())

H = (H << 5) + H + C;

return H;

}

于是在之后的NO2号Hash表里存下Hash值0ac5a292.

StringMap

OptionsMap; 在这个case下,每个StringMapEntry对像分配的内存为

size_t AllocSize = EntrySize + KeyLength + 1;

其中EntrySize是基类中的size_t keyLength;

8个字节加上KeyLenght+1个字节存放String值。

(llvm::cl::opt > *) this = 0x000000010280b480

所以

(lldb) memory read -fx -s8 0x000000010250a190

0x10250a190: 0x0000000000000009 0x000000010280b480

0x10250a1a0: 0x73696c2d706c6568 0x6c6c2f7463650074

0x10250a1b0: 0x0000000000000000 0x0000000000000000

0x10250a1c0: 0x727450796e690003 0x6554726f74636556

可以看到9是字符串长度,0x000000010280b480就是Value的值。

接下来的字节是字符串help-list的二进制值。

(lldb) p (char*)0x10250a1a0

(char *) $19 = 0x000000010250a1a0 "help-list"

再观察一次TheTable,发现它把这个指针也存在2号位置。

(lldb) memory read -fx -s8 -c20 TheTable

0x10210eb90: 0x0000000000000000 0x0000000000000000

0x10210eba0: 0x000000010250a190 0x0000000000000000

0x10210ebb0: 0x0000000000000000 0x0000000000000000

0x10210ebc0: 0x0000000000000000 0x0000000000000000

0x10210ebd0: 0x0000000000000000 0x0000000000000000

0x10210ebe0: 0x0000000000000000 0x0000000000000000

0x10210ebf0: 0x0000000000000000 0x0000000000000000

0x10210ec00: 0x0000000000000000 0x0000000000000000

0x10210ec10: 0x0000000000000002 0x0000000000000000

0x10210ec20: 0x000000000ac5a292 0x0000000000000000

下面是存了两个指针的情况

(lldb) memory read -fx -s8 -c25 TheTable

0x10210eb90: 0x0000000000000000 0x0000000000000000

0x10210eba0: 0x000000010250a190 0x0000000000000000

0x10210ebb0: 0x0000000000000000 0x0000000000000000

0x10210ebc0: 0x0000000000000000 0x0000000000000000

0x10210ebd0: 0x0000000000000000 0x0000000000000000

0x10210ebe0: 0x0000000000000000 0x00000001027041a0

0x10210ebf0: 0x0000000000000000 0x0000000000000000

0x10210ec00: 0x0000000000000000 0x0000000000000000

0x10210ec10: 0x0000000000000002 0x0000000000000000

0x10210ec20: 0x000000000ac5a292 0x0000000000000000

0x10210ec30: 0x0000000000000000 0x0000000000000000

0x10210ec40: 0x83a0934b00000000 0x0000000000000000

0x10210ec50: 0x0000000000000000

查找的时候,它会去这个表里先算出hash,然后看看在几号洞里,如果洞里没东西,说明不在,如果有,而hash不同,它会有往后放的逻辑。从后面的洞接着找。

这个和放是一样的,多个hash一样的字符串,会被放进相邻的bucket里面。

相邻逻辑也有意思,先放隔壁,不行就+2,不行再加3.因为有空间,所以整除,它不会循环,总能找到一个适合的空洞。

如果空间不够,它会重新扩容,并且重新更新所有的hash和洞号。

当删除一个Entry的时候,它会把它的值变成墓碑。然后当空隙+墓碑大于一定的百分比,就会重新分配空间。


阅读(0) 收藏(0) 转载(0) 举报/Report
相关阅读

新浪BLOG意见反馈留言板 欢迎批评指正

新浪简介 | About Sina | 广告服务 | 联系我们 | 招聘信息 | 网站律师 | SINA English | 产品答疑

新浪公司 版权所有