Oracle - 怎样使用B树索引和位图索引
(2017-06-08 12:08:46)
标签:
it |
分类: oracle |
注:low-cardinality是指该列或者列的组合具有的不同值的个数较少,即该列有很多重复值。high-cardinality是指该列或者列的组合具有不同的值的个数较多,即该列有很少的重复值。
理解每种索引的适用场合将对性能产生重大影响。
传统观念认为位图索引最适用于拥有很少不同值的列 ---- 例如GENDER, MARITAL_STATUS,和RELATION。但是,这种假设是不准确的。实际上,对于大多非频繁更新的并发系统,位图索引也是适用的。事实上,下面将会展示,对以一个具有100%唯一值的列(主键的候选列)来说,位图索引和B树索引一样有效。
在唯一列上使用位图索引有一些缺点 ---
-
Create
table test_normal (empno number(10), ename varchar2(30), sal number(10)); -
-
Begin
-
For
i in 1..1000000 -
Loop
-
Insert into test_normal -
values(i, dbms_random.string('U',30), dbms_random.value(1000,7000)); -
If mod(i, 10000) = 0 then -
Commit; -
End if; -
End
loop; -
End;
-
/
-
-
Create
table test_random - as
-
select
* from test_normal order by dbms_random.random; -
-
SQL>
select count(*) "Total Rows" from test_normal; -
-
Total
Rows -
----------
-
1000000 -
-
Elapsed:
00:00:01.09 -
-
SQL>
select count(distinct empno) "Distinct Values" from test_normal; -
-
Distinct
Values -
---------------
-
1000000 -
-
Elapsed:
00:00:06.09 -
SQL>
select count(*) "Total Rows" from test_random; -
-
Total
Rows -
----------
-
1000000 -
-
Elapsed:
00:00:03.05 -
SQL>
select count(distinct empno) "Distinct Values" from test_random; -
-
Distinct
Values -
---------------
-
1000000 -
-
Elapsed:
00:00:12.07
注意,test_normal表是组织良好的,test_random表是随机创建的,因此,其中的数据是无组织的。在上面的表中,EMPNO列上的值完全不同,因此可以作为候选主键。如果你把该列定义为主键,oracle将会建立一个B树索引,因为Oracle不支持主键位图索引。
- 在表test_normal上:
- 在EMPNO列上建立一个位图索引,并执行一些相等性查询。
- 在EMPNO列上建立一个B树索引,执行一些相等性查询,并且比较获得不同结果集所执行的查询需要的物理I/O和逻辑I/O的次数。
- 在表test_random表上:
- 和1.1相同的步骤
- 和1.2相同的步骤
- 在表test_normal上:
- 和1.1相同的步骤,但是执行范围查询。
- 和1.2相同的步骤,但是执行范围查询。比较统计结果。
- 在表test_random表上:
- 和3.1相同的步骤。
- 和3.2相同的步骤
- 在表test_normal上:
- 在SAL列上建立一个位图索引,并且执行一些相等性查询和范围查询。
- 在SAL列上建立一个B树索引,并且执行一些相等性查询和范围查询(和5.1相同的结果集),比较获取结果执行的I/O次数。
- 在两个表中添加GENDER列,并且把该列更新为3个可能的值:M(女性), F(男性), null(未知)。根据一些条件更新该列的值。
- 在该列上建立一个位图索引并且执行一些相等性查询。
- 在GENDER列上建立一个B树索引并且执行一些相等性查询,和步骤7的结果比较。
步骤1到4涉及一个high-cardinality列(完全不同),步骤5是一个normal-cardinality列,步骤7和8是一个low-cardinality列。
在该步中,我们在表test_normal上建立一个位图索引,然后检查索引的大小、聚簇因子(clustering factor)和表的大小。然后执行一些相等性查询并且查看使用位图索引时查询需要的I/O次数。
-
SQL>
create bitmap index normal_empno_bmx on test_normal(empno); -
-
Index
created. -
-
Elapsed:
00:00:29.06 -
SQL>
analyze table test_normal compute statistics for table forall forindexes all indexed columns; -
-
-
Table
analyzed. -
-
Elapsed:
00:00:19.01 -
SQL>
select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB" -
2 from user_segments -
3* where segment_name in ( 'TEST_NORMAL','NORMAL_EMPNO_BMX'); -
-
SEGMENT_NAME
Size in MB -
------------------------------------
--------------- -
TEST_NORMAL
50 -
NORMAL_EMPNO_BMX
28 -
-
Elapsed:
00:00:02.00 -
SQL>
select index_name, clustering_factor from user_indexes; -
-
INDEX_NAME
CLUSTERING_FACTOR -
------------------------------
--------------------------------- -
NORMAL_EMPNO_BMX
1000000 -
-
Elapsed:
00:00:00.00
可以看到,表上索引的大小是28M并且聚簇因子的大小等于表中的行数。现在我们为不同的结果集执行一些相等性查询:
-
SQL>
set autotrace only -
SQL>
select * from test_normal where empno=&empno; -
Enter
value for empno: 1000 -
old
1: select * from test_normal where empno=&empno - new
1: select * from test_normal where empno=1000 -
-
Elapsed:
00:00:00.01 -
-
Execution
Plan -
----------------------------------------------------------
-
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34) -
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=4 Car -
d=1 Bytes=34) -
2 1 BITMAP CONVERSION (TO ROWIDS) -
3 2 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_EMPNO_BMX' -
-
Statistics
-
----------------------------------------------------------
-
0 recursive calls -
0 db block gets -
5 consistent gets -
0 physical reads -
0 redo size -
515 bytes sent via SQL*Net to client -
499 bytes received via SQL*Net from client -
2 SQL*Net roundtrips to/from client -
0 sorts (memory) -
0 sorts (disk) -
1 rows processed
现在删除表中EMPNO列上的位图索引并创建一个B树索引。像前面一样我们查看索引的大小、聚簇因子的大小并且执行相同的查询,比较I/O的次数。
-
SQL>
drop index NORMAL_EMPNO_BMX; -
-
Index
dropped. -
-
SQL>
create index normal_empno_idx on test_normal(empno); -
-
Index
created. -
-
SQL>
analyze table test_normal compute statistics for table forall forindexes all indexed columns; -
-
Table
analyzed. -
-
SQL>
select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB" -
2 from user_segments -
3 where segment_name in ( 'TEST_NORMAL','NORMAL_EMPNO_IDX'); -
-
SEGMENT_NAME
Size in MB -
----------------------------------
--------------- -
TEST_NORMAL
50 -
NORMAL_EMPNO_IDX
18 -
-
SQL>
select index_name, clustering_factor from user_indexes; -
-
INDEX_NAME
CLUSTERING_FACTOR -
----------------------------------
---------------------------------- -
NORMAL_EMPNO_IDX
6210
很明显,在该表的EMPNO列上,B树索引比位图索引要小。B树索引上的聚簇因子接近于表中的数据块数;因此B树索引对于范围查询更有效。
-
SQL>
set autot trace -
SQL>
select * from test_normal where empno=&empno; -
Enter
value for empno: 1000 -
old
1: select * from test_normal where empno=&empno - new
1: select * from test_normal where empno=1000 -
-
Elapsed:
00:00:00.01 -
-
Execution
Plan -
----------------------------------------------------------
-
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34) -
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=4 Car -
d=1 Bytes=34) -
2 1 INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_IDX' (NON-UNIQUE) (C -
ost=3 Card=1) -
-
Statistics
-
----------------------------------------------------------
-
29 recursive calls -
0 db block gets -
5 consistent gets -
0 physical reads -
0 redo size -
515 bytes sent via SQL*Net to client -
499 bytes received via SQL*Net from client -
2 SQL*Net roundtrips to/from client -
0 sorts (memory) -
0 sorts (disk) -
1 rows processed
可以看到,对于相同的结果集,在唯一列上的位图索引和B树索引需要相同的物理和逻辑读取次数。
BITMAP(位图) | EMPNO | B-TREE(B树) | ||
Consistent Reads | Physical Reads | Consistent Reads | Physical Reads | |
5 | 0 | 1000 | 5 | 0 |
5 | 2 | 2398 | 5 | 2 |
5 | 2 | 8545 | 5 | 2 |
5 | 2 | 98008 | 5 | 2 |
5 | 2 | 85342 | 5 | 2 |
5 | 2 | 128444 | 5 | 2 |
5 | 2 | 858 | 5 | 2 |
现在,在test_random表上执行相同的操作:
-
SQL>
create bitmap index random_empno_bmx on test_random(empno); -
-
Index
created. -
-
SQL>
analyze table test_random compute statistics for table forall forindexes all indexed columns; -
-
Table
analyzed. -
-
SQL>
select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB" -
2 from user_segments -
3* where segment_name in ( 'TEST_RANDOM','RANDOM_EMPNO_BMX'); -
-
SEGMENT_NAME
Size in MB -
------------------------------------
--------------- -
TEST_RANDOM
50 -
RANDOM_EMPNO_BMX
28 -
-
SQL>
select index_name, clustering_factor from user_indexes; -
-
INDEX_NAME
CLUSTERING_FACTOR -
------------------------------
--------------------------------- -
RANDOM_EMPNO_BMX
1000000
再次,索引上的统计结果(大小和聚簇因子)和在表test_normal中是相同的:
-
SQL>
select * from test_random where empno=&empno; -
Enter
value for empno: 1000 -
old
1: select * from test_random where empno=&empno - new
1: select * from test_random where empno=1000 -
-
Elapsed:
00:00:00.01 -
-
Execution
Plan -
----------------------------------------------------------
-
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34) -
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=4 Card=1 Bytes=34) -
2 1 BITMAP CONVERSION (TO ROWIDS) -
3 2 BITMAP INDEX (SINGLE VALUE) OF 'RANDOM_EMPNO_BMX' -
-
Statistics
-
----------------------------------------------------------
-
0 recursive calls -
0 db block gets -
5 consistent gets -
0 physical reads -
0 redo size -
515 bytes sent via SQL*Net to client -
499 bytes received via SQL*Net from client -
2 SQL*Net roundtrips to/from client -
0 sorts (memory) -
0 sorts (disk) -
1 rows processed
步骤2.2(在表test_random上)
现在,和步骤1.2一样,我们删除EMPNO列上的位图索引并且创建一个B树索引。
-
SQL>
drop index RANDOM_EMPNO_BMX; -
-
Index
dropped. -
-
SQL>
create index random_empno_idx on test_random(empno); -
-
Index
created. -
-
SQL>
analyze table test_random compute statistics for table forall forindexes all indexed columns; -
-
Table
analyzed. -
-
SQL>
select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB" -
2 from user_segments -
3 where segment_name in ( 'TEST_RANDOM','RANDOM_EMPNO_IDX'); -
-
SEGMENT_NAME
Size in MB -
----------------------------------
--------------- -
TEST_RANDOM
50 -
RANDOM_EMPNO_IDX
18 -
-
SQL>
select index_name, clustering_factor from user_indexes; -
-
INDEX_NAME
CLUSTERING_FACTOR -
----------------------------------
---------------------------------- -
RANDOM_EMPNO_IDX
999830
该表的索引大小和表test_normal是一样的,但是聚簇因子更接近于行数,这就使得该索引对于范围查询不再高效。该聚簇因子不影响相等性查询,因为该列的值是唯一的,每个键对应1行记录。
现在,在相同的结果集上执行相等性查询。
-
SQL>
select * from test_random where empno=&empno; -
Enter
value for empno: 1000 -
old
1: select * from test_random where empno=&empno - new
1: select * from test_random where empno=1000 -
-
Elapsed:
00:00:00.01 -
-
Execution
Plan -
----------------------------------------------------------
-
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34) -
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=4 Card=1 Bytes=34) -
2 1 INDEX (RANGE SCAN) OF 'RANDOM_EMPNO_IDX' (NON-UNIQUE) (Cost=3 Card=1) -
-
Statistics
-
----------------------------------------------------------
-
0 recursive calls -
0 db block gets -
5 consistent gets -
0 physical reads -
0 redo size -
515 bytes sent via SQL*Net to client -
499 bytes received via SQL*Net from client -
2 SQL*Net roundtrips to/from client -
0 sorts (memory) -
0 sorts (disk) -
1 rows processed
再次表明,结果和步骤1.1和1.2几乎相同。对于唯一列来说,数据分布不影响逻辑和物理I/O。
步骤3.1(在表test_normal上)
在该步中,我们将创建一个位图索引。我们知道索引的聚簇因子大小和表中的行数相同。现在我们执行一些范围查询。
-
SQL>
select * from test_normal where empno between &range1 and &range2; -
Enter
value for range1: 1 -
Enter
value for range2: 2300 -
old
1: select * from test_normal where empno between &range1 and &range2 - new
1: select * from test_normal where empno between 1 and 2300 -
-
2300
rows selected. -
-
Elapsed:
00:00:00.03 -
-
Execution
Plan -
----------------------------------------------------------
-
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=451 Card=2299 Bytes=78166) -
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=451 Card=2299 Bytes=78166) -
2 1 BITMAP CONVERSION (TO ROWIDS) -
3 2 BITMAP INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_BMX' -
-
Statistics
-
----------------------------------------------------------
-
0 recursive calls -
0 db block gets -
331 consistent gets -
0 physical reads -
0 redo size -
111416 bytes sent via SQL*Net to client -
2182 bytes received via SQL*Net from client -
155 SQL*Net roundtrips to/from client -
0 sorts (memory) -
0 sorts (disk) -
2300 rows processed
步骤3.2(在表test_normal上)
该步中,我们在test_normal的B树索引上执行查询。
-
SQL>
select * from test_normal where empno between &range1 and &range2; -
Enter
value for range1: 1 -
Enter
value for range2: 2300 -
old
1: select * from test_normal where empno between &range1 and &range2 - new
1: select * from test_normal where empno between 1 and 2300 -
-
2300
rows selected. -
-
Elapsed:
00:00:00.02 -
-
Execution
Plan -
----------------------------------------------------------
-
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=23 Card=2299 Bytes=78166) -
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=23 Card=2299 Bytes=78166) -
2 1 INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_IDX' (NON-UNIQUE) (Cost=8 Card=2299) -
-
Statistics
-
----------------------------------------------------------
-
0 recursive calls -
0 db block gets -
329 consistent gets -
15 physical reads -
0 redo size -
111416 bytes sent via SQL*Net to client -
2182 bytes received via SQL*Net from client -
155 SQL*Net roundtrips to/from client -
0 sorts (memory) -
0 sorts (disk) -
2300 rows processed
BITMAP | EMPNO (Range) | B-TREE | ||
Consistent Reads | Physical Reads | Consistent Reads | Physical Reads | |
331 | 0 | 1-2300 | 329 | 0 |
285 | 0 | 8-1980 | 283 | 0 |
346 | 19 | 1850-4250 | 344 | 16 |
427 | 31 | 28888-31850 | 424 | 28 |
371 | 27 | 82900-85478 | 367 | 23 |
2157 | 149 | 984888-1000000 | 2139 | 35 |
可以看到,在两种索引上需要的逻辑和物理IO基本上是相同的。最后一个范围(984888-1000000)差不多返回了15,000行,是所有范围查询中最大的。当我们执行全表扫描时(通过/*+ full(test_normal) */ ),物理和逻辑IO的次数是7239和5663.
步骤4.1(在表test_random上)
在该步中,我们将在表test_random的位图索引上执行范围查询,在这儿,你将看到聚簇因子的影响。
-
SQL>select
* from test_random where empno between &range1 and &range2; -
Enter
value for range1: 1 -
Enter
value for range2: 2300 -
old
1: select * from test_random where empno between &range1 and &range2 - new
1: select * from test_random where empno between 1 and 2300 -
-
2300
rows selected. -
-
Elapsed:
00:00:08.01 -
-
Execution
Plan -
----------------------------------------------------------
-
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=453 Card=2299 Bytes=78166) -
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=453 Card=2299 Bytes=78166) -
2 1 BITMAP CONVERSION (TO ROWIDS) -
3 2 BITMAP INDEX (RANGE SCAN) OF 'RANDOM_EMPNO_BMX' -
-
Statistics
-
----------------------------------------------------------
-
0 recursive calls -
0 db block gets -
2463 consistent gets -
1200 physical reads -
0 redo size -
111416 bytes sent via SQL*Net to client -
2182 bytes received via SQL*Net from client -
155 SQL*Net roundtrips to/from client -
0 sorts (memory) -
0 sorts (disk) -
2300 rows processed
在该步中,我们将在test_random的B树索引上执行范围查询。回想一下,该索引上的聚簇因子接近于表中记录的行数。下面是优化器的输出:
-
SQL>
select * from test_random where empno between &range1 and &range2; -
Enter
value for range1: 1 -
Enter
value for range2: 2300 -
old
1: select * from test_random where empno between &range1 and &range2 - new
1: select * from test_random where empno between 1 and 2300 -
-
2300
rows selected. -
-
Elapsed:
00:00:03.04 -
-
Execution
Plan -
----------------------------------------------------------
-
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=613 Card=2299 Bytes=78166) -
1 0 TABLE ACCESS (FULL) OF 'TEST_RANDOM' (Cost=613 Card=2299 Bytes=78166) -
-
Statistics
-
----------------------------------------------------------
-
0 recursive calls -
0 db block gets -
6415 consistent gets -
4910 physical reads -
0 redo size -
111416 bytes sent via SQL*Net to client -
2182 bytes received via SQL*Net from client -
155 SQL*Net roundtrips to/from client -
0 sorts (memory) -
0 sorts (disk) -
2300 rows processed
因为聚簇因子的缘故,优化器选择了全表扫描而不是使用索引:
BITMAP | EMPNO (Range) | B-TREE | ||
Consistent Reads | Physical Reads | Consistent Reads | Physical Reads | |
2463 | 1200 | 1-2300 | 6415 | 4910 |
2114 | 31 | 8-1980 | 6389 | 4910 |
2572 | 1135 | 1850-4250 | 6418 | 4909 |
3173 | 1620 | 28888-31850 | 6456 | 4909 |
2762 | 1358 | 82900-85478 | 6431 | 4909 |
7254 | 3329 | 984888-1000000 | 7254 | 4909 |
仅对于最后一个范围(984888-1000000),对于位图索引优化器选择了全表扫描。然而,对于B树索引,全部使用全表扫描。引起这种差异的原因是聚簇因子:优化器在产生执行计划时不考虑位图索引的聚簇因子,但是对于B树索引来说,则需要 考虑聚簇因子。在上面的情况中,位图索引比B树索引更有效。
在表test_normal的SAL列上建立一个位图索引,该列拥有普通的cardinality。
-
SQL>
create bitmap index normal_sal_bmx on test_normal(sal); -
-
Index
created. -
-
SQL>
analyze table test_normal compute statistics for table forall forindexes all indexed columns; -
-
Table
analyzed.
得到索引的大小和聚簇因子:
-
SQL>select
substr(segment_name,1,30) "Sizesegment_name, bytes/1024/1024 in MB" -
2* from user_segments -
3* where segment_name in ( 'TEST_NORMAL','NORMAL_SAL_BMX'); -
-
SEGMENT_NAME
Size in MB -
------------------------------
-------------- -
TEST_NORMAL
50 -
NORMAL_SAL_BMX
4 -
-
SQL>
select index_name, clustering_factor from user_indexes; -
-
INDEX_NAME
CLUSTERING_FACTOR -
------------------------------
---------------------------------- -
NORMAL_SAL_BMX
6001
下面执行查询,首先执行相等性查询:
-
SQL>
set autot trace -
SQL>
select * from test_normal where sal=&sal; -
Enter
value for sal: 1869 -
old
1: select * from test_normal where sal=&sal - new
1: select * from test_normal where sal=1869 -
-
164
rows selected. -
-
Elapsed:
00:00:00.08 -
-
Execution
Plan -
----------------------------------------------------------
-
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=39 Card=168 Bytes=4032) -
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=39 Card=168 Bytes=4032) -
2 1 BITMAP CONVERSION (TO ROWIDS) -
3 2 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX' -
-
Statistics
-
----------------------------------------------------------
-
0 recursive calls -
0 db block gets -
165 consistent gets -
0 physical reads -
0 redo size -
8461 bytes sent via SQL*Net to client -
609 bytes received via SQL*Net from client -
12 SQL*Net roundtrips to/from client -
0 sorts (memory) -
0 sorts (disk) -
164 rows processed
接下来是范围查询:
-
SQL>
select * from test_normal where sal between &sal1 and &sal2; -
Enter
value for sal1: 1500 -
Enter
value for sal2: 2000 -
old
1: select * from test_normal where sal between &sal1 and &sal2 - new
1: select * from test_normal where sal between 1500 and 2000 -
-
83743
rows selected. -
-
Elapsed:
00:00:05.00 -
-
Execution
Plan -
----------------------------------------------------------
-
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=83376 Bytes -
=2001024) -
1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=83376 -
Bytes=2001024) -
-
Statistics
-
----------------------------------------------------------
-
0 recursive calls -
0 db block gets -
11778 consistent gets -
5850 physical reads -
0 redo size -
4123553 bytes sent via SQL*Net to client -
61901 bytes received via SQL*Net from client -
5584 SQL*Net roundtrips to/from client -
0 sorts (memory) -
0 sorts (disk) -
83743 rows processed
现在,删除test_normal上的位图索引并且建立一个B树索引。
-
SQL>
create index normal_sal_idx on test_normal(sal); -
-
Index
created. -
-
SQL>
analyze table test_normal compute statistics for table forall forindexes all indexed columns; -
-
Table
analyzed.
查看索引大小和聚簇因子:
-
SQL>
select "Sizesubstr(segment_name,1,30) segment_name, bytes/1024/1024 in MB" -
2 from user_segments -
3 where segment_name in ( 'TEST_NORMAL','NORMAL_SAL_IDX'); -
-
SEGMENT_NAME
Size in MB -
------------------------------
--------------- -
TEST_NORMAL
50 -
NORMAL_SAL_IDX
17 -
-
SQL>
select index_name, clustering_factor from user_indexes; -
-
INDEX_NAME
CLUSTERING_FACTOR -
------------------------------
---------------------------------- -
NORMAL_SAL_IDX
986778
从上表可以看出,B树索引大于相同列上的位图索引,它的聚簇因子接近于表中的行数。
现在,先执行相等性查询:
-
SQL>
set autot trace -
SQL>
select * from test_normal where sal=&sal; -
Enter
value for sal: 1869 -
old
1: select * from test_normal where sal=&sal - new
1: select * from test_normal where sal=1869 -
-
164
rows selected. -
-
Elapsed:
00:00:00.01 -
-
Execution
Plan -
----------------------------------------------------------
-
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=169 Card=168 Bytes=4032) -
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=169 Card=168 Bytes=4032) -
2 1 INDEX (RANGE SCAN) OF 'NORMAL_SAL_IDX' (NON-UNIQUE) (Cost=3 Card=168) -
-
Statistics
-
----------------------------------------------------------
-
0 recursive calls -
0 db block gets -
177 consistent gets -
0 physical reads -
0 redo size -
8461 bytes sent via SQL*Net to client -
609 bytes received via SQL*Net from client -
12 SQL*Net roundtrips to/from client -
0 sorts (memory) -
0 sorts (disk) -
164 rows processed
接下来是范围查询:
-
SQL>
select * from test_normal where sal between &sal1 and &sal2; -
Enter
value for sal1: 1500 -
Enter
value for sal2: 2000 -
old
1: select * from test_normal where sal between &sal1 and &sal2 - new
1: select * from test_normal where sal between 1500 and 2000 -
-
83743
rows selected. -
-
Elapsed:
00:00:04.03 -
-
Execution
Plan -
----------------------------------------------------------
-
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=83376 Bytes -
=2001024) -
1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=83376 -
Bytes=2001024) -
-
-
Statistics
-
----------------------------------------------------------
-
0 recursive calls -
0 db block gets -
11778 consistent gets -
3891 physical reads -
0 redo size -
4123553 bytes sent via SQL*Net to client -
61901 bytes received via SQL*Net from client -
5584 SQL*Net roundtrips to/from client -
0 sorts (memory) -
0 sorts (disk) -
83743 rows processed
在不同的数据集上执行查询的结果如下,可以看出逻辑和物理I/O的次数基本上是相同的。
BITMAP |
SAL (Equality)
|
B-TREE | Rows Fetched | ||
Consistent Reads | Physical Reads | Consistent Reads | Physical Reads | ||
165 | 0 | 1869 | 177 | 164 | |
169 | 163 | 3548 | 181 | 167 | |
174 | 166 | 6500 | 187 | 172 | |
75 | 69 | 7000 | 81 | 73 | |
177 | 163 | 2500 | 190 | 175 | |
BITMAP |
SAL (Range)
|
B-TREE | Rows Fetched | ||
Consistent Reads | Physical Reads | Consistent Reads | Physical Reads | ||
11778 | 5850 | 1500-2000 | 11778 | 3891 | 83743 |
11765 | 5468 | 2000-2500 | 11765 | 3879 | 83328 |
11753 | 5471 | 2500-3000 | 11753 | 3884 | 83318 |
17309 | 5472 | 3000-4000 | 17309 | 3892 | 166999 |
39398 | 5454 | 4000-7000 | 39398 | 3973 | 500520 |
对于范围查询,优化器选择了全表扫描,根本没有使用索引。但是对于相等性查询,优化器使用了索引。再次,逻辑和物理I/O是相同的。
因此,可以得出结论,对于一个具有normal-cardinality的列来说,优化器对于两种类型的索引的选择是相同的,并且没有明显的I/O差异。
步骤6(增加GENDER列)
-
SQL>
alter table test_normal add GENDER varchar2(1); -
-
Table
altered. -
-
SQL>
select GENDER, count(*) from test_normal group by GENDER; -
-
S
COUNT(*) -
-
---------- -
F
333769 -
M
499921 -
166310 -
-
3
rows selected.
该列上位图索引的大小大约为570KB,如下表所示:
-
SQL>
create bitmap index normal_GENDER_bmx on test_normal(GENDER); -
-
Index
created. -
-
Elapsed:
00:00:02.08 -
-
SQL>
select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB" -
2 from user_segments -
3 where segment_name in ( 'TEST_NORMAL','NORMAL_GENDER_BMX'); -
-
SEGMENT_NAME
Size in MB -
------------------------------
--------------- -
TEST_NORMAL
50 -
NORMAL_GENDER_BMX
.5625 -
-
2
rows selected.
相对而言,改列上的B树索引的大小为13M,比位图索引大的多。
-
SQL>
create index normal_GENDER_idx on test_normal(GENDER); -
-
Index
created. -
-
SQL>
select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB" -
2 from user_segments -
3 where segment_name in ( 'TEST_NORMAL','NORMAL_GENDER_IDX'); -
-
SEGMENT_NAME
Size in MB -
------------------------------
--------------- -
TEST_NORMAL
50 -
NORMAL_GENDER_IDX
13 -
-
2
rows selected.
现在,执行相等性查询,优化器将不使用该索引,不论是位图索引还是B树索引,它将使用全部扫描。
-
SQL>
select is* from test_normal where GENDER null; -
-
166310
rows selected. -
-
Elapsed:
00:00:06.08 -
-
Execution
Plan -
----------------------------------------------------------
-
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=166310 Bytes=4157750) -
1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=166310 Bytes=4157750) -
-
SQL>
select * from test_normal where GENDER='M'; -
-
499921
rows selected. -
-
Elapsed:
00:00:16.07 -
-
Execution
Plan -
----------------------------------------------------------
-
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=499921 Bytes=12498025) -
1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=499921Bytes=12498025) -
-
SQL>select
* from test_normal where GENDER='F' -
/ -
-
333769
rows selected. -
-
Elapsed:
00:00:12.02 -
-
Execution
Plan -
----------------------------------------------------------
-
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=333769 Byte -
s=8344225) -
1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=333769 -
Bytes=8344225)
结论
现在我们了解了优化器对于这些技术做出的反应,现在我们来看对于位图索引和B树索引最适合的程序。
保持GENDER列上的位图索引,在SAL列上再建立一个位图索引然后执行一些查询。对这些列上的B树索引执行同样的查询。
在表test_normal中,你需要所有工资等于下列值的所有女性雇员的雇员号码:
1000
1500
2000
2500
3000
3500
4000
4500
因此:
-
SQL>select
* from test_normal -
where
sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) 'M';and GENDER=
这是一个典型的数据仓库查询,绝对不要在OLTP系统中执行该查询。下面是两列上具有位图索引时的结果:
-
SQL>select
* from test_normal -
where
sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) 'M';and GENDER= -
-
1453
rows selected. -
-
Elapsed:
00:00:02.03 -
-
Execution
Plan -
----------------------------------------------------------
-
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=198 Card=754 Bytes=18850) -
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=198 Card=754 Bytes=18850) -
2 1 BITMAP CONVERSION (TO ROWIDS) -
3 2 BITMAP AND -
4 3 BITMAP OR -
5 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX' -
6 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX' -
7 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX' -
8 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX' -
9 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX' -
10 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX' -
11 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX' -
12 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX' -
13 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX' -
14 3 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_GENDER_BMX' -
-
Statistics
-
----------------------------------------------------------
-
0 recursive calls -
0 db block gets -
1353 consistent gets -
920 physical reads -
0 redo size -
75604 bytes sent via SQL*Net to client -
1555 bytes received via SQL*Net from client -
98 SQL*Net roundtrips to/from client -
0 sorts (memory) -
0 sorts (disk) -
1453 rows processed
下面是B树索引时的结果:
-
SQL>select
* from test_normal -
where
sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) 'M';and GENDER= -
-
1453
rows selected. -
-
Elapsed:
00:00:03.01 -
-
Execution
Plan -
----------------------------------------------------------
-
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=754 Bytes=18850) -
1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=754 Bytes=18850) -
-
Statistics
-
----------------------------------------------------------
-
0 recursive calls -
0 db block gets -
6333 consistent gets -
4412 physical reads -
0 redo size -
75604 bytes sent via SQL*Net to client -
1555 bytes received via SQL*Net from client -
98 SQL*Net roundtrips to/from client -
0 sorts (memory) -
0 sorts (disk) -
1453 rows processed
- 使用位图索引,优化器可以高效地执行包含AND,OR或者XOR的查询。
- 使用位图索引,优化器可以回答对null的查询和计数。null值在位图索引时同样被加上索引(不像B树索引)。
- 最重要的是,在决策支持系统中,位图索引支持特殊的查询,但B树索引则不能。具体来说,如果你有一个包含50列的表,用户经常查询其中的10列 ---- 10列的组合或者有时是其中一列,创建B树索引会比价困难。如果你在这些列上建立10个位图索引,这些查询都可以通过索引回答,不论你查询的是全部10列,还是10列中的4列或者6列,或者其中的一列。
相反,B树索引非常适合于OLTP系统,其用户执行的都是常规的查询。因为在OLTP系统中,数据会频繁地更新和删除,如果使用位图索引将会引起严重的锁定性能问题。
两种索引都有一个共同的目的:尽快地得到结果。但是你应该依据程序的类型来选择其中之一,而不是根据cardinality水平。