oracle_多列键分区表的数据入表规则(multicolumn keys)
(2016-09-08 21:47:11)
标签:
multicolumnpartitionkeysbounddate-format |
分类: oracleops |
在库中存在此类多列键分区表的数据,通过下面官方文档对多列分区键的数据入表规则得到释然。
首先看下表的定义语句:
这里只把关键的几个字段和分区定义列出来。
SQL> SELECT recdate,region FROM tbcs.GROUP_MEMBER_RECDETAIL
PARTITION (p_537_201604) WHERE rownum <
5;
----------
-------------------
SQL> SELECT recdate,region FROM tbcs.GROUP_MEMBER_RECDETAIL
PARTITION (p_536_201605) WHERE rownum < 5;
REGION
RECDATE
不理解的地方是:
1.表定义是LESS
THAN,region=536,recdate不为null的数据什么会入到分区p_536_201605,而不是p_537_201605
?
2.第一点成立条件下,recdate为null时,为什么region=536数据会入到分区p_537_201604中?
好像这两点,本来就有点矛盾的意思,那么我们通过下面的解释和猜想来证明这两点。
Using Multicolumn Partitioning Keys
使用多列键分区
For range-partitioned and hash-partitioned tables, you can
specify up to 16 partitioning key columns.
对于范围和哈希分析的表,可以最多说明16个列作为分区键。
Use multicolumn partitioning when the partitioning key is
composed of several columns and subsequent columns define a higher
granularity than the preceding ones.
当分区键由多列组成,且后续的列定义为更高粒度的分区表。
The most common scenario is a decomposed DATE or TIMESTAMP
key, consisting of separated columns, for year, month, and
day.
最普遍的场景是定义时间多列,分别为年,月,日多个列。
In evaluating multicolumn partitioning keys, the database uses
the second value only if the first value cannot uniquely identify a
single target partition,
在评估多列分区键时,当第一列无法唯一确认到单一分区时,才使用第二个分区键。
and uses the third value only if the first and second do not
determine the correct partition, and so
forth.
当第一,二个分区键无法确定正确分区时,才使用第三个分区键,以此类推。
A value cannot determine the correct partition only when a
partition bound exactly matches that value and the same bound is
defined for the next partition.
数据值与一个分区键的边界吻合,同时也与下一个分区键吻合,那么数据值无法判断正确分区。
The nth column is investigated only when all previous (n-1)
values of the multicolumn key exactly match the (n-1) bounds of a
partition.
当前n-1个分区列都是多个分区温和时,才会进行第n个列的判断。
A second column, for example, is evaluated only if the first
column exactly matches the partition boundary
value.
当第一列无法唯一确认到单一分区时,才使用第二个分区键。
If all column values exactly match all of the bound values for
a partition,
如果所有的列都和多个分区边界吻合,
then the database determines that the row does not fit in this
partition and considers the next partition for a match.
那么就可以判断数据值不与当前分区吻合,会继续判断其他分区。
For nondeterministic boundary definitions (successive
partitions with identical values for at least one
column),
对于无法确认边界的定义
the partition boundary value becomes an inclusive value,
representing a "less than or equal to"
boundary.
分区边界变为一个范围值,呈现为"less than or equal to"
上面介绍怎么多,我的理解,把多列比作日期年(key1)-月(key2)-日(key3),判断是否吻合某个分区,
只有数据值(年-月-日(比如:2016-08-10))less than
当前分区(年-月-日(比如:2016-08-11)),
类似2016-08-10<2016-08-11是成立的,那么就吻合当前分区,是比较三个值总体的"大小"是否满足。
通过试验验证一下:
CREATE TABLE sales_demo (
PARTITION BY RANGE (year,month)
REM 12-DEC-2000
INSERT INTO sales_demo VALUES(2000,12,12, 1000);
--2000-12小于2001-01 入分区before2001
REM 17-MAR-2001
INSERT INTO sales_demo VALUES(2001,3,17, 2000);
--2001-03存在三个2001,但比04小,人分区q1_2001
REM 1-NOV-2001
INSERT INTO sales_demo VALUES(2001,11,1, 5000);
--2001-11存在三个2001,但大于最大的10,入分区q4_2001
REM 1-JAN-2002
INSERT INTO sales_demo VALUES(2002,1,1, 4000);
--2002-01比2001-01大,入分区future
The year value for 12-DEC-2000 satisfied the first partition,
before2001, so no further evaluation is needed:
SELECT * FROM sales_demo PARTITION(before2001);
---------- ---------- ---------- -----------
The information for 17-MAR-2001 is stored in partition
q1_2001. The first partitioning key column, year, does not by
itself determine the correct partition, so the second partitioning
key column, month, must be evaluated.
SELECT * FROM sales_demo PARTITION(q1_2001);
---------- ---------- ---------- -----------
Following the same determination rule as for the previous
record, the second column, month, determines partition q4_2001 as
correct partition for 1-NOV-2001:
SELECT * FROM sales_demo PARTITION(q4_2001);
---------- ---------- ---------- -----------
The partition for 01-JAN-2002 is determined by evaluating only
the year column, which indicates the future partition:
SELECT * FROM sales_demo PARTITION(future);
---------- ---------- ---------- -----------
If the database encounters MAXVALUE in one of the partitioning
key columns, then all other values of subsequent columns become
irrelevant. That is, a definition of partition future in the
preceding example, having a bound of (MAXVALUE,0) is equivalent to
a bound of (MAXVALUE,100) or a bound of (MAXVALUE,MAXVALUE).
如果对应的year或month是null的话,会做怎么的判断呢?
SQL> INSERT INTO sales_demo
values(2001,null,10,10);
SQL> commit;
SQL> SELECT * FROM sales_demo partition
(q4_2001);
---------- ---------- ---------- -----------
当第二个值是null的时候,null是无法做比较的,
在比较时,2001有四个分区,在比较null和第二值的时候,无法判断,故不吻合;
到分区2002,2001<2002成立,那么null也就不用比较,吻合,入分区q4_2001,
如果存在多个2002分区,null值做最小值处理;
SQL> INSERT INTO sales_demo
values(null,10,10,10);
SQL> INSERT INTO sales_demo
values(null,3,10,10);
SQL> commit;
SQL> SELECT * FROM sales_demo partition (future);
---------- ---------- ---------- -----------
当第一个值是null的时候,null是无法做比较的,
所有数值分区2001,2002都无法判断,不吻合,那么就默认入MAXVAUE分区future ;
对于我们之前的两个问题:
1.表定义是LESS
THAN,region=536,recdate不为null的数据什么会入到分区p_536_201605,而不是p_537_201605
?
2.第一点成立条件下,recdate为null时,为什么region=536数据会入到分区p_537_201604中?
ANSWER 1:
数据"536 2016-05-01
00:10:07",存在536分区,比较第二个值,2016-05-01<2016-06-01,故吻合,
入分区p_536_201605 。
ANSWER 2:
数据"536
null",存在536分区,比较第二个值,null无法比较,不满足p_536_xxxxxx,
小于分区537,
null值做最小值处理,入分区p_537_201604。
第二个试验也同样说明我们的理解,最后提供分区规则。
The following example illustrates the use of a multicolumn
partitioned approach for table supplier_parts, storing the
information about which suppliers deliver which parts. To
distribute the data in equal-sized partitions, it is not sufficient
to partition the table based on the supplier_id, because some
suppliers might provide hundreds of thousands of parts, while
others provide only a few specialty parts. Instead, you partition
the table on (supplier_id, partnum) to manually enforce equal-sized
partitions.
CREATE TABLE supplier_parts (
PARTITION BY RANGE (supplier_id, partnum)
The following three records are inserted into the table:
INSERT INTO supplier_parts VALUES (5,5, 1000);
INSERT INTO supplier_parts VALUES (5,150, 1000);
INSERT INTO supplier_parts VALUES (10,100, 1000);
The first two records are inserted into partition p1, uniquely
identified by supplier_id. However, the third record is inserted
into partition p2; it matches all range boundary values of
partition p1 exactly and the database therefore considers the
following partition for a match. The value of partnum satisfies the
criteria < 200, so it is inserted into partition p2.
SELECT * FROM supplier_parts PARTITION (p1);
SUPPLIER_ID PARTNUM
PRICE
----------- ---------- ----------
SELECT * FROM supplier_parts PARTITION (p2);
SUPPLIER_ID PARTNUM
PRICE
----------- ---------- ----------
Every row with supplier_id < 10 is stored in partition p1,
regardless of the partnum value. The column partnum is evaluated
only if supplier_id =10, and the corresponding rows are inserted
into partition p1, p2, or even into p3 when partnum >=200. To
achieve equal-sized partitions for ranges of supplier_parts, you
could choose a composite range-hash partitioned table, range
partitioned by supplier_id, hash subpartitioned by partnum.
Defining the partition boundaries for multicolumn partitioned
tables must obey some rules. For example, consider a table that is
range partitioned on three columns a, b, and c. The individual
partitions have range values represented as follows:
P0(a0, b0, c0)
P1(a1, b1, c1)
P2(a2, b2, c2)
...
Pn(an, bn, cn)
The range values you provide for each partition must follow
these rules: