加载中…
个人资料
  • 博客等级:
  • 博客积分:
  • 博客访问:
  • 关注人气:
  • 获赠金笔:0支
  • 赠出金笔:0支
  • 荣誉徽章:
正文 字体大小:

oracle_多列键分区表的数据入表规则(multicolumn keys)

(2016-09-08 21:47:11)
标签:

multicolumn

partition

keys

bound

date-format

分类: oracleops

在库中存在此类多列键分区表的数据,通过下面官方文档对多列分区键的数据入表规则得到释然。

首先看下表的定义语句:
  CREATE TABLE "TBCS"."GROUP_MEMBER_RECDETAIL"
   (    "REGION" NUMBER(5,0) NOT NULL ENABLE,
        "RECDATE" DATE
   
  PARTITION BY RANGE ("REGION","RECDATE")
 (PARTITION "P_536_201604"  VALUES LESS THAN (530, TO_DATE(' 2016-05-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')),
 (PARTITION "P_536_201605"  VALUES LESS THAN (530, TO_DATE(' 2016-06-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')),
 (PARTITION "P_536_201606"  VALUES LESS THAN (530, TO_DATE(' 2016-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')),
 (PARTITION "P_537_201604"  VALUES LESS THAN (530, TO_DATE(' 2016-05-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')),
 (PARTITION "P_537_201605"  VALUES LESS THAN (530, TO_DATE(' 2016-06-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')),
 (PARTITION "P_537_201606"  VALUES LESS THAN (530, TO_DATE(' 2016-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'));


这里只把关键的几个字段和分区定义列出来。
SQL> SELECT recdate,region FROM tbcs.GROUP_MEMBER_RECDETAIL PARTITION (p_537_201604) WHERE rownum < 5; 
    REGION     RECDATE             
----------     ------------------- 
       536                         
       536                         
       536                         
       536                         
SQL> SELECT recdate,region FROM tbcs.GROUP_MEMBER_RECDETAIL PARTITION (p_536_201605) WHERE rownum < 5;
REGION        RECDATE 
 ----------  -------------------
 536         2016-05-01 00:10:07
 536         2016-05-01 06:37:06
 536         2016-05-01 06:07:20
 536         2016-05-01 07:34:35
         
         
不理解的地方是:
1.表定义是LESS THAN,region=536,recdate不为null的数据什么会入到分区p_536_201605,而不是p_537_201605 ?
2.第一点成立条件下,recdate为null时,为什么region=536数据会入到分区p_537_201604中?

好像这两点,本来就有点矛盾的意思,那么我们通过下面的解释和猜想来证明这两点。     

Using Multicolumn Partitioning Keys
使用多列键分区
For range-partitioned and hash-partitioned tables, you can specify up to 16 partitioning key columns. 
对于范围和哈希分析的表,可以最多说明16个列作为分区键。
Use multicolumn partitioning when the partitioning key is composed of several columns and subsequent columns define a higher granularity than the preceding ones. 
当分区键由多列组成,且后续的列定义为更高粒度的分区表。
The most common scenario is a decomposed DATE or TIMESTAMP key, consisting of separated columns, for year, month, and day.
最普遍的场景是定义时间多列,分别为年,月,日多个列。
In evaluating multicolumn partitioning keys, the database uses the second value only if the first value cannot uniquely identify a single target partition, 
在评估多列分区键时,当第一列无法唯一确认到单一分区时,才使用第二个分区键。
and uses the third value only if the first and second do not determine the correct partition, and so forth. 
当第一,二个分区键无法确定正确分区时,才使用第三个分区键,以此类推。
A value cannot determine the correct partition only when a partition bound exactly matches that value and the same bound is defined for the next partition. 
数据值与一个分区键的边界吻合,同时也与下一个分区键吻合,那么数据值无法判断正确分区。
The nth column is investigated only when all previous (n-1) values of the multicolumn key exactly match the (n-1) bounds of a partition.
当前n-1个分区列都是多个分区温和时,才会进行第n个列的判断。
A second column, for example, is evaluated only if the first column exactly matches the partition boundary value. 
当第一列无法唯一确认到单一分区时,才使用第二个分区键。
If all column values exactly match all of the bound values for a partition, 
如果所有的列都和多个分区边界吻合,
then the database determines that the row does not fit in this partition and considers the next partition for a match.
那么就可以判断数据值不与当前分区吻合,会继续判断其他分区。
For nondeterministic boundary definitions (successive partitions with identical values for at least one column), 
对于无法确认边界的定义
the partition boundary value becomes an inclusive value, representing a "less than or equal to" boundary. 
分区边界变为一个范围值,呈现为"less than or equal to"

上面介绍怎么多,我的理解,把多列比作日期年(key1)-月(key2)-日(key3),判断是否吻合某个分区,
只有数据值(年-月-日(比如:2016-08-10))less than 当前分区(年-月-日(比如:2016-08-11)),
类似2016-08-10<2016-08-11是成立的,那么就吻合当前分区,是比较三个值总体的"大小"是否满足。


通过试验验证一下:
CREATE TABLE sales_demo (
   year          NUMBER, 
   month         NUMBER,
   day           NUMBER,
   amount_sold   NUMBER) 
PARTITION BY RANGE (year,month) 
  (PARTITION before2001 VALUES LESS THAN (2001,1),
   PARTITION q1_2001    VALUES LESS THAN (2001,4),
   PARTITION q2_2001    VALUES LESS THAN (2001,7),
   PARTITION q3_2001    VALUES LESS THAN (2001,10),
   PARTITION q4_2001    VALUES LESS THAN (2002,1),
   PARTITION future     VALUES LESS THAN (MAXVALUE,0));

REM  12-DEC-2000
INSERT INTO sales_demo VALUES(2000,12,12, 1000);   --2000-12小于2001-01 入分区before2001
REM  17-MAR-2001
INSERT INTO sales_demo VALUES(2001,3,17, 2000);    --2001-03存在三个2001,但比04小,人分区q1_2001
REM  1-NOV-2001
INSERT INTO sales_demo VALUES(2001,11,1, 5000);    --2001-11存在三个2001,但大于最大的10,入分区q4_2001
REM  1-JAN-2002
INSERT INTO sales_demo VALUES(2002,1,1, 4000);     --2002-01比2001-01大,入分区future

The year value for 12-DEC-2000 satisfied the first partition, before2001, so no further evaluation is needed:

SELECT * FROM sales_demo PARTITION(before2001);

      YEAR      MONTH        DAY AMOUNT_SOLD
---------- ---------- ---------- -----------
      2000         12         12        1000

The information for 17-MAR-2001 is stored in partition q1_2001. The first partitioning key column, year, does not by itself determine the correct partition, so the second partitioning key column, month, must be evaluated.

SELECT * FROM sales_demo PARTITION(q1_2001);

      YEAR      MONTH        DAY AMOUNT_SOLD
---------- ---------- ---------- -----------
      2001          3         17        2000

Following the same determination rule as for the previous record, the second column, month, determines partition q4_2001 as correct partition for 1-NOV-2001:

SELECT * FROM sales_demo PARTITION(q4_2001);

      YEAR      MONTH        DAY AMOUNT_SOLD
---------- ---------- ---------- -----------
      2001         11          1        5000

The partition for 01-JAN-2002 is determined by evaluating only the year column, which indicates the future partition:

SELECT * FROM sales_demo PARTITION(future);

      YEAR      MONTH        DAY AMOUNT_SOLD
---------- ---------- ---------- -----------
      2002          1          1        4000

If the database encounters MAXVALUE in one of the partitioning key columns, then all other values of subsequent columns become irrelevant. That is, a definition of partition future in the preceding example, having a bound of (MAXVALUE,0) is equivalent to a bound of (MAXVALUE,100) or a bound of (MAXVALUE,MAXVALUE).


如果对应的year或month是null的话,会做怎么的判断呢?


SQL> INSERT INTO  sales_demo values(2001,null,10,10);
SQL> commit;
SQL>  SELECT * FROM sales_demo partition (q4_2001);

      YEAR      MONTH        DAY AMOUNT_SOLD
---------- ---------- ---------- -----------
      2001                    10          10
当第二个值是null的时候,null是无法做比较的,
在比较时,2001有四个分区,在比较null和第二值的时候,无法判断,故不吻合;
到分区2002,2001<2002成立,那么null也就不用比较,吻合,入分区q4_2001,
如果存在多个2002分区,null值做最小值处理;

SQL> INSERT INTO  sales_demo values(null,10,10,10);
SQL> INSERT INTO  sales_demo values(null,3,10,10);
SQL> commit;
SQL> SELECT * FROM sales_demo partition (future);

      YEAR      MONTH        DAY AMOUNT_SOLD
---------- ---------- ---------- -----------
                   10         10          10
                    3         10          10
当第一个值是null的时候,null是无法做比较的,
所有数值分区2001,2002都无法判断,不吻合,那么就默认入MAXVAUE分区future ;






对于我们之前的两个问题:
1.表定义是LESS THAN,region=536,recdate不为null的数据什么会入到分区p_536_201605,而不是p_537_201605 ?
2.第一点成立条件下,recdate为null时,为什么region=536数据会入到分区p_537_201604中?


ANSWER 1:
数据"536   2016-05-01 00:10:07",存在536分区,比较第二个值,2016-05-01<2016-06-01,故吻合,
入分区p_536_201605 。

ANSWER 2:
数据"536  null",存在536分区,比较第二个值,null无法比较,不满足p_536_xxxxxx, 小于分区537,
null值做最小值处理,入分区p_537_201604。






第二个试验也同样说明我们的理解,最后提供分区规则。
The following example illustrates the use of a multicolumn partitioned approach for table supplier_parts, storing the information about which suppliers deliver which parts. To distribute the data in equal-sized partitions, it is not sufficient to partition the table based on the supplier_id, because some suppliers might provide hundreds of thousands of parts, while others provide only a few specialty parts. Instead, you partition the table on (supplier_id, partnum) to manually enforce equal-sized partitions.

CREATE TABLE supplier_parts (
   supplier_id      NUMBER, 
   partnum          NUMBER,
   price            NUMBER)
PARTITION BY RANGE (supplier_id, partnum)
  (PARTITION p1 VALUES LESS THAN  (10,100),
   PARTITION p2 VALUES LESS THAN (10,200),
   PARTITION p3 VALUES LESS THAN (MAXVALUE,MAXVALUE));

The following three records are inserted into the table:

INSERT INTO supplier_parts VALUES (5,5, 1000);
INSERT INTO supplier_parts VALUES (5,150, 1000);
INSERT INTO supplier_parts VALUES (10,100, 1000);

The first two records are inserted into partition p1, uniquely identified by supplier_id. However, the third record is inserted into partition p2; it matches all range boundary values of partition p1 exactly and the database therefore considers the following partition for a match. The value of partnum satisfies the criteria < 200, so it is inserted into partition p2.

SELECT * FROM supplier_parts PARTITION (p1);

SUPPLIER_ID    PARTNUM      PRICE
----------- ---------- ----------
          5          5       1000
          5        150       1000

SELECT * FROM supplier_parts PARTITION (p2);

SUPPLIER_ID    PARTNUM      PRICE
----------- ---------- ----------
          10       100       1000

Every row with supplier_id < 10 is stored in partition p1, regardless of the partnum value. The column partnum is evaluated only if supplier_id =10, and the corresponding rows are inserted into partition p1, p2, or even into p3 when partnum >=200. To achieve equal-sized partitions for ranges of supplier_parts, you could choose a composite range-hash partitioned table, range partitioned by supplier_id, hash subpartitioned by partnum.

Defining the partition boundaries for multicolumn partitioned tables must obey some rules. For example, consider a table that is range partitioned on three columns a, b, and c. The individual partitions have range values represented as follows:

P0(a0, b0, c0)
P1(a1, b1, c1)
P2(a2, b2, c2)
...
Pn(an, bn, cn)

The range values you provide for each partition must follow these rules:

    a0 must be less than or equal to a1, and a1 must be less than or equal to a2, and so on.

    If a0=a1, then b0 must be less than or equal to b1. If a0 < a1, then b0 and b1 can have any values. If a0=a1 and b0=b1, then c0 must be less than or equal to c1. If b0

    If a1=a2, then b1 must be less than or equal to b2. If a1

0

阅读 收藏 喜欢 打印举报/Report
  

新浪BLOG意见反馈留言板 欢迎批评指正

新浪简介 | About Sina | 广告服务 | 联系我们 | 招聘信息 | 网站律师 | SINA English | 产品答疑

新浪公司 版权所有