http://blog.sina.com.cn/jumptoflycs[订阅][手机订阅]
字体大小: 正文
Use ARRAY and GIN INDEX in postgresql(2009-09-10 17:28:39)
I have encounter a problem for m_x_n mapping in database design.
For example, for products and sales, a product may have many sales and a sales may response for many products.

And sometimes I need to get all sales of a particular product when I need all products of a sales in other times.

So I make two design the mapping relation in postgresql(I omit sales table in both two cases since they are nothing different):

1. Use a productXsales table:

                         Table "public.product_"
 Column |  Type                        Modifiers                      
--------+---------+-------------------------------------------------------
 id     | integer | not null default nextval('product__id_seq'::regclass)
 name   | text    |
Indexes:
    "product__pkey" PRIMARY KEY, btree (id)

     Table "public.p_x_s"
 Column |  Type   | Modifiers
--------+---------+-----------
 p_id   | integer |
 s_id   | integer |
Indexes:
    "p_x_s_by_p" btree (p_id)
    "p_x_s_by_s" btree (s_id)


2. User ARRAY and GIN INDEX in product table:

                          Table "public.product"
 Column |   Type                        Modifiers                      
--------+-----------+------------------------------------------------------
 id     | integer   | not null default nextval('product_id_seq'::regclass)
 name   | text      |
 sals   | integer[] |
Indexes:
    "product_pkey" PRIMARY KEY, btree (id)
    "product_by_sale" gin (sals)

I make same test data for the two alternatives:
    10000 products
    3 sales for each product
    for each product, its sales ids are product_id, product_id + 1, product_id + 2.

So the rows count should be:
test=# SELECT count(1) from product_;
 count
-------
 10000
(1 row)

test=# SELECT count(1) from p_x_s;
 count
-------
 30000
(1 row)

test=# SELECT count(1) from product;
 count
-------
 10000
(1 row)

Then start performance test.
Query and Results:
test=# SELECT id, name from product_, p_x_s where id = p_id and s_id = 100;
 id    name   
-----+------------
  98 | Prodcut98
  99 | Prodcut99
 100 | Prodcut100
(3 rows)

test=# SELECT id, name from product where 100 = any(sals);
 id    name   
-----+------------
  98 | Product98
  99 | Product99
 100 | Product100
(3 rows)

test=# SELECT id, name from product where '{100}' <@ sals;
 id    name   
-----+------------
  98 | Product98
  99 | Product99
 100 | Product100
(3 rows)

Analyze results:

User usal B-tree index in a 30000 row map table
test=# EXPLAIN ANALYZE SELECT id, name from product_, p_x_s where id = p_id and s_id = 100;
                                                          QUERY PLAN                                                          
-------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..33.15 rows=3 width=15) (actual time=0.055..0.092 rows=3 loops=1)
   ->  Index Scan using p_x_s_by_s on p_x_s  (cost=0.00..8.31 rows=3 width=4) (actual time=0.033..0.039 rows=3 loops=1)
         Index Cond: (s_id = 100)
   ->  Index Scan using product__pkey on product_  (cost=0.00..8.27 rows=1 width=15) (actual time=0.008..0.010 rows=1 loops=3)
         Index Cond: (product_.id = p_x_s.p_id)
 Total runtime: 0.168 ms
(6 rows)

Use ANY for ARRAY without index
test=# EXPLAIN ANALYZE SELECT id, name from product where 100 = ANY (sals);
                                              QUERY PLAN                                             
------------------------------------------------------------------------------------------------------
 Seq Scan on product  (cost=0.00..328.00 rows=489 width=15) (actual time=0.091..7.719 rows=3 loops=1)
   Filter: (100 = ANY (sals))
 Total runtime: 7.772 ms
(3 rows)

Use GIN INDEX
test=# EXPLAIN ANALYZE SELECT id, name from product where '{100}' <@ sals;
                                                       QUERY PLAN                                                       
-------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on product  (cost=4.33..35.11 rows=10 width=15) (actual time=0.046..0.050 rows=3 loops=1)
   Recheck Cond: ('{100}'::integer[] <@ sals)
   ->  Bitmap Index Scan on product_by_sale  (cost=0.00..4.33 rows=10 width=0) (actual time=0.032..0.032 rows=3 loops=1)
         Index Cond: ('{100}'::integer[] <@ sals)
 Total runtime: 0.109 ms
(5 rows)


Use GIN index on ARRAY perform a bit better than the mapping table, much better than ANY since ANY can't use index.

Another advantage of use ARRAY and GIN index is table size.
test=# SELECT relname, reltuples, relpages from pg_class where relname in ('product', 'product_', 'p_x_s');
 relname  | reltuples | relpages
----------+-----------+----------
 product     10000 |      103
 product_ |     10000 |       55
 p_x_s       30000 |      133
(3 rows)

1st alternative cost 188 pages in total when 2nd only cost 103.


Then I make a test for another group test data like:
    100 products and 1000 sales per product

The results likes(I change the table and columns name for this test, frt=product, user=sales):

Use a map table, B-tree index and join
test=# EXPLAIN ANALYZE SELECT id from frt_, frt_x_u where id = tid and uid = 888;
                                                           QUERY PLAN                                                           
---------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=19.98..250.72 rows=93 width=4) (actual time=0.504..1.156 rows=100 loops=1)
   Hash Cond: (frt_x_u.tid = frt_.id)
   ->  Bitmap Heap Scan on frt_x_u  (cost=4.98..234.45 rows=93 width=4) (actual time=0.128..0.515 rows=100 loops=1)
         Recheck Cond: (uid = 888)
         ->  Bitmap Index Scan on frt_x_u_by_u  (cost=0.00..4.96 rows=93 width=0) (actual time=0.088..0.088 rows=100 loops=1)
               Index Cond: (uid = 888)
   ->  Hash  (cost=13.75..13.75 rows=100 width=4) (actual time=0.348..0.348 rows=100 loops=1)
         ->  Index Scan using frt__pkey on frt_  (cost=0.00..13.75 rows=100 width=4) (actual time=0.016..0.176 rows=100 loops=1)
 Total runtime: 1.339 ms
(9 rows)

Use ARRAY
test=# EXPLAIN ANALYZE SELECT id from frt where '{888}' <@ users;
                                          QUERY PLAN                                          
-----------------------------------------------------------------------------------------------
 Seq Scan on frt  (cost=0.00..3.25 rows=1 width=4) (actual time=0.111..5.079 rows=100 loops=1)
   Filter: ('{888}'::integer[] <@ users)
 Total runtime: 5.242 ms
(3 rows)

Problem got. It didn't use the GIN index since there are too little rows in the table.
To force it use index and test again:

test=# SET enable_seqscan = off;
SET
test=# EXPLAIN ANALYZE SELECT id from frt where '{888}' <@ users;
                                                     QUERY PLAN                                                    
--------------------------------------------------------------------------------------------------------------------
 Index Scan using frt_by_user on frt  (cost=0.00..44.31 rows=1 width=4) (actual time=0.043..0.244 rows=100 loops=1)
   Index Cond: ('{888}'::integer[] <@ users)
 Total runtime: 0.399 ms
(3 rows)

Table size:
test=# SELECT relname, reltuples, relpages from pg_class where relname in ('frt', 'frt_', 'frt_x_u');
 relname | reltuples | relpages
---------+-----------+----------
 frt          100 |        2
 frt_         100 |        1
 frt_x_u |    100000 |      443
(3 rows)

In this case, use ARRAY with GIN INDEX is much better than another map table.

The disadvantage of GIN index is it's slow for INSERT and UPDATE.

For more information for GIN index in postgresql, refs http://www.postgresql.org/docs/8.3/interactive/gin.html
加载中,请稍候...
  • 评论加载中,请稍候...

验证码:请点击后输入验证码  收听验证码

发评论

以上网友发言只代表其个人观点,不代表新浪网的观点或立场。

相关博文
读取中...
推荐博文
读取中...