PostgreSQL Tips -- 强大的tablefunc
(2012-09-15 19:19:46)| 标签: postgresqltablefunc二维表转换杂谈 | 分类: 工作 | 
tablefunc扩展模块包含一系列返回记录表的函数。这些函数对数据记录的处理以及展示用C语言开发返回多行记录都是非常好的例子。
下表列出了tablefunc 扩展模块所包含的功能。
| 函数名称 | 返回 | 描述 | 
|---|---|---|
| normal_rand(int numvals, float8 mean, float8
stddev) | 一组浮点数 | 返回一系列正态分布(又称高斯分布)的随机数 | 
| crosstab(text sql) | 一组记录 | 返回一个第一列为行名加上N列数值的二维表,其中的N根据查询产生的结果确定。 | 
| crosstabN(text sql) | 一组N列记录表 | 产生包含第一列为行名加上N列数值的二维表。 crosstab2,crosstab3和crosstab4是预定义好的函数,用户也可以按下文描述的方式创建crosstabN的函数 | 
| crosstab(text source_sql, text category_sql) | 一组记录 | 返回一个第一列为行名加上N列数值的二维表,其中的N根据第二项查询产生的结果确定。 | 
| crosstab(text sql, int N) | 一组记录 | 此为过期不再维护的函数。 | 
| connectby(text relname, text keyid_fld, text
parent_keyid_fld [, text orderby_fld ], text start_with, int
max_depth [, text branch_delim ]) | 一组记录 | 可以产生一个分层树状结构结果的函数 (强大不是一般地) | 
函数1 normal_rand
(int numvals, float8 mean, float8 stddev)
normal_rand 返回一系列正态分布(又称高斯分布)的随机数。
其中numvals 是函数返回的数据数量。
mean 是正态分布随机变量的均值。
stddev 是随机变量的方差。
如为了产生1000个均值为5,方差为3的正态分布随机数:
test=# SELECT * FROM normal_rand(1000, 5, 3);
     
normal_rand
----------------------
     
1.56556322244898
     
9.10040991424657
     
5.36957140345079
   
-0.369151492880995
    
0.283600703686639
       .
 .
       .
     
4.82992125404908
     
9.71308014517282
     
2.49639286969028
(1000 rows)
 函数2.
crosstab(text)
crosstab(text sql) crosstab(text sql, int N)
The crosstab function is used to produce
"pivot" displays, wherein data is listed across the
page rather than down. For example, we might have data like
row1 val11 row1 val12 row1 val13 ... row2 val21 row2 val22 row2 val23 ...
which we wish to display like
row1 val11 val12 val13 ... row2 val21 val22 val23 ... ...
The crosstab function takes a text parameter that
is a SQL query producing raw data formatted in the first way, and
produces a table formatted in the second way.
The sql parameter is a SQL statement that produces the source set of data. This statement must return one row_name column, one category column, and one value column. N is an obsolete parameter, ignored if supplied (formerly this had to match the number of output value columns, but now that is determined by the calling query).
For example, the provided query might produce a set something like:
row_name cat value ----------+-------+------- row1 cat1 val1 row1 cat2 val2 row1 cat3 val3 row1 cat4 val4 row2 cat1 val5 row2 cat2 val6 row2 cat3 val7 row2 cat4 val8
The crosstab function is declared to return
setof record, so the actual names and types of the output
columns must be defined in the FROM clause of the calling
SELECT statement, for example:
SELECT * FROM crosstab('...') AS ct(row_name text, category_1 text, category_2 text);
This example produces a set something like:
           <== value  columns  ==>
 row_name   category_1   category_2
----------+------------+------------
  row1        val1         val2
  row2        val5         val6
The FROM clause must define the output as one row_name column (of the same data type as the first result column of the SQL query) followed by N value columns (all of the same data type as the third result column of the SQL query). You can set up as many output value columns as you wish. The names of the output columns are up to you.
The crosstab function produces one output row for
each consecutive group of input rows with the same
row_name value. It fills the output value
columns, left to right, with the value fields from these
rows. If there are fewer rows in a group than there are output
value columns, the extra output columns are filled with
nulls; if there are more rows, the extra input rows are
skipped.
In practice the SQL query should always specify ORDER BY
1,2 to ensure that the input rows are properly ordered, that
is, values with the same row_name are brought together and
correctly ordered within the row. Notice that crosstab
itself does not pay any attention to the second column of the query
result; it's just there to be ordered by, to control the order in
which the third-column values appear across the page.
Here is a complete example:
CREATE TABLE ct(id SERIAL, rowid TEXT, attribute TEXT, value TEXT);
INSERT INTO ct(rowid, attribute, value) VALUES('test1','att1','val1');
INSERT INTO ct(rowid, attribute, value) VALUES('test1','att2','val2');
INSERT INTO ct(rowid, attribute, value) VALUES('test1','att3','val3');
INSERT INTO ct(rowid, attribute, value) VALUES('test1','att4','val4');
INSERT INTO ct(rowid, attribute, value) VALUES('test2','att1','val5');
INSERT INTO ct(rowid, attribute, value) VALUES('test2','att2','val6');
INSERT INTO ct(rowid, attribute, value) VALUES('test2','att3','val7');
INSERT INTO ct(rowid, attribute, value) VALUES('test2','att4','val8');
SELECT *
FROM crosstab(
  'select rowid, attribute, value
   from ct
   where attribute = ''att2'' or attribute = ''att3''
   order by 1,2')
AS ct(row_name text, category_1 text, category_2 text, category_3 text);
 row_name | category_1 | category_2 | category_3
----------+------------+------------+------------
 test1    | val2       | val3       |
 test2    | val6       | val7       |
(2 rows)
You can avoid always having to write out a FROM clause to define the output columns, by setting up a custom crosstab function that has the desired output row type wired into its definition. This is described in the next section. Another possibility is to embed the required FROM clause in a view definition.
F.41.1.3.
crosstabN(text)
crosstabN(text sql)
The crosstabN functions are examples of
how to set up custom wrappers for the general crosstab
function, so that you need not write out column names and types in
the calling SELECT query. The tablefunc module
includes crosstab2, crosstab3, and
crosstab4, whose output row types are defined as
CREATE TYPE tablefunc_crosstab_N AS (
    row_name TEXT,
    category_1 TEXT,
    category_2 TEXT,
        .
        .
        .
    category_N TEXT
);
Thus, these functions can be used directly when the input query
produces row_name and value columns of type
text, and you want 2, 3, or 4 output values columns. In
all other ways they behave exactly as described above for the
general crosstab function.
For instance, the example given in the previous section would also work as
SELECT * FROM crosstab3( 'select rowid, attribute, value from ct where attribute = ''att2'' or attribute = ''att3'' order by 1,2');
These functions are provided mostly for illustration purposes.
You can create your own return types and functions based on the
underlying crosstab() function. There are two ways to
do it:
- 
Create a composite type describing the desired output columns, similar to the examples in contrib/tablefunc/tablefunc--1.0.sql. Then define a unique function name accepting one text parameter and returning setof your_type_name, but linking to the same underlying crosstabC function. For example, if your source data produces row names that are text, and values that are float8, and you want 5 value columns:CREATE TYPE my_crosstab_float8_5_cols AS ( my_row_name text, my_category_1 float8, my_category_2 float8, my_category_3 float8, my_category_4 float8, my_category_5 float8 ); CREATE OR REPLACE FUNCTION crosstab_float8_5_cols(text) RETURNS setof my_crosstab_float8_5_cols AS '$libdir/tablefunc','crosstab' LANGUAGE C STABLE STRICT;
- 
Use OUT parameters to define the return type implicitly. The same example could also be done this way: CREATE OR REPLACE FUNCTION crosstab_float8_5_cols( IN text, OUT my_row_name text, OUT my_category_1 float8, OUT my_category_2 float8, OUT my_category_3 float8, OUT my_category_4 float8, OUT my_category_5 float8) RETURNS setof record AS '$libdir/tablefunc','crosstab' LANGUAGE C STABLE STRICT;
F.41.1.4.
crosstab(text, text)
crosstab(text source_sql, text category_sql)
The main limitation of the single-parameter form of
crosstab is that it treats all values in a group
alike, inserting each value into the first available column. If you
want the value columns to correspond to specific categories of
data, and some groups might not have data for some of the
categories, that doesn't work well. The two-parameter form of
crosstab handles this case by providing an explicit
list of the categories corresponding to the output columns.
source_sql is a SQL statement that produces the source set of data. This statement must return one row_name column, one category column, and one value column. It may also have one or more "extra" columns. The row_name column must be first. The category and value columns must be the last two columns, in that order. Any columns between row_name and category are treated as "extra". The "extra" columns are expected to be the same for all rows with the same row_name value.
For example, source_sql might produce a set something like:
SELECT row_name, extra_col, cat, value FROM foo ORDER BY 1; row_name extra_col cat value ----------+------------+-----+--------- row1 extra1 cat1 val1 row1 extra1 cat2 val2 row1 extra1 cat4 val4 row2 extra2 cat1 val5 row2 extra2 cat2 val6 row2 extra2 cat3 val7 row2 extra2 cat4 val8
category_sql is a SQL statement that produces the set of categories. This statement must return only one column. It must produce at least one row, or an error will be generated. Also, it must not produce duplicate values, or an error will be generated. category_sql might be something like:
SELECT DISTINCT cat FROM foo ORDER BY 1;
    cat
  -------
    cat1
    cat2
    cat3
    cat4
The crosstab function is declared to return
setof record, so the actual names and types of the output
columns must be defined in the FROM clause of the calling
SELECT statement, for example:
SELECT * FROM crosstab('...', '...')
    AS ct(row_name text, extra text, cat1 text, cat2 text, cat3 text, cat4 text);
This will produce a result something like:
                  <==  value  columns   ==>
row_name   extra   cat1   cat2   cat3   cat4
---------+-------+------+------+------+------
  row1     extra1  val1   val2          val4
  row2     extra2  val5   val6   val7   val8
The FROM clause must define the proper number of output columns of the proper data types. If there are N columns in the source_sql query's result, the first N-2 of them must match up with the first N-2 output columns. The remaining output columns must have the type of the last column of the source_sql query's result, and there must be exactly as many of them as there are rows in the category_sql query's result.
The crosstab function produces one output row for
each consecutive group of input rows with the same
row_name value. The output row_name column, plus
any "extra" columns, are copied from the first row of
the group. The output value columns are filled with the
value fields from rows having matching category
values. If a row's category does not match any output of
the category_sql query, its value is ignored.
Output columns whose matching category is not present in any input
row of the group are filled with nulls.
In practice the source_sql query should always specify ORDER BY 1 to ensure that values with the same row_name are brought together. However, ordering of the categories within a group is not important. Also, it is essential to be sure that the order of the category_sql query's output matches the specified output column order.
Here are two complete examples:
create table sales(year int, month int, qty int); insert into sales values(2007, 1, 1000); insert into sales values(2007, 2, 1500); insert into sales values(2007, 7, 500); insert into sales values(2007, 11, 1500); insert into sales values(2007, 12, 2000); insert into sales values(2008, 1, 1000); select * from crosstab( 'select year, month, qty from sales order by 1', 'select m from generate_series(1,12) m' ) as ( year int, "Jan" int, "Feb" int, "Mar" int, "Apr" int, "May" int, "Jun" int, "Jul" int, "Aug" int, "Sep" int, "Oct" int, "Nov" int, "Dec" int ); year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec ------+------+------+-----+-----+-----+-----+-----+-----+-----+-----+------+------ 2007 | 1000 | 1500 | | | | | 500 | | | | 1500 | 2000 2008 | 1000 | | | | | | | | | | | (2 rows)
CREATE TABLE cth(rowid text, rowdt timestamp, attribute text, val text);
INSERT INTO cth VALUES('test1','01 March 2003','temperature','42');
INSERT INTO cth VALUES('test1','01 March 2003','test_result','PASS');
INSERT INTO cth VALUES('test1','01 March 2003','volts','2.6987');
INSERT INTO cth VALUES('test2','02 March 2003','temperature','53');
INSERT INTO cth VALUES('test2','02 March 2003','test_result','FAIL');
INSERT INTO cth VALUES('test2','02 March 2003','test_startdate','01 March 2003');
INSERT INTO cth VALUES('test2','02 March 2003','volts','3.1234');
SELECT * FROM crosstab
(
  'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
  'SELECT DISTINCT attribute FROM cth ORDER BY 1'
)
AS
(
       rowid text,
       rowdt timestamp,
       temperature int4,
       test_result text,
       test_startdate timestamp,
       volts float8
);
 rowid |          rowdt           | temperature | test_result |      test_startdate      | volts
-------+--------------------------+-------------+-------------+--------------------------+--------
 test1 | Sat Mar 01 00:00:00 2003 |          42 | PASS        |                          | 2.6987
 test2 | Sun Mar 02 00:00:00 2003 |          53 | FAIL        | Sat Mar 01 00:00:00 2003 | 3.1234
(2 rows)
You can create predefined functions to avoid having to write out
the result column names and types in each query. See the examples
in the previous section. The underlying C function for this form of
crosstab is named crosstab_hash.
F.41.1.5.
connectby
connectby(text relname, text keyid_fld, text parent_keyid_fld
          [, text orderby_fld ], text start_with, int max_depth
          [, text branch_delim ])
The connectby function produces a display of
hierarchical data that is stored in a table. The table must have a
key field that uniquely identifies rows, and a parent-key field
that references the parent (if any) of each row.
connectby can display the sub-tree descending from any
row.
Table F-28 explains the parameters.
Table F-28. connectby Parameters
| Parameter | Description | 
|---|---|
| relname | Name of the source relation | 
| keyid_fld | Name of the key field | 
| parent_keyid_fld | Name of the parent-key field | 
| orderby_fld | Name of the field to order siblings by (optional) | 
| start_with | Key value of the row to start at | 
| max_depth | Maximum depth to descend to, or zero for unlimited depth | 
| branch_delim | String to separate keys with in branch output (optional) | 
The key and parent-key fields can be any data type, but they must be the same type. Note that the start_with value must be entered as a text string, regardless of the type of the key field.
The connectby function is declared to return
setof record, so the actual names and types of the output
columns must be defined in the FROM clause of the calling
SELECT statement, for example:
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~')
    AS t(keyid text, parent_keyid text, level int, branch text, pos int);
The first two output columns are used for the current row's key and its parent row's key; they must match the type of the table's key field. The third output column is the depth in the tree and must be of type integer. If a branch_delim parameter was given, the next output column is the branch display and must be of type text. Finally, if an orderby_fld parameter was given, the last output column is a serial number, and must be of type integer.
The "branch" output column shows the path of keys taken to reach the current row. The keys are separated by the specified branch_delim string. If no branch display is wanted, omit both the branch_delim parameter and the branch column in the output column list.
If the ordering of siblings of the same parent is important, include the orderby_fld parameter to specify which field to order siblings by. This field can be of any sortable data type. The output column list must include a final integer serial-number column, if and only if orderby_fld is specified.
The parameters representing table and field names are copied
as-is into the SQL queries that connectby generates
internally. Therefore, include double quotes if the names are
mixed-case or contain special characters. You may also need to
schema-qualify the table name.
In large tables, performance will be poor unless there is an index on the parent-key field.
It is important that the branch_delim string not appear
in any key values, else connectby may incorrectly
report an infinite-recursion error. Note that if
branch_delim is not provided, a default value of
~ is used for recursion detection purposes.
Here is an example:
CREATE TABLE connectby_tree(keyid text, parent_keyid text, pos int);
INSERT INTO connectby_tree VALUES('row1',NULL, 0);
INSERT INTO connectby_tree VALUES('row2','row1', 0);
INSERT INTO connectby_tree VALUES('row3','row1', 0);
INSERT INTO connectby_tree VALUES('row4','row2', 1);
INSERT INTO connectby_tree VALUES('row5','row2', 0);
INSERT INTO connectby_tree VALUES('row6','row4', 0);
INSERT INTO connectby_tree VALUES('row7','row3', 0);
INSERT INTO connectby_tree VALUES('row8','row6', 0);
INSERT INTO connectby_tree VALUES('row9','row5', 0);
-- with branch, without orderby_fld (order of results is not guaranteed)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0, '~')
 AS t(keyid text, parent_keyid text, level int, branch text);
 keyid | parent_keyid | level |       branch
-------+--------------+-------+---------------------
 row2  |              |     0 | row2
 row4  | row2         |     1 | row2~row4
 row6  | row4         |     2 | row2~row4~row6
 row8  | row6         |     3 | row2~row4~row6~row8
 row5  | row2         |     1 | row2~row5
 row9  | row5         |     2 | row2~row5~row9
(6 rows)
-- without branch, without orderby_fld (order of results is not guaranteed)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0)
 AS t(keyid text, parent_keyid text, level int);
 keyid | parent_keyid | level
-------+--------------+-------
 row2  |              |     0
 row4  | row2         |     1
 row6  | row4         |     2
 row8  | row6         |     3
 row5  | row2         |     1
 row9  | row5         |     2
(6 rows)
-- with branch, with orderby_fld (notice that row5 comes before row4)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~')
 AS t(keyid text, parent_keyid text, level int, branch text, pos int);
 keyid | parent_keyid | level |       branch        | pos
-------+--------------+-------+---------------------+-----
 row2  |              |     0 | row2                |   1
 row5  | row2         |     1 | row2~row5           |   2
 row9  | row5         |     2 | row2~row5~row9      |   3
 row4  | row2         |     1 | row2~row4           |   4
 row6  | row4         |     2 | row2~row4~row6      |   5
 row8  | row6         |     3 | row2~row4~row6~row8 |   6
(6 rows)
-- without branch, with orderby_fld (notice that row5 comes before row4)
SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0)
 AS t(keyid text, parent_keyid text, level int, pos int);
 keyid | parent_keyid | level | pos
-------+--------------+-------+-----
 row2  |              |     0 |   1
 row5  | row2         |     1 |   2
 row9  | row5         |     2 |   3
 row4  | row2         |     1 |   4
 row6  | row4         |     2 |   5
 row8  | row6         |     3 |   6
(6 rows)

 加载中…
加载中…