Validba'S Home

从末停止过

求两个时间点之间的时间序列问题

without comments

create table t(start1 date,end1 date) tablespace users;

insert into  t values(sysdate,sysdate+5);
commit;

现在需要的结果是;

trunc(sysdate)
trunc(sysdate+1 )
trunc(sysdate+2 )
trunc(sysdate+3 )
trunc(sysdate+4 )
trunc(sysdate+5 )

可能大多数人跟我一样,通过connect by去解决这个问题。看下面的测试

select start1 + level - 1 from t connect by level <= end1 - start1 + 1;

set linesize 120
alter session set statistics_level=all;

select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

SQL> select * from t;

START1              END1
------------------- -------------------
2010-07-23 14:17:46 2010-07-28 14:17:46

Elapsed: 00:00:00.00

SQL> select start1 + level - 1 from t connect by level <= end1 - start1 + 1;

START1+LEVEL-1
-------------------
2010-07-23 14:17:46
2010-07-24 14:17:46
2010-07-25 14:17:46
2010-07-26 14:17:46
2010-07-27 14:17:46
2010-07-28 14:17:46

6 rows selected.

Elapsed: 00:00:00.01

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
SQL_ID  42yx56f49f77c, child number 1
-------------------------------------
select start1 + level - :"SYS_B_0" from t connect by level <= end1 - start1 +
:"SYS_B_1"

Plan hash value: 1071874124

-----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------
|   1 |  CONNECT BY WITHOUT FILTERING|      |      1 |        |      6 |00:00:00.01 |       7 |

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
|   2 |   TABLE ACCESS FULL          | T    |      1 |      1 |      1 |00:00:00.01 |       7 |
-----------------------------------------------------------------------------------------------

14 rows selected.

执行计划不错,但如果是多条数据呢

SQL> insert into t values(sysdate+10,sysdate+12);

1 row created.

Elapsed: 00:00:00.01
SQL> commit;

Commit complete.

Elapsed: 00:00:00.01

SQL> select start1 + level - 1 from t connect by level <= end1 - start1 + 1;

START1+LEVEL-1
-------------------
2010-08-02 16:02:30
2010-08-03 16:02:30
2010-08-04 16:02:30
2010-07-26 14:17:46
2010-07-27 14:17:46
2010-07-28 14:17:46
2010-07-25 14:17:46
2010-07-26 14:17:46
2010-07-27 14:17:46
2010-07-28 14:17:46
2010-07-24 14:17:46

START1+LEVEL-1
-------------------
2010-08-04 16:02:30
2010-07-26 14:17:46
2010-07-27 14:17:46
2010-07-28 14:17:46
2010-07-25 14:17:46
2010-07-26 14:17:46
2010-07-27 14:17:46
2010-07-28 14:17:46
2010-07-23 14:17:46
2010-08-03 16:02:30
2010-08-04 16:02:30

START1+LEVEL-1
-------------------
2010-07-26 14:17:46
2010-07-27 14:17:46
2010-07-28 14:17:46
2010-07-25 14:17:46
2010-07-26 14:17:46
2010-07-27 14:17:46
2010-07-28 14:17:46
2010-07-24 14:17:46
2010-08-04 16:02:30
2010-07-26 14:17:46
2010-07-27 14:17:46

START1+LEVEL-1
-------------------
2010-07-28 14:17:46
2010-07-25 14:17:46
2010-07-26 14:17:46
2010-07-27 14:17:46
2010-07-28 14:17:46

38 rows selected.

SQL>
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
SQL_ID  42yx56f49f77c, child number 1
-------------------------------------
select start1 + level - :"SYS_B_0" from t connect by level <= end1 - start1 +
:"SYS_B_1"

Plan hash value: 1071874124

-----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------
|   1 |  CONNECT BY WITHOUT FILTERING|      |      1 |        |     38 |00:00:00.01 |       7 |

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
|   2 |   TABLE ACCESS FULL          | T    |      1 |      1 |      2 |00:00:00.01 |       7 |
-----------------------------------------------------------------------------------------------

Oracle做了一个递归查询,递归查询的内部路径见下面的结果

select sys_connect_by_path(start1 + level - 1,'/') from t connect by level <= end1 - start1 + 1;

SYS_CONNECT_BY_PATH(START1+LEV
--------------------------------------------------------------------------------
/2010-08-02 16:02:30
/2010-08-02 16:02:30/2010-08-03 16:02:30
/2010-08-02 16:02:30/2010-08-03 16:02:30/2010-08-04 16:02:30
/2010-08-02 16:02:30/2010-08-03 16:02:30/2010-08-04 16:02:30/2010-07-26 14:17:46
/2010-08-02 16:02:30/2010-08-03 16:02:30/2010-08-04 16:02:30/2010-07-26 14:17:46
/2010-07-27 14:17:46

/2010-08-02 16:02:30/2010-08-03 16:02:30/2010-08-04 16:02:30/2010-07-26 14:17:46
/2010-07-27 14:17:46/2010-07-28 14:17:46

/2010-08-02 16:02:30/2010-08-03 16:02:30/2010-07-25 14:17:46
/2010-08-02 16:02:30/2010-08-03 16:02:30/2010-07-25 14:17:46/2010-07-26 14:17:46
/2010-08-02 16:02:30/2010-08-03 16:02:30/2010-07-25 14:17:46/2010-07-26 14:17:46
/2010-07-27 14:17:46

/2010-08-02 16:02:30/2010-08-03 16:02:30/2010-07-25 14:17:46/2010-07-26 14:17:46
/2010-07-27 14:17:46/2010-07-28 14:17:46

/2010-08-02 16:02:30/2010-07-24 14:17:46
/2010-08-02 16:02:30/2010-07-24 14:17:46/2010-08-04 16:02:30
/2010-08-02 16:02:30/2010-07-24 14:17:46/2010-08-04 16:02:30/2010-07-26 14:17:46
/2010-08-02 16:02:30/2010-07-24 14:17:46/2010-08-04 16:02:30/2010-07-26 14:17:46
/2010-07-27 14:17:46

/2010-08-02 16:02:30/2010-07-24 14:17:46/2010-08-04 16:02:30/2010-07-26 14:17:46
/2010-07-27 14:17:46/2010-07-28 14:17:46

/2010-08-02 16:02:30/2010-07-24 14:17:46/2010-07-25 14:17:46
/2010-08-02 16:02:30/2010-07-24 14:17:46/2010-07-25 14:17:46/2010-07-26 14:17:46
/2010-08-02 16:02:30/2010-07-24 14:17:46/2010-07-25 14:17:46/2010-07-26 14:17:46
/2010-07-27 14:17:46

/2010-08-02 16:02:30/2010-07-24 14:17:46/2010-07-25 14:17:46/2010-07-26 14:17:46
/2010-07-27 14:17:46/2010-07-28 14:17:46

/2010-07-23 14:17:46
/2010-07-23 14:17:46/2010-08-03 16:02:30
/2010-07-23 14:17:46/2010-08-03 16:02:30/2010-08-04 16:02:30
/2010-07-23 14:17:46/2010-08-03 16:02:30/2010-08-04 16:02:30/2010-07-26 14:17:46
/2010-07-23 14:17:46/2010-08-03 16:02:30/2010-08-04 16:02:30/2010-07-26 14:17:46
/2010-07-27 14:17:46

/2010-07-23 14:17:46/2010-08-03 16:02:30/2010-08-04 16:02:30/2010-07-26 14:17:46
/2010-07-27 14:17:46/2010-07-28 14:17:46

/2010-07-23 14:17:46/2010-08-03 16:02:30/2010-07-25 14:17:46
/2010-07-23 14:17:46/2010-08-03 16:02:30/2010-07-25 14:17:46/2010-07-26 14:17:46
/2010-07-23 14:17:46/2010-08-03 16:02:30/2010-07-25 14:17:46/2010-07-26 14:17:46
/2010-07-27 14:17:46

/2010-07-23 14:17:46/2010-08-03 16:02:30/2010-07-25 14:17:46/2010-07-26 14:17:46
/2010-07-27 14:17:46/2010-07-28 14:17:46

/2010-07-23 14:17:46/2010-07-24 14:17:46
/2010-07-23 14:17:46/2010-07-24 14:17:46/2010-08-04 16:02:30
/2010-07-23 14:17:46/2010-07-24 14:17:46/2010-08-04 16:02:30/2010-07-26 14:17:46
/2010-07-23 14:17:46/2010-07-24 14:17:46/2010-08-04 16:02:30/2010-07-26 14:17:46
/2010-07-27 14:17:46

/2010-07-23 14:17:46/2010-07-24 14:17:46/2010-08-04 16:02:30/2010-07-26 14:17:46
/2010-07-27 14:17:46/2010-07-28 14:17:46

/2010-07-23 14:17:46/2010-07-24 14:17:46/2010-07-25 14:17:46
/2010-07-23 14:17:46/2010-07-24 14:17:46/2010-07-25 14:17:46/2010-07-26 14:17:46
/2010-07-23 14:17:46/2010-07-24 14:17:46/2010-07-25 14:17:46/2010-07-26 14:17:46
/2010-07-27 14:17:46

/2010-07-23 14:17:46/2010-07-24 14:17:46/2010-07-25 14:17:46/2010-07-26 14:17:46
/2010-07-27 14:17:46/2010-07-28 14:17:46

38 rows selected.

Oracle根据level的层数在做递归查询,那么这样的话随着数据量的增加,比如上百万条的数据,那么Oracle将会完全耗费所有的CPU做递归,为了我想要的结果,还必须过滤递归后的重复数据,加一个distinct 那么这个sql的执行时间将变得遥遥无期。

最后从http://www.os2ora.com/how-to-return-continuous-dates-between-two-dates/发现一个同样的案例,OS2提供了一个非常不错的方法(关于connect by的性能分析详见http://www.os2ora.com/connect-by-performance-tunning/):

SELECT a1 FROM (
 SELECT a+level-1 a1
   FROM (
          SELECT min(start1) a,max(end1) b
           FROM t
	 )
   CONNECT BY LEVEL<=b - a + 1) q,t
WHERE trunc(q.a1) BETWEEN trunc(t.start1) and trunc(t.end1)
/  

这条SQL的中心思想是:

1、找出表中最小的start和最大的end;

2、计算start和end之间的连续时间

3、跟t表join,找出符合t表时间段的时间列表;

SQL> SELECT a1 FROM (
  2   SELECT a+level-1 a1
  3     FROM (
  4            SELECT min(start1) a,max(end1) b
  5             FROM t
  6      )
  7     CONNECT BY LEVEL<=b - a + 1) q,t
  8  WHERE trunc(q.a1) BETWEEN trunc(t.start1) and trunc(t.end1)
  9  / 

A1
-------------------
2010-07-23 14:17:46
2010-07-24 14:17:46
2010-07-25 14:17:46
2010-07-26 14:17:46
2010-07-27 14:17:46
2010-07-28 14:17:46
2010-08-02 14:17:46
2010-08-03 14:17:46
2010-08-04 14:17:46

9 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
SQL_ID  18v3ubnmwhq7n, child number 0
-------------------------------------
SELECT a1 FROM (  SELECT a+level-:"SYS_B_0" a1    FROM (           SELECT min(start1)
a,max(end1) b            FROM t   )    CONNECT BY LEVEL<=b - a + :"SYS_B_1") q,t WHERE
trunc(q.a1) BETWEEN trunc(t.start1) and trunc(t.end1)

Plan hash value: 1814807998

-------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   1 |  NESTED LOOPS                  |      |      1 |      1 |      9 |00:00:00.01 |      99 |
|   2 |   VIEW                         |      |      1 |      1 |     13 |00:00:00.01 |       7 |
|   3 |    CONNECT BY WITHOUT FILTERING|      |      1 |        |     13 |00:00:00.01 |       7 |
|   4 |     VIEW                       |      |      1 |      1 |      1 |00:00:00.01 |       7 |
|   5 |      SORT AGGREGATE            |      |      1 |      1 |      1 |00:00:00.01 |       7 |
|   6 |       TABLE ACCESS FULL        | T    |      1 |      1 |      2 |00:00:00.01 |       7 |
|*  7 |   TABLE ACCESS FULL            | T    |     13 |      1 |      9 |00:00:00.01 |      92 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   7 - filter((TRUNC("Q"."A1")>=TRUNC(INTERNAL_FUNCTION("T"."START1")) AND
              TRUNC("Q"."A1")<=TRUNC(INTERNAL_FUNCTION("T"."END1"))))

这个plan虽然看起来没有第一个好看,但随着数据量的增加,关键部分connect的计算将变得非常简单。影响这条SQL的唯一路径是最后和T表的NESTED LOOPS ,转换成HASH JOIN也不具备条件。在大批量数据中做 NESTED LOOPS显然不是最优的,还有一种方法就是在start1,end1上创建函数索引,这可能是唯一能降低执行时间的方法:

CREATE INDEX IDX_START1_END1 ON T (TRUNC(START1),TRUNC(END1));

SQL> set autotrace traceonly explain
SQL> SELECT a1 FROM (
  2   SELECT a+level-1 a1
  3     FROM (
  4            SELECT min(start1) a,max(end1) b
  5             FROM t
  6      )
  7     CONNECT BY LEVEL<=b - a + 1) q,t
  8  WHERE trunc(q.a1) BETWEEN trunc(t.start1) and trunc(t.end1)
  9  /
Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
Plan hash value: 2337532387

--------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                 |   250 |  5500 |    90   (4)| 00:00:02 |
|   1 |  NESTED LOOPS                  |                 |   250 |  5500 |    90   (4)| 00:00:02 |
|   2 |   VIEW                         |                 |     1 |     6 |    86   (4)| 00:00:02 |
|*  3 |    CONNECT BY WITHOUT FILTERING|                 |       |       |            |          |
|   4 |     VIEW                       |                 |     1 |    18 |    86   (4)| 00:00:02 |
|   5 |      SORT AGGREGATE            |                 |     1 |    16 |            |          |
|   6 |       TABLE ACCESS FULL        | T               |   100K|  1562K|    86   (4)| 00:00:02 |
|*  7 |   INDEX RANGE SCAN             | IDX_START1_END1 |   250 |  4000 |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(LEVEL<="B"-"A"+1)
   7 - access(TRUNC("Q"."A1")<=TRUNC(INTERNAL_FUNCTION("END1")) AND
              TRUNC("Q"."A1")>=TRUNC(INTERNAL_FUNCTION("START1")))
       filter(TRUNC("Q"."A1")<=TRUNC(INTERNAL_FUNCTION("END1")))

Execution Plan
----------------------------------------------------------
Plan hash value: 1814807998

---------------------------------------------------------------------------------------
| Id  | Operation                      | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |      |   250 |  5500 |   172   (4)| 00:00:03 |
|   1 |  NESTED LOOPS                  |      |   250 |  5500 |   172   (4)| 00:00:03 |
|   2 |   VIEW                         |      |     1 |     6 |    86   (4)| 00:00:02 |
|*  3 |    CONNECT BY WITHOUT FILTERING|      |       |       |            |          |
|   4 |     VIEW                       |      |     1 |    18 |    86   (4)| 00:00:02 |
|   5 |      SORT AGGREGATE            |      |     1 |    16 |            |          |
|   6 |       TABLE ACCESS FULL        | T    |   100K|  1562K|    86   (4)| 00:00:02 |
|*  7 |   TABLE ACCESS FULL            | T    |   250 |  4000 |    86   (4)| 00:00:02 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(LEVEL<="B"-"A"+1)
   7 - filter(TRUNC("Q"."A1")>=TRUNC(INTERNAL_FUNCTION("T"."START1")) AND
              TRUNC("Q"."A1")<=TRUNC(INTERNAL_FUNCTION("T"."END1")))

从执行计划上看,加了加索引明显降低了CPU的使用,降低了执行的时间。

Relative Posts

Written by admin

七月 23rd, 2010 at 8:38 上午

Leave a Reply