The new method is totally based on costs, and it is sensitive to the value of n.
With small values of n, the CBO tends to generate plans that consist of nested loop joins with index lookups.
With large values of n, the CBO tends to generate plans that consist of hash joins and full table scans.
The CBO performs the following steps:
1. The optimizer generates a set of potential plans for the SQL statement based on available access paths and hints.
2. The optimizer estimates the cost of each plan based on statistics in the data dictionary for the data distribution and storage characteristics of the tables, indexes, and partitions accessed by the statement.
The cost is an estimated value proportional to the expected resource use needed to execute the statement with a particular plan. The optimizer calculates the cost of access paths and join orders based on the estimated computer resources, which includes I/O, CPU, and memory.
Serial plans with higher costs take more time to execute than those with smaller costs. When using a parallel plan, however, resource use is not directly related to elapsed time.
3. The optimizer compares the costs of the plans and chooses the one with the lowest cost.
In a nested loop join, for every row in the outer row set, the inner row set is accessed to find all the matching rows to join. Therefore, in a nested loop join, the inner row set is accessed as many times as the number of rows in the outer row set.
cost = outer access cost + (inner access cost * outer cardinality)
In a sort merge join, the two row sets being joined are sorted by the join keys if they are not already in key order.
cost = outer access cost + inner access cost + sort costs (if sort is used)
In a hash join, the inner row set is hashed into memory, and a hash table is built using the join key. Each row from the outer row set is then hashed, and the hash table is probed to join all matching rows. If the inner row set is very large, then only a portion of it is hashed into memory. This portion is called a hash
partition.
cost = (outer access cost * # of hash partitions) + inner access cost
CBO Join Method
A nested loop join is inefficient when a join returns a large number of rows (typically, more than 10,000 rows is considered large), and the optimizer might choose not to use it. The cost of a nested loop join is calculated by the following formula:
cost= access cost of A + (access cost of B * number of rows from A)
If you are using the CBO, then a hash join is the most efficient join when a join returns a large number or rows. The cost of a hash join is calculated by the following formula:
cost= (access cost of A * number of hash partitions of B) + access cost of B
If you are using the RBO, then a merge join is the most efficient join when a join returns a large number or rows. The cost of a merge join is calculated by the following formula:
cost= access cost of A + access cost of B +(sort cost of A + sort cost of B)
--------------------------------------------------------------------------------------------------------
常用的SQL访问路径和表连接
数据访问路径:指明了oracle通过何种方法去访问需要的数据,常用的可有以下三种
1. 全表扫描(Full Table Scans, FTS),为实现全表扫描,Oracle读取表中所有的行,并检查每一行是否满足语句的WHERE限制条件。Oracle顺序地读取分配给表的每个数据块,直到读到表的最高水线处(high water mark, HWM,标识表的当前使用的最后一个数据块)。Oracle在全表扫描的时候,会一次读入多个db block,每次读取块数由参数 db_block_multiblock_read_count设定,这极大的减少了I/O总次数,提高了系统的吞吐量,所以利用多块读的方法可以十分高效地实现全表扫描,而且只有在全表扫描的情况下才能使用多块读操作。在这种访问模式下,每个数据块只被读一次。但很多情况下,我们只要获取表中很少一部分数据,而表恰恰又很大,这样通过全表扫描来检索数据,效率就很低了,为此,oracle提供了通过索引来访问数据的方法。
SQL> explain plan for select count(job) from emp;
SQL> select * from table(dbms_xplan.display);
--------------------------------------------------------------------
| Id | Operation                        | Name       | Rows | Bytes | Cost |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                    |     1      |     6     | 9619 |
|   1 | SORT AGGREGATE      |                  |     1      |     6     |           |
|   2 |   TABLE ACCESS FULL | EMP           |    10M |    57M | 9619 |
2. 通过ROWID的表存取(Table Access by ROWID或rowid lookup),where条件中直接使用rowid限制,或者第一步先访问index,获取rowid后再访问表。获取少量数据的时候,这种方法效率极高,因为rowid包含了相对文件号,块号,块内行号,对象号,通过这些信息,可以直接定位到数据文件中读取需要的块。
SQL> explain plan for select a.empno,ename from emp a where rowid='AAAHW7AABAAAMUiAAA';
SQL> select * from table(dbms_xplan.display);
--------------------------------------------------------------------------
| Id | Operation                                   | Name       | Rows | Bytes | Cost |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                       |     1   |     16 |     1 |
|   1 | TABLE ACCESS BY USER ROWID| EMP    |     1   |    16 |     1 |
3. 索引扫描(Index Scan)
索引唯一扫描(index unique scan)
索引范围扫描(index range scan)
索引全扫描(index full scan)
索引快速扫描(index fast full scan)
索引跳跃扫描(index skip scan)
表的连接join
表连接(Join)是一种试图将两个表结合在一起的谓词,一次只能连接2个表,表连接也可以被称为表关联。
目前为止,无论连接操作符如何,典型的连接类型共有3种:
排序合并连接(Sort Merge Join (SMJ) )
嵌套循环(Nested Loops (NL) )
哈希连接(Hash Join)
排序合并连接(Sort Merge Join, SMJ)
内部连接过程:
1) 首先生成row source1需要的数据,然后对这些数据按照连接操作关联列(如A.col3)进行排序。
2) 随后生成row source2需要的数据,然后对这些数据按照与sort source1对应的连接操作关联列(如B.col4)进行排序。
3) 最后两边已排序的行被放在一起执行合并操作,即将2个row source按照连接条件连接起来
下面是连接步骤的图形表示:
MERGE
/         \
SORT    SORT
|            |
Row Source 1      Row Source 2
SQL> explain plan for select /*+ user_nl(a,b) */ a.empno,ename,job from emp a,dept b where a.deptno
>b.deptno;
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------
| Id | Operation                         | Name       | Rows | Bytes | Cost |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                   |     4      |    84    |     5 |
|   1 | MERGE JOIN          |                             |      4      |    84    |     5 |
|   2 |   INDEX FULL SCAN    | PK_DEPT    |     4      |    12    |     1 |
|* 3 |   SORT JOIN          |                             |    21    |   378 |     4 |
|   4 |    TABLE ACCESS FULL | EMP        |    21     |   378 |     2 |
SQL> explain plan for select /*+ use_nl(a,b) */ a.empno,ename,job from emp a,dept b where a.deptno>b
.deptno;
SQL> select * from table(dbms_xplan.display);
----------------------------------------------------------------------------------------------------
| Id | Operation                         | Name       | Rows | Bytes | Cost |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                    |     4      |    84   |     9 |
|   1 | NESTED LOOPS        |                         |     4      |    84   |     9 |
|   2 |   INDEX FULL SCAN    | PK_DEPT     |     4      |    12   |     1 |
|* 3 |   TABLE ACCESS FULL | EMP         |     1      |    18   |     2 |
嵌套循环(Nested Loops, NL)
内部连接过程:
1)Oracle 优化器根据基于规则RBO或基于成本CBO的原则,选择两个表中的一个作为驱动表,并指定其为外部表,再将另外一个表指定为内部表。
2) Oracle从外部表中读取第一行,然后和内部表中的数据逐一进行对比,所有匹配的记录放在结果集中。
3) Oracle读取外部表中的第二行,再和内部表中的数据逐一进行对比,所有匹配的记录添加到结果集中。重复上述步骤,直到外部表中的所有纪录全部处理完。最后产生满足要求的结果集。
SQL> explain plan for select a.empno,ename,job from emp a,dept b where a.deptno=b.deptno;
SQL> select * from table(dbms_xplan.display);
-------------------------------------------------------------------------------------------
| Id | Operation                          | Name       | Rows      | Bytes | Cost |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                        |    14     |   294 |     2 |
|   1 | NESTED LOOPS             |                        |    14     |   294 |     2 |
|* 2 |   TABLE ACCESS FULL | EMP              |    14     |   252 |     2 |
|* 3 |   INDEX UNIQUE SCAN | PK_DEPT     |     1      |     3     |       |
哈希连接(Hash Join, HJ)
这种连接是在oracle 7.3以后引入的,从理论上来说比NL与SMJ更高效,而且只用在CBO优化器中。
1) 较小的row source被用来在内存中构建hash table
2) 第2个row source中的每一条纪录被用来被hansed,并与第一个row source生成hash table进行匹配
3) 匹配成功的记录放到结果集中
当hash table比较大而不能全部容纳在内存中时,这种查找方法更为有用。这种连接方法也有NL连接中所谓的驱动表的概念,被构建为hash table与bitmap的表为驱动表,当被构建的hash table与bitmap能被容纳在内存中时,这种连接方式的效率极高。要使哈希连接有效,需要设置HASH_JOIN_ENABLED=TRUE,缺省情况下该参数为TRUE,另外,不要忘了还要设置hash_area_size参数,以使哈希连接高效运行,因为哈希连接会在该参数指定大小的内存中运行,过小的参数会使哈希连接的性能比其他连接方式还要低。
SQL> explain plan for select a.empno,ename,B.DNAME,job from emp a,dept b where a.deptno=b.deptno;
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Rows | Bytes | Cost |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                   | 9999K|   295M| 9744 |
|* 1 | HASH JOIN                     |                      | 9999K|   295M| 9744 |
|   2 |   TABLE ACCESS FULL | DEPT          |     4     |    52   |     2 |
|   3 |   TABLE ACCESS FULL | EMP            |    10M |   171M| 9619 |
在哪种情况下用哪种连接方法比较好:
排序合并连接(Sort Merge Join, SMJ):
对于非等值连接,这种连接方式的效率是比较高的,因此主要用于不等价连接,如<、 <=、 >、 >=,但是不包括 <>,如果在关联的列上都有索引,效果更好。对于将2个较大的row source做连接,或者连接列缺乏可使用或可选择性的索引,该连接方法比NL连接要好一些。但是如果sort merge返回的row source过大,则又会导致使用过多的rowid在表中查询数据时,数据库性能下降,因为过多的I/O。所有的表都需要排序。它为最优化的吞吐量而设计,并且在结果没有全部找到前不返回数据。
嵌套循环(Nested Loops, NL):
如果driving row source(外部表)比较小,并且在inner row source(内部表)上 有唯一索引,或有高选择性非唯一索引时,使用这种方法可以得到较好的效率。
NESTED LOOPS有其它连接方法没有的的一个优点是:可以先返回已经 连接的行,而不必等待所有的连接操作理完才返回数据, 这可以实现快速的响应时间。
哈希连接(Hash Join, HJ):
这种方法是在oracle7后来引入的,使用了比较先进的连接理论, 一般来说,其效率应该好于其它2种连接,但是这种连接只能用在 CBO优化器中,而且需要设置合适的hash_area_size参数, 才能取得较好的性能。
在2个较大的row source之间连接时会取得相对较好的效率,在一个 row source较小时则能取得更好的效率。
只能用于等值连接中
From: http://space.itpub.net/?uid-10159839-action-viewspace-itemid-256939
This work is licensed under a CC A-S 4.0 International License.