在批量更新中,merge语句性能是最好的,因为merge可以多快读,而且可以启用并行,merge语句比一般update语句快,另外还有一种就是根据rowid来更新,这几种方法更新各有好处,首先使用merge更新,可以多快读,且速度是最快的,而且可以并行,速度就更快了,其次用rowid来更新,虽然是单块读,但如果加上组合索引的方式,那通过组合索引就不用回表,只扫描扫描就可以了,这也是比较快的,最后用普通的update,这种写法好处就是方便,容易写。下面来看一个简单的例子
1. 创建测试表,插入数据
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
SQL> create table test_a as select * from dba_objects; Table created. SQL> create table test_b as select * from dba_objects; Table created. SQL> insert into test_b select * from test_b; 91570 rows created. SQL> / 183140 rows created. SQL> / 366280 rows created. SQL> / 732560 rows created. SQL> commit; Commit complete. SQL> SELECT SUM(BYTES)/1024/1024 "SIZE(MB)" FROM DBA_SEGMENTS WHERE SEGMENT_NAME='TEST_B'; SIZE(MB) ---------- 200 |
2. 在两个测试表上创建2个索引
1 2 3 4 5 6 7 |
SQL> create index idx_a on test_a(object_name,object_id); Index created. SQL> create index idx_b on test_b(object_id); Index created. |
3. 如果现在用最原始的update语句从test_a表更新test_b表,关联条件是object_id
1 |
SQL> update test_b b set b.object_name=(select a.object_name from test_a a where a.object_id=b.object_id); |
我这里反正是过了很久都没有更新完成,毕竟测试库,没有那么大的buffer cache,这里的update相当于嵌套循环,test_b表有多少数据,那么子查询就要执行扫描多少次,对于一个几十万的表,扫描这么多次无疑是非常慢的。
4. 这里直接改成merge语句,设置多快读参数为128,开启并行
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
SQL> alter session set db_file_multiblock_read_count=128; Session altered. SQL> alter session enable parallel dml; Session altered. SQL> conn scott/tiger Connected. SQL> merge /*+ USE_HASH(C,H) FULL(C) */ into test_b c 2 using (select /*+INDEX(A) USE_HASH(A) */ a.object_name, a.object_id 3 from test_a a 4 where a.object_id in (select /*+ use_hash(b) index(b) */ object_id from test_b b)) h 5 on (c.object_id = h.object_id) 6 when matched then 7 update set c.object_name = h.object_name; 1465104 rows merged. Elapsed: 00:00:12.03 SQL> |
可以看到使用merge语句更新了146W的数据,但是只花掉了10多秒的时间,而第一个update语句却迟迟没有更新成功。
简单看看这2中update方式的执行计划
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 |
SQL> explain plan for merge /*+ USE_HASH(C,H) FULL(C) */ into test_b c using (select /*+INDEX(A) USE_HASH(A) */ a.object_name, a.object_id from test_a a 2 3 4 where a.object_id in (select /*+ use_hash(b) index(b) */ object_id from test_b b)) h 5 on (c.object_id = h.object_id) 6 when matched then 7 update set c.object_name = h.object_name; Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT ----------------------------------------------------------------------------------------------------------- Plan hash value: 1121481720 --------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | --------------------------------------------------------------------------------------- | 0 | MERGE STATEMENT | | 91569 | 8137K| | 1684 (1)| 00:00:01 | | 1 | MERGE | TEST_B | | | | | | | 2 | VIEW | | | | | | | |* 3 | HASH JOIN | | 91569 | 12M| 3760K| 1684 (1)| 00:00:01 | | 4 | INDEX FULL SCAN | IDX_A | 91569 | 2682K| | 523 (1)| 00:00:01 | | 5 | TABLE ACCESS FULL| TEST_B | 91570 | 10M| | 428 (1)| 00:00:01 | --------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("C"."OBJECT_ID"="A"."OBJECT_ID") 17 rows selected. SQL> explain plan for update test_b b set b.object_name=(select a.object_name from test_a a where a.object_id=b.object_id); Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT ----------------------------------------------------------------------------------------------------------- Plan hash value: 703839442 ----------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------- | 0 | UPDATE STATEMENT | | 91570 | 2682K| 47M (1)| 00:31:11 | | 1 | UPDATE | TEST_B | | | | | | 2 | TABLE ACCESS FULL| TEST_B | 91570 | 2682K| 428 (1)| 00:00:01 | |* 3 | INDEX FULL SCAN | IDX_A | 1 | 30 | 523 (1)| 00:00:01 | ----------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("A"."OBJECT_ID"=:B1) filter("A"."OBJECT_ID"=:B1) 16 rows selected. 使用merge into方式正确的使用的hash join连接,但普通的update语句,相当于嵌套循环,在id=3处看出来,当驱动表test_b返回一条数据的时候,被驱动表扫描一次,这里对于全表扫描会返回很多行的update来说,是非常不适合的。 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
SQL> DECLARE 2 CURSOR CUR_B IS 3 SELECT a.object_id, a.object_name, b.ROWID ROW_ID 4 FROM test_a A, test_b B 5 WHERE A.object_id = B.object_id 6 ORDER BY B.ROWID; 7 V_COUNTER NUMBER; 8 BEGIN 9 V_COUNTER := 0; 10 FOR ROW_B IN CUR_B LOOP 11 UPDATE test_b b SET object_name = ROW_B.object_name WHERE ROWID = ROW_B.ROW_ID; 12 V_COUNTER := V_COUNTER + 1; 13 IF (V_COUNTER >= 10000) THEN 14 COMMIT; 15 dbms_output.put_line('Updated: ' ||V_COUNTER || ' lines.'); 16 V_COUNTER := 0; 17 END IF; 18 END LOOP; 19 COMMIT; 20 END; 21 / PL/SQL procedure successfully completed. Elapsed: 00:01:09.35 |
这里通过rowid来更新,前文已经说过,rowid是单块读的,没有merge快,而实验也正好证明了这个问题,merge语句要比通过rowid来更新要快很多,不过在通过rowid更新的时候要注意,我这里用了order by rowid,是在buffer cache不够大的情况下使用,性能比较好,保证被更新的表不被page out,当然如果buffer cache够大,就不会存在这种情况了,反而比不加order by rowid要快,因为少了排序,而且buffer cache也能容纳下这2个表。用rowid的好处就是可以批量提交,但不足的地方就是没有merge快,不能并行更新。
- 本文固定链接: http://www.savedba.com/?p=571
- 转载请注明: 版权所有,文章允许转载,但必须以链接方式注明源地址,否则追究法律责任!