项目中,我们要求凡是有主子关系的表都要使用外键约束,来保证主子表之间关系的正确,不推荐由应用自己控制这种关系。
但发现有时开发人员提交SQL语句时未必会注意外键列需要定义索引,或者不清楚为什么外键列需要建立索引,网上一些所谓的“宝典”也会将外键列建索引作为其中的一条,包括TOM大师,曾说过:
导致死锁的头号原因是外键未加索引(第二号原因是表上的位图索引遭到并发更新)。在以下两种情况下,Oracle在修改父表后会对子表加一个全表锁:
1)如果更新了父表的主键(倘若遵循关系数据库的原则,即主键应当是不可变的,这种情况就很少见),由于外键上没有索引,所以子表会被锁住。
2)如果删除了父表中的一行,整个子表也会被锁住(由于外键上没有索引)。
因此,无论从什么角度看,都有必要从原理上好好理解外键为何需要创建索引,或者说外键不创建索引会有什么问题?
首先我们看下Concept官方是怎么描述这个问题。
(引用:E11882_01/server.112/e40540/consist.htm#CNCPT88978)
Indexes and Foreign Keys
As a rule, foreign keys should be indexed. The only exception is when the matching unique or primary key is never updated or deleted. Indexing the foreign keys in child tables provides the following benefits:
(1) Prevents a full table lock on the child table. Instead, the database acquires a row lock on the index.
(2) Removes the need for a full table scan of the child table. As an illustration, assume that a user removes the record for department 10 from the departments table. If employees.department_id is not indexed, then the database must scan employees to see if any employees exist in department 10.
Locks and Foreign Keys
Oracle Database maximizes the concurrency control of parent keys in relation to dependent foreign keys. Locking behavior depends on whether foreign key columns are indexed. If foreign keys are not indexed, then the child table will probably be locked more frequently, deadlocks will occur, and concurrency will be decreased. For this reason foreign keys should almost always be indexed. The only exception is when the matching unique or primary key is never updated or deleted.
Locks and Unindexed Foreign Keys
When both of the following conditions are true, the database acquires a full table lock on the child table:
(1) No index exists on the foreign key column of the child table.
(2) A session modifies a primary key in the parent table (for example, deletes a row or modifies primary key attributes) or merges rows into the parent table. Inserts into the parent table do not acquire table locks on the child table.
Suppose that hr.departments table is a parent of hr.employees, which contains the unindexed foreign key department_id. Figure 9-3 shows a session modifying the primary key attributes of department 60 in the departments table.
In Figure 9-3, the database acquires a full table lock on employees during the primary key modification of department 60. This lock enables other sessions to query but not update the employees table. For example, employee phone numbers cannot be updated. The table lock on employees releases immediately after the primary key modification on the departments table completes. If multiple rows in departments undergo primary key modifications, then a table lock on employees is obtained and released once for each row that is modified in departments.
Locks and Indexed Foreign Keys
When both of the following conditions are true, the database does not acquire a full table lock on the child table:
(1) A foreign key column in the child table is indexed.
(2) A session modifies a primary key in the parent table (for example, deletes a row or modifies primary key attributes) or merges rows into the parent table.
A lock on the parent table prevents transactions from acquiring exclusive table locks, but does not prevent DML on the parent or child table during the primary key modification. This situation is preferable if primary key modifications occur on the parent table while updates occur on the child table.
Figure 9-4 shows child table employees with an indexed department_id column. A transaction deletes department 280 from departments. This deletion does not cause the database to acquire a full table lock on the employees table as in the scenario described in “Locks and Unindexed Foreign Keys”.
If the child table specifies ON DELETE CASCADE, then deletions from the parent table can result in deletions from the child table. For example, the deletion of department 280 can cause the deletion of records from employees for employees in the deleted department. In this case, waiting and locking rules are the same as if you deleted rows from the child table after deleting rows from the parent table.
按照官方文档的说明,
只有当唯一键或主键不被更新或删除的情况下,才不需要为外键创建索引。为子表外键创建索引可以有两个好处:
(1) 避免子表上有表锁,取而代之的是,数据库会获取索引上的行锁。
(2) 避免子表上的全表扫描。假设删除departments主表id=10的记录,如果employees子表的department_id外键没有索引,那么就会全表扫描employees子表,以确认是否存在department id=10的记录。当满足以下两个条件时,会获取子表的表锁:
(1) 子表外键不存在索引。
(2) 修改主表的主键(例如,删除一行记录或者修改主键值)或者合并主表的多行记录。向主表插入记录不会获取子表的表锁。
只有当主键值修改完成,子表的表锁才会被放开。
效果是这样么?我们是用实验来验证。
创建测试表
SQL> create table t1
2 (id number,
3 name varchar2(1)
4 );
Table created.
SQL> alter table t1 add constraint pk_t1 primary key (id);
Table altered.
SQL> create table t2
2 (id number,
3 t1_id number,
4 name varchar2(1)
5 );
Table created.
SQL> alter table t2 add constraint pk_t2 primary key (id);
Table altered.
SQL> alter table t2 add constraint fk_t2 foreign key (t1_id) references t1(id);
Table altered.
SQL> insert into t1 values(1, 'a');
1 row created.
SQL> insert into t1 values(2, 'b');
1 row created.
SQL> insert into t2 values(1, 1, 'c');
1 row created.
SQL> commit;
Commit complete.
SQL> select * from t1;
ID N
---------- -
1 a
2 b
SQL> select * from t2;
ID T1_ID N
---------- ---------- -
1 1 c
t1是主表,t2是子表,t2的t1_id列是外键,参考t1表的id主键列。
实验1
session 1删除主表id=2的记录:
SQL> delete from t1 where id = 2;
1 row deleted.
session 2删除子表id=1的记录:
SQL> delete from t2 where id = 1;
1 row deleted.
发现好像并没有像文档中描述的,删除主表一行记录,就会锁住子表整张表,这是为什么?我们先继续看实验2。
实验2
session 1删除子表id=1的记录(sid是150):
SQL> delete from t2 where id = 1;
1 row deleted.
或update t2 set name = 'c' where id = 1;
session 2删除主表id=2的记录(sid是144):
SQL> delete from t1 where id = 2;
这个session处于hang的状态。
SQL> select object_name from dba_objects where object_id = 76828;
OBJECT_NAME
T1
SQL> select object_name from dba_objects where object_id = 76830;
OBJECT_NAME
T2
我们可以看出session 1有两个TM锁,一个TX锁,session 2有两个TM锁。
这里隐含的知识点就是v$lock视图中ID1和ID2列的含义,Reference手册中有介绍,但基本没什么用。
参考secooler老师的文章(http://m.blog.itpub.net/519536/viewspace-693689/),参阅MOS:29787.1,得知了ID1和ID2更详细的说明:
对于TM锁来说,ID1表示表对象,ID2一般是0。
对于TX锁来说,ID1表示Decimal RBS & slot,当前事务的回滚段编号和槽位号(十进制,RBS和slot的组合,根据0xRRRRSSSS RRRR = RBS number, SSSS = slot的定义,高16位表示RBS值,对应于VTRANSACTION中的XIDUSN字段,低16位表示slot值,对应于VTRANSACTION中的XIDSLOT字段),ID2表示Decimal WRAP number,序列号。
因此两个TM锁,其中object_id=76828对应的是T1表,object_id=76830对应的是T2表。
对于TX这行记录,确实能够按照如下计算得到当前事务的回滚段编号、槽位号以及序列号。
SQL> select trunc(589843/65536) from dual;
TRUNC(589843/65536)
9
SQL> select mod(589843,65536) from dual;
MOD(589843,65536)
19
SQL> select XIDUSN,XIDSLOT,XIDSQN from V$TRANSACTION where XIDSQN=715;
XIDUSN XIDSLOT XIDSQN
9 19 715
进一步,我们从处于hang的session 2,即执行删除主表操作的10046事件中还可以发现一些细节:
PARSING IN CURSOR #11135800 len=25 dep=0 uid=90 oct=7 lid=90 tim=1458405691531128 hv=2708121416 ad='526137f8' sqlid='9kjcfz6hqp9u8'
delete from t1 where id=2
WAIT #11135800: nam='enq: TM - contention' ela= 4202458 name|mode=1414332420 object #=76830 table/partition=0 obj#=76830 tim=1458405695733683
PARSING IN CURSOR #11130048 len=68 dep=1 uid=0 oct=3 lid=0 tim=1458405695736681 hv=4142137182 ad='57d9d9f0' sqlid='cpbvjc7vf7zuy'
select /*+ all_rows */ count(1) from "TEST"."T2" where "T1_ID" = :1
END OF STMT
PARSE #11130048:c=2000,e=2221,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=0,tim=1458405695736680
STAT #11128324 id=2 cnt=1 pid=1 pos=1 obj=76830 op='TABLE ACCESS FULL T2 (cr=7 pr=0 pw=0 time=111 us cost=2 size=0 card=82)'
从这块可以得出两个结论:
(1) 此时删除主表记录时,会出现一个TM锁争用,需要说明的是,trc文件中这条信息是在session 1执行commit或rollback释放子表之后才写入的,我们看到session 1的执行,导致session 2执行需要请求T2表的TM锁时出现了等待。
(2) 删除主表记录的过程中,会根据外键字段检索子表记录,select /+ all_rows / count(1) from “TEST”.”T2” where “T1_ID” = :1,这就意味着,如果外键没有索引,则这条SQL语句会执行全表扫描,从后面的TABLE ACCESS FULL T2也证明了。
实验3
和实验2操作过程相同,只是session 2从delete操作换为了update主键操作:update t1 set id=3 where id=2
效果和实验2相同,session 2处于hang,且从10046事件看,和实验2还是相同:
PARSING IN CURSOR #11123668 len=29 dep=0 uid=90 oct=6 lid=90 tim=1458407400035758 hv=1728227981 ad='5278e0fc' sqlid='43bqtdxmh5and'
97 update t1 set id=3 where id=2
PARSING IN CURSOR #11109876 len=68 dep=1 uid=0 oct=3 lid=0 tim=1458407409907499 hv=4142137182 ad='57d9d9f0' sqlid='cpbvjc7vf7zuy'
105 select /*+ all_rows */ count(1) from "TEST"."T2" where "T1_ID" = :1
实验4
session 1删除子表id=1的记录:
SQL> delete from t2 where id = 1;
1 row deleted.
session 2更新主表id=2记录的非主键字段:
SQL> update t1 set name = 'c' where id = 2;
1 row updated.
没有出现hang,可以操作。
实验5
session 1插入子表新记录:
SQL> insert into t2 values(2, 1, 'c');
1 row created.
session 2插入主表新记录:
SQL> insert into t1 values(3, 'c');
1 row created.
没有出现hang,可以操作。
实验6
创建外键索引,看下效果,
SQL> create index idx_t2_id on t2(t1_id);
Index created.
session 1删除子表id=1的记录:
SQL> delete from t2 where id = 1;
1 row deleted.
session 2删除主表id=2的记录:
SQL> delete from t1 where id = 2;
此时没有hang,
确实没有锁:
查看10046事件:
PARSING IN CURSOR #12183444 len=27 dep=0 uid=90 oct=7 lid=90 tim=1458479427981508 hv=3481522657 ad='57ded014' sqlid='87pqrfv7s7ng1'
delete from t1 where id = 2
END OF STMT
PARSE #12183444:c=1999,e=2469,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=1636297587,tim=1458479427981505
EXEC #12183444:c=0,e=365,p=0,cr=1,cu=6,mis=0,r=1,dep=0,og=1,plh=1636297587,tim=1458479427982059
STAT #12183444 id=1 cnt=0 pid=0 pos=1 obj=0 op='DELETE T1 (cr=1 pr=0 pw=0 time=298 us)'
STAT #12183444 id=2 cnt=1 pid=1 pos=1 obj=76829 op='INDEX UNIQUE SCAN PK_T1 (cr=1 pr=0 pw=0 time=26 us cost=0 size=3 card=1)'
WAIT #12183444: nam='SQL*Net message to client' ela= 12 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1458479427983901
确实没有产生锁争用的现象。
总结:
(1) 外键没有索引,确实可能导致子表产生表锁,但是有前提:
a. 子表有删改操作。
b. 主表有删操作,或者更新主键的操作。
满足以上两个条件才会出现主表操作hang状态。
(2) 外键不建索引,则删除主表记录或主子表关联查询,都会进行子表的全表扫描。
(3) 主子表任何插入操作,无论顺序,不会产生锁或hang状态。
(4) 只有外键创建索引,(1)中的操作才不会出现锁或hang状态,(2)中的操作才有可能使用索引。
通过以上实验,至少对外键不建立索引产生的影响,有了一些感性的认识,对外键为何要建立索引,应该有了更深入的理解。