
Notes for RAC Installation

Published: 2022-11-15 19:33:36
Tags: node, 03, Installation, RAC, Notes, number, 11.2, pci

Problem #1:  RAC node fails to start CRS after one InfiniBand switch is turned off

Symptom:  After one of the redundant InfiniBand switches is powered off, one or more nodes lose their connection to the cluster and are evicted.

Cause:  The InfiniBand configuration differs between the RAC nodes.

Error:
2014-01-03 10:58:12.376 [cssd(10047)]CRS-1612:Network communication with node kmaiptmsdb03 (3) missing for 50% of timeout interval.  Removal of this node from cluster in 299.178 seconds
2014-01-03 11:00:42.434 [cssd(10047)]CRS-1611:Network communication with node kmaiptmsdb03 (3) missing for 75% of timeout interval.  Removal of this node from cluster in 149.131 seconds
2014-01-03 11:01:22.151 [/opt/app/crs/11.2.0/11.2.0.3/bin/orarootagent.bin(10023)]CRS-5818:Aborted command 'check' for resource 'ora.crsd'. Details at (:CRSAGF00113:) {0:0:2} in /opt/app/crs/11.2.0/11.2.0.3/log/kmaiptmsdb01/agent/ohasd/orarootagent_root/orarootagent_root.log.
2014-01-03 11:02:12.468 [cssd(10047)]CRS-1610:Network communication with node kmaiptmsdb03 (3) missing for 90% of timeout interval.  Removal of this node from cluster in 59.104 seconds
2014-01-03 11:03:11.593 [cssd(10047)]CRS-1607:Node kmaiptmsdb03 is being evicted in cluster incarnation 283963173; details at (:CSSNM00007:) in /opt/app/crs/11.2.0/11.2.0.3/log/kmaiptmsdb01/cssd/ocssd.log.
2014-01-03 11:03:14.601 [cssd(10047)]CRS-1625:Node kmaiptmsdb03, number 3, was manually shut down
2014-01-03 11:03:20.546 [cssd(10047)]CRS-1601:CSSD Reconfiguration complete. Active nodes are kmaiptmsdb01 kmaiptmsdb02

Solution:  Ensure /etc/path_to_inst has the same InfiniBand configuration across all RAC nodes:
path_to_inst:"/pci@400/pci@2/pci@0/pci@8/pciex15b3,673c@0/ibport@1,ffff,ipib" 1 "ibd"
path_to_inst:"/pci@400/pci@2/pci@0/pci@8/pciex15b3,673c@0/ibport@2,ffff,ipib" 2 "ibd"
path_to_inst:"/pci@500/pci@2/pci@0/pci@a/pciex15b3,673c@0/ibport@1,ffff,ipib" 0 "ibd"
path_to_inst:"/pci@500/pci@2/pci@0/pci@a/pciex15b3,673c@0/ibport@2,ffff,ipib" 3 "ibd"


Problem #2:  11.2 RAC database fails to come up after restart

Symptom:  The RAC database throws a syntax error during the startup phase.

Cause:  The remote_listener parameter is not interpreted properly by the 11.2 RAC database.

Error:
ORA-00132: syntax error or unresolved network name 'hmaiptmsdb-scan:1521'

Solution:  Ensure the EZCONNECT naming method is included in sqlnet.ora.  This is new behavior in 11.2 RAC.
NAMES.DIRECTORY_PATH= (TNSNAMES, EZCONNECT)
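
Both fixes above are configuration checks, so they can be scripted. The following is a minimal sketch, not an Oracle or Solaris tool: the function names (`ib_instances`, `diff_nodes`, `has_ezconnect`) are made up for illustration, and it assumes the path_to_inst entries look like the four lines shown for Problem #1 and that NAMES.DIRECTORY_PATH sits on a single line of sqlnet.ora.

```python
import re

# Matches a path_to_inst entry such as:
#   "/pci@400/.../ibport@1,ffff,ipib" 1 "ibd"
# A leading `path_to_inst:` prefix (as left by grep) is tolerated.
_ENTRY = re.compile(r'"(?P<path>[^"]+)"\s+(?P<inst>\d+)\s+"(?P<drv>[^"]+)"')


def ib_instances(text, driver="ibd"):
    """Return {device_path: instance_number} for one driver name."""
    result = {}
    for line in text.splitlines():
        m = _ENTRY.search(line)
        if m and m.group("drv") == driver:
            result[m.group("path")] = int(m.group("inst"))
    return result


def diff_nodes(reference, other):
    """Return {path: (ref_instance, other_instance)} for every mismatch.

    A path missing on one node shows up with None on that side.
    """
    paths = set(reference) | set(other)
    return {
        p: (reference.get(p), other.get(p))
        for p in paths
        if reference.get(p) != other.get(p)
    }


def has_ezconnect(sqlnet_text):
    """True if NAMES.DIRECTORY_PATH lists EZCONNECT (single-line form only)."""
    pattern = re.compile(r"\s*NAMES\.DIRECTORY_PATH\s*=\s*\((.*)\)", re.IGNORECASE)
    for line in sqlnet_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        m = pattern.match(line)
        if m:
            methods = [x.strip().upper() for x in m.group(1).split(",")]
            return "EZCONNECT" in methods
    return False
```

To use it, collect the contents of /etc/path_to_inst from each node, pass each through `ib_instances`, and compare every node against one reference node with `diff_nodes`. An empty result means the ibd instance numbers agree.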


*********************************************************************************************************************************


When a communication failure occurs 


If you are still facing the issue, follow the steps below and upload the details from the failing node.

A>. On the failing node force stop crs using "crsctl stop crs -f" 

B>. Delete the socket files available at the below location on the failing node 
<< you may need to be root for this >> 
/var/tmp/.oracle 
or 
/tmp/.oracle 
or 
/usr/tmp/.oracle 
<< rm -rf /var/tmp/.oracle >> 

C>. On the failing node, issue
ps -ef | grep 'init d.bin' | grep -v grep 
and kill any d.bin process still running on the failing node 

D>. Then kill gipcd on the good node and let it respawn automatically. 
That should allow the gipcd processes on both nodes to talk to each other over the private interconnect. 
You can kill the gipcd process once you have its process ID 
(issue "ps -ef | grep gipcd.bin" to find the gipcd.bin pid). 

E>. Start CRS on the failing node using "date; crsctl start crs" 
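
Step D depends on picking the right PID out of the `ps -ef` listing. As a rough sketch of that lookup, the helper below scans captured `ps -ef` text for a process name. `find_pids` is a hypothetical name, and the only assumptions are that the PID is the second whitespace-separated column and that lines from the `grep` command itself should be skipped.

```python
def find_pids(ps_output, needle="gipcd.bin"):
    """Return the PIDs of processes whose ps -ef line mentions `needle`."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip the header row and anything without a numeric PID column.
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        # Skip the grep process that a `ps -ef | grep ...` pipeline adds.
        if "grep" in fields:
            continue
        if needle in line:
            pids.append(int(fields[1]))
    return pids
```

The function only reads text and kills nothing. Feed it the output of `ps -ef` captured on the good node and review the PIDs it returns before running `kill` by hand.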

For confirmation that this procedure has been used before, see: 
#1> 11gR2 GI Node May not Join the Cluster After Private Network is Functional After Eviction due to Private Network Problem (Doc ID 1479380.1) 
#2> Bug 13899736 - Node cannot join the cluster after reboot or "interconnect restored" 



*********************************************************************************************************************************
Oracle Support may ask you to take a system dump when a hang occurs. Here are the steps.

System dump gathering:

sqlplus / as sysdba
SQL> oradebug setospid <ospid of diag process>
SQL> oradebug unlimit
SQL> oradebug -g all hanganalyze 3
##..Wait about 1-2 minutes
SQL> oradebug -g all hanganalyze 3
SQL> oradebug -g all dump systemstate 258

If you can NOT connect to the instance as / as sysdba, you can use prelim; however, hanganalyze is not possible with prelim:

sqlplus -prelim '/ as sysdba'
SQL> oradebug setospid <ospid of diag process>
SQL> oradebug unlimit
SQL> oradebug -g all dump systemstate 258

Additional details for collecting system state dumps can be found in MOS note 452358.1.

******************************************************
If your SGA is larger than 100 GB and you are running RAC, follow the recommendations here:

a.      Set _lm_sync_timeout to 1200
           Setting this will prevent some timeouts during reconfiguration and DRM

b.      Set _ksmg_granule_size to 134217728
           Setting this will cut down the time needed to locate the resource for a data block.

c.      Set shared_pool_size to 15% or more of the total SGA size.
        For example, if SGA size is 1 TB, the shared pool size should be at least 150 GB.

d.      Set _gc_policy_minimum to 15000
        There is no need to set _gc_policy_minimum if DRM is disabled by setting _gc_policy_time = 0

e.      Set _lm_tickets to 5000
        Default is 1000.   Allocating more tickets (used for sending messages) avoids issues where we ran out of tickets during the reconfiguration. 

f.      Set gcs_server_processes to twice the default number of lms processes that are allocated.
        The default number of lms processes depends on the number of CPUs/cores that the server has, 
        so please refer to the gcs_server_processes init.ora parameter section in the Oracle Database Reference Guide 
        for the default number of lms processes for your server.  Please make sure that the total number of lms processes 
        of all databases on the server is less than the total number of CPUs/cores on the server.  Please refer to Document 558185.1.

******************************************************
Also consider changing these:

2). Set "_gc_read_mostly_locking"=FALSE
This disables read-mostly locking. The feature helps in an environment where objects are mostly read without being modified: once an object is identified as read-mostly, share locks are granted immediately, so that nodes do not constantly open and close share locks and send lots of messages.
example:
/opt/oradiag/diag/rdbms/phsub/phsub1/trace/phsub1_lmd0_8457.trc
*** 2016-02-03 01:36:21.136
Begin DRM(12080) (swin 1) - READMOSTLY transfer pkey 96063.0 to 1 oscan 1.1
kjiobjscn 1

3). Set _gc_policy_minimum=15000
This reduces the chance of 'affinity' DRM happening. The default in 11.2.0.3 is 1500 (note this was 6000 in 10.2.0.5). Setting it to 15000 means an object must be accessed 250 times a second by a node to initiate affinity; with the default the threshold was 25 (100 in 10.2.0.5).
example:
READMOSTLY object id 96063.0, objscan 1.1, create affinity to instance 1
Total pkey count in this drm 1
* drm quiesce

4). Another workaround is to increase the number of Lock Elements by setting _gc_element_percent=140 (default is 110 in 11.2, 120 in 12.1, and this may increase to 140 in 12.2).
This increases the number of LEs to 140% of the number of data block buffers in the buffer cache, at the cost of a slight increase in shared pool usage.

5). Verify the number of LMS processes - you should have 4-
show parameter gcs_server_processes
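
The sizing figures quoted above can be checked with a few lines of arithmetic. This is only a sketch with hypothetical function names. The division by 60 is inferred from the three pairs given in the text (1500 and 25, 6000 and 100, 15000 and 250) and is not taken from Oracle documentation, and the shared pool example treats 1 TB as 1000 GB, as the 150 GB figure implies.

```python
def min_shared_pool_gb(sga_gb, percent=15):
    """Smallest shared pool (GB) under the '15% of SGA' recommendation."""
    return sga_gb * percent / 100


def affinity_accesses_per_second(gc_policy_minimum):
    """Per-second access rate implied by a _gc_policy_minimum value.

    Assumption: the parameter counts accesses per minute, which is what
    the ratios quoted in the notes suggest.
    """
    return gc_policy_minimum / 60


def lock_elements(buffer_count, gc_element_percent):
    """Number of lock elements for a given count of data block buffers."""
    return buffer_count * gc_element_percent // 100


print(min_shared_pool_gb(1000))             # 1 TB SGA, taken as 1000 GB
print(affinity_accesses_per_second(15000))  # recommended setting
print(affinity_accesses_per_second(1500))   # 11.2.0.3 default
print(lock_elements(1_000_000, 140))        # illustrative buffer count
```

The buffer count of 1,000,000 is just an illustrative input; substitute the number of buffers in your own buffer cache.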

From: https://www.cnblogs.com/lkj371/p/16893611.html
