Storage Data Migration with Oracle ASM

Scenario

The customer runs a two-node Oracle RAC cluster in a VMware virtualization environment, connected over iSCSI to an aging IBM FAStT array that carries a lightweight database workload. The customer has purchased a new Huawei OceanStor 5500 V3 array and plans to carve out part of its capacity to replace the old IBM array behind the RAC cluster, eliminating the impact that the old array's insufficient capacity and performance have on the business systems.

Environment:
OS: Red Hat Enterprise Linux Server release 6.4
DB: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
Old storage: IBM FAStT700
New storage: Huawei OceanStor 5500 V3

Migration approach: the old array's LUNs are presented to the Oracle database VMs and managed by ASM (Automatic Storage Management). For the migration, new LUNs from the 5500 V3 are mapped to the Oracle RAC hosts, and the disks in each ASM disk group are then swapped, letting ASM's dynamic rebalancing move the data from the old array to the new one.

Migration Procedure

1. Plan the new storage system

Create the disk domain, storage pool, LUNs, and host mappings according to the plan; the process is not detailed here.


2. Map the new LUNs to both RAC nodes and verify that each node recognizes them

2.1. Discover the new LUNs on the host:

# service iscsi restart

Confirm that the newly mapped LUNs are visible to the system:

# fdisk -l

2.2. Obtain each new disk's SCSI ID for UDEV management:

# Print a udev rule line for each newly added device (sde, sdf, sdg):
for i in e f g; do
  echo "KERNEL==\"sd*\", BUS==\"scsi\", PROGRAM==\"/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/\$name\", RESULT==\"`/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/sd$i`\", NAME=\"asm-disk$i\", OWNER=\"grid\", GROUP=\"asmadmin\", MODE=\"0660\""
done

# The loop prints the following rule lines:
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="369c37f41007ca7d15a24f37100000041", NAME="asm-diske", OWNER="grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="369c37f41007ca7d15a2f915400000042", NAME="asm-diskf", OWNER="grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="369c37f41007ca7d15af086ec00000045", NAME="asm-diskg", OWNER="grid", GROUP="asmadmin", MODE="0660"

# Append the lines above to the configuration file below, then restart the UDEV service:
# vi /etc/udev/rules.d/99-oracleasm.rules
# /sbin/start_udev
# ls -l /dev/asm*
brw-rw---- 1 grid asmadmin 8, 16 Oct 20 17:43 /dev/asm-diske
brw-rw---- 1 grid asmadmin 8, 32 Oct 20 17:43 /dev/asm-diskf
brw-rw---- 1 grid asmadmin 8, 48 Oct 20 17:43 /dev/asm-diskg

# Make sure the newly added disks are recognized by the OS and carry the correct owner, group, and permissions.
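The rule lines above follow a fixed template, so they can also be produced by a small helper; a minimal sketch (the function name `mk_asm_rule` is hypothetical, and on a real host the SCSI ID would come from `/sbin/scsi_id` as in step 2.2):

```shell
# Hypothetical helper: given a device letter and its SCSI ID, print the
# matching udev rule line in the same format used above.
mk_asm_rule() {
  local letter="$1" scsi_id="$2"
  printf 'KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="%s", NAME="asm-disk%s", OWNER="grid", GROUP="asmadmin", MODE="0660"\n' \
    "$scsi_id" "$letter"
}

# Example with the SCSI ID of /dev/sde from the output above:
mk_asm_rule e 369c37f41007ca7d15a24f37100000041
```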

Because the RAC cluster services were not stopped while the new LUNs were being discovered on the host, some cluster resources failed, and the recovery cost a fair amount of time.

The specific errors were as follows:

###################### Error 1 ########################

[grid@rac01 ~]$ sqlplus "/as sysdba"
SQL> select NAME , STATE FROM V$ASM_DISKGROUP;
NAME                           STATE
------------------------------ -----------
CRS                            DISMOUNTED
DATA                           MOUNTED
FRA                            MOUNTED

SQL>
SQL> alter diskgroup crs mount;
alter diskgroup crs mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15260: permission denied on ASM disk group

# Cause: the session was connected AS SYSDBA, which is not sufficient to mount a disk group (hence ORA-15260); reconnecting AS SYSASM lets the command run normally:
[grid@rac01 ~]$ sqlplus "/as sysasm"

SQL> alter diskgroup CRS mount;
Diskgroup altered.

SQL> select NAME , STATE FROM V$ASM_DISKGROUP;

NAME                           STATE
------------------------------ -----------
CRS                            MOUNTED
DATA                           MOUNTED
FRA                            MOUNTED

###################### Error 2 ########################

# The old LUNs are attached to the hosts over iSCSI. Restarting the iSCSI service to discover the new disks cut off reads and writes to the old disks, so the database and the Oracle RAC stack failed. (A non-disruptive rescan such as `iscsiadm -m session --rescan` would have found the new LUNs without dropping the existing sessions.) The errors were:
[grid@rac01 ~]$ crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

[grid@rac01 ~]$ crsctl status resource -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

[root@rac01 bin]# ps -ef | grep crsd.bin | grep -v grep
[root@rac01 bin]#

# After finding the RAC cluster services down, try to restart them:
[root@rac01 bin]# ./crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.

[root@rac01 bin]# ./crsctl stop crs
CRS-2796: The command may not proceed when Cluster Ready Services is not running
CRS-4687: Shutdown command has completed with errors.
CRS-4000: Command Stop failed, or completed with errors.

# With both a normal start and a normal stop failing, force-stop the stack:
[root@rac01 bin]# ./crsctl stop crs -f
# This step takes a while and prints errors along the way; they can be ignored.

# Check crsd.bin, ocssd.bin, and evmd.bin; no related processes are left:
[root@rac01 bin]# ps -ef | grep crsd.bin | grep -v grep
[root@rac01 bin]# ps -ef | grep cssd.bin | grep -v grep
[root@rac01 bin]# ps -ef | grep evmd.bin | grep -v grep

# Start CRS again; this time it succeeds:
[root@rac01 bin]# ./crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

# Startup also takes some time; confirm that each daemon has come up:
[root@rac01 bin]# ps -ef | grep crsd.bin | grep -v grep
[root@rac01 bin]# ps -ef | grep cssd.bin | grep -v grep
grid   1596   1 1 17:37 ?    00:00:00 /oracle/app/11.2.0/grid/bin/ocssd.bin
[root@rac01 bin]# ps -ef | grep evmd.bin | grep -v grep
grid   1699   1 0 17:37 ?    00:00:00 /oracle/app/11.2.0/grid/bin/evmd.bin
[root@rac01 bin]# ps -ef | grep crsd.bin | grep -v grep
[root@rac01 bin]# ps -ef | grep crsd.bin | grep -v grep
[root@rac01 bin]# ps -ef | grep crsd.bin | grep -v grep
root   1828   1 2 17:38 ?    00:00:00 /oracle/app/11.2.0/grid/bin/crsd.bin reboot

# Resource status can now be queried normally:
[root@rac01 bin]# ./crsctl stat resource -t
# At this point, node 1 has been recovered after a rather rough ride.
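The repeated `ps -ef | grep ... | grep -v grep` checks above can be scripted as a polling loop; a small sketch (the helper name `wait_gone` is made up; `pgrep -x` matches the exact process name):

```shell
# Hypothetical helper: block until none of the named daemons are running,
# then report success.
wait_gone() {
  local name
  for name in "$@"; do
    while pgrep -x "$name" >/dev/null 2>&1; do
      sleep 5
    done
  done
  echo "all gone: $*"
}

# Wait for the clusterware daemons to disappear after a forced stop:
wait_gone crsd.bin ocssd.bin evmd.bin
```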

2.3. Prepare the newly mapped LUNs on the second node:

a. Stop the cluster stack on the current node:

[root@rac02 bin]# pwd
/oracle/app/11.2.0/grid/bin
[root@rac02 bin]# ./crsctl stop cluster

b. Confirm the related processes are gone:

[root@rac02 bin]# ps -ef | grep crsd.bin | grep -v grep
[root@rac02 bin]# ps -ef | grep pmon | grep -v grep

c. Discover the new storage:

[root@rac02 bin]# service iscsi restart

Append the output from step 2.2 to the /etc/udev/rules.d/99-oracleasm.rules configuration file and start the UDEV service:

[root@rac02 bin]# /sbin/start_udev
[root@rac02 bin]# ls -l /dev/asm*
brw-rw---- 1 grid asmadmin 8, 16 Oct 20 17:43 /dev/asm-diske
brw-rw---- 1 grid asmadmin 8, 32 Oct 20 17:43 /dev/asm-diskf
brw-rw---- 1 grid asmadmin 8, 48 Oct 20 17:43 /dev/asm-diskg

Make sure the newly added disks are recognized by the OS with the correct permissions.

d. Restart the cluster stack on the current node:

[root@rac02 bin]# ./crsctl start cluster

e. Check the daemons and services:

[root@rac02 bin]# ./crsctl status resource -t
# Output omitted.

[root@rac02 bin]# ps -ef | grep pmon | grep -v grep
grid   6728   1 0 18:10 ?    00:00:00 asm_pmon_+ASM2
oracle  7118   1 0 18:10 ?    00:00:00 ora_pmon_orcl2

[root@rac02 bin]# ps -ef | grep crsd.bin | grep -v grep
root   6788   1 1 18:10 ?    00:00:00 /oracle/app/11.2.0/grid/bin/crsd.bin reboot

# At this point the newly added LUNs are recognized on both nodes.
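Since the same UDEV rules are reused on both nodes, it is worth confirming that both nodes derived identical SCSI IDs before moving on. A sketch that diffs two saved listings (the file names and IDs here are illustrative; each file would hold one `scsi_id` output per line captured on a node):

```shell
# On each node one would capture the IDs, e.g.:
#   for d in e f g; do /sbin/scsi_id --whitelisted --replace-whitespace \
#       --device=/dev/sd$d; done > /tmp/scsi_ids.$(hostname)
# Illustrative stand-in data for the two nodes:
printf '%s\n' id-aaa id-bbb id-ccc > /tmp/scsi_ids.rac01
printf '%s\n' id-aaa id-bbb id-ccc > /tmp/scsi_ids.rac02

# No diff output means both nodes see the same LUNs in the same order:
diff /tmp/scsi_ids.rac01 /tmp/scsi_ids.rac02 && echo "SCSI IDs match"
```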

3. Replace the disks inside ASM:

3.1. Review the current disk information:

Use the following query to view and record the current disk group information:

SQL> select group_number,name,state,total_mb,free_mb from v$asm_diskgroup;
GROUP_NUMBER NAME       STATE         TOTAL_MB    FREE_MB
------------ ---------- ----------- ---------- ----------
           1 CRS        MOUNTED           5270       3746
           2 DATA       CONNECTED        51200      19206
           3 FRA        CONNECTED         8192       7402

Use the following query to list the ASM disks that belong to those disk groups:

SQL> select header_status,name,failgroup,path,total_mb,free_mb from v$asm_disk where GROUP_NUMBER in(1,2,3);
HEADER_STATU NAME       FAILGROUP  PATH                   TOTAL_MB    FREE_MB
------------ ---------- ---------- -------------------- ---------- ----------
MEMBER       CRS_0000   CRS_0000   /dev/asm-diskb             5270       3746
MEMBER       FRA_0000   FRA_0000   /dev/asm-diskc             8192       7402
MEMBER       DATA_0000  DATA_0000  /dev/asm-diskd            51200      19206

Use the following query to list every disk ASM can see, including available candidates:

SQL> select header_status,name,failgroup,path,total_mb,free_mb from v$asm_disk;
HEADER_STATU NAME       FAILGROUP  PATH                   TOTAL_MB    FREE_MB
------------ ---------- ---------- -------------------- ---------- ----------
MEMBER       CRS_0000   CRS_0000   /dev/asm-diskb             5270       3746
MEMBER       FRA_0000   FRA_0000   /dev/asm-diskc             8192       7402
MEMBER       DATA_0000  DATA_0000  /dev/asm-diskd            51200      19196
CANDIDATE                          /dev/asm-diske                0          0
CANDIDATE                          /dev/asm-diskf                0          0
CANDIDATE                          /dev/asm-diskg                0          0
6 rows selected.

3.2. Add the new disks to the existing disk groups to begin the migration:

Work out which new disk replaces which old disk by matching capacities, then add each new disk to the corresponding disk group. (ASM can also add the new disk and drop the old one in a single ALTER DISKGROUP statement, which runs only one rebalance pass; here the two steps are kept separate.) First confirm the sizes:

SQL> select header_status,name,failgroup,path,os_mb,total_mb,free_mb from v$asm_disk;
HEADER_STATU NAME       FAILGROUP  PATH                  OS_MB   TOTAL_MB    FREE_MB
------------ ---------- ---------- -------------------- ------ ---------- ----------
MEMBER       CRS_0000   CRS_0000   /dev/asm-diskb         5270       5270       3746
MEMBER       FRA_0000   FRA_0000   /dev/asm-diskc         8192       8192       7402
MEMBER       DATA_0000  DATA_0000  /dev/asm-diskd        51200      51200      19196
CANDIDATE                          /dev/asm-diske            0          0          0
CANDIDATE                          /dev/asm-diskf            0          0          0
CANDIDATE                          /dev/asm-diskg            0          0          0
6 rows selected.

SQL> alter diskgroup CRS add disk '/dev/asm-diskg' name CRS_0001;
SQL> alter diskgroup FRA add disk '/dev/asm-diskf' name FRA_0001;
SQL> alter diskgroup DATA add disk '/dev/asm-diske' name DATA_0001;

After the commands run, ASM performs the rebalance automatically in the background; check whether it has finished:

SQL> select operation,state,power from v$asm_operation;
OPERATION  STAT      POWER
---------- ---- ----------
REBAL      RUN       1

SQL> select operation,state,power from v$asm_operation;
no rows selected

When the query returns no rows, the automatic rebalance is complete.
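Instead of re-running the query by hand, the wait can be scripted as a polling loop. A sketch under the assumption that a command printing the number of active ASM operations is available (the helper `wait_until_zero` and the inline `echo 0` stand-in are hypothetical; on a real system the command would be a `sqlplus -S` one-liner running `select count(*) from v$asm_operation;` as the grid user):

```shell
# Hypothetical helper: re-run a counting command until it prints 0.
wait_until_zero() {
  local cmd="$1" interval="${2:-30}"
  until [ "$($cmd)" -eq 0 ] 2>/dev/null; do
    sleep "$interval"
  done
}

# Stand-in check command; replace "echo 0" with a sqlplus query against
# v$asm_operation on a real system.
wait_until_zero "echo 0" 1 && echo "rebalance finished"
```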

Once the background rebalance has finished, look at the ASM disks again:

SQL> select header_status,name,failgroup,path,total_mb,free_mb from v$asm_disk;
HEADER_STATU NAME       FAILGROUP  PATH                   TOTAL_MB    FREE_MB
------------ ---------- ---------- -------------------- ---------- ----------
MEMBER       CRS_0000   CRS_0000   /dev/asm-diskb             5270       4442
MEMBER       CRS_0001   CRS_0001   /dev/asm-diskg             5270       4574
MEMBER       FRA_0000   FRA_0000   /dev/asm-diskc             8192       7754
MEMBER       DATA_0001  DATA_0001  /dev/asm-diske            51350      35342
MEMBER       FRA_0001   FRA_0001   /dev/asm-diskf             8342       7900
MEMBER       DATA_0000  DATA_0000  /dev/asm-diskd            51200      35192
6 rows selected.

Drop the old ASM disks (the ones on the legacy array):

SQL> alter diskgroup CRS drop disk CRS_0000;
SQL> alter diskgroup FRA drop disk FRA_0000;
SQL> alter diskgroup DATA drop disk DATA_0000;

Wait for the automatic rebalance to finish:

SQL> select operation,state,power from v$asm_operation;
OPERATION  STAT      POWER
---------- ---- ----------
REBAL      RUN       1

SQL> select operation,state,power from v$asm_operation;
no rows selected

Disk status after the old (legacy-array) ASM disks have been dropped:

SQL> select header_status,name,failgroup,path,total_mb,free_mb from v$asm_disk;
HEADER_STATU NAME       FAILGROUP  PATH                   TOTAL_MB    FREE_MB
------------ ---------- ---------- -------------------- ---------- ----------
FORMER                             /dev/asm-diskb                0          0
FORMER                             /dev/asm-diskc                0          0
FORMER                             /dev/asm-diskd                0          0
MEMBER       CRS_0001   CRS_0001   /dev/asm-diskg             5270       3620
MEMBER       DATA_0001  DATA_0001  /dev/asm-diske            51350      19336
MEMBER       FRA_0001   FRA_0001   /dev/asm-diskf             8342       7464
6 rows selected.

3.3. Remove the old disks' information at the OS level:

# (Note: perform the following steps on one node at a time; finish one node completely before starting on the other.)
a. Stop the Oracle cluster stack: # crsctl stop cluster
b. Edit the /etc/udev/rules.d/99-oracleasm.rules file and delete the old disks' rule lines;
c. Remove the old LUNs' host mappings on the storage side;
d. Reboot the RAC node: shutdown -r now
e. After the reboot the Oracle cluster stack starts automatically; check the status of each service:
# crs_stat -t
# crsctl status resource -t
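Step b can be done with a single sed delete keyed on the old disks' NAME fields. A sketch against a throwaway copy (the sample file contents and SCSI IDs are illustrative; on a real node the file is /etc/udev/rules.d/99-oracleasm.rules and the old disks are asm-diskb/c/d):

```shell
# Build a throwaway copy of the rules file with one old and one new rule:
rules=$(mktemp)
cat > "$rules" <<'EOF'
KERNEL=="sd*", RESULT=="old-scsi-id", NAME="asm-diskb", OWNER="grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", RESULT=="new-scsi-id", NAME="asm-diske", OWNER="grid", GROUP="asmadmin", MODE="0660"
EOF

# Delete the rules for the old disks (b, c, d) in place:
sed -i '/NAME="asm-disk[bcd]"/d' "$rules"

cat "$rules"   # only the asm-diske rule remains
```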

All services are running normally, and the replacement of the old storage by the new is complete.
