
Oracle Data Guard Explained – Step By Step

Oracle Data Guard consists of many components. In this post I'll describe an environment that helped me understand how all the components work. It is a 12.1.0.2 database (with the April 2018 PSU, so it should be pretty stable) with a physical standby in a remote site. The primary has 4 redo groups of 10GB each (single member) and the network between the sites (primary to standby) is slow (only a few MB/s). Because the files are large and the network is slow, every step of the standby operation takes time. This taught me quite a lot about how DG works, step by step.

Data Guard Overview

Before we start diving in, a quick Data Guard overview. The common scenario for a physical standby is async real-time apply, and this is what I'll explain here ("real time apply" = redo is applied as soon as it is copied to the standby; "async" = the primary doesn't wait for confirmation from the standby before completing a commit). With this configuration (and actually in any DG configuration, for that matter), there are two separate steps:

  • Log Transport – move redo data from the primary to the standby
  • Redo Apply – apply the redo on the standby database
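As a rough sketch, this async real-time-apply setup boils down to two commands. The service name "stby" and the DB_UNIQUE_NAME here are placeholders, not from the original environment:

```sql
-- on the primary: ship redo asynchronously ("stby" is a placeholder TNS alias)
alter system set log_archive_dest_2=
  'SERVICE=stby ASYNC NOAFFIRM VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=stby'
  scope=both;

-- on the standby: start real-time apply (redo is applied from the standby redo logs as they fill)
alter database recover managed standby database using current logfile disconnect from session;
```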

Log Transport

Under normal circumstances, a TT process (Redo Transport Slave) sends the redo stream to the standby site. On the standby site, an RFS process (Remote File Server) receives the data and writes it to the standby redo logs.

When there is a lag for some reason, the standby will use FAL (Fetch Archive Log) information to connect to the primary and request the missing archive logs. The primary will allocate an ARC process to copy the archive to the standby (it is received by an RFS process as well).
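To see whether the standby has detected a gap that FAL would need to resolve, you can check the following (a hedged sketch; v$archive_gap simply returns no rows when there is no gap):

```sql
-- on the standby: any detected gap in received archive logs
select thread#, low_sequence#, high_sequence# from v$archive_gap;

-- where the standby will fetch missing archives from
show parameter fal_server
```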

How can we see this?

standby: lsof -p 35260 |grep sea
primary: netstat -np|grep 28215 (based on the output of the previous command)

Query the standby database to find a specific RFS process

SQL> select process,pid,status from v$managed_standby where process='RFS';

PROCESS   PID                      STATUS
--------- ------------------------ ------------
RFS       38028                    RECEIVING
RFS       50159                    RECEIVING
RFS       32011                    RECEIVING
RFS       33444                    RECEIVING

The PID column is the OS process ID, so we’ll continue with shell on the standby (using PID 50159):

[oracle@standby:/tmp] $ lsof -p 50159 |grep TCP
oracle_50 50159 oracle   16u  IPv6 2552326962        0t0        TCP standby:ncube-lm->primary:10698 (ESTABLISHED)

Now we see the port this RFS connects to in the primary, so we’ll do the last step using shell on the primary (using port 10698):

[oracle@primary:~] $ netstat -np|grep 10698
tcp        0 1504800 1.1.1.1:10698         2.2.2.2:1521          ESTABLISHED 56242/ora_arc4_DB1

And this is how we see that the ARC4 process on the primary is copying data to this specific RFS process.

Redo Apply

This part happens on the standby database and is basically recovering the standby database from redo information. The process responsible for this is called MRP0 (Managed Standby Recovery Process). This process reads archive logs and standby redo logs and applies them to the database files.

You can see its progress in v$managed_standby:

SQL> select process,status,sequence#,block#
  2  from v$managed_standby where process='MRP0';

PROCESS   STATUS        SEQUENCE#     BLOCK#
--------- ------------ ---------- ----------
MRP0      WAIT_FOR_LOG      33883      23871
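Besides this block-level progress, the overall transport and apply lag can be checked from v$dataguard_stats (a quick sketch):

```sql
-- on the standby: how far behind transport and apply are
select name, value, time_computed
from v$dataguard_stats
where name in ('transport lag', 'apply lag');
```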

Log Transport Process

Now that we know how things work, let's see what really happens. In this environment the redo logs were 10GB each, which means it takes time to archive them, even locally. What I saw is that when a redo log is full, Oracle first archives the file locally and only then starts copying it to the standby database.

As I said before, the network was slow, so copying a 10GB file to the standby took a little more than 1 hour. In many cases, during this time the primary managed to create more archives. What I saw was that LGWR (or actually the TT process) took part in copying the files: it shipped the current redo log while other archived logs were copied by ARCH processes.
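On the primary you can watch who is shipping what with a query like this (a sketch; the sender process name in v$managed_standby varies between versions, e.g. LNS vs. TTnn):

```sql
-- on the primary: processes currently sending redo to the standby
select process, status, sequence#, block#
from v$managed_standby
where process in ('LNS', 'ARCH');
```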

One last thing: as far as I understand, in theory, when ARCH copies an archive, it copies it to the archive location on the standby side. Then MRP can access the file and apply it. What I saw is that even when ARCH was copying a file, in some cases the data went into the standby redo logs (I followed this in v$standby_log, where you can see the sequence# and the bytes used).
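This is the kind of query I mean for watching the standby redo logs fill up (a sketch; the USED column is as I recall it in 12.1, check the view definition in your version):

```sql
-- on the standby: which SRLs are active and how full they are
select group#, thread#, sequence#, used, status
from v$standby_log;
```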

I discussed that with Pieter van Puymbroeck (PM of Oracle DG) and this is what he said:

“When RFS receives live redo, it attempts to write it to an SRL. When RFS receives gap redo, it will use an SRL ONLY if there is an SRL already containing that branch/thread/sequence. Otherwise, RFS will write to an archive log (AL) file.

When RFS writes to an SRL, that file is available to MRP because the file already exists. When RFS writes to an archive log file, that file is not available to the MRP until all of the redo has been received and RFS has closed and committed the file. If RFS dies in the middle of receiving redo into an archive log file, the file is deleted. But redo written to an SRL is not lost. It is in the file as long as the disk write completes.”

Redo Apply Process

Following the process in v$managed_standby, I saw that even when the data came from an ARCH process and RFS wrote it to a standby log, MRP0 couldn't apply it in real time. Every file copied by ARCH had to be fully copied before MRP0 started applying it.

Pieter explained this:

“Once the MRP has access to the file (SRL or AL), it needs to know how much redo it can read. For an AL, it looks at the NAB field in the header block (block 1) of the AL. For an SRL, it uses the RTA boundary array or the NAB field in the header block of the SRL. The RTA boundary array will only contain the number of blocks that can be read if the SRL is for live redo. So if gap redo is being received into an SRL, the RTA boundary array will NOT contain the number of blocks that can be read and the NAB field in the header block will not be correct until the entire file has been received because it is only updated after all the redo has been received.”

One more thing that happened to me is that sometimes MRP0 just got stuck in the middle of a log. In this case it waited for the entire redo file to be copied and then continued. When this happened I got an "RTA boundary" message in the MRP0 trace file. I guess this happens when the primary switches a log file and starts writing to a new one, but I haven't checked that. I also have an SR open on this; I will update this post when I have more information.

Minimizing the Lag

If you read carefully and followed the logic, you probably realized that we had a constant lag issue. Copying an entire redo log took over an hour, and during that time the primary created more archives, which led to a really huge lag (a few hours during peaks), and we wanted to reduce it.

The problem is that at peak time we created more redo than we could copy. This led to a situation where MRP0 couldn't apply in real time, so it waited for an entire redo log to be copied (which took more than an hour), and because each file took so long to copy, we had a large lag all the time.

Our solution was to reduce the size of the redo logs to 2GB. That way, even if we create a few GB of redo in a short while, we can copy the files in parallel, allowing a shorter delay until MRP0 can start applying. We still have lag during peaks, of course, but we minimize it because MRP can start applying sooner (after 2GB instead of 10GB) while we keep copying more archives in parallel.
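For reference, shrinking the online redo logs means adding new, smaller groups and dropping the old ones once they become inactive; a rough sketch (group numbers and file paths are placeholders):

```sql
-- add new 2GB groups
alter database add logfile group 5 ('/u01/oradata/DB1/redo05.log') size 2g;

-- switch until an old 10GB group shows INACTIVE in v$log, then drop it
alter system switch logfile;
alter database drop logfile group 1;
```

Keep in mind that the standby redo logs should match the new online redo log size (with one extra group), so they need to be recreated as well.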

Script to Calculate Transfer Rate

This is not a perfect script, but I use it to measure the transfer rate. It's quite simple, so it has the following limitations:

  • Assumes a redo block size of 512 bytes
  • Works only if the set of files being copied does not change during the script's execution (if Oracle starts copying a new archive or finishes copying an old one, it will return an error message instead of the rate)

#!/bin/bash
# measure redo transfer rate by sampling v$managed_standby twice
t=30
if [ -n "$1" ]; then
  t=$1
fi
first=$(sqlplus -s / as sysdba << EOF
set pages 0 lines 1000 echo off
select sum(sequence#),sum(block#) from v\$managed_standby where process='RFS';
exit
EOF
)
echo "sleeping for $t seconds"
sleep $t
second=$(sqlplus -s / as sysdba << EOF
set pages 0 lines 1000 echo off
select sum(sequence#),sum(block#) from v\$managed_standby where process='RFS';
exit
EOF
)
first_seq=$(echo ${first} | awk '{print $1}')
first_sum=$(echo ${first} | awk '{print $2}')
second_seq=$(echo ${second} | awk '{print $1}')
second_sum=$(echo ${second} | awk '{print $2}')
# if the sequence numbers changed, a file was opened or closed mid-sample
if [ "${first_seq}" != "${second_seq}" ]; then
  echo "something has changed, try again"
else
  diff=$((second_sum - first_sum))
  echo "blocks copied in $t seconds is $diff"
  # each redo block is 512 bytes (0.5 KB)
  diff_mb=$(bc <<< "scale=2; ${diff}*0.5/${t}/1024")
  echo "${diff_mb} MB/s"
fi
