GotoDBA Features,Infrastructure,Troubleshooting An Issue with Automatic Flashback of Standby Database

An Issue with Automatic Flashback of Standby Database

I dedicate this post to Pieter Van Puymbroeck, who left us way too early. I actually sent him details about this issue one day before I got the news of him passing. RIP.


Oracle 19c introduced a new feature called “Automatic Flashback of Standby Database”. This feature performs flashback database of the standby database automatically after we issue a flashback database on the primary. Without this feature, managing the standby database should be done manually, but with 19c, it happens automatically and reduce the manual overhead involved when issuing flashback database.

This is a cool feature and a few blogs wrote about it (Including Rene Antunez (on Pythian blog) and Gavin Soorma), and showed how it simply works. However, they all created a basic standby environment, and simply flashed back the primary database. While this is great, it’s not always how things work in the real world. And indeed, a client contacted me with a problem (and steps to reproduce), so here I am reproducing it and presenting the problem.

As I wrote above, I tried to contact Pieter about it, but after I heard the terrible news, I put that aside for a while. I’ll try to contact Oracle soon to try and investigate it (and will update about it).

In this post I’ll follow the flashback database command, and just like in real production environments, I’ll also manage the archives created by the database.

Checking the environment

I have 2 linux servers with 19.9. ora19pri is the primary server (with database unique name tst19a), ora19sec is the secondary (with database unique name tst19b).

[oracle@ora19sec ~]$ dgmgrl /
DGMGRL for Linux: Release 19.0.0.0.0 - Production on Wed Aug 4 16:50:43 2021
Version 19.9.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected to "tst19b"
Connected as SYSDG.
DGMGRL> show configuration

Configuration - tst19

  Protection Mode: MaxPerformance
  Members:
  tst19a - Primary database
    tst19b - Physical standby database

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 58 seconds ago)

DGMGRL> show database tst19b

Database - tst19b

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 0 seconds ago)
  Apply Lag:          0 seconds (computed 0 seconds ago)
  Average Apply Rate: 3.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    tst19b

Database Status:
SUCCESS

All looks good, the standby is fully synchronized and we can continue.

Creating a restore point

Let’s create a guarantee restore point on the primary and save the relevant information about it.

SQL> create restore point GRP guarantee flashback database;

Restore point created.

SQL> select * from v$restore_point;

       SCN DATABASE_INCARNATION# GUA STORAGE_SIZE TIME                             RESTORE_POINT_TIME  PRE NAME  PDB CLE PDB_INCARNATION# REP     CON_ID
---------- --------------------- --- ------------ -------------------------------- ------------------- --- ----- --- --- ---------------- --- ----------
   3257010                     4 YES    209715200 04-AUG-21 04.52.27.000000000 PM                      YES GRP   NO  NO                 0 NO           0

SQL> select * from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARC STATUS           FIRST_CHANGE# FIRST_TIM NEXT_CHANGE# NEXT_TIME     CON_ID
---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- --------- ------------ --------- ----------
         1          1         16  209715200        512          1 NO  CURRENT                3255495 04-AUG-21   9.2954E+18                    0
         2          1         14  209715200        512          1 YES INACTIVE               2872574 02-AUG-21      3081824 03-AUG-21          0
         3          1         15  209715200        512          1 YES INACTIVE               3081824 03-AUG-21      3255495 04-AUG-21          0

From the information above we see that the SCN of the restore point is 3257010 which is in the current logfile (sequence 16). Let’s move on.

Manage archives

Before we try to manage the archives, let’s perform a few log switches:

SQL> alter system switch logfile;

System altered.

SQL> /

System altered.

SQL> /

System altered.

SQL> /

System altered.

SQL> select * from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARC STATUS           FIRST_CHANGE# FIRST_TIM NEXT_CHANGE# NEXT_TIME     CON_ID
---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- --------- ------------ --------- ----------
         1          1         19  209715200        512          1 YES INACTIVE               3257600 04-AUG-21      3257603 04-AUG-21          0
         2          1         20  209715200        512          1 NO  CURRENT                3257603 04-AUG-21   9.2954E+18                    0
         3          1         18  209715200        512          1 YES INACTIVE               3257589 04-AUG-21      3257600 04-AUG-21          0

Now we are on log sequence 20 which starts with SCN 3257603 (while the restore point was 3257010). Let’s verify that the standby is synchronized:

DGMGRL>  show database tst19b

Database - tst19b

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 1 second ago)
  Apply Lag:          0 seconds (computed 1 second ago)
  Average Apply Rate: 3.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    tst19b

Database Status:
SUCCESS

And now let’s backup the database and delete archives on the primary:

[oracle@ora19pri ~]$ rman target /

Recovery Manager: Release 19.0.0.0.0 - Production on Wed Aug 4 16:58:44 2021
Version 19.9.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: TST19 (DBID=4106941138)

RMAN> backup archivelog all delete input;

Starting backup at 04-AUG-21
current log archived
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1 device type=DISK
channel ORA_DISK_1: starting archived log backup set
channel ORA_DISK_1: specifying archived log(s) in backup set
input archived log thread=1 sequence=15 RECID=181 STAMP=1079714477
input archived log thread=1 sequence=16 RECID=182 STAMP=1079715407
input archived log thread=1 sequence=17 RECID=184 STAMP=1079715409
input archived log thread=1 sequence=18 RECID=187 STAMP=1079715413
input archived log thread=1 sequence=19 RECID=188 STAMP=1079715413
input archived log thread=1 sequence=20 RECID=190 STAMP=1079715547
channel ORA_DISK_1: starting piece 1 at 04-AUG-21
channel ORA_DISK_1: finished piece 1 at 04-AUG-21
piece handle=/oracle/backup/6u05m9mr_1_1 tag=TAG20210804T165907 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:07
channel ORA_DISK_1: deleting archived log(s)
archived log file name=/oracle/oradata/FRA/TST19A/archivelog/2021_08_04/o1_mf_1_15_jjp9dkhb_.arc RECID=181 STAMP=1079714477
RMAN-08139: warning: archived redo log not deleted, needed for guaranteed restore point
archived log file name=/oracle/oradata/FRA/TST19A/archivelog/2021_08_04/o1_mf_1_16_jjpb9zdv_.arc thread=1 sequence=16
archived log file name=/oracle/oradata/FRA/TST19A/archivelog/2021_08_04/o1_mf_1_17_jjpbb1ry_.arc RECID=184 STAMP=1079715409
archived log file name=/oracle/oradata/FRA/TST19A/archivelog/2021_08_04/o1_mf_1_18_jjpbb5oy_.arc RECID=187 STAMP=1079715413
RMAN-08120: warning: archived log not deleted, not yet applied by standby
archived log file name=/oracle/oradata/FRA/TST19A/archivelog/2021_08_04/o1_mf_1_19_jjpbb5rl_.arc thread=1 sequence=19
RMAN-08120: warning: archived log not deleted, not yet applied by standby
archived log file name=/oracle/oradata/FRA/TST19A/archivelog/2021_08_04/o1_mf_1_20_jjpbgc4y_.arc thread=1 sequence=20
Finished backup at 04-AUG-21

Starting Control File and SPFILE Autobackup at 04-AUG-21
piece handle=/oracle/oradata/FRA/TST19A/autobackup/2021_08_04/o1_mf_s_1079715555_jjpbgn2d_.bkp comment=NONE
Finished Control File and SPFILE Autobackup at 04-AUG-21

RMAN> list archivelog all;

List of Archived Log Copies for database with db_unique_name TST19A
=====================================================================

Key     Thrd Seq     S Low Time
------- ---- ------- - ---------
182     1    16      A 04-AUG-21
        Name: /oracle/oradata/FRA/TST19A/archivelog/2021_08_04/o1_mf_1_16_jjpb9zdv_.arc

188     1    19      A 04-AUG-21
        Name: /oracle/oradata/FRA/TST19A/archivelog/2021_08_04/o1_mf_1_19_jjpbb5rl_.arc

190     1    20      A 04-AUG-21
        Name: /oracle/oradata/FRA/TST19A/archivelog/2021_08_04/o1_mf_1_20_jjpbgc4y_.arc

As you can see, some of the archives are not deleted (16 for example is required for the restore point as RMAN says. I guess 17 and 18 don’t contain anything so they got deleted).

Let’s do the same on the standby:

[oracle@ora19sec ~]$ rman target /

Recovery Manager: Release 19.0.0.0.0 - Production on Wed Aug 4 17:01:54 2021
Version 19.9.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: TST19 (DBID=4106941138, not open)

RMAN> backup archivelog all delete input;

Starting backup at 04-AUG-21
using target database control file instead of recovery catalog
RMAN-06820: warning: failed to archive current log at primary database
cannot connect to remote database
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=412 device type=DISK
channel ORA_DISK_1: starting archived log backup set
channel ORA_DISK_1: specifying archived log(s) in backup set
input archived log thread=1 sequence=16 RECID=47 STAMP=1079715408
input archived log thread=1 sequence=17 RECID=48 STAMP=1079715410
input archived log thread=1 sequence=18 RECID=49 STAMP=1079715414
input archived log thread=1 sequence=19 RECID=50 STAMP=1079715414
input archived log thread=1 sequence=20 RECID=51 STAMP=1079715547
channel ORA_DISK_1: starting piece 1 at 04-AUG-21
channel ORA_DISK_1: finished piece 1 at 04-AUG-21
piece handle=/oracle/backup/8j05m9s9_1_1 tag=TAG20210804T170201 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
channel ORA_DISK_1: deleting archived log(s)
archived log file name=/oracle/oradata/FRA/TST19B/archivelog/2021_08_04/o1_mf_1_16_jjpb9zz5_.arc RECID=47 STAMP=1079715408
archived log file name=/oracle/oradata/FRA/TST19B/archivelog/2021_08_04/o1_mf_1_17_jjpbb29c_.arc RECID=48 STAMP=1079715410
archived log file name=/oracle/oradata/FRA/TST19B/archivelog/2021_08_04/o1_mf_1_18_jjpbb62w_.arc RECID=49 STAMP=1079715414
archived log file name=/oracle/oradata/FRA/TST19B/archivelog/2021_08_04/o1_mf_1_19_jjpbb6cm_.arc RECID=50 STAMP=1079715414
RMAN-08120: warning: archived log not deleted, not yet applied by standby
archived log file name=/oracle/oradata/FRA/TST19B/archivelog/2021_08_04/o1_mf_1_20_jjpbgclv_.arc thread=1 sequence=20
Finished backup at 04-AUG-21

Starting Control File and SPFILE Autobackup at 04-AUG-21
piece handle=/oracle/oradata/FRA/TST19B/autobackup/2021_08_04/o1_mf_s_1079715544_jjpbmv5x_.bkp comment=NONE
Finished Control File and SPFILE Autobackup at 04-AUG-21

RMAN> list archivelog all;

List of Archived Log Copies for database with db_unique_name TST19B
=====================================================================

Key     Thrd Seq     S Low Time
------- ---- ------- - ---------
51      1    20      A 04-AUG-21
        Name: /oracle/oradata/FRA/TST19B/archivelog/2021_08_04/o1_mf_1_20_jjpbgclv_.arc

Here, all of the archives got deleted. The standby is not aware of the restore point, so it didn’t have a problem deleting these archives. That’s actually not a problem. What should happen when I flashback the primary, is that the archives will be available on the primary and will be sent to the standby (again) for the flashback.

Flashback database

First, let’s verify that we have flashback on for both databases.

Primary:

[oracle@ora19pri ~]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Wed Aug 4 17:03:37 2021
Version 19.9.0.0.0

Copyright (c) 1982, 2020, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.9.0.0.0

SQL> select FLASHBACK_ON from v$database;

FLASHBACK_ON
------------------
YES

And standby:

[oracle@ora19sec ~]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Wed Aug 4 17:05:38 2021
Version 19.9.0.0.0

Copyright (c) 1982, 2020, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.9.0.0.0

SQL> select FLASHBACK_ON from v$database;

FLASHBACK_ON
------------------
YES

And now let’s flashback:

[oracle@ora19pri ~]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Wed Aug 4 17:07:44 2021
Version 19.9.0.0.0

Copyright (c) 1982, 2020, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.9.0.0.0

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area  805304048 bytes
Fixed Size                  9139952 bytes
Variable Size             478150656 bytes
Database Buffers          314572800 bytes
Redo Buffers                3440640 bytes
Database mounted.
SQL> flashback database to restore point grp;

Flashback complete.

SQL> alter database open resetlogs;

Database altered.

Great, but what’s going on with the standby? It should request log seq 16 from the primary, perform the flashback and continue recovering, right? Here is the alert log of the standby (it’s long so I cut some pieces out):

....
2021-08-04T17:11:14.820307-07:00
Incarnation entry added for Branch(resetlogs_id): 1079716215 (tst19b)
2021-08-04T17:11:14.881893-07:00
Setting recovery target incarnation to 5
2021-08-04T17:11:14.882928-07:00
MRP0 (PID:1980): MRP0: Incarnation has changed! Retry recovery...
2021-08-04T17:11:15.012305-07:00
Errors in file /oracle/diag/rdbms/tst19b/tst19b/trace/tst19b_mrp0_1980.trc:
ORA-19906: recovery target incarnation changed during recovery
MRP0 (PID:1980): Managed Standby Recovery not using Real Time Apply
Recovery interrupted!
....
Errors in file /oracle/diag/rdbms/tst19b/tst19b/trace/tst19b_mrp0_1980.trc:
ORA-19909: datafile 1 belongs to an orphan incarnation
ORA-01110: data file 1: '/oracle/oradata/TST19/system01.dbf'
2021-08-04T17:11:36.623483-07:00
MRP0 (PID:1980): Recovery coordinator performing automatic flashback of database to SCN:0x000000000031b2b2 (3257010)
Flashback Restore Start
2021-08-04T17:11:38.749241-07:00
Flashback Restore Complete
Flashback Media Recovery Start
2021-08-04T17:11:38.785482-07:00
Setting recovery target incarnation to 4
2021-08-04T17:11:38.786941-07:00
Serial Media Recovery started
max_pdb is 3
...
Flashback Media Recovery stopped with error 38861
Recovery interrupted!
Some recovered datafiles maybe left media fuzzy
Media recovery may continue but open resetlogs may fail
stopping change tracking
2021-08-04T17:12:44.658925-07:00
Setting recovery target incarnation to 5
2021-08-04T17:12:44.830471-07:00
Errors in file /oracle/diag/rdbms/tst19b/tst19b/trace/tst19b_mrp0_1980.trc:
ORA-00283: recovery session canceled due to errors
ORA-38861: flashback recovery stopped before reaching recovery target
ORA-16016: archived log for thread 1 sequence# 15 unavailable
2021-08-04T17:12:44.830604-07:00
MRP0 (PID:1980): Recovery coordinator encountered one or more errors during automatic flashback on standby
2021-08-04T17:12:44.830719-07:00
Background Media Recovery process shutdown (tst19b)

Check the highlighted line, the recovery process is waiting for log sequence 15! It should not need sequence 15, the restore point was taken when the current log was 16. This means that the standby database is now broken.

Fixing the environment

In order to fix the standby, I simply restored sequence 15 on the primary and restarted the standby apply process:

[oracle@ora19pri ~]$ rman target /

Recovery Manager: Release 19.0.0.0.0 - Production on Wed Aug 4 17:19:24 2021
Version 19.9.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: TST19 (DBID=4106941138)

RMAN> list incarnation;


List of Database Incarnations
DB Key  Inc Key DB Name  DB ID            STATUS  Reset SCN  Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1       1       TST19    4106941138       PARENT  1          12-APR-20
2       2       TST19    4106941138       PARENT  2044456    13-JUL-21
3       3       TST19    4106941138       PARENT  2048069    13-JUL-21
4       4       TST19    4106941138       PARENT  2062321    13-JUL-21
5       5       TST19    4106941138       CURRENT 3257012    04-AUG-21

RMAN> restore archivelog from sequence 15 until sequence 15 incarnation 4;

Starting restore at 04-AUG-21
using channel ORA_DISK_1

channel ORA_DISK_1: starting archived log restore to default destination
channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=15
channel ORA_DISK_1: reading from backup piece /oracle/backup/6u05m9mr_1_1
channel ORA_DISK_1: piece handle=/oracle/backup/6u05m9mr_1_1 tag=TAG20210804T165907
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:03
Finished restore at 04-AUG-21

RMAN> exit

Recovery Manager complete.
[oracle@ora19pri ~]$ dgmgrl /
DGMGRL for Linux: Release 19.0.0.0.0 - Production on Wed Aug 4 17:22:08 2021
Version 19.9.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected to "TST19A"
Connected as SYSDG.
DGMGRL> show database tst19b

Database - tst19b

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      (unknown)
  Apply Lag:          44 minutes 11 seconds (computed 1 second ago)
  Average Apply Rate: 3.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    tst19b

  Database Error(s):
    ORA-16766: Redo Apply is stopped

  Database Warning(s):
    ORA-16853: apply lag has exceeded specified threshold
    ORA-16856: transport lag could not be determined

Database Status:
ERROR

DGMGRL> edit database tst19b set state='apply-off';
Succeeded.
DGMGRL> edit database tst19b set state='apply-on';
Succeeded.
DGMGRL>  show database tst19b

Database - tst19b

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      0 seconds (computed 0 seconds ago)
  Apply Lag:          0 seconds (computed 0 seconds ago)
  Average Apply Rate: 346.00 KByte/s
  Real Time Query:    OFF
  Instance(s):
    tst19b

Database Status:
SUCCESS

And the alert log indeed shows recovery from sequence 15 (the highlighted line):

2021-08-04T17:22:28.692913-07:00
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
2021-08-04T17:22:28.758114-07:00
Errors in file /oracle/diag/rdbms/tst19b/tst19b/trace/tst19b_rsm0_1967.trc:
ORA-16136: Managed Standby Recovery not active
ORA-16136 signalled during: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL...
2021-08-04T17:22:34.583294-07:00
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT  NODELAY
2021-08-04T17:22:34.583677-07:00
Attempt to start background Managed Standby Recovery process (tst19b)
Starting background process MRP0
2021-08-04T17:22:34.596533-07:00
MRP0 started with pid=46, OS id=1842
2021-08-04T17:22:34.598332-07:00
Background Managed Standby Recovery process started (tst19b)
2021-08-04T17:22:39.604011-07:00
Serial Media Recovery started
MRP0 (PID:1842): Managed Standby Recovery starting Real Time Apply
max_pdb is 3
Media Recovery start incarnation depth : 1, target inc# : 5, irscn : 3257011
stopping change tracking
MRP0 (PID:1842): Media Recovery Waiting for B-1077825879.T-1.S-15
MRP0 (PID:1842): Fetching gap from B-1077825879.T-1.S-15 to B-1077825879.T-1.S-15
2021-08-04T17:22:40.066663-07:00
 rfs (PID:1228): Opened log for T-1.S-15 dbid 4106941138 branch 1077825879
2021-08-04T17:22:40.605017-07:00
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT  NODELAY
2021-08-04T17:22:42.573434-07:00
 rfs (PID:1228): Archived Log entry 54 added for B-1077825879.T-1.S-15 ID 0x0 LAD:2
2021-08-04T17:22:43.210476-07:00
MRP0 (PID:1842): Media Recovery Log /oracle/oradata/FRA/TST19B/archivelog/2021_08_04/o1_mf_1_15_jjpctj1w_.arc
2021-08-04T17:22:43.212578-07:00
MRP0 (PID:1842): Media Recovery Log /oracle/oradata/FRA/TST19B/archivelog/2021_08_04/o1_mf_1_16_jjpc5vj9_.arc
2021-08-04T17:22:46.308970-07:00
MRP0 (PID:1842): Media Recovery Log /oracle/oradata/FRA/TST19B/archivelog/2021_08_04/o1_mf_1_1_jjpc535q_.arc
MRP0 (PID:1842): Media Recovery Waiting for T-1.S-2 (in transit)
2021-08-04T17:22:46.335327-07:00
Recovery of Online Redo Log: Thread 1 Group 6 Seq 2 Reading mem 0
  Mem# 0: /oracle/oradata/FRA/TST19B/onlinelog/o1_mf_6_jgwc67xh_.log

Conclusion

The automatic flashback is a cool feature and saves time when you’re using a standby database. However, there are still some glitches with it. Be aware that if you constantly delete archives you might run into this case and if the needed archives are unavailable, you’ll have to recreate the standby.

As I said I’ll try to contact Oracle regarding this and will keep you updated.

1 thought on “An Issue with Automatic Flashback of Standby Database”

Leave a Reply

Your email address will not be published. Required fields are marked *

Related Post