In an environment I recently upgraded from 11.2 to 18c, I found that some backups were not being deleted on the standby database. The environment has a single-instance primary database (with GI+ASM) and a single-instance standby database (with GI+ASM), and we run backups on the standby database only (to take load off the primary).
What I saw
As I said, I run the RMAN backups on the standby side, and the script executes “delete obsolete” after the backup (retention is configured to redundancy 2).
In a well-behaved environment, redundancy 2 keeps 4-7 days of backups on disk (I have full backups on Sundays and Wednesdays and incremental backups on all other days). In this environment, after the upgrade, I saw that RMAN kept a few backup files (not all) from every backup for much longer (at the time I looked it was 2 weeks, and it didn’t seem to be deleting these files at all).
What I initially did
My first assumption was that something had changed or was broken in 18c RMAN. I checked the RMAN logs and v$backup_datafile, but everything looked OK. I checked the redundancy settings (as I thought something might have changed there), but they looked fine. I also searched MOS for RMAN issues but couldn’t find anything.
The research
The next step was to see what these backup pieces actually were. I listed them (“list backuppiece” in RMAN to find the backup set, then “list backupset” to see its content). All of them contained archive logs.
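The same information can also be pulled from the control file views instead of the RMAN listing. The query below is only a sketch under a few assumptions: it is run on the standby (where the backups are taken and recorded), the RMAN repository is the control file, and the column aliases are made up for readability.

```sql
-- Sketch: list available backup pieces that contain archived logs, oldest first.
-- Assumption: run on the standby as a privileged user; aliases are illustrative.
SELECT p.handle,
       p.completion_time,
       MIN(r.sequence#)     AS first_seq,
       MAX(r.sequence#)     AS last_seq,
       MIN(r.first_change#) AS first_scn,
       MAX(r.next_change#)  AS next_scn
FROM   v$backup_piece   p
JOIN   v$backup_redolog r
       ON  r.set_stamp = p.set_stamp
       AND r.set_count = p.set_count
WHERE  p.status = 'A'            -- 'A' = available (not deleted or expired)
GROUP  BY p.handle, p.completion_time
ORDER  BY p.completion_time;
```

Pieces that show up here with a completion time far older than the retention window are the ones worth investigating.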
Then I thought maybe there is an issue with the “delete obsolete” command, so I decided to check what RMAN thinks about the restore process. I used “restore database preview” in RMAN to see which backup files RMAN wants to restore and realized that RMAN actually needs the archive logs in these files to restore the database. OK, this is not a problem with the retention or the “delete obsolete” command. RMAN really thinks it needs these files, but why? It doesn’t make sense. I have a full backup of all files from 2 days ago, but Oracle still needs archives from 2 weeks ago.
Things are getting clearer
Another MOS search turned up note 282617.1. This note describes archive logs not being deleted by RMAN because a datafile is offline, so its SCN is old. A file with an old SCN requires Oracle to apply all archive logs from that SCN onward. This was not my case, as I didn’t have offline files, but it made me check the SCN of the standby database files (select file#,checkpoint_change# from v$datafile_header). The SCN of all the files was the same, but when I compared it to the primary I realized that the SCN of the standby files was much older than that of the primary files.
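To make that comparison easier, the datafile header check can be condensed into one row per database. This is a minimal sketch, assuming it is run once on the primary and once on the standby and the two results are compared by eye; the aliases are illustrative.

```sql
-- Sketch: oldest datafile header checkpoint next to the database's current SCN.
-- Assumption: run separately on primary and standby, then compare the output.
SELECT d.database_role,
       d.current_scn,
       MIN(h.checkpoint_change#) AS min_file_ckpt_scn,
       MIN(h.checkpoint_time)    AS min_file_ckpt_time
FROM   v$database d
CROSS  JOIN v$datafile_header h
GROUP  BY d.database_role, d.current_scn;
```

A checkpoint time on the standby that is days or weeks behind, while redo apply is running normally, matches the symptom described here.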
Another MOS search then led me to bug 29056767. This bug affects 18c and causes the datafile checkpoint information not to be updated on the standby database. It is fixed in 18.8, 19.4 and 20.1. The workaround is to set a hidden parameter on the standby database (alter system set "_time_based_rcv_ckpt_target"=0;) and restart redo apply.
Now it all makes sense. If the SCN of the files is old, the recovery process will restore these files with the old SCN and will require all archive logs from this SCN forward, even if there is a full backup of the files from a later time.
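The reasoning above can be turned into a rough check of which backed-up archive logs are still "needed" from the stale SCN onward. This is only an approximation of the explanation, not RMAN's actual algorithm, and it assumes the query is run on the standby; “restore database preview” remains the authoritative answer.

```sql
-- Sketch: archived logs in backups that cover SCNs at or after the oldest
-- datafile header checkpoint. Approximation only, not RMAN's own logic.
SELECT r.thread#,
       r.sequence#,
       r.first_change#,
       r.next_change#,
       r.first_time
FROM   v$backup_redolog r
WHERE  r.next_change# > (SELECT MIN(checkpoint_change#)
                         FROM   v$datafile_header)
ORDER  BY r.thread#, r.sequence#;
```

If this list reaches back weeks even though full backups exist from a few days ago, the old datafile SCN is the likely reason RMAN will not mark those pieces obsolete.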
Solution
I set the parameter on the standby database (alter system set "_time_based_rcv_ckpt_target"=0;) and restarted redo apply (using dgmgrl: edit database <stby> set state='apply-off', then: edit database <stby> set state='apply-on').
After that I checked and saw that the SCN was being updated. The last step was to run “delete obsolete” in RMAN and confirm that the old archive log backups were deleted.
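One simple way to confirm that the checkpoint is moving again is to sample the datafile headers a few minutes apart. A minimal sketch, assuming it is run on the standby while redo apply is active; the aliases are illustrative.

```sql
-- Sketch: run twice, a few minutes apart, on the standby.
-- Assumption: with the workaround in place, both values should move forward.
SELECT MIN(checkpoint_change#) AS min_file_ckpt_scn,
       MIN(checkpoint_time)    AS min_file_ckpt_time,
       SYSDATE                 AS checked_at
FROM   v$datafile_header;
```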
Nice article.
Actually, the same thing happened to us. We have been facing this since upgrading the database from 10.2.0.5 to 12.1.0.2: RMAN did not delete the archive log backups along with the incremental backups.
It deletes the full backups as per the retention policy, but it keeps the archive log backups for an entire week even after another full backup has been taken.
So did you figure out what was causing this? Is this the expected behavior in 12.1 or some kind of a bug?