Over the years I ran into all kind of weird and wonderful backup and restore scenarios. This case has challenged me for a while now and I finally had the chance to check it properly and figure out what’s going on. So here is the story.
The environment
This client has a few database, some of them are quite large, and they are all very similar. Each database has an active standby database, and we’re using the standby for backup. Every couple of months they restore the backup to a test environment, perform an application upgrade and then shutdown the test server (to reduce cost as everything is in the cloud). The next time they want to restore the database, they simply start the server and we drop the current database and restore the updated backup.
To make this process easier I created a script that does everything (it starts an instance, restores the spfile and control file, restores and recovers the database, activates the standby database and does some other stuff so the database will be ready). The script works great but a couple of months ago we started having problems with 2 specific environments (the busiest ones).
The problem
With these environments, usually (but not always) the restore process fails. The failure has a few characteristics:
- New incarnations of the database sometimes appear (and when this happens the recover part fails because the backed up archives belong to a different incarnation)
- The recovery part sometimes get “RMAN-06025 no backup of archived log…” about archive logs that do not exist in the backup
- After I fix the incarnation and restore the correct archives (using “set until logseq” in RMAN), the database still won’t open with “ORA-1194 file 1 needs more recovery to be consistent”
Explanation
I wrote a short post about the incarnation issue, so you can read about it here.
Regarding the RMAN-06025, when backing up with RMAN, these is what RMAN does:
- Backup of archive logs
- Backup of data files
- Backup of archive logs which were created during step 2
- Backup of control file and spfile
What happened here is that these databases are very active, so during step 3 more archives were created. These archives were not backed up as part of the RMAN backup, but the control file that was backed up in step 4 had them registered.
When I restored the database I used the control file from step 4. The “recover database” command wanted to restore the archived it knows about, and this includes the archives that were created during step 3 above. This is the reason I got the “RMAN-06025”.
The last issue (the ORA-1194) puzzled me. I thought it’s related to the same issue as the previous one. If the control file is newer than the other data files we might get this error (even though the data files are in sync), but it didn’t make sense. Restoring a database using a control file from another time (older or newer) is quite common. I decided to try it, and when I did (restoring a database using a newer control file) it simply worked. I opened an SR and just lately got the answer. There is no problem with a regular control file but it doesn’t work for standby control file. And there is an enhancement request for that.
Solution
Once I realized that, the complete solution was quite easy:
- After I drop the database and before I start restoring, I manually delete all orphan archive logs (to avoid the incarnation issue)
- To solve the problem of archive logs that don’t exist in the backup, I restore the control file, then I check which is the last archive log I backed up and in the “recover” command I add “until logseq” to this log file sequence (plus 1, because because RMAN recover until this sequence not including)
- And the last thing was to avoid the recovery issue. On the backup disk we keep a week of backups, so it was very easy to take an older control file and restore it instead of taking the last one. The last one is newer than the data files and includes archive logs that I don’t have, but the one before doesn’t have these problems
Excellent explaination