This is the post that will complete my Real Application Testing series (for now at least). In this post I’ll talk about the DB Replay.
If you haven’t read my previous post about RAT, you should probably start with them. The first post was introduction to RAT and the second was about the SPA.
As I explained before, RAT allows us to perform all kind of tests before making a change to the production database. The first component is the SPA which is a “unit testing” for SQL statements in order to see how the change affects the SQLs. The second component is the DB Replay which I’ll discuss here.
Overview
Database Replay allows us to test the overall load on the system after we make the change. The goal is to answer the question: “How is the change going to affect the overall performance and the general behavior of the system?”
With SPA we checked each SQL statement individually, here we will run the real production load to see how the system behaves. SPA showed us differences in execution time, execution plans, I/O and memory access, etc. With DB Replay we will look for changes derived from overall load, such as Buffer Cache and Shared Pool management and behavior, CPU load, Wait Events, latches, locking and more.
The process includes capturing the actual load from the production, copy the capture files to the test server and replay them. We need to consider that here, as in SPA, we should have the test environment to be identical to the production (hardware and software), we also should have exactly the same data that was in the production during the capture, so we won’t have different behavior because of different data in the database.
Some of the changes that can be tested with DB Replay are:
- Hardware changes (storage subsystem, CPU, memory, etc.)
- OS changes (patches, versions and changing the OS altogether)
- Oracle memory allocation change
- Moving from single instance to RAC or vice versa
- Initialization parameter change
The process
1. Capturing the Load
The capture should be done on the production, obviously. It is highly recommended to save the exact time or even the SCN when we started the capture, so we can restore the test to that exact point.
The capture process is made by DB Control (EM Express in 12c), Grid Control (or Cloud Control) or using the DBMS_WORKLOAD_CAPTURE package. The capture files are saved to a directory that we created and configured in the capture process.
Several things to know:
- We can configure filters (so that the capture will include or exclude certain activity)
- There is no dependency between the capture files and the OS or any other infrastructure. We can replay the files on any environment
- Database jobs and background processes are not captured (the test environment should have jobs configured and run them by itself)
- Currently these are not captured: SQL Loader direct load, distributed transactions and flashback queries
- The overhead is relatively small (4%-5%)
- The capture require some memory resources, about additional 64KB for each PGA
2. Copy to the Test Environment
After the capture is completed, we need to copy the files. These are simple OS files and can be copied at the OS level in any way.
3. Processing the Capture Files
After we copy the capture files, we need to process them. It is done on the test environment (to reduce production overhead).
The process is done to prepare the load for replay and is dependent on the target database version. After processing, we can replay the files on any DB at the same version. In order to run the load on different version, we will have to process them again.
4. Replay
At this stage we are ready to replay the captured load. As I said before, it is highly recommended to restore the production database to the same point in time when the capture started.
The replay itself is done by a tool called wrc. This is a small utility that comes with Oracle installations, and it will be the one to connect to the database and run the capture files. The wrc should run on a different server that the database server (remember that the clients connect to the database from a different machines as well). It is a software that can run up to 50 concurrent threads and it needs access to the directory with the processed capture files and the ability to connect to the database.
The first step will be to run the wrc in calibrate mode. In this mode, Oracle will read the processed files and will show us how many machines with wrc we need so the wrc client side will not be the bottleneck.
After calibration we will allocate machines to run the wrc from. When we start the wrc, it will connect to the database and wait for instructions. Starting the replay is done by using the DB Console, EM Express, Grid Control, Cloud Control or DBMS_WORKLOAD_REPLAY package, and it will give the commands to all of the connected wrc to start running the load simultaneously.
Several points related to the replay process:
- You can replay a load on a standby database. In order to do that you need to change the standby database to snapshot standby database and after the replay, change it back to regular standby.
- Filters can be applied in the replay process as well.
- There is a filter to replay only queries without any DML and DDL operations.
5. Comparison
During the replay, dedicated dashboards in the web tools (DB Console, EM Express, Grid Control or Cloud Control) will show us the progress of the replay process and the difference between the production and test runs. After the replay is completed we can compare the results, including AWR reports.
All of this data is available using the DBMS_WORKLOAD_CAPTURE and DBMS_WORKLOAD_REPLAY packages.
6. Additional Iteration
Just like SPA, we can restore the database again and run the replay as many time as we like. When we find a problem, we can investigate it, fix it and then run the test again and again until we are ready for production.
Limitations and Comments
- As in SPA, we should have test environment that will be as identical as possible to the production, hardware, software, database structure and database data. if something is different, there can be a difference between the results on the test environment and the production.
- The capture process only capture PL/SQL calls, not the PL/SQL code that is executed. For example, assume that there is different behavior between Sundays and Mondays in a procedure. If we captured on Sunday and replayed on Monday, the procedure will run the code of Monday, and not Sunday which was executed during the capture.
New Features in 12c
DB Replay was introduced in Oracle 11g and can run a real load from the production database, so how about running a subset of the load or multiplication of it? This is exactly what Oracle added in 12c.
In this version we can run a workload subset on the test environment, which means run the real load but partially. This is still a behavior of a production environment, but less loaded. I guess it can help for general testing purposes where the test servers are less powerful that the production ones and can’t handle the real load. Besides that, I personally can’t think of any other use.
Another feature is capacity planning, and this is much more useful. With capacity planning we can enlarge the load during the replay, and that allows us to simulate the real production behavior but with heavier load (like more users). This can help us to plan the hardware needs according to capacity growth.
There is also another new feature call DB Replay Concurrent. This feature allows us to gather several capture files from different databases and run them together on the same target database. The goal here is testing consolidation of several databases, of course.
Summary
This post is the last one in the short RAT series. When RAT was introduced, it was mentioned only in upgrade related tasks, and this is not enough to justify the cost and effort of using it.
I hope that in this short series you understood that it is not only for upgrades and the added value of such a tool is tremendous. The ability to test almost any change we wish to perform on the production is amazing and I sure that any DBA who is responsible for production environment appreciate Oracle’s effort in this tool.
I hope you enjoyed,
Liron