Do You Know the TFA Collector?

This might be one of the most important posts I have written lately for the community. I mean, I write mostly about interesting technical stuff (well, interesting for me anyway), but this post is for the community and those of you who work with Oracle Support.

Working with Oracle Support

I have been an Oracle DBA for 18 years now, 14 of them as a consultant. I have visited many companies, done lots of troubleshooting and worked with many aspects of the database. During these years I have obviously interacted quite a lot with Oracle Support. The thing with Oracle Support is that they need a lot of information to investigate a problem. Sometimes it is justified, but sometimes it isn’t. And I’m sure you know the feeling when they ask for irrelevant information, or for information you already gave them. It’s frustrating sometimes, and I know people who just don’t trust them anymore.
I was lucky to meet several support people at OOW15: a senior support engineer named Bryan and a support director named Scott (who later introduced me to his VP, Lauren, and we had a very good talk). I say I was lucky because I really believe that these three changed my view of Oracle Support. All in all I probably talked with them for two hours, and it seems that they see their support role the way I see my services when working with my clients: our job is to solve the problem as quickly and as well as we can, even though I don’t always get this feeling from the other side of MOS.
One of the things I suffer from quite a lot when opening an SR is the amount of data I’m asked to supply: AWR, RDA, different logs and trace files, OSWatcher, SQLT and more. Some of them (like RDA, OSWatcher and SQLT) are external tools that we need to download from MOS, install and run before uploading the output (and with RAC, we need to run all of this and get the logs from every node). This is annoying when I just want to troubleshoot a general issue, and can be a real problem when working on a production environment that currently isn’t working well.

TFA Collector

I mentioned this to Scott and Bryan when they talked about the TFA collector, a new tool that gathers a lot of information from the database. The first thing that came to my mind was “Really? Another tool?” But then they explained what it does, and I knew I had to write this post.
The idea behind it is to consolidate all the tools into one, the TFA collector. The first incentive to create this tool was the complexity of RAC environments. The tool connects to all RAC nodes to get the information, then saves all the output (as a zip file) on one node, so uploading it is really easy. But that’s not all: it runs as a daemon to collect some basic information (like OSWatcher) all the time, and you can use it to run SQLT, collect all logs and traces from the server and much more. They are working on adding more tools and more capabilities to it all the time.
It is another tool, right, but hopefully it will eventually be the ONLY tool. So we can install it on our servers and update it periodically (it is shipped with the latest versions of Oracle database), and when we have any issue with the database, we just run it and add the output to the SR. That way, we don’t need to download and install any additional tools, and the support engineers have all the information they need to start analyzing the issue.
TFA is currently supported on most Linux and UNIX environments, but not on Windows (I don’t know if this is planned). All Oracle database, clusterware and GI versions are supported, and it can be downloaded from MOS. Everything you need is in MOS note 1513912.2.

Examples

After I came back from OOW15, I had to download and try it, so here are some examples of using it.
The installation process is quite easy: download the installation file and run it. It requires Java 1.5 or later, and it installs everything in a designated directory.
After the installation is completed, there is a command line utility called tfactl which can be used to run all the tools integrated with TFA.
This is the output of “tfactl toolstatus”:

[root@ora12c bin]# ./tfactl toolstatus
.-------------------------------------.
|       External Support Tools        |
+--------+--------------+-------------+
| Host   | Tool         | Status      |
+--------+--------------+-------------+
| ora12c | alertsummary | DEPLOYED    |
| ora12c | exachk       | DEPLOYED    |
| ora12c | ls           | DEPLOYED    |
| ora12c | pstack       | DEPLOYED    |
| ora12c | orachk       | DEPLOYED    |
| ora12c | sqlt         | DEPLOYED    |
| ora12c | grep         | DEPLOYED    |
| ora12c | summary      | DEPLOYED    |
| ora12c | prw          | NOT RUNNING |
| ora12c | vi           | DEPLOYED    |
| ora12c | tail         | DEPLOYED    |
| ora12c | param        | DEPLOYED    |
| ora12c | dbglevel     | DEPLOYED    |
| ora12c | darda        | DEPLOYED    |
| ora12c | history      | DEPLOYED    |
| ora12c | oratop       | DEPLOYED    |
| ora12c | oswbb        | NOT RUNNING |
| ora12c | changes      | DEPLOYED    |
| ora12c | events       | DEPLOYED    |
| ora12c | ps           | DEPLOYED    |
'--------+--------------+-------------'

These are all the tools TFA supports and their status. DEPLOYED means the tool is installed. RUNNING and NOT RUNNING apply to daemon tools and show whether they are currently running in the background.
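
If you want to check the daemon tools from a script, the table above is easy to filter. This is only a rough sketch: the function name tools_not_running is made up for this example, and it assumes the toolstatus output keeps the exact "|"-separated layout shown above, which may differ between TFA versions.

```shell
# Print the names of the tools whose status is "NOT RUNNING".
# Reads the table printed by "tfactl toolstatus" on standard input.
# Assumption: rows look like "| host | tool | status |", as in the output above.
tools_not_running() {
    awk -F'|' '
        # Data rows split into 5 fields: empty, host, tool, status, empty.
        NF >= 5 {
            tool = $3
            status = $4
            gsub(/^ +| +$/, "", tool)     # trim the padding spaces
            gsub(/^ +| +$/, "", status)
            if (status == "NOT RUNNING") print tool
        }
    '
}

# Typical use (run from the TFA bin directory):
#   ./tfactl toolstatus | tools_not_running
```

With the output shown above, this would list prw and oswbb, the two daemon tools that are not running.
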
Probably the most important command for working with support is diagcollect:

[root@ora12c bin]# ./tfactl diagcollect
Collecting data for the last 4 hours for all components...
Collecting data for all nodes
Collection Id : 20151103075956ora12c
Repository Location in ora12c : /opt/oracle/tfa/repository
Collection monitor will wait up to 30 seconds for collections to start
2015/11/03 08:00:01 IST : Collection Name : tfa_Tue_Nov_3_07_59_56_IST_2015.zip
2015/11/03 08:00:01 IST : Scanning of files for Collection in progress...
2015/11/03 08:00:01 IST : Collecting extra files...
2015/11/03 08:00:06 IST : Getting list of files satisfying time range [11/03/2015 04:00:01 IST, 11/03/2015 08:00:06 IST]
2015/11/03 08:00:06 IST : Starting Thread to identify stored files to collect
2015/11/03 08:00:06 IST : Getting List of Files to Collect
2015/11/03 08:00:06 IST : Finished Getting List of Files to Collect
2015/11/03 08:00:06 IST : Collecting ADR incident files...
2015/11/03 08:00:06 IST : Waiting for collection of extra files
2015/11/03 08:00:07 IST : Completed collection of extra files...
2015/11/03 08:00:11 IST : Completed Zipping of all files
2015/11/03 08:00:11 IST : Cleaning up temporary files
2015/11/03 08:00:11 IST : Finished Cleaning up temporary files
2015/11/03 08:00:11 IST : Finalizing the Collection Zip File
2015/11/03 08:00:11 IST : Finished Finalizing the Collection Zip File
2015/11/03 08:00:11 IST : Total Number of Files checked : 48
2015/11/03 08:00:11 IST : Total Size of all Files Checked : 1.4MB
2015/11/03 08:00:11 IST : Number of files containing required range : 4
2015/11/03 08:00:11 IST : Total Size of Files containing required range : 1MB
2015/11/03 08:00:11 IST : Number of files trimmed : 0
2015/11/03 08:00:11 IST : Total Size of data prior to zip : 1.1MB
2015/11/03 08:00:11 IST : Saved 0kB by trimming files
2015/11/03 08:00:11 IST : Zip file size : 174kB
2015/11/03 08:00:11 IST : Total time taken : 10s
2015/11/03 08:00:11 IST : Completed collection of zip files.
Logs are being collected to: /opt/oracle/tfa/repository/collection_Tue_Nov_3_07_59_56_IST_2015_node_all
/opt/oracle/tfa/repository/collection_Tue_Nov_3_07_59_56_IST_2015_node_all/ora12c.tfa_Tue_Nov_3_07_59_56_IST_2015.zip
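
Since the whole point is to attach the result to the SR, it can be handy to pull the zip file path out of that output. A minimal sketch, assuming the output format matches the run above (the full path printed on a line of its own, ending in .zip); the function name collection_zips is just an illustration and not part of TFA.

```shell
# Print the full path of every collection zip mentioned in diagcollect output.
# Reads the diagcollect output on standard input.
# Assumption: each zip path appears alone on a line and starts with "/",
# as in the last line of the run above.
collection_zips() {
    grep -E '^/.*\.zip$'
}

# Typical use (keeps the full log and prints only the zip paths):
#   ./tfactl diagcollect | tee diagcollect.log | collection_zips
```

The "Collection Name" line also ends in .zip, but it does not start with a slash, so only the real path is printed.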

Some other cool features are:

  • The amount of data kept can be configured and automatically purged
  • When we collect diagnostic data, we can configure a time window for the collection, and TFA will trim the files to collect only the relevant data
  • All tools can be executed using tfactl (interactively or with command line arguments)
  • TFA can be configured to start when the server starts
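
To give an idea of what these features might look like on the command line, here is a hedged sketch. The option names (set reposizeMB, set autopurge, set trimfiles, enable, diagcollect -from/-to) are written as I recall them from the TFA documentation and were not taken from the runs above, so check the tfactl help on your version before relying on them. The run wrapper, the function names and the 10240 value are made up for this example, and by default the script only prints the commands instead of executing them.

```shell
# Path to tfactl; an assumption, adjust it to your installation.
TFACTL="${TFACTL:-./tfactl}"

# Print the command instead of running it, unless DRY_RUN=0 is set.
# (The printed line is for reading only; arguments are not re-quoted.)
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Repository size, purging, trimming and start at boot.
configure_tfa() {
    run "$TFACTL" set reposizeMB=10240   # example value only
    run "$TFACTL" set autopurge=ON       # purge old collections automatically
    run "$TFACTL" set trimfiles=ON       # trim files to the collection window
    run "$TFACTL" enable                 # start TFA when the server starts
}

# Collect diagnostics for a time window.
# $1 = start, $2 = end, e.g. "Nov/03/2015 04:00:00"
collect_window() {
    run "$TFACTL" diagcollect -from "$1" -to "$2"
}
```

Calling configure_tfa as is prints the four commands so they can be reviewed first; running it with DRY_RUN=0 (as root) would execute them.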

Summary

I’m very excited about this tool, mainly because I hope it will be the only one and I won’t need to go to MOS, look for tools, download, install and run them every time I have a problem with my database. I really hope that it will be used by support and adopted by customers. And above all I hope it will reduce the time it takes to analyze an SR and allow quicker resolutions. If you manage to use this tool in real life, let me know.
I would like to thank Lauren, Scott and Bryan for meeting with me and listening to what I had to say about support. I hope I managed to help them as a customer.
As I always say, when you go to Oracle Open World, go and meet the people behind the computer monitors, whether they plan the tools, develop them or support them. It is always interesting.

6 thoughts on “Do You Know the TFA Collector?”

  1. Glad to see you writing about it. Part of my job is to interview DBAs, and I always ask if they have used TFA.
    This tool is not as widely known as it should be; it is a very important diagnostic tool.
    Here is an example: knowing the approximate time of an incident on a RAC cluster, the following command can return all the interesting bits of data that may have contributed to the outage:
    tfactl analyze -from "Nov/29/2015 13:00:00" -to "Nov/29/2015 13:30:00"

    1. Thanks for your comment, it is an important tool, and it seems that not even all support engineers use it yet. I know the support guys that created this tool are doing their best to make everybody’s life easier, and they push the tool internally as well.
      Thanks for the example, there are really many use-cases for it.

  2. I’m curious.. is there a tool that “we”, the non-Oracle Support guys use to read TFA output in a somewhat consolidated fashion?
