GotoDBA Database Thoughts Who Can We Trust?

Who Can We Trust?

Searching google for answers is something we all do with every question or request for information we have. I sometimes wonder how much of this data is simply wrong. This can be because people write stuff they don’t know about, people are sharing false information (that was published by others by accident or on purpose), people with agenda that don’t care about facts, and so on.

This is the same in the professional world as well. Anyone can start a blog (I did it, so there is your proof) and anybody can go to forums and answer questions. With the professional world (as well as in other areas) you need to know if you can trust this person or not. There are several database related sites on the internet that me (and many others) don’t trust and don’t like (don’t ask me for names please), while other sites are excellent and very trustful. The junior DBA or someone that needs answers but is not a DBA at all, wouldn’t know which site is which. they will probably click the few first that appear on google.

There is usually no way to know who is behind a site, blog or answer. There is usually no way to validate that they are what they claim, so we basically need to decide for ourselves and take chances. For example, I claim to be an Oracle ACE and a senior consultant. You can check that I’m an ACE on Oracle’s site, and you can check my past positions on LinkedIn, for example. But does that mean I know what I’m talking about? Does the fact that I am a consultant for almost 20 years say I’m a good one?

I have several examples for stuff on the internet and it sometimes annoys me to see these things. The worst example I had was a few years ago with one of my biggest clients. They have a NetApp storage (they are using NFS) and they had a network problem that disconnected the storage from database server. This is a known issue, once the network is disconnected, NetApp keeps a certain lock on the files and Oracle can’t start because it has a problem with the control files (the error in the alert log says that the control file cannot be locked because it’s already in use). When the sysadmin (a great guy) called me, I told him that I know the problem, but I didn’t remember the command that he needs to use in the NetApp console to release the lock. I was driving and my laptop wasn’t with me so I couldn’t look it up. He opened google and searched the Oracle error message to look for the command. I was on the line when he said that there is a forum thread with the same error, so he read it to me. It was something like this (I’m paraphrasing as I don’t remember, but this is how it sounded to me):
q: “I have a problem starting  my database. I had a network problem, so my NFS got disconnected from the database server and now I cannot start the database. Oracle claims that it cannot lock the control files, and I see them in the directory. What should I do?”
a: “Delete the control files and try again”
“WHAT?” I shouted into the phone “DID HE JUST SAY TO DELETE THE CONTROL FILES? Whatever you do or try, DO NOT delete the control files”. The sysadmin said “Of course I won’t unless you tell me”, but I was shocked that this guy suggested it in a forum just like that.
For the “fun” of it, he continue reading. The response was “OK, now Oracle claims that it can’t find the control files”, how surprising.
At the end we found the command. The command “priv set diag; lock status -f” will show the existing locks and “priv set diag; sm_mon -l <server_ip>” will release the locks for the specific IP.

Another thing that I saw lately was a discussion on LinkedIn. This is not a case of people suggesting bad things, but it will explain that using forums or even LinkedIn might be problematic. If, by any chance, you know the discussion or took part of it, I just want to explain the problem, I don’t mean to insult or criticize anyone (as opposed to the person who suggested to delete the control files in the forum thread).

The thread talked about a problem when starting the database. The error was “ORA-01113: file 1 needs media recovery”, and file 1 is SYSTEM tablespace, so this is indeed a problem. The thing with discussions like that is that everyone (me included) are trying to give a short answer to help in this situation. Well, this is tricky since there are many reasons for the system datafile to need recovery, these are a few:

  1. The file got completely corrupted because of disk problem or whatever. In this case restore will probably be the only solution.
  2. The database crashed while it was in “backup mode”. In this case all we need to do is to execute “alter database end backup;” and open the database (it actually happened to me in the past). Just as a side note, if you read my post how to diagnose a problem this is an example for why I first check the logs before I do anything. Starting a recovery in this case will be the wrong solution. Doing the right thing will bring the database up and running in no time.
  3. The database simply crashed and there was a write problem to the datafiles. A regular database recovery will take the required information from the redo logs and will solve the problem.

You see what I mean? Even if I wrote that in the discussion, many other people will write other stuff as well, everyone with their own experience and cases they saw in the past. Some of the answers were actually suggesting to check the alert log and get some more information, which is great, but with so many answers, what should the guy who asked the question do?

This is not simple, the amount of data out there is unbelievable, and some (if not most) is inaccurate (to say the least). We all need to realize who we can trust. We need to be suspicious when asking something in a forum, LinkedIn or Facebook (or even Oracle forums), and think for ourselves (or consult someone) if the answer looks reasonable.

8 thoughts on “Who Can We Trust?”

  1. This is about Computers.Same things happens in social media too. It has many suggestions,information(unnecessary,not correct,never tested,unreliable,completely wrong) which are shared and trusted. Actually it is a waste of time.Many people without verifying forward/trust it and comes to a judgment.I really hate these things.

  2. Exactly! It happens a lot in health care (as @swesley_perth replied on twitter), politics and in every other field as well. In some cases it can cause serious damage (to many things, and in our case, your database)

  3. This is also a big problem when your company is in agreement with a big “support” company and
    and then this is the only one you actually call. You usually get support from the one who is on guard but usually you never know his or hers expertise….

  4. Fabi, there are great experts out there, and I obviously can’t name them all. I can say that you can probably trust the Oracle ACE community, as Oracle checks the ACE community activities very well. People that are invited to speak in conferences often, write books, write articles for Oracle, and more should be trustful.
    Probably the best resources would be Tim Hall at http://www.oracle-base.com and Jonathan Lewis at jonathanlewis.wordpress.com. Besides them, http://www.oaktable.net is a network of Oracle experts (ACEs or not), and I don’t know all of them, but the ones I know from the list are definitely experts you can trust.
    Last but not least is the famous asktom.oracle.com, and even though Tom Kyte itself is not part of the team anymore, these Oracle employees give excellent information.
    I hope that helps.
    Liron

  5. That’s a nice post Liron and lots of good comments.
    I think that a good sign of someone to trust is – if they will admit error. Someone who blogs about how X works and then gets a comment that corrects or refines that explanation – simply their willingness to allow that comment to be seen is a good sign. If they take on board the correction I think is a real sign of someone to trust.
    Anyone who blogs and says “in this earlier post I was wrong and the correct explanation is…” is also a good sign.
    This is because even the very best make mistakes. And the reason they are the very best is because they learn from those mistakes.
    On the other side of the coin, a strong indicator of a bogus “expert” is someone who will not provide any evidence for the technical “truth” they put forward. “I just know” or “in my experience” is not evidence, it is an opinion. Refusal to provide proof is tantamount to an admission of ignorance, I feel.

  6. Thanks Martin, I agree. However, in some forums (or when asking a question on LinkedIn or Facebook) you might not have the chance to see such behaviors. In these cases, you will need to use your gut feeling sometimes and be very very cautious.

Leave a Reply

Your email address will not be published. Required fields are marked *

Related Post