Of course it’s the Database, I don’t need a Profiler

July 28, 2007 at 6:26 pm

We all know that when any database-backed application starts to behave slowly, it’s always the database’s fault, right? After all, if the application code hasn’t changed, what else could it be?

If you have spent much time around Oracle, you are sure to have seen complex queries where Oracle decides to change its query plan for “no good reason” and application performance changes drastically. Perhaps this is what leads to all the finger pointing at the database, mostly by people who don’t understand how it works.

Last week I was contacted to investigate a performance problem in an application that had everyone baffled. The system works off 15–20 queues of requests, and a batch job is responsible for processing each queue, so there are up to 20 batch jobs running at once. Normal throughput for a batch job is several requests per second, but this had dropped to several per minute and things were backing up.

Lost Time

The DBAs had provided me with a level 1 trace file captured during the slowdown, and it looked very normal – no bad queries in sight. The only strange thing was a lot of wall clock time in the trace compared to CPU seconds (5 or 6 times more elapsed time). My initial reaction was that there must be a locking issue somewhere in the database, so I asked the DBAs to create a level 8 trace file – level 8 adds wait events to the trace – which would surely show me where all this “lost time” had gone.
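For reference, one common way to switch on that kind of extended trace (assuming 10g or later; the SID and SERIAL# below are made up) is something like:

```sql
-- Sketch: enable extended SQL trace. Level 8 = level 1 plus wait events.

-- For your own session:
ALTER SESSION SET EVENTS '10046 trace name context forever, level 8';

-- For another session (e.g. one of the batch jobs), identified by
-- SID and SERIAL# from V$SESSION (hypothetical values here):
BEGIN
  DBMS_MONITOR.SESSION_TRACE_ENABLE(session_id => 42,
                                    serial_num => 1234,
                                    waits      => TRUE,
                                    binds      => FALSE);
END;
/
```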

No such luck: Oracle was not waiting on anything, so where had all this time gone?

Perhaps the database server was under heavy load, you say? Not so: this was a 24-core server that sat 50% idle throughout the test. Hmm.

Luckily, I had read Optimising Oracle Performance and actually understood much of the gibberish inside an SQL trace file. Still mystified about where all the time had gone, I fired up Emacs and used Ruby to find the biggest gap between two database calls, which stood at an impressive 30 seconds between a fetch on cursor 18 and an exec on cursor 19 – i.e. the “lost time” was in the application somewhere between those two calls, not in the database calls themselves.
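A minimal sketch of that kind of script looks something like this (the trace file name is invented, and it assumes a 9i/10g-style trace where tim= values are in microseconds):

```ruby
# Sketch: scan a raw SQL trace for the largest gap in "tim=" values
# between consecutive database calls.
calls = []
File.foreach("prod_ora_12345.trc") do |line|
  next unless line =~ /^(PARSE|EXEC|FETCH|CLOSE)\s+#(\d+):.*\btim=(\d+)/
  calls << { :call => $1, :cursor => $2, :tim => $3.to_i }
end

# The biggest jump in the timestamp between one call and the next is time
# the session spent outside the database (roughly - this ignores the second
# call's own elapsed time).
gaps = calls.each_cons(2).map do |a, b|
  { :gap_us => b[:tim] - a[:tim], :from => a, :to => b }
end

worst = gaps.max_by { |g| g[:gap_us] }
printf("%.1f seconds between %s #%s and %s #%s\n",
       worst[:gap_us] / 1_000_000.0,
       worst[:from][:call], worst[:from][:cursor],
       worst[:to][:call], worst[:to][:cursor])
```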

In this case the line between application and database is a bit greyer, as the application is written in PL/SQL, but the problem turned out to be the OS struggling to open log files in a directory containing over 2 million files!

Power to the Profiler

The moral of this story, apart from the need to housekeep your log files, is that it’s important to be able to profile your application code as well as the database calls.

Instead of writing scripts to parse Oracle trace files, I should have run the code through DBMS_Profiler, which would have solved this mystery in about two minutes and saved much head scratching – and spared other people from making ridiculous suggestions to change irrelevant database parameters!
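Wrapping a suspect batch run in the profiler is only a few lines of PL/SQL. A sketch only – the package and procedure names are made up, and it assumes the plsql_profiler_* tables have already been created (Oracle ships a proftab.sql script for that):

```sql
DECLARE
  l_run  BINARY_INTEGER;
BEGIN
  -- Start a profiler run, do the work, then stop and flush the data.
  DBMS_PROFILER.START_PROFILER(run_comment => 'queue batch job',
                               run_number  => l_run);
  my_batch_pkg.process_queue(p_queue => 7);   -- hypothetical application call
  DBMS_PROFILER.STOP_PROFILER;
  DBMS_OUTPUT.PUT_LINE('Profiler run ' || l_run);
END;
/

-- Then ask where the time actually went, line by line
-- (total_time is stored in nanoseconds):
SELECT u.unit_owner, u.unit_name, d.line#, d.total_occur,
       ROUND(d.total_time / 1e9, 3) AS seconds
FROM   plsql_profiler_units u
JOIN   plsql_profiler_data  d ON d.runid = u.runid
                             AND d.unit_number = u.unit_number
WHERE  u.runid = &run_id
ORDER  BY d.total_time DESC;
```

Sorting by total_time points straight at the guilty lines, whether they are SQL calls or, as in this case, file handling.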


Entry filed under: oracle.
