The Slow Query Log is Not

Cross posted from Gavin's personal blog, Information is Contagious.

The slow query log is one of the most misused features of MySQL. I say this because I often talk to people who are using it, but they're not getting everything they could from it. It's an awesomely powerful source of information -- when used correctly it can make performance optimization much simpler, and enable you to really understand what your mysql server spends its time doing.

Slow is faster than you think 

The most misunderstood part of the query log is the "slow" part. People often set the long query time to some high value (default 10 seconds) and wait for a few outlier queries to trickle in. At most they will reduce the threshold to 1 second to be really aggressive. Here's the a secret: those queries aren't really the ones that matter.

A query that runs for 60 seconds and hits your server twice a day, may look ripe for optimization. It probably is. You can probably get that query to run in less than a second and pat yourself on the back for improving it by several orders of magnitude. That's great, but that long query only represents a tiny fraction of what keeps your server busy.

On the other hand you probably have queries which may take 50 milliseconds to run, but they're executing once per second. In aggregate your server will be spending 4,320 seconds executing that query.Even if you improve that query to take 25 ms, a much more modest improvement, you're still cutting that 4k seconds of execution time in half.

If you never gather stats from your slow query log for less than 1 second, then you would never see that 50 ms query and never optimize it. Those are the queries that are most important to fix!

Pre-MySQL 5.5 the minimum time you could set was 1 second. Percona builds allowed you to set it to sub-second values which was huge. Now it's built-in to the Oracle MySQL builds. It was a such a powerful feature that it was impossible to ignore. I think lots of people have been trained to never think of it because old versions of mysql didn't allow it. Now they do - so use it!

You don't have to leave it on all the time

So you're sold on gathering data with a slower threshold, but you say "My server can't possibly handle writing that much slow log, I have thousands of queries per second!"

That's where sampling comes in. Don't capture everything all the time. Capture everything part of the time. I only gather about 1% of all the queries that hit the server.

That can be as simple as:
[sql]mysql -e "SET GLOBAL long_query_time=0.0; SET GLOBAL slow_log=1" && sleep(36) && 
mysql -e "SET GLOBAL slow_log=0"[/sql]

Percona builds also allow for rate limiting slow query log events. This is fantastic, because it means I can set it to log 1 in 4 events, and then leave my collection on for 4 times as long and still get 1% overall. This lets me sample over a longer period of time and get better coverage.

Process those logs!

Now that you're getting a firehose of performance data about your server, it's not really practical to simply read them top to bottom. You need a way to summarize the information to extract the important information.Not only that, but these collections are like a gold mine of information -- you don't want to throw them away!

You don't have to save the raw .log files; the best way to keep the data is to use pt-query-digest to process the log and insert the results into a database. This digest keeps stats about queries for each collection period. It intelligently strips out all the variable information to identify which queries are the same, and aggregates information about all of them. This compacts that huge file into just a few hundred rows. Keeping these digested samples is a good way for you to build a history of performance on your database.

Visualise the data

Now here's the part where I self promote: I wrote an awesome tool (we use it extensively at Box) to take this data that has been collected and digested, and give you a nice web interface for exploring and reporting on it. It's called Anemometer; it takes only minutes to get it up and running and it's a really powerful tool combined with the slow query log and pt-query-digest.

It allows you to quickly see which are the most expensive queries running on your system, even across multiple hosts!


That was really a lot of information, but getting slow query logging right will make a huge difference in how you handle performance optimization in your sever. Let's put it all together:

  1. When you turn on the slow query log, gather everything. That is set long_query_time = 0.0 or 0.001 or something similarly small.
  2. Sample your slow logs. You don't need to leave them on all the time. Set up a cron job to do this regularly so you have consistent data gathering. Aim for 1% sampling to start, you may never need more than that.
  3. Use pt-query-digest to process and store your data.
  4. Use Anemometer to visualise the data.