I get ask, by application developers, “how do you optimize MySQL”. I do lots of things that don’t really relate to a developer. I analyze the percent of queries are being pulled from cache for instance. What a developer can do to optimize the SQL they develop is a different questions. So here is a quick list of things applications developers should know about MySQL.
Explain will analyze your query.
This example shows the possible indexes (keys) that could be used and the index that was selected. 2,262 rows where selected and then sorted (Using file sorts) and one record was returned (limit 1).
mysql> explain SELECT 5/9*(temp_F-32) as t, 5/9*(dewpt_F-32) as td, speed_mps as spd, dir > where stn='KLDM' and date_time'2010-02-12 17:45' > order by ABS( date_time - CAST('2010-02-12 18:00:00' as datetime) ) limit 1;
+----+-------------+----------+------+-----------------------+------+---------+-------+------+-----------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------+------+-----------------------+------+---------+-------+------+-----------------------------+ | 1 | SIMPLE | metar_nc | ref | PRIMARY,stn,date_time | stn | 8 | const | 2262 | Using where; Using filesort | +----+-------------+----------+------+-----------------------+------+---------+-------+------+-----------------------------+
Using profiling can give you even more information; Don’t forget to turn it off with a ‘set profiling=0’ when you are done.
mysql> set profiling=1; +--------------------+----------+ | Status | Duration | +--------------------+----------+ | starting | 0.000110 | | Opening tables | 0.000014 | | query end | 0.000004 | | freeing items | 0.000008 | | logging slow query | 0.000002 | | cleaning up | 0.000003 | +--------------------+----------+ 6 rows in set (0.01 sec) mysql> set profiling=0;
Indexing Basics
Avoiding disk reads is the name of the game. Indexes are presorted and small. Two or three disk reads of an index can point to a large amount of data.
- MySQL Can only use Prefixes of the index
mysql> SELECT AVG(age) FROM user GROUP BY city;
This is a complex query that needs to scan all the rows. You can make it traverse shorter indexes by adding an index on (city,age) - Index (A,B) can be used for
WHERE A=5 , WHERE A=6 AND B=5 , WHERE A=7 AND B>5
It can’t be used for – WHERE B=6 AND B - Only Equality/List allows second key part usage
WHERE A=5 AND B>6 – will use 2 key parts
IN (1,2) AND B=2 – will use 2 key parts - A>5 and B=2 will use 1 key part only
The B=2 will be checked while reading row/index only
A=5 ORDER BY B – will use the index
A>5 ORDER BY B – will NOT use the index - For simple cross reference look ups, add the data to the index to skip the data read.
SELECT name FROM login=”Jack123”;
If this is a very common part of your code, make the index (login,name). When the index is read the data is in memory. Don’t add every column you just double the disk space and magnify the disk access.
More Tips
- Check that all tables have PRIMARY KEYs on columns with high cardinality. Primary keys must be unique.
A column like, `gender` low cardinality (selectivity), an id column (Int – auto increment) is a good candidate to become a primary key. - All joins (inter, outer, ‘,’) should have indexes.
- Fields you often search on (appear frequently in WHERE, ORDER BY or GROUP BY clauses) need indexes.
But don’t add too many: the worst thing you can do is to add an index on every column of a table. - Don’t use DISTINCT when you have or could use GROUP BY
- Open to the server just before you are going to use it. Unless you are using a persistent connection library, don’t open a database connections and then run minutes of calculations before making your query. You may find your connections has been “gone away” before you make your query.
- When your index many columns, create a hash column. Then your query will look like:
SELECT *
FROM table
WHERE hash_column = MD5( CONCAT(col1, col2) )
AND col1=’aaa’ AND col2=’bbb’; - Use less RAM by declaring columns only as large as they need to be to hold the values stored in them.
Use CHAR type when possible (instead of VARCHAR, BLOB or TEXT) — when values of a column have constant length: MD5-hash (32 symbols) or ICAO or IATA airport code (4 and 3 symbols). This is also true for indexes. If only the last 4 symbols are unique index only that part. - Use SQL_NO_CACHE when you are SELECT-ing frequently updated data or large sets of data. This way you will not kick good data out of the cache.
- Avoid wildcards at the start of LIKE queries. (LIKE ‘%find%’). Finding ‘1234find’ in 10000 records requires up to 40,000 searches.
- Normalizing redundant data is good but don’t split a table because you have too many columns.
- Think of storing users sessions data (or any non-critical / high access data) in MEMORY table — it’s very fast.
- Divide complex queries into several simpler ones — they have more chances to be cached, so will be quicker.
- A column must be declared as NOT NULL if it really is. This speeds up table traversing.
- If you usually retrieve rows in the same order like expr1, expr2, …, make ALTER TABLE … ORDER BY expr1, expr2, … to optimize the table.
- Don’t use PHP loop to fetch rows from database one by one just because you can — use IN instead, e.g.
SELECT *
FROM `table`
WHERE `id` IN (1,7,13,42); - Reuse your database connections. Opening a new connection to the database will add one or more seconds to your query.
In PHP use mysql_pconnect() to open a persistent connection with mod_php. Perl provides persistent connections with Apache::DBI with mod_perl. Python does not have persistent connections in mod_python. But you can maintain them in your application. (http://www.modpython.org/FAQ/faqw.py) - When inserting data, insert only those values that differs from the default. This reduces the query parsing time.
- Use INSERT DELAYED or INSERT LOW_PRIORITY (for MyISAM) to write to your change log table.
Also, if it’s MyISAM, you can add DELAY_KEY_WRITE=1 option — this makes index updates faster because they are not flushed to disk until the table is closed. - For your web application, images and other binary assets should normally be stored as files.
- That is, store only a reference to the file rather than the file itself in the database.
Tweet
Nic Zero wrote:
Typo in “WHERE B=6 AND B
Link | May 6th, 2010 at 2:02 am
Nic Zero wrote:
Tips 1 & 2 are peculiar. A PK has to be unique — cardinality doesn’t get any bigger than that!
Link | May 6th, 2010 at 2:05 am
admin wrote:
Face palm. I updated it.
Link | May 6th, 2010 at 9:36 am