So you have inherited a MySQL server and no one know what applications are use what databases. Space is every tight and every night transactions logs almost fill your hard disk. The server was once used as a replication server and it seems every database the company uses is duplicated here.
You could do a full dump, drop everything and wait for the requests to restore. (This might be OK on a test database.)
We could ask MySQL for all the databases that have tables that have been updated in the last 30 days.
$ mysql information_schema -e \ "select DISTINCT TABLE_SCHEMA as NAME from TABLES WHERE UPDATE_TIME IS NOT NULL and UPDATE_TIME > NOW() - INTERVAL 30 DAY" \ > Databases
But this only gives a list of databases that have changed. What about tables that are only read from?
Bin-Logs don’t help because they too only have updates. What we really need is a long snapshot of the general log. But, starting the general log will just fill your disk space even more.
As long as all queries are remote we can use tcpdump.
Filter all the MySQL connections for ‘FROM tablenames’.
$ tcpdump -i eth1 -s 0 -l -w - dst port 3306 | strings | \ sed -n 's/\(.*\)\(from.*\)/\2/p' | awk '{print $2}' > TableList $ cat TableList | sort | uniq > ShortList
Some of the queries may reference databases.tablename so we can filter out the database names and add them to your Database list.
$ cat TableList | sed -n 's/\./ /p' | awk '{print $1}' | sort | uniq >> Databases
Now for the queries that only use table names. We need to find the database each belongs to. This isn’t completely accurate because the table name may be in more then one databases.
$ for x in `cat TableList | sort | uniq` ; do mysql information_schema -e "select TABLE_SCHEMA from TABLES where TABLE_NAME='$x'\G" \ | grep TABLE | awk '{print $2}' ; done | sort | uniq >> Databases
To create the final list of database that should not be dropped.
$ sort Databases | uniq > KEEPTHESE ; cat KEEPTHESE
Now you can do the deed with some confidence your not removing anything being used.
P.S. Did I tell you, Never do anything you can’t reverse. Make a BACKUP FIRST!
Tweet
Baron Schwartz wrote:
Two ways that are even more accurate:
1. Use pt-query-digest with tcpdump and –group-by=tables.
2. Use Percona Server and look at INFORMATION_SCHEMA.TABLE_STATISTICS to see what tables are used.
Link | September 30th, 2011 at 1:04 pm
admin wrote:
Yup – Took me some time to work out the HOWTO but that works!
sudo /usr/sbin/tcpdump -i eth1 port 3306 -s 65535 -x -n -q -tttt -c 100000 | pt-query-digest –group-by=tables –type=tcpdump > report.txt
Link | October 2nd, 2011 at 11:19 pm