mysql取随机数之索引使用

Posted by Vincent on May 29, 2011

之前写过关于mysql取随机数的记录,当时没有去分析表里数据量的分布情况,所以当执行计划不同时,实际的效果有非常大的差别。
[cc lang='sql']dbaeyes 06:47:52>select count(*) ,status from act_info group by status;
+----------+--------+
| count(*) | status |
+----------+--------+
| 989978 | 0 |
| 20 | 1 |
| 1 | 2 |
| 1 | 3 |
+----------+--------+[/cc]
表中的status的大部分数据都是0(初始化状态)
用之前文章中提到的方式 索引 idx_info_status(status,id)
[cc lang='sql']dbaeyes 06:53:30>explain SELECT t1.*
-> FROM
-> (SELECT ROUND(RAND() * (
-> (SELECT MAX(id) FROM act_info WHERE status = 0)-
-> (SELECT MIN(id) FROM act_info WHERE status = 0))
-> +(SELECT MIN(id) FROM act_info WHERE status = 0)) AS id) AS t2,act_info AS t1
-> WHERE t1.id >= t2.id AND t1.status = 0
-> ORDER BY t1.id LIMIT 3 ;
+----+-------------+------------+--------+--------------------------------+------------------------+---------+-------+--------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+--------------------------------+------------------------+---------+-------+--------+------------------------------+
| 1 | PRIMARY | | system | NULL | NULL | NULL | NULL | 1 | |
| 1 | PRIMARY | t1 | ref | PRIMARY,idx_info_stat | idx_info_stat| 4 | const | 257438 | Using where |
| 2 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
| 5 | SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
| 4 | SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
| 3 | SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
+----+-------------+------------+--------+--------------------------------+------------------------+---------+-------+--------+------------------------------+
6 rows in set (0.00 sec)[/cc]
通过explain可以看到可选主键primary或者索引方式,mysql选择了索引扫描,扫描记录是有25w
直接执行时间:
3 rows in set (3.81 sec)

通过改用强制索引(主键)的方式

[cc lang='sql']dbaeyes 06:53:48>explain SELECT t1.*
-> FROM
-> (SELECT ROUND(RAND() * (
-> (SELECT MAX(id) FROM act_info WHERE status = 0)-
-> (SELECT MIN(id) FROM act_info WHERE status = 0))
-> +(SELECT MIN(id) FROM act_info WHERE status = 0)) AS id) AS t2,act_info AS t1 force index (primary)
-> WHERE t1.id >= t2.id AND t1.status = 0
-> ORDER BY t1.id LIMIT 3 ;
+----+-------------+------------+--------+---------------+---------+---------+------+--------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+---------+---------+------+--------+------------------------------+
| 1 | PRIMARY | | system | NULL | NULL | NULL | NULL | 1 | |
| 1 | PRIMARY | t1 | range | PRIMARY | PRIMARY | 8 | NULL | 494509 | Using where |
| 2 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
| 5 | SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
| 4 | SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
| 3 | SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
+----+-------------+------------+--------+---------------+---------+---------+------+--------+------------------------------+
6 rows in set (0.00 sec)[/cc]
通过主键扫描的记录数有49w之多
sql时间执行时间
3 rows in set (0.00 sec)
explain看到是通过索引扫描的记录数比主键要少,但是实际消耗的时间远远超过通过主键扫描。
理解:上面两条sql的性能差距只是在id > ? and status = 0

当表中大量数据都是status=0时,通过主键快速定位到行,然后删选少量数据,就可以得到满足的记录。
而通过索引扫描,再回表,效率就比较低。

--如果获取的status=1,数据量较少时
[cc lang='sql']dbaeyes 06:55:05>explain SELECT t1.*
-> FROM
-> (SELECT ROUND(RAND() * (
-> (SELECT MAX(id) FROM act_info WHERE status = 1)-
-> (SELECT MIN(id) FROM act_info WHERE status = 1))
-> +(SELECT MIN(id) FROM act_info WHERE status = 1)) AS id) AS t2,act_info AS t1
-> WHERE t1.id >= t2.id AND t1.status =1
-> ORDER BY t1.id LIMIT 3 ;[/cc]

那么走status效率更高。因为索引status=1可以快速的获取所需的记录。而通过主键需要扫描非常大量的记录才能得到3条满足需求的记录

[cc lang='sql']dbaeyes 06:55:05>explain SELECT t1.*
-> FROM
-> (SELECT ROUND(RAND() * (
-> (SELECT MAX(id) FROM act_info WHERE status = 1)-
-> (SELECT MIN(id) FROM act_info WHERE status = 1))
-> +(SELECT MIN(id) FROM act_info WHERE status = 1)) AS id) AS t2,act_info AS t1
-> WHERE t1.id >= t2.id AND t1.status =1
-> ORDER BY t1.id LIMIT 3 ;
+----+-------------+------------+--------+--------------------------------+------------------------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+--------------------------------+------------------------+---------+------+------+------------------------------+
| 1 | PRIMARY | | system | NULL | NULL | NULL | NULL | 1 | |
| 1 | PRIMARY | t1 | range | PRIMARY,idx_info_stat | idx_info_stat | 12 | NULL | 2 | Using where |
| 2 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
| 5 | SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
| 4 | SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
| 3 | SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
+----+-------------+------------+--------+--------------------------------+------------------------+---------+------+------+------------------------------+
6 rows in set (0.01 sec)

dbaeyes 06:55:09>SELECT t1.*
-> FROM
-> (SELECT ROUND(RAND() * (
-> (SELECT MAX(id) FROM act_info WHERE status = 1)-
-> (SELECT MIN(id) FROM act_info WHERE status = 1))
-> +(SELECT MIN(id) FROM act_info WHERE status = 1)) AS id) AS t2,act_info AS t1
-> WHERE t1.id >= t2.id AND t1.status =1
-> ORDER BY t1.id LIMIT 3 ;
+--------+-------------------+--------+-------------+---------------------+---------------------+
| id | account | status | activity_id | gmt_create | gmt_modified |
+--------+-------------------+--------+-------------+---------------------+---------------------+
| 287970 | 297969@dbaeye.com | 1 | 1 | 2011-05-24 15:39:39 | 2011-05-27 14:28:17 |
| 290249 | 300248@dbaeye.com | 1 | 1 | 2011-05-24 15:39:39 | 2011-05-25 15:18:13 |
| 295588 | 305587@dbaeye.com | 1 | 1 | 2011-05-24 15:39:39 | 2011-05-26 13:47:08 |
+--------+-------------------+--------+-------------+---------------------+---------------------+
3 rows in set (0.00 sec)[/cc]

[cc lang='sql']
dbaeyes 06:55:38>explain SELECT t1.*
-> FROM
-> (SELECT ROUND(RAND() * (
-> (SELECT MAX(id) FROM act_info WHERE status = 1)-
-> (SELECT MIN(id) FROM act_info WHERE status = 1))
-> +(SELECT MIN(id) FROM act_info WHERE status = 1)) AS id) AS t2,act_info AS t1 force index (primary)
-> WHERE t1.id >= t2.id AND t1.status =1
-> ORDER BY t1.id LIMIT 3 ;
+----+-------------+------------+--------+---------------+---------+---------+------+--------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+---------+---------+------+--------+------------------------------+
| 1 | PRIMARY | | system | NULL | NULL | NULL | NULL | 1 | |
| 1 | PRIMARY | t1 | range | PRIMARY | PRIMARY | 8 | NULL |153278 | Using where |
| 2 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
| 5 | SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
| 4 | SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
| 3 | SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
+----+-------------+------------+--------+---------------+---------+---------+------+--------+------------------------------+
6 rows in set (0.00 sec)

dbaeyes 06:55:53>
dbaeyes 06:55:54>SELECT t1.*
-> FROM
-> (SELECT ROUND(RAND() * (
-> (SELECT MAX(id) FROM act_info WHERE status = 1)-
-> (SELECT MIN(id) FROM act_info WHERE status = 1))
-> +(SELECT MIN(id) FROM act_info WHERE status = 1)) AS id) AS t2,act_info AS t1 force index (primary)
-> WHERE t1.id >= t2.id AND t1.status =1
-> ORDER BY t1.id LIMIT 3 ;
+--------+-------------------+--------+-------------+---------------------+---------------------+
| id | account | status | activity_id | gmt_create | gmt_modified |
+--------+-------------------+--------+-------------+---------------------+---------------------+
| 664048 | 674047@dbaeye.com | 1 | 1 | 2011-05-24 15:40:08 | 2011-05-25 16:01:27 |
| 675669 | 685668@dbaeye.com | 1 | 1 | 2011-05-24 15:40:09 | 2011-05-25 17:47:41 |
| 732710 | 742709@dbaeye.com | 1 | 1 | 2011-05-24 15:40:13 | 2011-05-25 16:26:18 |
+--------+-------------------+--------+-------------+---------------------+---------------------+
3 rows in set (0.16 sec)
[/cc]
--EOF--


This work is licensed under a CC A-S 4.0 International License.