Hardcore MySQL query optimisation

Not sure if you're still needing help with this, but here goes...

SELECT COUNT(c_id) FROM contacts, mgroups
LEFT JOIN mgroups_explicit ON mg_id = me_mg_id
WHERE mgroups.site_id = '10'
AND mg_id = '20'
AND me_c_id = c_id
AND contacts.site_id = '10'
OR (contacts.site_id = '10' AND ( c_tags LIKE '%tag1%')) AND c_id NOT IN
( SELECT mex_c_id FROM mgroups_exclude WHERE c_id = mex_c_id ) GROUP BY c_id

First things off you're mixing old and new (ANSI) style joins in the same query. This is considered a bad idea in SQL circles. By old style I mean we write a query with a join along these lines

SELECT a.column_name, b.column2
FROM table1 a, second_table b
WHERE a.id_key = b.fid_key
AND b.some_other_criteria = 'Y';

In the newer ANSI style we'd rewrite the above to this:

SELECT a.column_name, b.column2
FROM table1 a INNER JOIN second_table b ON a.id_key = b.fid_key
WHERE b.some_other_criteria = 'Y';

Its neater and easier to read which bits are join conditions and which are where clauses. Its also best to get into the habit of using ANSI style as old style support may (at some point) be discontinued.

Also try and be consistent in your use of dot notation and/or aliases. Again it makes big queries easier to read.

Back to your problem query, I began by starting to convert it into ANSI style and straight-away noticed that you don't have a join condition between contacts and mgroups. This means that optimizer will create a cross join (also called a cartesian product), which was probably something you don't want to do. The cross join (in case you didn't know) joins every row in the contacts table with every row in the mgroups table. So if you have 50,000 rows in contacts and 20,000 rows in mgroup you're going to get a joined result set containing 1,000,000,000 rows!

The other thing that is going to slow this query drastically is the subquery on mgroups_exclude. A subquery is executed once for each row in the outer query eg:

SELECT a.column1
FROM table1 a
WHERE a.id_key NOT IN ( SELECT * FROM table2 b WHERE a.id_key = b.fid_key);

Assume that table1 has 2,000,000 rows and table2 has 500,000. For each and every row in the outer query (table1) the database is going to have to do a full scan on the inner query. So to get a result the database will have read 1,000,000,000,000 rows and we may only be interested in 1,000! It will not touch any indexes no matter what.

To get around this we can use a left join (also called a left outer join) on the two tables.

SELECT a.column1
FROM table1 a LEFT JOIN table2 b ON a.id_key = b.fid_key
WHERE b.fid_key IS NULL;

An outer join does not require each record in the joined tables to have a matching record. So the example above we'd get all the records from table1 even if there is no match on table2. For non-matched records the database returns a NULL and we can test for that in the where clause. Now the optimizer can scan the indexes on the two tables id_key fields (assuming there are any), resulting in a much faster query.

So, to wrap up. I'd rewrite your orginal query thus:

SELECT COUNT( a.c_id )
FROM contacts a
INNER JOIN mgroups b ON a.c_id = b.mg_id
LEFT JOIN mgroups_explicit c ON b.mg_id = c.me_mg_id
LEFT JOIN mgroups_exclude d ON a.c_id = d.mex_c_id
WHERE b.mg_id = '20'
AND a.site_id = '10'
AND a.c_tags LIKE '%tag1%'
AND d.mex_c_id IS NULL
GROUP BY c_id;

Hope that helps :)
 
ahh nice one :) got a few bits to do today but I will have a look as soon as I get a chance, thanks for the responses!
 
Hambut_Bulge...can you contact me, email in trust, if you have some spare time? I have a client in need of some optimisation of their rather messy site. Nothing too complex but I don't have the time to trawl through their scripts any more.
 
Are you sure a sub query is executed for each row and doesn't use indexes? I use sql server at work and was told to use subqueries if you don't need any columns and joins if you do. After looking at several execution plans in the past however, performance seems identical... as sql is set based general I'd be surprised a core function (sub queries) would be cursor like?
 
MySQL seems poor at them :)

I have a client with some terrible queries which after optimisation sped up significantly, all by rewriting from subqueries to joins, albeit with a little WHERE clause re-ordering + appropriate indexing.
 
Not sure if you're still needing help with this, but here goes...

Dude :) Nice one, trying to puzzle this out at the mo, its returning 0 at the mo but I am sure thats because of my original explanation rather than your solution, thanks muchly!
 
Ahh, the inner join needed to be a.site_id = b.site_id :) Thanks again Hambut_Bulge, you are an SQL machine :D

EXPLAIN BEFORE

id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY mgroups ALL PRIMARY,mg_id NULL NULL NULL 9 Using temporary; Using filesort
1 PRIMARY mgroups_explicit ref me_mg_id me_mg_id 4 engine_4.mgroups.mg_id 8750
1 PRIMARY contacts ALL PRIMARY,c_id NULL NULL NULL 86012 Using where; Using join buffer
2 DEPENDENT SUBQUERY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE noticed after reading const table..

EXPLAIN AFTER

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE d system NULL NULL NULL NULL 0 const row not found
1 SIMPLE b const PRIMARY,mg_id PRIMARY 4 const 1
1 SIMPLE c system me_mg_id NULL NULL NULL 1
1 SIMPLE a ref site_id site_id 4 const 8155 Using where

Win :D
 
Last edited:
Back
Top Bottom