Hardcore MySQL query optimisation

Mort · 21 Jan 2011 at 09:41

Hey all

Having some issues optimising a query in mysql, and wondered if any of the gurus in here could pick it apart?

I know about adding indexes and stuff but optimising really high result set queries is doing me in a bit

All info is here - http://stackoverflow.com/questions/...ql-structure-queries-for-large-mailing-groups

Any help would be appreciated

Hambut_Bulge · 26 Jan 2011 at 16:27

Not sure if you're still needing help with this, but here goes...

SELECT COUNT(c_id) FROM contacts, mgroups
LEFT JOIN mgroups_explicit ON mg_id = me_mg_id
WHERE mgroups.site_id = '10'
AND mg_id = '20'
AND me_c_id = c_id
AND contacts.site_id = '10'
OR (contacts.site_id = '10' AND ( c_tags LIKE '%tag1%')) AND c_id NOT IN
( SELECT mex_c_id FROM mgroups_exclude WHERE c_id = mex_c_id ) GROUP BY c_id

First things off you're mixing old and new (ANSI) style joins in the same query. This is considered a bad idea in SQL circles. By old style I mean we write a query with a join along these lines

SELECT a.column_name, b.column2
FROM table1 a, second_table b
WHERE a.id_key = b.fid_key
AND b.some_other_criteria = 'Y';

In the newer ANSI style we'd rewrite the above to this:

SELECT a.column_name, b.column2
FROM table1 a INNER JOIN second_table b ON a.id_key = b.fid_key
WHERE b.some_other_criteria = 'Y';

Its neater and easier to read which bits are join conditions and which are where clauses. Its also best to get into the habit of using ANSI style as old style support may (at some point) be discontinued.

Also try and be consistent in your use of dot notation and/or aliases. Again it makes big queries easier to read.

Back to your problem query, I began by starting to convert it into ANSI style and straight-away noticed that you don't have a join condition between contacts and mgroups. This means that optimizer will create a cross join (also called a cartesian product), which was probably something you don't want to do. The cross join (in case you didn't know) joins every row in the contacts table with every row in the mgroups table. So if you have 50,000 rows in contacts and 20,000 rows in mgroup you're going to get a joined result set containing 1,000,000,000 rows!

The other thing that is going to slow this query drastically is the subquery on mgroups_exclude. A subquery is executed once for each row in the outer query eg:

SELECT a.column1
FROM table1 a
WHERE a.id_key NOT IN ( SELECT * FROM table2 b WHERE a.id_key = b.fid_key);

Assume that table1 has 2,000,000 rows and table2 has 500,000. For each and every row in the outer query (table1) the database is going to have to do a full scan on the inner query. So to get a result the database will have read 1,000,000,000,000 rows and we may only be interested in 1,000! It will not touch any indexes no matter what.

To get around this we can use a left join (also called a left outer join) on the two tables.

SELECT a.column1
FROM table1 a LEFT JOIN table2 b ON a.id_key = b.fid_key
WHERE b.fid_key IS NULL;

An outer join does not require each record in the joined tables to have a matching record. So the example above we'd get all the records from table1 even if there is no match on table2. For non-matched records the database returns a NULL and we can test for that in the where clause. Now the optimizer can scan the indexes on the two tables id_key fields (assuming there are any), resulting in a much faster query.

So, to wrap up. I'd rewrite your orginal query thus:

SELECT COUNT( a.c_id )
FROM contacts a
INNER JOIN mgroups b ON a.c_id = b.mg_id
LEFT JOIN mgroups_explicit c ON b.mg_id = c.me_mg_id
LEFT JOIN mgroups_exclude d ON a.c_id = d.mex_c_id
WHERE b.mg_id = '20'
AND a.site_id = '10'
AND a.c_tags LIKE '%tag1%'
AND d.mex_c_id IS NULL
GROUP BY c_id;

Hope that helps

DAIR · 26 Jan 2011 at 16:47

^

Hes good..

Izi · 26 Jan 2011 at 18:46

DAIR said:
^

Hes good..

yeah I learnt some valuable things there, thanks...

Beansprout · 26 Jan 2011 at 19:47

That's pretty awesome

Mort....can you post the output from an EXPLAIN [query] for the before and after?

Mort · 28 Jan 2011 at 13:22

ahh nice one

got a few bits to do today but I will have a look as soon as I get a chance, thanks for the responses!

Beansprout · 29 Jan 2011 at 23:21

Hambut_Bulge...can you contact me, email in trust, if you have some spare time? I have a client in need of some optimisation of their rather messy site. Nothing too complex but I don't have the time to trawl through their scripts any more.

joeyjojo · 30 Jan 2011 at 14:01

DAIR said:
^

Hes good..

Real good..

Goksly · 31 Jan 2011 at 21:35

Are you sure a sub query is executed for each row and doesn't use indexes? I use sql server at work and was told to use subqueries if you don't need any columns and joins if you do. After looking at several execution plans in the past however, performance seems identical... as sql is set based general I'd be surprised a core function (sub queries) would be cursor like?

Beansprout · 31 Jan 2011 at 23:57

MySQL seems poor at them

I have a client with some terrible queries which after optimisation sped up significantly, all by rewriting from subqueries to joins, albeit with a little WHERE clause re-ordering + appropriate indexing.

Goksly · 1 Feb 2011 at 09:34

Ah that explains it then

Looks like Microsoft have done a good job for once eh?

Mort · 1 Feb 2011 at 10:31

Hambut_Bulge said:
Not sure if you're still needing help with this, but here goes...

Dude

Nice one, trying to puzzle this out at the mo, its returning 0 at the mo but I am sure thats because of my original explanation rather than your solution, thanks muchly!

Mort · 1 Feb 2011 at 10:43

Ahh, the inner join needed to be a.site_id = b.site_id

Thanks again Hambut_Bulge, you are an SQL machine

EXPLAIN BEFORE

id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY mgroups ALL PRIMARY,mg_id NULL NULL NULL 9 Using temporary; Using filesort
1 PRIMARY mgroups_explicit ref me_mg_id me_mg_id 4 engine_4.mgroups.mg_id 8750
1 PRIMARY contacts ALL PRIMARY,c_id NULL NULL NULL 86012 Using where; Using join buffer
2 DEPENDENT SUBQUERY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE noticed after reading const table..

EXPLAIN AFTER

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE d system NULL NULL NULL NULL 0 const row not found
1 SIMPLE b const PRIMARY,mg_id PRIMARY 4 const 1
1 SIMPLE c system me_mg_id NULL NULL NULL 1
1 SIMPLE a ref site_id site_id 4 const 8155 Using where

Win