I prefer "in" myself. Exists can be unpredictable with nulls.
I also think a sub clause as "bad". I only use it when checking something. If it was going to run frequently as in production I'd rewrite it as join.
Select
id,
PersonName
from
Person a
where exists
(select 1 from Sales s where a.id = s.PersonID)
Select
id,
PersonName
from
Person a
where not exists
(select 1 from Sales s where a.id = s.PersonID)
I find them unpredictable because I'm dealing with a lot of legacy data and less common (to me at least) old data types.
I'm generally looking for specific data to test a query before I run it against a much larger set of data. Or I'm filtering and/or transforming all the weird old data.
..and I generally don't know what I'm doing or have a mentor to ask.
What does appropriately optimised mean in this case...
We were discussing partition's this week in work we felt unless you had a very specific use case they were more hassle than they were worth. One of min devs wanted to try them for performance.
What's your opinion on partitions.
I would suggest on this system which is excessively normalised, and where only the last 2yrs is relevant for most part. That should be their focus rather than partitions. Also I know from experience they change their tables and views often.
"and, when used correctly, will generally outperform joins for filtering data when the queries and datasets are appropriately optimised."
Indeed but to me someone who is asking what 'exists' does, isn't someone who is building their own database or has the control over the performance, indexing, etc.
For that situation, my point has come out to be correct most of the time because without that control, exists have had terrible performance compared to going through the joining route.
If you are able to DBA the situation, then use exists ofc.
How would you explain it to a none techy?
Partitions are far from an instant performance boost, they can significantly speed up common queries on large datasets where the common searches are always for a similar data duration (say a day or a week), but at the expense of increased overhead on data modification, and placing additional requirements in query creation onto the users so that they write queries where the optimiser uses the partitions properly.
Are you primarily working with a OLTP (online transaction processing) or OLAP (online analytics processing) focused DB? I would agree that generally, optimising the wider structures, relationships, indexes and things like retention periods will almost certainly give better results than partitioning, partitioning is best used when those other aspects have been fully exhausted and your datasets are still creating performance challenges purely through sheer size, but be prepared to have to train those using the dB in how to write queries that effectively use those partitions and how to read an explain plan to check it.
It's not really a DBA role either, it's the data architect (formally appointed or not) who defines the structure of the data, key relationships, indexes etc. It's also the architects responsibility to ensure that is appropriately documented and available to those who need it (whether they are dbas, analysts or whatever).
Generally most SQL methods are highly design dependent to ensure they are giving you the information you need, so understanding the design and the implications it has on the query processing is vital, hence why I'm not a fan of 'use X, it's easier' type answers. Joins can still throw plenty of unexpected outputs if the underlying data isn't understood. One of the big benefits of exists is that you can never accidentally create a Cartesian multiplication effect through not understanding the data. (The example I used earlier around people and sales wouldn't work with joins without a lot more complexity, as there's a one to many relationship between people and sales that would have to be factored in).