Putting aside any opinions on performance, I've been trying to test a notion about whether a couple queries would output the same data (ordering doesn't matter).
SELECT *
FROM articles
WHERE (
last_updated >= %s
OR id IN (1, 2, 3)
)
AND created_at IS NOT NULL
SELECT *
FROM articles
WHERE last_updated >= %s
AND created_at IS NOT NULL
UNION
SELECT *
FROM articles
WHERE id IN (1, 2, 3)
AND created_at IS NOT NULL
I think they're equivalent, but I can't prove it to myself.
Edit:
Aye, looking at the replies, I'm becoming aware that I left out a couple key assumptions I've made. Assuming:
a) id
is a PRIMARY KEY
(or otherwise UNIQUE
)
b) I mean equivalent
insofar as "the rows returned will contain equivalent data same (though maybe ordered differently)"
UNION
is used to append the result of one query to the result of another: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/set-operators-union-transact-sql?view=sql-server-ver17Your two queries are not equivalent.
There aren't any joins in either query (and only one table involved), so that quoted bit of documentation isn't relevant.
Order aside, the two results would be identical In the same way that 2(4 + 3) = 2×4 + 2×3
I really didn't think that's correct--though it's been a few years since I did SQL regularly.
That should give a list of all articles updated after whatever date (regardless of ID), appended to a list of all articles where the ID is 1, 2, or 3 (regardless of when they were last updated). I would expect to see extra articles that only fit one criteria or the other, and also duplicate articles.
I included the join quote because an inner join would be the way to do this, rather than a union--though it would likely be less efficient than just filtering on the required parameters.
If I'm wrong here, I'd love an explanation of why.
It's equivalent because
UNION
removes duplicates; the behaviour you're describing happens withUNION ALL
. Since both queries are article.*, both halves will have the same columns and the dedupe will be successful.UNION
is less efficient because of this deduplication, but it's the default since that's what most people want. If that matters then you'd be correct that a JOIN version will be more efficient (possibly depending on indexes present and sql engine).