9
submitted 4 days ago* (last edited 3 days ago) by roadrunner_ex@lemmy.ca to c/sql@programming.dev

Putting aside any opinions on performance, I've been trying to test a notion about whether a couple queries would output the same data (ordering doesn't matter).

SELECT *
FROM articles
WHERE (
  last_updated >= %s
  OR id IN (1, 2, 3)
  )
  AND created_at IS NOT NULL
SELECT *
FROM articles
WHERE last_updated >= %s
  AND created_at IS NOT NULL
UNION
SELECT *
FROM articles
WHERE id IN (1, 2, 3)
  AND created_at IS NOT NULL

I think they're equivalent, but I can't prove it to myself.

Edit: Aye, looking at the replies, I'm becoming aware that I left out a couple key assumptions I've made. Assuming:

a) id is a PRIMARY KEY (or otherwise UNIQUE)

b) I mean equivalent insofar as "the rows returned will contain equivalent data same (though maybe ordered differently)"

you are viewing a single comment's thread
view the rest of the comments
[-] boatswain@infosec.pub 4 points 3 days ago

UNION is used to append the result of one query to the result of another: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/set-operators-union-transact-sql?view=sql-server-ver17

A UNION concatenates result sets from two queries. But a UNION does not create individual rows from columns gathered from two tables. A JOIN compares columns from two tables, to create result rows composed of columns from two tables.

Your two queries are not equivalent.

[-] Deebster@programming.dev 2 points 3 days ago* (last edited 3 days ago)

There aren't any joins in either query (and only one table involved), so that quoted bit of documentation isn't relevant.

Order aside, the two results would be identical In the same way that 2(4 + 3) = 2×4 + 2×3

[-] boatswain@infosec.pub 1 points 3 days ago

I really didn't think that's correct--though it's been a few years since I did SQL regularly.

SELECT *
FROM articles
WHERE last_updated >= %s
  AND created_at IS NOT NULL
UNION
SELECT *
FROM articles
WHERE id IN (1, 2, 3)
  AND created_at IS NOT NULL

That should give a list of all articles updated after whatever date (regardless of ID), appended to a list of all articles where the ID is 1, 2, or 3 (regardless of when they were last updated). I would expect to see extra articles that only fit one criteria or the other, and also duplicate articles.

I included the join quote because an inner join would be the way to do this, rather than a union--though it would likely be less efficient than just filtering on the required parameters.

If I'm wrong here, I'd love an explanation of why.

[-] Deebster@programming.dev 1 points 3 days ago

It's equivalent because UNION removes duplicates; the behaviour you're describing happens with UNION ALL. Since both queries are article.*, both halves will have the same columns and the dedupe will be successful.

UNION is less efficient because of this deduplication, but it's the default since that's what most people want. If that matters then you'd be correct that a JOIN version will be more efficient (possibly depending on indexes present and sql engine).

this post was submitted on 10 Aug 2025
9 points (100.0% liked)

SQL

1051 readers
1 users here now

Related Fediverse communities:

Icon base by Delapouite under CC BY 3.0 with modifications to add a gradient

founded 2 years ago
MODERATORS