also, I only know that this works because I tried it. Do you have a link that would explain why this actually works? Because I would interpret
SELECT
b.*
FROM
b
LEFT OUTER JOIN
a
ON b.id = a.id
WHERE
a.id IS NULL
to mean that the in the a table, the id field must be null.
Nothing can ever equal null, not even another null. Try this:
SELECT CASE WHEN null = null THEN 'null equals null' ELSE 'null doesnt equal null' END
Because null represents unknown, we cannot say whether one unknown equals another unknown, because they are unknown. In schrodingers cat experiment, we cannot say whether the cat is dead or alive. If there were two boxes with two cats, we couldnt say if they were in the same state. This concept is core to this discussion about nulls and their unknownness.
A left join takes the table on the left and matches up every row on the right, putting nulls in for those rows where no match is found:
color.id, color.val
--
1, red
2, green
3, blue
fruit.name, fruit.color_id
--
apple, 1
apple, 2
lime, 2
color LEFT JOIN fruit:
1, red, apple, 1
2, green, apple, 2
2, green, lime, 2
3, blue, null, null
No fruit are blue. The blue line exists with null fruit.
So it stands to reason that we can say "which colours have no associated fruit?" by saying
WHERE fruit.name IS NULL
Note that I said "null can never be equal another null"
This means that any fruit with a null colour ID cannot be a part of the join. If we had a row where we really didnt know what colour a lemon was:
lemon, null
This row will NOT show in color LEFT JOIN fruit, because the join condition is an equals (color LEFT JOIN fruit ON colorid = id) and we know that NULL = <anything> is false
If fruit.colorid is NULL it cannot ever be equal to any color.id, so it is dropped from the results.
Also, if we had a colour that we didnt have an id for:
null, magenta
No fruits would ever be classed as magenta, not even our lemon, for the same reason. The id of this color is null, it cannot equal anthing else, ever, it doesnt participate in the join and the result of the join condition is always false where one or both sides are null.
-
So basically, where your thinking was confused was in assuming that any row in A where the ID was null, would match up with any row in B where the id was also null; not so. The results of
a LEFT JOIN b
a1, b1
a2, null <-- hole
a3, b3
a4, b4
a5, null <-- hole
produce a "solid column of a rows, with no holes", a weak column of b rows, having holes where there was no match"
You will never see a null in the a column.
ergo, for every row in a join b where b.id is null, it means there is no row in b for that row in a. Or, "the <a.id> is NOT IN <list of all b.id> "
--
It is also worth noting here that if you use a LEFT OUTER JOIN and then make a WHERE clause that uses any operator other than IS or IS NOT, then you revert the query to an inner join:
SELECT
*
FROM
color
LEFT OUTER JOIN
fruit
ON
fruit.colorid = color.id
WHERE
fruit.price > 100
Remember our lines:
color.id, color.val, fruit.id, fruit.colorid, price
1, red, apple, 1, 200
2, green, apple, 2, 200
2, green, lime, 2, 300
3, blue, null, null, null
No fruit are blue, the price will be listed as null for blue fruit. Because
NULL > 100 is "unknown" it is made false, and the row is dropped.
Ive seen many queries where someone asked "this is supposed to return me all X that dont have a matching Y but it doesnt return anything" and its been because there has been a where clause trying to equate Y
In this case, either move the condition into the join, not the where, or provide for the cases where there is a null:
SELECT
*
FROM
color
LEFT OUTER JOIN
fruit
ON
fruit.colorid = color.id AND
fruit.price > 100
SELECT
*
FROM
color
LEFT OUTER JOIN
fruit
ON
fruit.colorid = color.id
WHERE
(fruit.price > 100 OR fruit.price IS NULL)
Remember to use brackets if youre ANDing other conditions into the where.