Welcome to the future. This timeline is amazing! The global hyperloop network is finally complete.
The bioengineered glow-in-the-dark Redwood Forest in the Sahara is a vacation paradise.
And best of all, we have colonies throughout the Solar System. One thing that might surprise
you about the future is that everyone still uses SQL. I didn’t see that coming.
So to prepare you for the future, we now explore an essential part of SQL: Joins.
The database we will be using to learn about joins has 5 Tables:
martian, base, visitor, inventory, and supply.
The Martian table is a list of all people
living on the Red Planet. After all, if you are living on Mars, you
are no longer an Earthling - you’re a Martian! The “base” table holds information on
all the different habitats on Planet #4. In addition to serving as a stopover for people
visiting the outer planets, Mars is now a tourist destination.
Every visitor to Mars is tracked in the “visitor” table.
The “inventory” table keeps track of the supplies at each base.
And the “supply” table shows what is available at the central Martian distribution center.
Let’s take a closer look at the “Martian” and “Base” tables.
Here are the columns in the “Martian” table…
The “super_id” is the “ID” of the Martian they report to.
And here are the columns in the “Base” table.
These two tables are connected, because each “Martian” has a “base_id” identifying
where they live and work. Suppose you need to generate a report listing
the full names of all Martians and the name of their home base.
How would you do it? The Martian name is in the “Martian” table,
while the “base” name is in the “Base” table.
The solution is to JOIN these two tables together by the “base_id.”
To begin, look at the DATA in the Martian table… and the Base table.
Let us visualize how to join these tables together by the “base_id.”
Take the first row from the “Martian” table...
Find the row in the Base table with the matching base_id…
And then join them together to make a new, larger row.
You continue doing this row… by row… by row…
This brings us to John Carter. He does not have a base_id, so what do we
do? Do we include him in the join? Or leave him
out? This is a very important question.
The answer depends on the TYPE of join you do.
For now, we will leave John Carter out. The result is 12 rows.
Each row is a combination of data from both tables.
Having joined these two tables, you can select data from these rows like an ordinary SELECT query.
By the way, notice that there is currently
no Martian at the Olympus Mons Spa & Casino. This is because the staff is en route from
the Moon and will not arrive for a few more weeks.
Here’s how to write a join using SQL. Begin with SELECT * so we can see all columns.
This will help in understanding the join. Next, specify the two tables.
FROM martian … INNER JOIN base. We will call “martian” the left table,
and “base” the right table. The “JOIN” keyword indicates we will be
joining these two tables. We’ll explain the INNER keyword in just a moment.
Next, specify HOW to connect two rows from
these tables with an ON clause. We only want to connect a martian row with
a base row if they have the same base_id. In the “ON” clause, you have to specify
both the table name... and column name. This makes it clear which table each column
is from. Now execute…
The result contains all the columns from the Martian table…
And all the columns from the Base table. The rows are matched by base_id.
John Carter and the Olympus Mons Spa & Casino do not appear in the result.
This is because neither one had a match in the other table.
We can now update this query to return only those columns we need for our report.
First name… last name… and base name… Execute.
When joining the tables, we encountered two problematic rows. John Carter is in the Martian
table, but his base is unknown. And the Olympus Mons Spa & Casino currently has no Martians.
So how do you tell SQL whether or not to include rows from a table that do not have a match
in the other? Hmm?? We have two tables, and we have two choices per table. Include rows
without a match or exclude them. This means there are 4 total options. Spoiler Alert:
SQL has a JOIN for each of the four options. Let’s visit them now.
We return to the “Martian” and “Base” tables.
We will call the “Martian” table the left table, and “Base” the right table.
Each table has a row with no match in the other table.
Here is the syntax for joining these two tables: Write SELECT followed by the columns you want
FROM martian... blank... JOIN base
ON martian.base_id = base.base_id The ON clause specifies HOW rows from the
two tables will be connected. Because this is a SELECT query, you can also
have WHERE and ORDER clauses. All that remains is to fill in the blank and
specify what kind of JOIN to perform.
We will learn about four types of joins: INNER.. LEFT .. RIGHT .. and FULL.
An INNER JOIN will only return connected rows
when there is a matching base_id in both tables.
A LEFT JOIN will return EVERY row from the left table, even if there is no matching row
in the right table. If a Martian row has no match in the Base
table, it will return nulls for ALL the columns in the right table.
A RIGHT JOIN will return EVERY row from the right table, even when there isn’t a match
in the left table. When a Base row does not have a matching Martian
row, null values will be used for ALL columns in the left table.
And finally there is the FULL JOIN. This is sometimes called a FULL OUTER JOIN.
The FULL JOIN is a combination of the LEFT JOIN and RIGHT JOIN.
A full join returns every row from the left table and every row from the right table.
When the base_ids match, the rows are connected. But when there isn’t a match, the row is
still included in the JOIN with nulls for the columns from the other table.
A common way to visualize the difference between these 4 kinds of joins is with Venn Diagrams.
The Inner Join… Left Join… Right Join… and Full Join…
The inner join diagram highlights how only rows with matching columns are connected.
The left join diagram shows that all rows in the left table are returned, even those
without a match on the right. The right join is the mirror image of the
left join. And the full join shows that all
rows from both tables are returned. When two rows match, they are connected.
Otherwise, nulls will fill in the gaps.
I just received a Dear John letter. It’s especially sad because my name isn’t even
John! It reads, “Dear John. Not every database supports the four joins you discussed. For
example, MySql and SQLite do not support FULL JOINs. Sincerely, R. E. Joinder”
I will not be deterred. We are going to see examples of these four kinds of joins using Postgres.
Maybe the other databases will… join us?
Before seeing examples for INNER, LEFT, RIGHT,
and FULL joins I would like to revisit our first query.
SELECT * FROM martian
INNER JOIN base ON martian.base_id = base.base_id
Notice how the output has TWO columns named base_id.
The first one is from the “martian” table,
and the second one is from the “base” table.
What if instead of selecting all columns from both tables, we only select the martian_id,
base_id, and base_name.
We get an error. This is because it is ambiguous which base_id column we want.
Is it the base_id from the martian table or base table?
To avoid ambiguity, you specify the table
AND column name in the SELECT clause. We will select the “base_id” from the
“base” table... Execute…
Repeatedly typing the table name can make queries a bit word-y, and hands a bit sore-y
We can fix this by creating an “alias” for each table name
To do this, use the “AS” keyword. If we write “FROM martian AS m”, then
‘m’ is now an alias for the table name “martian”
And let us give the “base” table an alias of “b”
Now everywhere in our query we can use “m” instead of the table name “martian”, and
use “b” instead of the table name “base”. The “ON” clause becomes nicer and shorter:
And the SELECT clause is more compact as well
Now that you have properly been introduced to Joins, it is time to see some examples.
Not just one or two examples, mind you, but a LOT of examples. But not too many! We don’t
want you to feel bored. Somewhere between not quite enough and way too many. That is
how many examples we are going to see…. In our next video. So JOIN me in the next
video as we boldly go where no one has gone before… Ok, that’s a lie. If you look
at the number of views on the next video, you will see exactly how many people have
gone before you.