7 SQL: Retrieve data from multiple tables

In this chapter, we will discuss how we can combine (join) multiple tables in order to retrieve data. We will again use the schema music_marketplace that describes a hypothetical marketplace for music. Recall that the schema includes the following tables:

  • songs: stores information about songs.
  • label: stores information about music labels.
  • artist: stores information about artists.
  • artist_songs: stores which songs are produced by which artists.
  • charts: stores the maximum rank that a songs achieves in the charts (the hypothetical marketplace has top-selling charts).

Recall that the Entity Relationship diagram of this schema is as follows:

The Entity Relationship diagram of schema `music_marketplace`.

Figure 7.1: The Entity Relationship diagram of schema music_marketplace.

 

First, we will connect to the database (Section 4.1):

library(DBI)
library(odbc)
library(RODBC)
library(tidyverse)
con = DBI::dbConnect(odbc::odbc(), 
                    Driver= "MySQL", 
                    Server='mysql-isys3350.bc.edu', 
                    UID='your_username', 
                    PWD= 'your_password',
                    Port= 3306)

 

Joins allow us to combine columns from two or more tables into a single result. There are various types of joins, including:

  • Inner join: retrieves only rows that satisfy the join condition
  • Left outer join: retrieves only rows that satisfy the join condition plus rows that do not satisfy the join condition from the left table.
  • Right outer join: retrieves only rows that satisfy the join condition plus rows that do not satisfy the join condition from the right table.

We discuss these joins next.

7.1 Inner join

In general, to join data from two tables, we need to have the same column in both tables. For instance, in Figure 5.1, we can join tables charts and songs on their common column trackid. So if we want to get the song information of all tracks that are included in the charts, we can write the following query:

SELECT *  
FROM music_marketplace.charts INNER JOIN music_marketplace.songs
ON music_marketplace.charts.trackid = music_marketplace.songs.trackid;
Table 7.1: Displaying records 1 - 10
trackId maxRank durationInCharts firstDateInCharts lastDateInCharts title subtitle trackid label genre released labelId price
887832 95 0 2016-01-17 16:38:32 2016-01-17 16:38:32 K&G Beat Original Mix 887832 Planet Mu Dubstep 2015-10-15 7397 1.49
907738 42 20 2015-08-16 12:21:35 2015-09-05 15:43:31 Sodium Original Mix 907738 Dangerous New Age Drum & Bass 2015-08-10 13562 1.49
3711813 93 3 2015-12-27 16:38:57 2015-12-30 16:39:06 Nightmares in Reality feat. Maksim Original Mix 3711813 Inspected Records Glitch Hop 2015-07-03 18995 1.49
4423896 56 301 2015-06-27 16:23:18 2016-04-23 16:39:36 Strandbar Disko 4423896 Olsen Records Indie Dance / Nu Disco 2015-09-11 30474 1.49
4927722 57 266 2015-03-29 16:22:17 2015-12-20 16:38:10 The Fifth Hand Original Mix 4927722 Delicieuse Records Electronica / Downtempo 2015-09-28 28282 1.49
4927724 20 276 2015-03-08 00:13:31 2015-12-09 16:38:08 True Thoughts feat. Anna Naklab Original Mix 4927724 Delicieuse Records Electronica / Downtempo 2015-09-28 28282 1.49
4969569 62 48 2016-02-12 16:40:14 2016-03-31 16:39:52 Business Original Mix 4969569 VP Records Reggae / Dancehall / Dub 2016-02-01 18027 1.49
4969576 60 6 2016-03-25 16:40:03 2016-03-31 16:39:52 Bruk It Original Mix 4969576 VP Records Reggae / Dancehall / Dub 2016-02-01 18027 1.49
6065153 60 7 2015-03-14 16:22:47 2015-03-21 16:22:52 Music Is My Life Original Mix 6065153 Mr. Carter Deep House 2015-03-09 26002 1.49
6079233 35 7 2015-04-11 16:26:24 2015-04-18 16:26:23 Basis Original Mix 6079233 Material Tech House 2015-04-06 5066 1.49


Note that the above join returns all columns from both tables!


A join condition identifies the joined column in each of the two tables, and indicates how the two columns should be compared. In the previous example, we use the equal = operator because we wanted to retrieve matching rows. Other logical operators can also be used.


Tables are usually joined based on foreign and primary keys; Above, the trackid column in table charts is a foreign key that references the trackid column in table songs. Joining on foreign keys is not required; we can join on any other columns that make sense even if their relationship is not defined in the database.


One cumbersome feature of the previous example is that we need to identify the table every time we access the joined column. To simplify, we can use table aliases as follows:

SELECT *  
FROM music_marketplace.charts t1 INNER JOIN music_marketplace.songs t2
ON t1.trackid = t2.trackid;
Table 7.2: Displaying records 1 - 10
trackId maxRank durationInCharts firstDateInCharts lastDateInCharts title subtitle trackid label genre released labelId price
887832 95 0 2016-01-17 16:38:32 2016-01-17 16:38:32 K&G Beat Original Mix 887832 Planet Mu Dubstep 2015-10-15 7397 1.49
907738 42 20 2015-08-16 12:21:35 2015-09-05 15:43:31 Sodium Original Mix 907738 Dangerous New Age Drum & Bass 2015-08-10 13562 1.49
3711813 93 3 2015-12-27 16:38:57 2015-12-30 16:39:06 Nightmares in Reality feat. Maksim Original Mix 3711813 Inspected Records Glitch Hop 2015-07-03 18995 1.49
4423896 56 301 2015-06-27 16:23:18 2016-04-23 16:39:36 Strandbar Disko 4423896 Olsen Records Indie Dance / Nu Disco 2015-09-11 30474 1.49
4927722 57 266 2015-03-29 16:22:17 2015-12-20 16:38:10 The Fifth Hand Original Mix 4927722 Delicieuse Records Electronica / Downtempo 2015-09-28 28282 1.49
4927724 20 276 2015-03-08 00:13:31 2015-12-09 16:38:08 True Thoughts feat. Anna Naklab Original Mix 4927724 Delicieuse Records Electronica / Downtempo 2015-09-28 28282 1.49
4969569 62 48 2016-02-12 16:40:14 2016-03-31 16:39:52 Business Original Mix 4969569 VP Records Reggae / Dancehall / Dub 2016-02-01 18027 1.49
4969576 60 6 2016-03-25 16:40:03 2016-03-31 16:39:52 Bruk It Original Mix 4969576 VP Records Reggae / Dancehall / Dub 2016-02-01 18027 1.49
6065153 60 7 2015-03-14 16:22:47 2015-03-21 16:22:52 Music Is My Life Original Mix 6065153 Mr. Carter Deep House 2015-03-09 26002 1.49
6079233 35 7 2015-04-11 16:26:24 2015-04-18 16:26:23 Basis Original Mix 6079233 Material Tech House 2015-04-06 5066 1.49


Of course, the join condition can be extended:

SELECT *  
FROM music_marketplace.charts t1 INNER JOIN music_marketplace.songs t2
ON t1.trackid = t2.trackid AND t1.maxRank > 95;
Table 7.3: Displaying records 1 - 10
trackId maxRank durationInCharts firstDateInCharts lastDateInCharts title subtitle trackid label genre released labelId price
6191715 96 0 2015-03-28 16:24:43 2015-03-28 16:24:43 A Woman Alone Original Mix 6191715 Robsoul Recordings House 2015-03-20 2204 1.49
6274556 96 1 2015-03-15 16:22:09 2015-03-16 16:22:13 Disgustingly Beautiful feat. Isa Kurz Original Mix 6274556 Tru Thoughts Electronica / Downtempo 2015-03-09 7505 1.49
6306434 99 1 2015-04-19 16:26:20 2015-04-20 16:25:52 So Me Born Original Mix 6306434 H2O Records Reggae / Dancehall / Dub 2015-03-10 6254 1.49
6318110 99 1 2015-03-15 16:26:28 2015-03-16 16:26:32 Ready For Sunshine Original Mix 6318110 Statement! Trance 2015-03-09 3175 1.49
6334749 97 0 2015-03-30 16:26:35 2015-03-30 16:26:35 Debug Original Mix 6334749 Tronic Techno 2015-03-23 12576 1.49
6348920 100 0 2015-03-14 16:24:06 2015-03-14 16:24:06 The Blood Ties Original Mix 6348920 Rebirth Society Hardcore / Hard Techno 2015-03-09 26914 1.49
6360895 97 0 2015-03-28 16:24:22 2015-03-28 16:24:22 Wasted Youth Original Mix 6360895 Hexagon Recordings Hardcore / Hard Techno 2015-03-21 40248 1.49
6361054 99 0 2015-03-29 16:24:29 2015-03-29 16:24:29 Boyah Bass Original Mix 6361054 Out Of This World Records Hardcore / Hard Techno 2015-03-23 46737 1.49
6362843 98 0 2015-03-16 16:23:16 2015-03-16 16:23:16 Precious Original Mix 6362843 Wolf + Lamb Records Electronica / Downtempo 2015-03-09 12379 1.49
6370452 96 0 2015-04-19 16:25:27 2015-04-19 16:25:27 Railway Original Mix 6370452 Inwave Minimal / Deep Tech 2015-04-13 42452 1.49

And finally, we can join more than two tables on different columns:

SELECT  t1.trackid,   maxRank, title, remixer, name
FROM music_marketplace.charts t1 INNER JOIN music_marketplace.songs t2
ON t1.trackid = t2.trackid AND t1.maxRank > 95 
INNER JOIN music_marketplace.artist_songs t3 ON
t2.trackid = t3.trackid 
INNER JOIN music_marketplace.artist t4 ON
t3.artistId = t4.artistId;
Table 7.4: Displaying records 1 - 10
trackid maxRank title remixer name
6191715 96 A Woman Alone False Philip Weeks
6274556 96 Disgustingly Beautiful feat. Isa Kurz False Manu Delago
6306434 99 So Me Born False Assassin Agent Sasco
6318110 99 Ready For Sunshine False Same K
6334749 97 Debug False Raffaele Rizzi
6348920 100 The Blood Ties False Mental Crush
6360895 97 Wasted Youth False Fresh Otis
6361054 99 Boyah Bass False DJ Punch
6362843 98 Precious False Cameo Culture
6370452 96 Railway False Vinyl Speed Adjust


Note how in the previous query we used t1 to specify from which table we want to retrive column trackid, as trackid appears in multiple tables. We do not need to use the table name for the other four columns as they are unuiquely defined.


7.2 Outer joins

Inner joins retrieve only rows that satisfy the join condition. But what if we want to keep all rows from a table and match information from another table whenever such information exists?

7.2.1 Left joins

For instance, table label is incomplete: it does not store information about all labels that appear in the songs table. We can use a left join to combine tables songs and label such that we keep all the information we want from songs and add the link from table label whenever that is available:

SELECT  title, t1.label, link 
FROM music_marketplace.songs t1 LEFT JOIN music_marketplace.label t2
ON t1.label = t2.label
order by title;
Table 7.5: Displaying records 1 - 10
title label link
#015 Faut Section NA
#DACKSQUAD Rottun Recordings /label/rottun-recordings/8587
#Ecstasy Indeependence Recordings NA
#Frei Auftakt Records NA
#Hey Beat Service Audio (RazNitzanMusic) /label/beat-service-audio-raznitzanmusic/23431
#Hey RNM Bundles NA
#MGTOW The Sektorz Electro NA
#MGTOW The Sektorz Electro NA
#SuperHero Mondo Records NA
#TAVLD Evolution Records NA


In cases where the label does not exist in table label the LEFT JOIN will return the information of the left table and add NULL (NA) values in the column link.


Note that in this example, the column label is not a foreign key. The two columns have an undefinded relationship according to the ERD of Figure 5.1; but conceptually, we know that they store the same values, and as a result, we can use them to join the two tables.


7.2.2 Right joins

The right outer joins are the same as left outer joins, with the only difference that we keep all rows from the right table:

SELECT  title, t1.label, link 
FROM music_marketplace.label t2 RIGHT JOIN music_marketplace.songs t1 
ON t1.label = t2.label
order by title;
Table 7.6: Displaying records 1 - 10
title label link
#015 Faut Section NA
#DACKSQUAD Rottun Recordings /label/rottun-recordings/8587
#Ecstasy Indeependence Recordings NA
#Frei Auftakt Records NA
#Hey Beat Service Audio (RazNitzanMusic) /label/beat-service-audio-raznitzanmusic/23431
#Hey RNM Bundles NA
#MGTOW The Sektorz Electro NA
#MGTOW The Sektorz Electro NA
#SuperHero Mondo Records NA
#TAVLD Evolution Records NA


In the above I used a RIGHT JOIN but changed the order of the two tables label and song in the query, and as a result, I got the same result as with the left join.


7.3 Unions

Similar to joins, unions combine data from multiple tables. But instead of combining columns, unions combine rows (similar to bind_rows in R—but note that joins are different than bind_columns).

Here is an example of a union between tables songs and label:

SELECT  label
FROM music_marketplace.songs
UNION 
SELECT link as label
FROM music_marketplace.label
ORDER BY label;
Table 7.7: Displaying records 1 - 10
label
Inc
Inc.
#138
#Goldrush Recordings (Armada)
&amp
+Mas Label
/label/100-percent-pure/1470
/label/1605/10775
/label/17-steps/42211
/label/2-owls-records/33454

For comments, suggestions, errors, and typos, please email me at: