9 SQL: Subqueries
In this chapter, we will show how we can create subqueries. We will again use the schema music_marketplace
that describes a hypothetical marketplace for music. Recall that the schema includes the following tables:
songs
: stores information about songs.label
: stores information about music labels.artist
: stores information about artists.artist_songs
: stores which songs are produced by which artists.charts
: stores the maximum rank that a songs achieves in the charts (the hypothetical marketplace has top-selling charts).
Recall that the Entity Relationship diagram of this schema is as follows:
First, we will connect to the database (Section 4.1):
library(DBI)
library(odbc)
library(RODBC)
library(tidyverse)
= DBI::dbConnect(odbc::odbc(),
con Driver= "MySQL",
Server='mysql-isys3350.bc.edu',
UID='your_username',
PWD= 'your_password',
Port= 3306)
Subqueries are SELECT statements that are nested within other SELECT statements. To define a subquery, we need to inlcude the SELECT statement within parentheses. Subqueries often appear:
- In a
WHERE
clause - In a
FROM
clause
Subqueries can return a single value, a list of values, or a table. We show examples of subqueries next.
9.1 Subqueries in the WHERE
clause
How could we retrieve the number songs for which we do not know the label link? (i.e., songs for which their label’s link is unknown—NULL)
SELECT count(*)
FROM music_marketplace.songs t1
WHERE t1.label NOT IN
SELECT DISTINCT label FROM music_marketplace.label WHERE link is not NULL) (
count(*) |
---|
67970 |
In the previous, the query inside the parenthesis is a subquery.
We could achieve the same result by using a left join as follows:
SELECT count(*)
FROM music_marketplace.songs t1 LEFT JOIN music_marketplace.label t2
ON t1.label = t2.label
where link is NULL;
count(*) |
---|
67970 |
IN
operator. In fact, when we use the IN
operator, the subquery needs to return a lit of values (i.e., it cannot return multiple columns).
9.2 Comparison operators
What if we wanted to get all the songs that have price
greater than the average price of the songs of genre House
?
SELECT title, price
FROM music_marketplace.songs
WHERE price >
SELECT AVG(price) FROM music_marketplace.songs WHERE genre = 'House') (
title | price |
---|---|
If I Want You | 1.49 |
Suki | 1.49 |
Goofy's Nightmare | 1.49 |
Suki | 1.49 |
Golden Nightfall | 1.49 |
Hope For The Right | 1.49 |
The Night's Already Started | 1.49 |
Yungle | 1.49 |
Sail | 1.49 |
Chromatik [Orbzone] | 1.49 |
>
above to filter the results based on a single value of a subquery!
9.3 The ALL
keyword
The previous discussion shows how we can use comparison operators to filter on a single value. But sometimes, we want to filter on multiple values. The keyword ALL
allows us to do so as shown in the following table:
Condition | Equivalent | Description |
---|---|---|
x > ALL (1, 2) | x > 2 | The comparison is TRUE when x is greater than the maximum returned value |
x < ALL (1, 2) | x < 1 | The comparison is TRUE when x is less than the minimum returned value |
x != ALL (1, 2) | x NOT IN (1, 2) | x is not in the returned values |
For instance, we might want to get all the songs that have durationInCharts
greater than the all the durationInCharts
of songs of genre Breaks
:
SELECT *
FROM music_marketplace.charts
WHERE durationInCharts > ALL
SELECT durationInCharts FROM music_marketplace.charts c
(inner join music_marketplace.songs s on s.trackid = c.trackid WHERE genre = 'Breaks')
trackId | maxRank | durationInCharts | firstDateInCharts | lastDateInCharts |
---|---|---|---|---|
6267474 | 38 | 472 | 2015-03-20 16:22:41 | 2016-07-04 16:38:23 |
6406052 | 9 | 458 | 2015-04-09 16:23:36 | 2016-07-10 16:39:05 |
6430041 | 7 | 449 | 2015-03-31 16:22:31 | 2016-06-22 16:38:26 |
6442684 | 2 | 466 | 2015-04-01 16:22:18 | 2016-07-10 16:38:10 |
6444735 | 1 | 475 | 2015-03-22 16:26:13 | 2016-07-09 16:40:36 |
6489111 | 2 | 449 | 2015-04-14 16:25:37 | 2016-07-06 16:39:57 |
6495896 | 4 | 466 | 2015-03-26 16:24:06 | 2016-07-04 16:39:09 |
6509119 | 15 | 457 | 2015-04-10 16:22:45 | 2016-07-10 16:38:20 |
6519495 | 1 | 452 | 2015-04-10 16:26:51 | 2016-07-05 16:39:19 |
The above is the same as the following:
SELECT *
FROM music_marketplace.charts
WHERE durationInCharts >
SELECT max(durationInCharts) FROM music_marketplace.charts c
(inner join music_marketplace.songs s on s.trackid = c.trackid WHERE genre = 'Breaks')
trackId | maxRank | durationInCharts | firstDateInCharts | lastDateInCharts |
---|---|---|---|---|
6267474 | 38 | 472 | 2015-03-20 16:22:41 | 2016-07-04 16:38:23 |
6406052 | 9 | 458 | 2015-04-09 16:23:36 | 2016-07-10 16:39:05 |
6430041 | 7 | 449 | 2015-03-31 16:22:31 | 2016-06-22 16:38:26 |
6442684 | 2 | 466 | 2015-04-01 16:22:18 | 2016-07-10 16:38:10 |
6444735 | 1 | 475 | 2015-03-22 16:26:13 | 2016-07-09 16:40:36 |
6489111 | 2 | 449 | 2015-04-14 16:25:37 | 2016-07-06 16:39:57 |
6495896 | 4 | 466 | 2015-03-26 16:24:06 | 2016-07-04 16:39:09 |
6509119 | 15 | 457 | 2015-04-10 16:22:45 | 2016-07-10 16:38:20 |
6519495 | 1 | 452 | 2015-04-10 16:26:51 | 2016-07-05 16:39:19 |
9.4 The ANY
keyword
Similar to the ALL
keyword, there is the ANY
keyword, which works as follows:
Condition | Equivalent | Description |
---|---|---|
x > ANY (1, 2) | x > 1 | The comparison is TRUE when x is greater than the minimum returned value |
x < ANY (1, 2) | x < 2 | The comparison is TRUE when x is less than the maximum returned value |
x != ANY (1, 2) | (x != 1) OR (x!=2) | x is not equal to at least one of the returned values |
For instance, we might want to get all the songs that have durationInCharts
greater than some (or any) of the durationInCharts
of songs of genre Breaks
:
SELECT *
FROM music_marketplace.charts
WHERE durationInCharts > ANY
SELECT durationInCharts FROM music_marketplace.charts c
(inner join music_marketplace.songs s on s.trackid = c.trackid WHERE genre = 'Breaks')
trackId | maxRank | durationInCharts | firstDateInCharts | lastDateInCharts |
---|---|---|---|---|
907738 | 42 | 20 | 2015-08-16 12:21:35 | 2015-09-05 15:43:31 |
3711813 | 93 | 3 | 2015-12-27 16:38:57 | 2015-12-30 16:39:06 |
4423896 | 56 | 301 | 2015-06-27 16:23:18 | 2016-04-23 16:39:36 |
4927722 | 57 | 266 | 2015-03-29 16:22:17 | 2015-12-20 16:38:10 |
4927724 | 20 | 276 | 2015-03-08 00:13:31 | 2015-12-09 16:38:08 |
4969569 | 62 | 48 | 2016-02-12 16:40:14 | 2016-03-31 16:39:52 |
4969576 | 60 | 6 | 2016-03-25 16:40:03 | 2016-03-31 16:39:52 |
6065153 | 60 | 7 | 2015-03-14 16:22:47 | 2015-03-21 16:22:52 |
6079233 | 35 | 7 | 2015-04-11 16:26:24 | 2015-04-18 16:26:23 |
6079234 | 41 | 7 | 2015-04-11 16:26:24 | 2015-04-18 16:26:23 |
9.6 The EXISTS
keyword
The EXISTS
keyword tests whether a subquery returns any results. For instance, we might be interested in selecting all the artists for which there are other artists with fewer twitter_followers
:
SELECT *
FROM music_marketplace.artist a1
WHERE EXISTS
SELECT name FROM music_marketplace.artist a2
(WHERE a1.twitter_followers > a2.twitter_followers)
LIMIT 10;
name | artistId | twitter_followers | accuracy_score |
---|---|---|---|
Black Sun Empire | 4 | 15336 | 1.000 |
Tony. | 12 | 1645 | 1.000 |
Steve Smooth | 18 | 49465 | 1.000 |
DJ Rhythm | 26 | 737 | 1.000 |
Richard Earnshaw | 38 | 9 | 1.000 |
Simon Grey | 40 | 110 | 1.000 |
Pete Bones | 47 | 315 | 1.000 |
Celeda | 64 | 307 | 1.000 |
First Choice | 82 | 43509 | 1.000 |
jay tripwire | 88 | 1170 | 0.833 |
or the opposite:
SELECT *
FROM music_marketplace.artist a1
WHERE NOT EXISTS
SELECT name FROM music_marketplace.artist a2
(WHERE a1.twitter_followers > a2.twitter_followers);
name | artistId | twitter_followers | accuracy_score |
---|---|---|---|
Hardy Heller | 215 | 0 | 1.000 |
Solarisheights | 1093 | 0 | 0.867 |
Mateo Murphy | 1922 | 0 | 1.000 |
Banzai Republic | 2762 | 0 | 1.000 |
Rennie Foster | 3571 | 0 | 1.000 |
francesco pico | 3715 | 0 | 0.857 |
Seth Vogt | 4905 | 0 | 1.000 |
Incolumis | 5204 | 0 | 1.000 |
Bob Zopp | 5285 | 0 | 1.000 |
Jose Amnesia | 5309 | 0 | 1.000 |
9.7 Subqueries in the FROM
clause
Subqueries in the FROM
clause create ad-hoc tables (views) that we can refer to in the same query as if they were actual tables. Assume that we want to rank the results of a union:
SELECT * FROM
SELECT label
(FROM music_marketplace.songs
UNION
SELECT link as label
FROM music_marketplace.label
) rORDER BY label;
label |
---|
Inc |
Inc. |
#138 |
#Goldrush Recordings (Armada) |
& |
+Mas Label |
/label/100-percent-pure/1470 |
/label/1605/10775 |
/label/17-steps/42211 |
/label/2-owls-records/33454 |
FROM
clause we must assign an alias to it!
Now I can take the previous union and join it with the table songs:
SELECT * FROM music_marketplace.songs s INNER JOIN
SELECT label
(FROM music_marketplace.songs
UNION
SELECT link as label
FROM music_marketplace.label
) rON r.label = s.label
LIMIT 10;
title | subtitle | trackid | label | genre | released | labelId | price | label |
---|---|---|---|---|---|---|---|---|
If I Want You | Retroid Remix | 91247 | Tilth Music | Breaks | 2016-01-18 | 1517 | 1.49 | Tilth Music |
Suki | Original Mix | 268746 | Phonocult Records | Minimal / Deep Tech | 2016-02-23 | 1114 | 1.49 | Phonocult Records |
Goofy's Nightmare | Original Mix | 268747 | Phonocult Records | Minimal / Deep Tech | 2016-02-23 | 1114 | 1.49 | Phonocult Records |
Suki | Seph Remix | 268748 | Phonocult Records | Minimal / Deep Tech | 2016-02-23 | 1114 | 1.49 | Phonocult Records |
Golden Nightfall | Muzzaik Remix | 308444 | Lip Recordings | House | 2016-01-25 | 5809 | 1.49 | Lip Recordings |
Hope For The Right | Original Mix | 308456 | Lip Recordings | House | 2015-10-12 | 5809 | 1.49 | Lip Recordings |
The Night's Already Started | Belocca Dark Vox | 308463 | Lip Recordings | House | 2015-09-25 | 5809 | 1.49 | Lip Recordings |
Yungle | Original Mix | 495959 | Malicious Damage | Drum & Bass | 2015-09-30 | 9188 | 1.49 | Malicious Damage |
Sail | Original Mix | 495960 | Malicious Damage | Electronica / Downtempo | 2015-09-30 | 9188 | 1.49 | Malicious Damage |
Chromatik [Orbzone] | Original Mix | 495961 | Malicious Damage | Electronica / Downtempo | 2015-09-30 | 9188 | 1.49 | Malicious Damage |
For comments, suggestions, errors, and typos, please email me at: kokkodis@bc.edu