9 SQL: Subqueries

In this chapter, we will show how we can create subqueries. We will again use the schema music_marketplace that describes a hypothetical marketplace for music. Recall that the schema includes the following tables:

  • songs: stores information about songs.
  • label: stores information about music labels.
  • artist: stores information about artists.
  • artist_songs: stores which songs are produced by which artists.
  • charts: stores the maximum rank that a songs achieves in the charts (the hypothetical marketplace has top-selling charts).

Recall that the Entity Relationship diagram of this schema is as follows:

The Entity Relationship diagram of schema `music_marketplace`.

Figure 9.1: The Entity Relationship diagram of schema music_marketplace.

 

First, we will connect to the database (Section 4.1):

library(DBI)
library(odbc)
library(RODBC)
library(tidyverse)
con = DBI::dbConnect(odbc::odbc(), 
                    Driver= "MySQL", 
                    Server='mysql-isys3350.bc.edu', 
                    UID='your_username', 
                    PWD= 'your_password',
                    Port= 3306)

 

Subqueries are SELECT statements that are nested within other SELECT statements. To define a subquery, we need to inlcude the SELECT statement within parentheses. Subqueries often appear:

  • In a WHERE clause
  • In a FROM clause

Subqueries can return a single value, a list of values, or a table. We show examples of subqueries next.

9.1 Subqueries in the WHERE clause

How could we retrieve the number songs for which we do not know the label link? (i.e., songs for which their label’s link is unknown—NULL)

SELECT  count(*)
FROM music_marketplace.songs t1 
WHERE t1.label NOT IN 
(SELECT DISTINCT label FROM music_marketplace.label WHERE link is not NULL)
Table 9.1: 1 records
count(*)
67970


In the previous, the query inside the parenthesis is a subquery.


We could achieve the same result by using a left join as follows:

SELECT  count(*)
FROM music_marketplace.songs t1 LEFT JOIN music_marketplace.label t2
ON t1.label = t2.label
where link is NULL;
Table 9.2: 1 records
count(*)
67970


The above example showcases how we can use a subquery to provide list of values with the help of the IN operator. In fact, when we use the IN operator, the subquery needs to return a lit of values (i.e., it cannot return multiple columns).


9.2 Comparison operators

What if we wanted to get all the songs that have price greater than the average price of the songs of genre House?

SELECT  title, price
FROM music_marketplace.songs  
WHERE price > 
(SELECT AVG(price) FROM music_marketplace.songs WHERE genre = 'House')
Table 9.3: Displaying records 1 - 10
title price
If I Want You 1.49
Suki 1.49
Goofy's Nightmare 1.49
Suki 1.49
Golden Nightfall 1.49
Hope For The Right 1.49
The Night's Already Started 1.49
Yungle 1.49
Sail 1.49
Chromatik [Orbzone] 1.49


We can use a comparison operator such as > above to filter the results based on a single value of a subquery!


9.3 The ALL keyword

The previous discussion shows how we can use comparison operators to filter on a single value. But sometimes, we want to filter on multiple values. The keyword ALL allows us to do so as shown in the following table:

Condition Equivalent Description
x > ALL (1, 2) x > 2 The comparison is TRUE when x is greater than the maximum returned value
x < ALL (1, 2) x < 1 The comparison is TRUE when x is less than the minimum returned value
x != ALL (1, 2) x NOT IN (1, 2) x is not in the returned values

For instance, we might want to get all the songs that have durationInCharts greater than the all the durationInCharts of songs of genre Breaks:

SELECT  *
FROM music_marketplace.charts  
WHERE durationInCharts > ALL
(SELECT durationInCharts FROM music_marketplace.charts c
inner join music_marketplace.songs s on s.trackid = c.trackid WHERE genre = 'Breaks')
Table 9.4: 9 records
trackId maxRank durationInCharts firstDateInCharts lastDateInCharts
6267474 38 472 2015-03-20 16:22:41 2016-07-04 16:38:23
6406052 9 458 2015-04-09 16:23:36 2016-07-10 16:39:05
6430041 7 449 2015-03-31 16:22:31 2016-06-22 16:38:26
6442684 2 466 2015-04-01 16:22:18 2016-07-10 16:38:10
6444735 1 475 2015-03-22 16:26:13 2016-07-09 16:40:36
6489111 2 449 2015-04-14 16:25:37 2016-07-06 16:39:57
6495896 4 466 2015-03-26 16:24:06 2016-07-04 16:39:09
6509119 15 457 2015-04-10 16:22:45 2016-07-10 16:38:20
6519495 1 452 2015-04-10 16:26:51 2016-07-05 16:39:19

The above is the same as the following:

SELECT  *
FROM music_marketplace.charts  
WHERE durationInCharts > 
(SELECT max(durationInCharts) FROM music_marketplace.charts c
inner join music_marketplace.songs s on s.trackid = c.trackid WHERE genre = 'Breaks')
Table 9.5: 9 records
trackId maxRank durationInCharts firstDateInCharts lastDateInCharts
6267474 38 472 2015-03-20 16:22:41 2016-07-04 16:38:23
6406052 9 458 2015-04-09 16:23:36 2016-07-10 16:39:05
6430041 7 449 2015-03-31 16:22:31 2016-06-22 16:38:26
6442684 2 466 2015-04-01 16:22:18 2016-07-10 16:38:10
6444735 1 475 2015-03-22 16:26:13 2016-07-09 16:40:36
6489111 2 449 2015-04-14 16:25:37 2016-07-06 16:39:57
6495896 4 466 2015-03-26 16:24:06 2016-07-04 16:39:09
6509119 15 457 2015-04-10 16:22:45 2016-07-10 16:38:20
6519495 1 452 2015-04-10 16:26:51 2016-07-05 16:39:19

9.4 The ANY keyword

Similar to the ALL keyword, there is the ANY keyword, which works as follows:

Condition Equivalent Description
x > ANY (1, 2) x > 1 The comparison is TRUE when x is greater than the minimum returned value
x < ANY (1, 2) x < 2 The comparison is TRUE when x is less than the maximum returned value
x != ANY (1, 2) (x != 1) OR (x!=2) x is not equal to at least one of the returned values

For instance, we might want to get all the songs that have durationInCharts greater than some (or any) of the durationInCharts of songs of genre Breaks:

SELECT  *
FROM music_marketplace.charts  
WHERE durationInCharts > ANY
(SELECT durationInCharts FROM music_marketplace.charts c
inner join music_marketplace.songs s on s.trackid = c.trackid WHERE genre = 'Breaks')
Table 9.6: Displaying records 1 - 10
trackId maxRank durationInCharts firstDateInCharts lastDateInCharts
907738 42 20 2015-08-16 12:21:35 2015-09-05 15:43:31
3711813 93 3 2015-12-27 16:38:57 2015-12-30 16:39:06
4423896 56 301 2015-06-27 16:23:18 2016-04-23 16:39:36
4927722 57 266 2015-03-29 16:22:17 2015-12-20 16:38:10
4927724 20 276 2015-03-08 00:13:31 2015-12-09 16:38:08
4969569 62 48 2016-02-12 16:40:14 2016-03-31 16:39:52
4969576 60 6 2016-03-25 16:40:03 2016-03-31 16:39:52
6065153 60 7 2015-03-14 16:22:47 2015-03-21 16:22:52
6079233 35 7 2015-04-11 16:26:24 2015-04-18 16:26:23
6079234 41 7 2015-04-11 16:26:24 2015-04-18 16:26:23

9.5 Correlated subqueries

A correlated subquery depends on each row of the main query — and as a result, it is being executed for ebery row of the main query. For instance, what if we wanted to get all the songs in the charts that have durationInCharts greater than the average durationInCharts of songs that have entered the charts before them?

SELECT  *
FROM music_marketplace.charts c_out  
WHERE durationInCharts > 
(SELECT avg(durationInCharts) FROM music_marketplace.charts c
WHERE c.firstDateInCharts < c_out.firstDateInCharts)
LIMIT 10;
Table 9.7: Displaying records 1 - 10
trackId maxRank durationInCharts firstDateInCharts lastDateInCharts
4423896 56 301 2015-06-27 16:23:18 2016-04-23 16:39:36
4927722 57 266 2015-03-29 16:22:17 2015-12-20 16:38:10
4969569 62 48 2016-02-12 16:40:14 2016-03-31 16:39:52
6267470 1 118 2015-03-13 16:25:16 2015-07-09 16:23:36
6267474 38 472 2015-03-20 16:22:41 2016-07-04 16:38:23
6322210 6 53 2015-03-19 16:26:10 2015-05-11 16:26:33
6322211 2 160 2015-03-19 16:26:10 2015-08-26 15:45:22
6325634 4 53 2015-03-17 16:27:18 2015-05-09 16:26:44
6325673 50 287 2015-04-05 16:23:38 2016-01-17 16:38:49
6332186 2 304 2015-03-17 16:26:22 2016-01-15 16:39:59
Such correlated subqueries are hugely expensive (time consuming). Please do not attempt to run the previous query without using the limit clause, because it will consume a significant amount of resources.


9.6 The EXISTS keyword

The EXISTS keyword tests whether a subquery returns any results. For instance, we might be interested in selecting all the artists for which there are other artists with fewer twitter_followers:

SELECT  *
FROM music_marketplace.artist   a1
WHERE EXISTS
(SELECT name FROM music_marketplace.artist a2
WHERE a1.twitter_followers > a2.twitter_followers)
LIMIT 10;
Table 9.8: Displaying records 1 - 10
name artistId twitter_followers accuracy_score
Black Sun Empire 4 15336 1.000
Tony. 12 1645 1.000
Steve Smooth 18 49465 1.000
DJ Rhythm 26 737 1.000
Richard Earnshaw 38 9 1.000
Simon Grey 40 110 1.000
Pete Bones 47 315 1.000
Celeda 64 307 1.000
First Choice 82 43509 1.000
jay tripwire 88 1170 0.833

or the opposite:

SELECT  *
FROM music_marketplace.artist   a1
WHERE NOT EXISTS
(SELECT name FROM music_marketplace.artist a2
WHERE a1.twitter_followers > a2.twitter_followers);
Table 9.9: Displaying records 1 - 10
name artistId twitter_followers accuracy_score
Hardy Heller 215 0 1.000
Solarisheights 1093 0 0.867
Mateo Murphy 1922 0 1.000
Banzai Republic 2762 0 1.000
Rennie Foster 3571 0 1.000
francesco pico 3715 0 0.857
Seth Vogt 4905 0 1.000
Incolumis 5204 0 1.000
Bob Zopp 5285 0 1.000
Jose Amnesia 5309 0 1.000

9.7 Subqueries in the FROM clause

Subqueries in the FROM clause create ad-hoc tables (views) that we can refer to in the same query as if they were actual tables. Assume that we want to rank the results of a union:

SELECT * FROM 
(SELECT  label
FROM music_marketplace.songs
UNION 
SELECT link as label
FROM music_marketplace.label
) r
ORDER BY label;
Table 9.10: Displaying records 1 - 10
label
Inc
Inc.
#138
#Goldrush Recordings (Armada)
&amp
+Mas Label
/label/100-percent-pure/1470
/label/1605/10775
/label/17-steps/42211
/label/2-owls-records/33454


When using a subquery in the FROM clause we must assign an alias to it!


Now I can take the previous union and join it with the table songs:

SELECT * FROM music_marketplace.songs s INNER JOIN 
(SELECT  label
FROM music_marketplace.songs
UNION 
SELECT link as label
FROM music_marketplace.label
) r
ON r.label = s.label
LIMIT 10;
Table 9.11: Displaying records 1 - 10
title subtitle trackid label genre released labelId price label
If I Want You Retroid Remix 91247 Tilth Music Breaks 2016-01-18 1517 1.49 Tilth Music
Suki Original Mix 268746 Phonocult Records Minimal / Deep Tech 2016-02-23 1114 1.49 Phonocult Records
Goofy's Nightmare Original Mix 268747 Phonocult Records Minimal / Deep Tech 2016-02-23 1114 1.49 Phonocult Records
Suki Seph Remix 268748 Phonocult Records Minimal / Deep Tech 2016-02-23 1114 1.49 Phonocult Records
Golden Nightfall Muzzaik Remix 308444 Lip Recordings House 2016-01-25 5809 1.49 Lip Recordings
Hope For The Right Original Mix 308456 Lip Recordings House 2015-10-12 5809 1.49 Lip Recordings
The Night's Already Started Belocca Dark Vox 308463 Lip Recordings House 2015-09-25 5809 1.49 Lip Recordings
Yungle Original Mix 495959 Malicious Damage Drum & Bass 2015-09-30 9188 1.49 Malicious Damage
Sail Original Mix 495960 Malicious Damage Electronica / Downtempo 2015-09-30 9188 1.49 Malicious Damage
Chromatik [Orbzone] Original Mix 495961 Malicious Damage Electronica / Downtempo 2015-09-30 9188 1.49 Malicious Damage

For comments, suggestions, errors, and typos, please email me at: