4 SQL: Create and fill a new database
In this chapter, we will introduce the Data Definition Language (DDL) and will create a new database from scratch. In particular, we will use the following Entity Relationship diagram that describes a fictional music label called NoFuture records:
As per the diagram:
- Each musician that records at NoFuture has an SSN, a name, an address, and a phone number.
- Each instrument used in songs recorded at NoFuture has a unique identification number, a description (e.g., guitar, synthesizer, flute) and a musical key (e.g., C, B-flat, E-flat).
- Each album recorded at NoFuture has a unique identification number, a title, a recording date, and a format (e.g., Vinyl, CD, Digital, Multiple).
- Each song recorded at NoFuture has a unique identification number, a title, lyrics, a recording date, and any other additional details.
- Each musician may play several instruments, and a given instrument may be played by several musicians.
- Each album includes multiple songs.
- Each song may appear on (at most) one album.
- Each song is performed by one or more musicians, and a musician may perform a number of songs.
4.1 Connect to a DBMS through RStudio
The first thing we need to do is to connect to our database from RStudio. For this process, we will need to install three new packages:
DBI
: provides functions that allow R to connect to database management systems.odbc
: provides a DBI-compliant interface to Open Database Connectivity (ODBC) drivers (see here: https://db.rstudio.com/r-packages/odbc/)RODBC
: implements ODBC database connectivity. (https://cran.r-project.org/web/packages/RODBC/RODBC.pdf)
install.packages("DBI")
install.packages("odbc")
install.packages("RODBC")
Then, we will need to load these packages along with the tidyverse
package in our environment:
library(DBI)
library(odbc)
library(RODBC)
library(tidyverse)
Now, we are ready to explore what drivers our system offers for connecting to a database:
unique(odbcListDrivers()[[1]])
## [1] "ODBC Drivers" "ODBC Connection Pooling"
## [3] "MySQL Driver"
The output of function
odbcListDrivers()[[1]]
will be different in your systems.
In general, to connect to a database management system we need 4 parameters:
- Host address (server address): The IP or the URL of the DBMS.
- Username: The username to connect.
- Password: The password associated with the above username.
- Port: The port number that the DBMS is accepting connections.
Once we have this information, we can call the dbConnect
function from the DBI package to create a connection object:
= DBI::dbConnect(odbc::odbc(),
con Driver= "MySQL",
Server='mysql-isys3350.bc.edu',
UID='your_username',
PWD= 'your_password',
Port= 3306)
You will need to edit the above code to work for your case. Specifically, you will need to (1) replace
MySQL Driver
with the MySQL driver as printed out by theodbcListDrivers()[[1]]
command; (2) replaceSys.getenv("dbisys3350host")
with the URL of the DBMS; (3) replaceSys.getenv("dbisys3350user")
with your username; (4) replaceSys.getenv("dbisys3350pass")
with your password; (5) replaceSys.getenv("dbisys3350port")
with the DBMS port.
Sys.getenv()
. This function uses global system variables that I have already created in my system. You can find more here: https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/Sys.getenv
4.2 CREATE
4.2.1 Create a new schema
To create a new schema we can use the CREATE
command:
kokkodis_student
with your last name in the following commands.
CREATE SCHEMA kokkodis_student_no_future;
Recall that the term
schema
is synonymous to the termdatabasde
.
;
.
To run SQL commands within an R Mardkown file you will need to create an SQL chunk by typing ```{sql connection=con}
and then close it with three left quotes ```
.
Insert a new code chunk
button and choose SQL
.
Now let’s see if the new database we just created shows up. We can view all the available databses in a DBMS as follows:
show databases;
Database |
---|
information_schema |
isys3350 |
kokkodis_student_db |
kokkodis_student_no_future |
music_marketplace |
4.2.2 Create new tables
Once we have the new schema, we can enrich it with new tables through the CREATE TABLE
statement:
CREATE TABLE kokkodis_student_no_future.musician (
INT,
SSN VARCHAR(45),
name VARCHAR(45),
address INT,
phone VARCHAR(45),
musician_id PRIMARY KEY (musician_id)
);
We can see whether we actually created this table in our schema with the show tables
statement:
tables in kokkodis_student_no_future; show
Tables_in_kokkodis_student_no_future |
---|
musician |
And we can further inspect a table with the show columns
statement:
columns in kokkodis_student_no_future.musician; show
Field | Type | Null | Key | Default | Extra |
---|---|---|---|---|---|
SSN | int(11) | YES | NA | ||
name | varchar(45) | YES | NA | ||
address | varchar(45) | YES | NA | ||
phone | int(11) | YES | NA | ||
musician_id | varchar(45) | NO | PRI | NA |
4.2.3 Datatypes in SQL
The above CREATE
statement has the general form:
CREATE TABLE database_name.table_name (
column_name_1 data_type/*optional*/
[, column_name_2 data_type] /*optional*/
[, table_constraints] )
In SQL a comment is surrounded by
/*
and*/
.
Similar to R, SQL has data types. Each column we add in a new table needs to have a data type. The most common data types that we will use are:
VARCHAR(M)
: Stores variable-length strings up toM
charactersINT
: Stores integersDOUBLE
: Stores floating-point numbersDATETIME
: Stores a date and a time. Its default format isyyyy-mm-dd hh:mm:ss
TEXT
: Stores text of up to 65KB
4.2.4 Constraints
In the generic CREATE
statement there is an optional clause for including table-specific constraints. This is where we can define our primary keys, foreign keys, and any other constraints we want to impose on the table.
For instance, in our example, we included the constraint PRIMARY KEY (musician_id)
that identifies column musician_id
as the primary key of the new table.
The syntax to define a foreign key is as follows:
CONSTRAINT constraint_name]
[FOREIGN KEY (column_name)
REFERENCES database_name.table_name (column_name)
We will see examples of foreign key definitions in Section 4.5.
4.3 ALTER
Once we have created a table, we can make changes by using the ALTER
statement. For instance, we can add a new column as follows:
ALTER TABLE kokkodis_student_no_future.musician
ADD genre VARCHAR(30);
columns in kokkodis_student_no_future.musician; show
Field | Type | Null | Key | Default | Extra |
---|---|---|---|---|---|
SSN | int(11) | YES | NA | ||
name | varchar(45) | YES | NA | ||
address | varchar(45) | YES | NA | ||
phone | int(11) | YES | NA | ||
musician_id | varchar(45) | NO | PRI | NA | |
genre | varchar(30) | YES | NA |
Similarly, we can drop a column from the table:
ALTER TABLE kokkodis_student_no_future.musician
DROP COLUMN genre;
columns in kokkodis_student_no_future.musician; show
Field | Type | Null | Key | Default | Extra |
---|---|---|---|---|---|
SSN | int(11) | YES | NA | ||
name | varchar(45) | YES | NA | ||
address | varchar(45) | YES | NA | ||
phone | int(11) | YES | NA | ||
musician_id | varchar(45) | NO | PRI | NA |
We can also add and drop constraints:
ALTER TABLE kokkodis_student_no_future.musician
DROP PRIMARY KEY;
columns in kokkodis_student_no_future.musician; show
Field | Type | Null | Key | Default | Extra |
---|---|---|---|---|---|
SSN | int(11) | YES | NA | ||
name | varchar(45) | YES | NA | ||
address | varchar(45) | YES | NA | ||
phone | int(11) | YES | NA | ||
musician_id | varchar(45) | NO | NA |
ALTER TABLE kokkodis_student_no_future.musician
ADD PRIMARY KEY (musician_id);
columns in kokkodis_student_no_future.musician; show
Field | Type | Null | Key | Default | Extra |
---|---|---|---|---|---|
SSN | int(11) | YES | NA | ||
name | varchar(45) | YES | NA | ||
address | varchar(45) | YES | NA | ||
phone | int(11) | YES | NA | ||
musician_id | varchar(45) | NO | PRI | NA |
Note the
PRI
identifier in columnKey
that identifies the primary key (if any) of a table.
4.4 DROP
We can drop a table with the DROP
command as follows:
DROP TABLE kokkodis_student_no_future.musician;
tables in kokkodis_student_no_future; show
Tables_in_kokkodis_student_no_future |
---|
Now sometimes, when designing a database, we might need to drop and re-create a table multiple times.
In those cases–and in cases where we are not certain that a table we want to drop exists–we can use the IF EXISTS
modifier as follows:
DROP TABLE IF EXISTS kokkodis_student_no_future.musician;
4.5 Create all tables
Next, we will create all tables listed in the ERD digram of Figure 4.1:
CREATE TABLE kokkodis_student_no_future.musician (
INT,
SSN VARCHAR(45),
name VARCHAR(45),
address INT,
phone VARCHAR(45),
musician_id PRIMARY KEY (musician_id)
);
CREATE TABLE kokkodis_student_no_future.instrument (
VARCHAR(100),
description INT,
instrument_id VARCHAR(2),
music_key PRIMARY KEY (instrument_id)
);
CREATE TABLE kokkodis_student_no_future.album (
VARCHAR(100),
title INT,
album_id VARCHAR(45),
album_format NULL,
recording_date DATETIME PRIMARY KEY (album_id)
);
CREATE TABLE kokkodis_student_no_future.song (
VARCHAR(100),
title INT,
song_id VARCHAR(100),
details
lyrics TEXT,
recording_date DATETIME,INT,
album_id PRIMARY KEY (song_id),
CONSTRAINT fk_Song_Album
FOREIGN KEY (album_id)
REFERENCES kokkodis_student_no_future.album (album_id)
);
CREATE TABLE kokkodis_student_no_future.musician_has_instrument (
VARCHAR(45),
musician_id INT,
instrument_id PRIMARY KEY (musician_id, instrument_id),
CONSTRAINT fk_musician_id
FOREIGN KEY (musician_id)
REFERENCES kokkodis_student_no_future.musician (musician_id),
CONSTRAINT instrument_id
FOREIGN KEY (instrument_id)
REFERENCES kokkodis_student_no_future.instrument (instrument_id)
);
CREATE TABLE kokkodis_student_no_future.musician_has_song (
VARCHAR(45),
musician_id INT,
song_id PRIMARY KEY (musician_id, song_id),
CONSTRAINT fk_musician_id_2
FOREIGN KEY (musician_id)
REFERENCES kokkodis_student_no_future.musician (musician_id),
CONSTRAINT fk_song_id
FOREIGN KEY (song_id)
REFERENCES kokkodis_student_no_future.song (song_id)
);
tables in kokkodis_student_no_future; show
Tables_in_kokkodis_student_no_future |
---|
album |
instrument |
musician |
musician_has_instrument |
musician_has_song |
song |
4.6 Load data into tables
Once we define a table, we can use the INSERT
, UPDATE
, and DELETE
statements to fill it with data.
4.6.1 INSERT
We can use the INSERT
statement to add a new row in our table:
INSERT INTO kokkodis_student_no_future.musician VALUES
123456789, 'Kokkodis', ' 140 Commonwealth Ave', 123456789, 1); (
As we will discuss next in Section 2.3.5.1, we can use the SELECT
clause to retrieve rows from a table as follows:
SELECT * FROM kokkodis_student_no_future.musician;
SSN | name | address | phone | musician_id |
---|---|---|---|---|
123456789 | Kokkodis | 140 Commonwealth Ave | 123456789 | 1 |
As you can see, in the previous example, we have successfully loaded a single row.
We can also insert multiple rows as follows:
INSERT INTO kokkodis_student_no_future.musician VALUES
123456789, 'John', ' ',NULL , 2),
(NULL, 'Doe', ' ', 123456789, 3)
( ;
NULL
. (Recall that in R we use NA
).
Let us see the results:
SELECT * FROM kokkodis_student_no_future.musician;
SSN | name | address | phone | musician_id |
---|---|---|---|---|
123456789 | Kokkodis | 140 Commonwealth Ave | 123456789 | 1 |
123456789 | John | NA | 2 | |
NA | Doe | 123456789 | 3 |
4.6.2 UPDATE
We can update an existing row in a table with the UPDATE
statement as follows:
UPDATE kokkodis_student_no_future.musician
SET SSN = 1
WHERE musician_id=3
;
SELECT * FROM kokkodis_student_no_future.musician
SSN | name | address | phone | musician_id |
---|---|---|---|---|
123456789 | Kokkodis | 140 Commonwealth Ave | 123456789 | 1 |
123456789 | John | NA | 2 | |
1 | Doe | 123456789 | 3 |
WHERE
clause in detail in Section 5.1.2.
4.6.3 DELETE
Finally, we can delete a row that we no longer need with the DELETE
statement:
DELETE from kokkodis_student_no_future.musician
WHERE musician_id=3
;
SELECT * FROM kokkodis_student_no_future.musician
SSN | name | address | phone | musician_id |
---|---|---|---|---|
123456789 | Kokkodis | 140 Commonwealth Ave | 123456789 | 1 |
123456789 | John | NA | 2 |
For comments, suggestions, errors, and typos, please email me at: kokkodis@bc.edu