10 Database and Other Languages

There are two main types of databases to familiarize yourself with: relational and non-relational. A non-relational database is most often used to store data that needs to be referenced (looked-up) quickly and efficiently and perform operations on this data. Non-relational databases require little structure and are easily scaled to meet demand. Examples of non-relational databases are: NoSQL, MongoDB, Redis, etc.

A relational database is designed to allow data to “relate”, or join, with other data stored within the same database (or other databases). Relational databases are highly structured and meet high standards for reliability. Examples of relational databases are: MySQL, PostgreSQL, SQLServer, MS SQL, etc. There is also a third type that mixes the qualities of these two database types. Examples of this third, hybrid type are ElasticSearch and RedShift.

In academia, especially in political science, relational databases are generally more useful, as political scientists often need to compare many different data sets or join data sets to form more useful pieces of data for analysis. Moreover, as “speed” is less of a concern when real-time changes are unlikely to be a concern, relational databases will almost always be a better option.

10.1 SQL Databases

SQL stands for “standard query language” and is the ubiquitous in both industry and academia. There are three “main” versions of SQL that each have advantages and disadvantages. Each of these three have slightly different syntax, but are interchangeable in terms of the logic of how you write your code. The three most popular SQL databases are:

  • SQLServer (Microsoft) — Is a powerful SQL database with additional tools, but is not free.
  • MySQL (GPL/Oracle) — Is a powerful SQL database that is both paid and free. (Recommended)
  • PostgreSQL (GPL) — Offers higher performance for certain types of operations than MySQL, but at the cost of ease-of-use, tool set, and some performance hits in other operations. Mainly used for when large analysis operations are necessary. (Recommended)

10.2 Creating a SQL Database

You can create your own SQL database on your personal computer for free. I personally prefer MySQL as it is free and a good middle ground between performance, scale, and ease-of-use.

The two links you will need to download and follow your machine’s prompts to install are as follows:

Additionally there are other useful tools to interact with the database beyond Workbench. These include Navicat ($) and HeidiSQL (F) among others. I suggest that if you do eventually use a SQL database, investing in Navicat would be smart.

Once you have followed the installation process on your machine, you will be able to directly interact with your MySQL database locally. You can upload data, download data, query data, and perform analysis on you data locally. While many of these same tasks can be performed in R, doing so in a data base improves reliability and replication performance. Moreover, there are some tasks that are simply easier to perform in a database in comparison to R, especially when the data you are dealing with is large.

10.3 Basic SQL Definitions

There are two basic ways you will interact with your database: queries and procedures. A query, is a statement which you write to perform some operation on the database. This can take the form of a looking up information, changing information, creating new information, and deleting old information. Procedures generally are queries that have been made repeatable.

  • SELECT — select information from a table or database
  • FROM — the table or database which is being acted on
  • WHERE — initial selection of which data in to include in action
  • AND — statement of continued exclusive selection for action
  • OR — statement of continued inclusive selection for action
  • LIKE — fuzzy selection of inclusive selection for action
  • COUNT — selection of number of terms defined by action statement
  • DISTINCT — selection of unique terms
  • GROUP — definition of return statement parsing
  • ORDER — definition of return statement ordering
  • HAVING — definition of return statement inclusion
  • LEFT JOIN — relation of one data opject to another
  • INNER JOIN — relation of one data object to another
  • AVG — arithmetic mean (numeric)
  • SUM — summation of returned terms (numeric)
  • ALTER — statement intended to change some aspect of table or data
  • CHANGE — specifies action to take on item identified in alter statement
  • DROP — removal of table from database or database
  • INSERT — insertion of new data
  • CREATE — deployment of new table or database
  • DELETE — deletion of data from table or database
  • UPDATE — specific action to change data within existing table or database
  • SET — specific change to be performed by update
  • INDEX — adds index to portion of table to improve relational performance
  • CALL — tells database to perform operations contained in proceduritzed format
  • UNION — joining of selection output
  • ALL — return or joining of all output
  • BY/ON

10.4 Query Syntax

The following is an example of a simple SELECT statement in MySQL syntax. Note, all SQL queries must end in a delimiter - the standard delimiter is “;”.

SELECT
  COUNT(A.Column_1) Count1,
  COUNT(DISTINCT A.Column_2) Count2,
  AVG(A.Column_5) AvgCol5,
  B.Column_A
FROM table_name_1 A
  LEFT JOIN table_name_2 B
  ON B.uniqueId = A.uniqueId
WHERE A.Column_3 > 5
  AND (B.Column_B LIKE '%look%'
       OR B.Column_C LIKE 'hh%')
GROUP BY B.Column_A
ORDER BY Count2 DESC;

Other syntax statements:

# Changes the column's name
ALTER TABLE tabe_name_1 CHANGE COLUMN Column_1 NewColumn_1 VARCHAR(255);
 
# Adds an index to the column
ALTER TABLE tabe_name_1 ADD INDEX(NewColumn_1);

# Inserts data into a table
INSERT INTO table1 A 
SELECT B.* FROM table2 B;

# Updates data in a table
UPDATE table1
SET Col1 = NULL
WHERE Col1 = 'NA';

# Creates a new table
CREATE TABLE ...

# Deletes a whole table (be careful)
DROP TABLE ...

# Deletes specific content from a table
DELETE *
  FROM table1
WHERE Col1 = NULL;

10.5 Other Languages

There are many other programming languages that may prove useful for you to familiarize yourself with during your career.

R is a dynamic scripting language. The other important language that is similar to R is Python. Python is widely used in industry and is becoming more widespread in academia.

Object Oriented/Compiled Languages: - C/C++ - Java - JavaScript/Node.js