2.2 Data type conversion

One common operation on data is to convert from one data type to another.

2.2.1 Numeric to Character

To convert a number to a character string, we may use as.character():

as.character(1200)
## [1] "1200"
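As a minimal sketch of the same conversion applied to a variable and to a vector, the snippet below may help. The names x, s, and v are illustrative and do not come from the text.

```r
# Convert a single number to a character string.
x <- 1200
s <- as.character(x)
s
class(s)

# as.character() is vectorized: each element is converted separately.
v <- as.character(c(1.5, 20, -3))
v

# Character values are joined as text rather than added as numbers.
joined <- paste0(s, "5")
joined
```

Once converted, the values behave as text, so arithmetic such as s + 1 is no longer possible without converting back.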

2.2.2 Character to Number

One frequently used conversion is from character to numeric. It is often needed because numbers stored as text may contain commas as thousands separators.

To convert a string to a number, we use as.numeric():

as.numeric("12000")
## [1] 12000
# When converting "12,000", the comma must be removed first
as.numeric(gsub(",", "", "12,000"))
## [1] 12000

Here, the function gsub() finds every comma and replaces it with an empty string. The name gsub stands for global substitution.
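The same cleaning step can be applied to a whole vector at once. The following is a sketch under assumed inputs: the vector prices is hypothetical, and fixed = TRUE is an optional argument chosen here so that the comma is matched literally.

```r
# Hypothetical input: numbers stored as text with thousands separators.
prices <- c("12,000", "1,234,567.5", "300")

# Without cleaning, strings that contain commas become NA (with a warning).
raw <- suppressWarnings(as.numeric(prices))
raw

# gsub() removes every comma; fixed = TRUE treats "," as a literal string.
clean <- as.numeric(gsub(",", "", prices, fixed = TRUE))
clean

# sub() replaces only the first match, so it is not enough here.
first_only <- sub(",", "", "1,234,567.5", fixed = TRUE)
first_only
```

This is why gsub() rather than sub() is used: a number such as 1,234,567.5 contains more than one comma.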

2.2.3 Character to Date

The Date data type represents standard calendar dates. It is usually created by converting character data.

One complication is that we need to specify the format, since a date can be written in many different ways:

  • For day, we use %d for day number (01-31).

  • For month, we use %m for month number (01-12), %b for abbreviated month (e.g. Jan), and %B for unabbreviated month (e.g. January).

  • For year, we use %y for two-digit year (e.g., 14) and %Y for four-digit year (e.g., 2014)

Here are three examples of how to convert character data into date data.

as.Date("21Jan2004", "%d%b%Y")
## [1] "2004-01-21"
as.Date("21/01/04", "%d/%m/%y")
## [1] "2004-01-21"
as.Date("21-01-04", "%d-%m-%y")
## [1] "2004-01-21"
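A few further cases may be worth checking when working with format strings. The sketch below uses only numeric formats so that it does not depend on the month names of the current locale. The variable names are illustrative, and the mapping of two-digit years to centuries follows the current R documentation and could change in future versions.

```r
# Parse numeric date formats.
d1 <- as.Date("2004/01/21", "%Y/%m/%d")
d2 <- as.Date("21/01/99", "%d/%m/%y")   # two-digit years 69-99 map to 19xx

# A format that does not match the string gives NA, not an error.
bad <- as.Date("21-01-04", "%d/%m/%y")

# format() converts a Date back to character in any layout.
out <- format(d1, "%d/%m/%Y")

# Dates support arithmetic: the difference is measured in days.
gap <- as.numeric(as.Date("2004-02-01") - d1)
```

Because a mismatched format silently returns NA, it can be useful to check the result with is.na() after converting a column of dates.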

2.2.4 Character to Date-time

Date-time data are converted using the function as.POSIXct() or as.POSIXlt(). The two are almost the same for our purpose, so we use them interchangeably in this book.

When no time is given, the time defaults to midnight (00:00:00) and is omitted when the value is printed.

as.POSIXct("2014-12-03")
## [1] "2014-12-03 +08"
as.POSIXlt("2014-12-03")
## [1] "2014-12-03 +08"
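To see what is stored when only a date is supplied, and where the two classes differ, a short sketch may help. Setting tz = "UTC" is an assumption made here so that the result does not depend on the local time zone; the names ct and lt are illustrative.

```r
# tz = "UTC" is an assumption so the result is the same on every machine.
ct <- as.POSIXct("2014-12-03", tz = "UTC")
lt <- as.POSIXlt("2014-12-03", tz = "UTC")

# The time is midnight even though printing omits it.
full <- format(ct, "%Y-%m-%d %H:%M:%S")
full

# POSIXlt stores named components.
lt$hour
lt$mday
lt$mon + 1       # months are counted from 0
lt$year + 1900   # years are counted from 1900

# POSIXct stores a single number: seconds since 1970-01-01 UTC.
is.numeric(unclass(ct))
```

In practice, POSIXct tends to be the more convenient choice for columns in a data frame, while POSIXlt is handy when individual components such as the hour are needed.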

To include a time, we need to supply the hour and minute, and sometimes the second.

  • For hour, we use %H for hour number (00-23).

  • For minute, we use %M for minute number (00-59).

  • For second, we use %OS for second number (00-60).

To avoid problems, the date should be written in year, month, day order, which is the order these functions expect when no format is given. If the data are in a different order, it is better to convert them first to avoid possible errors.

as.POSIXct("2014-12-03 04:34",format="%Y-%m-%d %H:%M")
## [1] "2014-12-03 04:34:00 +08"
as.POSIXlt("2014-12-03 04:34:59",format="%Y-%m-%d %H:%M:%OS")
## [1] "2014-12-03 04:34:59 +08"
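When the input is not in year, month, day order, one option is to describe the layout with the format argument and then rewrite the value. The following is a sketch with hypothetical day-first timestamps. tz = "UTC" is assumed for reproducibility, and the names start, end, iso, and elapsed are illustrative.

```r
# Hypothetical day-first timestamps; tz = "UTC" is assumed.
start <- as.POSIXct("03/12/2014 04:34",
                    format = "%d/%m/%Y %H:%M", tz = "UTC")
end   <- as.POSIXct("03/12/2014 05:04:30",
                    format = "%d/%m/%Y %H:%M:%OS", tz = "UTC")

# Rewrite the value in year-month-day order.
iso <- format(start, "%Y-%m-%d %H:%M:%S")
iso

# Date-times support arithmetic; difftime() lets us choose the unit.
elapsed <- as.numeric(difftime(end, start, units = "secs"))
elapsed
```

Here the two timestamps are 30 minutes and 30 seconds apart, so elapsed is 1830 seconds.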