This markdown file shows the problem encountered when trying to print (show) character variables containing UTF-8-encoded Russian text, discussed here:
https://stackoverflow.com/questions/48307007/printing-utf-8-russian-characters-in-r-rmd-knitr
Note that a file .Rprofile containing the single line Sys.setlocale("LC_CTYPE", "russian") has been placed in the working directory.
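Locale names are platform-specific, so the single line in .Rprofile may not take effect on every system; the session information in this document reports en_US.UTF-8. The following is a minimal sketch, not the setup used in this document, of trying several candidate names in turn. The helper `set_first_locale` and the candidate list are hypothetical and introduced only for illustration.

```r
# Hypothetical helper: try each candidate locale name in turn and return the
# first one the operating system accepts, or NA if none can be set.
# Note that a successful call changes the locale for the rest of the session.
set_first_locale <- function(candidates, category = "LC_CTYPE") {
  for (loc in candidates) {
    # Sys.setlocale() returns "" (with a warning) when the request fails.
    result <- suppressWarnings(Sys.setlocale(category, loc))
    if (nzchar(result)) {
      return(result)
    }
  }
  NA_character_
}

# "russian" is a Windows-style name; "ru_RU.UTF-8" is a common Unix-style one.
# The last candidate is the locale reported by sessionInfo() in this document.
chosen <- set_first_locale(c("russian", "ru_RU.UTF-8", "en_US.UTF-8"))
print(chosen)

# l10n_info() reports, among other things, whether the active locale is UTF-8.
print(l10n_info()[["UTF-8"]])
```

Which candidate succeeds depends on the locales installed on the machine, so the printed value will differ between systems.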
library(knitr)
print(sessionInfo())
## R version 3.4.0 (2017-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.2 LTS
##
## Matrix products: default
## BLAS: /opt/R/3.4.0/lib/R/lib/libRblas.so
## LAPACK: /opt/R/3.4.0/lib/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitr_1.20
##
## loaded via a namespace (and not attached):
## [1] compiler_3.4.0 backports_1.1.1 magrittr_1.5 rprojroot_1.2
## [5] tools_3.4.0 htmltools_0.3.6 yaml_2.1.18 Rcpp_0.12.13
## [9] stringi_1.1.7 rmarkdown_1.9 stringr_1.3.0 digest_0.6.12
## [13] evaluate_0.10.1
#Sys.setlocale("LC_CTYPE", "en_US.UTF-8")
Here are two simple character variables and a data.frame that contains the Russian name:
nameInRussian <- c("Борис Немцов")
nameInEnglish <- c("Boris Nemtsov")
dt <- data.frame(
name=c("Борис Немцов","Martin Luter King"),
year=c("2015","1968")
)
Here’s what you get when you try to print (show) them in your markdown document:
nameInRussian
## [1] "Борис Немцов"
print(nameInRussian)
## [1] "Борис Немцов"
#cut(nameInRussian)
print(nameInRussian, encoding = "UTF-8")  # print.default() has no 'encoding' argument; it is ignored
## [1] "Борис Немцов"
print(enc2utf8(nameInRussian))
## [1] "Борис Немцов"
dt
## name year
## 1 Борис Немцов 2015
## 2 Martin Luter King 1968
dt[1,1]
## [1] Борис Немцов
## Levels: Martin Luter King Борис Немцов
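The "Levels:" line appears because, in the R version used here (3.4.0), data.frame() converts character columns to factors by default. As a small sketch of how to get a plain character value instead, assuming the same column names as above (the object names `dt_chr` and `dt_fct` are hypothetical):

```r
# stringsAsFactors is set explicitly so the result does not depend on the
# R version (the default was TRUE before R 4.0.0 and FALSE afterwards).
dt_chr <- data.frame(
  name = c("Борис Немцов", "Martin Luter King"),
  year = c("2015", "1968"),
  stringsAsFactors = FALSE
)
print(class(dt_chr$name))
print(dt_chr[1, 1])  # a character value, printed without a "Levels:" line

# For a data.frame that already holds factors, convert the value on the way out.
dt_fct <- data.frame(
  name = c("Борис Немцов", "Martin Luter King"),
  year = c("2015", "1968"),
  stringsAsFactors = TRUE
)
first_name <- as.character(dt_fct[1, 1])
print(first_name)
```

This only changes the type of the value; it does not by itself affect how non-ASCII characters are displayed.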
Note that kable(dt) prints the table correctly. However, printing just the variable itself does not work.
kable(dt)
| name              | year |
|-------------------|------|
| Борис Немцов      | 2015 |
| Martin Luter King | 1968 |
kable(dt$name[1])
| x            |
|--------------|
| Борис Немцов |
kable(dt[1,1])
| x            |
|--------------|
| Борис Немцов |
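When printing and kable() behave differently, it can help to check whether the string itself is intact or whether only its display is affected. The following is a diagnostic sketch using base R functions; it is not a fix, and the variable names other than nameInRussian are introduced here for illustration.

```r
# "Борис Немцов" written with \u escapes, so this chunk does not depend on
# the encoding of the source file.
nameInRussian <- "\u0411\u043e\u0440\u0438\u0441 \u041d\u0435\u043c\u0446\u043e\u0432"

# Declared encoding mark of the string.
print(Encoding(nameInRussian))

# TRUE if the underlying bytes form valid UTF-8.
is_valid <- validUTF8(nameInRussian)
print(is_valid)

# Character count versus byte count after forcing UTF-8.
n_chars <- nchar(nameInRussian, type = "chars")
n_bytes <- nchar(enc2utf8(nameInRussian), type = "bytes")
print(c(chars = n_chars, bytes = n_bytes))  # 12 characters, 23 bytes

# The first Cyrillic letter occupies two bytes in UTF-8.
first_bytes <- charToRaw(enc2utf8(nameInRussian))[1:2]
print(first_bytes)

# Locale information for the current session.
print(l10n_info())
```

If these checks pass but the printed output still shows escapes or garbled text, the string is likely fine and the problem lies in the locale or in the output path.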
# How to convert UTF-8 to Windows-1251 (CP1251) in R:
# https://tomizonor.wordpress.com/2013/04/17/file-utf8-windows/
# https://www.smashingmagazine.com/2012/06/all-about-unicode-utf8-character-sets
# http://blog.rolffredheim.com/2013/01/r-and-foreign-characters.html
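For the UTF-8 to 1251 conversion mentioned in the notes above, base R provides iconv(). The sketch below assumes the platform's iconv implementation recognizes the encoding name "CP1251"; supported names vary by system and can be listed with iconvlist(). The variable names are hypothetical.

```r
# "Борис Немцов" written with \u escapes (marked as UTF-8 by R).
utf8_name <- "\u0411\u043e\u0440\u0438\u0441 \u041d\u0435\u043c\u0446\u043e\u0432"

# Convert to Windows-1251. iconv() signals an error for an unsupported
# encoding name, so the call is wrapped to yield NA in that case.
cp1251_name <- tryCatch(
  iconv(utf8_name, from = "UTF-8", to = "CP1251"),
  error = function(e) NA_character_
)

if (is.na(cp1251_name)) {
  message("CP1251 conversion is not available on this platform; see iconvlist().")
} else {
  # Each Cyrillic letter takes one byte in CP1251.
  print(nchar(cp1251_name, type = "bytes"))

  # Converting back should reproduce the original string.
  round_trip <- iconv(cp1251_name, from = "CP1251", to = "UTF-8")
  print(round_trip == utf8_name)
}
```

The converted string will usually look garbled if printed directly in a UTF-8 session, since its bytes are no longer UTF-8; it is meant for writing to files or systems that expect Windows-1251.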