5.2 Software environment

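R's built-in regular expression support relies on third-party libraries: TRE for the default engine and PCRE when `perl = TRUE`, alongside ICU for collation and iconv for encoding conversion. It is therefore important to know which versions you are running. The interpretation of some character classes, such as [:alnum:] and [:alpha:], depends on the locale, so checking the current locale settings matters as well.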

# Find suitable encodings for the current locale
localeToCharset(locale = Sys.getlocale("LC_CTYPE"))
## [1] "UTF-8"     "ISO8859-1"
# Version information for external software
extSoftVersion()
##                                                      zlib 
##                                                  "1.2.11" 
##                                                     bzlib 
##                                      "1.0.8, 13-Jul-2019" 
##                                                        xz 
##                                                   "5.2.5" 
##                                                      PCRE 
##                                        "10.39 2021-10-29" 
##                                                       ICU 
##                                                    "70.1" 
##                                                       TRE 
##                                 "TRE 0.8.0 R_fixes (BSD)" 
##                                                     iconv 
##                                              "glibc 2.35" 
##                                                  readline 
##                                                     "8.1" 
##                                                      BLAS 
## "/usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3"
# Locale and encoding information
l10n_info()
## $MBCS
## [1] TRUE
## 
## $`UTF-8`
## [1] TRUE
## 
## $`Latin-1`
## [1] FALSE
## 
## $codeset
## [1] "UTF-8"
# Details of how numbers and currency are formatted
Sys.localeconv()
##     decimal_point     thousands_sep          grouping   int_curr_symbol 
##               "."                ""                ""            "USD " 
##   currency_symbol mon_decimal_point mon_thousands_sep      mon_grouping 
##               "$"               "."               ","        "\003\003" 
##     positive_sign     negative_sign   int_frac_digits       frac_digits 
##                ""               "-"               "2"               "2" 
##     p_cs_precedes    p_sep_by_space     n_cs_precedes    n_sep_by_space 
##               "1"               "0"               "1"               "0" 
##       p_sign_posn       n_sign_posn 
##               "1"               "1"
# Configuration options PCRE was built with
pcre_config()
##              UTF-8 Unicode properties                JIT              stack 
##               TRUE               TRUE               TRUE              FALSE
# More complete character-set information from stringi
stringi::stri_info()
## $Unicode.version
## [1] "14.0"
## 
## $ICU.version
## [1] "70.1"
## 
## $Locale
## $Locale$Language
## [1] "en"
## 
## $Locale$Country
## [1] "US"
## 
## $Locale$Variant
## [1] ""
## 
## $Locale$Name
## [1] "en_US"
## 
## 
## $Charset.internal
## [1] "UTF-8"  "UTF-16"
## 
## $Charset.native
## $Charset.native$Name.friendly
## [1] "UTF-8"
## 
## $Charset.native$Name.ICU
## [1] "UTF-8"
## 
## $Charset.native$Name.UTR22
## [1] NA
## 
## $Charset.native$Name.IBM
## [1] "ibm-1208"
## 
## $Charset.native$Name.WINDOWS
## [1] "windows-65001"
## 
## $Charset.native$Name.JAVA
## [1] "UTF-8"
## 
## $Charset.native$Name.IANA
## [1] "UTF-8"
## 
## $Charset.native$Name.MIME
## [1] "UTF-8"
## 
## $Charset.native$ASCII.subset
## [1] TRUE
## 
## $Charset.native$Unicode.1to1
## [1] NA
## 
## $Charset.native$CharSize.8bit
## [1] FALSE
## 
## $Charset.native$CharSize.min
## [1] 1
## 
## $Charset.native$CharSize.max
## [1] 3
## 
## 
## $ICU.system
## [1] TRUE
## 
## $ICU.UTF8
## [1] TRUE
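
To see how the version report and the locale-dependent character classes fit together, here is a minimal sketch. It picks the regex-related entries out of `extSoftVersion()` and runs the same POSIX class through both engines. The test vector `x` and the result name `res` are illustrative, not from the text above; how the non-ASCII element is classified may vary with the locale and library versions, so no output is shown.

```r
# Keep only the entries of extSoftVersion() relevant to regular expressions
regex_libs <- extSoftVersion()[c("PCRE", "TRE", "ICU", "iconv")]
print(regex_libs)

# Illustrative input: ASCII letters, digits, a mix, and one non-ASCII letter
x <- c("abc", "123", "a1", "\u00e9")

# Same POSIX class through both engines:
# TRE is the default, PCRE is used when perl = TRUE
res <- data.frame(
  x    = x,
  tre  = grepl("^[[:alpha:]]+$", x),
  pcre = grepl("^[[:alpha:]]+$", x, perl = TRUE)
)
print(res)
```

The ASCII rows should agree between the two engines; the last row is the one worth inspecting on your own system.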

Sometimes the locale has to be changed temporarily to meet special plotting or text-output requirements.

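# Get and save the current locale settings
Sys.getlocale()
foo <- Sys.getlocale()
# Restore the saved locale settings
# (the combined LC_ALL string may not be accepted on every platform)
Sys.setlocale("LC_ALL", locale = foo)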
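
Because the combined string returned by `Sys.getlocale()` is system-specific, a more portable pattern is to save and restore a single category. The following is a minimal sketch of that idea; `with_locale_category` is a hypothetical helper written for this illustration, not a base R function, and `LC_COLLATE` with the `"C"` locale is just a convenient example.

```r
# Hypothetical helper: evaluate `expr` with one locale category
# switched temporarily, then restore the previous value.
with_locale_category <- function(category, locale, expr) {
  old <- Sys.getlocale(category)                     # remember current setting
  on.exit(Sys.setlocale(category, old), add = TRUE)  # restore even on error
  Sys.setlocale(category, locale)                    # switch temporarily
  force(expr)                                        # evaluated lazily, under the new locale
}

before <- Sys.getlocale("LC_COLLATE")

# In the "C" locale, strings sort by byte value: uppercase before lowercase
sorted_c <- with_locale_category("LC_COLLATE", "C",
                                 sort(c("b", "A", "a", "B")))
print(sorted_c)

after <- Sys.getlocale("LC_COLLATE")
identical(before, after)  # the original setting is back in place
```

Using `on.exit()` means the previous setting is restored even if the expression fails, which is easy to forget when `Sys.setlocale()` is called by hand.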