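8 Chinese Text Analysis
Code contributed by: 张柏珊, 王楠
Main contents:
1. Installing and loading extension packages
2. Word segmentation
3. Using SQL
4. Word clouds
5. Word-frequency visualization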
8.1 Installing and loading extension packages
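No code is given for this step. Below is a minimal sketch of one way to install any missing packages and then attach them. `load_packages` is a hypothetical helper, not part of any package. It relies only on the base R functions `requireNamespace`, `install.packages` and `library`, and installing requires network access.

```r
# Hypothetical helper: install missing packages (optionally), then attach them
load_packages <- function(pkgs, install = TRUE) {
  # TRUE for every package that is already installed
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!available]
  if (install && length(missing) > 0) {
    install.packages(missing)  # needs a configured CRAN mirror and network
  }
  # Attach every package that is installed now
  for (p in pkgs) {
    if (requireNamespace(p, quietly = TRUE)) library(p, character.only = TRUE)
  }
  invisible(missing)  # names of packages that were not installed at the start
}

# Usage for this chapter:
# load_packages(c("jiebaR", "sqldf"))
```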
8.2 Word segmentation with jieba
8.2.1 Building word lists
8.2.1.1 Stop words
library('jiebaR')
# Segmentation engine: users.txt is a user dictionary, stopwords.txt a stop-word file
engine1 <- worker(user = "users.txt", stop_word = "stopwords.txt")
# Additional stop words (function words and filler) to drop from the results
stopwords_CN <- c("被","怎么","还是","多少","得", "吗","给",
                  "年","月","还","个","能", "日","什么","做","没","啊",
                  "的", "了", "在", "是", "我", "有", "和", "就","不",
                  "人", "都", "一", "一个", "上", "也", "很", "到", "说",
                  "要", "去", "你","会", "着", "没有", "看", "好",
                  "自己", "这", "等","各位代表")
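A list like stopwords_CN is presumably used to drop these high-frequency words from the segmentation output before counting. A minimal base-R sketch of that filtering step, assuming the segmenter returns a character vector with one word per element. The sample tokens, the short stopwords_demo list and the remove_stopwords helper are all hypothetical.

```r
# Hypothetical segmentation output (one word per element)
tokens <- c("政府", "工作", "的", "报告", "各位代表", "经济", "了", "发展")
# A small subset of stop words standing in for stopwords_CN
stopwords_demo <- c("的", "了", "各位代表")

# Keep only the tokens that do not appear in the stop-word list
remove_stopwords <- function(tokens, stopwords) {
  tokens[!(tokens %in% stopwords)]
}

kept <- remove_stopwords(tokens, stopwords_demo)
print(kept)
```

With real data the call would look like remove_stopwords(words, stopwords_CN), where words is the (hypothetically named) segmentation result. Because %in% is a vectorised membership test, the whole filter is a single pass over the tokens.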
8.3 运用SQL
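8.3.1 Installing and loading the sqldf package
> group by groups rows according to a rule: it splits a data set into several smaller groups and then processes the data within each group separately.
> count(1) counts the rows (in each group, when combined with group by).
> select retrieves data.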
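To make the three SQL clauses concrete, here is a base-R sketch of what select seg, count(1) as freg from m1 group by seg computes. It assumes m1 is a data frame with one segmented word per row in column seg. m1_demo and its words are made up for illustration.

```r
# Hypothetical stand-in for m1: one segmented word per row in column seg
m1_demo <- data.frame(seg = c("经济", "发展", "经济", "改革", "发展", "经济"),
                      stringsAsFactors = FALSE)

# GROUP BY seg: split the rows into one group per distinct word
groups <- split(m1_demo$seg, m1_demo$seg)

# count(1): number of rows in each group
freg <- vapply(groups, length, integer(1))

# SELECT seg, count(1) AS freg: assemble the result table
result <- data.frame(seg = names(freg), freg = unname(freg),
                     stringsAsFactors = FALSE)
print(result)
```

table(m1_demo$seg) yields the same counts in one call. The explicit split step is shown only to mirror the group by semantics. Row order follows the sorted factor levels, so it can differ between locales.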
library(sqldf)
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
# Count how often each word (seg) occurs in m1; the count column is named freg
m2 <- sqldf('select seg, count(1) as freg from m1 group by seg')
class(m2)
## [1] "data.frame"
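m2 is a data frame with one row per word and its count in freg. For the word-cloud and frequency plots it is usually convenient to rank words by frequency first. Below is a base-R sketch, assuming m2 has exactly these two columns. m2_demo and top_n_words are hypothetical. Adding order by freg desc to the sqldf query should give the same ordering.

```r
# Hypothetical frequency table with the same columns as m2 (seg, freg)
m2_demo <- data.frame(seg = c("改革", "经济", "发展"), freg = c(1L, 3L, 2L),
                      stringsAsFactors = FALSE)

# Sort by descending frequency and keep the first n rows
top_n_words <- function(freq_df, n = 10) {
  ordered <- freq_df[order(-freq_df$freg), , drop = FALSE]
  rownames(ordered) <- NULL
  head(ordered, n)
}

top <- top_n_words(m2_demo, n = 2)
print(top)
```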