moranajp
Release 0.9.6

Morphological Analysis for Japanese

Keywords: r, r-package
License: MIT

Documentation

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# moranajp はじめに

The goal of moranajp is a tool of morphological analysis for Japanese. 

moranajpは，日本語形態素解析をするためのものです．

## Installation インストール

You can install the released version of moranajp from [GitHub] ( https://github.com/matutosi/moranajp ). 
You need install MeCab ( https://taku910.github.io/mecab/ ). 

最新バージョンは，[GitHub] ( https://github.com/matutosi/moranajp ) でダウンロードできます．
MeCab ( https://taku910.github.io/mecab/ ) を別途インストールする必要があります．

``` r
  # CRAN
install.packages("moranajp")

  # development
  # install.packages("remotes")
remotes::install_github("matutosi/moranajp")
```

## Example 使用例

```{r}
library(moranajp)

data(neko)
neko <- 
  neko |>
  dplyr::mutate(text = stringi::stri_unescape_unicode(text)) |>
  tibble::rownames_to_column("cols")
neko  # First part of 'I Am a Cat' by Soseki Natsume

  # MeCab (Need install MeCab) 
  # MeCabをインストールする必要あり
bin_dir <- "d:/pf/mecab/bin" # set your environment MeCabをインストールしたフォルダを指定
  # bin_dir <- "/opt/local/mecab/bin/"  # Example for Mac or Linux
iconv <- "CP932_UTF-8"       # maybe need in Windows Windowsで必要な場合あり
  # 文字化けする場合は，引数 iconv を使ってください．
  # `iconv = "CP932_UTF-8"` or `iconv = "EUC_UTF-8"`
neko |>
  moranajp_all(text_col = "text", bin_dir = bin_dir, iconv = iconv) |>
  print(n=30)

  # chamame (Do not need install, but use web service)
  # 別途ツールのインストールのなし(Web茶まめを使用)
neko |>
  head(3) |>
  moranajp_all(method = "chamame", text_col = "text") |>
  print(n=30)
```

```{r, fig.width=7, fig.height=7}
library(moranajp)

data(synonym)
synonym <- unescape_utf(synonym)

data(neko_mecab)
neko_mecab <- 
  neko_mecab  |>
  unescape_utf() |>
  add_sentence_no() |>
  clean_up(use_common_data = TRUE, synonym_df = synonym)

bigram_neko <- 
  neko_mecab |>
  draw_bigram_network()

add_stop_words <- 
  c("\\u3042\\u308b", "\\u3059\\u308b", "\\u3066\\u308b", 
    "\\u3044\\u308b","\\u306e", "\\u306a\\u308b", "\\u304a\\u308b", 
    "\\u3093", "\\u308c\\u308b", "*") |> 
   unescape_utf()

data(review_chamame)
bigram_review <- 
  review_chamame |>
  dplyr::slice(1:2000) |>
  unescape_utf() |>
  add_sentence_no() |>
  clean_up(add_stop_words = add_stop_words) |>
  draw_bigram_network()

data(review_ginza)
bigram_review_ginza <- 
  review_ginza |>
  unescape_utf() |>
  add_sentence_no() |>
  clean_up(add_depend = TRUE) |>
  draw_bigram_network(depend = TRUE)
```


## Note 注意点

Line breaks in the text will be removed to avoid lag text id. 
If you want to remain line breaks, please change them into other character. 

文字列内の改行コード(\r\n, \n)は，削除されます(改行コードでずれるのを防ぐため)．
改行コードに意味がある場合は，事前に改行コードを別の文字列に変更するなどの対応をしてください．

## Citation 引用

Toshikazu Matsumura (2021) Morphological analysis for Japanese with R. https://github.com/matutosi/moranajp/.

松村 俊和 (2021) Rによる日本語形態素解析. https://github.com/matutosi/moranajp/.

## Installation of MeCab for (Linux / Mac) MeCabのインストール(Linux / Mac)

download file (mecab-0.996.tar.gz, mecab-ipadic-2.7.0-20070801.tar.gz)

ファイルのダウンロード(mecab-0.996.tar.gz, mecab-ipadic-2.7.0-20070801.tar.gz)

http://taku910.github.io/mecab/#download


```
  tar xvf mecab-0.996.tar.gz
  cd mecab-0.996
  ./configure --enable-utf8-only --prefix=/opt/local/mecab
  make
  sudo make install
  # install directory
  # 辞書のインストール
  tar xvf mecab-ipadic-2.7.0-20070801.tar.gz
  cd mecab-ipadic-2.7.0-20070801
  ./configure  --with-mecab-config=/opt/local/mecab/bin/mecab-config --with-charset=utf8 --prefix=/opt/local/mecab
  make
  sudo make install
  # add path
  # パスの追加
  echo 'export PATH=/opt/local/mecab/bin:$PATH' >> ~/.bash_profile
  source ~/.bash_profile
  # run mecab
  # mecabの実行
  mecab
```

ref (in Japanese) 参考
https://qiita.com/nkjm/items/913584c00af199794257

Dependencies: 0
Dependent packages: 0
Dependent repositories: 0
Total releases: 5
Latest release: Aug 1, 2024
First release: Mar 30, 2022
Stars: 0
Forks: 0
Watchers: 2
Contributors: 1
Repository size: 7.64 MB
SourceRank: 8

Source repo 2FA enabled: TEXT!
Package manager 2FA enabled: TEXT!
Is security responsive: TEXT!
Dependencies are managed: TEXT!
Issue-free release available: TEXT!
Succession plan available: TEXT!

Releases

0.9.7: Aug 1, 2024
0.9.6: Feb 28, 2023
0.9.5: Jul 12, 2022
0.9.4: May 6, 2022
0.9.3: Mar 30, 2022

Contributors

See all contributors

moranajp
Release 0.9.6

Release 0.9.6

0.9.7

0.9.6

0.9.5

0.9.4

0.9.3

Documentation

Stats

Development practices

Releases

Contributors

moranajp Release 0.9.6

Release 0.9.6 Toggle Dropdown 0.9.7 0.9.6 0.9.5 0.9.4 0.9.3

Documentation

Stats

Development practices

Releases

Contributors

moranajp
Release 0.9.6

Release 0.9.6

0.9.7

0.9.6

0.9.5

0.9.4

0.9.3