Text Processing for Small or Big Data Files


Keywords
bh, boost, cpp11, processing, r, rcpp, rcpparmadillo, text
License
GPL-3.0

Contributors

Lampros Mouselimis




Documentation


textTinyR


The textTinyR package consists of text pre-processing functions for small or big data files. More details on the functionality of textTinyR can be found in the blog-post and in the package Vignette. The R package can be installed on the following operating systems: Linux, Mac and Windows. However, there are some limitations:

  • there is no support for Chinese, Japanese, Korean, Thai or other languages with ambiguous word boundaries.
  • there are no support functions for the UTF locale on Windows, meaning that only English character strings or files can be input and pre-processed.

System Requirements (for Unix operating systems)


Debian/Ubuntu

sudo apt-get update

sudo apt-get install libboost-all-dev

sudo apt-get install libboost-locale-dev


Fedora

yum install boost-devel


Macintosh OSX/brew


UPDATE 25-05-2017: The current CRAN version of the package can only be installed on Linux and Windows. If the boost locale library is properly installed on your operating system, use the devtools::install_github(repo = 'mlampros/textTinyR', clean = TRUE) function to download and install the textTinyR package.
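
Whether a boost locale library is present can be checked roughly from a shell before attempting the Github installation. The following is a minimal sketch, not part of the package: find_boost_locale is a hypothetical helper name, and the directories to search are assumptions that depend on the system. It only looks for a file whose name starts with libboost_locale and does not verify that the library was built with the layout the package expects.

```shell
#!/bin/sh
# Hypothetical helper (not part of textTinyR): print the first file whose name
# starts with "libboost_locale" found in the given directories.
# Returns 0 if a file is found, 1 otherwise.
find_boost_locale() {
  for dir in "$@"; do
    for f in "$dir"/libboost_locale*; do
      # An unmatched glob stays literal, so test that the file really exists.
      [ -e "$f" ] && { echo "$f"; return 0; }
    done
  done
  return 1
}

# Example call (the directory list is an assumption, adjust it to your system):
#   find_boost_locale /usr/lib /usr/lib/x86_64-linux-gnu /usr/local/lib \
#     || echo "no boost locale library found"
```

If nothing is found, the system requirements described in this section have probably not been met yet.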


On macOS, the boost library is installed using the Homebrew package manager.

If the boost library has already been installed with brew install boost, it must first be removed using the following command,


brew uninstall boost


Then the formula for the boost library should be modified using a text editor (TextEdit, TextMate, etc.). On macOS Sierra the formula is saved in:


/usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/Formula/boost.rb


The user should open the boost.rb formula and replace the following code chunk beginning from (approx.) line 71,


# layout should be synchronized with boost-python
args = ["--prefix=#{prefix}",
        "--libdir=#{lib}",
        "-d2",
        "-j#{ENV.make_jobs}",
        "--layout=tagged",
        "--user-config=user-config.jam",
        "install"]

if build.with? "single"
  args << "threading=multi,single"
else
  args << "threading=multi"
end

with the following code chunk,


# layout should be synchronized with boost-python
args = ["--prefix=#{prefix}",
        "--libdir=#{lib}",
        "-d2",
        "-j#{ENV.make_jobs}",
        "--layout=system", 
        "--user-config=user-config.jam",
        "threading=multi",
        "install"]

#if build.with? "single"
#  args << "threading=multi,single"
#else
#  args << "threading=multi"
#end
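
The layout switch in the chunk above can also be scripted instead of edited by hand. The following is a minimal sketch under stated assumptions: set_system_layout is a hypothetical helper name, and the sketch only rewrites the "--layout=tagged" line. It does not move "threading=multi" into the argument list or comment out the if/else block, so that part still has to be edited manually.

```shell
#!/bin/sh
# Hypothetical helper (not part of Homebrew or textTinyR): replace
# "--layout=tagged" with "--layout=system" in the given formula file.
# A backup of the original file is kept next to it with a .bak suffix.
set_system_layout() {
  formula="$1"
  # Keep a backup so the original formula can be restored.
  cp "$formula" "$formula.bak" || return 1
  # Read from the backup and rewrite the formula; this avoids "sed -i",
  # whose syntax differs between GNU sed and the BSD sed shipped with macOS.
  sed 's/"--layout=tagged"/"--layout=system"/' "$formula.bak" > "$formula"
}

# Example call, using the formula path given above:
#   set_system_layout /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/Formula/boost.rb
```

To undo the change, copy the .bak file back over the formula.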

Then the user should save the changes, close the file and run,


brew update


to apply the changes.


Then the user should open a new terminal (console) and run the following command, which builds and installs the boost library from source using the modified formula (note the two dashes before build-from-source):


brew install /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/Formula/boost.rb --build-from-source


That's it.


Installation of the textTinyR package (CRAN, Github)


To install the package from CRAN use,

install.packages('textTinyR', clean = TRUE)


and to download the latest version from Github use the install_github function of the devtools package,

devtools::install_github(repo = 'mlampros/textTinyR', clean = TRUE)


Use the following link to report bugs or issues:

https://github.com/mlampros/textTinyR/issues