CorpusTools
Tag

Julia library for corpus linguistics

License: BSD-2-Clause

Documentation

Library for corpus linguistics with Julia.


-----------------------------------------

Some functions:

THIS IS WORK IN PROGRESS. IT IS FAR FROM COMPLETE.

The type Files(path, filter=r"") creates a "file" type that will include either a single file if "path" referes to a single file, or a list of files if "path" is a directory. It will also look recursively. If specified, "filter" will filter all results to those containin the string.

The two main functions for exploring the corpora are grep_count(patter, texts, overlap=false) and grep_context(pattern, texts, context=5, sep="\t"), both functions can take regular expressions and strings as the patters and they can take single strings, arrays of strings, or File types as the texts to look in. In general, giving a string as the pattern will have better performance than a regular expression would. grep_count() returns the number of occurences of the particular pattern in the text(s), while grep_context() returns the contexts on which the patter occurs in the text(s).

By default, grep_context() and grep_count() if used with File type will read the files one by one. This can be slightly less efficient than having the files already loaded in memory, so you can instead use load_files(files::Files) to load them to a variable and have better performance. This is not a good idea if the corpus is too big though.

The function collocations(word::String, text; test = "deltap", span=1, lower=true) finds all the collocations of the node "word" in "text" (String or Array{String}, Files is not implemented yet).  "lower" converts to lower case all words in "text" by default. "span" specifies how many words away from the node it should look for collocations (the sapn). The function returns a matrix with the collocations, a Chi Squared value, direction of attraction and odds for each collocate of the word.

Dependent repositories: 0
Total tags: 0
Latest tag: about 15 hours ago
First tag: Feb 23, 2014
Stars: 3
Forks: 5
Watchers: 2
Contributors: 1
Repository size: 3.39 MB
SourceRank: 4

Source repo 2FA enabled: TEXT!
Package manager 2FA enabled: TEXT!
Is security responsive: TEXT!
Dependencies are managed: TEXT!
Issue-free release available: TEXT!
Succession plan available: TEXT!
Package manager 2FA enabled: TEXT!

Contributors

See all contributors

Something wrong with this page? Make a suggestion

Export .ABOUT file for this package

Last synced: 2024-04-25 01:04:04 UTC

CorpusTools
Tag

Tag

Documentation

Stats

Development practices

Contributors

CorpusTools Tag

Tag Toggle Dropdown

Documentation

Stats

Development practices

Contributors

CorpusTools
Tag

Tag