keywords

Parses keywords from strings.


Keywords
elixir-keyword-parser, keyword-extraction, keyword-parser, text-matching
License
MIT

Documentation

KeywordParser

A keyword parser for extracting words, phrases and simple patterns from strings of text.

This module lets you load multiple keyword patterns for different topics and run them against strings to extract matches. A pattern is created with new_pattern/2 or new_pattern/3 from a list of keywords or phrases plus a name that identifies it.

When parsing a string, pass either a single pattern name or a list of pattern names to choose which patterns to apply. Passing the atom :all as the pattern_name argument applies every available pattern. Once created, each pattern is stored as a process and exists until it is killed individually with kill_pattern/1 or the application as a whole is stopped.

iex> Keywords.new_pattern("clothing_brands", ["converse", "nike", "adidas", "paige", "hanes"])
{:ok, "clothing_brands"}

iex> Keywords.new_pattern("retail_companies", ["amazon", "walmart", "home depot"])
{:ok, "retail_companies"}

iex> string = """
Been wearing converse low tops for the past 20 years. Purchased these maroon Chuck Taylor low tops recently, and I wasn’t thrilled..

Beyond the fit, there are videos online showing how to tell if Chuck Taylor converse are counterfeit or real.. I purchased a shoe with the “o” in converse having a star in the center. That is how to tell if they’re legitimate sneakers made by converse. What I received by amazon are sneakers with a plain old “o” , no star, see photos..

It’s increasingly frustrating to pay for prime membership, but feel like your just another shopper. I don’t feel like it’s my job to dig through countless sellers on amazon to determine which are selling legitimate products, and which are selling knock off nike and converse shoes.
Amazon should be doing a better job at that.
"""

iex> Keywords.parse(string, ["clothing_brands", "retail_companies"])
{:ok, ["converse", "amazon", "nike"]}

iex> Keywords.parse(string, ["clothing_brands", "retail_companies"], counts: true)
{:ok, [{"converse", 4}, {"amazon", 2}, {"nike", 1}]}

iex> Keywords.parse(string, ["clothing_brands", "retail_companies"], aggregate: false)
{:ok, [{"clothing_brands", ["converse", "nike"]}, {"retail_companies", ["amazon"]}]}

Docs

HexDocs: https://hexdocs.pm/keywords

Installation

When available in Hex, the package can be installed by adding keywords to your list of dependencies in mix.exs:

def deps do
  [
    {:keywords, "~> 1.3.0"}
  ]
end

Functions

new_pattern

Creates a new keyword pattern from a list of keywords

new_pattern(name, keywords_list, opts)

opts is optional. Options include:

  • :substrings | default = false | toggles whether keywords can also be matched as substrings of longer words.
  • :case_sensitive | default = false | toggles whether keyword matching is case sensitive.

Usage:

iex> Keywords.new_pattern("stocks", ["TSLA", "XOM", "AMZN", "FB", "LMT", "NVDA"])
{:ok, "stocks"}

iex> Keywords.new_pattern("stocks", ["TSLA", "XOM", "AMZN", "FB", "LMT", "NVDA"], case_sensitive: true)
{:ok, "stocks"}

iex> Keywords.new_pattern("stocks", ["TSLA", "XOM", "AMZN", "FB", "LMT", "NVDA"], case_sensitive: true, substrings: true)
{:ok, "stocks"}

# When substrings are allowed
iex> Keywords.parse("OAMZNG", "stocks")
{:ok, ["AMZN"]}
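
To make the two options concrete, here is a minimal sketch of how whole-word versus substring matching and case sensitivity could be expressed with Elixir's standard Regex module. MatchSketch and found?/3 are hypothetical names used only for illustration; this is not the package's implementation, and its real matching rules may differ.

```elixir
defmodule MatchSketch do
  # Hypothetical helper, not part of the keywords package.
  # Builds a regex for one keyword from the two documented options.
  def matcher(keyword, opts \\ []) do
    substrings = Keyword.get(opts, :substrings, false)
    case_sensitive = Keyword.get(opts, :case_sensitive, false)

    # Escape the keyword so symbols such as "*" or "?" are matched literally.
    escaped = Regex.escape(keyword)

    # Without substrings, refuse matches glued to a letter or digit on either side.
    source =
      if substrings do
        escaped
      else
        "(?<![[:alnum:]])" <> escaped <> "(?![[:alnum:]])"
      end

    flags = if case_sensitive, do: "u", else: "iu"
    Regex.compile!(source, flags)
  end

  # True when `keyword` occurs in `string` under the given options.
  def found?(string, keyword, opts \\ []) do
    Regex.match?(matcher(keyword, opts), string)
  end
end

MatchSketch.found?("OAMZNG", "AMZN")
MatchSketch.found?("OAMZNG", "AMZN", substrings: true)
```

Under these assumptions the first call returns false, because "AMZN" sits inside a longer word, and the second returns true.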

parse

Extracts keywords from a string

parse(string, pattern_names, opts)

options include:

  • :counts | default = false | when true, each keyword is returned with the number of times it occurs.
  • :aggregate | default = true | when true, matches from all patterns are merged into one list; when false, results are grouped by pattern name.

Usage:

iex> Keywords.parse("My favorite picks right now are $NVDA and $AMZN 🚀🚀🚀, but XOM and fb have my attention", "stocks")
{:ok, ["NVDA", "AMZN", "XOM", "FB"]}

iex> Keywords.parse("How dare you @^%##! %&^?!?! *****!", "cartoon_profanity")
{:ok, ["@^%##!", "%&^?!?!", "*****"]}

iex> Keywords.parse("How dare you @^%##! %&^?!?! ***** *****!", "cartoon_profanity", counts: true)
{:ok, [{"@^%##!", 1}, {"%&^?!?!", 1}, {"*****", 2}]}

iex> Keywords.parse("How dare you put pineapple on a pizza you @^%##! %&^?!?! ***** *****!", ["cartoon_profanity", "illegal_pizza_toppings"], counts: true, aggregate: false)
{
  :ok, 
  [
    { "cartoon_profanity", [{"@^%##!", 1}, {"%&^?!?!", 1}, {"*****", 2}] }, 
    { "illegal_pizza_toppings", [{"pineapple", 1}] }
  ]
}
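
The effect of :counts and :aggregate on the shape of the result can be sketched with plain Enum functions. ResultSketch and shape/2 are hypothetical names, and the input format (one entry per occurrence, grouped by pattern) is an assumption made for the example. The package may also order merged results differently, for instance by first appearance in the text.

```elixir
defmodule ResultSketch do
  # Hypothetical helper, not part of the keywords package.
  # `hits` is a list of {pattern_name, matched_keywords} tuples, where
  # matched_keywords holds one entry per occurrence found in the text.
  def shape(hits, opts \\ []) do
    counts = Keyword.get(opts, :counts, false)
    aggregate = Keyword.get(opts, :aggregate, true)

    if aggregate do
      # Merge every pattern's matches into a single list.
      hits
      |> Enum.flat_map(fn {_name, words} -> words end)
      |> summarize(counts)
    else
      # Keep one {name, matches} tuple per pattern.
      Enum.map(hits, fn {name, words} -> {name, summarize(words, counts)} end)
    end
  end

  # With counts: {keyword, occurrences} tuples in first-seen order.
  defp summarize(words, true) do
    freq = Enum.frequencies(words)
    words |> Enum.uniq() |> Enum.map(fn w -> {w, Map.fetch!(freq, w)} end)
  end

  # Without counts: each keyword once, in first-seen order.
  defp summarize(words, false), do: Enum.uniq(words)
end

hits = [
  {"cartoon_profanity", ["@^%##!", "%&^?!?!", "*****", "*****"]},
  {"illegal_pizza_toppings", ["pineapple"]}
]

ResultSketch.shape(hits, counts: true, aggregate: false)
```

With this input, the last call yields the same grouped structure as the final parse example above, minus the :ok wrapper.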

kill_pattern

Kills a pattern's agent process and removes the pattern from the registry.

kill_pattern(name)

Usage:

iex> Keywords.kill_pattern("common_lyrics")
{:ok, "common_lyrics"}
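
For readers unfamiliar with the agent-plus-registry arrangement mentioned here, the following sketch shows one way a named pattern could be held in an Agent and looked up through Elixir's Registry. Every name in it (PatternStoreSketch, put/2, kill/1 and so on) is hypothetical, and the package's actual internals are not shown in this document, so treat it as an illustration of the idea only.

```elixir
defmodule PatternStoreSketch do
  # Hypothetical names throughout; not the keywords package's internals.
  @registry PatternStoreSketch.Registry

  # Start a unique-key registry that maps pattern names to processes.
  def start_registry do
    Registry.start_link(keys: :unique, name: @registry)
  end

  # Store a keyword list in an Agent registered under `name`.
  def put(name, keywords) do
    Agent.start_link(fn -> keywords end, name: via(name))
  end

  # A pattern exists while a live process is registered under its name.
  def exists?(name), do: GenServer.whereis(via(name)) != nil

  def keywords(name), do: Agent.get(via(name), & &1)

  # Stop the Agent; the registry drops entries of terminated processes.
  def kill(name) do
    if exists?(name) do
      :ok = Agent.stop(via(name))
      {:ok, name}
    else
      {:error, :pattern_not_found}
    end
  end

  defp via(name), do: {:via, Registry, {@registry, name}}
end

{:ok, _} = PatternStoreSketch.start_registry()
{:ok, _} = PatternStoreSketch.put("common_lyrics", ["baby", "love", "yeah"])
PatternStoreSketch.kill("common_lyrics")
```

One consequence of this design is that a pattern's lifetime is tied to its process: if the process exits for any reason, the name becomes free again.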

pattern_exists?

Checks if a pattern exists

pattern_exists?(name)

Usage:

iex> Keywords.pattern_exists?(:stocks)
true
iex> Keywords.pattern_exists?(:stonks)
false

update_pattern

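Updates an existing pattern with new keywords, keeping the options the pattern was originally created with. Returns {:error, :pattern_not_found} if no pattern with that name exists.

update_pattern(name, keywords_list)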

Usage:

iex> Keywords.update_pattern(:stocks, ["TSLA", "XOM", "AMZN"])
{:ok, :stocks}
iex> Keywords.update_pattern(:stonks, ["TSLA", "XOM", "AMZN"])
{:error, :pattern_not_found}