grsearch

Find keywords within strings or files.


Keywords
grammatical, search, find, tf, idf
License
MIT
Install
pip install grsearch==0.6

Documentation

Gramsearch

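Search for keywords within strings and files. More complete than the built-in str.find(keyword) method: for each keyword, grsearch reports the number of occurrences and all of their positions, not just the first one.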
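For comparison, Python's built-in string methods only go part of the way. str.find() is case-sensitive and returns a single index (or -1), and str.count() returns a number but no positions. A quick illustration with the sentence used in the usage examples below (plain Python, no grsearch involved):

```python
text = ("Python interpreters are available for installation on many operating systems, "
        "allowing python code execution on a wide variety of systems.")

# str.find is case-sensitive and stops at the first hit,
# so the capitalised "Python" at index 0 is not reported.
print(text.find('python'))       # 87
# Later matches need a manual start offset; there are none here.
print(text.find('python', 88))   # -1
# str.count gives the number of non-overlapping matches, but no positions.
print(text.count('systems'))     # 2
```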

Installation

$ pip install grsearch

Compatibility

  • Python 2.7 - 3.4
  • Linux/Windows

Usage

With strings

from grsearch import grsearch

text="""
  Python interpreters are available for installation on many operating systems,
  allowing python code execution on a wide variety of systems.
  """
keywords=['python','systems']

# returns a list of [keyword, number_of_occurrences, [positions]] entries
result=grsearch.search(text, keywords)

print(result)
# result: [['python', 2, [0, 87]], ['systems', 2, [69, 130]]]
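Judging by the example output, each position is a 0-based character offset into text (an inference from the numbers, not a documented guarantee). A minimal sketch that unpacks the returned list and slices the text at each offset; it uses the output above as a literal so it runs without grsearch installed:

```python
text = ("Python interpreters are available for installation on many operating systems, "
        "allowing python code execution on a wide variety of systems.")
# Output printed by grsearch.search(text, keywords) in the example above.
result = [['python', 2, [0, 87]], ['systems', 2, [69, 130]]]

# Each entry unpacks into keyword, count and list of positions.
by_keyword = {}
for keyword, count, positions in result:
    by_keyword[keyword] = positions
    for pos in positions:
        # Slice the original text at each reported offset to see what matched.
        print(keyword, pos, repr(text[pos:pos + len(keyword)]))
```

The slice at offset 0 is 'Python' with a capital P, which is consistent with the default search ignoring case.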

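Examples with more parameters (case_sensitive=True tells "Python" and "python" apart; limit_iteration=1 keeps only the first occurrence of each keyword):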

result=grsearch.search(text, keywords, case_sensitive=True)
print(result)
# result: [['python', 1, [87]], ['systems', 2, [69, 130]]]
result=grsearch.search(text, keywords, limit_iteration=1)
print(result)
# result: [['python', 1, [0]], ['systems', 1, [69]]]
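To make case_sensitive and limit_iteration concrete, here is a minimal sketch built on a str.find() loop that reproduces the three outputs above. find_keywords is a hypothetical name, and the semantics it implements (case-insensitive by default, limit_iteration as a maximum number of matches kept per keyword, non-overlapping matches) are inferred from the examples, not taken from grsearch's source.

```python
text = ("Python interpreters are available for installation on many operating systems, "
        "allowing python code execution on a wide variety of systems.")


def find_keywords(text, keywords, case_sensitive=False, limit_iteration=None):
    """Hypothetical sketch returning [[keyword, number_of_occurrences, [positions]], ...].

    case_sensitive=False compares lower-cased copies of text and keyword
    (offsets stay valid as long as lower() keeps the length, true for ASCII).
    limit_iteration, if given, caps the number of positions kept per keyword.
    """
    haystack = text if case_sensitive else text.lower()
    result = []
    for keyword in keywords:
        needle = keyword if case_sensitive else keyword.lower()
        positions = []
        start = haystack.find(needle)
        while start != -1 and (limit_iteration is None or len(positions) < limit_iteration):
            positions.append(start)
            # Continue after the current match (non-overlapping, like str.count).
            start = haystack.find(needle, start + max(len(needle), 1))
        result.append([keyword, len(positions), positions])
    return result


keywords = ['python', 'systems']
print(find_keywords(text, keywords))
print(find_keywords(text, keywords, case_sensitive=True))
print(find_keywords(text, keywords, limit_iteration=1))
```

The three calls print the same lists as the grsearch examples above.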

With files

from grsearch import grsearch

path_file="Documents/text.txt"
keywords=['python','systems']

# returns a list of [keyword, number_of_occurrences, [positions]] entries
result=grsearch.search_infile(path_file, keywords)

print(result)
# result: [['python', 2, [0, 87]], ['systems', 2, [69, 130]]]

All parameters (case_sensitive, limit_iteration, exactly) are also available when searching within a file.
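Searching a file amounts to reading its content and searching the resulting string. The sketch below assumes the file is UTF-8 and that positions are character offsets in the decoded text; find_keywords_in_file and its encoding argument are hypothetical and not part of grsearch. It writes the example sentence to a temporary file so it runs on its own.

```python
import os
import tempfile


def find_keywords_in_file(path_file, keywords, encoding="utf-8"):
    """Hypothetical sketch: read the whole file, then search it case-insensitively.

    Positions are character offsets in the decoded file content.
    """
    with open(path_file, encoding=encoding) as f:
        content = f.read().lower()
    result = []
    for keyword in keywords:
        needle = keyword.lower()
        positions = []
        start = content.find(needle)
        while start != -1:
            positions.append(start)
            start = content.find(needle, start + max(len(needle), 1))
        result.append([keyword, len(positions), positions])
    return result


# Write the example sentence to a throwaway file so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    path_file = os.path.join(tmp, "text.txt")
    with open(path_file, "w", encoding="utf-8") as f:
        f.write("Python interpreters are available for installation on many operating systems, "
                "allowing python code execution on a wide variety of systems.")
    result = find_keywords_in_file(path_file, ["python", "systems"])

print(result)
```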

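Parameter exactly

With exactly=True, a keyword is only counted where it stands as a separate word; the "system" inside "herrersystem" below is ignored.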

from grsearch import grsearch

text="I'm herrersystem and my operating system is not windows."
keywords=['system']

# returns a list of [keyword, number_of_occurrences, [positions]] entries
result=grsearch.search(text, keywords, exactly=True)

print(result)
# result: [['system', 1, [34]]]

Result with exactly=False:

result: [['system', 2, [10, 34]]]
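Whole-word matching of this kind is commonly done with regular-expression word boundaries (\b). The following sketch uses Python's re module to reproduce both outputs; find_keywords_exactly is a hypothetical helper, and whether grsearch handles exactly this way internally is an assumption.

```python
import re


def find_keywords_exactly(text, keywords, exactly=False, case_sensitive=False):
    """Hypothetical sketch: exactly=True keeps only whole-word matches."""
    flags = 0 if case_sensitive else re.IGNORECASE
    result = []
    for keyword in keywords:
        pattern = re.escape(keyword)
        if exactly:
            # \b needs a word/non-word transition on each side, so the "system"
            # glued to "herrer" is rejected while "operating system" matches.
            pattern = r"\b" + pattern + r"\b"
        positions = [m.start() for m in re.finditer(pattern, text, flags)]
        result.append([keyword, len(positions), positions])
    return result


text = "I'm herrersystem and my operating system is not windows."
print(find_keywords_exactly(text, ["system"], exactly=True))   # [['system', 1, [34]]]
print(find_keywords_exactly(text, ["system"], exactly=False))  # [['system', 2, [10, 34]]]
```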

Term Frequency-Inverse Document Frequency

For more information on TF-IDF, see Wikipedia.

Use TF-IDF with grsearch

from grsearch import tfidf

# three French sentences; the keywords 'es' ("are") and 'tu' ("you") are French words
corpus=[
    'je suis ici et toi où es-tu ?',
    'tu es trop loin pour que je te tue. Tu comprends ?',
    'Il est pas croyable, je le hais'
]
keywords=['es', 'tu']

# returns one list of [keyword, score] pairs per text, in corpus order
result=tfidf.calcul_tfidf(keywords, corpus)

for i,r in enumerate(result):
    print('text {}: {}'.format(i+1, r))

result:
text 1: [['es', 0.022011407381960155], ['tu', 0.022011407381960155]]
text 2: [['es', 0.014674271587973436], ['tu', 0.029348543175946873]]
text 3: [['es', 0.0], ['tu', 0.0]]
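The scores above are consistent with one common TF-IDF variant: tf is the number of whole-word, case-insensitive matches divided by the number of whitespace-separated tokens in the text (for example 1/8 for 'es' in text 1), and idf is log10(N / df) with N = 3 texts and df = 2 texts containing each keyword. The sketch below implements that variant from scratch and reproduces the values above up to floating-point rounding. The tokenization and the log base are inferred from the printed numbers, and tfidf_sketch is a hypothetical name, not grsearch's tfidf.calcul_tfidf.

```python
import math
import re


def tfidf_sketch(keywords, corpus):
    """Hypothetical TF-IDF sketch, not grsearch's implementation.

    tf  = whole-word, case-insensitive matches of the keyword
          / number of whitespace-separated tokens in the text
    idf = log10(number of texts / number of texts containing the keyword)
    Returns one [[keyword, score], ...] list per text.
    """
    def count(keyword, doc):
        # \b...\b: "tu" matches in "es-tu" and "Tu" but not inside "tue".
        return len(re.findall(r"\b" + re.escape(keyword) + r"\b", doc, re.IGNORECASE))

    hits = [[count(kw, doc) for kw in keywords] for doc in corpus]
    n_docs = len(corpus)
    # Document frequency: in how many texts each keyword appears at least once.
    df = [sum(1 for row in hits if row[j] > 0) for j in range(len(keywords))]
    idf = [math.log10(n_docs / d) if d else 0.0 for d in df]

    result = []
    for doc, row in zip(corpus, hits):
        n_tokens = len(doc.split())
        result.append([[kw, (row[j] / n_tokens if n_tokens else 0.0) * idf[j]]
                       for j, kw in enumerate(keywords)])
    return result


corpus = [
    'je suis ici et toi où es-tu ?',
    'tu es trop loin pour que je te tue. Tu comprends ?',
    'Il est pas croyable, je le hais',
]
scores = tfidf_sketch(['es', 'tu'], corpus)
for i, r in enumerate(scores):
    print('text {}: {}'.format(i + 1, r))
```

Text 3 scores 0.0 for both keywords because "est" is not a whole-word match for "es".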

Same example with files:

corpus=['text1.txt','text2.txt','text3.txt']
keywords=['es', 'tu']

result=tfidf.calcul_tfidf(keywords, corpus, is_file=True)

for i,r in enumerate(result):
    print('text {}: {}'.format(i+1, r))

Don't forget is_file=True! It tells calcul_tfidf that the entries of corpus are file paths rather than texts.
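If the three text files do not exist yet, they can be created from the sentences of the previous example. This sketch writes them to a temporary directory and collects their paths in corpus. Writing them as UTF-8 is an assumption (the first sentence contains the non-ASCII "où"), so the encoding should match whatever grsearch uses to read files.

```python
import os
import tempfile

# The three French sentences from the string example above.
sentences = [
    'je suis ici et toi où es-tu ?',
    'tu es trop loin pour que je te tue. Tu comprends ?',
    'Il est pas croyable, je le hais',
]

# Write each sentence to text1.txt, text2.txt, text3.txt in a scratch directory
# (UTF-8 is an assumed encoding, see above).
folder = tempfile.mkdtemp()
corpus = []
for i, sentence in enumerate(sentences, start=1):
    path = os.path.join(folder, 'text{}.txt'.format(i))
    with open(path, 'w', encoding='utf-8') as f:
        f.write(sentence)
    corpus.append(path)

print(corpus)  # paths to pass as corpus together with is_file=True
```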