pylexique
- Free software: MIT license
- PyLexique Documentation: https://pylexique.readthedocs.io (en) -- https://sekoudiaonlp.github.io/pylexique/fr_FR/ (fr)
class LexEntryType:
"""
Type information about all the lexical attributes in a LexItem object.
"""
ortho: str
phon: str
lemme: str
cgram: str
genre: str
nombre: str
freqlemfilms2: float
freqlemlivres: float
freqfilms2: float
freqlivres: float
infover: str
nbhomogr: int
nbhomoph: int
islem: bool
nblettres: int
nbphons: int
cvcv: str
p_cvcv: str
voisorth: int
voisphon: int
puorth: int
puphon: int
syll: str
nbsyll: int
cv_cv: str
orthrenv: str
phonrenv: str
orthosyll: str
cgramortho: str
deflem: float
defobs: int
old20: float
pld20: float
morphoder: str
nbmorph: int
The meanings of the attributes of this object are as follow:
- ortho: the word
- phon: the phonological forms of the word
- lemme: the lemmas of this word
- cgram: the grammatical categories of this word
- genre: the gender
- nombre: the number
- freqlemfilms2: the frequency of the lemma according to the corpus of subtitles (per million occurrences)
- freqlemlivres: the frequency of the lemma according to the body of books (per million occurrences)
- freqfilms2: the frequency of the word according to the corpus of subtitles (per million occurrences)
- freqlivres: the frequency of the word according to the body of books (per million occurrences)
- infover: modes, tenses, and possible people for verbs
- nbhomogr: number of homographs
- nbhomoph: number of homophones
- islem: indicates if it is a lemma or not
- nblettres: the number of letters
- nbphons: number of phonemes
- cvcv: the orthographic structure
- p-cvcv: the phonological structure
- voisorth: number of orthographic neighbors
- voisphon: number of phonological neighbors
- puorth: point of spelling uniqueness
- puphon: point of phonological uniqueness
- syll: syllable phonological form
- nbsyll: number of syllables
- cv-cv: syllable phonological structure
- orthrenv: reverse orthographic form
- phonrenv: reversed phonological form
- orthosyll: syllable orthographic form
- cgramortho: the different grammatical category for a given orthographic representation
- deflem: the percentage of people who said they knew the lemma of the word
- defobs: the size of the sample from which 'deflem' is derived
- old20: orthographic Levenshtein Distance
- pld20: phonological Levenshtein Distance
- morphoder: inflectional morphology
- nbmorph: the number of morphemes directly computed from 'morphoder'
You can find all the relevant information in the official documentation of Lexique383 (French).
Features
-
- Extract all lexical information from a French word such as:
-
- orthographic and phonemics representations
- associated lemmas
- syllabation
- grammatical category
- gender and number
- frequencies in a corpus of books and in a body of film subtitles, etc...
- Extract all the lexical forms of a French word.
- Easy to use Api.
- Easily integrate pylexique in your own projects as an imported library.
- Can be used as a command line tool.
Credits
Main developer SekouDiaoNlp.
Lexical corpus: Lexique383
About Lexique383
Lexique3
Lexique 3.83 is a French lexical database that provides for ~ 140,000 words of French: orthographic and phonemics representations, associated lemmas, syllabation, grammatical category, gender and number, frequencies in a corpus of books and in a body of film subtitles, etc...
Table: Lexique383.zip
Web site: http://www.lexique.org
Online: http://www.lexique.org/shiny/lexique
Publications
- New, Boris, Christophe Pallier, Marc Brysbaert, and Ludovic Ferrand. 2004. "Lexique 2: A New French Lexical Database." Behavior Research Methods, Instruments, & Computers 36 (3): 516--524. DOI. pdf
- New, Boris, Christophe Pallier, Ludovic Ferrand, and Rafael Matos. 2001. "Une Base de Données Lexicales Du Français Contemporain Sur Internet: LEXIQUE" L'Année Psychologique 101 (3): 447--462. DOI. pdf
- Boris New, Marc Brysbaert, Jean Veronis, and Christophe Pallier. 2007. "The Use of Film Subtitles to Estimate Word Frequencies." Applied Psycholinguistics 28 (4): 661--77. DOI. (pdf)
Contributors
- Boris New & Christophe Pallier
- Ronald Peereman
- Sophie Dufour
- Christian Lachaud
- and many others... (contact us to be listed)
License
BibTex Entry to cite publications about Lexique383:
@article{npbf04,
author = {New, B. and Pallier, C. and Brysbaert, M. and Ferrand, L.},
journal = {ehavior Research Methods, Instruments, & Computers},
number = {3},
pages = {516-524},
title = {Lexique 2 : A New French Lexical Database},
volume = {36},
year = {2004},
eprint = {http://www.lexique.org/?page_id=294},
}
@article{npfm01,
author = {New, B. and Pallier, C. and Ferrand, L. and Matos, R.},
journal = {L'Ann{\'e}e Pschologique},
number = {447-462},
pages = {1396-2},
title = {Une base de donn{\'e}es lexicales du fran\c{c}ais contemporain sur internet: LEXIQUE},
volume = {101},
year = {2001},
}
@article{new_brysbaert_veronis_pallier_2007,
author={NEW, BORIS and BRYSBAERT, MARC and VERONIS, JEAN and PALLIER, CHRISTOPHE},
title={The use of film subtitles to estimate word frequencies},
volume={28}, DOI={10.1017/S014271640707035X},
number={4}, journal={Applied Psycholinguistics},
publisher={Cambridge University Press},
year={2007},
pages={661–677}}
BibTeX
If you want to cite pylexique in an academic publication use this citation format:
@article{pylexique,
title={pylexique},
author={Sekou Diao},
journal={GitHub. Note: https://github.com/SekouDiaoNlp/pylexique Cited by},
year={2021}
}