s2a-nightly

seq2annotation


Keywords
bilstm-crf, bilstm-crf-model, idcnn, idcnn-crf, named-entity-extraction, named-entity-recognition, part-of-speech, part-of-speech-tagger, sequence-annotation, tensorflow, tensorflow-models
License
Apache-2.0
Install
pip install s2a-nightly==0.8.0.dev20191024

Documentation


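A general-purpose sequence annotation library built on TensorFlow & PaddlePaddle (currently providing BiLSTM+CRF and IDCNN+CRF, with more algorithms being added). It handles sequence labeling tasks such as Chinese word segmentation (Tokenizer / segmentation), part-of-speech tagging (Part Of Speech, POS) and named entity recognition (Named Entity Recognition, NER).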
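As an illustration of why segmentation and POS tagging can share one model, the sketch below turns a segmented sentence into one tag per character using the BMES scheme (Begin / Middle / End / Single). The helper `words_to_char_tags` and the example POS labels are hypothetical and not part of seq2annotation's API; the library's actual tag format may differ.

```python
from typing import List, Optional


def words_to_char_tags(words: List[str], labels: Optional[List[str]] = None) -> List[str]:
    """Hypothetical helper: emit one BMES tag per character of a segmented sentence.

    If labels (e.g. POS tags) are given, each tag gets a "-label" suffix, so
    segmentation and POS tagging produce the same kind of per-character output.
    """
    tags: List[str] = []
    for i, word in enumerate(words):
        if len(word) == 1:
            prefixes = ["S"]  # single-character word
        else:
            prefixes = ["B"] + ["M"] * (len(word) - 2) + ["E"]
        suffix = f"-{labels[i]}" if labels else ""
        tags.extend(p + suffix for p in prefixes)
    return tags


words = ["王小明", "在", "北京", "工作"]
print(words_to_char_tags(words))
# → ['B', 'M', 'E', 'S', 'B', 'E', 'B', 'E']
print(words_to_char_tags(words, ["nr", "p", "ns", "v"]))
# → ['B-nr', 'M-nr', 'E-nr', 'S-p', 'B-ns', 'E-ns', 'B-v', 'E-v']
```

Once every task is reduced to "one tag per character", the same BiLSTM+CRF or IDCNN+CRF tagger can be trained on any of them; only the tag vocabulary changes.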

Features

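  • General-purpose sequence annotation: it solves general sequence labeling problems, of which word segmentation, POS tagging and entity recognition are merely special cases.
  • Tag schema free: you can use any tagset you like. This relies on the encoding and decoding functions provided by tokenizer_tools.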
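Tag-schema independence comes down to encoding labeled spans into per-token tags and decoding them back. The following is a minimal sketch of that round trip for BIO and BIOES; it does not use tokenizer_tools, and the `encode` / `decode` names and the `(start, end, label)` span format with an exclusive end are assumptions made for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # assumed format: (start, end_exclusive, label)


def encode(length: int, spans: List[Span], scheme: str = "BIO") -> List[str]:
    """Hypothetical encoder: turn labeled spans into per-token BIO or BIOES tags."""
    if scheme not in ("BIO", "BIOES"):
        raise ValueError(f"unsupported scheme: {scheme}")
    tags = ["O"] * length
    for start, end, label in spans:
        if scheme == "BIOES" and end - start == 1:
            tags[start] = f"S-{label}"  # single-token entity
            continue
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
        if scheme == "BIOES":
            tags[end - 1] = f"E-{label}"  # mark the last token explicitly
    return tags


def decode(tags: List[str]) -> List[Span]:
    """Hypothetical decoder: recover (start, end, label) spans from BIO or BIOES tags."""
    spans: List[Span] = []
    start, label = None, None
    # A trailing "O" acts as a sentinel that closes any span still open.
    for i, tag in enumerate(tags + ["O"]):
        prefix, _, tag_label = tag.partition("-")
        continues = prefix in ("I", "E") and start is not None and tag_label == label
        if start is not None and not continues:
            spans.append((start, i, label))  # close the open span before this tag
            start, label = None, None
        if prefix in ("B", "S") or (prefix in ("I", "E") and start is None):
            start, label = i, tag_label  # open a new span (tolerates a stray I/E)
        if prefix in ("E", "S"):
            spans.append((start, i + 1, label))  # E/S close the span on this token
            start, label = None, None
    return spans


spans = [(0, 3, "PER"), (4, 6, "LOC"), (7, 8, "ORG")]
print(encode(8, spans, "BIOES"))
# → ['B-PER', 'I-PER', 'E-PER', 'O', 'B-LOC', 'E-LOC', 'O', 'S-ORG']
print(decode(encode(8, spans, "BIO")) == spans)
# → True
```

Switching tagsets then only means changing `scheme`; the span representation used for training data and evaluation stays the same.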

TODO

  • TF Metrics has not yet been published on PyPI, but seq2annotation depends on it, so seq2annotation currently cannot be packaged as a Python package on PyPI.

More Algorithms To Do

Credits

Add an NER evaluation scheme

From http://www.davidsbatista.net/blog/2018/05/09/Named_Entity_Evaluation/
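The linked post discusses several ways to score predicted entities against gold entities, from strict matching (boundaries and type must both agree) to more lenient partial-overlap schemes. As a minimal sketch, assuming entities are `(start, end, label)` spans with exclusive ends, the hypothetical function `entity_prf` below computes entity-level precision, recall and F1 under strict matching, or under boundary-only matching when `ignore_type=True`. It is not seq2annotation's evaluation code, and the partial-overlap variants from the post are not covered.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int, str]  # assumed format: (start, end_exclusive, label)


def entity_prf(gold: Iterable[Span], pred: Iterable[Span], ignore_type: bool = False):
    """Hypothetical entity-level precision / recall / F1.

    Strict mode: a prediction counts only if start, end and label all match.
    With ignore_type=True, only the boundaries (start, end) have to match.
    """
    def key(span: Span):
        start, end, label = span
        return (start, end) if ignore_type else (start, end, label)

    gold_keys = {key(s) for s in gold}
    pred_keys = {key(s) for s in pred}
    correct = len(gold_keys & pred_keys)
    precision = correct / len(pred_keys) if pred_keys else 0.0
    recall = correct / len(gold_keys) if gold_keys else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [(0, 3, "PER"), (4, 6, "LOC")]
pred = [(0, 3, "PER"), (4, 6, "ORG")]  # right boundaries, wrong type on the second
print(entity_prf(gold, pred))
# → (0.5, 0.5, 0.5)
print(entity_prf(gold, pred, ignore_type=True))
# → (1.0, 1.0, 1.0)
```

Scoring whole entities rather than individual tags avoids rewarding a model that gets most `I-` tags right but splits or truncates the entity itself.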