# Remove duplicates 重复内容筛选 tkitSimhash zh 根据经验,一般当两个文档特征字之间的汉明距离小于 3, 就可以判定两个文档相似。《数学之美》一书中,在讲述信息指纹时对这种算法有详细的介绍。 ```python from tkitSimhash import simHash sim=simHash() text1 = """' , in Valve's absence, the modern slew of co-op zombie games have not


Install
pip install tkitSimhash==0.0.1.9