landmark_ml

Machine learning library for the landmark set of tools


License
GPL-3.0

Install
pip install landmark_ml==0.1.1

Documentation

Landmark Machine Learning

Unsupervised Learning

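import json

from landmark_ml.learning import RuleLearnerAllSlots

# Directory containing the HTML pages to learn rules from
page_dir = '~/tmp/html_pages/'
rules = RuleLearnerAllSlots.run(page_dir)
# Pretty-print the learned rules as sorted, indented JSON
print(json.dumps(json.loads(rules.toJson()), sort_keys=True, indent=2, separators=(',', ': ')))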
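
The pretty-printing step above can be factored into a small helper so the learned rules can also be saved to disk. This is only a sketch: pretty_rules and save_rules are hypothetical names, not part of landmark_ml, and the example input is a placeholder standing in for the string returned by rules.toJson(), whose actual schema is not shown here.

```python
import json


def pretty_rules(rules_json):
    """Return a JSON string re-serialized with sorted keys and 2-space indentation."""
    return json.dumps(json.loads(rules_json), sort_keys=True, indent=2,
                      separators=(',', ': '))


def save_rules(rules_json, path):
    """Write the pretty-printed rules to `path` and return the text written."""
    text = pretty_rules(rules_json)
    with open(path, 'w') as handle:
        handle.write(text + '\n')
    return text


# Placeholder input; in practice this would be rules.toJson()
example = '{"b": 1, "a": [1, 2]}'
print(pretty_rules(example))
```

Sorting the keys keeps the output stable between runs, which makes it easier to diff rule sets learned from different page directories.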

Clustering

On an HTML directory alone:
python -m landmark_ml.learning.PageClusterer [HTML_DIRECTORY]
On HTML with a CDR directory, applying extractions:
python -m landmark_ml.runclustering -d directory_above_html [OPTIONAL_SINGLE_SITE]
On HTML with a CDR directory, applying extractions and copying the results to landmark-ui:
python -m landmark_ml.runclustering -d directory_above_html -o optional_webapp_projects_dir [OPTIONAL_SINGLE_SITE]
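
When the clustering commands above need to be launched from a script, the argument list can be assembled programmatically. The sketch below only builds the command. build_clustering_command is a hypothetical helper, not part of landmark_ml, and it assumes the package is installed for the interpreter running the script. Only the -d and -o flags and the optional site argument shown above are used.

```python
import sys


def build_clustering_command(directory, webapp_dir=None, site=None):
    """Build the argument list for `python -m landmark_ml.runclustering`."""
    cmd = [sys.executable, '-m', 'landmark_ml.runclustering', '-d', directory]
    if webapp_dir is not None:
        # Optional: also copy the results to a landmark-ui projects directory
        cmd += ['-o', webapp_dir]
    if site is not None:
        # Optional: restrict the run to a single site
        cmd.append(site)
    return cmd


cmd = build_clustering_command('directory_above_html', site='example_site')
print(' '.join(cmd[1:]))

# To actually run it (requires landmark_ml to be installed):
# import subprocess
# subprocess.check_call(cmd)
```

Passing the command as a list rather than a single shell string avoids quoting problems when directory names contain spaces.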