metadata-detector

A library for detecting metadata in HTML files.

It currently detects the title and the publish date of an HTML document.

Usage

Clojure

Add a dependency to your project.clj:

[lt.tokenmill/metadata-detector "0.1.0"]

In a REPL, type:

(require '[metadata-detector.core :refer [detect]])
; => nil
(detect "moo" (slurp "test/data/en/abcnews.html"))
; => {:title "Galaxy May Be Full of 'Second Earths'", :date "2013-02-06"}

Java

The JAR is currently hosted on Clojars, so Maven will not find the artifact in its default repositories. Add the Clojars repository to your pom.xml:

<repositories>
    <repository>
        <id>clojars.org</id>
        <url>http://clojars.org/repo</url>
    </repository>
</repositories>

Then add the dependency to your `pom.xml`:

```xml
<dependency>
    <groupId>lt.tokenmill</groupId>
    <artifactId>metadata-detector</artifactId>
    <version>0.1.0</version>
</dependency>
```

In your Java class:

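package lt.tokenmill.metadatadetector;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

import lt.tokenmill.metadatadetector.MetadataDetector;
import lt.tokenmill.metadatadetector.DocumentMetadata;

// "Example" is a placeholder class name.
public class Example {
    public static void main(String[] args) throws IOException {
        // Read the HTML file into a string before passing it to the detector.
        String html = new String(Files.readAllBytes(Paths.get("test/data/en/abcnews.html")));
        MetadataDetector metadataDetector = new MetadataDetector();
        DocumentMetadata documentMetadata = metadataDetector.detect("url", html);
        String title = documentMetadata.getTitle();
        String publishDate = documentMetadata.getPublishDate();
    }
}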
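
The Clojure example above returns the date as "2013-02-06". Assuming `getPublishDate()` returns a string in the same ISO format (the README does not state this, so treat it as an assumption), a small helper like the one below could convert it to a `java.time.LocalDate`. The class `PublishDateExample` and the method `parsePublishDate` are hypothetical names and are not part of metadata-detector.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical helper class; not part of metadata-detector.
public class PublishDateExample {

    // Assumes the publish date is an ISO-8601 string such as "2013-02-06".
    // Returns an empty Optional when the value is missing or not in that format.
    public static Optional<LocalDate> parsePublishDate(String publishDate) {
        if (publishDate == null || publishDate.isEmpty()) {
            return Optional.empty();
        }
        try {
            return Optional.of(LocalDate.parse(publishDate));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // In real code this string would come from documentMetadata.getPublishDate().
        Optional<LocalDate> date = parsePublishDate("2013-02-06");
        System.out.println(date.map(LocalDate::getYear).orElse(-1));
    }
}
```

Returning an `Optional` keeps the caller from having to handle a parse exception when a page has no detectable date or the format differs from the assumed one.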

Note that metadata-detector depends on org.clojure/clojure, which your project must provide.

To add the Clojure dependency, include this snippet in your pom.xml:

<dependency>
    <groupId>org.clojure</groupId>
    <artifactId>clojure</artifactId>
    <version>1.8.0</version>
</dependency>

TODO

- [ ] Detect the author of the document.

Stats

First release: Feb 7, 2018
Latest release: Feb 7, 2018
Repository size: 1.48 MB

Releases

0.1.0, 0.1.1, 0.1.2, 0.1.3, 0.1.4, 0.1.5, 0.1.6
