Hyphenation for node and Polyfill for client-side hyphenation.


Keywords: hyphenation, html, polyfill, hyphens, hyphen, soft, hyphenate, JavaScript, wasm, Webassembly, hyphenation-algorithm
License: MIT
Install: npm install hyphenopoly@5.3.0

Documentation

Hyphenopoly.js


Hyphenopoly.js is a JavaScript polyfill for hyphenation in HTML: it hyphenates text if the user agent does not support CSS hyphenation at all, or does not support it for the required languages. It is also available as a Node.js module.

The package consists of the following parts:

  • Hyphenopoly_Loader.js (~11KB unpacked, ~2KB minified and compressed): feature-checks the client and loads other resources if necessary.
  • Hyphenopoly.js (~36KB unpacked, ~5KB minified and compressed): does the whole DOM-foo and wraps wasm.
  • wasm-Modules (sizes differ! e.g. en-us.wasm: ~21KB uncompressed, ~15KB compressed): core hyphenation functions and hyphenation patterns in a space saving binary format (including pattern license).
  • hyphenopoly.module.js: the node module to hyphenate plain text strings.

Usage (Browser)

Place all the code for Hyphenopoly at the top of the header (immediately after the <title> tag) to ensure resources are loaded as early as possible.

You'll have to insert two script blocks. In the first block, load Hyphenopoly_Loader.js as an external script. In the second block, provide the initial configurations for Hyphenopoly_Loader as an inline script. This also triggers all further steps.

Also, don't forget to enable CSS hyphenation.

Example:

<!DOCTYPE html>
<html>
    <head>
        <meta http-equiv="content-type" content="text/html; charset=UTF-8">
        <title>Example 1</title>
        <script src="./Hyphenopoly_Loader.js"></script>
        <script>
        Hyphenopoly.config({
            require: {
                "la": "honorificabilitudinitas",
                "de": "Silbentrennungsalgorithmus",
                "en-us": "Supercalifragilisticexpialidocious"
            },
            setup: {
                selectors: {
                    ".container": {}
                }
            }
        });
        </script>
        <style type="text/css">
            body {
                width:60%;
                margin-left:20%;
            }
            p {
                text-align: justify;
                margin: 0 2em 0 0;
            }
            .container {
                display: flex;
                hyphens: auto;
                -ms-hyphens: auto;
                -moz-hyphens: auto;
                -webkit-hyphens: auto;
            }
        </style>
    </head>
    <body>
        <h1>Example 1</h1>
        <div class="container">
            <p lang="la">Qua de causa Helvetii quoque reliquos Gallos virtute praecedunt, quod fere cotidianis proeliis cum Germanis contendunt, cum aut suis finibus eos prohibent aut ipsi in eorum finibus bellum gerunt.</p>
            <p lang="en-us">For which reason the Helvetii also surpass the rest of the Gauls in valor, as they contend with the Germans in almost daily battles, when they either repel them from their own territories, or themselves wage war on their frontiers.</p>
            <p lang="de">Aus diesem Grund übertreffen auch die Helvetier die übrigen Gallier an Tapferkeit, weil sie sich in fast täglichen Gefechten mit den Germanen messen, wobei sie diese entweder von ihrem Gebiet fernhalten oder selbst in deren Gebiet kämpfen.</p>
        </div>
    </body>
</html>

Let's go through this example step by step:

UTF-8

Make sure your page is encoded as UTF-8.

script blocks – load, configure and run Hyphenopoly_Loader.js

Hyphenopoly_Loader.js needs some information to run. It is passed as a parameter object to the function Hyphenopoly.config() and stored in a globally accessible object called window.Hyphenopoly. Hyphenopoly_Loader.js and (if necessary) Hyphenopoly.js add further methods and properties only to this object; no other global variables or functions are created.

require

The configuration object must have a property called require, which is itself an object containing at least one name-value pair. The name is a language code string and the value is a long word in that language (preferably more than 12 characters long). Some languages are region-specific: just using en won't work, use en-us or en-gb. See the patterns directory for supported languages.

If you want to force the usage of Hyphenopoly.js for a language (e.g. for testing purposes), write "FORCEHYPHENOPOLY" instead of the long word.
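The rules for require can be checked before the object is handed over. The following is a minimal sketch: checkRequire is a hypothetical helper that is not part of Hyphenopoly, and the 12-character threshold simply mirrors the recommendation above.

```javascript
// Hypothetical helper, not part of Hyphenopoly: collects warnings for a
// `require` object before it is passed to Hyphenopoly.config().
function checkRequire(requireMap) {
    const warnings = [];
    const entries = Object.entries(requireMap);
    if (entries.length === 0) {
        warnings.push("require needs at least one language");
    }
    for (const [lang, word] of entries) {
        // "FORCEHYPHENOPOLY" is the documented marker to force the polyfill.
        if (word !== "FORCEHYPHENOPOLY" && word.length <= 12) {
            warnings.push(`${lang}: "${word}" should be longer than 12 characters`);
        }
    }
    return warnings;
}

const requireConfig = {
    "de": "Silbentrennungsalgorithmus",
    // Force Hyphenopoly.js for US English, e.g. while testing.
    "en-us": "FORCEHYPHENOPOLY"
};
const requireWarnings = checkRequire(requireConfig);

// Only call into Hyphenopoly where the loader has defined the global.
if (requireWarnings.length === 0 && typeof Hyphenopoly !== "undefined") {
    Hyphenopoly.config({"require": requireConfig});
}
```

Outside a browser the final call is skipped, so the snippet can also be run in node to try out the check.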

Hyphenopoly_Loader.js tests if the client (aka browser, aka user agent) supports CSS hyphenation for the language(s) given in require. In the example above, it will test if the client supports CSS-hyphenation for Latin, German and US-English.

If one of the given languages isn't supported, it automatically hides the document's contents and loads Hyphenopoly.js and the necessary WebAssembly modules.

Hyphenopoly.js – once loaded – will hyphenate the elements according to the settings and unhide the document when it's done.

If something goes wrong and Hyphenopoly.js is unable to unhide the document, a timeout in Hyphenopoly_Loader.js (1000ms by default) unhides the document and writes a message to the console.

If the browser supports all required languages, the script deletes the Hyphenopoly-object and terminates without further ado.
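The decision described above can be sketched roughly as follows. This is an illustration only, not the loader's actual code: planLoading and the supportsCssHyphenation callback are hypothetical names, and the file naming is an assumption based on the en-us.wasm example.

```javascript
// Hypothetical sketch of the loader's decision, not Hyphenopoly's own code.
// `supportsCssHyphenation(lang, word)` stands in for the real feature test.
function planLoading(requireMap, supportsCssHyphenation) {
    const needPolyfill = Object.entries(requireMap)
        .filter(([lang, word]) => (
            word === "FORCEHYPHENOPOLY" || !supportsCssHyphenation(lang, word)
        ))
        .map(([lang]) => lang);
    return {
        // One unsupported language is enough to load Hyphenopoly.js.
        "usePolyfill": needPolyfill.length > 0,
        // Assumption: one module per language, named like en-us.wasm.
        "wasmFiles": needPolyfill.map((lang) => `${lang}.wasm`)
    };
}

const plan = planLoading(
    {
        "la": "honorificabilitudinitas",
        "de": "Silbentrennungsalgorithmus",
        "en-us": "FORCEHYPHENOPOLY"
    },
    // Pretend this client only hyphenates German natively.
    (lang) => lang === "de"
);
console.log(plan);
```

With the pretend client above, Latin lacks native support and US English is forced, so both would go through Hyphenopoly.js while German stays with CSS hyphenation.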

enable CSS-hyphenation

Hyphenopoly by default hyphenates elements (and their children) with the classname .hyphenate. Don't forget to enable CSS-hyphenation for the classes that may be handled by Hyphenopoly.

Usage (node)

Try hyphenopoly on RunKit

Install:

npm i hyphenopoly

Example:

import hyphenopoly from "hyphenopoly";

const hyphenator = hyphenopoly.config({
    "require": ["de", "en-us"],
    "hyphen": "•",
    "loader": async (file) => {
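        // Resolve the requested pattern file relative to this module.
        // The path assumes a patterns directory one level above this
        // file; adjust it to wherever the .wasm files live in your setup.
        const {readFile} = await import("node:fs/promises");
        const {dirname} = await import("node:path");
        const {fileURLToPath} = await import("node:url");
        const cwd = dirname(fileURLToPath(import.meta.url));
        return readFile(`${cwd}/../patterns/${file}`);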
    },
    "exceptions": {
        "en-us": "en-han-ces"
    }
});

async function hyphenate_en(text) {
    const hyphenateText = await hyphenator.get("en-us");
    console.log(hyphenateText(text));
}

async function hyphenate_de(text) {
    const hyphenateText = await hyphenator.get("de");
    console.log(hyphenateText(text));
}

hyphenate_en("hyphenation enhances justification.");
hyphenate_de("Silbentrennung verbessert den Blocksatz.");

Support this project

PayPal

Automatic hyphenation

The algorithm used for hyphenation was developed by Franklin M. Liang for TeX. It works more or less like this:

  1. Load a set of precomputed language specific patterns. The patterns are stored in a structure called a trie, which is very efficient for this task.
  2. Collect all patterns that are a substring of the word to be hyphenated.
  3. Combine the numerical values between characters: higher values overwrite lower values.
  4. Odd values are hyphenation points, except where the point lies left of leftmin or right of rightmin; replace them with a soft hyphen and drop the other values.
  5. Repeat steps 2 to 4 for all words longer than minWordLength.

Example:

Hyphenation
h y p h e n a t i o n
h y3p h
      h e2n
      h e n a4
      h e n5a t
         1n a
          n2a t
             1t i o
               2i o
                  o2n
h0y3p0h0e2n5a4t2i0o2n
Hy-phen-ation
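The worked example can be reproduced with a few lines of JavaScript. This is a simplified sketch, not Hyphenopoly's wasm implementation: it scans a plain array instead of a trie, ignores the word-boundary markers that real TeX patterns use, and the names parsePattern, hyphenateWord and examplePatterns are made up for illustration.

```javascript
// Patterns taken from the example above.
const examplePatterns = [
    "hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"
];

// Split a pattern such as "hen5at" into its letters ("henat") and the
// digit found in each gap; gaps without a digit count as 0.
function parsePattern(pattern) {
    const letters = [];
    const values = [0];
    for (const ch of pattern) {
        if (ch >= "0" && ch <= "9") {
            values[values.length - 1] = Number(ch);
        } else {
            letters.push(ch);
            values.push(0);
        }
    }
    return {"text": letters.join(""), values};
}

function hyphenateWord(word, patterns, options = {}) {
    const {leftmin = 2, rightmin = 2, hyphen = "-"} = options;
    // Sketch assumption: lowercasing does not change the word's length.
    const lower = word.toLowerCase();
    // points[i] is the value of the gap in front of lower[i].
    const points = new Array(lower.length + 1).fill(0);
    for (const pattern of patterns) {
        const {text, values} = parsePattern(pattern);
        let start = lower.indexOf(text);
        while (start !== -1) {
            // Step 3: higher values overwrite lower values.
            for (let offset = 0; offset < values.length; offset += 1) {
                const gap = start + offset;
                points[gap] = Math.max(points[gap], values[offset]);
            }
            start = lower.indexOf(text, start + 1);
        }
    }
    let result = "";
    for (let i = 0; i < word.length; i += 1) {
        // Step 4: odd values hyphenate, unless too close to either end.
        const allowed = i >= leftmin && word.length - i >= rightmin;
        if (allowed && points[i] % 2 === 1) {
            result += hyphen;
        }
        result += word[i];
    }
    return result;
}

console.log(hyphenateWord("Hyphenation", examplePatterns));
// → Hy-phen-ation
```

Passing {"hyphen": "\u00AD"} as the third argument inserts soft hyphens instead of the visible "-" used here for readability.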

The patterns are precomputed and available for many languages on CTAN and tex-hyphen. For Hyphenopoly.js they are converted to a succinct trie data structure (including pattern license, metadata, and the patterns).
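To illustrate why a trie suits the pattern lookup, here is a minimal sketch of a plain trie built from the patterns of the example. It is not the succinct binary format that Hyphenopoly.js ships; buildTrie, collectMatches and triePatterns are hypothetical names.

```javascript
const triePatterns = [
    "hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"
];

// Insert each pattern letter by letter; the node reached by the last
// letter stores the pattern's gap values (0 where no digit is given).
function buildTrie(patterns) {
    const root = {"children": new Map(), "values": null};
    for (const pattern of patterns) {
        let node = root;
        const values = [0];
        for (const ch of pattern) {
            if (ch >= "0" && ch <= "9") {
                values[values.length - 1] = Number(ch);
            } else {
                if (!node.children.has(ch)) {
                    node.children.set(ch, {"children": new Map(), "values": null});
                }
                node = node.children.get(ch);
                values.push(0);
            }
        }
        node.values = values;
    }
    return root;
}

// Walk the trie once from every start position of the word. Each walk
// stops at the first letter that no pattern continues with, so all
// patterns sharing a prefix are found in a single pass.
function collectMatches(trie, word) {
    const matches = [];
    for (let start = 0; start < word.length; start += 1) {
        let node = trie;
        for (let end = start; end < word.length; end += 1) {
            node = node.children.get(word[end]);
            if (!node) {
                break;
            }
            if (node.values) {
                matches.push({
                    start,
                    "text": word.slice(start, end + 1),
                    "values": node.values
                });
            }
        }
    }
    return matches;
}

const trieMatches = collectMatches(buildTrie(triePatterns), "hyphenation");
console.log(trieMatches.map((match) => match.text).join(" "));
// → hyph hen hena henat na nat tio io on
```

The three patterns starting at the second "h" (hen, hena, henat) are all found during one walk, which is the main benefit over testing every pattern separately.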

The original patterns are computed from a large list of hyphenated words by a program called patgen. They aim to find some hyphenation points – not all – because it's better to miss a hyphenation point than to have some false hyphenation points. Most patterns are really good, but none are error free.

These patterns vary in size. This is mostly due to the different linguistic characteristics of the languages.

Contributors ✨

Thanks go to these wonderful people (emoji key):

  • Stephan Hoyer 📖 💻
  • Thomas Broadley 📖
  • Kai Lüke 💻
  • Sebastian Blank 💡
  • ReLater 🚧
  • julian-zatloukal 📖
  • Maik Jablonski 📖
  • yashha 💻
  • Dan Burzo 💻
  • Tobias Speicher 💻

This project follows the all-contributors specification. Contributions of any kind welcome!