Asset Crawler for common Web pages


Keywords
asset, crawler, browser, web, hexo, migrate, plugin
License
LGPL-3.0
Install
npm install web-fetch@1.4.1

Documentation

Web fetch

Asset Crawler for common Web pages

NPM Dependency CI & CD

NPM

Usage

As a Node.js package

import { savePage } from 'web-fetch';

savePage({
    source: 'http://URL.to/one/of/your/old/posts/'
});

Docker example

https://github.com/kaiyuanshe/KYS-service

As a Command Line tool

web-fetch http://URL.to/one/of/your/old/posts/

In GitHub actions

name: Fetch Web pages
on:
    issues:
        types:
            - opened
jobs:
    Fetch-and-Save:
        runs-on: ubuntu-latest
        permissions:
            contents: write
            issues: write
        steps:
            - uses: actions/checkout@v4

            - uses: pnpm/action-setup@v4
              with:
                  version: 9
            - uses: actions/setup-node@v4
              with:
                  node-version: 20
                  cache: pnpm
            - uses: browser-actions/setup-chrome@v1

            - name: Setup Web-fetch
              run: pnpm i web-fetch -g

            - name: Fetch first URL in Issue Body
              run: web-fetch $(echo "${{ github.event.issue.Body }}" | grep -Eo "https?://\S+")

            - uses: stefanzweifel/git-auto-commit-action@v5
              with:
                  commit_message: '${{ github.event.issue.title }}'

Supported Structure

export const body_tag = [
'article',
'main',
'.article',
'.content',
'.post',
'.blog',
'.main',
'.container',
'body'
];
export const meta_tag = {
title: ['h1', 'h2', '.title', 'title'].map(likeOf),
authors: ['.author', '.publisher', '.creator', '.editor'].map(likeOf),
date: ['.date', '.time', '.publish', '.create'].map(likeOf),
updated: ['.update', '.edit', '.modif'].map(likeOf),
categories: ['.breadcrumb', '.categor'].map(likeOf),
tags: ['.tag', '.label'].map(likeOf)
};

Renderer

  1. Puppeteer (default)

  2. JSDOM

Wrapper

  1. Hexo plugin