Jekyll::URLMetadata
A Jekyll plugin that extracts meta information from URLs and exposes it to Liquid variables.
This gem was originally authored as a custom plugin for the static site of genicsblog.com.
Installation
Add this line to your application's Gemfile inside the jekyll_plugins group:
group :jekyll_plugins do
  # other gems
  gem "jekyll-url-metadata"
end
Then enable the plugin by adding it to the plugins section of the _config.yml file:
plugins:
  # - other plugins
  - jekyll-url-metadata
And then execute:
bundle install
Usage
This plugin is essentially a Liquid filter that works on any valid URL string. Use it as follows:
{% assign meta = "https://wikipedia.org" | metadata %}
The metadata filter extracts the metadata from the given URL string and returns it as a Hash.
These are the values that are extracted:
- The <title> tag.
- The <meta> tags that have a name, property or charset field.
- The <link> tags with a rel attribute.
The expected output for {{ meta }} from the above example would be:
{
"title" => "Wikipedia",
"charset" => "utf-8",
"description" => "Wikipedia is a free online encyclopedia, created and edited by volunteers around the world and hosted by the Wikimedia Foundation.",
"viewport" => "initial-scale=1,user-scalable=yes",
"apple-touch-icon" => "/static/apple-touch/wikipedia.png",
"shortcut icon" => "/static/favicon/wikipedia.ico",
"license" => "//creativecommons.org/licenses/by-sa/3.0/",
"preconnect" => "//upload.wikimedia.org"
}
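Several values in the output above are relative ("/static/...") or protocol-relative ("//upload.wikimedia.org"). The following is a minimal Ruby sketch of how such values could be resolved against the page URL before being used in a preview. It is not part of the plugin: the absolutize helper and its keys parameter are hypothetical names introduced only for illustration.

```ruby
require "uri"

# Hypothetical helper, not provided by jekyll-url-metadata.
# Returns a copy of `meta` in which the values stored under `keys`
# are resolved against `page_url`; all other entries are left as-is.
def absolutize(meta, page_url, keys)
  meta.each_with_object({}) do |(key, value), out|
    out[key] = keys.include?(key) ? URI.join(page_url, value).to_s : value
  end
end

meta = {
  "title"         => "Wikipedia",
  "shortcut icon" => "/static/favicon/wikipedia.ico",
  "preconnect"    => "//upload.wikimedia.org"
}

resolved = absolutize(meta, "https://wikipedia.org", ["shortcut icon", "preconnect"])
puts resolved["shortcut icon"]
puts resolved["preconnect"]
```

URI.join follows the standard relative-reference rules, so a protocol-relative value inherits the scheme of the page URL.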
Connection time-outs
By default, the connection request to the specified domain closes after about 1 second if there is an error.
To override the default behavior, add this block to the _config.yml file:
url_metadata:
  open_timeout: 5 # time out after 5 seconds if the connection doesn't open
  read_timeout: 3 # time out after 3 seconds if no data is returned
Note: You may use any number of seconds for the timeouts, but a value below 10 is generally best for performance, because processing of the static pages is blocked until the website returns some data.
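To make the two settings concrete, here is a rough Ruby sketch of how open_timeout and read_timeout could be applied to an HTTP client. It assumes a Net::HTTP client, which may not be what the plugin uses internally; build_client and DEFAULT_TIMEOUTS are hypothetical names, and the 1-second defaults only mirror the "about 1 second" stated above.

```ruby
require "net/http"
require "uri"

# Assumed defaults; the README only states "about 1 second".
DEFAULT_TIMEOUTS = { "open_timeout" => 1, "read_timeout" => 1 }.freeze

# Hypothetical helper: builds (but does not open) an HTTP client for `url`,
# applying the `url_metadata` block of a parsed _config.yml hash.
def build_client(url, site_config)
  uri = URI.parse(url)
  settings = DEFAULT_TIMEOUTS.merge(site_config.fetch("url_metadata", {}) || {})

  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == "https")
  http.open_timeout = settings["open_timeout"] # seconds to wait for the connection to open
  http.read_timeout = settings["read_timeout"] # seconds to wait for each read of the response
  http
end

config = { "url_metadata" => { "open_timeout" => 5, "read_timeout" => 3 } }
client = build_client("https://wikipedia.org", config)
puts "open: #{client.open_timeout}s, read: #{client.read_timeout}s"
```

With these values, a single unresponsive URL can block the build for several seconds, which is why small timeouts are preferable.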
Caching
Once the initial data is returned from the website, the extracted parameters are stored in the plugin's Jekyll--URLMetadata cache folder inside the .jekyll-cache folder.
The cache improves subsequent build times in the local development environment.
If you wish to purge the cache, simply delete the Jekyll--URLMetadata folder and the plugin will regenerate the cache on the next run.
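Purging can also be scripted. The Ruby sketch below is one possible way to do it, not something the plugin ships: purge_url_metadata_cache is a hypothetical helper, and site_root is assumed to be the directory that contains .jekyll-cache. Because the exact nesting of the folder may vary, it searches for Jekyll--URLMetadata anywhere under .jekyll-cache.

```ruby
require "fileutils"

# Hypothetical helper, not provided by jekyll-url-metadata.
# Deletes every Jekyll--URLMetadata folder found under .jekyll-cache
# and returns the list of paths that were removed.
def purge_url_metadata_cache(site_root = Dir.pwd)
  pattern = File.join(site_root, ".jekyll-cache", "**", "Jekyll--URLMetadata")
  dirs = Dir.glob(pattern).select { |path| File.directory?(path) }
  dirs.each { |dir| FileUtils.rm_rf(dir) }
  dirs
end

removed = purge_url_metadata_cache
puts "Removed #{removed.length} cache folder(s)"
```

Other plugins' caches under .jekyll-cache are left untouched, so only URL metadata is fetched again on the next build.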
Use cases
- Creating beautiful social previews for links by fetching metadata for URLs at build time.
- Reading metadata from a website at Jekyll build time to evaluate it and perform certain actions.
License
The gem is available as open source under the terms of the MIT License.