4.1 From Jekyll

A note from the authors: Some of the information and instructions in this book are now out of date because of changes to Hugo and the blogdown package. If you have suggestions for improving this book, please file an issue in our GitHub repository. Thanks for your patience while we work to update the book, and please stay tuned for the revised version!

In the meantime, you can find an introduction to the changes and new features in the v1.0 release blog post and this "Up & running with blogdown in 2021" blog post.

— Yihui, Amber, & Alison

When converting a Jekyll website to Hugo, the most challenging part is the theme. If you want to keep exactly the same theme, you will have to rewrite your Jekyll templates using Hugo’s syntax (see Section 2.5). However, if you can find an existing theme in Hugo (https://themes.gohugo.io), things will be much easier, and you only need to move the content of your website to Hugo, which is relatively easy. Basically, you copy the Markdown pages and posts to the content/ directory in Hugo and tweak these text files.

Usually, posts in Jekyll are under the _posts/ directory, and you can move them to content/post/ (you are free to use other directories). Then you need to define a custom rule for permanent URLs in config.toml (see Section 2.2.2), for example:

[permalinks]
    post = "/:year/:month/:day/:slug/"

The exact rule depends on the format of the URLs you used in Jekyll (see the permalink option in your _config.yml).

If there are static assets like images, they can be moved to the static/ directory in Hugo.
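If you prefer to script this step, below is a minimal sketch in base R. The function name copy_assets() and the example paths are hypothetical; adjust them to wherever your Jekyll site keeps its images. It assumes that a file served at /images/foo.png in Jekyll should live at static/images/foo.png in Hugo so the URL stays the same.

```r
# hypothetical helper: copy a directory of assets into Hugo's static/ tree,
# preserving the subdirectory structure
copy_assets = function(from, to) {
  files = list.files(from, recursive = TRUE)
  for (f in files) {
    dest = file.path(to, f)
    # create the target subdirectory if it does not exist yet
    dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
    file.copy(file.path(from, f), dest, overwrite = TRUE)
  }
  invisible(files)
}

# example call (paths are assumptions):
# copy_assets('path/to/jekyll-site/images', 'static/images')
```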

Then you need to use your favorite tool with some string manipulation techniques to process all Markdown files. If you use R, you can list all Markdown files and process them one by one in a loop. Below is a sketch of the code:

files = list.files(
  'content/', '[.](md|markdown)$', full.names = TRUE,
  recursive = TRUE
)
for (f in files) {
  xfun::process_file(f, function(x) {
    # process x here and return the modified x
    x
  })
}

The process_file() function from the xfun package takes a filename and a processor function to manipulate the content of the file, and writes the modified text back to the file.
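To make the read-modify-write cycle concrete, here is a rough base-R stand-in for what process_file() does, applied to a temporary file. Both process_file_sketch() and strip_baseurl() are hypothetical names used only for illustration, and the sketch assumes the file is encoded in UTF-8; the real function in xfun may differ in its details.

```r
# rough stand-in for xfun::process_file(): read lines, apply fun, write back
process_file_sketch = function(file, fun) {
  x = readLines(file, encoding = 'UTF-8', warn = FALSE)
  writeLines(enc2utf8(fun(x)), file, useBytes = TRUE)
}

# example processor: remove the Liquid variable {{ site.baseurl }}, which
# Jekyll expands but Hugo would leave as literal text in Markdown content
strip_baseurl = function(x) {
  gsub('\\{\\{\\s*site\\.baseurl\\s*\\}\\}', '', x, perl = TRUE)
}

# try it on a throwaway file
f = tempfile(fileext = '.md')
writeLines('![plot]({{ site.baseurl }}/images/plot.png)', f)
process_file_sketch(f, strip_baseurl)
readLines(f)  # "![plot](/images/plot.png)"
```

With xfun installed, the equivalent call would be xfun::process_file(f, strip_baseurl).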

To give you an idea of what a processor function may look like, I provided a few simple helper functions in blogdown, and below are two of them:

blogdown:::remove_extra_empty_lines

function (...) 
xfun::process_file(..., fun = function(x) {
    x = paste(gsub("\\s+$", "", x), collapse = "\n")
    trim_ws(gsub("\n{3,}", "\n\n", x))
})
<bytecode: 0x7fc894c87ef8>
<environment: namespace:blogdown>

blogdown:::process_bare_urls

function (...) 
xfun::process_file(..., fun = function(x) {
    gsub("\\[([^]]+)]\\(\\1/?\\)", "<\\1>", x)
})
<bytecode: 0x7fc8949eb7f0>
<environment: namespace:blogdown>

The first function strips trailing whitespace from each line and collapses two or more consecutive empty lines into a single empty line. The second function replaces links of the form [url](url) with <url>. There is nothing wrong with excessive empty lines or the syntax [url](url), though; these helper functions merely make your Markdown text a little cleaner. You can find all such helper functions at https://github.com/rstudio/blogdown/blob/master/R/clean.R. Note that they are not exported from blogdown, so you need triple colons to access them.
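If you want to see what the two regular expressions do before running them on real posts, you can apply them to a small character vector. The sketch below reuses the patterns printed above; the sample text is made up, and base R's trimws() stands in for blogdown's internal trim_ws(), which I assume behaves the same way for this input.

```r
# made-up sample: trailing spaces, three empty lines in a row, a bare URL
x = c(
  'First paragraph.  ', '', '', '',
  '[https://example.com](https://example.com/)'
)

# same steps as remove_extra_empty_lines(): strip trailing whitespace,
# then collapse runs of three or more newlines into exactly two
y = paste(gsub('\\s+$', '', x), collapse = '\n')
y = trimws(gsub('\n{3,}', '\n\n', y))

# same pattern as process_bare_urls(): [url](url) or [url](url/) -> <url>
z = gsub('\\[([^]]+)]\\(\\1/?\\)', '<\\1>', y)
cat(z, '\n')
```

The three empty lines collapse into one, and the link becomes <https://example.com>.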

The YAML metadata of your posts may not be completely clean, especially when your Jekyll website was actually converted from an earlier WordPress website. The internal helper function blogdown:::modify_yaml() may help you clean up the metadata. For example, below is the YAML metadata of a blog post of Simply Statistics when it was built on Jekyll:

---
id: 4155
title: Announcing the JHU Data Science Hackathon 2015
date: 2015-07-28T13:31:04+00:00
author: Roger Peng
layout: post
guid: http://simplystatistics.org/?p=4155
permalink: /2015/07/28/announcing-the-jhu-data-science-hackathon-2015
pe_theme_meta:
  - 'O:8:"stdClass":2:{s:7:"gallery";O:8:"stdClass":...}'
al2fb_facebook_link_id:
  - 136171103105421_837886222933902
al2fb_facebook_link_time:
  - 2015-07-28T17:31:11+00:00
al2fb_facebook_link_picture:
  - post=http://simplystatistics.org/?al2fb_image=1
dsq_thread_id:
  - 3980278933
categories:
  - Uncategorized
---

You can discard the YAML fields that are not useful in Hugo. For example, you may only keep the fields title, author, date, categories, and tags, and discard other fields. Actually, you may also want to add a slug field that takes the base filename of the post (with the leading date removed). For example, when the post filename is 2015-07-28-announcing-the-jhu-data-science-hackathon-2015.md, you may want to add slug: announcing-the-jhu-data-science-hackathon-2015 to make sure the URL of the post on the new site remains the same.
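The slug derivation can be tried on its own first. Below is a small sketch; slug_from_path() is a hypothetical helper name, and it assumes your post filenames follow Jekyll's YYYY-mm-dd-name.md convention. It calls basename() first so that a full path such as content/post/... still works.

```r
# hypothetical helper: derive a slug from a Jekyll post filename
slug_from_path = function(path) {
  name = basename(path)  # drop any leading directories
  # remove the leading YYYY-mm-dd- and the trailing .md or .markdown
  gsub('^\\d{4}-\\d{2}-\\d{2}-|[.](md|markdown)$', '', name)
}

slug_from_path(
  'content/post/2015-07-28-announcing-the-jhu-data-science-hackathon-2015.md'
)
# "announcing-the-jhu-data-science-hackathon-2015"
```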

Here is the code to process the YAML metadata of all posts:

for (f in files) {
  blogdown:::modify_yaml(f, slug = function(old, yaml) {
    # YYYY-mm-dd-name.md -> name
    # use basename() because f is a full path such as content/post/...
    gsub('^\\d{4}-\\d{2}-\\d{2}-|[.](md|markdown)$', '', basename(f))
  }, categories = function(old, yaml) {
    # remove the Uncategorized category
    setdiff(old, 'Uncategorized')
  }, .keep_fields = c(
    'title', 'author', 'date', 'categories', 'tags', 'slug'
  ), .keep_empty = FALSE)
}

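You can pass a file path to modify_yaml(), define new YAML values (or functions to return new values based on the old values), and decide which fields to preserve (.keep_fields). You may discard empty fields via .keep_empty = FALSE. The processed YAML metadata is shown below and looks much cleaner. Note that the categories field is gone because its only value, Uncategorized, was removed and the resulting empty field was then dropped; the tags field does not appear because the original post did not have one.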

---
title: Announcing the JHU Data Science Hackathon 2015
author: Roger Peng
date: '2015-07-28T13:31:04+00:00'
slug: announcing-the-jhu-data-science-hackathon-2015
---
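To illustrate the filtering logic on its own, here is a rough sketch that works on a plain R list instead of a file. It is not how modify_yaml() is implemented; clean_meta() is a hypothetical helper, and the list is a hand-written subset of the metadata shown earlier.

```r
# a hand-written subset of the original metadata, as an R list
meta = list(
  id = 4155,
  title = 'Announcing the JHU Data Science Hackathon 2015',
  date = '2015-07-28T13:31:04+00:00',
  author = 'Roger Peng',
  layout = 'post',
  categories = 'Uncategorized'
)

# hypothetical helper: add a slug, clean categories, keep selected fields,
# and drop fields that end up empty
clean_meta = function(meta, slug, keep) {
  meta$slug = slug
  meta$categories = setdiff(meta$categories, 'Uncategorized')
  meta = meta[intersect(keep, names(meta))]   # fields come out in 'keep' order
  Filter(function(v) length(v) > 0, meta)     # like .keep_empty = FALSE
}

res = clean_meta(
  meta, 'announcing-the-jhu-data-science-hackathon-2015',
  c('title', 'author', 'date', 'categories', 'tags', 'slug')
)
names(res)  # "title" "author" "date" "slug"
```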