Typographic improvements for Jekyll/kramdown

My blog is built with Jekyll and I write my posts with kramdown. This works nicely, but I wanted to implement some typographic subtleties that need support in the converter that produces the HTML. This blog post describes how I implemented them using some small Jekyll plugins. All of these plugins can be found on the plugin page of the blog.

No hyphenation for URLs

The blog enables hyphenation via CSS (hyphens: auto;). This works well, but certain content should not be hyphenated. URLs may contain dashes, so any additional hyphenation leads to the display of invalid URLs. While the actual link still works, the user will see the mangled string. Of course, I cannot simply disable hyphenation for links (<a> tags), since the link text may consist of normal words, such as this one. Hyphenation should be disabled only if the link text is an actual URL, like https://en.wikipedia.org/. For that I use a simple heuristic: If the link text and the actual link are the same, disable hyphenation. This is implemented by adding a class no-hyphenation to these links and disabling hyphenation with CSS.

The Jekyll plugin (at the time of writing) subclasses the kramdown HTML converter to modify how kramdown produces links:

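require "kramdown"

class Kramdown::Converter::TBHtml < Kramdown::Converter::Html
  # Links whose visible text equals their target (i.e., bare URLs) get the
  # extra class no-hyphenation, which the CSS below picks up.
  def convert_a(el, indent)
    link_text = inner(el, indent)
    if el.attr["href"] == link_text
      if el.attr.key?("class")
        el.attr["class"] += " no-hyphenation"
      else
        el.attr["class"] = "no-hyphenation"
      end
    end
    # Let the stock converter render the (possibly modified) element.
    super(el, indent)
  end
end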

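This simply adds the class no-hyphenation to the <a> element if the link target equals the link text. This class can then be styled using CSS. In case the automatic detection fails, for example when the link text omits the scheme, you can still add the class manually with a kramdown attribute list: [en.wikipedia.org](https://en.wikipedia.org/){:.no-hyphenation}. The CSS is very simple: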

.post-content {
    hyphens: auto;
}
.post-content .no-hyphenation {
    hyphens: manual;
}
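The heuristic itself does not depend on kramdown, so it can be tried in isolation. The following is only a sketch: the helper name mark_url_link is hypothetical, and it works on a plain Hash of attributes rather than on a kramdown element.

```ruby
# Hypothetical stand-alone version of the convert_a heuristic: it takes a
# plain Hash of attributes instead of a kramdown element.
def mark_url_link(attrs, link_text)
  # Only bare URLs (link text identical to the target) are marked.
  return attrs unless attrs["href"] == link_text

  # Keep any existing classes and avoid adding the class twice.
  classes = attrs.fetch("class", "").split
  classes << "no-hyphenation" unless classes.include?("no-hyphenation")
  attrs.merge("class" => classes.join(" "))
end

url = "https://en.wikipedia.org/"
p mark_url_link({ "href" => url }, url)["class"]                   # prints "no-hyphenation"
p mark_url_link({ "href" => url, "class" => "ext" }, url)["class"] # prints "ext no-hyphenation"
p mark_url_link({ "href" => url }, "Wikipedia").key?("class")      # prints false
```

Unlike the plugin, this version returns a new Hash instead of modifying the attributes in place, and it skips the class if it was already added by hand (the plugin would then emit it twice, which is harmless).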

The final missing piece is making Jekyll use this converter. First, we need to register a new Markdown dialect:

class Jekyll::Converters::Markdown
  class KramdownTB < KramdownParser
    def convert(content)
      Kramdown::Document.new(content, @config).to_t_b_html
    end
  end
end

This code subclasses Jekyll's standard kramdown parser (KramdownParser) in order to keep its features intact; only the convert method is replaced so that it calls our converter. Finally, add something like the following to _config.yml:

markdown: KramdownTB

I unimaginatively used my initials and named the converter KramdownTB, but the name does not matter. If you adapt this code, take note of the magic in the Ruby snippet above: the name of the converter class (TBHtml) is mangled into the method name to_t_b_html, i.e., every uppercase letter is converted to lowercase and, except for the first one, preceded by an underscore.
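As far as I can tell from kramdown's source, Kramdown::Document handles to_* calls in method_missing: it strips the to_ prefix, splits the rest at underscores, capitalizes the first letter of each part, and looks up the resulting class name in Kramdown::Converter. The sketch below reproduces both directions in plain Ruby. The helpers converter_method_name and camelize are illustrative only and not part of kramdown's API (kramdown ships a similar helper of its own).

```ruby
# Illustrative helper: class name -> method name, e.g. "TBHtml" -> "to_t_b_html".
def converter_method_name(class_name)
  head = class_name[0]
  # Prefix every later uppercase letter with "_", then lowercase everything.
  tail = class_name[1..].gsub(/\p{Upper}/) { |c| "_#{c}" }
  "to_#{(head + tail).downcase}"
end

# Illustrative helper for the reverse direction: split at "_" and
# capitalize the first letter of each part, e.g. "t_b_html" -> "TBHtml".
def camelize(suffix)
  suffix.split("_").map { |part| part[0].upcase + part[1..] }.join
end

p converter_method_name("TBHtml") # prints "to_t_b_html"
p camelize("t_b_html")            # prints "TBHtml"
```

This also shows why acronyms get split letter by letter: a converter class named TBHTML would have to be called as to_t_b_h_t_m_l.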

Adding space to ALL CAPS

If you use all caps text, you usually want to add some letterspacing. Why? Because it is easier to read and looks better: lammps vs. LAMMPS. The effect is subtle, but I think it improves legibility. While one should generally avoid all caps text, the appearance of acronyms is also improved. Therefore, I wanted this on my blog.

There is no 100% reliable way to detect all-caps text, so I use heuristics that are good enough in most cases; the rest can be fixed manually. The plugin implementing this searches for the regular expression:

[\p{Upper}0-9](?:[\p{Upper}0-9.'’]|&amp;)+(?!\w\w)

The first part, [\p{Upper}0-9], requires the match to start with a digit or an uppercase letter.

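This must be followed by one or more uppercase letters, digits, full stops, apostrophes, or ampersands: (?:[\p{Upper}0-9.'’]|&amp;)+. The ampersand is written as &amp; because the text has already been HTML-escaped when the pattern is applied.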

Finally, the negative lookahead (?!\w\w) ensures that the match is not followed by two or more word characters. This way, the “URL” in plurals like “URLs” is matched, but nothing in “XCharter”.
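To see what the pattern picks up on its own, it can be run through String#scan in plain Ruby. This is a small sketch; the constant name ALLCAPS and the sample sentence are my own, and the sample is written in already-escaped form (&amp;) to match what the converter sees.

```ruby
# The all-caps pattern from above, written on one line (no /x flag).
ALLCAPS = /[\p{Upper}0-9](?:[\p{Upper}0-9.'’]|&amp;)+(?!\w\w)/

# Sample text in already-escaped form, as the converter sees it.
sample = "LAMMPS and AT&amp;T use URLs, unlike XCharter in 2023"

p sample.scan(ALLCAPS)     # prints ["LAMMPS", "AT&amp;T", "URL", "2023"]
p "XCharter".scan(ALLCAPS) # prints []
```

Note that the year 2023 is matched as well; the check for uppercase letters in span_allcaps filters such matches out.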

The code looks somewhat like this:

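def span_allcaps(text)
  # Candidate runs: start with an uppercase letter or digit, continue with
  # uppercase letters, digits, ".", "'", "’" or an escaped "&amp;", and are
  # not followed by two or more word characters.
  text.gsub(%r{[\p{Upper}0-9]
               (?:[\p{Upper}0-9.'’]|&amp;)+
               (?!\w\w)
               }x) do |m|
    # Only wrap runs containing at least two uppercase letters.
    if m.gsub(/\p{^Upper}/, "").length > 1
      "<span class=\"allcaps\">#{m}</span>"
    else
      m
    end
  end
end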

There is an additional check to ensure that the match contains at least two uppercase letters. This rules out plain numbers such as years, as well as single capitals like the “I” in “I’m”. The matching CSS is simply:

.allcaps {
    letter-spacing: 0.05em;
}
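A quick way to check the whole function outside of Jekyll is to call it on an already escaped string. The snippet below repeats the logic of span_allcaps in compact form so that it runs by itself; the sample sentence is made up.

```ruby
# Same logic as span_allcaps above, condensed so the snippet is self-contained.
def span_allcaps(text)
  text.gsub(/[\p{Upper}0-9](?:[\p{Upper}0-9.'’]|&amp;)+(?!\w\w)/) do |m|
    # Wrap only matches with at least two uppercase letters.
    m.gsub(/\p{^Upper}/, "").length > 1 ? "<span class=\"allcaps\">#{m}</span>" : m
  end
end

puts span_allcaps("I’m sure NASA and AT&amp;T used URLs in 2023")
# I’m sure <span class="allcaps">NASA</span> and <span class="allcaps">AT&amp;T</span> used <span class="allcaps">URL</span>s in 2023
```

The “I’” in “I’m” and the year 2023 are matched by the regular expression but left alone, because they contain fewer than two uppercase letters.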

I found that the best place to employ this function is in the convert_text method of the kramdown HTML converter. So let’s add it to the TBHtml class from above:

class Kramdown::Converter::TBHtml < Kramdown::Converter::Html
  # super returns the HTML-escaped text; wrap its all-caps runs in spans.
  def convert_text(el, indent)
    span_allcaps(super)
  end
end

In my testing, this applies the class in all the right places.

Small caps

Sometimes, it is nice to have support for small caps in your text. In theory, we can do this in Markdown by adding a class sc with a kramdown attribute list:

*small caps*{:.sc}

and styling it using CSS:

.sc {
    font-style: normal; /* disable italics */
    font-variant: small-caps;
    font-variant-numeric: oldstyle-nums; /* small caps look
                        best together with oldstyle numbers */
}

For my blog, there is one problem though: the font that I use, Bitstream Charter, does not include small caps. One would have to buy them, which is not only expensive but also comes with weird licenses, and I wanted to stay with free software. So what happens when applying the above CSS? Ugliness: what is this? Lacking real small caps, the browser fakes them by shrinking the regular capitals, which makes their strokes look too thin next to the lowercase letters. Such fake small caps should be avoided completely. Luckily, there is a little-known solution. Michael Sharpe created an extension to the Charter font for LaTeX and named it XCharter. This package also contains OpenType font files that can be converted for the web. Using this font for small caps yields much better results: in july 1969, man first landed on the moon. As you can see, the font also supports oldstyle numbers.

Small improvements for great effect?

A large part of good typography is attention to detail. I am only an amateur, but I like how such simple steps can improve the quality of typesetting. Maybe I’m the only one who will ever notice these little details, but I believe they improve readability either way.