md-toc 8.2.0 documentation

Anchor link types and behaviours

Generic

cmark, github

md-toc uses a translated version of the Ruby algorithm. The original is reported here:

  • https://github.com/jch/html-pipeline/blob/master/lib/html/pipeline/toc_filter.rb

I could not find the code directly responsible for the anchor link generation. See also:

  • https://github.github.com/gfm/

  • https://githubengineering.com/a-formal-spec-for-github-markdown/

  • https://github.com/github/cmark/issues/65#issuecomment-343433978

Apparently GitHub (and possibly others) filters HTML tags out of anchor links. This appears to be an undocumented feature, so the remove_html_tags function was added to handle it. Rather than designing an algorithm to detect HTML tags, md-toc relies on regular expressions. All the rules in https://spec.commonmark.org/0.28/#raw-html have been followed to the letter. The regular expressions are split by type and then composed by concatenating the strings. For example:

# Comment start.
COS = '<!--'
# Comment text.
COT = '((?!>|->)(?:(?!--).))+(?!-).?'
# Comment end.
COE = '-->'
# Comment.
CO = COS + COT + COE

HTML tags are stripped using the re.sub replace function, for example:

line = re.sub(CO, str(), line, flags=re.DOTALL)
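
As a minimal sketch, the comment pattern above can be exercised on its own. The helper name strip_html_comments is hypothetical and covers HTML comments only; md-toc's remove_html_tags combines more patterns than this one.

```python
import re

# Comment pattern composed exactly as in the example above.
COS = '<!--'
COT = '((?!>|->)(?:(?!--).))+(?!-).?'
COE = '-->'
CO = COS + COT + COE


def strip_html_comments(line: str) -> str:
    """Hypothetical helper: remove HTML comments from a line."""
    # re.DOTALL lets '.' match newlines, so multi-line comments are covered.
    return re.sub(CO, '', line, flags=re.DOTALL)


print(strip_html_comments('Hello <!-- a comment --> world'))
```

Only the comment itself is removed, so the spaces that surrounded it remain in the result.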

GitHub added an extension in GFM to ignore certain HTML tags, valid at least from versions 0.27.1.gfm.3 to 0.29.0.gfm.0:

  • https://github.github.com/gfm/#disallowed-raw-html-extension-

  • https://github.com/github/cmark-gfm/blob/fca380ca85c046233c39523717073153e2458c1e/extensions/tagfilter.c
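
The filtering rule described by that extension can be sketched roughly as follows. The tag names come from the GFM specification; the helper name filter_disallowed_tags and the regular expression are assumptions made for illustration, not code taken from md-toc or cmark-gfm.

```python
import re

# Tag names listed by the GFM "Disallowed Raw HTML" extension.
DISALLOWED_TAGS = (
    'title', 'textarea', 'style', 'xmp', 'iframe',
    'noembed', 'noframes', 'script', 'plaintext',
)

# Match a '<' that opens (or closes) one of the disallowed tags.
# The tag name must be followed by whitespace, '>' or '/>'.
_TAGFILTER = re.compile(
    r'<(?=/?(?:' + '|'.join(DISALLOWED_TAGS) + r')(?:\s|/?>))',
    flags=re.IGNORECASE,
)


def filter_disallowed_tags(text: str) -> str:
    """Hypothetical helper: replace the leading '<' with '&lt;'."""
    return _TAGFILTER.sub('&lt;', text)


print(filter_disallowed_tags('<script>x</script> and <strong>y</strong>'))
```

Tags outside the list, such as strong, are left untouched in this sketch.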

gitlab

New rules have been written:

  • https://docs.gitlab.com/ee/user/markdown.html#header-ids-and-links

redcarpet

Redcarpet collapses consecutive dash characters into a single dash character. md-toc uses a translated version of the C algorithm. The original version is here:

  • https://github.com/vmg/redcarpet/blob/6270d6b4ab6b46ee6bb57a6c0e4b2377c01780ae/ext/redcarpet/html.c#L274

See also:

  • https://github.com/vmg/redcarpet/issues/618#issuecomment-306476184

  • https://github.com/vmg/redcarpet/issues/307#issuecomment-261793668
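
The dash handling mentioned above can be illustrated in isolation. This is only a sketch of that single step under a hypothetical helper name, collapse_dashes; it is not the translated C algorithm, which does more than this.

```python
import re


def collapse_dashes(anchor: str) -> str:
    """Hypothetical helper: turn runs of dashes into a single dash."""
    return re.sub(r'-{2,}', '-', anchor)


print(collapse_dashes('a---b--c-d'))
```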

Emphasis

To be able to have working anchor links, emphasis must also be removed from the link destination.

cmark, github, gitlab

At the moment the implementation of emphasis removal is incomplete because of its complexity. See:

  • https://spec.commonmark.org/0.30/#emphasis-and-strong-emphasis

The core functions for this feature have been ported directly from the original cmark source with some differences:

  1. string manipulation, memory allocation (mallocs), and similar details are handled differently in Python

  2. cmark_utf8proc_charlen uses length = 1 instead of length = utf8proc_utf8class[ord(line[0])], which causes a list overflow.

    The cmark_utf8proc_charlen function is related to the cmark_utf8proc_encode_char function. Have a look at that function to see how cmark determines character lengths.

    In Python 3, strings are sequences of Unicode code points, so every character has length 1 regardless of how many bytes its UTF-8 encoding takes. See:

    • https://rosettacode.org/wiki/String_length#Python

    • https://docs.python.org/3/howto/unicode.html#comparing-strings
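
The difference between Python's character length and the UTF-8 byte length that cmark works with can be seen in a short sketch. The helper name utf8_length is hypothetical. One plausible reading of the overflow mentioned above (an assumption, not something stated in the source) is that ord() of a Python character can exceed the range of a table indexed by a single byte.

```python
def utf8_length(char: str) -> int:
    """Hypothetical helper: bytes needed to encode one character in UTF-8."""
    return len(char.encode('utf-8'))


# 'a', e with acute accent, euro sign, grinning face emoji.
for char in ('a', '\u00e9', '\u20ac', '\U0001f600'):
    # len() counts code points, so it is 1 for each of these characters.
    print(repr(char), len(char), utf8_length(char), ord(char))
```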

As of the md-toc 8.1.2 release, cmark-gfm is still at version 0.29. Moreover, some of the code sections used in emphasis processing differ from those in cmark 0.29. Compare, for example:

  • https://github.com/github/cmark-gfm/blob/0.29.0.gfm.3/src/inlines.c#L639-L654

  • https://github.com/commonmark/cmark/blob/0.29.0/src/inlines.c#L615-L621

For the moment md-toc uses the original cmark source only as a reference for emphasis processing.

By Franco Masotti
© Copyright 2017-2024, Franco Masotti.
Last updated on 2024-03-27 00:33:23 +0000.