Markup - Markdown
About
Markdown is a lightweight markup language.
GitHub by default uses its own Markdown dialect, GitHub Flavored Markdown (GFM).
Syntax
Reference
- http://commonmark.org/ is the reference (the GitHub Flavored Markdown Spec is based on CommonMark)
- RFC 7763 - The text/markdown Media Type
Little Cheatsheet
- Comment: same syntax as an HTML comment
- Heading: ATX
# Your title
#(id=#custom-id tight=true bullet_char=-) Your title
#{id=#custom-id tight=true bullet_char=-} Your title
- Image
![Alt text](/path/to/img.jpg)
![Alt text](/path/to/img.jpg "Optional title" =100x20)
![Alt text](/path/to/img.jpg =100)
Tools
Editor
- Eclipse: WikiText with outline. Just install Eclipse, then F1 > Markdown Markup Cheat Sheet. Tables are implemented as plain HTML.
- IntelliJ IDEA with a plugin
Generator
Blog
- https://www.11ty.io/docs/languages/markdown/ - Eleventy - Jekyll replacement. This framework has no notion of a link to an md file.
Book
- Pandoc - Doc - http://pandoc.org/MANUAL.html
Wiki (doc)
mkdocs
geared towards building project documentation.
mkdocs new [dir-name]  # Create a new project.
mkdocs serve  # Start the live-reloading docs server.
mkdocs build  # Build the documentation site into a new directory named site.
mkdocs build --clean  # Remove stale files from the site directory before building.
mkdocs build --help  # Print the help message for the build command.
mkdocs -h  # Print the general help message.
- Homepage: By convention index.md
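As a rough illustration of the layout these commands work on, the sketch below creates the two files a fresh project starts from: a mkdocs.yml configuration and a docs/index.md homepage. The function name scaffold_project and the default site name are hypothetical, and the script only approximates what mkdocs new produces; it does not call MkDocs itself.

```python
from pathlib import Path
import tempfile


def scaffold_project(root, site_name="My Docs"):
    """Create a minimal MkDocs-style layout (an approximation of `mkdocs new`)."""
    root = Path(root)
    docs = root / "docs"
    docs.mkdir(parents=True, exist_ok=True)
    # Project configuration: only the site name is set in this sketch.
    (root / "mkdocs.yml").write_text(f"site_name: {site_name}\n", encoding="utf-8")
    # Homepage: by convention docs/index.md.
    (docs / "index.md").write_text("# Welcome\n", encoding="utf-8")
    # Return the created files as sorted, root-relative POSIX paths.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in scaffold_project(Path(tmp) / "my-project"):
            print(path)
```

Running mkdocs serve or mkdocs build from inside such a directory should then pick up index.md as the homepage.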
GitHub
https://github.github.com/gfm/ - GitHub Flavored Markdown Spec
GFM was built on top of sundown but is now built on top of CommonMark. 1)
Other Library
- Java: Pegdown, https://github.com/rjeschke/txtmark
- cmark - the C reference implementation of CommonMark. It's a parsing and rendering library
- sundown - GitHub - Standards-compliant Markdown processing library in C
- https://github.com/remarkjs/remark/tree/master/packages/remark-parse - Parser that generates the tree below (mdast)
- https://github.com/chjj/marked/ - JavaScript parser
Parsing / Tokenizing Strategy
This section summarizes the parsing strategies used by several Markdown parsers.
-
- Two phases:
- Block tree creation (heading, paragraph), where a block may define when it stays open based on the beginning of the next line.
- Inline parsing of the content of the block
- AST
- Dokuwiki:
- Lexer rules: tree of lexical modes.
- A mode is:
- a node
- unique
- with a regular expression for each pattern kind (open, close or self-closing)
- that can be in several branches (also known as chains)
- A branch concatenates all its regular expressions into one big expression via alternation (or) and group expressions. It's called a parallel regexp (i.e. Doku_LexerParallelRegex)
- Lexer output: sequence of tokens (not a tree)
- but tokens have an entry/exit state
- and there are special modes to handle complex structures such as lists and tables
-
- Lexer rules: tree of lexical modes (they call it a chain, but it is a branch of the tree)
- A blockquote token may contain paragraph, heading and list chains.
- Lexer output: sequence of tokens (not a tree)
- but there is a special token, called the inline token, that holds nested tokens: sequences of inline markup (bold, italic, text, …).
-
- tokens in a nested tree structure 2)
- the mode / token should define whether it is inline or block (but not both)
- a block mode needs to add the inline text to a queue to be parsed later (i.e. this.lexer.inline(token.text, token.tokens);) 3)
- recursive block strategy: whenever the lexer detects the start of a new container block, it: 4)
- attempts to find the end of the container block
- parses out all of the text that the container block contains
- removes line prefixes related to the container block
- recursively tokenizes the cleaned contents of the container block
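The four steps of the recursive block strategy can be sketched as follows. This is a minimal illustration, assuming the only container block is a blockquote whose lines start with ">" and the only leaf block is a paragraph. The names tokenize_blocks and strip_prefix are hypothetical and do not come from any of the parsers listed above.

```python
def strip_prefix(line):
    """Remove the blockquote marker '>' and one optional following space."""
    rest = line[1:]
    return rest[1:] if rest.startswith(" ") else rest


def tokenize_blocks(lines):
    """Turn a list of lines into a nested list of block tokens."""
    tokens = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith(">"):
            # Step 1: find the end of the container block.
            j = i
            while j < len(lines) and lines[j].startswith(">"):
                j += 1
            # Steps 2 and 3: take the contained lines and remove their prefixes.
            inner = [strip_prefix(l) for l in lines[i:j]]
            # Step 4: recursively tokenize the cleaned contents.
            tokens.append({"type": "blockquote", "tokens": tokenize_blocks(inner)})
            i = j
        elif not line.strip():
            i += 1  # blank lines only separate blocks
        else:
            # Leaf block: consecutive non-blank, non-quote lines form a paragraph.
            j = i
            while j < len(lines) and lines[j].strip() and not lines[j].startswith(">"):
                j += 1
            text = " ".join(l.strip() for l in lines[i:j])
            tokens.append({"type": "paragraph", "text": text})
            i = j
    return tokens


if __name__ == "__main__":
    print(tokenize_blocks(["> outer", "> > inner", "after"]))
```

Because the prefix is stripped before recursing, a nested "> >" quote needs no special case: the inner call simply sees another line that starts with ">".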
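The parallel regexp idea from the Dokuwiki item (one big alternation of grouped patterns that yields a flat token sequence with entry/exit states) can be illustrated in a few lines. This is only a sketch under simplifying assumptions: the PATTERNS table and the tokenize function are hypothetical, each mode uses the same delimiter to open and close, and the real Dokuwiki lexer is written in PHP with separate patterns per mode.

```python
import re

# Hypothetical modes: each name maps to the delimiter that opens and closes it.
PATTERNS = {
    "strong": r"\*\*",
    "emphasis": r"//",
    "monospace": r"''",
}

# The "parallel regexp": every pattern becomes a named group in one alternation.
PARALLEL = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in PATTERNS.items()))


def tokenize(text):
    """Return a flat sequence of (mode, state) and ("text", value) tokens."""
    tokens = []
    open_modes = []  # stack of modes that have been entered but not exited
    pos = 0
    for match in PARALLEL.finditer(text):
        if match.start() > pos:
            tokens.append(("text", text[pos:match.start()]))
        mode = match.lastgroup  # name of the group that matched
        if open_modes and open_modes[-1] == mode:
            open_modes.pop()
            tokens.append((mode, "exit"))
        else:
            open_modes.append(mode)
            tokens.append((mode, "entry"))
        pos = match.end()
    if pos < len(text):
        tokens.append(("text", text[pos:]))
    return tokens


if __name__ == "__main__":
    for token in tokenize("a **b //c// d** e"):
        print(token)
```

The output stays a sequence rather than a tree. The nesting is only implied by the matching entry and exit tokens.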
AST
Example of AST: CommonMark (DTD)
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">
<document xmlns="http://commonmark.org/xml/1.0">
  <list type="ordered" start="1" tight="true" delimiter="period">
    <item>
      <paragraph>
        <text>A paragraph</text>
        <softbreak />
        <text>with two lines.</text>
      </paragraph>
    </item>
  </list>
</document>
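To make the tree structure easier to see, the sketch below walks this AST with Python's standard xml.etree.ElementTree module and returns one indented line per node. The function name outline is hypothetical, and the XML declaration and DOCTYPE are left out of the embedded copy to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

# Copy of the document above, without the XML declaration and DOCTYPE.
XML = """<document xmlns="http://commonmark.org/xml/1.0">
  <list type="ordered" start="1" tight="true" delimiter="period">
    <item>
      <paragraph>
        <text>A paragraph</text>
        <softbreak />
        <text>with two lines.</text>
      </paragraph>
    </item>
  </list>
</document>"""


def outline(node, depth=0):
    """Return one indented line per AST node, with the namespace stripped."""
    name = node.tag.split("}", 1)[-1]  # "{namespace}text" -> "text"
    content = (node.text or "").strip()
    label = f"{name}: {content}" if content else name
    lines = ["  " * depth + label]
    for child in node:
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(ET.fromstring(XML))))
```

The nesting document > list > item > paragraph shows the block structure, while the text and softbreak leaves are the inline content of the paragraph.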