1.6.2011 - Wiki .NET Parser II

After more than a year of experience with the Wiki .NET Parser (C#), there is a brand new release. It runs on the latest version of its engine, ANTLR (http://antlr.org), and brings new features:

+ Table syntax is finally supported

|| heading1 || heading2
| cell1 | cell2

+ Bold and italic in CodePlex style

new *bold text* and existing [B:bold text]
new _italic text_  and existing [I:italic text]

+ Lists in block and inline style (both with nesting)

* inline 1 (new)
* inline 2
[LI (existing)
item 1
item 2
LI]

+ Extended URL functionality:
[URL: click here | http://catarsa.com]

(And of course, any previously written Wiki .NET Syntax text is still supported.)

Introduction  

Wiki .NET Parser (C#) is an open source project that provides a powerful tool for content management systems (CMS). The parser has simple rules, which are documented here and used in action on the Catharsis framework portal http://catarsa.com

Wiki .NET Parser is distributed as open source code (or as a built library) here: http://catarsa.com/Articles/Code/WikiNetParser

Example web-application

28.4.2010 - There is a new download available, the Wiki .NET Parser example: a sample web application that gives you a testing environment for the Wiki .NET Syntax: http://catarsa.com/Articles/Code/Wiki-Example

There is only one page, 'Default.aspx', with three columns:

  1. The left column shows the result: the HTML rendered in the web browser
  2. The central column contains the EDITOR (type your test code in there)
  3. The right column shows the HTML code, so that you can see the real output of the Wiki .NET Parser

Wiki .NET Parser engine: ANTLR  

The Wiki .NET Parser is built on the ANTLR engine (http://antlr.org). The name of this tool stands for "ANother Tool for Language Recognition". It is written in Java but is also available for us in .NET and C#. ANTLR is an exceptional tool for any type of text parsing. If you ever need to read, parse or rewrite any type of text, do not forget ANTLR.

Wiki .NET Parser evolution

The Wiki .NET Parser grew out of the day-to-day needs of web applications built on the Catharsis framework.

Whenever we created an application, users demanded "documentation". Part of it was a basic description of the UI, which was the developers' job. But then came the request to publish the "documentation of the business process".

We suggested publishing Word or PDF documents. That worked, but the documents were a bit complicated to revise and republish in new versions, and hard to search.

The next step was conversion to HTML. We started 1) to convert the document files into HTML elements and 2) to get mad and upset whenever a new revision came. The question was how to allow our users to edit the content themselves:

  • with a brutally simple syntax
  • with an absolutely safe result (no JS injection)
  • for free (if possible)

First, we asked Google for help. We did find some open source parsers, but they were either not serious or extremely heavyweight. We needed only one simple method that takes text in Wiki .NET Syntax and returns the HTML.

var result = ProjectBase.Tools.Wiki.WikiProvider.ConvertToHtml(sourceText);

Why ANTLR? NHibernate!

Then, like manna from heaven, NHibernate 2.1 was released with a new external 'core' library dependency: ANTLR. This tool replaced the previous HQL parsers. Why? Because it was incredibly easy to use. You just define your lexer, grammar and result parser.

We investigated how it works and soon found out that it is not only 'easy' but also supports both worlds: Java and .NET. Since our company uses both technologies, this was a big plus. We were able to create ONE Wiki .NET Syntax with one lexer, one grammar and two different parsers (Java, C#).

Happy user

In our web applications we provide an entity: Article. A user can add, update, delete and also publish it. The syntax is trivial. The resulting code is pure standard HTML, and no JS can be injected into it. Nice. And easy.

Wiki .NET Syntax overview

The next paragraphs give a brief description of the Wiki .NET Syntax. If you are interested in how it works:

  1. Download the source code
  2. Download the binaries with ONE method: ConvertToHtml()
  3. Observe the Wiki .NET Parser in action: the http://catarsa.com/ open source portal is powered by this tool
  4. Try the Catharsis framework, which supports and uses the Wiki .NET Parser out of the box

Examples of Wiki .NET Syntax

LastUpdateDate : 3/27/2010 8:46 AM

Wiki .NET Syntax

A description of the Wiki .NET Syntax: how to write headers, text and hyperlinks, how to append images and zipped files, and much more. Every element is described and accompanied by sample code.
The main intention of the Wiki .NET Syntax is to provide as easy a way to describe text as possible. Once this goal is met, the Wiki .NET Parser can be used in day-to-day practice.

latest downloads: http://catarsa.com/Articles/Download/WikiParser

WIKI.NET

  1. Simple Text
  2. Headers and Paragraphs
  3. Navigation (fixed for any url)
  4. Image
  5. Lists  (extended)
  6. Decorations and Colors    (extended)
  7. Block
  8. Escaped character syntax
  9. Tables    (new)

Notes 

This document was assembled with the Wiki .NET Syntax and converted with the Wiki .NET Parser  (and therefore could be taken as a proof of concept).

Wiki .NET Parser is based on ANTLR3 for C#.NET. Its advantages: very few lines in the lexer, grammar and AST parser; future extensibility; easy maintainability; exceptional performance.

From the highlighting point of view, the Wiki .NET Syntax is mostly C# and XML oriented.

The next paragraphs contain a complete description of the syntax with examples. Syntax elements can be split into these categories:

  • Blocks (special formatting like XML)
  • Inline Blocks (text)
  • Decoration (styles, colors)
  • Hyperlinks and images

As in other languages, a small set of special leading characters is used in the Wiki .NET Syntax:

  •   [ (opening block)
  •   ] (closing block)
  •   | (inline block; attribute separator)
  •   : (attribute value starts)

These signs, combined with keywords, provide the complete Wiki .NET playground for safe text publishing.

Simple Text


Keywords

  • (there are NO keywords)

Description

You can type text as it comes. It will be rendered as ordinary text without any formatting.

Newlines (ENTER) will be rendered as HTML line breaks ("<br />" instead of "\r\n").

Multiple spaces (more than one) will all be rendered, not skipped. (A standard HTML browser would otherwise collapse multiple spaces into a single one.)

Empty lines are skipped. To append an empty line to the output, place a white space on the line (type a white space and press ENTER).

Syntax

There is no special syntax, just type...

Example

wiki

1) Text with the
broken lines
2) Text with    spaces

result

1) Text with the
broken lines
2) Text with    spaces
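
The plain-text rules above can be illustrated with a small hand-written sketch. It is not the parser's code (the real parser is generated from an ANTLR grammar). The class name SimpleTextSketch is hypothetical, and the use of "&nbsp;" to keep extra spaces visible is an assumption, since the article does not say how the spaces are preserved.

```csharp
using System.Collections.Generic;
using System.Net;

// Hypothetical helper that mirrors the plain-text rules; not part of the Wiki .NET Parser API.
public static class SimpleTextSketch
{
    public static string Render(string wiki)
    {
        var rendered = new List<string>();
        foreach (var line in wiki.Replace("\r\n", "\n").Split('\n'))
        {
            // Truly empty lines are skipped; a line holding only white space is kept.
            if (line.Length == 0) continue;

            // Encode HTML-sensitive characters so the output stays safe.
            var html = WebUtility.HtmlEncode(line);

            // Assumption: every second space of a run becomes a non-breaking space,
            // so the browser cannot collapse the run into a single space.
            html = html.Replace("  ", " &nbsp;");
            rendered.Add(html);
        }

        // Each remaining newline becomes an HTML line break.
        return string.Join("<br />", rendered);
    }
}
```

Calling SimpleTextSketch.Render with the wiki text of the example returns one string in which the two rows of the first item are joined by "<br />" and the empty line is gone.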

Headers and Paragraphs

Headers and paragraphs have a syntax based on three parts:

  1. | Leading vertical line at the beginning
  2. H1 The name of the used header or paragraph
  3. ' ' One WHITE SPACE!

Keywords

  • |H1 .. |H6   (headers)
  • |P           (paragraph)
  • |P1 .. |P3   (paragraphs with different margins)
  • |PC          (center aligned paragraph)
  • |PR          (right aligned paragraph)
  • |CITE        (cite text (italic))
  • |BQ          (blockquote text with larger padding)

Description

Headers and paragraphs allow formatting based on the 'inline block' style.

This means that the text between the opening symbol (e.g. '|P ') and the end of the line (ENTER) is rendered as a paragraph (<p>...</p>).

Syntax

|H1 Header 1
==> <h1>Header 1</h1>
..
|H6 Header 6
==> <h6>Header 6</h6>

|P Usual text...
==> <p>Usual text...</p>
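
As an illustration of the 'inline block' rule, here is a minimal sketch that maps one row to its HTML element. It is an assumption-laden stand-in for the ANTLR-based parser: the class name InlineBlockSketch is hypothetical, and only |H1 .. |H6 and |P are handled, because the article does not show the exact HTML produced for |P1 .. |P3, |PC, |PR, |CITE and |BQ.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper; covers only the keywords whose output the article shows.
public static class InlineBlockSketch
{
    // '|' + keyword + exactly one white space, then the text up to the end of the row.
    private static readonly Regex Pattern = new Regex(@"^\|(H[1-6]|P) (.*)$");

    public static string RenderLine(string line)
    {
        var match = Pattern.Match(line);

        // Rows without a known leading keyword are treated as plain text.
        if (!match.Success) return WebUtility.HtmlEncode(line);

        var tag = match.Groups[1].Value.ToLowerInvariant();      // "H3" -> "h3"
        var text = WebUtility.HtmlEncode(match.Groups[2].Value); // keep the output safe
        return "<" + tag + ">" + text + "</" + tag + ">";
    }
}
```

For instance, InlineBlockSketch.RenderLine("|H1 Header 1") yields an h1 element, while a row without the leading vertical line is returned as encoded plain text.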

Example

wiki

|H3 Lorem ipsum
|P1 Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed dor eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

|P2 Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum

|CITE Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo.

|BQ Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt.

|PR Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem.

result

Lorem ipsum

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed dor eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum

Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo.
Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt.

Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem.

Navigation


Keywords

  • URL 
  • URL | TEXT
  • NAME

Description

There are two keywords for creating hyperlinks: URL and NAME.

The first (URL) results in a hyperlink with the href attribute (where to navigate).

The second (NAME) creates an anchor to which the page can be scrolled (e.g. a 'top' hyperlink that navigates to the top of the page).

Syntax 

[URL:index.htm#top] // only url
==> <a href="index.htm#top">index.htm#top</a>
[URL: click here | index.htm#top]          // short text first
==> <a href="index.htm#top" >click here</a> // then vline '|' and url

// OR
[URL:  index.htm#top |TEXT: click here]    // first url
==> <a href="index.htm#top" >click here</a> // then attr '|TEXT:' and text

[URL: "index.htm#top" |TEXT: click here]   // quoted url (for long and complicated urls)
==> <a href="index.htm#top" >click here</a> // then attr '|TEXT:' and text
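
The URL forms above differ only in where the text and the address sit, which a short sketch can make explicit. This is a hand-written illustration, not the ANTLR-based implementation; the class name UrlSketch is hypothetical, and the sketch assumes that the address itself contains no '|' character.

```csharp
using System;
using System.Net;

// Hypothetical helper that turns one [URL:...] element into an anchor.
public static class UrlSketch
{
    public static string Render(string element)
    {
        const string prefix = "[URL:";
        if (!element.StartsWith(prefix, StringComparison.Ordinal) ||
            !element.EndsWith("]", StringComparison.Ordinal))
            throw new ArgumentException("Not a URL element.", nameof(element));

        // Content between "[URL:" and the closing "]".
        var body = element.Substring(prefix.Length, element.Length - prefix.Length - 1);

        string url;
        string text = null;
        var bar = body.IndexOf('|');
        if (bar < 0)
        {
            url = body.Trim();                       // [URL:address]
        }
        else
        {
            var left = body.Substring(0, bar).Trim();
            var right = body.Substring(bar + 1).Trim();
            if (right.StartsWith("TEXT:", StringComparison.Ordinal))
            {
                url = left;                          // [URL: address |TEXT: text]
                text = right.Substring(5).Trim();
            }
            else
            {
                text = left;                         // [URL: text | address]
                url = right;
            }
        }

        url = url.Trim('"');                         // quoted addresses are allowed
        if (text == null) text = url;                // no text given: show the address

        return "<a href=\"" + WebUtility.HtmlEncode(url) + "\">" +
               WebUtility.HtmlEncode(text) + "</a>";
    }
}
```

With the element from the release notes, UrlSketch.Render("[URL: click here | http://catarsa.com]") produces an anchor whose href is http://catarsa.com and whose text is "click here".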

Example

Wiki 

[URL:http://www.codeproject.com/KB/aspnet/#top]   - only url
[URL:TOP | http://www.codeproject.com/KB/aspnet/#top]  - text first
[URL: http://www.codeproject.com/KB/aspnet/#top  |TEXT: TOP] - TEXT: attribute
[URL:"http://www.codeproject.com/KB/aspnet/#top" |TEXT: TOP] - quoted url

Result

http://www.codeproject.com/KB/aspnet/#top  - only url
TOP - short text first
TOP  - TEXT: attribute
TOP  - quoted url

Image


Keywords

  • IMG
  • HEIGHT
  • WIDTH
  • FLOAT

Description

The IMG element allows you to insert an image into the page.

The URL of the image should be relative, for instance 'i/cs.png'.

There is an issue with applications running in virtual directories, e.g. http://mydomain/myvirtualdir/ instead of http://mydomain/. In these situations a background call changes the IMG src attribute from 'i/cs.png' to 'myvirtualdir/i/cs.png', so the image is displayed correctly.

Syntax

[IMG:i/cs.png]
==> <img src="i/cs.png" />

[IMG:i/cs.png |HEIGHT:100px|WIDTH:80px]
==> <img src="i/cs.png" style="height:100px;width:80px;" />

[IMG:i/cs.png |FLOAT:right]
==> <img src="i/cs.png" style="float:right;" />
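
The attribute handling and the virtual-directory prefix can be sketched as follows. This is only an illustration under stated assumptions: ImageSketch and the virtualDir parameter are hypothetical names, and the real parser performs the src rewrite in a background call whose code is not shown in this article.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical helper that turns one [IMG:...] element into an img tag.
public static class ImageSketch
{
    // virtualDir is an assumed parameter, e.g. "myvirtualdir/"; empty for a root application.
    public static string Render(string element, string virtualDir = "")
    {
        const string prefix = "[IMG:";
        if (!element.StartsWith(prefix, StringComparison.Ordinal) ||
            !element.EndsWith("]", StringComparison.Ordinal))
            throw new ArgumentException("Not an IMG element.", nameof(element));

        // The first part is the relative source, the rest are '|NAME:value' attributes.
        var body = element.Substring(prefix.Length, element.Length - prefix.Length - 1);
        var parts = body.Split('|');
        var src = virtualDir + parts[0].Trim();

        var style = new StringBuilder();
        for (var i = 1; i < parts.Length; i++)
        {
            var pair = parts[i].Split(new[] { ':' }, 2);
            if (pair.Length != 2) continue;                  // ignore malformed attributes

            var name = pair[0].Trim().ToLowerInvariant();    // HEIGHT -> height
            if (name != "height" && name != "width" && name != "float") continue;

            style.Append(name).Append(':').Append(pair[1].Trim()).Append(';');
        }

        var html = "<img src=\"" + WebUtility.HtmlEncode(src) + "\"";
        if (style.Length > 0)
            html += " style=\"" + WebUtility.HtmlEncode(style.ToString()) + "\"";
        return html + " />";
    }
}
```

Passing "myvirtualdir/" as the second argument reproduces the rewrite of 'i/cs.png' to 'myvirtualdir/i/cs.png' described in the text.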

Example

wiki

|P1 Lorem ipsum dolor [IMG:i/cs.png]  sit amet, consectetur adipisicing elit, sed dor eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

|P2 Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. [IMG:i/cs.png|HEIGHT:100px| WIDTH:80px]  Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum

|P3 Sed [IMG:i/cs.png|FLOAT:right]  ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo.

result

Lorem ipsum dolor /i/cs.png  sit amet, consectetur adipisicing elit, sed dor eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. /i/cs.png  Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum

Sed /i/cs.png ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo.

Lists

There are two types of lists: with or without numbers.

Keywords

Block syntax uses the opening and closing tags LI and/or NUM to wrap the selected rows as list items.

  • '[LI'   'LI]'
  • '[NUM'  'NUM]'

Inline syntax renders every row as a list item if it starts with one of the statements below (the space after the asterisk or hash is important):

  • '* '
  • '** '
  • '*** '
  • '# '
  • '## '
  • '### '

Description

Block lists behave as block elements: they have opening and closing tags, and every row between these marks is evaluated as a list item.

Block Syntax

[LI
meals
colors
[NUM
blue
red
NUM]
LI]

Inline Syntax

* meals
* colors
## blue
## red
      

Result for both

<ul>
<li>meals</li>
<li>colors
<ol>
<li>blue</li>
<li>red</li>
</ol></li>
</ul>
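
How the inline rows turn into nested lists can be shown with a small stack-based sketch. It is an illustration only, not the parser's ANTLR grammar. The class name InlineListSketch is hypothetical, and the sketch assumes that rows at one level all use the same sign and that nesting grows one level at a time below an existing item (as in the example above, where '## ' rows follow a '* ' row).

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper that renders inline list rows ('* ', '## ', ...) as nested lists.
public static class InlineListSketch
{
    // One to three '*' or '#' signs, one space, then the item text.
    private static readonly Regex Item = new Regex(@"^(\*{1,3}|#{1,3}) (.*)$");

    public static string Render(IEnumerable<string> rows)
    {
        var html = new StringBuilder();
        var open = new Stack<string>();   // tags of the lists that are currently open

        foreach (var row in rows)
        {
            var match = Item.Match(row);
            if (!match.Success) continue; // the sketch ignores rows that are not list items

            var level = match.Groups[1].Value.Length;
            var tag = match.Groups[1].Value[0] == '*' ? "ul" : "ol";

            // Leaving a deeper level: close its last item and the list itself.
            while (open.Count > level)
                html.Append("</li></").Append(open.Pop()).Append('>');

            // Same level: close the previous sibling item.
            if (open.Count == level) html.Append("</li>");

            // Entering a deeper level: open the list inside the still-open parent item.
            while (open.Count < level)
            {
                html.Append('<').Append(tag).Append('>');
                open.Push(tag);
            }

            html.Append("<li>").Append(WebUtility.HtmlEncode(match.Groups[2].Value));
        }

        // Close whatever is still open at the end of the input.
        while (open.Count > 0)
            html.Append("</li></").Append(open.Pop()).Append('>');

        return html.ToString();
    }
}
```

For the four inline rows of the syntax example, the sketch emits the same element structure as the result above, just without the line breaks between the tags.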
      

Example

wiki

// blocks
[LI
Meals:
Supported colors are:
[NUM
blue
red
green
NUM]
LI]
// inlines
* Meals:
* Supported colors are:
## blue
## red
## green

result

// blocks
• Meals:
• Supported colors are:
  1. blue
  2. red
  3. green
// inlines
• Meals:
• Supported colors are:
  1. blue
  2. red
  3. green

Decorations and Colors

Keywords

Special bold and italic:

  • *bold text*   bold text
  • _italic text_ italic text

Decorations:

  • B     bold
  • I     italic
  • U     underline
  • S     strikethrough
  • V     variable
  • SAMP  sample
  • BIG   big
  • SMALL small
  • SUP   superscript
  • SUB   subscript

Colors:

  • BLUE   blue
  • RED    red
  • GREEN  green
  • OLIVE  olive
  • LIME   lime
  • PURPLE purple
  • YELLOW yellow
  • MAROON maroon
  • SILVER silver
  • ORANGE orange
  • NAVY   navy
  • PINK   pink
Description

Decoration styles and colors are 'inline' elements. This means that they cannot contain new lines!

You can use them almost anywhere, and they can even be nested if needed (e.g. bold and blue).

Syntax

// bold and italic inside a row
.. not bold *now bold until here* not bold ..
==> .. not bold <b>now bold until here</b> not bold ..
.. not italic _now italic until here_ not italic ..
==> .. not italic <i>now italic until here</i> not italic ..
// bold and italic
[B:bold]
==> <b>bold</b>
[I:italic]
==> <i>italic</i>
// COLOR example: blue and purple
[BLUE:blue]
==> <span style="color:blue;">blue</span>
[PURPLE:purple]
==> <span style="color:purple;">purple</span>
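
Nesting of bracketed decorations is naturally recursive, which the following sketch tries to show. It is not the parser's implementation (that one is generated by ANTLR). The class name DecorationSketch is hypothetical, and only a few keywords (B, I, U and some colors) are wired in; the '*bold*' and '_italic_' shortcuts are left out.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical helper that renders nested [KEYWORD:text] decorations.
public static class DecorationSketch
{
    private static readonly Dictionary<string, string> Tags =
        new Dictionary<string, string> { { "B", "b" }, { "I", "i" }, { "U", "u" } };

    private static readonly HashSet<string> Colors =
        new HashSet<string> { "BLUE", "RED", "GREEN", "ORANGE", "PURPLE" };

    public static string Render(string text)
    {
        var pos = 0;
        return Parse(text, ref pos, false);
    }

    // Renders from 'pos'; a nested call stops right after its closing ']'.
    private static string Parse(string s, ref int pos, bool nested)
    {
        var html = new StringBuilder();
        while (pos < s.Length)
        {
            // Copy a run of ordinary characters, HTML-encoded.
            var start = pos;
            while (pos < s.Length && s[pos] != '[' && s[pos] != ']') pos++;
            if (pos > start)
            {
                html.Append(WebUtility.HtmlEncode(s.Substring(start, pos - start)));
                continue;
            }

            if (s[pos] == ']' && nested) { pos++; return html.ToString(); }

            if (s[pos] == '[')
            {
                var colon = s.IndexOf(':', pos);
                var keyword = colon > pos ? s.Substring(pos + 1, colon - pos - 1) : "";
                if (Tags.ContainsKey(keyword) || Colors.Contains(keyword))
                {
                    pos = colon + 1;
                    html.Append(Wrap(keyword, Parse(s, ref pos, true)));
                    continue;
                }
            }

            html.Append(s[pos]); // a bracket that does not open a known element
            pos++;
        }
        return html.ToString();
    }

    private static string Wrap(string keyword, string inner)
    {
        string tag;
        if (Tags.TryGetValue(keyword, out tag))
            return "<" + tag + ">" + inner + "</" + tag + ">";
        return "<span style=\"color:" + keyword.ToLowerInvariant() + ";\">" + inner + "</span>";
    }
}
```

Rendering "[B:bold and even [ORANGE:orange]]" gives a b element that wraps the text and an inner orange span, which is the nesting described above.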
      

Example

wiki

|CITE This text could be partially [B:bold and even [ORANGE:orange]]. And all that inside the big [BIG:CITE] element.

result

This text could be partially bold and even orange. And all that inside the big CITE element.

Block

There are three special block elements for specific text rendering: CODE, XML and HTML. There are also the blockquote element BQ and the preformatted element PRE.

Keywords

  • CODE   C# syntax support
  • XML    xml syntax support
  • HTML   html specific syntax support
  • BQ     support for multi-line blockquote
  • PRE    renders PRE element

Description

These block elements have a special syntax with OPENING and CLOSING tags. The text placed between the CODE, XML or HTML marks is rendered with syntax highlighting.

Using these elements is extremely simple: just copy your code and place it between the [CODE and CODE] marks. That's all.

Syntax

[CODE
public interface IModel
{
// TODO
}
CODE]
==> <code>
<span class="base" >public</span>
<span class="base" >interface</span>
IModel<br />
<span class="smbl" >{</span><br />
<span class="cmmnt" >// TODO</span><br />
<span class="smbl" >}</span><br />
</code>
      

Example

wiki

|H5 CODE
[CODE
public interface IModel
{
// TODO
}
CODE]
|H5 XML
[XML
<project>
  <settings>
    <mode value="Leveled"
       type="Fluent" />
    <readability name="Inner" >
      <!-- TODO -->
      ... extended in the next phase ...
    </readability>
  </settings>
</project>
XML]
|H5 HTML
[HTML
<html xmlns="http://www.w3.org/1999/xhtml" >
  <head>
    <title></title>
  </head>
  <body>
    <a href="Default.aspx" >
      Default</a>
  </body>
</html>
HTML]
      

result

CODE

public interface IModel
{
  // TODO
}

XML

<project>
  <settings>
     <mode value="Leveled"
           type="Fluent" />
     <readability name="Inner" >
       <!-- TODO -->
       extended in the next phase
     </readability>
  </settings>
</project>

HTML

<html xmlns="http://www.w3.org/1999/xhtml" >
<head>
    <title></title>
</head>
<body>
    <a href="Default.aspx" >
        Default</a>
</body>
</html>
      

Escaped character syntax

Some special characters are used by this wiki markup. They are leading symbols and therefore cannot be used directly in the text.

If they are used directly (without escaping), they can result in unwanted output or even break the whole page.

Keywords

  • [
  • ]
  • |
  • :
  • <
  • >
  • *
  • #
  • _
  • "

Description

To render these signs correctly you have to use the escape syntax.

The syntax is simple: enclose the sign in square brackets '[' ']'.

Syntax

[[]
==> [
[]]
==> ]
[:]
==> :
[|]
==> |
[<]
==> <
[>]
==> >
[*]
==> *
[#]
==> #
[_]
==> _
["]
==> "
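
The escape rule is simple enough to sketch in a few lines. This is a hand-written illustration with a hypothetical class name, EscapeSketch; it handles only the bracketed escapes and HTML-encodes the escaped sign, which is an assumption about how the parser keeps signs such as '<' safe.

```csharp
using System.Net;
using System.Text;

// Hypothetical helper that resolves escapes like [[] or [<] in a row of text.
public static class EscapeSketch
{
    // The signs that the article lists as special.
    private const string Special = "[]|:<>*#_\"";

    public static string Unescape(string wiki)
    {
        var html = new StringBuilder();
        for (var i = 0; i < wiki.Length; i++)
        {
            // An escape is '[' + one special sign + ']'.
            var isEscape = wiki[i] == '['
                && i + 2 < wiki.Length
                && wiki[i + 2] == ']'
                && Special.IndexOf(wiki[i + 1]) >= 0;

            if (isEscape)
            {
                // Emit the sign itself, encoded so that '<' becomes '&lt;' and so on.
                html.Append(WebUtility.HtmlEncode(wiki[i + 1].ToString()));
                i += 2; // skip the sign and the closing bracket
            }
            else
            {
                html.Append(wiki[i]);
            }
        }
        return html.ToString();
    }
}
```

Applied to "Index[[]2[]]", the sketch returns "Index[2]".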
      

Example

wiki

Index[[]2[]]

to not display less and greater as the element
[XML
<wrong>
[<]right[>]
XML]

result

Index[2]

<wrong>
<right>
      

Tables

Tables follow the usual syntax based on the VLINE |.

Use two VLINEs || for a table head and a single VLINE | for a table row.

Keywords

  • ||
  • |

Syntax

|| heading1 || heading2 // two VLINEs will render THEAD
|| heading3 || heading4
|  cell1    |  cell2    // single VLINE for a row
|  cell3    |  cell4
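
A rough sketch of how such rows could map to an HTML table follows. The exact markup that the Wiki .NET Parser emits is not shown in this article, so the thead/tbody layout here is an assumption, and TableSketch is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text;

// Hypothetical helper that renders '||' rows as table head and '|' rows as table body.
public static class TableSketch
{
    public static string Render(IEnumerable<string> rows)
    {
        var head = new StringBuilder();
        var body = new StringBuilder();

        foreach (var row in rows)
        {
            if (!row.StartsWith("|", StringComparison.Ordinal)) continue; // not a table row

            var isHead = row.StartsWith("||", StringComparison.Ordinal);
            var separator = isHead ? "||" : "|";
            var tag = isHead ? "th" : "td";
            var target = isHead ? head : body;

            // The row starts with the separator, so the first split part is empty.
            var cells = row.Split(new[] { separator }, StringSplitOptions.None).Skip(1);

            target.Append("<tr>");
            foreach (var cell in cells)
            {
                target.Append('<').Append(tag).Append('>')
                      .Append(WebUtility.HtmlEncode(cell.Trim()))
                      .Append("</").Append(tag).Append('>');
            }
            target.Append("</tr>");
        }

        return "<table><thead>" + head + "</thead><tbody>" + body + "</tbody></table>";
    }
}
```

Feeding the sketch a '||' row followed by a '|' row gives one header row inside thead and one data row inside tbody.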

Example

wiki

|| id  || date     || name
|  123 |  1.2.2011 | myName

result

id date  name
123 1.2.2011 myName

Download 

All the code can be found on http://catarsa.com:

  • Source code of the Wiki .NET Parser: http://catarsa.com/Articles/Code/WikiNetParser
  • Web-application framework Catharsis
  • Engine and tools: http://antlr.org

 

Final Notes

Would you like to get your hands on it? To test it? Download WikiNetParser_source.zip, which provides a tool that immediately shows you the results of your typing.


Enjoy the Wiki .NET Parser

Radim Köhler
