Wiki .NET Parser, C#
1.6.2011 - Wiki .NET Parser II
After more than a year of experience with the Wiki .NET Parser (C#), there is a brand new release. It runs on the latest ANTLR engine (http://antlr.org) and brings new features.
+ Table syntax is finally supported
|| heading1 || heading2
| cell1 | cell2
+ Bold and italic in CodePlex style
new *bold text* and existing [B:bold text]
new _italic text_ and existing [I:italic text]
+ lists in block and inline style (both with nesting)
* inline 1 (new)
* inline 2
[LI (existing)
item 1
item 2
LI]
+ extended URL functionality:
[URL: click here | http://catarsa.com]
(and of course... supporting any previously written Wiki .NET Syntax texts)
Introduction
Wiki .NET Parser (C#) is an open source project that provides a powerful tool for CMS (content management systems). The parser has simple rules, which are documented here and used in action on the Catharsis framework portal http://catarsa.com
Wiki .NET Parser is distributed as open source code (or as a built library) here: http://catarsa.com/Articles/Code/WikiNetParser
Example web-application
28.4.2010 - There is a new download available, the Wiki .NET Parser example: a sample web application that gives you a testing environment for the Wiki .NET Syntax: http://catarsa.com/Articles/Code/Wiki-Example
There is only one page, 'Default.aspx', with three columns:
- The left column shows the result: the HTML rendered in the web browser
- The central column contains the EDITOR (type your test code in there)
- The right column shows the HTML code, so you can see the real output of the Wiki .NET Parser
Wiki .NET Parser engine: ANTLR
The Wiki .NET Parser is built on the ANTLR engine (http://antlr.org). The name means "ANother Tool for Language Recognition". ANTLR is written in Java but is also available for .NET and C#. It is an exceptional tool for any type of text parsing. If you ever need to read, parse or rewrite text, do not forget ANTLR.
Wiki .NET Parser evolution
The Wiki .NET Parser grew out of the day-to-day needs of web applications built on the Catharsis framework.
Whenever we created an application, users demanded "documentation". Part of it was the basic UI description, which was the developers' job. But then came requests to publish the "documentation of the business process".
We suggested publishing Word or PDF documents. That worked, but the documents were complicated to revise and republish, and hard to search.
The next step was conversion to HTML. We started 1) to convert the document files into HTML elements and 2) to get mad and upset whenever a new revision came. The question was how to let our users edit the content themselves:
- with a brutally simple syntax
- with an absolutely safe result (no JS injection)
- for free (if possible)
First, we asked Google for help. We did find some open source parsers, but they were either not serious or far too heavyweight. We needed only one simple method that takes text in Wiki .NET Syntax and returns the HTML:
var result = ProjectBase.Tools.Wiki.WikiProvider.ConvertToHtml(sourceText);
Why ANTLR? NHibernate!
Then, like manna from heaven, NHibernate 2.1 was released with a new required 'core' external library: ANTLR. This tool replaced the previous HQL parsers. Why? Because it is incredibly easy to use. You just define your lexer, grammar and result parser.
We investigated how it works and soon found out that it is not only 'easy' but also supports both worlds: Java and .NET. Since our company uses both technologies, this was a big plus. We were able to create ONE Wiki .NET Syntax with one lexer, one grammar and two different parsers (Java, C#).
Happy user
In our web applications we provide an entity: Article. A user can add, update, delete and also publish it. The syntax is trivial. The resulting code is pure standard HTML. No JS can be injected into the HTML. Nice. And easy.
Wiki .NET Syntax overview
In the next paragraphs you can find a brief description of the Wiki .NET Syntax. If you are interested in how it works:
1) download the source code
2) download the binaries with ONE method, ConvertToHtml()
3) observe the Wiki .NET Parser in action: the http://catarsa.com/ Open Source portal is powered by this tool
4) try the Catharsis framework, which supports and uses the Wiki .NET Parser out of the box
Examples of Wiki .NET Syntax
LastUpdateDate : 3/27/2010 8:46 AM
Wiki .NET Syntax
Description of the Wiki .NET Syntax: how to write headers, text and hyperlinks, append images and zipped files, and much more. Every element is described and provided with sample code. The main intention of the Wiki .NET Syntax was to provide as easy a way to describe text as possible. Once this goal is met, the Wiki .NET Parser can be used in day-to-day practice.
latest downloads: http://catarsa.com/Articles/Download/WikiParser
WIKI.NET
- Simple Text
- Headers and Paragraphs
- Navigation (fixed for any url)
- Image
- Lists (extended)
- Decorations and Colors (extended)
- Block
- Escaped character syntax
- Tables (new)
Notes
This document was assembled with the Wiki .NET Syntax and converted with the Wiki .NET Parser (and therefore could be taken as a proof of concept).
Wiki .NET Parser is based on ANTLR3 for C#/.NET. Its benefits: very few lines of lexer, grammar and AST parser code; future extensibility; easy maintainability; exceptional performance.
Syntax of the Wiki .NET is (from the highlighting point of view) mostly C# and XML oriented.
In the next paragraphs there is a complete description of the syntax with examples. Syntax elements can be split into these categories:
- Blocks (special formatting like XML)
- Inline Blocks (text)
- Decoration (styles, colors)
- Hyperlinks and images
As in other languages, there is a small set of special leading characters used in the Wiki .NET Syntax:
- [ (opening block)
- ] (closing block)
- | (inline block; attribute separator)
- : (attribute value starts)
These signs, combined with keywords, provide a complete Wiki .NET playground for safe text publishing.
Simple Text
Keywords
- (there are NO keywords)
Description
You can type text as it comes. It will be rendered as ordinary text without any formatting.
Newlines (ENTER) are rendered as HTML line breaks ("<br />" instead of "\r\n").
Multiple spaces (more than 1) are all rendered, not skipped. (A standard HTML browser would otherwise collapse multiple spaces into a single one.)
Empty lines are skipped. To append an empty line to the output, place a white space on the line (type a space and press ENTER).
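The rules above can be approximated in a few lines of C#. This is only a sketch under stated assumptions, not the parser's actual code: the class `SimpleTextSketch` and its `Render` method are hypothetical names, and keeping multiple spaces visible by turning every second space into `&nbsp;` is an assumed technique.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical helper that imitates the "Simple Text" rules; not part of the Wiki .NET Parser API.
public static class SimpleTextSketch
{
    public static string Render(string source)
    {
        var rendered = new List<string>();
        foreach (var line in source.Replace("\r\n", "\n").Split('\n'))
        {
            // Truly empty lines are skipped; a line holding a single space is kept.
            if (line.Length == 0) continue;

            // Encode first, so typed markup such as <script> can never reach the page.
            var encoded = WebUtility.HtmlEncode(line);

            // Assumption: each pair of spaces becomes " &nbsp;" so browsers do not collapse it.
            rendered.Add(encoded.Replace("  ", " &nbsp;"));
        }

        // Every remaining newline becomes an HTML line break.
        return string.Join("<br />", rendered);
    }
}
```

For example, `SimpleTextSketch.Render("a\r\n\r\nb")` drops the empty middle line and joins the two remaining lines with a single `<br />`.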
Syntax
There is no special syntax, just type...
Example
wiki
1) Text with the
broken lines
2) Text with spaces
result
1) Text with the
broken lines
2) Text with spaces
Headers and Paragraph
Headers and Paragraphs have a syntax built from 3 parts:
- | Leading vertical line at the beginning
- H1 The name of the used header or paragraph
- ' ' One WHITE SPACE!
Keywords
- |H1 .. |H6 (headers)
- |P (paragraph)
- |P1 .. |P3 (paragraphs with different margins)
- |PC (center aligned paragraph)
- |PR (right aligned paragraph)
- |CITE (cite text (italic))
- |BQ (blockquote text with larger padding)
Description
Headers and Paragraphs allow formatting based on the 'inline block' style.
This means that the text between the opening symbol (e.g. '|P ') and the end of the line (ENTER) is rendered as a paragraph (<p>...</p>).
Syntax
|H1 Header 1 -> <h1>Header 1</h1>
..
|H6 Header 6 -> <h6>Header 6</h6>
|P Usual text... -> <p>Usual text...</p>
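The three-part rule (vertical line, keyword, one white space) maps naturally onto a regular expression. The following is a minimal sketch, assuming a hypothetical `InlineBlockSketch` class. It covers only |H1 .. |H6 and |P, because those are the keywords whose HTML output is shown here; the real parser is generated by ANTLR and does not work this way.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical illustration of the '|KEYWORD text' rule; not the ANTLR-based implementation.
public static class InlineBlockSketch
{
    // '|' + keyword + exactly one space + text up to the end of the line.
    private static readonly Regex Pattern = new Regex(@"^\|(H[1-6]|P) (.*)$");

    public static string RenderLine(string line)
    {
        var match = Pattern.Match(line);

        // Anything that is not a known inline block stays plain (encoded) text.
        if (!match.Success) return WebUtility.HtmlEncode(line);

        var tag = match.Groups[1].Value.ToLowerInvariant(); // "H1" -> "h1", "P" -> "p"
        var text = WebUtility.HtmlEncode(match.Groups[2].Value);
        return "<" + tag + ">" + text + "</" + tag + ">";
    }
}
```

With this sketch, `InlineBlockSketch.RenderLine("|H1 Header 1")` yields an `<h1>` element, while an unknown keyword such as `|H7` falls through as plain text.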
Example
wiki
|H3 Lorem ipsum
|P1 Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed dor eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
|P2 Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum
|CITE Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo.
|BQ Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt.
|PR Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem.
result
Lorem ipsum
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed dor eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum
Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo.
Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt.
Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem.
Navigation
Keywords
- URL
- URL | TEXT
- NAME
Description
There are two keywords for creating hyperlinks: URL and NAME.
The first (URL) results in a hyperlink with the href attribute (where to navigate).
The second (NAME) creates an anchor to which the page can be scrolled (e.g. the 'top' hyperlinks navigating to the top of the page).
Syntax
[URL:index.htm#top] // only url
==> <a href="index.htm#top">index.htm#top</a>
[URL: click here | index.htm#top] // short text first, then vline '|' and url
==> <a href="index.htm#top" >click here</a>
// OR
[URL: index.htm#top |TEXT: click here] // first url, then attr '|TEXT:' and text
==> <a href="index.htm#top" >click here</a>
[URL: "index.htm#top" |TEXT: click here] // quoted url (if the url is too long and complicated), next with an attr 'TEXT:'
==> <a href="index.htm#top" >click here</a>
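To make the URL forms concrete, here is a minimal C# sketch that turns one URL element into an anchor. The class `UrlElementSketch` is a hypothetical name, and the approach (split at the first '|', then check for the TEXT: attribute) is an assumption made for illustration; it does not handle a '|' inside a quoted url, which the real grammar may well support.

```csharp
using System;
using System.Net;

// Hypothetical illustration of the URL element forms; not the parser's actual code.
public static class UrlElementSketch
{
    public static string Render(string element)
    {
        const string open = "[URL:";
        if (!element.StartsWith(open, StringComparison.Ordinal) ||
            !element.EndsWith("]", StringComparison.Ordinal))
        {
            throw new ArgumentException("Not a URL element.", nameof(element));
        }

        // Text between "[URL:" and the closing "]".
        var body = element.Substring(open.Length, element.Length - open.Length - 1);
        var bar = body.IndexOf('|');
        string url, text;

        if (bar < 0)
        {
            // [URL:url] : the url doubles as the link text.
            url = text = body.Trim();
        }
        else
        {
            var left = body.Substring(0, bar).Trim();
            var right = body.Substring(bar + 1).Trim();
            if (right.StartsWith("TEXT:", StringComparison.Ordinal))
            {
                // [URL: url |TEXT: text] : the url comes first and may be quoted.
                url = left.Trim('"');
                text = right.Substring("TEXT:".Length).Trim();
            }
            else
            {
                // [URL: text | url] : short text first.
                text = left;
                url = right;
            }
        }

        return "<a href=\"" + WebUtility.HtmlEncode(url) + "\">" +
               WebUtility.HtmlEncode(text) + "</a>";
    }
}
```

For instance, `UrlElementSketch.Render("[URL: click here | http://catarsa.com]")` produces an anchor whose href is http://catarsa.com and whose text is "click here".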
Example
Wiki
[URL:http://www.codeproject.com/KB/aspnet/#top] - only url
[URL:TOP | http://www.codeproject.com/KB/aspnet/#top] - text first
[URL: http://www.codeproject.com/KB/aspnet/#top |TEXT: TOP] - TEXT: attribute
[URL:"http://www.codeproject.com/KB/aspnet/#top" |TEXT: TOP] - quoted url
Result
http://www.codeproject.com/KB/aspnet/#top - only url
TOP - short text first
TOP - TEXT: attribute
TOP - quoted url
Image
Keywords
- IMG
- HEIGHT
- WIDTH
- FLOAT
Description
The IMG element allows you to insert an image into the page.
The URL for the image should be relative, for instance 'i/cs.png'.
There is an issue with virtual directories for applications, e.g. when the application runs at http://mydomain/myvirtualdir/ instead of http://mydomain/. In these situations a background call changes the IMG src attribute from 'i/cs.png' to 'myvirtualdir/i/cs.png', so the image is displayed correctly.
Syntax
[IMG:i/cs.png]
-> <img src="i/cs.png" />
[IMG:i/cs.png |HEIGHT:100px|WIDTH:80px]
-> <img src="i/cs.png" style="height:100px;width:80px;" />
[IMG:i/cs.png |FLOAT:right]
-> <img src="i/cs.png" style="float:right;" />
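The attribute handling of IMG can be sketched as follows. `ImgElementSketch` is a hypothetical name, and the splitting strategy ('|' separates attributes, the first ':' separates an attribute name from its value) is an assumption derived from the special characters described earlier. The virtual directory correction is deliberately left out.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical illustration of the IMG element and its attributes; not the parser's actual code.
public static class ImgElementSketch
{
    public static string Render(string element)
    {
        const string open = "[IMG:";
        if (!element.StartsWith(open, StringComparison.Ordinal) ||
            !element.EndsWith("]", StringComparison.Ordinal))
        {
            throw new ArgumentException("Not an IMG element.", nameof(element));
        }

        // parts[0] is the image url, the rest are NAME:value attributes.
        var parts = element.Substring(open.Length, element.Length - open.Length - 1).Split('|');
        var style = new StringBuilder();
        for (var i = 1; i < parts.Length; i++)
        {
            var pair = parts[i].Split(new[] { ':' }, 2);
            if (pair.Length != 2) continue;

            var name = pair[0].Trim().ToUpperInvariant();
            if (name == "HEIGHT" || name == "WIDTH" || name == "FLOAT")
            {
                // Each supported attribute becomes one CSS declaration.
                style.Append(name.ToLowerInvariant()).Append(':').Append(pair[1].Trim()).Append(';');
            }
        }

        var html = "<img src=\"" + WebUtility.HtmlEncode(parts[0].Trim()) + "\"";
        if (style.Length > 0)
        {
            html += " style=\"" + WebUtility.HtmlEncode(style.ToString()) + "\"";
        }
        return html + " />";
    }
}
```

Unknown attribute names are silently ignored in this sketch, which keeps the output limited to the three style properties listed in the keywords.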
Example
wiki
|P1 Lorem ipsum dolor [IMG:i/cs.png] sit amet, consectetur adipisicing elit, sed dor eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
|P2 Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. [IMG:i/cs.png|HEIGHT:100px| WIDTH:80px] Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum
|P3 Sed [IMG:i/cs.png|FLOAT:right] ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo.
result
Lorem ipsum dolor
sit amet, consectetur adipisicing elit, sed dor eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum
Sed
ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo.
Lists
There are two types of lists: "With" or "Without" numbers.
Keywords
Block syntax uses the opening and closing tags LI and/or NUM to wrap selected rows as list items.
- '[LI' 'LI]'
- '[NUM' 'NUM]'
Inline syntax renders every row as a list item if it starts with one of the markers below (the space after the asterisk or hash is important)
- '* '
- '** '
- '*** '
- '# '
- '## '
- '### '
Description
The lists behave as the block elements. It means, that they have opening and closing tags. Every row between these marks is evaluated as the list item.
Block Syntax
[LI
meals
colors
[NUM
blue
red
NUM]
LI]
Inline Syntax
* meals
* colors
## blue
## red
Result for both
<ul>
<li>meals</li>
<li>colors
<ol>
<li>blue</li>
<li>red</li>
</ol></li>
</ul>
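How inline markers turn into nested lists can be illustrated with a small stack-based routine. This is a sketch under assumptions, not the parser's implementation: `InlineListSketch` is a hypothetical name, the marker length is taken as the nesting depth, and the marker character ('*' or '#') selects `<ul>` or `<ol>`.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical illustration of inline list nesting; not the ANTLR-based implementation.
public static class InlineListSketch
{
    // One or more '*' (or '#'), then the mandatory space, then the item text.
    private static readonly Regex Item = new Regex(@"^([*]+|#+) (.*)$");

    public static string Render(IEnumerable<string> lines)
    {
        var html = new StringBuilder();
        var open = new Stack<string>(); // tags of the lists that are currently open

        foreach (var line in lines)
        {
            var match = Item.Match(line);
            if (!match.Success) continue; // non-list rows are ignored in this sketch

            var marker = match.Groups[1].Value;
            var depth = marker.Length;
            var tag = marker[0] == '*' ? "ul" : "ol";

            // Close lists that are deeper than the new item.
            while (open.Count > depth) html.Append("</li></").Append(open.Pop()).Append('>');

            if (open.Count == depth)
            {
                // Same depth: close the previous item, and the whole list if its type differs.
                if (open.Peek() == tag) html.Append("</li>");
                else html.Append("</li></").Append(open.Pop()).Append('>');
            }

            // Open lists until the requested depth is reached.
            var first = true;
            while (open.Count < depth)
            {
                if (!first) html.Append("<li>"); // wrapper item when a level was jumped over
                html.Append('<').Append(tag).Append('>');
                open.Push(tag);
                first = false;
            }

            // The item stays open so that a nested list can be placed inside it.
            html.Append("<li>").Append(WebUtility.HtmlEncode(match.Groups[2].Value));
        }

        while (open.Count > 0) html.Append("</li></").Append(open.Pop()).Append('>');
        return html.ToString();
    }
}
```

Feeding it the four inline rows `* meals`, `* colors`, `## blue`, `## red` gives the same element structure as the result shown for both syntaxes, just without whitespace between the tags.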
Example
wiki
// blocks
[LI
Meals:
Supported colors are:
[NUM
blue
red
green
NUM]
LI]
// inlines
* Meals:
* Supported colors are:
## blue
## red
## green
result
// blocks
- Meals:
- Supported colors are:
- blue
- red
- green
- Meals:
- Supported colors are:
- blue
- red
- green
Decorations and Colors
Keywords
Special bold and italic:
- *bold text* bold text
- _italic text_ italic text
Decorations:
- B bold
- I italic
- U underline
- S striked
- V variable
- SAMP sample
- BIG big
- SMALL small
- SUP sup *sup
- SUB sub *sub
Colors:
- BLUE blue
- RED red
- GREEN green
- OLIVE olive
- LIME lime
- PURPLE purple
- YELLOW yellow
- MAROON maroon
- SILVER silver
- ORANGE orange
- NAVY navy
- PINK pink
Description
Decoration styles and colors are 'inline' elements. This means that these elements cannot contain new lines!
You can use them almost anywhere and if needed they can be even nested (e.g. bold and blue)
Syntax
// bold and italic inside some row
.. not bold *now bold until here* not bold ..
==> .. not bold <b>now bold until here</b> not bold ..
.. not italic _now italic until here_ not italic ..
==> .. not italic <i>now italic until here</i> not italic ..
// bold and italic
[B:bold] ==> <b>bold</b>
[I:italic] ==> <i>italic</i>
// COLOR example: blue and purple
[BLUE:blue] -> <span style="color:blue;" >blue</span>
[PURPLE:purple] -> <span style="color:purple;" >purple</span>
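Because decorations can be nested, a recursive routine is a natural way to picture them. The sketch below is an assumption-laden illustration, not the generated ANTLR parser: `DecorationSketch` is a hypothetical name, only B, I and the color keywords are handled, and the color output follows the `<span style="color:...;" >` form shown for BLUE and PURPLE. The *bold* and _italic_ shortcuts and the escape syntax are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical illustration of nested [KEYWORD:text] decorations; not the parser's actual code.
public static class DecorationSketch
{
    private static readonly HashSet<string> Colors = new HashSet<string>
    {
        "BLUE", "RED", "GREEN", "OLIVE", "LIME", "PURPLE",
        "YELLOW", "MAROON", "SILVER", "ORANGE", "NAVY", "PINK"
    };

    public static string Render(string text)
    {
        var position = 0;
        return Parse(text, ref position, false);
    }

    private static string Parse(string text, ref int position, bool nested)
    {
        var html = new StringBuilder();
        var plain = new StringBuilder(); // literal text waiting to be encoded

        while (position < text.Length)
        {
            var c = text[position];
            if (c == ']' && nested) { position++; break; } // end of the current element

            if (c == '[')
            {
                var colon = text.IndexOf(':', position);
                var keyword = colon < 0 ? "" : text.Substring(position + 1, colon - position - 1);
                if (keyword == "B" || keyword == "I" || Colors.Contains(keyword))
                {
                    html.Append(WebUtility.HtmlEncode(plain.ToString()));
                    plain.Clear();
                    position = colon + 1;
                    // Recurse: the content may itself contain decorations.
                    html.Append(Wrap(keyword, Parse(text, ref position, true)));
                    continue;
                }
            }

            plain.Append(c);
            position++;
        }

        html.Append(WebUtility.HtmlEncode(plain.ToString()));
        return html.ToString();
    }

    private static string Wrap(string keyword, string inner)
    {
        if (keyword == "B") return "<b>" + inner + "</b>";
        if (keyword == "I") return "<i>" + inner + "</i>";
        return "<span style=\"color:" + keyword.ToLowerInvariant() + ";\" >" + inner + "</span>";
    }
}
```

A call such as `DecorationSketch.Render("[B:bold and [ORANGE:orange]]")` places the color span inside the `<b>` element, which is the nesting behaviour described above.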
Example
wiki
|CITE This text could be partially [B:bold and even [ORANGE:orange]]. And that all inside the big [BIG:CITE] element.
result
This text could be partially bold and even orange. And that all inside the big CITE element.
Block
There are 3 special block elements for specific text rendering: CODE, XML, HTML. There is also the BlockQuote element: BQ.
Keywords
- CODE C# syntax support
- XML xml syntax support
- HTML html specific syntax support
- BQ support for multi-line blockquote
- PRE renders PRE element
Description
These block elements have a special syntax with OPENING and CLOSING tags. The text placed between the CODE, XML or HTML marks is rendered with special syntax support.
Using these elements is extremely simple: just copy your code and place it between the [CODE and CODE] marks. That's all.
Syntax
[CODE
public interface IModel
{
// TODO
}
CODE]
->
<code>
<span class="base" >public</span> <span class="base" >interface</span> IModel<br />
<span class="smbl" >{</span><br />
<span class="cmmnt" >// TODO</span><br />
<span class="smbl" >}</span><br />
</code>
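The span classes in this output ('base' for keywords, 'smbl' for symbols, 'cmmnt' for comments) suggest a simple token-based highlighter. The following is a rough sketch of that idea, assuming a hypothetical `CodeBlockSketch` class and a tiny illustrative keyword set; the real parser derives its highlighting from the ANTLR grammar and surely knows more tokens.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical illustration of the CODE block highlighting; not the parser's actual code.
public static class CodeBlockSketch
{
    // Assumption: a very small subset of C# keywords, for illustration only.
    private static readonly HashSet<string> Keywords = new HashSet<string>
    {
        "public", "private", "interface", "class", "static", "void", "return"
    };

    // A line comment, an identifier-like word, or a brace.
    private static readonly Regex Token =
        new Regex(@"(?<cmmnt>//.*)|(?<word>[A-Za-z_]\w*)|(?<smbl>[{}])");

    public static string RenderLine(string line)
    {
        var html = new StringBuilder();
        var last = 0;

        foreach (Match match in Token.Matches(line))
        {
            // Text between tokens (spaces, punctuation) is copied as encoded text.
            html.Append(WebUtility.HtmlEncode(line.Substring(last, match.Index - last)));

            var value = WebUtility.HtmlEncode(match.Value);
            if (match.Groups["cmmnt"].Success) html.Append(Span("cmmnt", value));
            else if (match.Groups["smbl"].Success) html.Append(Span("smbl", value));
            else if (Keywords.Contains(match.Value)) html.Append(Span("base", value));
            else html.Append(value);

            last = match.Index + match.Length;
        }

        html.Append(WebUtility.HtmlEncode(line.Substring(last)));
        return html.Append("<br />").ToString(); // every code line ends with a line break
    }

    private static string Span(string cssClass, string encodedText)
    {
        return "<span class=\"" + cssClass + "\" >" + encodedText + "</span>";
    }
}
```

Calling `CodeBlockSketch.RenderLine` once per line of the block and wrapping the concatenated result in `<code>...</code>` would reproduce the shape of the output above.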
Example
wiki
|H5 CODE
[CODE
public interface IModel
{
  // TODO
}
CODE]
|H5 XML
[XML
<project>
  <settings>
    <mode value="Leveled" type="Fluent" />
    <readability name="Inner" >
      <!-- TODO -->
      ... extended in the next phase ...
    </readability>
  </settings>
</project>
XML]
|H5 HTML
[HTML
<html xmlns="http://www.w3.org/1999/xhtml" >
  <head>
    <title></title>
  </head>
  <body>
    <a href="Default.aspx" > Default</a>
  </body>
</html>
HTML]
result
CODE
public interface IModel
{
  // TODO
}
XML
<project>
  <settings>
    <mode value="Leveled"
      type="Fluent" />
    <readability name="Inner" >
      <!-- TODO -->
      extended in the next phase
    </readability>
  </settings>
</project>
HTML
<html xmlns="http://www.w3.org/1999/xhtml" >
  <head>
    <title></title>
  </head>
  <body>
    <a href="Default.aspx" >
    Default</a>
  </body>
</html>
Escaped character syntax
There are some special characters used by this wiki markup. They are leading symbols and therefore cannot be used directly in the text.
If they are used directly (without escaping), they can produce unwanted output or even break the whole page.
Keywords
- [
- ]
- |
- :
- <
- >
- *
- #
- _
- "
Description
To render these signs correctly you have to use the escape syntax.
The syntax is simple: enclose the sign in square brackets '[' ']'.
Syntax
[[] ==> [
[]] ==> ]
[:] ==> :
[|] ==> |
[<] ==> <
[>] ==> >
[*] ==> *
[#] ==> #
[_] ==> _
["] ==> "
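When text comes from another source (for example pasted by a user), the escaping can be applied automatically. A minimal sketch, assuming a hypothetical helper `EscapeSketch.Escape` that is not part of the Wiki .NET Parser API:

```csharp
using System;
using System.Text;

// Hypothetical helper that applies the escape syntax to plain text.
public static class EscapeSketch
{
    // The special characters listed in the keywords above.
    private const string Special = "[]|:<>*#_\"";

    public static string Escape(string text)
    {
        var result = new StringBuilder();
        foreach (var c in text)
        {
            if (Special.IndexOf(c) >= 0)
            {
                // Enclose the special sign in square brackets.
                result.Append('[').Append(c).Append(']');
            }
            else
            {
                result.Append(c);
            }
        }
        return result.ToString();
    }
}
```

For example, `EscapeSketch.Escape("Index[2]")` returns `Index[[]2[]]`, the same form used in the example of this section.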
Example
wiki
Index[[]2[]]

to not display less and greater as the element
[XML
<wrong>
[<]right[>]
XML]
result
Index[2]
<wrong>
<right>
Tables
Tables have been introduced using the familiar VLINE | syntax.
Use two VLINEs || for a table head and a single VLINE | for a table row.
Keywords
- ||
- |
Syntax
|| heading1 || heading2 // two VLINEs will render THEAD
|| heading3 || heading4
| cell1 | cell2 // single VLINE for a row
| cell3 | cell4
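The mapping from VLINE rows to HTML can be sketched like this. The class `TableSketch` is a hypothetical name, and the exact element structure (thead/tbody with th and td cells) is an assumption; the text here only states that two VLINEs render THEAD. The sketch also assumes the rows are already known to be table rows, so it does not try to tell them apart from inline blocks such as '|P '.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical illustration of the table syntax; the real output markup may differ.
public static class TableSketch
{
    public static string Render(IEnumerable<string> lines)
    {
        var head = new StringBuilder();
        var body = new StringBuilder();

        foreach (var line in lines)
        {
            var isHead = line.StartsWith("||", StringComparison.Ordinal);
            if (!isHead && !line.StartsWith("|", StringComparison.Ordinal)) continue;

            var separator = isHead ? "||" : "|";
            var cellTag = isHead ? "th" : "td";
            var target = isHead ? head : body;

            target.Append("<tr>");
            // The leading separator yields an empty first entry, which is removed.
            foreach (var cell in line.Split(new[] { separator }, StringSplitOptions.RemoveEmptyEntries))
            {
                target.Append('<').Append(cellTag).Append('>')
                      .Append(WebUtility.HtmlEncode(cell.Trim()))
                      .Append("</").Append(cellTag).Append('>');
            }
            target.Append("</tr>");
        }

        return "<table><thead>" + head + "</thead><tbody>" + body + "</tbody></table>";
    }
}
```

Passing the two rows `|| id || date || name` and `| 123 | 1.2.2011 | myName` gives one header row with three `th` cells and one body row with three `td` cells.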
Example
wiki
|| id || date || name
| 123 | 1.2.2011 | myName
result
id | date | name |
123 | 1.2.2011 | myName |
Download
You can find all the code on http://catarsa.com:
- Source code of the Wiki .NET Parser (http://catarsa.com/Articles/Code/WikiNetParser)
- Web-application framework Catharsis
- The engine and tools: http://antlr.org
Final Notes
Would you like to get your hands on it? To test it? Download WikiNetParser_source.zip, which provides a tool that immediately shows you the results of your typing...
Enjoy the Wiki .NET Parser