commit bdcd08b467e5774e92f34252d6caee8db370d9c5 Author: Michel Fortin Date: Tue Aug 14 16:11:03 2007 -0400 Base for Object-Oriented PHP Markdown diff --git a/License.text b/License.text new file mode 100644 index 0000000..e18fc09 --- /dev/null +++ b/License.text @@ -0,0 +1,34 @@ +Copyright (c) 2004-2005, John Gruber + +All rights reserved. + +Copyright (c) 2004-2006, Michel Fortin + +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + +* Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + +* Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + +* Neither the name "Markdown" nor the names of its contributors may + be used to endorse or promote products derived from this software + without specific prior written permission. + +This software is provided by the copyright holders and contributors "as +is" and any express or implied warranties, including, but not limited +to, the implied warranties of merchantability and fitness for a +particular purpose are disclaimed. In no event shall the copyright owner +or contributors be liable for any direct, indirect, incidental, special, +exemplary, or consequential damages (including, but not limited to, +procurement of substitute goods or services; loss of use, data, or +profits; or business interruption) however caused and on any theory of +liability, whether in contract, strict liability, or tort (including +negligence or otherwise) arising in any way out of the use of this +software, even if advised of the possibility of such damage. 
diff --git a/PHP Markdown Readme.text b/PHP Markdown Readme.text new file mode 100644 index 0000000..101dff1 --- /dev/null +++ b/PHP Markdown Readme.text @@ -0,0 +1,478 @@ +PHP Markdown +============ + +Version 1.0.1oo - Fri 19 May 2006 + +by Michel Fortin + + +based on work by John Gruber + + + +Introduction +------------ + +Markdown is a text-to-HTML conversion tool for web writers. Markdown +allows you to write using an easy-to-read, easy-to-write plain text +format, then convert it to structurally valid XHTML (or HTML). + +"Markdown" is two things: a plain text markup syntax, and a software +tool, written in Perl, that converts the plain text markup to HTML. +PHP Markdown is a port to PHP of the original Markdown program by +John Gruber. + +PHP Markdown can work as a plug-in for WordPress and bBlog, as a +modifier for the Smarty templating engine, or as a remplacement for +textile formatting in any software that support textile. + +Full documentation of Markdown's syntax is available on John's +Markdown page: + + +Installation and Requirement +---------------------------- + +PHP Markdown requires PHP version 4.0.5 or later. + + +### WordPress ### + +PHP Markdown works with [WordPress][wp], version 1.2 or later. +PHP Markdown is already bundled with WordPress. Still, you can find +here the latest version that may be newer than the latest WordPress +version. + + [wp]: http://wordpress.org/ + +1. To use PHP Markdown with WordPress, place the "makrdown.php" file + in the "plugins" folder. This folder is located inside + "wp-content" at the root of your site: + + (site home)/wp-content/plugins/ + +2. Activate the plugin with the administrative interface of + WordPress. In the "Plugins" section you will now find Markdown. + To activate the plugin, click on the "Activate" button on the + same line than Markdown. Your entries will now be formatted by + PHP Markdown. + +You can configure PHP Markdown to not apply to the comments on your +WordPress weblog. 
See the "Configuration" section below. + +Note: It is not possible at this time to apply a different set of +filters to different entries. All your entries will be formated by +PHP Markdown. This is currently a limitation of WordPress. If your old +entries are written in HTML (as opposed to another formatting syntax), +your site should not suffer much from installing PHP Markdown. + + +### bBlog ### + +PHP Markdown also works with the latest version of [bBlog][bb]. + + [bb]: http://www.bblog.com/ + +1. To use PHP Markdown with bBlog, rename "markdown.php" to + "modifier.markdown.php" and place the file in the "bBlog_plugins" + folder. This folder is located inside the "bblog" directory of + your site, like this: + + (site home)/bblog/bBlog_plugins/modifier.markdown.php + +2. Select "Markdown" as the "Entry Modifier" when you post a new + entry. This setting will only apply to the entry you are editing. + + +### Replacing Textile ### + +Many web programs written in PHP use [Textile][tx] to format your text. +To use PHP Markdown with these programs without having to change the +code, you can use PHP Markdown in "Textile Compatibility Mode." + + [tx]: http://www.textism.com/tools/textile/ + +1. Rename the "markdown.php" file to "classTextile.php". + +2. Locate the "classTextile.php" file hidden somewhere inside the + installation of your program (see table below). Replace it with + the PHP Markdown file you just renamed. + +As an helper, here you can learn where is the "classTextile.php" file +in some web programs: + + Program Location + ---------------------------------------------------------------- + TextPattern (site home)/textpattern/lib/classTextile.php + Pivot (site home)/pivot/includes/textile/classtextile.php + +Contrary to Textile, Markdown does not convert quotes to curly ones +and does not convert multiple hyphens (`--` and `---`) into en- and +em-dashes. 
If you use PHP Markdown in Textile Compatibility Mode, you +can solve this problem by installing the "smartypants.php" file from +[PHP SmartyPants][psp] beside the "classTextile.php" file. The Textile +Compatibility Mode function will use SmartyPants automatically without +further modification. + + [psp]: http://www.michelf.com/projects/php-smartypants/ + + +### In Your Own Programs ### + +You can use PHP Markdown easily in your current PHP program. Simply +include the file and then call the Markdown function on the text you +want to convert: + + include_once "markdown.php"; + $my_html = Markdown($my_text); + +If you wish to use PHP Markdown with another text filter function +built to parse HTML, you should filter the text *after* the Markdown +function call. This is an example with [PHP SmartyPants][psp]: + + $my_html = SmartyPants(Markdown($my_text)); + + +### With Smarty ### + +If your program use the [Smarty][sm] template engine, PHP Markdown +can now be used as a modifier for your templates. Rename "markdown.php" +to "modifier.markdown.php" and put it in your smarty plugins folder. + + [sm]: http://smarty.php.net/ + +If you are using MovableType 3.1 or later, the Smarty plugin folder is +located at `(MT CGI root)/php/extlib/smarty/plugins`. This will allow +Markdown to work on dynamic pages. + + +Configuration +------------- + +By default, PHP Markdown produces XHTML output for tags with empty +elements. E.g.: + +
<br /> + +Markdown can be configured to produce HTML-style tags; e.g.: + + <br>
+ +To do this, you must edit the "$md_empty_element_suffix" variable +below the "Global default settings" header at the start of the +"markdown.php" file. + + +### WordPress-Specific Settings ### + +By default, the Markdown plugin applies to both posts and comments on +your WordPress weblog. To deactivate one or the other, edit the +`$md_wp_posts` or `$md_wp_comments` variable under the "WordPress +settings" header at the start of the "markdown.php" file. + + +Bugs +---- + +To file bug reports please send email to: + + +Please include with your report: (1) the example input; (2) the output you +expected; (3) the output PHP Markdown actually produced. + + +Version History +--------------- + +1.0.1oo (19 May 2006) + +* Converted PHP Markdown to a object-oriented design. + + +1.0.1c (9 Dec 2005) + +* Fixed a problem occurring with PHP 5.1.1 due to a small + change to strings variable replacement behaviour in + this version. + + +1.0.1b (6 Jun 2005) + +* Fixed a bug where an inline image followed by a reference link would + give a completely wrong result. + +* Fix for escaped backticks still triggering code spans: + + There are two raw backticks here: \` and here: \`, not a code span + +* Fix for an ordered list following an unordered list, and the + reverse. There is now a loop in _DoList that does the two + separately. + +* Fix for nested sub-lists in list-paragraph mode. Previously we got + a spurious extra level of `

` tags for something like this: + + * this + + * sub + + that + +* Fixed some incorrect behaviour with emphasis. This will now work + as it should: + + *test **thing*** + **test *thing*** + ***thing* test** + ***thing** test* + + Name: __________ + Address: _______ + +* Correct a small bug in `_TokenizeHTML` where a Doctype declaration + was not seen as HTML. + +* Major rewrite of the WordPress integration code that should + correct many problems by preventing default WordPress filters from + tampering with Markdown-formatted text. More details here: + + +* Added a configuration variable for WordPress that can disable the + Markdown filter on comments. + + +1.0.1a (15 Apr 2005) + +* Fixed an issue where PHP warnings were trigged when converting + text with list items running on PHP 4.0.6. This was comming from + the `rtrim` function which did not support the second argument + prior version 4.1. Replaced by a regular expression. + +* Markdown now filter correctly post excerpts and comment + excerpts in WordPress. + +* Automatic links and some code sample were "corrected" by + the balenceTag filter in WordPress meant to ensure HTML + is well formed. This new version of PHP Markdown postpone this + filter so that it runs after Markdown. + +* Blockquote syntax and some code sample were stripped by + a new WordPress 1.5 filter meant to remove unwanted HTML + in comments. This new version of PHP Markdown postpone this + filter so that it runs after Markdown. + + +1.0.2b1 (5 Mar 2005) + +* Fix for backticks within HTML tag: + + like this + +* Fix for escaped backticks still triggering code spans: + + There are two raw backticks here: \` and here: \`, not a code span + +* Improved integration with WordPress. With WordPress 1.5, the + balenceTags filter now runs after Markdown, so it won't + interfere anymore. You can still disable balanceTags from the admin + interface (in Options > Writing) if you want to. 
+ +* PHP Markdown now correctly filter text for excerpts in WordPress. + There is still one glitch: autolinks and tags in code samples are + stripped by WordPress when trimming it. A fix for this is possible + with WordPress 1.5, but would require duplicating WordPress entry + trimming code within Markdown, which I can't do because of a license + issue. (Nor do I think it is a good solution to fix this.) + +* Improved Textile compatibility mode. Markdown will now honor the + no-image and the lite parameters. In lite mode, no header, blockquote, + list, or code block will be made, and inline HTML is limited + to the following tags: + + + + This is acheived by backslash-escaping block markers before sending + text through the Markdown filter. + + The improved Textile comatibility means that the Markdown syntax will now + be processed for comments in TextPattern (only for span elements due to + TextPattern using the lite mode for comments). Sadly, due to TextPattern + tag stripping, sample code in code span and auto-links will be stripped + before the Markdown filter can see them. So I guess I should say it + half-work for comments TextPattern. + + +1.0.1 (16 Dec 2004): + +* Changed the syntax rules for code blocks and spans. Previously, + backslash escapes for special Markdown characters were processed + everywhere other than within inline HTML tags. Now, the contents of + code blocks and spans are no longer processed for backslash escapes. + This means that code blocks and spans are now treated literally, + with no special rules to worry about regarding backslashes. + + **IMPORTANT**: This breaks the syntax from all previous versions of + Markdown. Code blocks and spans involving backslash characters will + now generate different output than before. + + Implementation-wise, this change was made by moving the call to + `_EscapeSpecialChars()` from the top-level `Markdown()` function to + within `_RunSpanGamut()`. 
+ +* Significants performance improvement in `_DoHeader`, `_Detab` + and `_TokenizeHTML`. + +* Added `>`, `+`, and `-` to the list of backslash-escapable + characters. These should have been done when these characters + were added as unordered list item markers. + +* Inline links using `<` and `>` URL delimiters weren't working: + + like [this]() + + Fixed by moving `_DoAutoLinks()` after `_DoAnchors()` in + `_RunSpanGamut()`. + +* Fixed bug where auto-links were being processed within code spans: + + like this: `` + + Fixed by moving `_DoAutoLinks()` from `_RunBlockGamut()` to + `_RunSpanGamut()`. + +* Sort-of fixed a bug where lines in the middle of hard-wrapped + paragraphs, which lines look like the start of a list item, + would accidentally trigger the creation of a list. E.g. a + paragraph that looked like this: + + I recommend upgrading to version + 8. Oops, now this line is treated + as a sub-list. + + This is fixed for top-level lists, but it can still happen for + sub-lists. E.g., the following list item will not be parsed + properly: + + * I recommend upgrading to version + 8. Oops, now this line is treated + as a sub-list. + + Given Markdown's list-creation rules, I'm not sure this can + be fixed. + +* Fix for horizontal rules preceded by 2 or 3 spaces or followed by + trailing spaces and tabs. + +* Standalone HTML comments are now handled; previously, they'd get + wrapped in a spurious `

<p>` tag. + +* `_HashHTMLBlocks()` now tolerates trailing spaces and tabs following + HTML comments and `<hr/>` tags. + +* Changed special case pattern for hashing `<hr>
` tags in + `_HashHTMLBlocks()` so that they must occur within three spaces + of left margin. (With 4 spaces or a tab, they should be + code blocks, but weren't before this fix.) + +* Auto-linked email address can now optionally contain + a 'mailto:' protocol. I.e. these are equivalent: + + + + +* Fixed annoying bug where nested lists would wind up with + spurious (and invalid) `

` tags. + +* Changed `_StripLinkDefinitions()` so that link definitions must + occur within three spaces of the left margin. Thus if you indent + a link definition by four spaces or a tab, it will now be a code + block. + +* You can now write empty links: + + [like this]() + + and they'll be turned into anchor tags with empty href attributes. + This should have worked before, but didn't. + +* `***this***` and `___this___` are now turned into + + this + + Instead of + + this + + which isn't valid. + +* Fixed problem for links defined with urls that include parens, e.g.: + + [1]: http://sources.wikipedia.org/wiki/Middle_East_Policy_(Chomsky) + + "Chomsky" was being erroneously treated as the URL's title. + +* Double quotes in the title of an inline link used to give strange + results (incorrectly made entities). Fixed. + +* Tabs are now correctly changed into spaces. Previously, only + the first tab was converted. In code blocks, the second one was too, + but was not always correctly aligned. + +* Fixed a bug where a tab character inserted after a quote on the same + line could add a slash before the quotes. + + This is "before" [tab] and "after" a tab. + + Previously gave this result: + +

<p>This is \"before\" [tab] and "after" a tab.</p>

+ +* Removed a call to `htmlentities`. This fixes a bug where multibyte + characters present in the title of a link reference could lead to + invalid utf-8 characters. + +* Changed a regular expression in `_TokenizeHTML` that could lead to + a segmentation fault with PHP 4.3.8 on Linux. + +* Fixed some notices that could show up if PHP error reporting + E_NOTICE flag was set. + + +Copyright and License +--------------------- + +Copyright (c) 2004-2006 Michel Fortin + +All rights reserved. + +Based on Markdown +Copyright (c) 2003-2005 John Gruber + +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + +* Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + +* Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + +* Neither the name "Markdown" nor the names of its contributors may + be used to endorse or promote products derived from this software + without specific prior written permission. + +This software is provided by the copyright holders and contributors "as +is" and any express or implied warranties, including, but not limited +to, the implied warranties of merchantability and fitness for a +particular purpose are disclaimed. 
In no event shall the copyright owner +or contributors be liable for any direct, indirect, incidental, special, +exemplary, or consequential damages (including, but not limited to, +procurement of substitute goods or services; loss of use, data, or +profits; or business interruption) however caused and on any theory of +liability, whether in contract, strict liability, or tort (including +negligence or otherwise) arising in any way out of the use of this +software, even if advised of the possibility of such damage. diff --git a/markdown.php b/markdown.php new file mode 100644 index 0000000..426e7fc --- /dev/null +++ b/markdown.php @@ -0,0 +1,1412 @@ + +# +# Original Markdown +# Copyright (c) 2004-2005 John Gruber +# +# + + +define( 'MARKDOWN_VERSION', "1.0.1oo" ); # Fri 19 May 2006 + + +# +# Global default settings: +# + +# Change to ">" for HTML output +define( 'MARKDOWN_EMPTY_ELEMENT_SUFFIX', " />"); + +# Define the width of a tab for code blocks. +define( 'MARKDOWN_TAB_WIDTH', 4 ); + + +# +# WordPress settings: +# + +# Change to false to remove Markdown from posts and/or comments. +define( 'MARKDOWN_WP_POSTS', true ); +define( 'MARKDOWN_WP_COMMENTS', true ); + + + +### Standard Function Interface ### + +define( 'MARKDOWN_PARSER_CLASS', 'Markdown_Parser' ); + +function Markdown($text) { +# +# Initialize the parser and return the result of its transform method. +# + # Setup static parser variable. + static $parser; + if (!isset($parser)) { + $parser_class = MARKDOWN_PARSER_CLASS; + $parser = new $parser_class; + } + + # Transform text using parser. + return $parser->transform($text); +} + + +### WordPress Plugin Interface ### + +/* +Plugin Name: Markdown +Plugin URI: http://www.michelf.com/projects/php-markdown/ +Description:
Markdown syntax allows you to write using an easy-to-read, easy-to-write plain text format. Based on the original Perl version by John Gruber. More... +Version: 1.0.1oo +Author: Michel Fortin +Author URI: http://www.michelf.com/ +*/ + +if (isset($wp_version)) { + # More details about how it works here: + # + + # Post content and excerpts + # - Remove WordPress paragraph generator. + # - Run Markdown on excerpt, then remove all tags. + # - Add paragraph tag around the excerpt, but remove it for the excerpt rss. + if (MARKDOWN_WP_POSTS) { + remove_filter('the_content', 'wpautop'); + remove_filter('the_excerpt', 'wpautop'); + add_filter('the_content', 'Markdown', 6); + add_filter('get_the_excerpt', 'Markdown', 6); + add_filter('get_the_excerpt', 'trim', 7); + add_filter('the_excerpt', 'mdwp_add_p'); + add_filter('the_excerpt_rss', 'mdwp_strip_p'); + + remove_filter('content_save_pre', 'balanceTags', 50); + remove_filter('excerpt_save_pre', 'balanceTags', 50); + add_filter('the_content', 'balanceTags', 50); + add_filter('get_the_excerpt', 'balanceTags', 9); + } + + # Comments + # - Remove WordPress paragraph generator. + # - Remove WordPress auto-link generator. + # - Scramble important tags before passing them to the kses filter. + # - Run Markdown on excerpt then remove paragraph tags. + if (MARKDOWN_WP_COMMENTS) { + remove_filter('comment_text', 'wpautop'); + remove_filter('comment_text', 'make_clickable'); + add_filter('pre_comment_content', 'Markdown', 6); + add_filter('pre_comment_content', 'mdwp_hide_tags', 8); + add_filter('pre_comment_content', 'mdwp_show_tags', 12); + add_filter('get_comment_text', 'Markdown', 6); + add_filter('get_comment_excerpt', 'Markdown', 6); + add_filter('get_comment_excerpt', 'mdwp_strip_p', 7); + + global $markdown_hidden_tags; + $markdown_hidden_tags = array( + '

<p>' => md5('<p>'), '</p>' => md5('</p>'), + '<pre>' => md5('<pre>'), '</pre>'=> md5('</pre>'), + '<ol>' => md5('<ol>'), '</ol>' => md5('</ol>'), + '<ul>' => md5('<ul>'), '</ul>' => md5('</ul>'), + '<li>' => md5('<li>'), '</li>' => md5('</li>'), + ); + } + + function mdwp_add_p($text) { + if (strlen($text) == 0) return; + if (strcasecmp(substr($text, -3), '<p>') == 0) return $text; + return '<p>'.$text.'</p>
    '; + } + + function mdwp_strip_p($t) { return preg_replace('{}', '', $t); } + + function mdwp_hide_tags($text) { + global $markdown_hidden_tags; + return str_replace(array_keys($markdown_hidden_tags), + array_values($markdown_hidden_tags), $text); + } + function mdwp_show_tags($text) { + global $markdown_hidden_tags; + return str_replace(array_values($markdown_hidden_tags), + array_keys($markdown_hidden_tags), $text); + } +} + + +### bBlog Plugin Info ### + +function identify_modifier_markdown() { + return array( + 'name' => 'markdown', + 'type' => 'modifier', + 'nicename' => 'Markdown', + 'description' => 'A text-to-HTML conversion tool for web writers', + 'authors' => 'Michel Fortin and John Gruber', + 'licence' => 'GPL', + 'version' => MARKDOWN_VERSION, + 'help' => 'Markdown syntax allows you to write using an easy-to-read, easy-to-write plain text format. Based on the original Perl version by John Gruber. More...' + ); +} + + +### Smarty Modifier Interface ### + +function smarty_modifier_markdown($text) { + return Markdown($text); +} + + +### Textile Compatibility Mode ### + +# Rename this file to "classTextile.php" and it can replace Textile everywhere. + +if (strcasecmp(substr(__FILE__, -16), "classTextile.php") == 0) { + # Try to include PHP SmartyPants. Should be in the same directory. + @include_once 'smartypants.php'; + # Fake Textile class. It calls Markdown instead. + class Textile { + function TextileThis($text, $lite='', $encode='') { + if ($lite == '' && $encode == '') $text = Markdown($text); + if (function_exists('SmartyPants')) $text = SmartyPants($text); + return $text; + } + # Workaround to ensure compatibility with TextPattern 4.0.3. + function blockLite($text) { return $text; } + } +} + + + +# +# Markdown Parser Class +# + +class Markdown_Parser { + + # Regex to match balanced [brackets]. + # Needed to insert a maximum bracked depth while converting to PHP. 
+ var $nested_brackets_depth = 6; + var $nested_brackets; + + # Table of hash values for escaped characters: + var $escape_chars = '\`*_{}[]()>#+-.!'; + var $escape_table = array(); + var $backslash_escape_table = array(); + + # Change to ">" for HTML output. + var $empty_element_suffix = MARKDOWN_EMPTY_ELEMENT_SUFFIX; + var $tab_width = MARKDOWN_TAB_WIDTH; + + + function Markdown_Parser() { + # + # Constructor function. Initialize appropriate member variables. + # + $this->nested_brackets = + str_repeat('(?>[^\[\]]+|\[', $this->nested_brackets_depth). + str_repeat('\])*', $this->nested_brackets_depth); + + # Create an identical table but for escaped characters. + foreach (preg_split('/(?!^|$)/', $this->escape_chars) as $char) { + $hash = md5($char); + $this->escape_table[$char] = $hash; + $this->backslash_escape_table["\\$char"] = $hash; + } + } + + + # Internal hashes used during transformation. + var $urls = array(); + var $titles = array(); + var $html_blocks = array(); + + + function transform($text) { + # + # Main function. The order in which other subs are called here is + # essential. Link and image substitutions need to happen before + # _EscapeSpecialCharsWithinTagAttributes(), so that any *'s or _'s in the + # and tags get encoded. + # + # Clear the global hashes. If we don't clear these, you get conflicts + # from other articles when generating a page which contains more than + # one article (e.g. an index page that shows the N most recent + # articles): + $this->urls = array(); + $this->titles = array(); + $this->html_blocks = array(); + + # Standardize line endings: + # DOS to Unix and Mac to Unix + $text = str_replace(array("\r\n", "\r"), "\n", $text); + + # Make sure $text ends with a couple of newlines: + $text .= "\n\n"; + + # Convert all tabs to spaces. + $text = $this->detab($text); + + # Strip any lines consisting only of spaces and tabs. 
+ # This makes subsequent regexen easier to write, because we can + # match consecutive blank lines with /\n+/ instead of something + # contorted like /[ \t]*\n+/ . + $text = preg_replace('/^[ \t]+$/m', '', $text); + + # Turn block-level HTML blocks into hash entries + $text = $this->hashHTMLBlocks($text); + + # Strip link definitions, store in hashes. + $text = $this->stripLinkDefinitions($text); + + $text = $this->runBlockGamut($text); + + $text = $this->unescapeSpecialChars($text); + + return $text . "\n"; + } + + + function stripLinkDefinitions($text) { + # + # Strips link definitions from text, stores the URLs and titles in + # hash references. + # + $less_than_tab = $this->tab_width - 1; + + # Link defs are in the form: ^[id]: url "optional title" + $text = preg_replace_callback('{ + ^[ ]{0,'.$less_than_tab.'}\[(.+)\]: # id = $1 + [ \t]* + \n? # maybe *one* newline + [ \t]* + ? # url = $2 + [ \t]* + \n? # maybe one newline + [ \t]* + (?: + (?<=\s) # lookbehind for whitespace + ["(] + (.+?) # title = $3 + [")] + [ \t]* + )? # title is optional + (?:\n+|\Z) + }xm', + array(&$this, '_stripLinkDefinitions_callback'), + $text); + return $text; + } + function _stripLinkDefinitions_callback($matches) { + $link_id = strtolower($matches[1]); + $this->urls[$link_id] = $this->encodeAmpsAndAngles($matches[2]); + if (isset($matches[3])) + $this->titles[$link_id] = str_replace('"', '"', $matches[3]); + return ''; # String that will replace the block + } + + + function hashHTMLBlocks($text) { + $less_than_tab = $this->tab_width - 1; + + # Hashify HTML blocks: + # We only want to do this for block-level HTML tags, such as headers, + # lists, and tables. That's because we still want to wrap

    s around + # "paragraphs" that are wrapped in non-block-level tags, such as anchors, + # phrase emphasis, and spans. The list of tags we're looking for is + # hard-coded: + $block_tags_a = 'p|div|h[1-6]|blockquote|pre|table|dl|ol|ul|'. + 'script|noscript|form|fieldset|iframe|math|ins|del'; + $block_tags_b = 'p|div|h[1-6]|blockquote|pre|table|dl|ol|ul|'. + 'script|noscript|form|fieldset|iframe|math'; + + # First, look for nested blocks, e.g.: + #

<div> + # <div> + # tags for inner block must be indented. + # </div> + # </div> + # + # The outermost tags must start at the left margin for this to match, and + # the inner nested divs must be indented. + # We need to do this before the next, more liberal match, because the next + # match will start at the first `<div>` and stop at the first `</div>
    `. + $text = preg_replace_callback("{ + ( # save in $1 + ^ # start of line (with /m) + <($block_tags_a) # start tag = $2 + \\b # word break + (.*\\n)*? # any number of lines, minimally matching + # the matching end tag + [ \\t]* # trailing spaces/tabs + (?=\\n+|\\Z) # followed by a newline or end of document + ) + }xm", + array(&$this, '_hashHTMLBlocks_callback'), + $text); + + # + # Now match more liberally, simply from `\n` to `\n` + # + $text = preg_replace_callback("{ + ( # save in $1 + ^ # start of line (with /m) + <($block_tags_b) # start tag = $2 + \\b # word break + (.*\\n)*? # any number of lines, minimally matching + .* # the matching end tag + [ \\t]* # trailing spaces/tabs + (?=\\n+|\\Z) # followed by a newline or end of document + ) + }xm", + array(&$this, '_hashHTMLBlocks_callback'), + $text); + + # Special case just for
    . It was easier to make a special case than + # to make the other regex more complicated. + $text = preg_replace_callback('{ + (?: + (?<=\n\n) # Starting after a blank line + | # or + \A\n? # the beginning of the doc + ) + ( # save in $1 + [ ]{0,'.$less_than_tab.'} + <(hr) # start tag = $2 + \b # word break + ([^<>])*? # + /?> # the matching end tag + [ \t]* + (?=\n{2,}|\Z) # followed by a blank line or end of document + ) + }x', + array(&$this, '_hashHTMLBlocks_callback'), + $text); + + # Special case for standalone HTML comments: + $text = preg_replace_callback('{ + (?: + (?<=\n\n) # Starting after a blank line + | # or + \A\n? # the beginning of the doc + ) + ( # save in $1 + [ ]{0,'.$less_than_tab.'} + (?s: + + ) + [ \t]* + (?=\n{2,}|\Z) # followed by a blank line or end of document + ) + }x', + array(&$this, '_hashHTMLBlocks_callback'), + $text); + + return $text; + } + function _hashHTMLBlocks_callback($matches) { + $text = $matches[1]; + $key = md5($text); + $this->html_blocks[$key] = $text; + return "\n\n$key\n\n"; # String that will replace the block + } + + + function runBlockGamut($text) { + # + # These are all the transformations that form block-level + # tags like paragraphs, headers, and list items. + # + $text = $this->doHeaders($text); + + # Do Horizontal Rules: + $text = preg_replace( + array('{^[ ]{0,2}([ ]?\*[ ]?){3,}[ \t]*$}mx', + '{^[ ]{0,2}([ ]? -[ ]?){3,}[ \t]*$}mx', + '{^[ ]{0,2}([ ]? _[ ]?){3,}[ \t]*$}mx'), + "\nempty_element_suffix\n", + $text); + + $text = $this->doLists($text); + $text = $this->doCodeBlocks($text); + $text = $this->doBlockQuotes($text); + + # We already ran _HashHTMLBlocks() before, in Markdown(), but that + # was to escape raw HTML in the original Markdown source. This time, + # we're escaping the markup we've just created, so that we don't wrap + #

    <p> tags around block-level tags.
    +		$text = $this->hashHTMLBlocks($text);
    +		$text = $this->formParagraphs($text);
    +
    +		return $text;
    +	}
    +
    +
    +	function runSpanGamut($text) {
    +	#
    +	# These are all the transformations that occur *within* block-level
    +	# tags like paragraphs, headers, and list items.
    +	#
    +		$text = $this->doCodeSpans($text);
    +
    +		$text = $this->escapeSpecialChars($text);
    +
    +		# Process anchor and image tags. Images must come first,
    +		# because ![foo][f] looks like an anchor.
    +		$text = $this->doImages($text);
    +		$text = $this->doAnchors($text);
    +
    +		# Make links out of things like `<http://example.com/>`
    +		# Must come after _DoAnchors(), because you can use < and >
    +		# delimiters in inline links like [this](<url>).
    +		$text = $this->doAutoLinks($text);
    +		$text = $this->encodeAmpsAndAngles($text);
    +		$text = $this->doItalicsAndBold($text);
    +
    +		# Do hard breaks:
    +		$text = preg_replace('/ {2,}\n/', "<br$this->empty_element_suffix\n", $text);
    +
    +		return $text;
    +	}
    +
    +
    +	function escapeSpecialChars($text) {
    +		$tokens = $this->tokenizeHTML($text);
    +
    +		$text = '';   # rebuild $text from the tokens
    +	#	$in_pre = 0;  # Keep track of when we're inside <pre> or <code> tags.
    +	#	$tags_to_skip = "!<(/?)(?:pre|code|kbd|script|math)[\s>]!";
    +
    +		foreach ($tokens as $cur_token) {
    +			if ($cur_token[0] == 'tag') {
    +				# Within tags, encode * and _ so they don't conflict
    +				# with their use in Markdown for italics and strong.
    +				# We're replacing each such character with its
    +				# corresponding MD5 checksum value; this is likely
    +				# overkill, but it should prevent us from colliding
    +				# with the escape values by accident.
    +				$cur_token[1] = str_replace(array('*', '_'),
    +					array($this->escape_table['*'], $this->escape_table['_']),
    +					$cur_token[1]);
    +				$text .= $cur_token[1];
    +			} else {
    +				$t = $cur_token[1];
    +				$t = $this->encodeBackslashEscapes($t);
    +				$text .= $t;
    +			}
    +		}
    +		return $text;
    +	}
    +
    +
    +	function doAnchors($text) {
    +	#
    +	# Turn Markdown link shortcuts into XHTML <a> tags.
    +	#
    +		#
    +		# First, handle reference-style links: [link text] [id]
    +		#
    +		$text = preg_replace_callback("{
    +			(					# wrap whole match in $1
    +			  \\[
    +				($this->nested_brackets)	# link text = $2
    +			  \\]
    +
    +			  [ ]?				# one optional space
    +			  (?:\\n[ ]*)?		# one optional newline followed by spaces
    +
    +			  \\[
    +				(.*?)		# id = $3
    +			  \\]
    +			)
    +			}xs",
    +			array(&$this, '_doAnchors_reference_callback'), $text);
    +
    +		#
    +		# Next, inline-style links: [link text](url "optional title")
    +		#
    +		$text = preg_replace_callback("{
    +			(				# wrap whole match in $1
    +			  \\[
    +				($this->nested_brackets)	# link text = $2
    +			  \\]
    +			  \\(			# literal paren
    +				[ \\t]*
    +				<?(.*?)>?	# href = $3
    +				[ \\t]*
    +				(			# $4
    +				  (['\"])	# quote char = $5
    +				  (.*?)		# Title = $6
    +				  \\5		# matching quote
    +				)?			# title is optional
    +			  \\)
    +			)
    +			}xs",
    +			array(&$this, '_doAnchors_inline_callback'), $text);
    +
    +		return $text;
    +	}
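    +	#
    +	# Illustrative sketch, not part of the original port. It assumes that
    +	# $parser (a hypothetical variable name) holds an initialized instance
    +	# of this class. An inline link whose URL and title contain no '*' or
    +	# '_' characters would be expected to convert roughly like this:
    +	#
    +	#	$html = $parser->doAnchors('[text](http://example.com/ "Title")');
    +	#	# expected: <a href="http://example.com/" title="Title">text</a>
    +	#
    +	# The reference-style pass runs first and leaves this input alone,
    +	# because the closing bracket is followed by "(" rather than "[".
    +	#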
    +	function _doAnchors_reference_callback($matches) {
    +		$whole_match = $matches[1];
    +		$link_text   = $matches[2];
    +		$link_id     = strtolower($matches[3]);
    +
    +		if ($link_id == "") {
    +			$link_id = strtolower($link_text); # for shortcut links like [this][].
    +		}
    +
    +		if (isset($this->urls[$link_id])) {
    +			$url = $this->urls[$link_id];
    +			# We've got to encode these to avoid conflicting with italics/bold.
    +			$url = str_replace(array('*', '_'),
    +				array($this->escape_table['*'], $this->escape_table['_']),
    +				$url);
    +			$result = "<a href=\"$url\"";
    +			if ( isset( $this->titles[$link_id] ) ) {
    +				$title = $this->titles[$link_id];
    +				$title = str_replace(array('*',     '_'),
    +									 array($this->escape_table['*'], 
    +										   $this->escape_table['_']), $title);
    +				$result .=  " title=\"$title\"";
    +			}
    +			$result .= ">$link_text</a>";
    +		}
    +		else {
    +			$result = $whole_match;
    +		}
    +		return $result;
    +	}
    +	function _doAnchors_inline_callback($matches) {
    +		$whole_match	= $matches[1];
    +		$link_text		= $matches[2];
    +		$url			= $matches[3];
    +		$title			=& $matches[6];
    +
    +		# We've got to encode these to avoid conflicting with italics/bold.
    +		$url = str_replace(array('*', '_'),
    +						   array($this->escape_table['*'], $this->escape_table['_']), 
    +						   $url);
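    +		$result = "<a href=\"$url\"";
    +		if (isset($title)) {
    +			$title = str_replace('"', '&quot;', $title);
    +			$title = str_replace(array('*', '_'),
    +								 array($this->escape_table['*'], $this->escape_table['_']),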
    +								 $title);
    +			$result .=  " title=\"$title\"";
    +		}
    +		
    +		$result .= ">$link_text</a>";
    +
    +		return $result;
    +	}
    +
    +
    +	function doImages($text) {
    +	#
    +	# Turn Markdown image shortcuts into <img> tags.
    +	#
    +		#
    +		# First, handle reference-style labeled images: ![alt text][id]
    +		#
    +		$text = preg_replace_callback('{
    +			(				# wrap whole match in $1
    +			  !\[
    +				('.$this->nested_brackets.')		# alt text = $2
    +			  \]
    +
    +			  [ ]?				# one optional space
    +			  (?:\n[ ]*)?		# one optional newline followed by spaces
    +
    +			  \[
    +				(.*?)		# id = $3
    +			  \]
    +
    +			)
    +			}xs', 
    +			array(&$this, '_doImages_reference_callback'), $text);
    +
    +		#
    +		# Next, handle inline images:  ![alt text](url "optional title")
    +		# Don't forget: encode * and _
    +
    +		$text = preg_replace_callback('{
    +			(				# wrap whole match in $1
    +			  !\[
    +				('.$this->nested_brackets.')		# alt text = $2
    +			  \]
    +			  \(			# literal paren
    +				[ \t]*
    +				<?(\S+?)>?	# src url = $3
    +				[ \t]*
    +				(			# $4
    +				  ([\'"])	# quote char = $5
    +				  (.*?)		# title = $6
    +				  \5		# matching quote
    +				  [ \t]*
    +				)?			# title is optional
    +			  \)
    +			)
    +			}xs',
    +			array(&$this, '_doImages_inline_callback'), $text);
    +
    +		return $text;
    +	}
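    +	#
    +	# Illustrative sketch, not part of the original port. It assumes that
    +	# $parser (a hypothetical variable name) holds an initialized instance
    +	# of this class and that empty_element_suffix is set to " />". An
    +	# inline image with no '*' or '_' in its URL or title would then be
    +	# expected to convert roughly like this:
    +	#
    +	#	$html = $parser->doImages('![alt](/img.png "Title")');
    +	#	# expected: <img src="/img.png" alt="alt" title="Title" />
    +	#
    +	# This is also why runSpanGamut() calls doImages() before doAnchors():
    +	# without the leading "!", the same text would match as a link.
    +	#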
    +	function _doImages_reference_callback($matches) {
    +		$whole_match = $matches[1];
    +		$alt_text    = $matches[2];
    +		$link_id     = strtolower($matches[3]);
    +
    +		if ($link_id == "") {
    +			$link_id = strtolower($alt_text); # for shortcut links like ![this][].
    +		}
    +
    +		$alt_text = str_replace('"', '&quot;', $alt_text);
    +		if (isset($this->urls[$link_id])) {
    +			$url = $this->urls[$link_id];
    +			# We've got to encode these to avoid conflicting with italics/bold.
    +			$url = str_replace(array('*', '_'),
    +							   array($this->escape_table['*'], $this->escape_table['_']),
    +							   $url);
    +			$result = "<img src=\"$url\" alt=\"$alt_text\"";
    +			if (isset($this->titles[$link_id])) {
    +				$title = $this->titles[$link_id];
    +				$title = str_replace(array('*', '_'),
    +									 array($this->escape_table['*'], 
    +										   $this->escape_table['_']), $title);
    +				$result .=  " title=\"$title\"";
    +			}
    +			$result .= $this->empty_element_suffix;
    +		}
    +		else {
    +			# If there's no such link ID, leave intact:
    +			$result = $whole_match;
    +		}
    +
    +		return $result;
    +	}
    +	function _doImages_inline_callback($matches) {
    +		$whole_match	= $matches[1];
    +		$alt_text		= $matches[2];
    +		$url			= $matches[3];
    +		$title			= '';
    +		if (isset($matches[6])) {
    +			$title		= $matches[6];
    +		}
    +
    +		$alt_text = str_replace('"', '&quot;', $alt_text);
    +		$title    = str_replace('"', '&quot;', $title);
    +		# We've got to encode these to avoid conflicting with italics/bold.
    +		$url = str_replace(array('*', '_'),
    +						   array($this->escape_table['*'], $this->escape_table['_']),
    +						   $url);
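    +		$result = "<img src=\"$url\" alt=\"$alt_text\"";
    +		if (isset($title)) {
    +			$title = str_replace(array('*', '_'),
    +								 array($this->escape_table['*'], $this->escape_table['_']),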
    +								 $title);
    +			$result .=  " title=\"$title\""; # $title already quoted
    +		}
    +		$result .= $this->empty_element_suffix;
    +
    +		return $result;
    +	}
    +
    +
    +	function doHeaders($text) {
    +		# Setext-style headers:
    +		#	  Header 1
    +		#	  ========
    +		#  
    +		#	  Header 2
    +		#	  --------
    +		#
    +		$text = preg_replace_callback('{ ^(.+)[ \t]*\n=+[ \t]*\n+ }mx',
    +			array(&$this, '_doHeaders_callback_setext_h1'), $text);
    +		$text = preg_replace_callback('{ ^(.+)[ \t]*\n-+[ \t]*\n+ }mx',
    +			array(&$this, '_doHeaders_callback_setext_h2'), $text);
    +
    +		# atx-style headers:
    +		#	# Header 1
    +		#	## Header 2
    +		#	## Header 2 with closing hashes ##
    +		#	...
    +		#	###### Header 6
    +		#
    +		$text = preg_replace_callback('{
    +				^(\#{1,6})	# $1 = string of #\'s
    +				[ \t]*
    +				(.+?)		# $2 = Header text
    +				[ \t]*
    +				\#*			# optional closing #\'s (not counted)
    +				\n+
    +			}xm',
    +			array(&$this, '_doHeaders_callback_atx'), $text);
    +
    +		return $text;
    +	}
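    +	#
    +	# Illustrative sketch, not part of the original port. It assumes that
    +	# $parser (a hypothetical variable name) holds an initialized instance
    +	# of this class. With plain header text, the setext and atx rules
    +	# above would be expected to behave roughly like this:
    +	#
    +	#	$parser->doHeaders("Title\n=====\n\n");
    +	#	# expected: "<h1>Title</h1>\n\n"
    +	#
    +	#	$parser->doHeaders("## Section ##\n");
    +	#	# expected: "<h2>Section</h2>\n\n"
    +	#
    +	# In the atx case the header level is the number of opening #'s;
    +	# the closing #'s are matched and discarded.
    +	#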
    +	function _doHeaders_callback_setext_h1($matches) {
    +		return "<h1>".$this->runSpanGamut($matches[1])."</h1>\n\n";
    +	}
    +	function _doHeaders_callback_setext_h2($matches) {
    +		return "<h2>".$this->runSpanGamut($matches[1])."</h2>\n\n";
    +	}
    +	function _doHeaders_callback_atx($matches) {
    +		$level = strlen($matches[1]);
    +		return "<h$level>".$this->runSpanGamut($matches[2])."</h$level>\n\n";
    +	}
    +
    +
    +	function doLists($text) {
    +	#
    +	# Form HTML ordered (numbered) and unordered (bulleted) lists.
    +	#
    +		$less_than_tab = $this->tab_width - 1;
    +
    +		# Re-usable patterns to match list item bullets and number markers:
    +		$marker_ul  = '[*+-]';
    +		$marker_ol  = '\d+[.]';
    +		$marker_any = "(?:$marker_ul|$marker_ol)";
    +
    +		$markers = array($marker_ul, $marker_ol);
    +
    +		foreach ($markers as $marker) {
    +			# Re-usable pattern to match any entire ul or ol list:
    +			$whole_list = '
    +				(								# $1 = whole list
    +				  (								# $2
    +					[ ]{0,'.$less_than_tab.'}
    +					('.$marker.')				# $3 = first list item marker
    +					[ \t]+
    +				  )
    +				  (?s:.+?)
    +				  (								# $4
    +					  \z
    +					|
    +					  \n{2,}
    +					  (?=\S)
    +					  (?!						# Negative lookahead for another list item marker
    +						[ \t]*
    +						'.$marker.'[ \t]+
    +					  )
    +				  )
    +				)
    +			'; // mx
    +
    +			# We use a different prefix before nested lists than top-level lists.
    +			# See extended comment in _ProcessListItems().
    +
    +			if ($this->list_level) {
    +				$text = preg_replace_callback('{
    +						^
    +						'.$whole_list.'
    +					}mx',
    +					array(&$this, '_doLists_callback_top'), $text);
    +			}
    +			else {
    +				$text = preg_replace_callback('{
    +						(?:(?<=\n\n)|\A\n?)
    +						'.$whole_list.'
    +					}mx',
    +					array(&$this, '_doLists_callback_nested'), $text);
    +			}
    +		}
    +
    +		return $text;
    +	}
    +	function _doLists_callback_top($matches) {
    +		# Re-usable patterns to match list item bullets and number markers:
    +		$marker_ul  = '[*+-]';
    +		$marker_ol  = '\d+[.]';
    +		$marker_any = "(?:$marker_ul|$marker_ol)";
    +
    +		$list = $matches[1];
    +		$list_type = preg_match("/$marker_ul/", $matches[3]) ? "ul" : "ol";
    +
    +		$marker_any = ( $list_type == "ul" ? $marker_ul : $marker_ol );
    +
    +		# Turn double returns into triple returns, so that we can make a
    +		# paragraph for the last item in a list, if necessary:
    +		$list = preg_replace("/\n{2,}/", "\n\n\n", $list);
    +		$result = $this->processListItems($list, $marker_any);
    +
    +		# Trim any trailing whitespace, to put the closing `</$list_type>`
    +		# up on the preceding line, to get it past the current stupid
    +		# HTML block parser. This is a hack to work around the terrible
    +		# hack that is the HTML block parser.
    +		$result = rtrim($result);
    +		$result = "<$list_type>" . $result . "</$list_type>\n";
    +		return $result;
    +	}
    +	function _doLists_callback_nested($matches) {
    +		# Re-usable patterns to match list item bullets and number markers:
    +		$marker_ul  = '[*+-]';
    +		$marker_ol  = '\d+[.]';
    +		$marker_any = "(?:$marker_ul|$marker_ol)";
    +
    +		$list = $matches[1];
    +		$list_type = preg_match("/$marker_ul/", $matches[3]) ? "ul" : "ol";
    +
    +		$marker_any = ( $list_type == "ul" ? $marker_ul : $marker_ol );
    +
    +		# Turn double returns into triple returns, so that we can make a
    +		# paragraph for the last item in a list, if necessary:
    +		$list = preg_replace("/\n{2,}/", "\n\n\n", $list);
    +		$result = $this->processListItems($list, $marker_any);
    +		$result = "<$list_type>\n" . $result . "</$list_type>\n";
    +		return $result;
    +	}
    +
    +	var $list_level = 0;
    +
    +	function processListItems($list_str, $marker_any) {
    +	#
    +	#	Process the contents of a single ordered or unordered list, splitting it
    +	#	into individual list items.
    +	#
    +		# The $this->list_level global keeps track of when we're inside a list.
    +		# Each time we enter a list, we increment it; when we leave a list,
    +		# we decrement. If it's zero, we're not in a list anymore.
    +		#
    +		# We do this because when we're not inside a list, we want to treat
    +		# something like this:
    +		#
    +		#		I recommend upgrading to version
    +		#		8. Oops, now this line is treated
    +		#		as a sub-list.
    +		#
    +		# As a single paragraph, despite the fact that the second line starts
    +		# with a digit-period-space sequence.
    +		#
    +		# Whereas when we're inside a list (or sub-list), that line will be
    +		# treated as the start of a sub-list. What a kludge, huh? This is
    +		# an aspect of Markdown's syntax that's hard to parse perfectly
    +		# without resorting to mind-reading. Perhaps the solution is to
    +		# change the syntax rules such that sub-lists must start with a
    +		# starting cardinal number; e.g. "1." or "a.".
    +
    +		$this->list_level++;
    +
    +		# trim trailing blank lines:
    +		$list_str = preg_replace("/\n{2,}\\z/", "\n", $list_str);
    +
    +		$list_str = preg_replace_callback('{
    +			(\n)?							# leading line = $1
    +			(^[ \t]*)						# leading whitespace = $2
    +			('.$marker_any.') [ \t]+		# list marker = $3
    +			((?s:.+?)						# list item text   = $4
    +			(\n{1,2}))
    +			(?= \n* (\z | \2 ('.$marker_any.') [ \t]+))
    +			}xm',
    +			array(&$this, '_processListItems_callback'), $list_str);
    +
    +		$this->list_level--;
    +		return $list_str;
    +	}
    +	function _processListItems_callback($matches) {
    +		$item = $matches[4];
    +		$leading_line =& $matches[1];
    +		$leading_space =& $matches[2];
    +
    +		if ($leading_line || preg_match('/\n{2,}/', $item)) {
    +			$item = $this->runBlockGamut($this->outdent($item));
    +		}
    +		else {
    +			# Recursion for sub-lists:
    +			$item = $this->doLists($this->outdent($item));
    +			$item = preg_replace('/\n+$/', '', $item);
    +			$item = $this->runSpanGamut($item);
    +		}
    +
    +		return "<li>" . $item . "</li>\n";
    +	}
    +
    +
    +	function doCodeBlocks($text) {
    +	#
    +	#	Process Markdown `<pre><code>` blocks.
    +	#
    +		$text = preg_replace_callback('{
    +				(?:\n\n|\A)
    +				(	            # $1 = the code block -- one or more lines, starting with a space/tab
    +				  (?:
    +					(?:[ ]{'.$this->tab_width.'} | \t)  # Lines must start with a tab or a tab-width of spaces
    +					.*\n+
    +				  )+
    +				)
    +				((?=^[ ]{0,'.$this->tab_width.'}\S)|\Z)	# Lookahead for non-space at line-start, or end of doc
    +			}xm',
    +			array(&$this, '_doCodeBlocks_callback'), $text);
    +
    +		return $text;
    +	}
    +	function _doCodeBlocks_callback($matches) {
    +		$codeblock = $matches[1];
    +
    +		$codeblock = $this->encodeCode($this->outdent($codeblock));
    +	//	$codeblock = $this->detab($codeblock);
    +		# trim leading newlines and trailing whitespace
    +		$codeblock = preg_replace(array('/\A\n+/', '/\s+\z/'), '', $codeblock);
    +
    +		$result = "\n\n<pre><code>" . $codeblock . "\n</code></pre>\n\n";
    +
    +		return $result;
    +	}
    +
    +
    +	function doCodeSpans($text) {
    +	#
    +	# 	*	Backtick quotes are used for <code></code> spans.
    +	#
    +	# 	*	You can use multiple backticks as the delimiters if you want to
    +	# 		include literal backticks in the code span. So, this input:
    +	#
    +	#		  Just type ``foo `bar` baz`` at the prompt.
    +	#
    +	#	  	Will translate to:
    +	#
    +	#		  <p>Just type <code>foo `bar` baz</code> at the prompt.</p>
    + # + # There's no arbitrary limit to the number of backticks you + # can use as delimters. If you need three consecutive backticks + # in your code, use four for delimiters, etc. + # + # * You can use spaces to get literal backticks at the edges: + # + # ... type `` `bar` `` ... + # + # Turns to: + # + # ... type `bar` ... + # + $text = preg_replace_callback('@ + (?encodeCode($c); + return "$c"; + } + + + function encodeCode($_) { + # + # Encode/escape certain characters inside Markdown code runs. + # The point is that in code, these characters are literals, + # and lose their special Markdown meanings. + # + # Encode all ampersands; HTML entities are not + # entities within a Markdown code span. + $_ = str_replace('&', '&', $_); + + # Do the angle bracket song and dance: + $_ = str_replace(array('<', '>'), + array('<', '>'), $_); + + # Now, escape characters that are magic in Markdown: + $_ = str_replace(array_keys($this->escape_table), + array_values($this->escape_table), $_); + + return $_; + } + + + function doItalicsAndBold($text) { + # must go first: + $text = preg_replace('{ + ( # $1: Marker + (?\2', $text); + # Then : + $text = preg_replace( + '{ ( (?\2', $text); + + return $text; + } + + + function doBlockQuotes($text) { + $text = preg_replace_callback('/ + ( # Wrap whole match in $1 + ( + ^[ \t]*>[ \t]? # ">" at the start of a line + .+\n # rest of the first line + (.+\n)* # subsequent consecutive lines + \n* # blanks + )+ + ) + /xm', + array(&$this, '_doBlockQuotes_callback'), $text); + + return $text; + } + function _doBlockQuotes_callback($matches) { + $bq = $matches[1]; + # trim one level of quoting - trim whitespace-only lines + $bq = preg_replace(array('/^[ \t]*>[ \t]?/m', '/^[ \t]+$/m'), '', $bq); + $bq = $this->runBlockGamut($bq); # recurse + + $bq = preg_replace('/^/m', " ", $bq); + # These leading spaces screw with
     content, so we need to fix that:
    +		$bq = preg_replace_callback('{(\s*
    .+?
    )}sx', + array(&$this, '_DoBlockQuotes_callback2'), $bq); + + return "
    \n$bq\n
    \n\n"; + } + function _doBlockQuotes_callback2($matches) { + $pre = $matches[1]; + $pre = preg_replace('/^ /m', '', $pre); + return $pre; + } + + + function formParagraphs($text) { + # + # Params: + # $text - string to process with html

    tags + # + # Strip leading and trailing lines: + $text = preg_replace(array('/\A\n+/', '/\n+\z/'), '', $text); + + $grafs = preg_split('/\n{2,}/', $text, -1, PREG_SPLIT_NO_EMPTY); + + # + # Wrap

    tags. + # + foreach ($grafs as $key => $value) { + if (!isset( $this->html_blocks[$value] )) { + $value = $this->runSpanGamut($value); + $value = preg_replace('/^([ \t]*)/', '

    ', $value); + $value .= "

    "; + $grafs[$key] = $value; + } + } + + # + # Unhashify HTML blocks + # + foreach ($grafs as $key => $value) { + if (isset( $this->html_blocks[$value] )) { + $grafs[$key] = $this->html_blocks[$value]; + } + } + + return implode("\n\n", $grafs); + } + + + function encodeAmpsAndAngles($text) { + # Smart processing for ampersands and angle brackets that need to be encoded. + + # Ampersand-encoding based entirely on Nat Irons's Amputator MT plugin: + # http://bumppo.net/projects/amputator/ + $text = preg_replace('/&(?!#?[xX]?(?:[0-9a-fA-F]+|\w+);)/', + '&', $text);; + + # Encode naked <'s + $text = preg_replace('{<(?![a-z/?\$!])}i', '<', $text); + + return $text; + } + + + function encodeBackslashEscapes($text) { + # + # Parameter: String. + # Returns: The string, with after processing the following backslash + # escape sequences. + # + # Must process escaped backslashes first. + return str_replace(array_keys($this->backslash_escape_table), + array_values($this->backslash_escape_table), $text); + } + + + function doAutoLinks($text) { + $text = preg_replace("!<((https?|ftp):[^'\">\\s]+)>!", + '\1', $text); + + # Email addresses: + $text = preg_replace_callback('{ + < + (?:mailto:)? + ( + [-.\w]+ + \@ + [-a-z0-9]+(\.[-a-z0-9]+)*\.[a-z]+ + ) + > + }xi', + array(&$this, '_doAutoLinks_callback'), $text); + + return $text; + } + function _doAutoLinks_callback($matches) { + $address = $matches[1]; + $address = $this->unescapeSpecialChars($address); + $address = $this->encodeEmailAddress($address); + return $address; + } + + + function encodeEmailAddress($addr) { + # + # Input: an email address, e.g. "foo@example.com" + # + # Output: the email address as a mailto link, with each character + # of the address encoded as either a decimal or hex entity, in + # the hopes of foiling most address harvesting spam bots. E.g.: + # + # foo + # @example.com + # + # Based by a filter by Matthew Wickline, posted to the BBEdit-Talk + # mailing list: + # + $addr = "mailto:" . 
$addr; + $length = strlen($addr); + + # leave ':' alone (to spot mailto: later) + $addr = preg_replace_callback('/([^\:])/', + array(&$this, '_encodeEmailAddress_callback'), $addr); + + $addr = "$addr"; + # strip the mailto: from the visible part + $addr = preg_replace('/">.+?:/', '">', $addr); + + return $addr; + } + function _encodeEmailAddress_callback($matches) { + $char = $matches[1]; + $r = rand(0, 100); + # roughly 10% raw, 45% hex, 45% dec + # '@' *must* be encoded. I insist. + if ($r > 90 && $char != '@') return $char; + if ($r < 45) return '&#x'.dechex(ord($char)).';'; + return '&#'.ord($char).';'; + } + + + function unescapeSpecialChars($text) { + # + # Swap back in all the special characters we've hidden. + # + return str_replace(array_values($this->escape_table), + array_keys($this->escape_table), $text); + } + + + function tokenizeHTML($str) { + # + # Parameter: String containing HTML markup. + # Returns: An array of the tokens comprising the input + # string. Each token is either a tag (possibly with nested, + # tags contained therein, such as , or a + # run of text between tags. Each element of the array is a + # two-element array; the first is either 'tag' or 'text'; + # the second is the actual value. + # + # + # Regular expression derived from the _tokenize() subroutine in + # Brad Choate's MTRegex plugin. + # + # + $index = 0; + $tokens = array(); + + $match = '(?s:)|'. # comment + '(?s:<\?.*?\?>)|'. 
# processing instruction + # regular tags + '(?:<[/!$]?[-a-zA-Z0-9:]+\b(?>[^"\'>]+|"[^"]*"|\'[^\']*\')*>)'; + + $parts = preg_split("{($match)}", $str, -1, PREG_SPLIT_DELIM_CAPTURE); + + foreach ($parts as $part) { + if (++$index % 2 && $part != '') + $tokens[] = array('text', $part); + else + $tokens[] = array('tag', $part); + } + + return $tokens; + } + + + function outdent($text) { + # + # Remove one level of line-leading tabs or spaces + # + return preg_replace("/^(\\t|[ ]{1,$this->tab_width})/m", "", $text); + } + + + function detab($text) { + # + # Replace tabs with the appropriate amount of space. + # + # For each line we separate the line in blocks delemited by + # tab characters. Then we reconstruct every line by adding the + # appropriate number of space between each blocks. + + $lines = explode("\n", $text); + $text = ""; + + foreach ($lines as $line) { + # Split in blocks. + $blocks = explode("\t", $line); + # Add each blocks to the line. + $line = $blocks[0]; + unset($blocks[0]); # Do not add first block twice. + foreach ($blocks as $block) { + # Calculate amount of space, insert spaces, insert block. + $amount = $this->tab_width - strlen($line) % $this->tab_width; + $line .= str_repeat(" ", $amount) . $block; + } + $text .= "$line\n"; + } + return $text; + } + +} + + +/* + +PHP Markdown +============ + +Description +----------- + +This is a PHP translation of the original Markdown formatter written in +Perl by John Gruber. + +Markdown is a text-to-HTML filter; it translates an easy-to-read / +easy-to-write structured text format into HTML. Markdown's text format +is most similar to that of plain text email, and supports features such +as headers, *emphasis*, code blocks, blockquotes, and links. + +Markdown's syntax is designed not as a generic markup language, but +specifically to serve as a front-end to (X)HTML. You can use span-level +HTML tags anywhere in a Markdown document, and you can use block level +HTML tags (like
    and as well). + +For more information about Markdown's syntax, see: + + + + +Bugs +---- + +To file bug reports please send email to: + + + +Please include with your report: (1) the example input; (2) the output you +expected; (3) the output Markdown actually produced. + + +Version History +--------------- + +See the readme file for detailed release notes for this version. + +1.0.1oo (19 May 2006) + +* Converted PHP Markdown to a object-oriented design. + + +1.0.1c (9 Dec 2005) + +1.0.1b (6 Jun 2005) + +1.0.1a (15 Apr 2005) + +1.0.1 (16 Dec 2004) + +1.0 (21 Aug 2004) + + +Author & Contributors +--------------------- + +Original Markdown by John Gruber + + +PHP port and extras by Michel Fortin + + + +Copyright and License +--------------------- + +Copyright (c) 2004-2006 Michel Fortin + +All rights reserved. + +Copyright (c) 2003-2004 John Gruber + +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + +* Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + +* Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + +* Neither the name "Markdown" nor the names of its contributors may + be used to endorse or promote products derived from this software + without specific prior written permission. + +This software is provided by the copyright holders and contributors "as +is" and any express or implied warranties, including, but not limited +to, the implied warranties of merchantability and fitness for a +particular purpose are disclaimed. 
In no event shall the copyright owner +or contributors be liable for any direct, indirect, incidental, special, +exemplary, or consequential damages (including, but not limited to, +procurement of substitute goods or services; loss of use, data, or +profits; or business interruption) however caused and on any theory of +liability, whether in contract, strict liability, or tort (including +negligence or otherwise) arising in any way out of the use of this +software, even if advised of the possibility of such damage. + +*/ +?>