From 8699f81114645905cc80359d9179f3fb08104a37 Mon Sep 17 00:00:00 2001
From: Michel Fortin <michel.fortin@michelf.com>
Date: Tue, 14 Aug 2007 16:29:47 -0400
Subject: [PATCH] PHP Markdown 1.0.1d

---
 PHP Markdown Readme.text | 196 +++++++++++++++++++++-------------
 markdown.php             | 222 ++++++++++++++-------------------------
 2 files changed, 202 insertions(+), 216 deletions(-)

diff --git a/PHP Markdown Readme.text b/PHP Markdown Readme.text
index 9bc20ca..9c2e49c 100644
--- a/PHP Markdown Readme.text	
+++ b/PHP Markdown Readme.text	
@@ -1,7 +1,7 @@
 PHP Markdown
 ============
 
-Version 1.0.2b7 - Sat 16 Sep 2006
+Version 1.0.1d - Fri 1 Dec 2006
 
 by Michel Fortin
 <http://www.michelf.com/>
@@ -57,54 +57,53 @@ version.
     same line than Markdown. Your entries will now be formatted by 
     PHP Markdown.
 
+3.  To post Markdown content, you'll first have to disable the 
+	"visual" editor in the User section of WordPress.
+
 You can configure PHP Markdown to not apply to the comments on your 
 WordPress weblog. See the "Configuration" section below.
 
-Note: It is not possible at this time to apply a different set of 
+It is not possible at this time to apply a different set of 
 filters to different entries. All your entries will be formated by 
-PHP Markdown. This is currently a limitation of WordPress. If your old 
-entries are written in HTML (as opposed to another formatting syntax), 
-your site should not suffer much from installing PHP Markdown.
+PHP Markdown. This is a limitation of WordPress. If your old entries 
+are written in HTML (as opposed to another formatting syntax, like 
+Textile), they'll probably stay fine after installing Markdown.
 
 
 ### bBlog ###
 
-PHP Markdown also works with the latest version of [bBlog][bb].
+PHP Markdown also works with [bBlog][bb].
 
  [bb]: http://www.bblog.com/
 
-1.  To use PHP Markdown with bBlog, rename "markdown.php" to 
-    "modifier.markdown.php" and place the file in the "bBlog_plugins" 
-    folder. This folder is located inside the "bblog" directory of 
-    your site, like this:
+To use PHP Markdown with bBlog, rename "markdown.php" to 
+"modifier.markdown.php" and place the file in the "bBlog_plugins" 
+folder. This folder is located inside the "bblog" directory of 
+your site, like this:
 
         (site home)/bblog/bBlog_plugins/modifier.markdown.php
 
-2.  Select "Markdown" as the "Entry Modifier" when you post a new 
-    entry. This setting will only apply to the entry you are editing.
+Select "Markdown" as the "Entry Modifier" when you post a new 
+entry. This setting will only apply to the entry you are editing.
 
 
-### Replacing Textile ###
+### Replacing Textile in TextPattern ###
 
-Many web programs written in PHP use [Textile][tx] to format your text. 
-To use PHP Markdown with these programs without having to change the 
-code, you can use PHP Markdown in "Textile Compatibility Mode."
+[TextPattern][tp] uses [Textile][tx] to format your text. You can
+replace Textile with Markdown in TextPattern without having to change
+any code by using the *Textile Compatibility Mode*. This may work
+with other software that expects Textile too.
 
  [tx]: http://www.textism.com/tools/textile/
+ [tp]: http://www.textpattern.com/
 
-1.  Rename the "markdown.php" file to "classTextile.php".
+1.  Rename the "markdown.php" file to "classTextile.php". This will
+	make PHP Markdown behave as if it were the actual Textile parser.
 
-2.  Locate the "classTextile.php" file hidden somewhere inside the 
-    installation of your program (see table below). Replace it with 
-    the PHP Markdown file you just renamed.
+2.  Replace the "classTextile.php" file TextPattern installed in your
+	web directory. It can be found in the "lib" directory:
 
-As an helper, here you can learn where is the "classTextile.php" file 
-in some web programs:
-
-    Program       Location
-    ----------------------------------------------------------------
-    TextPattern   (site home)/textpattern/lib/classTextile.php
-    Pivot         (site home)/pivot/includes/textile/classtextile.php
+		(site home)/textpattern/lib/
 
 Contrary to Textile, Markdown does not convert quotes to curly ones 
 and does not convert multiple hyphens (`--` and `---`) into en- and 
@@ -158,17 +157,17 @@ Markdown can be configured to produce HTML-style tags; e.g.:
 
     <br>
 
-To do this, you  must edit the "$md_empty_element_suffix" variable
-below the "Global default settings" header at the start of the
-"markdown.php" file.
+To do this, you must edit the "MARKDOWN_EMPTY_ELEMENT_SUFFIX" 
+definition below the "Global default settings" header at the start of 
+the "markdown.php" file.
 
 
 ### WordPress-Specific Settings ###
 
 By default, the Markdown plugin applies to both posts and comments on 
 your WordPress weblog. To deactivate one or the other, edit the 
-`$md_wp_posts` or `$md_wp_comments` variable under the "WordPress 
-settings" header at the start of the "markdown.php" file.
+`MARKDOWN_WP_POSTS` or `MARKDOWN_WP_COMMENTS` definitions under the 
+"WordPress settings" header at the start of the "markdown.php" file.
 
 
 Bugs
@@ -184,6 +183,99 @@ expected; (3) the output PHP Markdown actually produced.
 Version History
 ---------------
 
+1.0.1d (1 Dec 2006)
+
+*   Fixed a bug where inline images always had an empty title attribute. The 
+	title attribute is now present only when explicitly defined.
+
+*	Link reference definitions can now have an empty title. Previously, if the 
+	title was defined but left empty, the link definition was ignored. This can 
+	be useful if you want an empty title attribute in images to hide the 
+	tooltip in Internet Explorer.
+
+*	Made `detab` aware of UTF-8 characters. UTF-8 multi-byte sequences are now 
+	correctly mapped to one character instead of the number of bytes.
+
+*	Fixed a small bug with WordPress where WordPress' default filter `wpautop`
+	was not properly deactivated on comment text, resulting in hard line breaks
+	where Markdown does not prescribe them.
+
+*	Added a `TextileRestricted` method to the textile compatibility mode. There
+	is no restriction however, as Markdown does not have a restricted mode at 
+	this point. This should make PHP Markdown work again in the latest 
+	versions of TextPattern.
+
+*   Converted PHP Markdown to an object-oriented design.
+
+*	Changed span and block gamut methods so that they loop over a 
+	customizable list of methods. This makes subclassing the parser a more 
+	interesting option for creating syntax extensions.
+
+*	Also added a "document" gamut loop which can be used to hook document-level 
+	methods (such as stripping link definitions).
+
+*	Changed all methods which were inserting HTML code so that they now return 
+	a hashed representation of the code. New methods `hashSpan` and `hashBlock`
+	are used to hash respectively span- and block-level generated content. This 
+	has a couple of significant effects:
+	
+	1.	It prevents invalid nesting of Markdown-generated elements which 
+	    could occur with constructs like `*something [link*][1]`.
+	2.	It prevents problems occurring with deeply nested lists in which 
+	    paragraphs were ill-formed.
+	3.	It removes the need to call `hashHTMLBlocks` twice during the 
+		block gamut.
+	
+	Hashes are turned back to HTML prior to output.
+
+*	Made the block-level HTML parser smarter using a specially-crafted regular 
+	expression capable of handling nested tags.
+
+*	Solved backtick issues in tag attributes by rewriting the HTML tokenizer to 
+	be aware of code spans. All these lines should work correctly now:
+	
+		<span attr='`ticks`'>bar</span>
+		<span attr='``double ticks``'>bar</span>
+		`<test a="` content of attribute `">`
+
+*	Changed the parsing of HTML comments to match simply from `<!--` to `-->` 
+	instead of using the more complicated SGML-style rule with paired `--`.
+	This is how most browsers parse comments and how XML defines them too.
+
+*	`<address>` has been added to the list of block-level elements and is now
+	treated as an HTML block instead of being wrapped within paragraph tags.
+
+*	Now only trim trailing newlines from code blocks, instead of trimming
+	all trailing whitespace characters.
+
+*	Fixed bug where this:
+
+		[text](http://m.com "title" )
+		
+	wasn't working as expected, because the parser wasn't allowing for spaces
+	before the closing paren.
+
+*	Filthy hack to support markdown='1' in div tags.
+
+*	_DoAutoLinks() now supports the 'dict://' URL scheme.
+
+*	PHP- and ASP-style processing instructions are now protected as
+	raw HTML blocks.
+
+		<? ... ?>
+		<% ... %>
+
+*	Fix for escaped backticks still triggering code spans:
+
+		There are two raw backticks here: \` and here: \`, not a code span
+
+
+1.0.1c (9 Dec 2005)
+
+*   Fixed a problem occurring with PHP 5.1.1 due to a small
+    change to the variable replacement behaviour in strings
+    in this version.
+
 
 1.0.1b (6 Jun 2005)
 
@@ -248,46 +340,6 @@ Version History
 	filter so that it runs after Markdown.
 
 
-1.0.2b1 - 5 Mar 2005
-
-*	Fix for backticks within HTML tag:
-
-		<span attr='`ticks`'>like this</span>
-
-*	Fix for escaped backticks still triggering code spans:
-	
-		There are two raw backticks here: \` and here: \`, not a code span
-
-*	Improved integration with WordPress. With WordPress 1.5, the 
-	balenceTags filter now runs after Markdown, so it won't 
-	interfere anymore. You can still disable balanceTags from the admin 
-	interface (in Options > Writing) if you want to.
-	
-*	PHP Markdown now correctly filter text for excerpts in WordPress. 
-	There is still one glitch: autolinks and tags in code samples are 
-	stripped by WordPress when trimming it. A fix for this is possible 
-	with WordPress 1.5, but would require duplicating WordPress entry 
-	trimming code within Markdown, which I can't do because of a license 
-	issue. (Nor do I think it is a good solution to fix this.)
-
-*	Improved Textile compatibility mode. Markdown will now honor the
-	no-image and the lite parameters. In lite mode, no header, blockquote, 
-	list, or code block will be made, and inline HTML is limited
-	to the following tags:
-		
-		<a><i><b><em><strong><sup><sub><code><img><cite><ins><del>
-
-	This is acheived by backslash-escaping block markers before sending
-	text through the Markdown filter.
-	
-	The improved Textile comatibility means that the Markdown syntax will now 
-	be processed for comments in TextPattern (only for span elements due to 
-	TextPattern using the lite mode for comments). Sadly, due to TextPattern 
-	tag stripping, sample code in code span and auto-links will be stripped 
-	before the Markdown filter can see them. So I guess I should say it 
-	half-work for comments TextPattern.
-
-
 1.0.1 (16 Dec 2004):
 
 *	Changed the syntax rules for code blocks and spans. Previously,
diff --git a/markdown.php b/markdown.php
index b5ae177..a53d0e1 100644
--- a/markdown.php
+++ b/markdown.php
@@ -7,12 +7,12 @@
 # <http://www.michelf.com/projects/php-markdown/>
 #
 # Original Markdown
-# Copyright (c) 2004-2005 John Gruber  
+# Copyright (c) 2004-2006 John Gruber  
 # <http://daringfireball.net/projects/markdown/>
 #
 
 
-define( 'MARKDOWN_VERSION',  "1.0.2b7" ); # Sat 16 Sep 2006
+define( 'MARKDOWN_VERSION',  "1.0.1d" ); # Fri 1 Dec 2006
 
 
 #
@@ -62,7 +62,7 @@ function Markdown($text) {
 Plugin Name: Markdown
 Plugin URI: http://www.michelf.com/projects/php-markdown/
 Description: <a href="http://daringfireball.net/projects/markdown/syntax">Markdown syntax</a> allows you to write using an easy-to-read, easy-to-write plain text format. Based on the original Perl version by <a href="http://daringfireball.net/">John Gruber</a>. <a href="http://www.michelf.com/projects/php-markdown/">More...</a>
-Version: 1.0.2b7
+Version: 1.0.1d
 Author: Michel Fortin
 Author URI: http://www.michelf.com/
 */
@@ -96,7 +96,7 @@ if (isset($wp_version)) {
 	# - Scramble important tags before passing them to the kses filter.
 	# - Run Markdown on excerpt then remove paragraph tags.
 	if (MARKDOWN_WP_COMMENTS) {
-		remove_filter('comment_text', 'wpautop');
+		remove_filter('comment_text', 'wpautop', 30);
 		remove_filter('comment_text', 'make_clickable');
 		add_filter('pre_comment_content', 'Markdown', 6);
 		add_filter('pre_comment_content', 'mdwp_hide_tags', 8);
@@ -145,7 +145,7 @@ function identify_modifier_markdown() {
 		'nicename'		=> 'Markdown',
 		'description'	=> 'A text-to-HTML conversion tool for web writers',
 		'authors'		=> 'Michel Fortin and John Gruber',
-		'licence'		=> 'GPL',
+		'licence'		=> 'BSD-like',
 		'version'		=> MARKDOWN_VERSION,
 		'help'			=> '<a href="http://daringfireball.net/projects/markdown/syntax">Markdown syntax</a> allows you to write using an easy-to-read, easy-to-write plain text format. Based on the original Perl version by <a href="http://daringfireball.net/">John Gruber</a>. <a href="http://www.michelf.com/projects/php-markdown/">More...</a>'
 	);
@@ -173,6 +173,10 @@ if (strcasecmp(substr(__FILE__, -16), "classTextile.php") == 0) {
 			if (function_exists('SmartyPants'))  $text = SmartyPants($text);
 			return $text;
 		}
+		# Fake restricted version: restrictions are not supported for now.
+		function TextileRestricted($text, $lite='', $noimage='') {
+			return $this->TextileThis($text, $lite);
+		}
 		# Workaround to ensure compatibility with TextPattern 4.0.3.
 		function blockLite($text) { return $text; }
 	}
@@ -302,7 +306,7 @@ class Markdown_Parser {
 							(?:
 								(?<=\s)			# lookbehind for whitespace
 								["(]
-								(.+?)			# title = $3
+								(.*?)			# title = $3
 								[")]
 								[ \t]*
 							)?	# title is optional
@@ -692,14 +696,14 @@ class Markdown_Parser {
 		# These must come last in case you've also got [link test][1]
 		# or [link test](/foo)
 		#
-		$text = preg_replace_callback('{
-			(					# wrap whole match in $1
-			  \[
-				([^\[\]]+)		# link text = $2; can\'t contain [ or ]
-			  \]
-			)
-			}xs',
-			array(&$this, '_doAnchors_reference_callback'), $text);
+//		$text = preg_replace_callback('{
+//			(					# wrap whole match in $1
+//			  \[
+//				([^\[\]]+)		# link text = $2; can\'t contain [ or ]
+//			  \]
+//			)
+//			}xs',
+//			array(&$this, '_doAnchors_reference_callback'), $text);
 
 		return $text;
 	}
@@ -841,15 +845,12 @@ class Markdown_Parser {
 		$whole_match	= $matches[1];
 		$alt_text		= $matches[2];
 		$url			= $matches[3];
-		$title			= '';
-		if (isset($matches[6])) {
-			$title		= $matches[6];
-		}
+		$title			=& $matches[6];
 
 		$alt_text = str_replace('"', '&quot;', $alt_text);
-		$title    = str_replace('"', '&quot;', $title);
 		$result = "<img src=\"$url\" alt=\"$alt_text\"";
 		if (isset($title)) {
+			$title = str_replace('"', '&quot;', $title);
 			$result .=  " title=\"$title\""; # $title already quoted
 		}
 		$result .= $this->empty_element_suffix;
@@ -1148,22 +1149,23 @@ class Markdown_Parser {
 		# <strong> must go first:
 		$text = preg_replace_callback('{
 				(						# $1: Marker
-					(?<!\*\*) \*\* |	#     (not preceded by two chars of
-					(?<!__)   __		#      the same marker)
-				)						
+					(?<!\*\*) \* |		#     (not preceded by two chars of
+					(?<!__)   _			#      the same marker)
+				)
+				\1
 				(?=\S) 					# Not followed by whitespace 
-				(?!\1)					#   or two others marker chars.
+				(?!\1\1)				#   or two other marker chars.
 				(						# $2: Content
 					(?:
 						[^*_]+?			# Anthing not em markers.
 					|
 										# Balence any regular emphasis inside.
-						([*_]) (?=\S) .+? (?<=\S) \3	# $3: em char (* or _)
+						\1 (?=\S) .+? (?<=\S) \1
 					|
 						(?! \1 ) .		# Allow unbalenced * and _.
 					)+?
 				)
-				(?<=\S) \1				# End mark not preceded by whitespace.
+				(?<=\S) \1\1			# End mark not preceded by whitespace.
 			}sx',
 			array(&$this, '_doItalicAndBold_strong_callback'), $text);
 		# Then <em>:
@@ -1207,9 +1209,10 @@ class Markdown_Parser {
 		$bq = $this->runBlockGamut($bq);		# recurse
 
 		$bq = preg_replace('/^/m', "  ", $bq);
-		# These leading spaces screw with <pre> content, so we need to fix that:
+		# These leading spaces cause problems with <pre> content, 
+		# so we need to fix that:
 		$bq = preg_replace_callback('{(\s*<pre>.+?</pre>)}sx', 
-									array(&$this, '_DoBlockQuotes_callback2'), $bq);
+			array(&$this, '_DoBlockQuotes_callback2'), $bq);
 
 		return $this->hashBlock("<blockquote>\n$bq\n</blockquote>")."\n\n";
 	}
@@ -1245,52 +1248,46 @@ class Markdown_Parser {
 		#
 		# Unhashify HTML blocks
 		#
-//		foreach ($grafs as $key => $value) {
-//			if (isset( $this->html_blocks[$value] )) {
-//				$grafs[$key] = $this->html_blocks[$value];
-//			}
-//		}
-
 		foreach ($grafs as $key => $graf) {
 			# Modify elements of @grafs in-place...
 			if (isset($this->html_blocks[$graf])) {
 				$block = $this->html_blocks[$graf];
 				$graf = $block;
-				if (preg_match('{
-					\A
-					(							# $1 = <div> tag
-					  <div  \s+
-					  [^>]*
-					  \b
-					  markdown\s*=\s*  ([\'"])	#	$2 = attr quote char
-					  1
-					  \2
-					  [^>]*
-					  >
-					)
-					(							# $3 = contents
-					.*
-					)
-					(</div>)					# $4 = closing tag
-					\z
-					}xs', $block, $matches))
-				{
-					list(, $div_open, , $div_content, $div_close) = $matches;
-
-					# We can't call Markdown(), because that resets the hash;
-					# that initialization code should be pulled into its own sub, though.
-					$div_content = $this->hashHTMLBlocks($div_content);
-					
-					# Run document gamut methods on the content.
-					foreach ($this->document_gamut as $method => $priority) {
-						$div_content = $this->$method($div_content);
-					}
-
-					$div_open = preg_replace(
-						'{\smarkdown\s*=\s*([\'"]).+?\1}', '', $div_open);
-
-					$graf = $div_open . "\n" . $div_content . "\n" . $div_close;
-				}
+//				if (preg_match('{
+//					\A
+//					(							# $1 = <div> tag
+//					  <div  \s+
+//					  [^>]*
+//					  \b
+//					  markdown\s*=\s*  ([\'"])	#	$2 = attr quote char
+//					  1
+//					  \2
+//					  [^>]*
+//					  >
+//					)
+//					(							# $3 = contents
+//					.*
+//					)
+//					(</div>)					# $4 = closing tag
+//					\z
+//					}xs', $block, $matches))
+//				{
+//					list(, $div_open, , $div_content, $div_close) = $matches;
+//
+//					# We can't call Markdown(), because that resets the hash;
+//					# that initialization code should be pulled into its own sub, though.
+//					$div_content = $this->hashHTMLBlocks($div_content);
+//					
+//					# Run document gamut methods on the content.
+//					foreach ($this->document_gamut as $method => $priority) {
+//						$div_content = $this->$method($div_content);
+//					}
+//
+//					$div_open = preg_replace(
+//						'{\smarkdown\s*=\s*([\'"]).+?\1}', '', $div_open);
+//
+//					$graf = $div_open . "\n" . $div_content . "\n" . $div_close;
+//				}
 				$grafs[$key] = $graf;
 			}
 		}
@@ -1403,21 +1400,23 @@ class Markdown_Parser {
 
 	function tokenizeHTML($str) {
 	#
-	#   Parameter:  String containing HTML markup.
+	#   Parameter:  String containing HTML + Markdown markup.
 	#   Returns:    An array of the tokens comprising the input
-	#               string. Each token is either a tag (possibly with nested,
-	#               tags contained therein, such as <a href="<MTFoo>">, or a
-	#               run of text between tags. Each element of the array is a
+	#               string. Each token is either a tag or a run of text 
+	#               between tags. Each element of the array is a
 	#               two-element array; the first is either 'tag' or 'text';
 	#               the second is the actual value.
-	#   Note:       Takes code spans into account and does not generate tag 
-	#               tokens inside code spans.
+	#   Note:       Markdown code spans are taken into account: no tag token is 
+	#               generated within a code span.
 	#
 		$tokens = array();
 
 		while ($str != "") {
 			#
-			# 
+			# Each loop iteration searches for either the next tag or the next 
+			# opening code span marker. If a code span marker is found, the 
+			# code span is extracted in its entirety and will result in an extra
+			# text token.
 			#
 			$parts = preg_split('{
 				(
@@ -1496,7 +1495,8 @@ class Markdown_Parser {
 			unset($blocks[0]); # Do not add first block twice.
 			foreach ($blocks as $block) {
 				# Calculate amount of space, insert spaces, insert block.
-				$amount = $this->tab_width - strlen($line) % $this->tab_width;
+				$amount = $this->tab_width - 
+					mb_strlen($line, 'UTF-8') % $this->tab_width;
 				$line .= str_repeat(" ", $amount) . $block;
 			}
 			$text .= "$line\n";
@@ -1558,73 +1558,7 @@ Version History
 
 See the readme file for detailed release notes for this version.
 
-1.0.2b7 (16 Sep 2006)
-
-*	Changed span and block gamut methods so that they loop over a 
-	customizable list of methods. This makes subclassing the parser a more 
-	interesting option for creating syntax extensions.
-
-*	Also added a "document" gamut loop which can be used to hook document-level 
-	methods (like for striping link definitions).
-
-*	Changed all methods which were inserting HTML code so that they now return 
-	a hashed representation of the code. New methods `hashSpan` and `hashBlock`
-	are used to hash respectivly span- and block-level generated content. This 
-	has a couple of significant effects:
-	
-	1.	It prevents invalid nesting of Markdown-generated elements which 
-	    could occur occuring with constructs like `*something [link*][1]`.
-	2.	It prevents problems occuring with deeply nested lists on which 
-	    paragraphs were ill-formed.
-	3.	It removes the need to call `hashHTMLBlocks` twice during the the 
-		block gamut.
-	
-	Hashes are turned back to HTML prior output.
-
-*	Made the block-level HTML parser smarter using a specially-crafted regular 
-	expression capable of handling nested tags.
-
-*	Solved backtick issues in tag attributes by rewriting the HTML tokenizer to 
-	be aware of code spans. All these lines should work correctly now:
-	
-		<span attr='`ticks`'>bar</span>
-		<span attr='``double ticks``'>bar</span>
-		`<test a="` content of attribute `">`
-
-*	`<address>` has been added to the list of block-level elements and is now
-	treated as an HTML block instead of being wrapped within paragraph tags.
-
-*	Now only trim trailing newlines from code blocks, instead of trimming
-	all trailing whitespace characters.
-
-*	Fixed bug where this:
-
-		[text](http://m.com "title" )
-		
-	wasn't working as expected, because the parser wasn't allowing for spaces
-	before the closing paren.
-
-*	Filthy hack to support markdown='1' in div tags.
-
-*	_DoAutoLinks() now supports the 'dict://' URL scheme.
-
-*	PHP- and ASP-style processor instructions are now protected as
-	raw HTML blocks.
-
-		<? ... ?>
-		<% ... %>
-
-*	Experimental support for [this] as a synonym for [this][].
-
-*	Fix for escaped backticks still triggering code spans:
-
-		There are two raw backticks here: \` and here: \`, not a code span
-
-
-1.0.1oo (19 May 2006)
-
-*   Converted PHP Markdown to a object-oriented design.
-
+1.0.1d (1 Dec 2006)
 
 1.0.1c (9 Dec 2005)
 
@@ -1654,7 +1588,7 @@ Copyright (c) 2004-2006 Michel Fortin
 <http://www.michelf.com/>  
 All rights reserved.
 
-Copyright (c) 2003-2004 John Gruber   
+Copyright (c) 2003-2006 John Gruber   
 <http://daringfireball.net/>   
 All rights reserved.