PHP Markdown 1.0.1d

Michel Fortin 2007-08-14 16:29:47 -04:00
parent 60be7c207e
commit 8699f81114
2 changed files with 202 additions and 216 deletions


@ -1,7 +1,7 @@
PHP Markdown
============
Version 1.0.2b7 - Sat 16 Sep 2006
Version 1.0.1d - Fri 1 Dec 2006
by Michel Fortin
<http://www.michelf.com/>
@ -57,54 +57,53 @@ version.
same line than Markdown. Your entries will now be formatted by
PHP Markdown.
3. To post Markdown content, you'll first have to disable the
"visual" editor in the User section of WordPress.
You can configure PHP Markdown to not apply to the comments on your
WordPress weblog. See the "Configuration" section below.
Note: It is not possible at this time to apply a different set of
It is not possible at this time to apply a different set of
filters to different entries. All your entries will be formatted by
PHP Markdown. This is currently a limitation of WordPress. If your old
entries are written in HTML (as opposed to another formatting syntax),
your site should not suffer much from installing PHP Markdown.
PHP Markdown. This is a limitation of WordPress. If your old entries
are written in HTML (as opposed to another formatting syntax, like
Textile), they'll probably stay fine after installing Markdown.
### bBlog ###
PHP Markdown also works with the latest version of [bBlog][bb].
PHP Markdown also works with [bBlog][bb].
[bb]: http://www.bblog.com/
1. To use PHP Markdown with bBlog, rename "markdown.php" to
"modifier.markdown.php" and place the file in the "bBlog_plugins"
folder. This folder is located inside the "bblog" directory of
your site, like this:
To use PHP Markdown with bBlog, rename "markdown.php" to
"modifier.markdown.php" and place the file in the "bBlog_plugins"
folder. This folder is located inside the "bblog" directory of
your site, like this:
(site home)/bblog/bBlog_plugins/modifier.markdown.php
2. Select "Markdown" as the "Entry Modifier" when you post a new
entry. This setting will only apply to the entry you are editing.
Select "Markdown" as the "Entry Modifier" when you post a new
entry. This setting will only apply to the entry you are editing.
### Replacing Textile ###
### Replacing Textile in TextPattern ###
Many web programs written in PHP use [Textile][tx] to format your text.
To use PHP Markdown with these programs without having to change the
code, you can use PHP Markdown in "Textile Compatibility Mode."
[TextPattern][tp] uses [Textile][tx] to format your text. You can
replace Textile with Markdown in TextPattern without having to change
any code by using the *Textile Compatibility Mode*. This may work
with other software that expects Textile too.
[tx]: http://www.textism.com/tools/textile/
[tp]: http://www.textpattern.com/
1. Rename the "markdown.php" file to "classTextile.php".
1. Rename the "markdown.php" file to "classTextile.php". This will
make PHP Markdown behave as if it was the actual Textile parser.
2. Locate the "classTextile.php" file hidden somewhere inside the
installation of your program (see table below). Replace it with
the PHP Markdown file you just renamed.
2. Replace the "classTextile.php" file TextPattern installed in your
web directory. It can be found in the "lib" directory:
As an helper, here you can learn where is the "classTextile.php" file
in some web programs:
Program Location
----------------------------------------------------------------
TextPattern (site home)/textpattern/lib/classTextile.php
Pivot (site home)/pivot/includes/textile/classtextile.php
(site home)/textpattern/lib/
Contrary to Textile, Markdown does not convert quotes to curly ones
and does not convert multiple hyphens (`--` and `---`) into en- and
@ -158,17 +157,17 @@ Markdown can be configured to produce HTML-style tags; e.g.:
<br>
To do this, you must edit the "$md_empty_element_suffix" variable
below the "Global default settings" header at the start of the
"markdown.php" file.
To do this, you must edit the "MARKDOWN_EMPTY_ELEMENT_SUFFIX"
definition below the "Global default settings" header at the start of
the "markdown.php" file.
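As a rough illustration of that setting, here is a minimal sketch. It assumes the suffix is declared with PHP's `define()`, as `MARKDOWN_VERSION` is elsewhere in this commit; the helper `render_break()` is hypothetical and only shows the effect of the value.

```php
<?php
# " />" yields XHTML-style empty elements; ">" yields HTML-style ones.
define('MARKDOWN_EMPTY_ELEMENT_SUFFIX', ">");

# Hypothetical helper, not part of PHP Markdown: builds an empty element
# the same way the parser appends its configured suffix.
function render_break() {
    return "<br" . MARKDOWN_EMPTY_ELEMENT_SUFFIX;
}

echo render_break(), "\n";
```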
### WordPress-Specific Settings ###
By default, the Markdown plugin applies to both posts and comments on
your WordPress weblog. To deactivate one or the other, edit the
`$md_wp_posts` or `$md_wp_comments` variable under the "WordPress
settings" header at the start of the "markdown.php" file.
`MARKDOWN_WP_POSTS` or `MARKDOWN_WP_COMMENTS` definitions under the
"WordPress settings" header at the start of the "markdown.php" file.
Bugs
@ -184,6 +183,99 @@ expected; (3) the output PHP Markdown actually produced.
Version History
---------------
1.0.1d (1 Dec 2006)
* Fixed a bug where inline images always had an empty title attribute. The
title attribute is now present only when explicitly defined.
* Link reference definitions can now have an empty title; previously, if the
title was defined but left empty, the link definition was ignored. This can
be useful if you want an empty title attribute in images to hide the
tooltip in Internet Explorer.
* Made `detab` aware of UTF-8 characters. UTF-8 multi-byte sequences are now
correctly mapped to one character instead of the number of bytes.
* Fixed a small bug with WordPress where WordPress' default filter `wpautop`
was not properly deactivated on comment text, resulting in hard line breaks
where Markdown does not prescribe them.
* Added a `TextileRestricted` method to the textile compatibility mode. There
is no restriction however, as Markdown does not have a restricted mode at
this point. This should make PHP Markdown work again in the latest
versions of TextPattern.
* Converted PHP Markdown to an object-oriented design.
* Changed span and block gamut methods so that they loop over a
customizable list of methods. This makes subclassing the parser a more
interesting option for creating syntax extensions.
* Also added a "document" gamut loop which can be used to hook document-level
methods (like for stripping link definitions).
* Changed all methods which were inserting HTML code so that they now return
a hashed representation of the code. New methods `hashSpan` and `hashBlock`
are used to hash respectively span- and block-level generated content. This
has a couple of significant effects:
1. It prevents invalid nesting of Markdown-generated elements which
could occur with constructs like `*something [link*][1]`.
2. It prevents problems occurring with deeply nested lists on which
paragraphs were ill-formed.
3. It removes the need to call `hashHTMLBlocks` twice during the
block gamut.
Hashes are turned back to HTML prior to output.
* Made the block-level HTML parser smarter using a specially-crafted regular
expression capable of handling nested tags.
* Solved backtick issues in tag attributes by rewriting the HTML tokenizer to
be aware of code spans. All these lines should work correctly now:
<span attr='`ticks`'>bar</span>
<span attr='``double ticks``'>bar</span>
`<test a="` content of attribute `">`
* Changed the parsing of HTML comments to match simply from `<!--` to `-->`
instead of using the more complicated SGML-style rule with paired `--`.
This is how most browsers parse comments and how XML defines them too.
* `<address>` has been added to the list of block-level elements and is now
treated as an HTML block instead of being wrapped within paragraph tags.
* Now only trim trailing newlines from code blocks, instead of trimming
all trailing whitespace characters.
* Fixed bug where this:
[text](http://m.com "title" )
wasn't working as expected, because the parser wasn't allowing for spaces
before the closing paren.
* Filthy hack to support markdown='1' in div tags.
* _DoAutoLinks() now supports the 'dict://' URL scheme.
* PHP- and ASP-style processor instructions are now protected as
raw HTML blocks.
<? ... ?>
<% ... %>
* Fix for escaped backticks still triggering code spans:
There are two raw backticks here: \` and here: \`, not a code span
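The hashing approach described in the 1.0.1d notes can be pictured roughly as follows. This is a simplified, hypothetical model (the class `HashSketch`, its `unhash` method, and the use of `md5` for keys are assumptions for illustration), not the parser's actual code.

```php
<?php
# Simplified model: generated HTML is swapped for an opaque key so that
# later passes cannot match Markdown syntax inside it.
class HashSketch {
    public $html_hashes = array();

    # Store generated HTML and return a key to put in the text instead.
    function hashSpan($html) {
        $key = md5($html);
        $this->html_hashes[$key] = $html;
        return $key;
    }

    # Swap every stored key back for its HTML just before output.
    function unhash($text) {
        return str_replace(array_keys($this->html_hashes),
                           array_values($this->html_hashes), $text);
    }
}

$p = new HashSketch();
# The "*" inside the link is hidden from any later emphasis pass.
$text = "*something " . $p->hashSpan('<a href="/x">link*</a>');
echo $p->unhash($text), "\n";
```

Because the link is already a key when emphasis is processed, the stray `*` inside it cannot pair with the one outside, which is the invalid nesting the notes mention.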
1.0.1c (9 Dec 2005)
* Fixed a problem occurring with PHP 5.1.1 due to a small
change to strings variable replacement behaviour in
this version.
1.0.1b (6 Jun 2005)
@ -248,46 +340,6 @@ Version History
filter so that it runs after Markdown.
1.0.2b1 - 5 Mar 2005
* Fix for backticks within HTML tag:
<span attr='`ticks`'>like this</span>
* Fix for escaped backticks still triggering code spans:
There are two raw backticks here: \` and here: \`, not a code span
* Improved integration with WordPress. With WordPress 1.5, the
balanceTags filter now runs after Markdown, so it won't
interfere anymore. You can still disable balanceTags from the admin
interface (in Options > Writing) if you want to.
* PHP Markdown now correctly filters text for excerpts in WordPress.
There is still one glitch: autolinks and tags in code samples are
stripped by WordPress when trimming it. A fix for this is possible
with WordPress 1.5, but would require duplicating WordPress entry
trimming code within Markdown, which I can't do because of a license
issue. (Nor do I think it is a good solution to fix this.)
* Improved Textile compatibility mode. Markdown will now honor the
no-image and the lite parameters. In lite mode, no header, blockquote,
list, or code block will be made, and inline HTML is limited
to the following tags:
<a><i><b><em><strong><sup><sub><code><img><cite><ins><del>
This is achieved by backslash-escaping block markers before sending
text through the Markdown filter.
The improved Textile compatibility means that the Markdown syntax will now
be processed for comments in TextPattern (only for span elements due to
TextPattern using the lite mode for comments). Sadly, due to TextPattern
tag stripping, sample code in code spans and auto-links will be stripped
before the Markdown filter can see them. So I guess I should say it
half-works for comments in TextPattern.
1.0.1 (16 Dec 2004):
* Changed the syntax rules for code blocks and spans. Previously,


@ -7,12 +7,12 @@
# <http://www.michelf.com/projects/php-markdown/>
#
# Original Markdown
# Copyright (c) 2004-2005 John Gruber
# Copyright (c) 2004-2006 John Gruber
# <http://daringfireball.net/projects/markdown/>
#
define( 'MARKDOWN_VERSION', "1.0.2b7" ); # Sat 16 Sep 2006
define( 'MARKDOWN_VERSION', "1.0.1d" ); # Fri 1 Dec 2006
#
@ -62,7 +62,7 @@ function Markdown($text) {
Plugin Name: Markdown
Plugin URI: http://www.michelf.com/projects/php-markdown/
Description: <a href="http://daringfireball.net/projects/markdown/syntax">Markdown syntax</a> allows you to write using an easy-to-read, easy-to-write plain text format. Based on the original Perl version by <a href="http://daringfireball.net/">John Gruber</a>. <a href="http://www.michelf.com/projects/php-markdown/">More...</a>
Version: 1.0.2b7
Version: 1.0.1d
Author: Michel Fortin
Author URI: http://www.michelf.com/
*/
@ -96,7 +96,7 @@ if (isset($wp_version)) {
# - Scramble important tags before passing them to the kses filter.
# - Run Markdown on excerpt then remove paragraph tags.
if (MARKDOWN_WP_COMMENTS) {
remove_filter('comment_text', 'wpautop');
remove_filter('comment_text', 'wpautop', 30);
remove_filter('comment_text', 'make_clickable');
add_filter('pre_comment_content', 'Markdown', 6);
add_filter('pre_comment_content', 'mdwp_hide_tags', 8);
@ -145,7 +145,7 @@ function identify_modifier_markdown() {
'nicename' => 'Markdown',
'description' => 'A text-to-HTML conversion tool for web writers',
'authors' => 'Michel Fortin and John Gruber',
'licence' => 'GPL',
'licence' => 'BSD-like',
'version' => MARKDOWN_VERSION,
'help' => '<a href="http://daringfireball.net/projects/markdown/syntax">Markdown syntax</a> allows you to write using an easy-to-read, easy-to-write plain text format. Based on the original Perl version by <a href="http://daringfireball.net/">John Gruber</a>. <a href="http://www.michelf.com/projects/php-markdown/">More...</a>'
);
@ -173,6 +173,10 @@ if (strcasecmp(substr(__FILE__, -16), "classTextile.php") == 0) {
if (function_exists('SmartyPants')) $text = SmartyPants($text);
return $text;
}
# Fake restricted version: restrictions are not supported for now.
function TextileRestricted($text, $lite='', $noimage='') {
return $this->TextileThis($text, $lite);
}
# Workaround to ensure compatibility with TextPattern 4.0.3.
function blockLite($text) { return $text; }
}
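The compatibility class above is only declared when the file itself is named like the Textile parser. A small sketch of that check, wrapped in a hypothetical helper (`is_textile_compat` is not in the source) so it can be tried on arbitrary paths instead of `__FILE__`:

```php
<?php
# Same comparison as in the hunk header above: the last 16 characters of
# the path are compared, case-insensitively, with "classTextile.php".
function is_textile_compat($path) {
    return strcasecmp(substr($path, -16), "classTextile.php") == 0;
}

var_dump(is_textile_compat('/site/textpattern/lib/classTextile.php'));
var_dump(is_textile_compat('/site/markdown.php'));
```

Since the comparison ignores case, a lower-case `classtextile.php` would pass the same test.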
@ -302,7 +306,7 @@ class Markdown_Parser {
(?:
(?<=\s) # lookbehind for whitespace
["(]
(.+?) # title = $3
(.*?) # title = $3
[")]
[ \t]*
)? # title is optional
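The switch from `.+?` to `.*?` in the title group appears to be what lets an empty title match. A reduced sketch with simplified stand-in patterns (not the parser's full expression; `$old`, `$new` and `$def` are names chosen here):

```php
<?php
# Only the title part of a link definition is modelled here.
$old = '{(?<=\s)["(](.+?)[")][ \t]*$}';   # title needs at least one char
$new = '{(?<=\s)["(](.*?)[")][ \t]*$}';   # title may be empty

$def = '[img]: /pic.png ""';

var_dump(preg_match($old, $def));        # no match for an empty title
var_dump(preg_match($new, $def, $m));    # matches, $m[1] is ''
```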
@ -692,14 +696,14 @@ class Markdown_Parser {
# These must come last in case you've also got [link test][1]
# or [link test](/foo)
#
$text = preg_replace_callback('{
( # wrap whole match in $1
\[
([^\[\]]+) # link text = $2; can\'t contain [ or ]
\]
)
}xs',
array(&$this, '_doAnchors_reference_callback'), $text);
// $text = preg_replace_callback('{
// ( # wrap whole match in $1
// \[
// ([^\[\]]+) # link text = $2; can\'t contain [ or ]
// \]
// )
// }xs',
// array(&$this, '_doAnchors_reference_callback'), $text);
return $text;
}
@ -841,15 +845,12 @@ class Markdown_Parser {
$whole_match = $matches[1];
$alt_text = $matches[2];
$url = $matches[3];
$title = '';
if (isset($matches[6])) {
$title = $matches[6];
}
$title =& $matches[6];
$alt_text = str_replace('"', '&quot;', $alt_text);
$title = str_replace('"', '&quot;', $title);
$result = "<img src=\"$url\" alt=\"$alt_text\"";
if (isset($title)) {
$title = str_replace('"', '&quot;', $title);
$result .= " title=\"$title\""; # $title already quoted
}
$result .= $this->empty_element_suffix;
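The effect of this hunk is that `title` is emitted only when a title was actually captured, including an explicitly empty one. A stand-alone sketch of that logic, using a hypothetical function `img_tag` in place of the parser's callback:

```php
<?php
# Hypothetical helper mirroring the callback: $title stays null when the
# pattern captured no title, so isset() decides whether to emit it.
function img_tag($url, $alt, $title = null, $suffix = " />") {
    $alt = str_replace('"', '&quot;', $alt);
    $result = "<img src=\"$url\" alt=\"$alt\"";
    if (isset($title)) {
        $title = str_replace('"', '&quot;', $title);
        $result .= " title=\"$title\"";
    }
    return $result . $suffix;
}

echo img_tag('/a.png', 'A'), "\n";       # no title attribute
echo img_tag('/a.png', 'A', ''), "\n";   # explicit empty title is kept
```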
@ -1148,22 +1149,23 @@ class Markdown_Parser {
# <strong> must go first:
$text = preg_replace_callback('{
( # $1: Marker
(?<!\*\*) \*\* | # (not preceded by two chars of
(?<!__) __ # the same marker)
)
(?<!\*\*) \* | # (not preceded by two chars of
(?<!__) _ # the same marker)
)
\1
(?=\S) # Not followed by whitespace
(?!\1) # or two others marker chars.
(?!\1\1) # or two others marker chars.
( # $2: Content
(?:
[^*_]+? # Anything not em markers.
|
# Balance any regular emphasis inside.
([*_]) (?=\S) .+? (?<=\S) \3 # $3: em char (* or _)
\1 (?=\S) .+? (?<=\S) \1
|
(?! \1 ) . # Allow unbalanced * and _.
)+?
)
(?<=\S) \1 # End mark not preceded by whitespace.
(?<=\S) \1\1 # End mark not preceded by whitespace.
}sx',
array(&$this, '_doItalicAndBold_strong_callback'), $text);
# Then <em>:
@ -1207,9 +1209,10 @@ class Markdown_Parser {
$bq = $this->runBlockGamut($bq); # recurse
$bq = preg_replace('/^/m', " ", $bq);
# These leading spaces screw with <pre> content, so we need to fix that:
# These leading spaces cause problems with <pre> content,
# so we need to fix that:
$bq = preg_replace_callback('{(\s*<pre>.+?</pre>)}sx',
array(&$this, '_DoBlockQuotes_callback2'), $bq);
array(&$this, '_DoBlockQuotes_callback2'), $bq);
return $this->hashBlock("<blockquote>\n$bq\n</blockquote>")."\n\n";
}
@ -1245,52 +1248,46 @@ class Markdown_Parser {
#
# Unhashify HTML blocks
#
// foreach ($grafs as $key => $value) {
// if (isset( $this->html_blocks[$value] )) {
// $grafs[$key] = $this->html_blocks[$value];
// }
// }
foreach ($grafs as $key => $graf) {
# Modify elements of @grafs in-place...
if (isset($this->html_blocks[$graf])) {
$block = $this->html_blocks[$graf];
$graf = $block;
if (preg_match('{
\A
( # $1 = <div> tag
<div \s+
[^>]*
\b
markdown\s*=\s* ([\'"]) # $2 = attr quote char
1
\2
[^>]*
>
)
( # $3 = contents
.*
)
(</div>) # $4 = closing tag
\z
}xs', $block, $matches))
{
list(, $div_open, , $div_content, $div_close) = $matches;
# We can't call Markdown(), because that resets the hash;
# that initialization code should be pulled into its own sub, though.
$div_content = $this->hashHTMLBlocks($div_content);
# Run document gamut methods on the content.
foreach ($this->document_gamut as $method => $priority) {
$div_content = $this->$method($div_content);
}
$div_open = preg_replace(
'{\smarkdown\s*=\s*([\'"]).+?\1}', '', $div_open);
$graf = $div_open . "\n" . $div_content . "\n" . $div_close;
}
// if (preg_match('{
// \A
// ( # $1 = <div> tag
// <div \s+
// [^>]*
// \b
// markdown\s*=\s* ([\'"]) # $2 = attr quote char
// 1
// \2
// [^>]*
// >
// )
// ( # $3 = contents
// .*
// )
// (</div>) # $4 = closing tag
// \z
// }xs', $block, $matches))
// {
// list(, $div_open, , $div_content, $div_close) = $matches;
//
// # We can't call Markdown(), because that resets the hash;
// # that initialization code should be pulled into its own sub, though.
// $div_content = $this->hashHTMLBlocks($div_content);
//
// # Run document gamut methods on the content.
// foreach ($this->document_gamut as $method => $priority) {
// $div_content = $this->$method($div_content);
// }
//
// $div_open = preg_replace(
// '{\smarkdown\s*=\s*([\'"]).+?\1}', '', $div_open);
//
// $graf = $div_open . "\n" . $div_content . "\n" . $div_close;
// }
$grafs[$key] = $graf;
}
}
@ -1403,21 +1400,23 @@ class Markdown_Parser {
function tokenizeHTML($str) {
#
# Parameter: String containing HTML markup.
# Parameter: String containing HTML + Markdown markup.
# Returns: An array of the tokens comprising the input
# string. Each token is either a tag (possibly with nested,
# tags contained therein, such as <a href="<MTFoo>">, or a
# run of text between tags. Each element of the array is a
# string. Each token is either a tag or a run of text
# between tags. Each element of the array is a
# two-element array; the first is either 'tag' or 'text';
# the second is the actual value.
# Note: Takes code spans into account and does not generate tag
# tokens inside code spans.
# Note: Markdown code spans are taken into account: no tag token is
# generated within a code span.
#
$tokens = array();
while ($str != "") {
#
#
# Each loop iteration searches for either the next tag or the next
# opening code span marker. If a code span marker is found, the
# code span is extracted in its entirety and will result in an extra
# text token.
#
$parts = preg_split('{
(
@ -1496,7 +1495,8 @@ class Markdown_Parser {
unset($blocks[0]); # Do not add first block twice.
foreach ($blocks as $block) {
# Calculate amount of space, insert spaces, insert block.
$amount = $this->tab_width - strlen($line) % $this->tab_width;
$amount = $this->tab_width -
mb_strlen($line, 'UTF-8') % $this->tab_width;
$line .= str_repeat(" ", $amount) . $block;
}
$text .= "$line\n";
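To see why counting characters instead of bytes matters for tab stops, here is a small stand-alone sketch of the padding computation. The function `pad_to_tab_stop` and its `$count_chars` switch are hypothetical, and the regex fallback for systems without the mbstring extension is an assumption of this sketch; the commit itself calls `mb_strlen` directly.

```php
<?php
# $count_chars = true mimics the new behaviour, false the old one.
function pad_to_tab_stop($line, $tab_width = 4, $count_chars = true) {
    if ($count_chars) {
        # Count UTF-8 characters rather than bytes.
        $len = function_exists('mb_strlen')
            ? mb_strlen($line, 'UTF-8')
            : preg_match_all('/./us', $line);
    } else {
        $len = strlen($line);   # counts bytes
    }
    return $tab_width - $len % $tab_width;
}

$line = "\xC3\xA9";   # "é": one character, two bytes in UTF-8
echo pad_to_tab_stop($line, 4, true), " vs ",
     pad_to_tab_stop($line, 4, false), "\n";
```

With byte counting, the accented character is treated as two columns wide, so the text after the tab lands one column short of the intended tab stop.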
@ -1558,73 +1558,7 @@ Version History
See the readme file for detailed release notes for this version.
1.0.2b7 (16 Sep 2006)
* Changed span and block gamut methods so that they loop over a
customizable list of methods. This makes subclassing the parser a more
interesting option for creating syntax extensions.
* Also added a "document" gamut loop which can be used to hook document-level
methods (like for striping link definitions).
* Changed all methods which were inserting HTML code so that they now return
a hashed representation of the code. New methods `hashSpan` and `hashBlock`
are used to hash respectivly span- and block-level generated content. This
has a couple of significant effects:
1. It prevents invalid nesting of Markdown-generated elements which
could occur occuring with constructs like `*something [link*][1]`.
2. It prevents problems occuring with deeply nested lists on which
paragraphs were ill-formed.
3. It removes the need to call `hashHTMLBlocks` twice during the the
block gamut.
Hashes are turned back to HTML prior output.
* Made the block-level HTML parser smarter using a specially-crafted regular
expression capable of handling nested tags.
* Solved backtick issues in tag attributes by rewriting the HTML tokenizer to
be aware of code spans. All these lines should work correctly now:
<span attr='`ticks`'>bar</span>
<span attr='``double ticks``'>bar</span>
`<test a="` content of attribute `">`
* `<address>` has been added to the list of block-level elements and is now
treated as an HTML block instead of being wrapped within paragraph tags.
* Now only trim trailing newlines from code blocks, instead of trimming
all trailing whitespace characters.
* Fixed bug where this:
[text](http://m.com "title" )
wasn't working as expected, because the parser wasn't allowing for spaces
before the closing paren.
* Filthy hack to support markdown='1' in div tags.
* _DoAutoLinks() now supports the 'dict://' URL scheme.
* PHP- and ASP-style processor instructions are now protected as
raw HTML blocks.
<? ... ?>
<% ... %>
* Experimental support for [this] as a synonym for [this][].
* Fix for escaped backticks still triggering code spans:
There are two raw backticks here: \` and here: \`, not a code span
1.0.1oo (19 May 2006)
* Converted PHP Markdown to a object-oriented design.
1.0.1d (1 Dec 2006)
1.0.1c (9 Dec 2005)
@ -1654,7 +1588,7 @@ Copyright (c) 2004-2006 Michel Fortin
<http://www.michelf.com/>
All rights reserved.
Copyright (c) 2003-2004 John Gruber
Copyright (c) 2003-2006 John Gruber
<http://daringfireball.net/>
All rights reserved.