PHP HTML Entities

I have created my own function for converting BBCode into XHTML, as can be seen below. I have successfully managed to convert all BBCode tags and also add paragraph <p> tags where necessary.

The next step is to convert quotes, angle brackets <> and other special characters to their corresponding HTML entities. However, I am unsure how to do this. I know about the htmlspecialchars() and htmlentities() functions, but I don't think they would work well on text that already contains HTML.

For instance, it would end up converting quotes inside or <a> tags would convert angle br...e { return false; } } [/CODE]
 
This is the function I compiled. It might give you some ideas:

Code:
[noparse]
<?php

function Format($str)
{
  // Escape user input first, so the tags generated below are left intact.
  $str = htmlentities($str, ENT_QUOTES, 'UTF-8');
  $str = bbcode_format($str);
  $str = bbcode_quote($str);
  $str = bbcode_img($str);
  $str = find_url($str);
  $str = nl2p($str);
  return $str;
}

function bbcode_format($str)
{
  $simple_search = array('/\[b\](.*?)\[\/b\]/is','/\[i\](.*?)\[\/i\]/is','/\[u\](.*?)\[\/u\]/is','/\[url\=(.*?)\](.*?)\[\/url\]/is');
  $simple_replace = array('<b>$1</b>','<i>$1</i>','<u>$1</u>','<a href="$1" target="_blank">$2</a>');
  $str = preg_replace ($simple_search, $simple_replace, $str);
  return $str;
}

function bbcode_quote($str)
{
  $open = '</p><p class="quote">';
  $close = '</p>';
  preg_match_all ('/\[quote\]/i', $str, $matches);
  $opentags = count($matches['0']);
  preg_match_all ('/\[\/quote\]/i', $str, $matches);
  $closetags = count($matches['0']);
  $unclosed = $opentags - $closetags;
  for($i = 0; $i < $unclosed; $i++)
  {
    $str .= '</p>';
  }
  // Case-insensitive replace, to match the case-insensitive counts above.
  $str = str_ireplace('[quote]', $open, $str);
  $str = str_ireplace('[/quote]', $close, $str);
  return $str;
}

function bbcode_img($str)
{
  $open = '<img src="';
  $close = '" alt="" />';
  preg_match_all ('/\[img\]/i', $str, $matches);
  $opentags = count($matches['0']);
  preg_match_all ('/\[\/img\]/i', $str, $matches);
  $closetags = count($matches['0']);
  $unclosed = $opentags - $closetags;
  for($i = 0; $i < $unclosed; $i++)
  {
    $str .= '" alt="" />';
  }
  // Case-insensitive replace, to match the case-insensitive counts above.
  $str = str_ireplace('[img]', $open, $str);
  $str = str_ireplace('[/img]', $close, $str);
  return $str;
}

function nl2p($str)
{
	$str = "<p>" . str_replace("\r\n","<br />", $str);
	$str = str_replace("<br /><br />","</p><p>",$str) . "</p>";
	$str = str_replace("<p><br /></p>","",$str);
	$str = str_replace("<p></p>","",$str);
	$str = str_replace("<p><p","<p",$str);
	$str = str_replace("</p></p>","</p>",$str);
	$str = str_replace("<p class=\"quote\">","<p class=\"quote\">\r\n\t",$str);
	$str = str_replace("<p","\t\t\t\t\t" . "<p",$str);
	$str = str_replace("</p>","</p>\r\n",$str);
	
	return $str;
}

function find_url($string)
{
//"www."
   $pattern_preg1 = '#(^|\s)(www|WWW)\.([^\s<>\.]+)\.([^\s\n<>]+)#sm';
   $replace_preg1 = '\\1<a href="http://\\2.\\3.\\4" target="_blank">\\2.\\3.\\4</a>';

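//"http://" (skips URLs directly preceded by ">", so link text already produced by bbcode_format is not linked twice)
   $pattern_preg2 = '#(^|[^\"=\]>]{1})(http|HTTP|ftp|Http)(s|S)?://([^\s<>\.]+)\.([^\s<>]+)#sm';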
   $replace_preg2 = '\\1<a href="\\2\\3://\\4.\\5" target="_blank">\\2\\3://\\4.\\5</a>';
  
   $string = preg_replace($pattern_preg1, $replace_preg1, $string);
   $string = preg_replace($pattern_preg2, $replace_preg2, $string);

   return $string;
}

?>
[/noparse]
 
Looks good, and a lot more comprehensive. However, could you tell me how it deals with quotes in the BBCode, such as [noparse][url="blahblah"][/url][/noparse] and so on?

If I ran htmlspecialchars() on the above, I would get:

Code:
[noparse][url=&quot;blahblah&quot;][/url][/noparse]
Similarly, if I ran htmlspecialchars() on the finished XHTML, it would escape all the tags.
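
One way around the quoted-URL case is to let the pattern accept the escaped quotes as optional. This is only a rough sketch: bbcode_url_escaped() is a made-up name, not part of the code posted above, and it does not check the URL scheme, which a real board would want to do.

```php
<?php
// Hypothetical helper (not from the thread): escape, then convert [url=...] tags,
// tolerating optional quotes around the URL.
function bbcode_url_escaped(string $str): string
{
    // Escape first so user-supplied <, >, & and quotes can never become markup.
    $str = htmlspecialchars($str, ENT_QUOTES, 'UTF-8');

    // After escaping, a double quote is &quot; and a single quote is &#039;,
    // so the pattern optionally matches either one on each side of the URL.
    $pattern = '/\[url=(?:&quot;|&#0?39;)?(.*?)(?:&quot;|&#0?39;)?\](.*?)\[\/url\]/is';

    return preg_replace($pattern, '<a href="$1">$2</a>', $str);
}

echo bbcode_url_escaped('[url="http://example.com"]link[/url]'), "\n";
echo bbcode_url_escaped('[url=http://example.com]link[/url]'), "\n";
```

Both calls should give the same anchor tag, so it no longer matters whether the poster typed the quotes or not.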
 

I think you mean to do a new thread?


Well, I never knew you could put quotes in the BB tag and have never done it myself, so I don't have that to deal with, I suppose :p
 
I've just been looking, and I notice you don't need to put quotes in! :p

So all I have to do is whack in an htmlspecialchars() call at the beginning of my function and I'm sorted. Amazing what you overlook.
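
For anyone finding this later, here is a small sketch of why the order matters. It uses a single [b] pattern purely for illustration; the variable names are my own.

```php
<?php
$input   = '[b]1 < 2[/b]';
$pattern = '/\[b\](.*?)\[\/b\]/is';

// Escape first, then convert: only the user's text is escaped.
$good = preg_replace($pattern, '<b>$1</b>', htmlspecialchars($input, ENT_QUOTES, 'UTF-8'));

// Convert first, then escape: the generated tags are escaped as well.
$bad = htmlspecialchars(preg_replace($pattern, '<b>$1</b>', $input), ENT_QUOTES, 'UTF-8');

echo $good, "\n"; // <b>1 &lt; 2</b>
echo $bad, "\n";  // &lt;b&gt;1 &lt; 2&lt;/b&gt;
```

Escaping at the very start works because BBCode uses square brackets, which htmlspecialchars() leaves alone.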
 
The only reliable way to do this type of parsing is to write a stack-based (or, if you fancy a challenge, a stackless) tokeniser.
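
To give a rough idea of what that means, here is a minimal sketch of the stack-based approach. It assumes only [b], [i] and [u] are supported, and bbcode_parse() is a hypothetical name; treat it as a starting point rather than a finished parser.

```php
<?php
// Hypothetical sketch of a stack-based BBCode parser; names are illustrative only.
function bbcode_parse(string $input): string
{
    // Assumed tag set: BBCode tag name => XHTML element it produces.
    $map = array('b' => 'strong', 'i' => 'em', 'u' => 'u');

    // Tokenise: split into tag tokens and text tokens, keeping the tags.
    $tokens = preg_split('/(\[\/?[a-z]+\])/i', $input, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $stack = array(); // currently open tags, innermost last
    $out   = '';

    foreach ($tokens as $token) {
        $isTag = preg_match('/^\[(\/?)([a-z]+)\]$/i', $token, $m)
            && isset($map[strtolower($m[2])]);

        if (!$isTag) {
            // Plain text or an unknown tag: escape it and move on.
            $out .= htmlspecialchars($token, ENT_QUOTES, 'UTF-8');
            continue;
        }

        $tag = strtolower($m[2]);
        if ($m[1] === '') {
            // Opening tag: remember it and emit the element.
            $stack[] = $tag;
            $out .= '<' . $map[$tag] . '>';
        } elseif (in_array($tag, $stack, true)) {
            // Closing tag: close everything opened since the matching open tag.
            do {
                $open = array_pop($stack);
                $out .= '</' . $map[$open] . '>';
            } while ($open !== $tag);
        } else {
            // Closing tag with no matching open tag: output it as text.
            $out .= htmlspecialchars($token, ENT_QUOTES, 'UTF-8');
        }
    }

    // Close anything the user left open.
    while ($stack) {
        $out .= '</' . $map[array_pop($stack)] . '>';
    }

    return $out;
}

echo bbcode_parse('[b]bold [i]both[/b] <x>'), "\n";
```

Because the stack always knows which tags are open, badly nested or unclosed BBCode still comes out as well-formed markup, which is the part the regex approach cannot guarantee.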
 