
View Full Version : Issue with new line to <p> function
Aeron 01-06-2010, 02:12 PM Hi all,
I've been working on deploying a new-line-to-paragraph function, as I want better formatting of my paragraphs than a simple nl2br() function would provide.
The code I'm using is as follows:
function nl2p($string, $line_breaks = true, $xml = true)
{
    // Remove existing HTML formatting to avoid double-wrapping things
    $string = str_replace(array('<p>', '</p>', '<br>', '<br />'), '', $string);
    // It is conceivable that people might still want single line-breaks
    // without breaking into a new paragraph.
    if ($line_breaks == true)
        return '<p>'.preg_replace(array("/([\n]{2,})/i", "/([^>])\n([^<])/i"), array("</p>\n<p>", '<br'.($xml == true ? ' /' : '').'>'), trim($string)).'</p>';
    else
        return '<p>'.preg_replace("/([\n]{1,})/i", "</p>\n<p>", trim($string)).'</p>';
}
While that works, it also seems to add an extra set of <p></p> tags in between the paragraphs it creates. So it ends up looking something like this:
<p>
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam faucibus enim in mauris fermentum feugiat id ut purus. Proin ut erat velit, quis pulvinar justo.
</p>
<p>
</p>
<p>
Pellentesque hendrerit dapibus malesuada. Fusce scelerisque porta nisi, cursus ultricies neque laoreet a. In eleifend consectetur dui nec mattis.
</p>
As far as the source text goes, it only has two \n breaks in it between paragraphs. I guess my question is whether there is a way to edit the function so that it only splits paragraphs on a double line break (\n\n), rather than on a single \n.
Thanks in advance for your help.
mattle 01-06-2010, 02:30 PM This may not be matching the way you think it is...can you copy in an example haystack, and the results of this:
preg_match("/([\n]{2,})/", $haystack, $matches); // i modifier is irrelevant here
print_r($matches);
As an aside, depending on your application, you may want to convert \r\n to \n and then \r to \n to accommodate Windows and Mac users. If that was the problem here, however, I would expect that you would not be getting any matches.
I checked your regex here: http://www.regextester.com/, against a very simple string and it seems to work as expected...
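A minimal sketch of the line-ending conversion suggested above. The helper name normalize_newlines() and the sample text are assumptions made for illustration; they are not part of the original nl2p().

```php
<?php
// Hypothetical helper (not part of the original nl2p): turn Windows "\r\n"
// and bare "\r" line breaks into "\n".
function normalize_newlines($string)
{
    // "\r\n" is replaced first so that it does not become two "\n" characters.
    return str_replace(array("\r\n", "\r"), "\n", $string);
}

// Made-up sample text with Windows-style line breaks.
$windows    = "Paragraph one.\r\n\r\nParagraph two.";
$normalized = normalize_newlines($windows);

var_dump(preg_match("/\n{2,}/", $windows));    // int(0): the "\n"s are not adjacent
var_dump(preg_match("/\n{2,}/", $normalized)); // int(1): now a double newline
```

Running the conversion before any regex means the patterns only ever have to deal with "\n".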
Aeron 01-06-2010, 06:02 PM Hi Mattle,
I'm not familiar with regex at all, and I wasn't able to see what you were trying to do with your response, as your code varied so greatly from the original example I posted.
You were using the preg_match() function, while the script I'm using uses the preg_replace() function. Is there a big difference between the two?
Would it be possible to show me how your changes would work within the original function I posted?
The website this is being used on is at goldenshine.com/blog/. If you view the source, you can see that extraneous paragraph problem. Once again, the original text only has as many line breaks between paragraphs as each of the paragraphs in this comment, so it should be able to parse it without creating that extra <p></p> tag. So far the only way I've been able to get it to work is to resort to using single line breaks in the raw text I enter into the database, but that's really not ideal.
mattle 01-06-2010, 06:19 PM You should definitely check out the docs here.
Basically, preg_replace will find a pattern and make a replacement; preg_match just tests the string for matching occurrences of the pattern and dumps them into the $matches array.
I'm not giving you production code here, but rather some debugging code that might help you discover why the regular expression isn't working as desired. The first step to debugging a replace is to find out what it's matching. If it's performing the replacement twice, it stands to reason it's matching something twice...I'm just trying to get a little more perspective on what's going on. Here's a basic example you can use to get an idea of what I'm talking about:
# php -r 'preg_match_all("/a/", "abcda", $matches); print_r($matches);'
Array
(
    [0] => Array
        (
            [0] => a
            [1] => a
        )
)
This is telling me two things...I have one matching condition (the whole regular expression /a/) and that it matched twice in the string "abcda".
By contrast,
# php -r 'preg_match_all("/^a(b)/", "abcda", $matches); print_r($matches);'
Array
(
    [0] => Array
        (
            [0] => ab
        )
    [1] => Array
        (
            [0] => b
        )
)
This tells me that I have two matching expressions (the whole pattern /^a(b)/ and the group (b)), and that I matched the first with "ab" and the second with "b".
Also, I realized that I erred in my previous post, you should use preg_match_all(). preg_match() will stop after the first successful match, thus negating our ability to debug a regex that is matching more times than we would like!
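To put the three functions side by side, here is a small sketch that reuses the "abcda" string from the examples above. The variable names are arbitrary.

```php
<?php
$haystack = "abcda";

// preg_match() stops at the first match and returns 1 (found) or 0 (not found).
$found = preg_match("/a/", $haystack, $first);
// $found is 1, $first is array("a")

// preg_match_all() keeps going and returns the number of matches.
$count = preg_match_all("/a/", $haystack, $all);
// $count is 2, $all[0] is array("a", "a")

// preg_replace() rewrites every match and returns the new string.
$replaced = preg_replace("/a/", "X", $haystack);
// $replaced is "XbcdX"

print_r($first);
print_r($all);
echo $replaced, "\n";
```

Because preg_replace() acts on every match, preg_match_all() is the better stand-in when checking what a replacement will touch.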
Aeron 01-06-2010, 06:20 PM So you need me to run that code on my server and tell you the outcome?
Aeron 01-07-2010, 11:51 AM Sorry, I just don't know enough about this to be able to run with what you posted.
If it helps at all, the function parses the text fine if the paragraphs are separated by only one line break. Like so:
This is paragraph 1, consectetur adipiscing elit. Duis in leo metus. Maecenas imperdiet purus sed enim hendrerit sit amet auctor libero dignissim. Donec mollis pellentesque auctor. Vestibulum laoreet ante viverra ipsum tempus gravida interdum erat dictum. Duis gravida mollis tortor.
This is paragraph 2, vitae vestibulum lectus facilisis ac. Nam porttitor sagittis ante, eget tempus nulla faucibus a. Suspendisse non elit elit.
My problem is that if you use the standard double line break to separate paragraphs, it inserts an extra set of <p></p> tags in there. So I'm guessing the logic needs to execute the paragraph wrap on double line breaks and not only single.
If someone could look at the original function and give me a working solution I'd really appreciate it.
foobic 01-07-2010, 06:40 PM The original code does include provision for converting multiple line breaks to a single paragraph separator, so someone (preferably you ;)) is going to need to debug exactly why it doesn't work with the information you're feeding it. I'd suspect some kind of whitespace character between the line breaks, including Windows \r\n line breaks as mattle mentioned earlier.
Put the code from post #2 into your function temporarily and see what it spits out.
(Edit: $haystack is $string in your function, of course)
Aeron 01-08-2010, 07:05 PM Sorry, I just don't think I understand this language well enough to know what to do with that code.
I posted it as-is, and the whole top line is commented out because of the hash sign.
Can you put the debug code into the code I originally posted, and then I'll run it and give you guys the output?
Thanks for your help, and sorry for the lack of comprehension, I know nothing about regex.
foobic 01-08-2010, 09:40 PM
function nl2p($string, $line_breaks = true, $xml = true)
{
    // Remove existing HTML formatting to avoid double-wrapping things
    $string = str_replace(array('<p>', '</p>', '<br>', '<br />'), '', $string);
    preg_match_all("/([\n]{2,})/", $string, $matches); // i modifier is irrelevant here
    print_r($matches);
    // It is conceivable that people might still want single line-breaks
    // without breaking into a new paragraph.
    if ($line_breaks == true)
        return '<p>'.preg_replace(array("/([\n]{2,})/i", "/([^>])\n([^<])/i"), array("</p>\n<p>", '<br'.($xml == true ? ' /' : '').'>'), trim($string)).'</p>';
    else
        return '<p>'.preg_replace("/([\n]{1,})/i", "</p>\n<p>", trim($string)).'</p>';
}
The print_r output may appear somewhere unexpected, probably at or near the top of the page, but you should see it in the page source.
Try ([\n]{1,}) also, and ([\r\n]{3,}).
Aeron 01-08-2010, 10:26 PM Ok, I've run the script with the different regex combinations, and I've included the results below.
----
([\r\n]{3,}) gives me the following output:
Array( [0] => Array ( [0] => ) [1] => Array ( [0] => ) )
----
([\n]{1,}) gives me the following output:
Array ( [0] => Array ( [0] => [1] => ) [1] => Array ( [0] => [1] => ) )
----
([\n]{2,}) gives me the following output:
Array ( [0] => Array ( ) [1] => Array ( ) )
----
Does that help tell you anything?
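The matches in this kind of output look empty because they consist only of line-break characters, which print_r() prints literally. A minimal sketch of one way to make them visible, assuming json_encode() is available (PHP 5.2 or later). The sample $string is made up and stands in for the real blog text.

```php
<?php
// Made-up input standing in for text typed into a form on Windows.
$string = "First paragraph.\r\n\r\nSecond paragraph.";

// Collect every run of carriage returns and newlines.
preg_match_all("/[\r\n]+/", $string, $matches);

// json_encode() escapes control characters, so each match becomes readable.
foreach ($matches[0] as $match) {
    echo json_encode($match), "\n"; // prints "\r\n\r\n" (with the quotes)
}
```

Seeing \r in the escaped output would confirm that the breaks are Windows-style rather than plain \n.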
foobic 01-09-2010, 09:08 PM So:
([\n]{2,}) doesn't match at all (no match to \n\n)
([\n]{1,}) matches twice (two matches to \n, but not together)
([\r\n]{3,}) matches once (one match of a mixture of \r and \n, all together)
It looks like \r is your problem. You could probably just replace ([\n]{2,}) with ([\r\n]{3,}) in the original routine, but that would fail for non-Windows systems, so I guess what you really need is both, ie. something like:
return '<p>'.preg_replace(
    array("/([\n]{2,})/i", "/([\r\n]{3,})/i", "/([^>])\n([^<])/i"),
    array("</p>\n<p>", "</p>\n<p>", '<br'.($xml == true ? ' /' : '').'>'),
    trim($string)).'</p>';
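A quick sketch of why the {3,} count works: a Windows paragraph break (\r\n\r\n) is four characters from the [\r\n] set, while a single Windows line break (\r\n) is only two. The helper has_paragraph_break() is hypothetical and only wraps the two paragraph patterns from the snippet above.

```php
<?php
// The two paragraph patterns from the snippet above (i modifier dropped,
// since it has no effect on line-break characters).
$patterns = array("/([\n]{2,})/", "/([\r\n]{3,})/");

// Hypothetical helper: does either pattern see a paragraph break in $text?
function has_paragraph_break($text, $patterns)
{
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $text) === 1) {
            return true;
        }
    }
    return false;
}

var_dump(has_paragraph_break("one\n\ntwo", $patterns));     // bool(true)
var_dump(has_paragraph_break("one\r\n\r\ntwo", $patterns)); // bool(true)
var_dump(has_paragraph_break("one\r\ntwo", $patterns));     // bool(false)
```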
tim2718281 01-10-2010, 01:17 AM I'm not sure I understand the specification.
If the OP were to state the requirement, it would not take long to produce code to do it ... far easier than wrestling with someone else's code that doesn't work.
I am guessing the following:
1) Any HTML sequences for paragraph, paragraph end, break, and break end in the input are to be removed
2) If the parameter line_breaks is true, then single newline sequences in the input are to be replaced by <br />
3) The output string is to be divided into HTML paragraphs:
<p>part1</p><p>part2</p> ... etc
where the divisions between parts are represented by one or more newline sequences in the input. The newline sequences are to be discarded.
So now we need to know all the possible newline sequences.
Also, is the program required to correctly parse paragraph tags, etc., or specifically the sequence <p>?
That is, what should the program do with <p > and so on?
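If variants such as <p > did need to be handled, one possible approach is a case-insensitive regex in place of the fixed str_replace() list. This is only a sketch: the helper name strip_paragraph_tags() is made up, and whether the function should accept such variants at all is exactly the open question above.

```php
<?php
// Hypothetical helper: remove <p>, </p> and <br> tags, tolerating extra
// whitespace, attributes, and upper-case letters.
function strip_paragraph_tags($string)
{
    // </?p(\s[^>]*)?>  matches <p>, </p>, <p > and <p class="x">, but not <pre>
    // <br\s*/?>        matches <br>, <br/> and <br />
    return preg_replace('~</?p(\s[^>]*)?>|<br\s*/?>~i', '', $string);
}

echo strip_paragraph_tags('<P class="x">Hello<br/>world</p >'), "\n"; // Helloworld
```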
Aeron 01-10-2010, 03:28 PM @foobic
That last snippet of code worked perfectly. At least as far as I can tell.
Thanks for your help, it's much appreciated.
mattle 01-11-2010, 12:54 PM Aeron wrote: "@foobic That last snippet of code worked perfectly. At least as far as I can tell. Thanks for your help, it's much appreciated."
You may want to test foobic's code on a Mac. IIRC, Safari (and all Aqua apps?) are still using just '\r' for line breaks (even though '\n' is used in the BSD-subsystem files). If that's the case, text entered on a Mac may not match the regex given.
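One way to sidestep the question of which line break a given system sends is to normalize everything to \n before matching. The following is a sketch of such a variant, not the code accepted in this thread, and the name nl2p_normalized() is made up. It also keeps the characters on either side of a single line break, which the posted "/([^>])\n([^<])/i" replacement drops because its replacement string does not put the two captured characters back.

```php
<?php
// Sketch of a variant of nl2p(): normalize line endings first, then split.
function nl2p_normalized($string, $line_breaks = true, $xml = true)
{
    // Remove existing HTML formatting to avoid double-wrapping things.
    $string = str_replace(array('<p>', '</p>', '<br>', '<br />'), '', $string);

    // Treat "\r\n" and bare "\r" as "\n".
    $string = str_replace(array("\r\n", "\r"), "\n", trim($string));

    // With $line_breaks, only blank lines split paragraphs;
    // without it, every run of newlines does (as in the original).
    $separator  = $line_breaks ? "/\n{2,}/" : "/\n+/";
    $paragraphs = preg_split($separator, $string, -1, PREG_SPLIT_NO_EMPTY);

    // Any newline still inside a paragraph becomes a <br> tag.
    $br = '<br' . ($xml ? ' /' : '') . '>';
    $paragraphs = str_replace("\n", $br, $paragraphs);

    return '<p>' . implode("</p>\n<p>", $paragraphs) . '</p>';
}

// Mixed line endings in one made-up input.
echo nl2p_normalized("One\r\rTwo\r\nstill two\n\nThree"), "\n";
```

For the sample input this produces three paragraphs, with a <br /> between "Two" and "still two".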