removing line breaks from inside sentences

Associate | Joined: 18 Oct 2002 | Posts: 1,752 | Location: Southern England
Hi all...

OK, so I'm pulling some data out into a string to add to a Word document. The data is in text format, but instead of coming out as expected, the strings contain sentences with line breaks in the middle of them, not just at the end of the line.

For example,

Expecting this:

"the cat sat on the mat. The cat likes milk."

but I get this:

"the cat
sat on
the mat. The cat likes milk."

I'd like to remove the line breaks inside the sentences but keep the ones at the end of the line, where the full stops are.

I've been playing with regular expressions but just can't get it working correctly!

Any pointers?

VB.Net by the way.
 
Well, a line break character in regex is \n, so what you're saying is that you want to remove all the line breaks except the ones after full stops (and/or double quotes)?

Not sure how to do it in VB.NET, but in PHP you could do something like this (I expect VB.NET has a similar function):

Code:
$formattedDocument = preg_replace('/([^\.\"])\n/', '$1 ', $inputFile);

http://php.net/manual/en/function.preg-replace.php

That might be wrong, as I've not tested it, but if it works it should replace every line break with a space except the ones preceded by a full stop or a double quote.

The square brackets denote a character class containing the . and the double quote (escaped with backslashes to avoid any special meaning), and the ^ negates that class, so it matches any other character. The parentheses capture that character so $1 can put it back in the replacement; without them, the last letter before each break would be deleted along with the break. The next bit is the line break character (\n), which is replaced with a space so the words on either side don't run together.
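Since the question is about VB.NET, here is a rough translation of that PHP call. It's a sketch assuming .NET's System.Text.RegularExpressions.Regex.Replace and text that uses plain \n line endings; the names LineBreakDemo and JoinBrokenLines are made up for illustration.

```vbnet
Imports System
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module LineBreakDemo
    ' Hypothetical helper: mirrors the PHP pattern ([^."])\n -> "$1 ".
    ' Assumes Unix-style \n line endings only.
    Function JoinBrokenLines(input As String) As String
        ' ([^."]) captures any character that is not a full stop or double quote
        ' (inside a VB string literal, "" stands for a single " character).
        ' \n is the line break; "$1 " puts the captured character back, then a space.
        Return Regex.Replace(input, "([^.""])\n", "$1 ")
    End Function

    Sub Main()
        Dim text As String = "the cat" & vbLf & "sat on" & vbLf & "the mat." & vbLf & "The cat likes milk."
        Console.WriteLine(JoinBrokenLines(text))
    End Sub
End Module
```

This prints "the cat sat on the mat." and "The cat likes milk." on separate lines. It only handles \n: with Windows \r\n endings the \r is the character captured before each break, so it survives, and a full stop followed by \r\n ends up as ".\r " instead of a proper line break.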
 
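Swap that \n for (\r|\n)+ so it matches carriage returns (\r) as well as line feeds (\n), since Windows text usually ends lines with \r\n. You also don't want to consume the preceding character, so use a negative lookbehind, (?<![."\r\n]), instead of a negated character class. It has to be a lookbehind (?<!...) rather than a lookahead (?!...), because the character to check comes before the break, not after it. The \r and \n inside it stop the pattern from matching the \n half of a \r\n that follows a full stop. Replace the breaks with a space rather than String.Empty, or "cat" and "sat" will run together. Note that inside a VB string literal a double quote is written as "":

Code:
Imports System.Text.RegularExpressions ' at the top of the file
Dim result As String = Regex.Replace(input, "(?<![.""\r\n])(\r|\n)+", " ")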
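A complete, runnable sketch of the negative-lookbehind approach as a console app, fed Windows-style \r\n line endings. The module and function names (SentenceJoiner, JoinSentenceLines) are hypothetical.

```vbnet
Imports System
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module SentenceJoiner
    ' Hypothetical helper: joins lines that break mid-sentence.
    Function JoinSentenceLines(input As String) As String
        ' (?<![."\r\n]) : negative lookbehind; only match where the previous character
        '                 is not a full stop, double quote, CR or LF. Excluding CR/LF
        '                 stops the \n of a ".\r\n" pair from being matched on its own.
        ' (\r|\n)+      : one or more CR/LF characters, i.e. a \n or \r\n break (or several).
        Return Regex.Replace(input, "(?<![.""\r\n])(\r|\n)+", " ")
    End Function

    Sub Main()
        ' vbCrLf is the Windows \r\n line ending.
        Dim text As String = "the cat" & vbCrLf & "sat on" & vbCrLf & "the mat." & vbCrLf & "The cat likes milk."
        Console.WriteLine(JoinSentenceLines(text))
    End Sub
End Module
```

This prints the two sentences on separate lines: the breaks after "cat" and "on" become spaces, while the \r\n after "mat." is left alone. The same pattern also works on \n-only text.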
 