Regex to remove specific tags

  • Thread starter: Kua

Kua

Associate | Joined: 21 Jul 2008 | Posts: 512 | Location: Lancaster
Basically, my housemate is a linguistics PhD student doing some kind of NLP research. She has a huge XML document that marks up a UN committee meeting (or something similar), and she wants to remove the following elements, including their content:

<tuv xml:lang="AR">....</tuv>
<tuv xml:lang="FR">....</tuv>
<tuv xml:lang="ES">....</tuv>

Can you think of a regex to do that?

I'll be using Java's String.replaceAll(), if that makes any difference.
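If the file really is as regular as the three lines above suggest, String.replaceAll() can do it. This is a minimal sketch that assumes every unwanted element is written exactly as `<tuv xml:lang="XX">` (double quotes, `xml:lang` as its only attribute) and that `<tuv>` elements never nest. The class name `TuvRegexStripper` is hypothetical.

```java
public class TuvRegexStripper {
    // Assumes each unwanted element looks exactly like <tuv xml:lang="AR">...</tuv>
    // (double quotes, xml:lang as the only attribute) and that <tuv> never nests.
    // (?s) lets '.' match newlines, the lazy .*? stops at the first </tuv>,
    // and the trailing \s* also eats the line break after each removed element.
    private static final String UNWANTED_TUV =
            "(?s)<tuv\\s+xml:lang=\"(?:AR|FR|ES)\"\\s*>.*?</tuv>\\s*";

    public static String strip(String xml) {
        return xml.replaceAll(UNWANTED_TUV, "");
    }

    public static void main(String[] args) {
        String sample = "<tu>\n"
                + "<tuv xml:lang=\"EN\"><seg>Hello</seg></tuv>\n"
                + "<tuv xml:lang=\"FR\"><seg>Bonjour</seg></tuv>\n"
                + "</tu>";
        // Only the EN <tuv> survives.
        System.out.println(strip(sample));
    }
}
```

The lazy `.*?` is what keeps each match inside a single element, which is also why the pattern would break if `<tuv>` elements were ever nested, or if a comment or CDATA section happened to contain `</tuv>`.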
 
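You shouldn't use regex to manipulate XML documents: a pattern can't reliably cope with variations in attribute order, quoting or whitespace, or with comments and CDATA sections. Use JAXP or something similar instead: select the nodes with XPath, remove them, then write the document back out.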
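A minimal sketch of that approach using only the JDK's built-in JAXP classes (a DOM parser, `javax.xml.xpath`, and a `Transformer` to write the result). The class name and the two command-line file arguments are hypothetical. It selects every `tuv` element with XPath and checks `xml:lang` in Java, which avoids having to deal with namespace handling for the `xml:` prefix inside the XPath expression.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class TuvDomStripper {
    // Language codes whose <tuv> elements should be dropped.
    private static final Set<String> UNWANTED = Set.of("AR", "FR", "ES");

    // Removes every <tuv> whose xml:lang is in UNWANTED; returns how many were removed.
    public static int removeTuvs(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList tuvs = (NodeList) xpath.evaluate("//tuv", doc, XPathConstants.NODESET);

        // Collect first, then detach, so removal cannot disturb the iteration.
        List<Element> toRemove = new ArrayList<>();
        for (int i = 0; i < tuvs.getLength(); i++) {
            Element tuv = (Element) tuvs.item(i);
            if (UNWANTED.contains(tuv.getAttribute("xml:lang"))) {
                toRemove.add(tuv);
            }
        }
        for (Element tuv : toRemove) {
            tuv.getParentNode().removeChild(tuv);
        }
        return toRemove.size();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java TuvDomStripper <input.xml> <output.xml>");
            System.exit(1);
        }
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new File(args[0]));
        int removed = removeTuvs(doc);

        // Serialize the modified DOM back to disk.
        Transformer out = TransformerFactory.newInstance().newTransformer();
        out.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        out.transform(new DOMSource(doc), new StreamResult(new File(args[1])));
        System.err.println("Removed " + removed + " <tuv> elements");
    }
}
```

DOM loads the whole document into memory, so if the file is too big for that, a streaming API such as StAX would be the next thing to look at. Removing elements also leaves the whitespace around them behind, so the output may contain some blank lines.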
 