Regex to remove specific tags

  • Thread starter: Kua

Kua

Associate
Joined
21 Jul 2008
Posts
512
Location
Lancaster
Basically, my housemate is a linguistics PhD student working on some kind of NLP. She has a huge XML document that marks up a UN committee meeting or something similar. She wants to remove the following tags (and their content):

<tuv xml:lang="AR">....</tuv>
<tuv xml:lang="FR">....</tuv>
<tuv xml:lang="ES">....</tuv>

Can you think of a regex to do that?

I'll be using Java's String.replaceAll(), if it makes any difference.
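For reference, here is a rough sketch of one way this could be done with a regex in Java. It assumes the <tuv> elements are never nested inside each other, that xml:lang is the only attribute and is always double-quoted, and that the whole document fits in memory as a String. The class name TuvStripper and the method strip are made up for illustration. The important parts are DOTALL (so "." also matches newlines inside an element) and the reluctant ".*?" (so each match stops at the first closing </tuv> rather than the last one in the file).

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name is only for illustration.
public class TuvStripper {

    // Matches a whole <tuv xml:lang="AR|FR|ES">...</tuv> element.
    // DOTALL lets '.' cross line breaks; '.*?' is reluctant, so the match
    // ends at the nearest </tuv>. Assumes no nested <tuv> elements.
    private static final Pattern UNWANTED_TUV = Pattern.compile(
            "<tuv\\s+xml:lang=\"(?:AR|FR|ES)\"\\s*>.*?</tuv>",
            Pattern.DOTALL);

    // Returns the input with every AR, FR and ES <tuv> element removed.
    static String strip(String xml) {
        return UNWANTED_TUV.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        String sample =
                "<tu>"
              + "<tuv xml:lang=\"EN\">Hello</tuv>"
              + "<tuv xml:lang=\"FR\">Bonjour\nle monde</tuv>"
              + "<tuv xml:lang=\"AR\">...</tuv>"
              + "</tu>";
        System.out.println(strip(sample));
    }
}
```

The same pattern should also work directly with String.replaceAll() by putting the (?s) flag inline: xml.replaceAll("(?s)<tuv\\s+xml:lang=\"(?:AR|FR|ES)\"\\s*>.*?</tuv>", ""). If the file is too large to hold in one String, or the markup is less regular than assumed here, a streaming XML parser is probably the safer route.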
 