Basically, my housemate is a linguistics PhD student working on some kind of NLP. She has a huge XML document that marks up a UN committee meeting or something similar. She wants to remove the following tags (and their content):
<tuv xml:lang="AR">....</tuv>
<tuv xml:lang="FR">....</tuv>
<tuv xml:lang="ES">....</tuv>
Can you think of a regex to do that?
I'll be using Java's String.replaceAll(), if it makes any difference.
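To make the goal concrete, here is a minimal sketch of the kind of call in question. It rests on several assumptions: the tuv elements are never nested inside one another, the xml:lang attribute comes first and uses double quotes exactly as shown above, and the whole document fits in a single String. The class name TuvStripper, the method strip, and the sample input are hypothetical, made up for illustration. A regex is fragile on XML in general, so a proper XML parser may be the safer route if the markup is less regular than this.

```java
import java.util.regex.Pattern;

class TuvStripper {
    // (?s) makes '.' match line breaks too, so multi-line elements are covered.
    // .*? is reluctant, so each match ends at the first closing </tuv>.
    // Assumption: xml:lang is the first attribute and is double-quoted.
    private static final Pattern UNWANTED_TUV =
        Pattern.compile("(?s)<tuv\\s+xml:lang=\"(?:AR|FR|ES)\"[^>]*>.*?</tuv>");

    // Returns the input with every AR, FR, and ES tuv element (and its content) removed.
    static String strip(String xml) {
        return UNWANTED_TUV.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical sample, not taken from the real document.
        String sample = "<tu><tuv xml:lang=\"EN\">Hello</tuv>"
                      + "<tuv xml:lang=\"FR\">Bonjour</tuv></tu>";
        System.out.println(strip(sample));
        // prints: <tu><tuv xml:lang="EN">Hello</tuv></tu>
    }
}
```

The same pattern string can be passed directly to String.replaceAll(regex, ""); precompiling it with Pattern.compile only avoids recompiling the regex on each call.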