How would I do this - xml search related

Hi all,

I've got about 3,500 entries which look like this:

<WAGN>
<mutation>c</mutation>
<nummer>150686228</nummer>
<code>YE03VC</code>
<omschr>EURO</omschr>
<kentek>YE03VDC</kentek>
<ORGAcode>UKNC1</ORGAcode>
</WAGN>

I want to search the <kentek> tag for spaces.

How would I go about doing that?
 
Is the file formatted with a tag per line?

When you say search for spaces do you just want to find the particular tags containing spaces or the full record?

Do you have access to a grep (or equivalent) command? Should be reasonably straightforward to pick out the individual tags although it might be more complex if you need the full record.
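
If grep isn't to hand, the "find the tags containing spaces" part could be sketched in Python along these lines. This is only a rough sketch: it assumes one tag per line, the function name find_kentek_with_spaces is made up for illustration, and the sample values with spaces are invented test data rather than anything from the real file.

```python
import re

# Matches a <kentek> element that sits on a single line and captures its value.
KENTEK = re.compile(r"<kentek>(.*?)</kentek>")


def find_kentek_with_spaces(lines):
    """Return (line_number, value) for every <kentek> value containing a space."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = KENTEK.search(line)
        if match and " " in match.group(1):
            hits.append((number, match.group(1)))
    return hits


if __name__ == "__main__":
    # Invented sample lines; only the first <kentek> contains a space.
    sample = [
        "<omschr>Mercedes Artic</omschr>",
        "<kentek>L801 DOV</kentek>",
        "<kentek>L802DOV</kentek>",
    ]
    for number, value in find_kentek_with_spaces(sample):
        print(number, repr(value))
```

Spaces in other tags such as <omschr> are ignored because only the captured <kentek> value is checked.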
 
Basically I need to delete any spaces in that tag. Yes, it's one tag per line. Don't know what grep is, sorry.
 
Grep is a text search utility. Traditionally it's a UNIX command, but there's a GNU port for Windows. If you need to remove spaces from the tag then it becomes a bit trickier.

Gimme a few minutes to have a think how to do this
 
You could use VBScript and the FileSystemObject to do this as well, I think, but it could get a little complicated, and it depends on how the content is stored. Can you confirm whether this is all in one big file, or in multiple files each containing content like you listed above?
 
I think this should be possible in a text editor which supports search and replace using regular expressions. However, REs aren't my strong point, so it's taking me a while to work out.
 
Open it in Excel. If it falls in column C, select C, then do a find/replace for "space" with "" (i.e. nothing).
 
Can't do that mate, as some of the other fields have spaces.

<?xml version="1.0" encoding="US-ASCII"?>
<CLEAR>
<WAGN>
<mutation>c</mutation>
<nummer>31</nummer>
<code>1247</code>
<omschr>Mercedes Artic</omschr>
<kentek>L801DOV</kentek>
<ORGAcode>UKMC0</ORGAcode>
</WAGN>
<WAGN>
<mutation>c</mutation>
<nummer>32</nummer>
<code>1248</code>
<omschr>Mercedes Artic</omschr>
<kentek>L802DOV</kentek>
<ORGAcode>UKMC0</ORGAcode>
</WAGN>
<WAGN>
<mutation>c</mutation>
<nummer>36</nummer>
<code>4039</code>
<omschr>Mercedes Rollonoff</omschr>
<kentek>M171LOK</kentek>
<ORGAcode>UKMC0</ORGAcode>
</WAGN>
<WAGN>
<mutation>c</mutation>
<nummer>59</nummer>
<code>5048</code>
<omschr>Volvo REL</omschr>
<kentek>M351POF</kentek>
<ORGAcode>UKMC0</ORGAcode>
</WAGN>
<WAGN>
<mutation>c</mutation>
<nummer>63</nummer>
<code>4048</code>
<omschr>Mercedes Rollonoff</omschr>
<kentek>N204AOF</kentek>
<ORGAcode>UKMC0</ORGAcode>
</WAGN>
</CLEAR>
 
Right, after much wailing and gnashing of teeth I think I've cracked it. You'll need a copy of Notepad++ but it's freeware so no problems there.

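Open the XML file then
1) Search -> Replace
2) Make sure the Search Mode is set to Regular Expression
3) Find: (<kentek>[^<]*)[ ]
4) Replace with: \1

Then hit Replace All, and keep repeating it until it reports no more replacements, since each pass only removes one space per <kentek> tag. I've used [ ] rather than \s because in current versions of Notepad++ \s can also match the line break at the end of the line, which would then get eaten.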
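
If you'd rather script it than click through an editor, the same idea can be sketched in Python. Treat this as a rough sketch rather than a tested tool: it assumes each <kentek> element sits on one line, the function name strip_kentek_spaces is made up for illustration, and the sample value with spaces is invented.

```python
import re

# Captures the opening tag, the value, and the closing tag separately so that
# only the value is altered. Without re.DOTALL, "." will not cross a line break.
KENTEK = re.compile(r"(<kentek>)(.*?)(</kentek>)")


def strip_kentek_spaces(text):
    """Return text with every space removed from inside <kentek>...</kentek>."""
    def fix(match):
        return match.group(1) + match.group(2).replace(" ", "") + match.group(3)
    return KENTEK.sub(fix, text)


if __name__ == "__main__":
    # Invented sample: the <omschr> space must survive, the <kentek> ones must not.
    sample = "<omschr>Mercedes Artic</omschr>\n<kentek>L801 D OV</kentek>\n"
    print(strip_kentek_spaces(sample))
```

Unlike the editor approach, this removes all the spaces in a tag in one pass, and other fields are left untouched. To run it on the real file you would read the file into a string, pass it through the function, and write the result to a new file so the original is kept as a backup.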
 