How would I do this - xml search related

A[L]C · 19 Oct 2010 at 18:26

Hi all,

I've got about 3500 entries which look like this:

<WAGN>
<mutation>c</mutation>
<nummer>150686228</nummer>
<code>YE03VC</code>
<omschr>EURO</omschr>
<kentek>YE03VDC</kentek>
<ORGAcode>UKNC1</ORGAcode>
</WAGN>

I want to search the <kentek> tag for spaces.

How would I go about doing that?

rpstewart · 19 Oct 2010 at 18:52

Is the file formatted with a tag per line?

When you say search for spaces do you just want to find the particular tags containing spaces or the full record?

Do you have access to a grep (or equivalent) command? Should be reasonably straightforward to pick out the individual tags although it might be more complex if you need the full record.
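
For anyone without grep to hand, the "pick out the individual tags" step could be sketched in Python instead. This is only a rough illustration, assuming one tag per line as in the example record; the function name and the sample values containing spaces are made up for the demo.

```python
import re

# Matches a <kentek> element that sits on a single line and captures its value.
KENTEK = re.compile(r"<kentek>(.*?)</kentek>")


def kentek_lines_with_spaces(lines):
    """Return (line_number, line) pairs whose <kentek> value contains a space."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = KENTEK.search(line)
        if match and " " in match.group(1):
            hits.append((number, line.rstrip("\n")))
    return hits


# Hypothetical sample lines; only the second one has a space inside <kentek>.
sample = [
    "<omschr>Mercedes Artic</omschr>\n",
    "<kentek>L801 DOV</kentek>\n",
    "<kentek>L802DOV</kentek>\n",
]
print(kentek_lines_with_spaces(sample))
```

Spaces in other tags such as <omschr> are ignored, because only the captured <kentek> value is checked.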

A[L]C · 19 Oct 2010 at 18:57

Basically I need to delete any spaces in that tag. Yes, it's one tag per line. I don't know what grep is, sorry.

rpstewart · 19 Oct 2010 at 19:02

Grep is a text search utility. Traditionally it's a UNIX command, but there's a GNU port for Windows. If you need to remove spaces from the tag then it becomes a bit trickier.

Gimme a few minutes to have a think about how to do this.

rpstewart · 19 Oct 2010 at 19:04

Do any of the other tags contain spaces which you need to retain?

A[L]C · 19 Oct 2010 at 19:05

thanks dude.

A[L]C · 19 Oct 2010 at 19:05

Yes, they do unfortunately, or I could just do a search and replace for spaces across it all.

HungryHippos · 19 Oct 2010 at 19:25

You could use VBScript and the FileSystemObject to do this as well I think, but it could get a little complicated, and depends on how the content exists. Can you confirm if this is all in one big file or is it in multiple files each containing content like you listed above?
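
The scripting route could look roughly like this. The sketch swaps VBScript and the FileSystemObject for Python's standard xml.etree.ElementTree module, and it assumes the records are <WAGN> elements under a single root, as in the example entry. The function name and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def remove_kentek_spaces(root):
    """Strip spaces from every <kentek> under root; return how many were changed."""
    changed = 0
    for kentek in root.iter("kentek"):
        if kentek.text and " " in kentek.text:
            kentek.text = kentek.text.replace(" ", "")
            changed += 1
    return changed


# Hypothetical in-memory sample; a real run would load the file with ET.parse().
sample = """<CLEAR>
<WAGN>
<omschr>Mercedes Artic</omschr>
<kentek>L801 DOV</kentek>
</WAGN>
</CLEAR>"""

root = ET.fromstring(sample)
print(remove_kentek_spaces(root))
print(ET.tostring(root, encoding="unicode"))
```

For the real file, ET.parse(path) followed by tree.write(new_path, encoding="US-ASCII", xml_declaration=True) would be one way to read and save it. Writing to a new file name keeps the original intact.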

A[L]C · 19 Oct 2010 at 19:29

one big file.

rpstewart · 19 Oct 2010 at 19:36

I think this should be possible in a text editor which supports search and replace using regular expressions. However, REs aren't my strong point, so it's taking me a while to work out.

bledd · 19 Oct 2010 at 19:39

Open it in Excel. If it falls in column C, select C, then do a find/replace for a space with "" (i.e. nothing).

A[L]C · 19 Oct 2010 at 19:41

Can't do that mate, as some of the other fields have spaces.

<?xml version="1.0" encoding="US-ASCII"?>
<CLEAR>
<WAGN>
<mutation>c</mutation>
<nummer>31</nummer>
<code>1247</code>
<omschr>Mercedes Artic</omschr>
<kentek>L801DOV</kentek>
<ORGAcode>UKMC0</ORGAcode>
</WAGN>
<WAGN>
<mutation>c</mutation>
<nummer>32</nummer>
<code>1248</code>
<omschr>Mercedes Artic</omschr>
<kentek>L802DOV</kentek>
<ORGAcode>UKMC0</ORGAcode>
</WAGN>
<WAGN>
<mutation>c</mutation>
<nummer>36</nummer>
<code>4039</code>
<omschr>Mercedes Rollonoff</omschr>
<kentek>M171LOK</kentek>
<ORGAcode>UKMC0</ORGAcode>
</WAGN>
<WAGN>
<mutation>c</mutation>
<nummer>59</nummer>
<code>5048</code>
<omschr>Volvo REL</omschr>
<kentek>M351POF</kentek>
<ORGAcode>UKMC0</ORGAcode>
</WAGN>
<WAGN>
<mutation>c</mutation>
<nummer>63</nummer>
<code>4048</code>
<omschr>Mercedes Rollonoff</omschr>
<kentek>N204AOF</kentek>
<ORGAcode>UKMC0</ORGAcode>
</WAGN>
</CLEAR>

rpstewart · 19 Oct 2010 at 20:35

Right, after much wailing and gnashing of teeth I think I've cracked it. You'll need a copy of Notepad++ but it's freeware so no problems there.

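Open the XML file, then:
1) Search -> Replace
2) Make sure the Search Mode is set to Regular Expression
3) Find: (<kentek>[^<]*)[ \t]
4) Replace with: \1

Then start searching and replacing as normal. Each Replace All pass removes only one space per <kentek> tag, so repeat it until no more matches are found.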
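
The same search and replace could also be scripted so that it finishes in a single pass. This is a minimal sketch using Python's re module rather than Notepad++, assuming each <kentek> value sits on one line and contains no nested tags. The function name and sample text are made up for illustration.

```python
import re

# Captures the opening tag, the value (anything up to the next "<"),
# and the closing tag of each <kentek> element.
KENTEK = re.compile(r"(<kentek>)([^<]*)(</kentek>)")


def strip_kentek_spaces(xml_text):
    """Remove all whitespace inside <kentek> values, leaving other tags alone."""
    def fix(match):
        opening, value, closing = match.groups()
        # split() with no arguments drops every run of whitespace.
        return opening + "".join(value.split()) + closing

    return KENTEK.sub(fix, xml_text)


# Hypothetical sample: the space in <omschr> must survive.
sample = "<omschr>Mercedes Artic</omschr>\n<kentek>L 801 DOV</kentek>\n"
print(strip_kentek_spaces(sample))
```

Because the replacement function rebuilds the whole value, every space in a tag goes at once, with no need to repeat the replace.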

A[L]C · 19 Oct 2010 at 20:47

check you out! nice one thank you very much!!!