Using regular expressions to remove text

AndrewP · 28 Nov 2007 at 19:42

Code:

sed 's/\([0-9]\{2\}\/\)\{2\}[0-9]\{2\}[ \t]\+\(.*\)[ \t]\+\$.*/\2/g'

Turns

Code:

28/11/07          SPENT ON STUFF       $5 
28/11/07          834637       $1,035

into

Code:

SPENT ON STUFF
834637

It matches a date in dd/mm/yy format followed by tabs or spaces, then the account details, then more tabs or spaces, a dollar sign and the remainder of the line. The whole match is replaced with the account details (captured by the second set of brackets, \2). Pipe your grep output into the sed command and that should be it.
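
To make the pattern easier to read, here is a rough sketch that builds the same expression from named pieces. The variable names and the extract_details function are made up for illustration, and \+ and \t inside the brackets assume GNU sed. The g flag is left off because the pattern already consumes the rest of the line.

```shell
# Pieces of the pattern above; names are illustrative only.
date_re='\([0-9]\{2\}\/\)\{2\}[0-9]\{2\}'  # group 1, twice: "dd/" and "mm/", then "yy"
gap_re='[ \t]\+'                            # one or more spaces or tabs (GNU sed syntax)
details_re='\(.*\)'                         # group 2: the account details
amount_re='\$.*'                            # a literal $ and the rest of the line

# Replace each matching line with whatever group 2 captured.
extract_details() {
  sed "s/${date_re}${gap_re}${details_re}${gap_re}${amount_re}/\2/"
}

printf '28/11/07 SPENT ON STUFF $5\n28/11/07 834637 $1,035\n' | extract_details
```

One thing to watch: because .* is greedy, group 2 keeps any extra spaces in front of the final gap, so widely spaced columns leave trailing spaces in the output. Using \(.*[^ \t]\) for the details should trim them.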

Edit: I'm not clear what you need any more after re-reading the OP :confused:

DarkShadow · 28 Nov 2007 at 20:04

You got the requirements right

However, it doesn't seem to be working here! It's a lot longer and more complicated than I initially imagined!

AndrewP · 28 Nov 2007 at 20:15

Sed commands can get that way. You could try removing parts of the match to see where it fails. I checked it on my system (sed 4.1.2) and it works fine, though a different kind of whitespace in the original would break it, for example.
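
One way to do that check, sketched under some assumptions: the sample line is made up and uses non-breaking spaces between the fields, the stage helper is a hypothetical name, the date pattern is anchored with ^ here (the original is not), and GNU sed is assumed.

```shell
# Made-up sample: the fields are separated by non-breaking spaces (UTF-8 bytes C2 A0).
sample=$(mktemp)
printf '28/11/07\302\240SPENT ON STUFF\302\240$5\n' > "$sample"

# 'l' prints each line unambiguously: tabs as \t, non-printing bytes as octal escapes.
LC_ALL=C sed -n 'l' "$sample"

# Count the lines matched by a partial pattern (helper name is hypothetical).
stage() { sed -n "/$1/p" "$sample" | wc -l; }

date_re='^\([0-9]\{2\}\/\)\{2\}[0-9]\{2\}'
stage "${date_re}"           # the date alone
stage "${date_re}[ \t]\+"    # the date plus the first gap
```

The first stage whose count drops to zero points at the piece that fails. With this sample that should be the gap, since [ \t] does not cover a non-breaking space.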

My results:

Code:

[andrew@server tmp]$ cat sed.txt
28/11/07          SPENT ON STUFF       $5
28/11/07          834637       $1,035
[andrew@server tmp]$ cat sed.txt | sed 's/\([0-9]\{2\}\/\)\{2\}[0-9]\{2\}[ \t]\+\(.*\)[ \t]\+\$.*/\2/g'
SPENT ON STUFF
834637

DarkShadow · 28 Nov 2007 at 20:32

AndrewP · 28 Nov 2007 at 20:45

You could actually just pull the details out with grep then. Try:

Code:

grep -vP "^[0-9]" /path/to/file | grep -v '\$' | grep -oP '[a-zA-Z0-9 ]{1,}'

That is: keep lines that do not start with a digit (which drops the date lines) and do not contain a $ (which drops the balance/cost lines), then match the alphanumeric details.
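
Here is a small sketch of those three filters on made-up input. The layout (date, details and amount each on their own line) is only a guess, the details_only name is hypothetical, and plain and -E patterns are swapped in for -P so it does not depend on PCRE support.

```shell
# Hypothetical layout: date, details and amount each on their own line.
sample='28/11/07
SPENT ON STUFF
$5
29/11/07
TRANSFER 834637
$1,035'

details_only() {
  grep -v '^[0-9]' |           # drop lines that start with a digit (the dates)
    grep -v '\$' |             # drop lines containing a dollar sign (the amounts)
    grep -oE '[a-zA-Z0-9 ]+'   # print only the runs of letters, digits and spaces
}

printf '%s\n' "$sample" | details_only
```

Note that a details line which itself begins with a digit, such as a bare account number, would be dropped by the first filter, so this only fits data where the details never start with a number.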

DarkShadow · 28 Nov 2007 at 22:59

Completely nailed it, that sensation of accomplishment is just aahhhhhhhhhhhhhhhh