Using regular expressions to remove text

Code:
sed 's/\([0-9]\{2\}\/\)\{2\}[0-9]\{2\}[ \t]\+\(.*\)[ \t]\+\$.*/\2/g'

Turns
Code:
28/11/07          SPENT ON STUFF       $5 
28/11/07          834637       $1,035

into

Code:
SPENT ON STUFF
834637

It matches a date in dd/mm/yy format, followed by tabs or spaces, then the account details, then more tabs or spaces and a dollar sign with the rest of the line. The whole match is replaced with the account details, which are captured by the second pair of brackets (\2). Pipe your grep into the sed command and that should be it :).
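
For what it's worth, here is a rough sketch of the same idea in extended regex syntax, which cuts down on the backslashes. It assumes a sed that accepts -E (recent GNU and BSD versions do; older GNU sed spells it -r) and uses [[:blank:]] instead of [ \t], since \t inside brackets may not be read as a tab by every sed. The extra [^[:blank:]] at the end of the capture is my own addition: without it the greedy .* keeps any extra spaces in front of the $ as part of \2. The input variable just holds the two sample lines from above.

```shell
# Sample lines from the post, fields separated by runs of spaces.
input='28/11/07          SPENT ON STUFF       $5
28/11/07          834637       $1,035'

# Extended-syntax version of the substitution:
#   ^([0-9]{2}\/){2}[0-9]{2}   the dd/mm/yy date at the start of the line
#   [[:blank:]]+               one or more spaces or tabs
#   (.*[^[:blank:]])           the details, ending on a non-blank character
#   [[:blank:]]+\$.*$          blanks, a literal $, and the rest of the line
result=$(printf '%s\n' "$input" |
  sed -E 's/^([0-9]{2}\/){2}[0-9]{2}[[:blank:]]+(.*[^[:blank:]])[[:blank:]]+\$.*$/\2/')

printf '%s\n' "$result"
```

Lines that do not fit the pattern are passed through unchanged, so it is worth eyeballing the output for anything that still has a date on it.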

Edit: after re-reading the OP, I'm no longer clear on what you need.
 
Sed commands can get that way. You could try removing parts of the match to see where it fails. I checked it on my system (sed 4.1.2) and it works fine, though a different kind of whitespace in the original file, for example, would break it.
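
Something along these lines might help narrow it down. It is only a sketch: the line variable is a made-up sample that uses tabs instead of spaces, od -c is there to make the whitespace visible, and the three grep -c calls test progressively longer pieces of the pattern (written in extended syntax with [[:blank:]], which is an assumption on my part rather than the exact expression above).

```shell
# Made-up sample line with tabs (not spaces) between the fields.
line=$(printf '28/11/07\tSPENT ON STUFF\t$5')

# Show every character; tabs appear as \t in the od output.
printf '%s\n' "$line" | od -c

# Count matching lines for longer and longer pieces of the pattern.
# The first piece that reports 0 is where the match breaks.
date_only=$(printf '%s\n' "$line" | grep -cE '^([0-9]{2}/){2}[0-9]{2}')
date_ws=$(printf '%s\n' "$line" | grep -cE '^([0-9]{2}/){2}[0-9]{2}[[:blank:]]+')
full=$(printf '%s\n' "$line" | grep -cE '^([0-9]{2}/){2}[0-9]{2}[[:blank:]]+.*[[:blank:]]+\$')

printf 'date: %s  date+blank: %s  full: %s\n' "$date_only" "$date_ws" "$full"
```

To run it against the real data, replace the printf that feeds each grep with the file itself.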

My results:
Code:
[andrew@server tmp]$ cat sed.txt
28/11/07          SPENT ON STUFF       $5
28/11/07          834637       $1,035
[andrew@server tmp]$ cat sed.txt | sed 's/\([0-9]\{2\}\/\)\{2\}[0-9]\{2\}[ \t]\+\(.*\)[ \t]\+\$.*/\2/g'
SPENT ON STUFF
834637
 
In that case you could just pull the details out with grep. Try:
Code:
grep -vP "^[0-9]" /path/to/file | grep -v '\$' | grep -oP '[a-zA-Z0-9 ]{1,}'
That is: keep the lines that do not start with a digit (which drops the date lines) and do not contain a $ (which drops the balance/cost lines), then match the alphanumeric details.
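
One caveat: a details line that itself starts with a digit (like 834637 in the earlier sample) would also be thrown away by the first filter. Below is a hedged variation that anchors on the date shape instead. The input is made up and assumes the date, details and amount each sit on their own line, which may not be how the real file looks. It also uses -E rather than -P, since nothing here needs Perl syntax.

```shell
# Made-up input: date, details and amount each on a separate line.
input='28/11/07
SPENT ON STUFF
$5
28/11/07
834637
$1,035'

# 1. Drop only lines that begin with something shaped like dd/mm/yy.
# 2. Drop lines containing a literal $ (-F turns off regex meaning).
# 3. Print just the run of letters, digits and spaces that is left.
details=$(printf '%s\n' "$input" |
  grep -vE '^[0-9]{2}/[0-9]{2}/[0-9]{2}' |
  grep -vF '$' |
  grep -oE '[[:alnum:] ]+')

printf '%s\n' "$details"
```

Note that -o splits a line wherever a character outside the class appears, so details containing punctuation would come out in pieces.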
 