Regex / sed experts

Hi all

I have this in a text file:

Code:
abcdefghijkl abcdefghijk,BATCH040605_1,AB/2009/00123,username,05 Jun 2009
abcdefghijkl abcdefghijk,BATCH040605_2,AB/2009/00123,username,05 Jun 2009
abcdefghijkl abcdefghijk,BATCH040605_11,AB/2009/00123,username,05 Jun 2009
abcdefghijkl abcdefghijk,BATCH040605_12,AB/2009/00123,username,05 Jun 2009
abcdefghijkl abcdefghijk,BATCH040605_,AB/2009/00123,username,05 Jun 2009

I would like to remove the "underscore part", i.e. BATCH040605_12 should become BATCH040605, etc.

I can do it, but not for the last one, which has no character after the underscore.

Assume there can be zero or more characters after the underscore, and each could be a letter or a number.

This is what I have been trying so far:

Code:
 cat OUT.TXT | grep _ | sed -e "s/_[0-9A-Z]*//g"


Hmmm in trying to find the code I was using, that seems to actually work. Can anyone see any issue with that?
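
For reference, the `*` quantifier means "zero or more", which is why the bare trailing underscore gets stripped as well. A minimal sketch of a slightly stricter variant follows, assuming the suffix may also contain lowercase letters (the posted pattern only covers digits and uppercase) and that only a suffix sitting directly before a comma should be removed. The function name `strip_suffix` is made up for illustration.

```shell
# Hypothetical helper: drop "_<letters/digits>" when it sits directly before a comma.
# [0-9A-Za-z]* matches zero or more characters, so "BATCH040605_," is handled too.
# There is no g flag, so only the first match on each line is replaced.
strip_suffix() {
  sed -e 's/_[0-9A-Za-z]*,/,/'
}

printf '%s\n' \
  'x,BATCH040605_12,AB/2009/00123,username,05 Jun 2009' \
  'x,BATCH040605_,AB/2009/00123,username,05 Jun 2009' \
  'x,BATCH040605_a1,AB/2009/00123,user_name,05 Jun 2009' | strip_suffix
```

Because only the first match per line is touched, an underscore in a later field such as `user_name` survives, provided the batch field itself contains one. If the batch field might have no underscore at all, this variant would need a tighter anchor.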
 
I don't understand what you mean, it works for me?

Code:
www:/downloads# cat out.txt
abcdefghijkl abcdefghijk,BATCH040605_1,AB/2009/00123,username,05 Jun 2009
abcdefghijkl abcdefghijk,BATCH040605_2,AB/2009/00123,username,05 Jun 2009
abcdefghijkl abcdefghijk,BATCH040605_11,AB/2009/00123,username,05 Jun 2009
abcdefghijkl abcdefghijk,BATCH040605_12,AB/2009/00123,username,05 Jun 2009
abcdefghijkl abcdefghijk,BATCH040605_,AB/2009/00123,username,05 Jun 2009
www:/downloads# cat out.txt | grep _ | sed -e "s/_[0-9A-Z]*//g"
abcdefghijkl abcdefghijk,BATCH040605,AB/2009/00123,username,05 Jun 2009
abcdefghijkl abcdefghijk,BATCH040605,AB/2009/00123,username,05 Jun 2009
abcdefghijkl abcdefghijk,BATCH040605,AB/2009/00123,username,05 Jun 2009
abcdefghijkl abcdefghijk,BATCH040605,AB/2009/00123,username,05 Jun 2009
abcdefghijkl abcdefghijk,BATCH040605,AB/2009/00123,username,05 Jun 2009
 
Yeah it was kind of a fail on my part.. I wrote the post and then figured it out just before I posted it.

It's actually part of a bigger script:

Code:
cat Input.csv | sed -e "s/[^a-zA-Z0-9_\/, ]//g" -e "s/  \+//g" > OUT.TXT

I think it's some weird kind of Unicode CSV database dump. Basically it removes everything that's not a letter, number, underscore, slash (/), comma or space, and then removes runs of two or more spaces, leaving only single spaces between words.

There seem to be loads of NUL characters between the commas and elsewhere; anyway, the above works.
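
NUL bytes scattered between characters are often a sign of a UTF-16 export, though that is only a guess here. A minimal sketch of the same whitelist clean-up done with `tr` instead of sed, which deletes NULs along with every other byte outside the allowed set. The function name `clean_csv` is hypothetical, and the newline is added to the set so the line structure survives.

```shell
# Hypothetical helper mirroring the sed whitelist: keep letters, digits,
# underscore, slash, comma, space and newline; delete every other byte (NUL included).
# The sed step then removes runs of two or more spaces, as the original script does.
clean_csv() {
  LC_ALL=C tr -cd 'a-zA-Z0-9_/, \n' | sed -e 's/   *//g'
}

# Simulated input: NUL bytes between characters and a run of padding spaces.
printf 'a\000b\000,\000B\0001\000_\0002\000   \000,\000x y\n' | clean_csv
# prints: ab,B1_2,x y
```

Something like `clean_csv < Input.csv > OUT.TXT` would be the equivalent of the one-liner above, under the same assumptions.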

Should be fairly straightforward to convert that to PHP as part of the file import step.

Danger stat, have you verified that regex? I tried it and it doesn't work in my version of sed here (GNU sed v3.02 on win32). Perl may be different, but I would have expected it to work, as it was along the lines of my original tries...
 
OK, I got the script working in a batch file, which works nicely.

Trying to convert to PHP, but having difficulty:

original script:
Code:
cat APP_Results.csv | sed -e "s/[^a-zA-Z0-9&_\/, ]//g" -e "s/  \+//g" > out.txt
cat out.txt | grep _ | sed -e "s/_[0-9A-Z]*//g" > out1.txt
cat out.txt | grep -v _ >> out1.txt
cat out1.txt | sort | uniq > out.txt
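
Since sed leaves lines without a match untouched, the `grep _` / `grep -v _` split may not be needed, and `sort -u` covers `sort | uniq`. A sketch of the last three steps as a single pipeline follows, assuming the cleaned file is `out.txt` as above. The function name `dedupe_batches` is invented for this example.

```shell
# Hypothetical helper: strip the underscore suffix where present, then
# sort and drop duplicate lines. Lines without "_" pass through sed unchanged.
dedupe_batches() {
  sed -e 's/_[0-9A-Z]*//g' | sort -u
}

printf '%s\n' \
  'name,BATCH040605_1,AB/2009/00123' \
  'name,BATCH040605_12,AB/2009/00123' \
  'name,BATCH040606,AB/2009/00124' | dedupe_batches
```

With real data this would be run as `dedupe_batches < out.txt > out1.txt`, writing to a different file from the one being read.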

my php attempt: (starts at line 17)
PHP:
$filename = $_FILES['uploadedfile']['tmp_name'];
$input = file_get_contents($filename);
$out = preg_replace('/[^a-zA-Z0-9&_\/, ]/m','',$out);
$out = preg_replace('/  \+/m','',$out);
$out1 = preg_grep('/_/',$out);
$out1 = preg_replace('/_[0-9A-Z]*/m','',$out1);
$out2 = preg_grep('/_/',$out,PREG_GREP_INVERT);
$new_array = array_unique(array_merge($out1, $out2));
print_r($new_array);
echo "<br />DONE";
exit();

It's some string/array issue: preg_grep() wants an array, but it is being given a string.

errors:
Code:
Warning: preg_grep() expects parameter 2 to be array, string given in /var/www/idocsapp-dev/csvread.php on line 20

Warning: preg_grep() expects parameter 2 to be array, string given in /var/www/idocsapp-dev/csvread.php on line 22

Warning: array_merge() [function.array-merge]: Argument #1 is not an array in /var/www/idocsapp-dev/csvread.php on line 23

Warning: array_merge() [function.array-merge]: Argument #2 is not an array in /var/www/idocsapp-dev/csvread.php on line 23

Warning: array_unique() [function.array-unique]: The argument should be an array in /var/www/idocsapp-dev/csvread.php on line 23

DONE

Any pointers??
 