Noob trying to write a simple Python script, help please =]

Disco Boy · 13 Dec 2011 at 10:32

Hello all,

I am trying to write a Python script that takes an input file and multiplies some of the numbers before printing them out.

Here's the start of the file:

Code:

# This file was created Wed Nov  2 12:37:30 2011
# by the following command:
# g_dist -f md1_10.xtc -s run1.tpr -n index.ndx -o triad_thr136_asn86_dist_ox1M_1.xvg 
#
# g_dist is part of G R O M A C S:
#
# GRoups of Organic Molecules in ACtion for Science
#
@    title "Distance"
@    xaxis  label "Time (ps)"
@    yaxis  label "Distance (nm)"
@TYPE xy
@ view 0.15, 0.15, 0.75, 0.85
@ legend on
@ legend box on
@ legend loctype view
@ legend 0.78, 0.8
@ legend length 2
@ s0 legend "|d|"
@ s1 legend "d\sx\N"
@ s2 legend "d\sy\N"
@ s3 legend "d\sz\N"
   0.0000000    0.7551542   -0.6089997    0.2760000    0.3510003
  20.0000000    0.7571034   -0.5939999    0.2770000    0.3789997
  40.0000000    0.6894182   -0.5540004    0.2700000    0.3090000
  60.0000000    0.6623963   -0.5409999    0.2579999    0.2820001
  80.0000000    0.6986261   -0.5410004    0.3189998    0.3060002
 100.0000000    0.6851938   -0.5509996    0.3109999    0.2630000

N.B. the first column (time in ps) goes up to 50,000.

Specifically, I want it to print the header lines (the ones starting with # or @) unchanged. Then, for every data row, it should divide the value in the first column by 1000 and multiply the value in the second column by 10, i.e. convert the time from ps to ns and the distance from nm to Å. I am not interested in the 3rd, 4th and 5th columns.
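As a worked check of the conversion just described, here is a minimal Python 3 sketch. The helper name `convert_row` is hypothetical (it is not from any post here), and it assumes the five columns are separated by whitespace, as in the sample above. The row at 20 ps with |d| = 0.7571034 nm should come out as 0.02 ns and about 7.571 Å:

```python
def convert_row(line):
    """Convert one data row to (time in ns, |d| in Angstrom).

    Assumes whitespace-separated columns: time (ps), |d| (nm), dx, dy, dz.
    Only the first two columns are used; the rest are ignored.
    """
    fields = line.split()                     # any run of whitespace
    time_ns = float(fields[0]) / 1000.0       # ps -> ns
    dist_angstrom = float(fields[1]) * 10.0   # nm -> Angstrom
    return time_ns, dist_angstrom


row = "  20.0000000    0.7571034   -0.5939999    0.2770000    0.3789997"
print("%f %f" % convert_row(row))   # prints: 0.020000 7.571034
```

By the same arithmetic, the last time point of 50,000 ps becomes 50 ns.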

Here's what I have got so far:

Code:

import sys
import re

inp1 = sys.argv[1]
file = open(inp1, "r")
n = 1000.0000000
x = 10.0000000

lines = open( sys.argv[1], "r" ).readlines()

for line in file:
    if re.search( r"#", line ):   ## print header ##
        print line,
    if re.search( r"@", line ):   # print header ##
        print line,
    else:
        for i in range( 1, len(lines) ):
            line = lines[i].rstrip()
            words = line.split("    ")
            a = words[0] / n
            b = words[1] * x
            print "%f %f" % (a, b)

You'll see when you run it that it has a problem with strings and floats...

Can anyone with a clue give me a hand at all?

Thank you

jack.mitchell · 13 Dec 2011 at 11:30

Code:

import sys
import re

inp1 = sys.argv[1]
file = open(inp1, "r")
n = 1000.0000000
x = 10.0000000

lines = open( sys.argv[1], "r" ).readlines()

for line in file:
    if re.search( r"#", line ):   ## print header ##
        print line,
    if re.search( r"@", line ):   # print header ##
        print line,
    else:
        for i in range( 1, len(lines) ):
            line = lines[i].rstrip()
            words = line.split("    ")
            a = float(words[0]) / n
            b = float(words[1]) * x
            print "%f %f" % (a, b)

Try converting the 'words' to floats; I would imagine they are strings when split like that. I'm no Python guru, so it's just a guess.
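The guess is right: `str.split` always returns a list of strings, and Python will not do arithmetic between a string and a float until the string is converted with `float()`. A short Python 3 illustration, using the second data row from the question trimmed to two columns:

```python
fields = "  20.0000000    0.7571034".split()
print(fields)              # ['20.0000000', '0.7571034']: still strings

try:
    fields[0] / 1000.0     # str / float is not defined
except TypeError as err:
    print("TypeError:", err)

print(fields[1] * 2)       # str * int repeats the text: 0.75710340.7571034
print(float(fields[0]) / 1000.0)   # 0.02
print(float(fields[1]) * 10.0)     # now an actual number
```

Note the `str * int` case: it does not raise, it silently repeats the text, which is why converting with `float()` first matters.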

There is also an oddity in the way you are doing your loop; I'll have a quick re-arrange and see if my instincts are correct!
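The loop oddity is worth spelling out. In the question's code the `#` test and the `@` test are two separate `if` statements, so a `#` line is printed by the first and then also falls into the `else` of the second (and that `else` re-walks the whole file each time). A minimal Python 3 sketch of the control-flow difference, using two hypothetical helpers; it uses `str.startswith` instead of `re.search` for brevity, which does not change the point:

```python
def classify_two_ifs(line):
    """Mimics the question's structure: if ... / if ... else ..."""
    actions = []
    if line.startswith("#"):
        actions.append("print header")
    if line.startswith("@"):
        actions.append("print header")
    else:
        actions.append("convert numbers")
    return actions


def classify_elif(line):
    """Same tests chained with elif: exactly one branch runs."""
    if line.startswith("#"):
        return ["print header"]
    elif line.startswith("@"):
        return ["print header"]
    else:
        return ["convert numbers"]


print(classify_two_ifs("# by the following command:"))
# ['print header', 'convert numbers']  <- the '#' line also hits the data branch
print(classify_elif("# by the following command:"))
# ['print header']
```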

jack.mitchell · 13 Dec 2011 at 11:36

I think this would be a further improvement. The else branch that re-iterates through all the lines, when you're already iterating through them in the outer loop, seems wrong...

Code:

import sys
import re

inp1 = sys.argv[1]
file = open(inp1, "r")
n = 1000.0000000
x = 10.0000000

for line in file:

    if re.search(r"#", line) or re.search(r"@", line):   # Python uses "or", not "||"
        print line,

    else:    
        line = line.rstrip()
        words = line.split()   # split on any run of whitespace
        a = float(words[0]) / n
        b = float(words[1]) * x
        print "%f %f" % (a, b)
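The question's code and the first reply split each row on exactly four spaces, `split("    ")`. In the sample as posted, a negative value comes with one space less of padding, so a fixed four-space split glues two fields together and `float()` then raises `ValueError`. Calling `split()` with no argument splits on any run of whitespace and drops the leading padding. A quick Python 3 check on the first data row from the question:

```python
# First data row from the question, spacing as posted
row = "   0.0000000    0.7551542   -0.6089997    0.2760000    0.3510003"

print(row.split("    "))   # split on exactly four spaces
# ['   0.0000000', '0.7551542   -0.6089997', '0.2760000', '0.3510003']

print(row.split())         # split on any run of whitespace
# ['0.0000000', '0.7551542', '-0.6089997', '0.2760000', '0.3510003']
```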

Disco Boy · 13 Dec 2011 at 11:47

Thanks for your help, Jack.

I had actually started having a go at doing it in Perl, and I've managed to get it working.

Here it is in all its glory:

Code:

$filename = $ARGV[0];
$x = 1000;
$y = 10;

open FILE,"<$filename" or die "Cannot read from $filename: $!\n";

while ($line = <FILE>)
{
    if ($line =~ m/@/)
    {
        print $line;
    }
    elsif ($line =~ m/#/)
    {
        print $line;
    }

    else 
    {
        chomp $line;

        @words = split(' ', $line);   # ' ' splits on runs of whitespace and skips leading blanks
        $col1 = $words[0] / $x;
        $col2 = $words[1] * $y;

        print "$col1  $col2\n";
    }
}

close FILE;

jack.mitchell · 13 Dec 2011 at 11:52

Yeah, the beauty of Perl (am I joking?). Perl is a lot more scripty than Python, and as such it's fantastic for knocking up quick programs where you don't need to worry about type safety or messing about casting variables etc...

I've been doing quite a bit of this recently, as the lass is doing a PhD and I'm helping her write Python programs to manage huge amounts of information about genes, genotypes etc., which are all, obviously, in horribly formatted text files. My regex is improving daily!

A.N.Other · 13 Dec 2011 at 11:53

Don't forget to close your files! There's no need for regex in this situation; it's just complicating matters.

Code:

import sys

fobj = open(sys.argv[1], "r")
lines = fobj.readlines()
fobj.close()

for line in lines:
    if line[0] == "#" or line[0] == "@":
        print line,   # trailing comma: line already ends with "\n"
    else:
        tmp = [i for i in line.split(" ") if i != ""]
        print "%f %f" % (float(tmp[0]) / 1000.0, float(tmp[1]) * 10.0)
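For anyone reading this with Python 3 (where `print` is a function), here is a minimal sketch of the same approach. A `with` block closes the file automatically, `str.startswith` replaces the regex header test, and `split()` handles the uneven spacing. The names `convert_xvg` and `main`, and the choice to skip blank lines, are assumptions rather than anything from the posts above.

```python
import sys


def convert_xvg(lines):
    """Yield output lines: headers unchanged, data rows rescaled.

    Lines starting with '#' or '@' are passed through. For data rows,
    column 1 (time, ps) is divided by 1000 and column 2 (|d|, nm) is
    multiplied by 10; the remaining columns are dropped.
    """
    for line in lines:
        if line.startswith(("#", "@")):       # startswith accepts a tuple
            yield line.rstrip("\n")
        elif line.strip():                    # skip blank lines (assumption)
            fields = line.split()
            time_ns = float(fields[0]) / 1000.0
            dist_angstrom = float(fields[1]) * 10.0
            yield "%f %f" % (time_ns, dist_angstrom)


def main(paths):
    for path in paths:
        # The with block closes the file even if an error occurs
        with open(path) as fobj:
            for out in convert_xvg(fobj):
                print(out)


if __name__ == "__main__":
    # Usage (hypothetical script name): python convert_xvg.py input.xvg > output.xvg
    main(sys.argv[1:])
```

Keeping the conversion in a generator that takes any iterable of lines means it can be tested on a plain list of strings without touching the filesystem.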