noob looking for perl/python help

Hi all,

I'm a complete noob when it comes to scripting. I've had a play with Perl and Python to do a few simple things that make my life easier, but I'm really struggling to do this task in either.

I have three files which look like this:

# few lines of header
@more lines of header
0.000 0.040524
0.002 0.495572
0.004 0.486072
0.006 0.586495
0.008 0.720278
... etc....
50

The left hand column is the time (so it is the same in each file) and the right hand column is the interesting part, which varies.

I want to take these three files and average the right hand column across them, ending up with a file that looks like this:
#header (doesn't matter which file it's from)
@header (doesn't matter which file it's from)
0.000 *mean 1-3*
0.002 *mean 1-3*
0.004 *mean 1-3*
0.006 *mean 1-3*
0.008 *mean 1-3*
...
50.000 *mean 1-3*


My problem is that I am struggling to read the three files at the same time. Previously I have processed a single one of these files using something like this in Perl:

Code:
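use strict;
use warnings;

# Placeholder scale factors; the real values of $x and $y are not given here.
my ($x, $y) = (1, 1);

# Assumes the input file name is passed as the first command-line argument.
open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";

while (my $line = <$fh>)
{
    # Print header lines (containing @ or #) unchanged.
    if ($line =~ m/[@#]/)
    {
        print $line;
    }
    else
    {
        # split ' ' skips leading whitespace and splits on any run of
        # whitespace, so no separate trimming or chomp is needed.
        my @words = split(' ', $line);
        next unless @words >= 2;    # skip blank or incomplete lines

        my $col1 = $words[0] / $x;
        my $col2 = $words[1] * $y;

        print "$col1   $col2\n";
    }
}

close($fh);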

or something similar in python:

Code:
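import re
import sys

# Assumes the input file name is passed as the first command-line argument.
with open(sys.argv[1]) as handle:
    lines = handle.readlines()

for line in lines:

    # Header and blank lines are printed unchanged.
    if re.search(r"#", line):
        print(line, end="")
    elif re.search(r"@", line):
        print(line, end="")
    elif line == '\n':
        print(line, end="")
    else:
        # Data line: first column is the time, second is the value.
        words = line.split()
        time = words[0]
        rmsd1 = words[1]

        print("%s %s" % (time, rmsd1))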

but I am struggling to do this task with a simple "line in lines" loop and a split, because I need to take values from three files at once.

I think the solution should be to take the three files, pick out the interesting sections and put them in arrays, then add the three arrays element by element and divide by three. I just can't work out how to do this!
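
The approach described above (one array per file, then an element-by-element mean) could be sketched roughly like this in Python. The function names `read_rmsd` and `average_files` are made up for illustration, and the sketch assumes all files have their data rows in the same order, so rows are matched by position rather than by the time value.

```python
def read_rmsd(path):
    """Return (header_lines, times, values) for one file."""
    header, times, values = [], [], []
    with open(path) as handle:
        for line in handle:
            stripped = line.strip()
            # Lines starting with # or @ are treated as header lines.
            if stripped.startswith(("#", "@")):
                header.append(stripped)
                continue
            fields = stripped.split()
            if len(fields) < 2:
                continue  # skip blank or incomplete lines
            times.append(fields[0])          # keep the time exactly as written
            values.append(float(fields[1]))
    return header, times, values


def average_files(paths):
    """Average the second column across files, row by row."""
    parsed = [read_rmsd(p) for p in paths]
    header, times, _ = parsed[0]             # header and times from the first file
    columns = [values for _, _, values in parsed]
    # zip(*columns) groups the n-th value of every file together;
    # it stops at the shortest file if the lengths differ.
    means = [sum(row) / len(row) for row in zip(*columns)]
    out = list(header)
    out += ["%s %.6f" % (t, m) for t, m in zip(times, means)]
    return out
```

Something like `print("\n".join(average_files(["run1.dat", "run2.dat", "run3.dat"])))` would then write the averaged file to the screen; those file names are placeholders, not the real ones.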
 
The data is some analysis of a molecular dynamics simulation of a G-protein-coupled receptor (GPCR).

Specifically, the left hand column is the time (in ns) and the right hand column is the RMSD of the alpha carbons (in nm) at that point in time, with the first frame as a reference. The RMSD is a good measure of how far the structure at time x is from the initial structure.

The three different files are the data from three different simulations.

Thanks for your help, and just ask if you want to know more :)
 
Hey,

That doesn't seem to work. I tried it on two versions of the file (with different units). The type I posted earlier:
0.000 *mean 1-3*
0.002 *mean 1-3*
0.004 *mean 1-3*
0.006 *mean 1-3*
0.008 *mean 1-3*
...
50.000 *mean 1-3*

returned:
Code:
Traceback (most recent call last):
  File "oc.py", line 104, in <module>
    Number_3 = Dictionary_3[Keys]
KeyError: '0.000'
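
A KeyError like this means the string '0.000' is not a key in Dictionary_3. One possible cause, and it is only a guess since the third file may differ in some other way, is that the time column is written differently in that file, so the strings do not match even though the numbers do. A minimal sketch for checking which keys each file lacks, using made-up helper names `load_by_time` and `missing_keys`:

```python
def load_by_time(lines):
    """Map the time string to the float value for each data line."""
    table = {}
    for line in lines:
        fields = line.split()
        # Skip blank, incomplete, and header (# or @) lines.
        if len(fields) < 2 or fields[0][0] in "#@":
            continue
        table[fields[0]] = float(fields[1])
    return table


def missing_keys(tables):
    """For each table, list the keys found in another table but absent here."""
    all_keys = set()
    for table in tables:
        all_keys.update(table)
    return [sorted(all_keys - set(table)) for table in tables]
```

If the third list that comes back contains '0.000', the third file has no line whose first column is exactly that string. Matching rows by position instead of by key avoids the problem, provided the files are in the same order.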

I then tried the different version of the input file which looks like this:
0.0000000 0.0040524
2.0000000 0.0495572
4.0000000 0.0486072
6.0000000 0.0586495
...
50000.0000000 0.2347846

which gave an output, but it's junk:
Code:
0.00000000.0040524 /n 10.00000000.0890422333333 /n 100.00000000.132484733333 /n 1000.00000000.150533633333 /n 10000.00000000.182452666667 /n 10002.00000000.173097 /n 10004.00000000.174290566667 /n 10006.00000000.1680934 /n 10008.00000000.1684483 /n 10010.00000000.175850433333 /n 10012.00000000.1720737 /n 10014.00000000.1775554 /n 10016.00000000.1666258 /n 10018.00000000.172433 /n 1002.00000000.154299233333 /n 10020.00000000.1710826 /n 10022.00000000.1697698 /n 10024.

etc.

Notably the output is all one line. But the values also seem to be junk.
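
Looking at the output above, three things seem to be going on. The literal " /n " suggests the script writes "/n" where the newline escape "\n" was intended. The time and the mean are joined with no space between them. And the rows run 0, 10, 100, 1000, 10000, 10002, which is the order you get from sorting the time keys as strings rather than as numbers. The means themselves may well be fine. A minimal sketch of writing the rows in numeric order, assuming the means are held in a dictionary keyed by the time string (the name `format_rows` is made up):

```python
def format_rows(means_by_time):
    """Return output text with rows in numeric time order, one per line."""
    # key=float makes sorted() compare the times as numbers, not as text.
    ordered = sorted(means_by_time, key=float)
    # A space separates the columns and "\n" (backslash) ends each line.
    return "".join("%s %s\n" % (t, means_by_time[t]) for t in ordered)


# Sorting the strings directly gives the lexicographic order seen above.
keys = ["0.0000000", "2.0000000", "10.0000000", "100.0000000"]
text_order = sorted(keys)
number_order = sorted(keys, key=float)
```

Here `text_order` puts "10.0000000" and "100.0000000" before "2.0000000", while `number_order` keeps 0, 2, 10, 100.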

The simulation isn't of a reaction. It's just a GPCR embedded in a bilayer as it would be in the body. We have built a model of the GPCR and are using the MD to optimise the structure to use it for drug design.
 