If any budding scripters could set me on the way to this script, it would be much appreciated.
I have lines of sequence, aligned so that matching letters fall in the same column. I'd like to count how often each letter occurs at each position, generating a matrix for the entire alignment, e.g.
fred ATGTTGTAT
fred1 ATCT-ATAT
fred2 ATCTTATAT
Output:
A 3 0 0 0 0 2 0 3 0
T 0 3 0 3 2 0 3 0 3
G 0 0 1 0 0 1 0 0 0
C 0 0 2 0 0 0 0 0 0
- 0 0 0 0 1 0 0 0 0
This will be for 100,000 sequences, hence the need for a script. I figure the best way is to count the incidence of each letter at each position within each line and sum the counts in the matrix, but I have yet to work out how to do this. I'd be much obliged for any thoughts. Thanks.
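For what it's worth, the counting approach described above could be sketched in Python roughly as follows. This is only a minimal sketch under some assumptions: each line is "name sequence" separated by whitespace, the symbols of interest are A, T, G, C and "-", and the function name count_columns is made up for illustration.

```python
from collections import Counter


def count_columns(lines, alphabet="ATGC-"):
    """Tally how often each symbol occurs at each alignment position."""
    columns = []  # one Counter per alignment position
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        seq = parts[-1].upper()  # assumption: the sequence is the last field
        # Grow the list if this sequence is longer than any seen so far.
        while len(columns) < len(seq):
            columns.append(Counter())
        for pos, symbol in enumerate(seq):
            columns[pos][symbol] += 1
    # One row per symbol in the alphabet, one count per position.
    return {s: [col[s] for col in columns] for s in alphabet}


if __name__ == "__main__":
    example = ["fred ATGTTGTAT", "fred1 ATCT-ATAT", "fred2 ATCTTATAT"]
    for symbol, counts in count_columns(example).items():
        print(symbol, *counts)
```

Because the function only iterates over lines, an open file handle can be passed in directly (for example, count_columns(open("alignment.txt")), where the file name is hypothetical), so 100,000 sequences never need to be held in memory at once. Symbols outside the assumed alphabet, such as N, are tallied internally but not reported unless added to the alphabet argument.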