I'm hoping there are some Python gurus on here somewhere, or at least someone who's attempted something similarish in the past ...
The problem:
This is a biosciences application, but that doesn't impact the thrust of it. I have a list of ~500,000 nucleotide sequences (strings mostly of length 25 but variable up to ~200) that I'd like to compare to every other sequence in the list.
For that I'm using a local install of NCBI BLAST+. It's a command-line app that I call from Python, collecting the results through subprocess's communicate(). The app takes an input sequence and compares it to a pre-compiled library of other sequences (my list of ~500,000).
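To make the setup concrete, here is a minimal sketch of a single call of that kind. It is an illustration under assumptions, not the exact code in use: `blast_command` and `run_query` are hypothetical helper names, the database path is a placeholder, and it assumes `blastn` reads the query from standard input when no `-query` file is given and writes tabular output with `-outfmt 6`.

```python
import subprocess
import sys


def blast_command(db_path):
    """Build a blastn command line (hypothetical helper; flags are assumptions).

    Assumes blastn reads the query from stdin when -query is not given,
    and that -outfmt 6 selects tab-separated output.
    """
    return ["blastn", "-db", db_path, "-outfmt", "6"]


def run_query(cmd, fasta_text):
    """Send FASTA text to an external command and return what it prints."""
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    # communicate() writes the input, closes stdin, and waits for exit.
    out, err = proc.communicate(fasta_text)
    if proc.returncode != 0:
        raise RuntimeError(f"command failed: {err.strip()}")
    return out
```

Something like `run_query(blast_command("my_db"), ">seq1\nACGT...\n")` would then run one comparison, where `my_db` stands in for whatever the pre-compiled library is called.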
All well and good, but: 1) each operation is going to take some time, and 2) this is obviously a very parallel task.
So:
I'm newish to Python, but I've got a fair amount of programming experience. I haven't played with multiprocessing to date, and I'm wondering whether it's going to be worth my while to attempt it for this application. I've heard very mixed things about multiprocessing in Python, but I also know it's had a fairly major overhaul for the current v3 release. I'll happily dive in, but it looks complicated, so I'm just after some opinions / tips / ideas before I start.
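For anyone picturing what the parallel version might look like, here is a rough sketch under stated assumptions. It uses a thread pool from `concurrent.futures` rather than the `multiprocessing` module, on the reasoning that each worker only waits on an external process, so threads may be enough. It also sends sequences in batches instead of one per call. All function names, the batch size, and the worker count are hypothetical, and `cmd` is whatever command line launches the search.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def to_fasta(batch):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in batch)


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_batch(cmd, batch):
    """Run one external process on a batch and return its stdout."""
    result = subprocess.run(
        cmd,
        input=to_fasta(batch),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


def run_all(cmd, records, batch_size=1000, workers=4):
    """Run batches concurrently; results come back in batch order."""
    batches = list(chunked(records, batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda b: run_batch(cmd, b), batches))
```

With `cmd = ["blastn", "-db", "my_db", "-outfmt", "6"]` (placeholder database name, and assuming blastn takes the query on stdin), `run_all(cmd, records)` would return one block of output per batch. BLAST+ also has a `-num_threads` option of its own, so it may be worth comparing that against running several processes.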