Sticking a Q9550 core on a GPU will not make things faster. In fact, it will make your games ridiculously slower. For the short answer as to why, compare a software renderer with 3D hardware acceleration: the software renderer runs on a general-purpose CPU, while the hardware accelerator has dedicated silicon for compute-intensive problems.
That's the short answer; here is the (grossly simplified) long answer:
It's down to architecture. CPU design engineers have quite different design goals from GPU designers. CPUs are designed for general-purpose processing. The majority of their work deals with I/O, string manipulation, integer arithmetic, logical operations, etc. So much so that the original x86 processors (8086 to 80386) could not even do floating-point arithmetic (on single-precision and double-precision variables, as defined by IEEE Standard 754). They were only capable of integer arithmetic, and even now the base 8086 instructions only support integer arithmetic. Back then, if you wanted to carry out arithmetic on numbers with decimal points (floating-point variables), you needed to buy an expensive co-processor called the math co-processor or Floating Point Unit (FPU). These matched their host CPUs: the 8087, 80287 and 80387 FPUs paired with the 8086, 80286 and 80386 CPUs respectively. (I've omitted the 80186 because its primary use was in embedded systems.)
On hardware that did not have an FPU, software floating-point emulators were used. These were programs, often written in highly optimized assembly language (both for speed and because they required low-level access to the hardware traps), that used the integer instruction set of the computer to perform FP arithmetic.
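To make that concrete, here is a minimal sketch of the idea in C rather than assembly: floating-point multiplication built from nothing but integer operations on the IEEE 754 bit patterns. It ignores the special cases (NaNs, infinities, subnormals, exponent overflow) and rounds crudely, and the name `softfloat_mul` is just illustrative -- treat it as a toy, not a real emulator.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Toy soft-float multiply built purely from integer instructions.
 * Ignores NaNs, infinities, subnormals and exponent overflow, and
 * rounds by truncation -- an illustration, not a real emulator. */
static float softfloat_mul(float a, float b)
{
    uint32_t ua, ub;
    memcpy(&ua, &a, sizeof ua);   /* grab the raw IEEE 754 bit patterns */
    memcpy(&ub, &b, sizeof ub);

    uint32_t sign = (ua ^ ub) & 0x80000000u;             /* XOR the signs  */
    int32_t  exp  = (int32_t)((ua >> 23) & 0xFFu)        /* add exponents, */
                  + (int32_t)((ub >> 23) & 0xFFu) - 127; /* drop one bias  */

    /* restore the implicit leading 1 of each 23-bit significand */
    uint64_t ma = (ua & 0x007FFFFFu) | 0x00800000u;
    uint64_t mb = (ub & 0x007FFFFFu) | 0x00800000u;

    uint64_t m = ma * mb;          /* 24-bit x 24-bit -> 48-bit product */
    if (m & (1ULL << 47)) {        /* product in [2,4): renormalise */
        m >>= 24;
        exp++;
    } else {                       /* product already in [1,2) */
        m >>= 23;
    }

    uint32_t ur = sign | ((uint32_t)exp << 23) | ((uint32_t)m & 0x007FFFFFu);
    float r;
    memcpy(&r, &ur, sizeof r);
    return r;
}

int main(void)
{
    printf("%f\n", softfloat_mul(3.5f, -2.0f));   /* prints -7.000000 */
    return 0;
}
```

Real emulators had to do all of this (and the hard special cases) for every single FP operation, which is why they were so much slower than a hardware FPU.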
Then when the 80486 DX arrived, it came with a built-in FPU. The cheaper 80486 SX was essentially a DX with its FPU disabled.
(As a side note: the FPU is a completely separate architecture from the general-purpose, register-organised machine that the x86 is. It is a stack-organised architecture built around eight 80-bit extended-precision registers, which allows for very interesting ways to perform arithmetic easily -- it can directly evaluate arithmetic expressions written in Reverse Polish Notation (RPN).)
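A toy stack machine shows why RPN evaluation falls out of this design for free (a minimal C sketch; the eight-slot `st` array mimics the x87's ST(0)-ST(7) register stack, and the pop-pop-push pairs behave like the real FADDP/FMULP instructions):

```c
#include <stdio.h>

/* A toy stack machine in the spirit of the x87: operands are pushed,
 * and each arithmetic operation pops its inputs and pushes its result.
 * The expression (2 + 3) * 4 in RPN is simply: 2 3 + 4 *   */
static double st[8];     /* the x87 likewise has 8 stack slots, ST(0)-ST(7) */
static int top = -1;

static void   push(double x) { st[++top] = x; }
static double pop(void)      { return st[top--]; }

int main(void)
{
    push(2.0);
    push(3.0);
    push(pop() + pop());    /* like FADDP: pop two operands, push the sum */
    push(4.0);
    push(pop() * pop());    /* like FMULP */
    printf("%g\n", pop());  /* prints 20 */
    return 0;
}
```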
This should give you some indication of how far removed floating-point arithmetic is from what personal computers conventionally do. The lack of need for fast FP arithmetic in normal computer use relegated it, until the 486, to a separate optional co-processor (much in the manner 3D accelerators are nowadays).
The FPU expanded in functionality from the days of yore when the 8087 reigned supreme. Fast forward to the time of the Pentium III: a new way to carry out floating-point arithmetic on the CPU was introduced, originally because multimedia applications needed access to fast hardware-based FP arithmetic. The Pentium III (and, later, its AMD counterparts) therefore introduced a new set of instructions called SSE, which also incorporated a form of data-level parallelism called SIMD (in fact SSE stands for Streaming SIMD Extensions). This is important for understanding where GPUs have the advantage over CPUs. SIMD allows a single instruction to operate on a whole set of data at once -- why is this important? Because much numerical work is expressed and carried out in linear algebra, which uses matrices and vectors, and operations on those boil down to doing the same arithmetic across long runs of numbers. SIMD makes such calculations fast (the reasons become obvious if you delve into linear algebra to a moderate depth). SSE continued to expand in the form of SSE2, SSE3, SSE4, etc. Though primarily created for multimedia, that is not its only application. In fact the reason such arithmetic is necessary in multimedia has to do with a field of engineering called Digital Signal Processing (DSP). DSP is the core scientific basis for audio and video processing, and it is almost entirely about matrices and Z-transforms. So these advances make the CPU very useful for scientific/engineering computing, which is numerically intensive.
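Here is a minimal sketch of what SIMD buys you, using the SSE intrinsics that compilers expose over these instructions: a single `addps` instruction (reached through the `_mm_add_ps` intrinsic) performs four single-precision additions at once, where scalar code would need four separate adds.

```c
#include <stdio.h>
#include <xmmintrin.h>  /* SSE intrinsics */

int main(void)
{
    float a[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    float b[4] = { 10.0f, 20.0f, 30.0f, 40.0f };
    float r[4];

    __m128 va = _mm_loadu_ps(a);     /* load 4 floats into a 128-bit register */
    __m128 vb = _mm_loadu_ps(b);
    __m128 vr = _mm_add_ps(va, vb);  /* 4 additions in a single instruction */
    _mm_storeu_ps(r, vr);

    printf("%g %g %g %g\n", r[0], r[1], r[2], r[3]);  /* 11 22 33 44 */
    return 0;
}
```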
Fast forward again to the time of Sandy Bridge: a new set of instructions called AVX, or Advanced Vector Extensions, was introduced for the burgeoning requirement of ever faster floating-point arithmetic (a vector being, loosely speaking, a one-row or one-column matrix). Like the FPU and SSE, it provides ways of performing FP arithmetic on large data sets rapidly. Now keep in mind that the FPU, SSEx and AVX all handle both single and double precision. In fact the FPU only ever carries out arithmetic on 80-bit extended-precision values, which it then rounds down to 64-bit doubles or 32-bit singles. SSE and AVX do it a bit differently: an SSE instruction on a 128-bit register performs either 2 double-precision or 4 single-precision operations, while an AVX instruction on a 256-bit register performs 4 double or 8 single operations at once.
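The AVX equivalent of the sketch above looks almost identical, just twice as wide; this one adds four doubles with a single instruction (compile with `-mavx` on gcc/clang):

```c
#include <stdio.h>
#include <immintrin.h>  /* AVX intrinsics */

int main(void)
{
    double a[4] = { 1.0, 2.0, 3.0, 4.0 };
    double b[4] = { 0.5, 0.5, 0.5, 0.5 };
    double r[4];

    __m256d va = _mm256_loadu_pd(a);     /* 4 doubles fill a 256-bit register */
    __m256d vb = _mm256_loadu_pd(b);
    __m256d vr = _mm256_add_pd(va, vb);  /* 4 double-precision adds at once */
    _mm256_storeu_pd(r, vr);

    printf("%g %g %g %g\n", r[0], r[1], r[2], r[3]);  /* 1.5 2.5 3.5 4.5 */
    return 0;
}
```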
Now here's where GPUs come in. GPUs are single-minded number crunchers. They are designed from the ground up for data-level parallelism, and even instruction-level parallelism. Their strength lies in their ability to perform floating-point arithmetic on a massive scale. Along the way, the GPU companies (prominently NVIDIA) realised that what makes GPUs great for graphics also makes them a great general-purpose FPU. As such they began to move towards a more general, compute-heavy architecture. This change mainly began between the GeForce 7 and GeForce 8 series.
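To give a flavour of that model: GPU code is written as a small "kernel" that computes one element of the output, and the hardware runs thousands of copies of it in parallel, one per index. Below is a plain-C sketch of SAXPY (y = a*x + y, a staple linear-algebra routine) written in that style; the ordinary loop merely stands in for the GPU's grid of hardware threads, and `saxpy_kernel` is just an illustrative name.

```c
#include <stdio.h>

#define N 8

/* One "thread's" worth of work: compute a single output element given
 * its index. A GPU would run thousands of these concurrently. */
static void saxpy_kernel(int i, float a, const float *x, float *y)
{
    y[i] = a * x[i] + y[i];
}

int main(void)
{
    float x[N] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    float y[N] = { 0 };

    for (int i = 0; i < N; i++)       /* a GPU launches these iterations   */
        saxpy_kernel(i, 2.0f, x, y);  /* as independent parallel threads   */

    for (int i = 0; i < N; i++)
        printf("%g ", y[i]);          /* prints 2 4 6 8 10 12 14 16 */
    printf("\n");
    return 0;
}
```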
However, graphics only requires single-precision floats. Sadly for science and engineering, that is not enough, as serious computational problems are often carried out exclusively in double precision. This is the cause of the recent trend you see of NVIDIA and ATI trying to make their graphics cards stronger at double-precision floating-point arithmetic -- NVIDIA in particular is pushing its GPU as a general-purpose processor.
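A quick C illustration of why that matters: accumulate 0.1 ten million times. The exact answer is 1,000,000; single precision drifts a long way from it, while double precision stays close.

```c
#include <stdio.h>

/* Each float addition rounds to only ~7 significant decimal digits,
 * so the error compounds badly over millions of operations. */
int main(void)
{
    float  fsum = 0.0f;
    double dsum = 0.0;

    for (int i = 0; i < 10000000; i++) {
        fsum += 0.1f;
        dsum += 0.1;
    }

    printf("float : %f\n", fsum);  /* noticeably far from 1000000 */
    printf("double: %f\n", dsum);  /* very close to 1000000       */
    return 0;
}
```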
Generally speaking, the GPU is a processor designed for general-purpose linear algebra, while the CPU is a general-purpose processor in the truest sense, and therefore not nearly as optimised for compute-heavy problems like graphics, DSP and problems in optimization theory.