I make console games for a living, so hopefully I can help.

Typically, the CPU does no graphical, per-pixel 'rendering' per se. The most graphical thing it does is construct "display lists" (aka "command lists"), which are basically long, high-level sequences of instructions for the GPU to execute. They are not low-level commands like "draw a red pixel at co-ordinate (123, 456)".
So in your game render loop, you iterate through each object that you'd like to draw and call its render function. That object adds to the current display list a command to set the position, rotation and scale of the model you'd like to draw. It then adds a command to set the visual state, something like "I'd like these textures, this lighting and these shaders bound to the next model", and finally adds the command to draw a model that's in memory.
Once this has all been done, the command list is sent to the GPU to execute.
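To make that concrete, here's a rough sketch of the pattern in C++. All the names (CommandList, setTransform, drawModel, gpuSubmit and so on) are made up for illustration; real APIs like D3D12 command lists or Vulkan command buffers differ in the details, but the shape is the same:

```cpp
#include <vector>

struct Matrix4x4 { float m[16]; };

struct RenderState {
    int textureId;       // which textures to bind
    int shaderId;        // which shaders to bind
    int lightingSetupId; // which lighting setup to use
};

// One entry in the display list. Only the field matching `type` is meaningful.
struct Command {
    enum Type { SetTransform, SetState, DrawModel } type;
    Matrix4x4 transform;
    RenderState state;
    int modelId;
};

class CommandList {
public:
    void setTransform(const Matrix4x4& t) { cmds_.push_back({Command::SetTransform, t, {}, 0}); }
    void setState(const RenderState& s)   { cmds_.push_back({Command::SetState, {}, s, 0}); }
    void drawModel(int modelId)           { cmds_.push_back({Command::DrawModel, {}, {}, modelId}); }
    const std::vector<Command>& commands() const { return cmds_; }
private:
    std::vector<Command> cmds_;
};

// Each object's render function just appends commands; nothing is drawn yet.
void renderObject(CommandList& list, const Matrix4x4& worldTransform,
                  const RenderState& state, int modelId) {
    list.setTransform(worldTransform); // position, rotation, scale
    list.setState(state);              // textures, lighting, shaders
    list.drawModel(modelId);           // reference to a model already in memory
}

// After the loop over all objects, the whole list is handed off in one go:
// gpuSubmit(list); // hypothetical submission call
```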
There are non-pixel-related rendering tasks, though, such as calculating the visibility of objects. For instance, there's no point in submitting objects behind the camera only to have the GPU automatically cull them: it costs both CPU time (adding those invisible objects to the display list) and GPU time (the GPU has to run the vertex shaders just for the clipping phase to classify them as behind the camera). So the CPU normally does a coarse, ray-trace-like visibility step to decide whether objects are visible at all. One classic way of doing this for FPS games is to use BSP trees.
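As a rough illustration of that culling step (skipping BSP trees and just brute-forcing it), here's a sketch of testing an object's bounding sphere against the six camera frustum planes. The representation is an assumption on my part, not the one true way to do it:

```cpp
#include <cstddef>

struct Vec3 { float x, y, z; };

// Plane in the form dot(normal, p) + d = 0, with the normal pointing
// into the inside of the frustum.
struct Plane { Vec3 normal; float d; };

struct Sphere { Vec3 center; float radius; };

float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Returns false as soon as the sphere is entirely outside any one plane,
// so objects behind the camera never make it onto the command list.
bool isVisible(const Sphere& s, const Plane frustum[6]) {
    for (std::size_t i = 0; i < 6; ++i) {
        float dist = dot(frustum[i].normal, s.center) + frustum[i].d;
        if (dist < -s.radius)
            return false; // fully outside this plane: cull it
    }
    return true; // inside or intersecting all six planes: submit it
}
```

You'd run this over every object (or, better, over a spatial hierarchy of objects) before adding anything to the display list.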
Rarely, however, you may want to do actual rendering on the CPU, because the round-trip latency of doing the work on the GPU and then reading the results back on the CPU is too high. We did this in a shipping game last year, using an old-skool software renderer instead of the GPU.
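For a flavour of what that kind of thing looks like (this is emphatically not our shipping code, just the textbook core of a software rasterizer), here's a bare-bones edge-function fill of one 2D triangle into a CPU-side pixel buffer. Depth buffering, attribute interpolation and SIMD are all omitted, and it assumes counter-clockwise winding:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Point { float x, y; };

// Twice the signed area of triangle (a, b, c); positive when c is on the
// left of the edge a->b.
float edge(const Point& a, const Point& b, const Point& c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

void rasterizeTriangle(std::vector<std::uint32_t>& pixels, int width, int height,
                       Point v0, Point v1, Point v2, std::uint32_t color) {
    // Bounding box of the triangle, clamped to the buffer.
    int minX = std::max(0, (int)std::min({v0.x, v1.x, v2.x}));
    int maxX = std::min(width - 1, (int)std::max({v0.x, v1.x, v2.x}));
    int minY = std::max(0, (int)std::min({v0.y, v1.y, v2.y}));
    int maxY = std::min(height - 1, (int)std::max({v0.y, v1.y, v2.y}));

    for (int y = minY; y <= maxY; ++y) {
        for (int x = minX; x <= maxX; ++x) {
            Point p{x + 0.5f, y + 0.5f}; // sample at the pixel centre
            // Inside if p is on the same side of all three edges.
            float w0 = edge(v1, v2, p);
            float w1 = edge(v2, v0, p);
            float w2 = edge(v0, v1, p);
            if (w0 >= 0 && w1 >= 0 && w2 >= 0)
                pixels[y * width + x] = color;
        }
    }
}
```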
Does this all make sense?
