First point is that developing games is expensive and risky. Smaller companies have pretty much everything invested in each title, and go bust if the game fails. Hours tend to be long and brutal. I'm not sure "lazy" is a fair description - if you've got four hours to implement and test something, single threading may be your only hope.
... You can also easily get to the point where creating and managing threads costs more than you gain from the multi-threading itself. As a result you have to write for a target number of threads, perhaps with the ability to take some advantage of further threads if they're available. You'll realistically never (never as in right now!) optimise for more than about 4 threads.
Edit: My experience is mostly with high level languages so game engine writers may not agree!
I thought I'd add a couple of thoughts from threading in a relatively low level language (C). Thread creation and destruction is expensive but it's also avoidable. The design pattern is a "threadpool", but the idea is simply to reuse threads to call other functions instead of creating and destroying a thread for each piece of work. It's harder to code than parallel-for loops but efficiency is better.
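For anyone who hasn't seen one, here is a rough sketch of the threadpool idea using POSIX threads. It is only an illustration, not how any particular engine does it: the names (pool_t, pool_submit, pool_wait, pool_demo and so on), the fixed count of 4 workers and the queue size of 64 are all made up for the example, and it uses a single condition variable with broadcasts to stay short rather than to be fast. The workers are created once and then pull tasks from a shared queue.

```c
#include <pthread.h>

#define POOL_THREADS 4   /* assumed worker count, picked for the example */
#define QUEUE_CAP    64  /* assumed queue capacity */

typedef void (*task_fn)(void *);
typedef struct { task_fn fn; void *arg; } task_t;

typedef struct {
    pthread_t       threads[POOL_THREADS];
    task_t          queue[QUEUE_CAP];  /* ring buffer of waiting tasks */
    int             head, count;       /* index of oldest task, tasks waiting */
    int             pending;           /* tasks waiting or still running */
    int             stop;              /* set once at shutdown */
    pthread_mutex_t lock;
    pthread_cond_t  changed;           /* broadcast on every state change */
} pool_t;

/* Each worker lives for the whole life of the pool and keeps taking tasks. */
static void *worker(void *p)
{
    pool_t *pool = p;
    pthread_mutex_lock(&pool->lock);
    for (;;) {
        while (pool->count == 0 && !pool->stop)
            pthread_cond_wait(&pool->changed, &pool->lock);
        if (pool->count == 0)          /* stop requested and queue drained */
            break;
        task_t t = pool->queue[pool->head];
        pool->head = (pool->head + 1) % QUEUE_CAP;
        pool->count--;
        pthread_cond_broadcast(&pool->changed);
        pthread_mutex_unlock(&pool->lock);
        t.fn(t.arg);                   /* run the task outside the lock */
        pthread_mutex_lock(&pool->lock);
        pool->pending--;
        pthread_cond_broadcast(&pool->changed);
    }
    pthread_mutex_unlock(&pool->lock);
    return NULL;
}

int pool_init(pool_t *pool)
{
    pool->head = pool->count = pool->pending = pool->stop = 0;
    if (pthread_mutex_init(&pool->lock, NULL) != 0) return -1;
    if (pthread_cond_init(&pool->changed, NULL) != 0) return -1;
    for (int i = 0; i < POOL_THREADS; i++)
        if (pthread_create(&pool->threads[i], NULL, worker, pool) != 0)
            return -1;  /* sketch only: already started threads are not cleaned up */
    return 0;
}

/* Queue a task; blocks while the queue is full. */
void pool_submit(pool_t *pool, task_fn fn, void *arg)
{
    pthread_mutex_lock(&pool->lock);
    while (pool->count == QUEUE_CAP)
        pthread_cond_wait(&pool->changed, &pool->lock);
    int tail = (pool->head + pool->count) % QUEUE_CAP;
    pool->queue[tail] = (task_t){ fn, arg };
    pool->count++; pool->pending++;
    pthread_cond_broadcast(&pool->changed);
    pthread_mutex_unlock(&pool->lock);
}

/* Block until every submitted task has finished running. */
void pool_wait(pool_t *pool)
{
    pthread_mutex_lock(&pool->lock);
    while (pool->pending > 0)
        pthread_cond_wait(&pool->changed, &pool->lock);
    pthread_mutex_unlock(&pool->lock);
}

void pool_destroy(pool_t *pool)
{
    pthread_mutex_lock(&pool->lock);
    pool->stop = 1;
    pthread_cond_broadcast(&pool->changed);
    pthread_mutex_unlock(&pool->lock);
    for (int i = 0; i < POOL_THREADS; i++)
        pthread_join(pool->threads[i], NULL);
    pthread_mutex_destroy(&pool->lock);
    pthread_cond_destroy(&pool->changed);
}

/* Example task: square an int in place. */
static void square_task(void *arg) { int *v = arg; *v *= *v; }

/* Example use: square 1..8 on the pool and return the sum of the results. */
int pool_demo(void)
{
    pool_t pool;
    int values[8], sum = 0;
    if (pool_init(&pool) != 0) return -1;
    for (int i = 0; i < 8; i++) {
        values[i] = i + 1;
        pool_submit(&pool, square_task, &values[i]);
    }
    pool_wait(&pool);
    pool_destroy(&pool);
    for (int i = 0; i < 8; i++) sum += values[i];
    return sum;
}
```

The point is that pthread_create is called POOL_THREADS times in total, however many tasks get submitted. A real pool would probably want separate condition variables and proper error cleanup. Build with -pthread.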
I've found some odd results from varying the number of threads. The standard advice is threads=cores, probably optimal if the processing is well behaved. I've had some preliminary success with thread count > cores, tested up to 24 threads on 4 cores without dramatic slowdown. I think it's possible to improve on one thread : one core with clever scheduling. As a side effect you end up with code which expands to meet a relatively large number of cores.
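To make the thread count > cores idea concrete, here is a small sketch of sizing the work from the core count instead of hard-coding it. The helper names (pick_thread_count, chunk_bounds) and the "factor" multiplier are my own for illustration. It assumes a system where sysconf(_SC_NPROCESSORS_ONLN) is available, which covers Linux, macOS and the BSDs but is not guaranteed everywhere.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: number of workers = online cores * factor.
 * factor = 1 is the standard threads=cores advice; larger values
 * oversubscribe the cores. Falls back to 1 core if the query fails. */
long pick_thread_count(long factor)
{
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    if (cores < 1) cores = 1;
    if (factor < 1) factor = 1;
    return cores * factor;
}

/* Split n items into `parts` nearly equal chunks (parts must be > 0).
 * Writes the half-open range [*begin, *end) for chunk number `index`.
 * The first n % parts chunks get one extra item each. */
void chunk_bounds(size_t n, size_t parts, size_t index,
                  size_t *begin, size_t *end)
{
    size_t base = n / parts, extra = n % parts;
    *begin = index * base + (index < extra ? index : extra);
    *end   = *begin + base + (index < extra ? 1 : 0);
}
```

Cutting the work into more chunks than cores means a core that finishes early can pick up another chunk instead of sitting idle, and the same code spreads out further when it lands on a machine with more cores. Whether that actually beats one thread per core depends on the workload, so it needs measuring.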
Sadly it is very possible to spend more time sending data between threads than doing useful work. That's where the real difficulty seems to be.
There's a BIG BIG difference between coding multi-thread support into something like banking and security software, where the specific order a lot of stuff is processed in doesn't matter, and something like a game engine. In an engine there are large amounts of work where either you have to process things in a specific order and each step depends on the one before it, or threading the code produces so little in the way of gains that it's not worth the effort and the potential issues of threading it.
Yeah - this would be true. Different threads doing different things is awkward; when the order of execution is critical it's very hard.
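A toy illustration of the difference, in C. Both functions and their names are invented for the example and are not taken from any engine. The first loop can be cut into index ranges and handed to any number of threads; the second cannot be split that way as written, because each step reads the result of the step before it.

```c
#include <stddef.h>

/* Order does not matter: out[i] depends only on in[i], so separate
 * threads can each take their own [begin, end) range safely. */
void scale_range(const float *in, float *out,
                 size_t begin, size_t end, float k)
{
    for (size_t i = begin; i < end; i++)
        out[i] = in[i] * k;
}

/* Order matters: position[i] needs position[i - 1], so the iterations
 * form a chain and have to run one after another. position[0] is the
 * caller-supplied starting value; velocity needs n - 1 entries. */
void integrate_positions(const float *velocity, float *position,
                         size_t n, float dt)
{
    for (size_t i = 1; i < n; i++)
        position[i] = position[i - 1] + velocity[i - 1] * dt;
}
```

Calling scale_range twice with ranges [0, n/2) and [n/2, n) gives the same result as one call over [0, n), whichever call runs first. There is no equivalent trick for integrate_positions without restructuring the algorithm.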
I'm still unconvinced that many weak cores is a better way to go than a few strong ones for future mainstream CPUs and software. Crysis 3's coding does allow the FX8 to beat the i7, but it's still taking AMD twice as many cores as Intel to get the job done. Is this due to the poor per-core performance of the FX, or is it actually because optimizing real-time software (i.e. not encoding apps) for 8 cores is very lossy in comparison to a few main threads?
The old C2Q systems scaled really badly past two cores because the FSB was limiting. In the case of AMD, it may not be weak cores that are the problem, but weak connections between cores. Plus AMD count cores differently - an Intel quad core has four floating point units, an AMD eight core has four floating point units. The ones in an Intel chip will be idle some of the time, but two cores will have to fight each other for access in an AMD chip.
It might be educational to find a benchmark between an Intel eight core and an AMD eight core - the Intel oct would be expected to absolutely trounce the AMD when maths is involved.