Soldato
- Joined
- 15 Nov 2008
- Posts
- 5,060
- Location
- In the ether
That's what I meant by cores, as in cores in a cluster. If you can spread across cores you can spread across clusters; it's just a matter of getting the state over to all the boxes, which many programs will do for you.
It's just getting it parallel in the first place, which is hard.
The two are different though: things that work well over clusters do not necessarily work well over cores. It really depends on where you're hitting your hardware limits. Typically in a cluster environment you'll hit network bandwidth issues, where you simply can't transfer data between the nodes as quickly as you'd like. In a one-machine, multiple-cores setup that's not really an issue.
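To put rough numbers on the bandwidth point, here's a back-of-envelope sketch. The figures (10 GB of data, a 1 Gbit/s link, 10 GB/s memory bandwidth) are round numbers I've assumed purely for illustration, not measurements from any real setup, and `transfer_seconds` is just a made-up helper name.

```python
def transfer_seconds(n_bytes, bytes_per_second):
    """Ideal time to move n_bytes at a given sustained bandwidth."""
    return n_bytes / bytes_per_second

# Assumed, illustrative figures only.
DATA_BYTES = 10 * 10**9      # 10 GB of state to share
NETWORK_BPS = 10**9 / 8      # 1 Gbit/s link, expressed in bytes per second
MEMORY_BPS = 10 * 10**9      # 10 GB/s memory bandwidth on a single box

network_time = transfer_seconds(DATA_BYTES, NETWORK_BPS)
memory_time = transfer_seconds(DATA_BYTES, MEMORY_BPS)

print(f"between nodes: {network_time:.0f} s, between cores: {memory_time:.0f} s")
```

With those assumed figures the same data takes far longer to cross the network than to be read from shared memory, which is the gap being described.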
But there are implicit problems with map reduce in practice. It'll fall over in the map stage if the records are highly asymmetric, or equally if the file sizes themselves differ massively. But the ultimate point is that IMHO it's inefficient for most applications.
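A toy sketch of the asymmetry problem, assuming a naive static round-robin split of records across map workers (real frameworks partition differently; `worker_loads` and the cost numbers are hypothetical). The map stage only finishes when the slowest worker does, so one oversized record drags everything out.

```python
def worker_loads(record_costs, n_workers):
    """Assign record i to worker i % n_workers and return each worker's total cost."""
    loads = [0] * n_workers
    for i, cost in enumerate(record_costs):
        loads[i % n_workers] += cost
    return loads

# Two made-up workloads with the same total cost (80 units).
uniform = [10] * 8           # eight equally sized records
skewed = [73] + [1] * 7      # one huge record, seven tiny ones

uniform_loads = worker_loads(uniform, 4)
skewed_loads = worker_loads(skewed, 4)

# The stage takes as long as the busiest worker.
print("uniform:", uniform_loads, "stage time:", max(uniform_loads))
print("skewed: ", skewed_loads, "stage time:", max(skewed_loads))
```

Same total work in both cases, but in the skewed one three of the four workers sit idle while one grinds through the big record.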