Ruby speed

It would be about 30-40 lines of C# (my current native tongue!) for the basic multi-threading stuff.

I'm sure Ruby can do it in less.
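Something like this covers the basic spawn/join pattern in a handful of lines; a rough sketch, with an MD5 checksum standing in for the real per-file work:

require 'digest/md5'

files = Dir.glob('*.txt')
threads = files.map do |path|
  Thread.new do
    # Illustrative work: MD5 of the file contents.
    [path, Digest::MD5.file(path).hexdigest]
  end
end
results = threads.map { |t| t.value }  # value joins the thread and returns its result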
 
Personally I would use a thread pool. I'm sure Ruby has one. It saves a lot of boilerplate code and hassle WRT spawning and managing the lifecycle of the threads.
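A minimal sketch of the kind of pool I mean, built on the standard library's Queue; the checksum method is a stand-in for whatever each job actually does:

require 'thread'
require 'digest/md5'

class ThreadPool
  def initialize(size)
    @queue = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        # Each worker pops jobs until it sees the nil sentinel.
        while (job = @queue.pop)
          job.call
        end
      end
    end
  end

  def schedule(&block)
    @queue << block
  end

  def shutdown
    @workers.size.times { @queue << nil }  # one sentinel per worker
    @workers.each { |w| w.join }
  end
end

def checksum(path)
  # Stand-in for the real per-file work.
  Digest::MD5.file(path).hexdigest
end

pool = ThreadPool.new(4)
Dir.glob('*.txt').each { |path| pool.schedule { checksum(path) } }
pool.shutdown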
 
Just spent a few hours trying to implement a threadpool, and for whatever reason it's ended up slower. Stupid Ruby.

May do it in Java now...

EDIT - seems to be slower on OSX, so I'm blaming 1.8.6 and upgrading now.
 
For the Ruby 1.8 series, at least with the normal reference implementation, IIRC threads don't run in parallel; it basically just has a scheduler in the VM which timeslices - "green threads" in Ruby parlance.
Maybe it's similar to the whole GIL Python thing.

This is obviously crap from the point of view of parallel programming - I think I read proper threads are in 1.9?
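A quick way to see it: CPU-bound work gains nothing from adding threads under green threads - and note that even 1.9's native threads keep a global VM lock, so pure-Ruby CPU work still only runs on one core at a time. A rough benchmark sketch (the timings are yours to collect):

require 'benchmark'

def burn
  x = 0
  1_000_000.times { x += 1 }
  x
end

Benchmark.bm(10) do |b|
  b.report('serial')   { 4.times { burn } }
  b.report('threaded') do
    # Four threads doing the same work; with a timeslicing scheduler
    # or a global lock this won't beat the serial version.
    (1..4).map { Thread.new { burn } }.each { |t| t.join }
  end
end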
 
Indeed sir. Just installed 1.9 and it works better in parallel. Well, it's using multiple threads but it's up to the OS to distribute them over the CPU cores - which it isn't doing well, being Apple and all (Leopard on the box - can't upgrade).

It runs quicker on my much-lesser-spec Windows box. Ah well.

I have a sample 1 GB / 35-million-line file churning over now... that will take approx 3 hrs... so 180 hrs for the big file. It's still crazily slow. I'm just hoping loading the serialisation dumps is quick, otherwise I may have to re-evaluate.
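By serialisation dumps I mean Marshal, roughly like this (checksums is a stand-in name for the results hash):

checksums = { 'file1.txt' => 'abc123' }  # stand-in data

# Write the whole hash to disk in one go...
File.open('checksums.dump', 'wb') { |f| Marshal.dump(checksums, f) }

# ...and read it straight back; the entire structure is rebuilt in memory.
checksums = File.open('checksums.dump', 'rb') { |f| Marshal.load(f) }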
 
Seriously? I had no idea that OSX was also useless at multi-threading? :confused:
Well on my laptop (Snow Leopard) it performs *much* better, but it only has two slower cores. The mac desktop (Leopard) has 8 faster cores, but it can't hold its own when multi-threaded; the Windows desktop (2 slowest cores) fares quite well. Maybe it's a 64-bit thing, me, or Ruby, or whatever, but it's odd.

Either way, I'll just let it tick over...
 
Just in case you wondered, I got it working.

Had to split it into three different steps, across 12 cores, 3 machines, 3 operating systems and 24 GB of memory, but it ran in <15 mins. One of the steps is the bottleneck - it creates a massive hash and crashes out. Had to run that bit on Ubuntu with no GUI to free up memory :D.
 
Leopard isn't that bad really, I don't have any issues running multi-threaded code efficiently at work on a Mac Pro running it (don't have SL, although yes it's supposed to be better).

Maybe it's a Ruby interpreter issue, dunno though.
 
I may be a little late to this thread, but I'll add that it's usually worth batching the disk writes, writing say a thousand checksums at a time. Should give you better throughput as you're not waiting for a disk round-trip each time a thread finishes calculating a checksum.
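Roughly this shape - collect results into a buffer and write each batch in one call (file name, batch size and the results list are all illustrative):

BATCH_SIZE = 1000
results = []  # stand-in for whatever the worker threads produce
buffer = []

File.open('checksums.txt', 'w') do |out|
  results.each do |line|
    buffer << line
    if buffer.size >= BATCH_SIZE
      out.puts(buffer)  # one write for the whole batch
      buffer.clear
    end
  end
  out.puts(buffer) unless buffer.empty?  # flush the remainder
end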

akakjs
 
The OS does this. It's usually called lazy writing or delayed writes or write caching.
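Worth remembering Ruby buffers in userspace on top of the OS cache, too; if you ever genuinely need bytes on disk you have to flush both layers, e.g.:

File.open('checksums.txt', 'w') do |out|
  out.sync = true   # push Ruby's buffer through to the OS on each write
  out.puts 'abc123  file1.txt'
  out.fsync         # ask the OS to flush its write cache to the disk
end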
 
I did some profiling and it is definitely the hash causing the problems. I'm going to have a think about a workaround, maybe combining Marshal dumps or something, to ease the load. But then I'll have the problem of loading it into memory. Argh.
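One workaround I'm considering: shard the hash by key so no single dump has to be built or loaded whole - a rough sketch (shard count and file names are illustrative):

NUM_SHARDS = 16
big_hash = { 'file1.txt' => 'abc123' }  # stand-in data

# Split the big hash into smaller ones by key...
shards = Array.new(NUM_SHARDS) { {} }
big_hash.each { |k, v| shards[k.hash % NUM_SHARDS][k] = v }

# ...dump each shard to its own file...
shards.each_with_index do |shard, i|
  File.open("shard_#{i}.dump", 'wb') { |f| Marshal.dump(shard, f) }
end

# ...then load only the shard a given key lives in.
def lookup(key)
  file = "shard_#{key.hash % NUM_SHARDS}.dump"
  File.open(file, 'rb') { |f| Marshal.load(f) }[key]
end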
 