
Intel's Larrabee Architecture Disclosure: A Calculated First Move

I can't see how this thing will ever compete with a current-day bespoke GPU. Anyone who's ever coded a software renderer knows how much code is required to do filtered texture lookups - anything higher than bilinear texture filtering (on one texture) is difficult. Anisotropic filtering is just seriously painful, and a lack of discrete hardware to do texture sampling will completely clobber fill rate.
There have got to be more hardware/specialist instructions that haven't been disclosed! Otherwise this is gonna compete with the low end...
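Just to give a flavour of what software texture filtering means in practice, here's a minimal bilinear fetch sketch (plain C++, single-channel texture, u/v assumed in [0,1], no mipmapping or wrap modes; purely illustrative, not anything from Intel's disclosure):

```cpp
#include <algorithm>
#include <cstdio>

// A minimal bilinear texture fetch: already four texel reads plus a stack of
// multiplies per sample, before trilinear or anisotropic filtering even
// enter the picture.
float sample_bilinear(const float* tex, int w, int h, float u, float v) {
    float x = u * (w - 1), y = v * (h - 1);
    int   x0 = (int)x,     y0 = (int)y;
    int   x1 = std::min(x0 + 1, w - 1), y1 = std::min(y0 + 1, h - 1);
    float fx = x - x0,     fy = y - y0;
    float t00 = tex[y0 * w + x0], t10 = tex[y0 * w + x1];
    float t01 = tex[y1 * w + x0], t11 = tex[y1 * w + x1];
    float top = t00 + (t10 - t00) * fx;   // lerp along x, top row
    float bot = t01 + (t11 - t01) * fx;   // lerp along x, bottom row
    return top + (bot - top) * fy;        // lerp along y
}

int main() {
    const float tex[4] = { 0.0f, 1.0f,    // a 2x2 test texture
                           1.0f, 0.0f };
    printf("%.3f\n", sample_bilinear(tex, 2, 2, 0.5f, 0.5f));  // expect 0.500
    return 0;
}
```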
 
I can't see how this thing will ever compete with a current-day bespoke GPU. Anyone who's ever coded a software renderer knows how much code is required to do filtered texture lookups - anything higher than bilinear texture filtering (on one texture) is difficult. Anisotropic filtering is just seriously painful, and a lack of discrete hardware to do texture sampling will completely clobber fill rate.
There have got to be more hardware/specialist instructions that haven't been disclosed! Otherwise this is gonna compete with the low end...

From what I've read it'll actually be easier to code for, as it uses the x86 instruction set.

I don't claim to know anything whatsoever about coding, it's just what I read.
 
I made it about half way through that article before my brain started dripping out my ears (Building an Optimized Rasterizer).

I got what I could from it though; it seems Intel have pulled together a splendid team to work on Larrabee's development!

I am assuming that it will still be a separate piece of hardware you plug into your machine, similar to a modern-day GPU, right? Although all the talk of AMD Fusion™ makes you wonder how things will pan out.

I've also heard a lot of talk about vector graphics, but I don't know what that means. I have a mate who does 3D programming so I'll have to brain-drain him on the subject.

Best-case scenario for Larrabee is that it's really good, but not good enough to wipe out AMD and nVidia. Three players in the performance graphics market would be great!

I hope a few peeps here at OcUK are able to fully understand that Anandtech article and break it down into pea-brain sized chunks for us mortals!

Now I bet if that article had a screenshot of Crysis playing at 200fps in full HD this thread would be about 30 pages long lol! :D

Oh well roll on 2009 and 2010 . . . .
 
Further reading suggests a 12-core Larrabee device would have 192 ALUs, as each core has a 16-ALU-wide vector unit. Okay, that number seems 'meh' compared to the monster 800 claimed by ATi for RV770, or even the 240 claimed by nvidia for GT200. I think though that you have to (a) factor in that they're completely different architectures, and (b) remember that Intel is likely to push these things out at huge clock speeds compared to the 'dizzying' highs of 750MHz today. Think about it: Core 2 is nothing but a derivation of Pentium itself, and that hits (or at least can hit) over 4GHz on a 45nm process. I wouldn't be surprised if Intel made the part competitive just through sheer force of clock speed.

Edit: The Wikipedia article (yeah, I know, but the bits I'm talking about had reasonable-looking citations) suggests clock speeds in the 2GHz range, and likely 24-32 cores on higher-end models (512 ALUs on the 32-core model with the 16-ALU vector units). Anandtech states that Larrabee has a ring bus like the one in R520 through RV670. Oh and, uh, 4-way hyperthreading per core apparently. Not sure how that'd affect performance though.
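For anyone who wants to play with the numbers, here's a quick back-of-envelope sketch (treating the 12/24/32-core counts and the 2GHz clock above as speculation, not confirmed specs):

```cpp
#include <cstdio>

int main() {
    // Speculated Larrabee configs from the posts above: N cores x 16-wide vector unit.
    const int vector_width = 16;
    const int core_counts[] = {12, 24, 32};
    for (int cores : core_counts) {
        int alus = cores * vector_width;
        // Raw ALU-issue rate scales with clock, so a narrower part at ~2GHz
        // can keep pace with a much wider part at 750MHz.
        printf("%2d cores -> %3d ALUs -> %4.0f G ALU-ops/s at 2GHz (speculative)\n",
               cores, alus, alus * 2.0);
    }
    printf("RV770     -> 800 ALUs -> %4.0f G ALU-ops/s at 750MHz\n", 800 * 0.75);
    return 0;
}
```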
 
Heh, yeah, but a Core 2 at 4GHz needs what sort of cooling? Even at stock you can't really compare video card cooling to CPU cooling just in sheer size! Imagine what you'd need to cool a 3GHz shader-monster 12-core Larrabee :eek:
 
Intel's graphics drivers are a sham at the moment; nothing works properly, and it will take some major overhauling.

Maybe more to the point, Intel's graphics hardware is a sham at the moment, so they just don't devote any decent talent to putting those drivers right. Have you considered the Trojan horse idea? Maybe big Intel wants AMD/Nvidia to assume they're going to let out a sheep dressed as a lion. They are certainly investing a whole lot in this project, so they are going to want returns on their investment. I for one will be amazed if, after their graphics endeavours of the past, they come out with killer hardware. But I will be damn well pleased: it won't be an ATI/Nvidia race any more, and competition breeds innovation. If Intel succeed, we all benefit from the ensuing scramble to catch up and outdo each other.
 
Heh, yeah, but a Core 2 at 4GHz needs what sort of cooling? Even at stock you can't really compare video card cooling to CPU cooling just in sheer size! Imagine what you'd need to cool a 3GHz shader-monster 12-core Larrabee :eek:

Well yeah, but I wouldn't expect a 4GHz version (certainly not at stock!), probably more like 2GHz; I'm just saying Intel have a knack for making products that clock really high. Still, it's looking like anything up to 32 cores on 45nm, which'd be 512 ALUs in total. At 2GHz that'd be a LOT of horsepower, even if it does require some 23-heatpipe mammoth of a cooling unit. :p
 
I hope a few peeps here at OcUK are able to fully understand that Anandtech article and break it down into pea-brain sized chunks for us mortals!



OK, I'll see if I can write up some of the important points in a way that is fairly easy to understand.

With regard to the size of the current GPU chips and number of cores, ATI currently have the 4870X2 (yes, I know it's not actually here yet... so what) with 2 cores.
NV also have a 2-core chip, the GX2, and a larger die, the GTX 200, with no dual-core variant as of yet.

ATI: 260mm^2 for a single core, built on 55nm
NV: 576mm^2 for a single core, built on 65nm
Intel: could fit 64 cores in the same 576mm^2, built on 45nm
Intel estimates between 16 and 32 cores for the initial Larrabee GPU.

Now, for each core currently available, there are different makeups of the way they work and the number of operations they can work on at once (theoretical peak throughput).

ATI has 160 SPs capable of 5 operations each, whereas NV has 240 SPs + 64 SFUs (special function units): each SP can do 2 operations and each SFU can do 4. Intel has 32 cores, each with 16 SPs capable of 4 operations each.

ATI: 160*5 = 800
NV: 240*2 + 64*4 = 736
Intel: 32*16*4 = 2048

Now the speed these SPs run at also comes into play.

ATI: core & SP 750MHz
NV: core 602MHz, SP 1296MHz
Intel: est. 2000MHz

ATI run on a 256-bit bus with GDDR5.
NV run on a 512-bit bus with GDDR3.
Intel: maybe a 256-bit bus with GDDR5, although the article does mention a 128-bit bus with GDDR5, letting the sheer core speed carry it through.
Intel also uses an internal 512-bit ring bus.
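As a rough feel for what those bus widths mean, bandwidth is just width times per-pin data rate; the data rates below are assumptions (roughly what 2008-era GDDR5/GDDR3 parts shipped at), not figures from the article:

```cpp
#include <cstdio>

int main() {
    struct Bus { const char* name; int width_bits; double data_rate_gbps; };
    // Bus widths from the list above; per-pin data rates are assumed values,
    // purely to show how bandwidth = width x data rate.
    const Bus buses[] = {
        {"256-bit GDDR5", 256, 3.6},
        {"512-bit GDDR3", 512, 2.2},
        {"128-bit GDDR5", 128, 3.6},
    };
    for (const Bus& b : buses)
        printf("%-15s -> %5.1f GB/s\n", b.name, b.width_bits / 8.0 * b.data_rate_gbps);
    return 0;
}
```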

The best bit of news as far as I'm concerned is:

ATI: direct rendering
NV: direct rendering
Intel: tile-based rendering, à la STMicro/Kyro

Each tile will fit into half of an individual Larrabee core's cache (possibly eliminating the need for a large amount of memory).
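A quick sanity check on the tile-in-cache claim, assuming the 256KB of L2 per core mentioned in the article and a simple 32-bit colour + 32-bit depth layout (the tile sizes here are just illustrative):

```cpp
#include <cstdio>

int main() {
    // Assumptions for illustration: 256KB L2 per core, 4 bytes colour + 4 bytes
    // depth per pixel, square tiles.
    const int l2_bytes        = 256 * 1024;
    const int bytes_per_pixel = 4 /*colour*/ + 4 /*depth*/;

    for (int tile = 32; tile <= 256; tile *= 2) {
        int tile_bytes = tile * tile * bytes_per_pixel;
        printf("%3dx%-3d tile = %6d bytes -> %s half the 256KB L2\n",
               tile, tile, tile_bytes,
               tile_bytes <= l2_bytes / 2 ? "fits in" : "exceeds");
    }
    return 0;
}
```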

The whole programming side of things is fairly straightforward, as Larrabee runs standard DirectX/OpenGL code, and you can also write directly to the hardware using C/C++, which is of course just like writing for any other x86 processor (most CPUs out there).
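In other words, a "shader" can be just an ordinary C++ loop; something like this purely illustrative snippet is the kind of thing the compiler would be expected to map onto the 16-wide vector unit:

```cpp
// A toy "shader" written as plain C++: scale-and-bias every pixel in a buffer.
// On Larrabee the compiler would be expected to vectorise the loop across the
// 16-wide vector unit; on any other x86 CPU it just runs as normal code.
void scale_bias(float* pixels, int count, float scale, float bias) {
    for (int i = 0; i < count; ++i)
        pixels[i] = pixels[i] * scale + bias;   // a single MADD per element
}
```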

Now, Intel need to make sure that the drivers and compilers that keep the whole thing running smoothly are up to scratch. You will be pleased to know that the guys who currently write the Intel graphics drivers are having nothing to do with these new Larrabee drivers.

I'll finish this round-up of the salient points, as I see them, with a quote from the article which I feel sums the whole thing up.

Obviously any comments are welcome, and I hope this helps some of you who didn't read/understand the article. (Any mistakes or errors on my part... tough... :p )


The flexibility of Larrabee allows it to best fit any game running on it. But keep in mind that just because software has a greater potential to better utilize the hardware, we won't necessarily see better performance than what is currently out there. The burden is still on Intel to build a part that offers real-world performance that matches or exceeds what is currently out there. Efficiency and adaptability are irrelevant if real performance isn't there to back it up.
 
ATI has 160 SPs capable of 5 operations each, whereas

Not right: ATi has 160 SIMDs, each containing 5 ALUs; four of them do MADx2, while one does more complicated things such as SIN, COS, TAN, etc. Each ALU is worth 2 FLOPs. If anything it'd be 160 SIMD units that can do 10 micro-ops each, actually. At least that's my understanding of it; if anyone can point me to an article that shows me to be a douche, please do so! :p
 
ATI has 160 SPs capable of 5 operations each, whereas NV has 240 SPs + 64 SFUs (special function units): each SP can do 2 operations and each SFU can do 4. Intel has 32 cores, each with 16 SPs capable of 4 operations each.

ATI: 160*5 = 800
NV: 240*2 + 64*4 = 736
Intel: 32*16*4 = 2048

That's not quite right. You are vastly overstating what each Larrabee core is capable of. Each core in Larrabee has a 16-element-wide vector processing unit, with each element capable of a MADD (2 flops). ATI has 160 SIMDs, each with 5 MADD-capable elements. Nvidia has 240 SPs, each capable of a MADD + MUL (2 + 1 flops). I would discount the 64-bit SPs altogether, as they cannot be used in parallel with the 240 main ALUs and are there purely for GPGPU use.

ATI: 160 * 5 * 2 flops = 1600 flops/clk
NV: 240 * 3 flops = 720 flops/clk
Intel: 32 * 16 * 2 flops = 1024 flops/clk (assuming a 32-core Larrabee)
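Multiplying those per-clock figures by the clocks quoted earlier gives the theoretical peaks (a sketch only; the 2GHz Larrabee clock is still just an estimate):

```cpp
#include <cstdio>

int main() {
    struct Part { const char* name; double flops_per_clk; double clock_ghz; };
    // flops/clk from the post above; clocks from the earlier breakdown
    // (shader clock for the NV part; 2GHz for Larrabee is only an estimate).
    const Part parts[] = {
        {"ATI RV770",            1600, 0.750},
        {"NV GT200",              720, 1.296},
        {"Intel Larrabee (32c)", 1024, 2.000},
    };
    for (const Part& p : parts)
        printf("%-22s %6.0f flops/clk x %.3f GHz = %7.0f GFLOPS peak\n",
               p.name, p.flops_per_clk, p.clock_ghz,
               p.flops_per_clk * p.clock_ghz);
    return 0;
}
```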
 
There's also no mention of special AA hardware - we all saw what happened to the 3800 series cards where AA was done with shaders (rather than dedicated hardware).
 
There's also no mention of special AA hardware - we all saw what happened to the 3800 series cards where AA was done with shaders (rather than dedicated hardware).

The only fixed-function hardware, AFAICT, is the texture sampling and filtering units. Every other part of the traditional DirectX pipeline is carried out on the x86 cores, including triangle setup, rasterization, AA, etc. I wouldn't necessarily discount it based on what ATI achieved doing shader AA in the past though, as this is a completely different way of tackling the problem of real-time rendering.
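For a feel of what "rasterization on the x86 cores" means, here's a toy half-space rasterizer for a single triangle (scalar and purely illustrative; the real renderer described in the paper works on many pixels at once on the vector units):

```cpp
#include <cstdio>

// Minimal half-space rasterizer: test each pixel centre against the three
// triangle edges and print a tiny ASCII framebuffer.
struct V { float x, y; };

float edge(const V& a, const V& b, const V& p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

int main() {
    const V v0{2, 1}, v1{12, 3}, v2{5, 12};   // counter-clockwise triangle
    for (int y = 0; y < 14; ++y) {
        for (int x = 0; x < 14; ++x) {
            V p{x + 0.5f, y + 0.5f};           // sample at the pixel centre
            bool inside = edge(v0, v1, p) >= 0 &&
                          edge(v1, v2, p) >= 0 &&
                          edge(v2, v0, p) >= 0;
            putchar(inside ? '#' : '.');
        }
        putchar('\n');
    }
    return 0;
}
```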

This SIGGRAPH paper is a very interesting read. It goes into some detail about the architecture and the special software renderer that they've created for it. Some performance graphs for games run through a Larrabee simulator are also provided.
 
Thanks bru for the breakdown and also to Lightnix & juicytuna for the fine point corrections!

I remember the days when 3D graphics were rendered by your CPU; it looked like this:

[screenshot: software-rendered 3D game]


And then they invented 3D cards. I remember the first ones, like Voodoo and PowerVR, were stand-alone 3D cards; you still needed another video card to display your 2D, so most enthusiasts had both cards! (Some even had three cards: one 2D/video card and another two Voodoo cards in SLI!)

With a 3D card installed the games were deffo quicker and looked a bit better, like this:

[screenshot: hardware-accelerated 3D game]


So the gist of what I'm understanding is that Larrabee will be approaching 3D rendering in a similar vein to the first CPU-rendered example above, except instead of having a Pentium 133MHz chugging away doing all the 3D work, Intel are making a super-fast multi-core CPU to do the same thing?

And instead of having hardware-specific features welded onto the card (like Shader Model 3.0, etc.), all that stuff will be *emulated* by the Larrabee?

If that's correct then that totally rocks, as the Larrabee could do anything if it was programmed to. The only reason you would need to upgrade would be to get faster cores and more of them (like how you would probably have to upgrade the Pentium 133MHz when running Vista; the old chip could do anything a modern quad core could do in Vista, just very slowly!).

Sounds good! :)
 
That's not quite right. You are vastly overstating what each Larrabee core is capable of.

I came up with the number 2048 instead of 1024 because the hyperthreading they are going to be using will be 4-way, not 2.

Each core is a dual-issue, in-order architecture loosely derived from the original Pentium microprocessor. The Pentium core was modified to include support for 64-bit operations, the updates to the x86 instruction set, larger caches, 4-way SMT/Hyper Threading and a 16-wide vector ALU.
From the Anandtech article.

Now, if you are using the *2 because they are dual-issue units, then surely it would be 4096... but of course it's very unlikely that any processor would be able to fill all its pipelines to its theoretical peak throughput, just the same as ATI's GPU doesn't and neither does NV's.


Edit: I think what is clear, however the numbers are worked out, is that Larrabee has the potential to be a very powerful GPU indeed.



Edit 2: That SIGGRAPH paper is a very interesting read. I can only assume it's the white paper that Intel released to all the technical sites like Anandtech, from which they have written up these articles.
 
I came up with the number 2048 instead of 1024 because the hyperthreading they are going to be using will be 4-way, not 2.

Having 4-way hyperthreading doesn't mean you have 4 times the execution resources. Hyperthreading is simply a way to help you maximise resource utilisation by swapping in different threads when the currently active thread has stalled. GPUs have been using a form of this for years and it is fundamental to the way they work.
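A crude way to picture it: in the toy model below, extra threads only fill in the cycles where other threads are stalled on memory; throughput never goes past what the single set of execution units can do (the stall length and thread counts are purely illustrative):

```cpp
#include <cstdio>

// Toy model: one ALU shared by N hardware threads. Each thread alternates
// between 1 cycle of ALU work and a fixed-length memory stall. Extra threads
// keep the ALU busier during stalls, but it can never exceed 1 op/cycle.
double utilisation(int threads, int stall_cycles) {
    const int cycles = 100000;
    int busy = 0;
    int stall[16] = {0};                 // remaining stall per thread; 0 = ready
    for (int c = 0; c < cycles; ++c) {
        bool issued = false;
        for (int t = 0; t < threads; ++t) {
            if (stall[t] > 0) { --stall[t]; continue; }
            if (!issued) {               // only one op per cycle, whatever the thread count
                ++busy;
                issued = true;
                stall[t] = stall_cycles; // this thread now waits on memory
            }
        }
    }
    return (double)busy / cycles;
}

int main() {
    for (int t = 1; t <= 4; ++t)
        printf("%d thread(s): ALU busy %.0f%% of cycles\n",
               t, 100.0 * utilisation(t, 3));
    return 0;
}
```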
 