
Intel i7 Ivybridge on its knees.....

Hi all,

I'm commissioning a security system for a client, comprising a number of workstations custom built by myself. Despite following the guidelines of the software vendor used for managing and viewing up to 40 H.264 video streams per workstation, the two workstations running the 8-screen video wall setup are coming up short in compute muscle when decoding the required H.264 streams.

The video wall workstation set up is as follows:

CPU: Intel Core i7 3770 3.4GHz (Ivy Bridge)
MB: Asus P8B WS C206
RAM: 8GB Corsair Vengeance DDR3 PC3-15000C9 1866MHz Dual Channel (running XMP1 profile).
GPU: AMD FirePro V7900

Both systems are stable, and since they are required to work 24/7 for the next 3-5 years, I made a conscious decision to stay away from K chips and from playing with voltages to gain a marginal increase in performance.

The situation is that when each of the 4 x 55" 1080p screens is loaded with a 3x3 tile, I have 36 H.264 streams @ 3-5Mbps each being decoded exclusively by the CPU. Unfortunately the GPU is only used for scaling and rendering in this instance; no decoding workload is handed off by the video management software, and I'm seeing 70%+ CPU utilisation with peaks of 90%+ during busy scenes.

I'm currently at a crossroads trying to decide on the next step up in hardware. I really need to keep CPU utilisation below 70% so the workstations stay responsive to end-client inputs when they modify the video wall layouts; at 90%+ CPU utilisation the response becomes stuttery and there is no margin for peaks during busy times.
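For context, a quick back-of-envelope of the decode workload per video-wall workstation, using the figures above (the 25fps frame rate is my assumption for PAL-region cameras, not stated in the post):

```python
# Rough decode workload per video-wall workstation.
# Stream count and bitrates are from the post; frame rate is assumed.

STREAMS = 36              # 4 screens x 3x3 tiles
BITRATE_MBPS = (3, 5)     # per-stream H.264 bitrate range
WIDTH, HEIGHT = 1920, 1080
FPS = 25                  # assumed PAL-region camera frame rate

agg_low = STREAMS * BITRATE_MBPS[0]     # aggregate bitrate, low end (Mbps)
agg_high = STREAMS * BITRATE_MBPS[1]    # aggregate bitrate, high end (Mbps)

# Pixels the CPU must decode per second across all streams
pixel_rate = STREAMS * WIDTH * HEIGHT * FPS

print(f"Aggregate bitrate: {agg_low}-{agg_high} Mbps")
print(f"Decode throughput: {pixel_rate / 1e9:.2f} Gpx/s")
```

That works out to 108-180 Mbps of compressed video and nearly 1.9 billion pixels per second of decode, all landing on four cores, which is consistent with the utilisation figures described.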

I am considering the value of moving to a dual Xeon platform on LGA2011, but those chips are Sandy Bridge-E based and mostly clocked below 3GHz, unless of course one is prepared to pay £2K plus for a CPU :eek:

I'm not looking for a ready-made solution here, but rather an open discussion and feedback on the possible steps up in CPU number-crunching performance from my current position.

Any constructive feedback/opinions are welcome.

best regards

Humour
 
Last edited:
Forgive me if this is a silly question, but where are the streams coming from? and what software is processing them?
 
Hi Ryder, no reasonable question is a silly one imo ;)

The streams come from IP cameras and, in some cases, encoders attached to analogue cameras. The software processing them is a dedicated video management platform from a Canadian company called Genetec; you're unlikely to have heard of it unless you're involved in the security integration industry.

In any case, the software/server services are hosted in VMs (distributed across three VMs) inside the spare server resources of the recording storage pool, which is a SAN. The path of the video is as follows:

Camera/Codec--->UDP Unicast--->External Network--->Internal Network--->SAN NIC4 Inbound (UDP Unicast)--->SAN NIC5 Outbound (UDP Multicast)---->Internal Network---->Clients/Workstations.

The reason I mention external and internal networks is that the external network is wireless, comprising a combination of 50Mbps + 100Mbps + 200Mbps pipes, while the internal network is cabled cat6e 1Gig throughout.

@ Anydbird123, no, I'm not sure a bump of that magnitude will provide the headroom required, hence my brainstorming here and in a few other places ;) Taking your suggestion forward, I immediately see some issues. First, the processors in use are i7 3770s, not 3770Ks, so there is no movement without a CPU change. If I have to change CPUs, is an overclocked 3770K Ivy Bridge the best-performing option out there, dual Xeons notwithstanding? Second, the system is required to operate 24/7, so stability at up to 90% load in continuous operation is critical. How many tweakers pushing their CPUs have run Prime post-OC for one to three months to verify the stability of the overclock under constant heavy load for a prolonged period?

I'm not saying your suggestion lacks merit, but I cannot afford a rash choice, and the one I make has to be the right one, so dissecting the pros and cons is vital in my case.

best regards

Humour
 
You're better off switching to X79 and a hex-core unlocked CPU imo. Give it a mild overclock of 4.2-4.4GHz with decent cooling and it should be plenty of power. If you did switch to a K-version 3770K, you'd struggle to keep it cool once overclocked, unlike an SB chip.
 
You're better off switching to X79 and a hex-core unlocked CPU imo. Give it a mild overclock of 4.2-4.4GHz with decent cooling and it should be plenty of power. If you did switch to a K-version 3770K, you'd struggle to keep it cool once overclocked, unlike an SB chip.

Really? The hex-cores are Sandy Bridge-E, are they not? Given the different manufacturing processes, the Ivys are less power hungry and produce less heat. If all things are equal in terms of the voltage needed to reach a given overclock frequency on both chips, what makes you say the 3770K will generate more heat?

I can only assume from your statement that the X79 has more room for overclocking from stock without any voltage increase... correct me if I am wrong.

Also, what do you consider "decent cooling", dazza?

best regards

Humour
 
You're better off switching to X79 and a hex-core unlocked CPU imo. Give it a mild overclock of 4.2-4.4GHz with decent cooling and it should be plenty of power. If you did switch to a K-version 3770K, you'd struggle to keep it cool once overclocked, unlike an SB chip.


As above really: the hex-core offers greater processing power. By its nature it's a higher TDP, 130W against Ivy's 77W, but with 6 cores and quad-channel memory X79 is the more powerful option. There's no integrated graphics on LGA2011 i7s, and with built-in turbo boost to 3.8GHz on all 6 cores it will outperform the Ivy. In fact, with a very mild OC to say 4GHz on standard voltage it offers a combined 24GHz of processing power, which will be your best option provided the software can make use of it.

Really? The hex-cores are Sandy Bridge-E, are they not? Given the different manufacturing processes, the Ivys are less power hungry and produce less heat. If all things are equal in terms of the voltage needed to reach a given overclock frequency on both chips, what makes you say the 3770K will generate more heat?

With Ivy and the new Haswell it's a lottery on heat, as Intel have changed the way the IHS (integrated heat spreader) is attached. On Ivy and Haswell they now use a cheap TIM paste as opposed to the soft solder on the hex-core X79 i7 CPUs. If you're lucky, some will run cool; some are hot runners even at stock because of the cheap TIM used. Search for people delidding their Ivy and Haswell CPUs trying to get them to run cooler.
 
That's fair enough, vapour matt. I'll check whether the extra cores will be a factor as far as the VMS is concerned, and if so then it makes sense. Thus far it's the only upgrade option before we hit server-grade components anyway.

I didn't know the paste being used is different; that's news to me, thanks for bringing it up. Very useful info.

best regards

Humour
 
Are there any BIOS options to enable max turbo (3.9GHz) on all cores at load? May serve as a stopgap.
 
Just throwing the thought out there, but perhaps a hex-core Xeon with a dual-socket mobo might be a cheaper route than going straight to the monster 8-core. It's a downgrade in clocks, yes, but it seems it's more parallelism you need.

I believe [although I also stand to be corrected] that you can run a dual-socket mobo like a C602 with only the one CPU, meaning you'll have plenty of scope for upgrading in future if need be. Even going for a couple of hex-cores over the 8-cores would work out considerably cheaper.
 
Just throwing the thought out there, but perhaps a hex-core Xeon with a dual-socket mobo might be a cheaper route than going straight to the monster 8-core. It's a downgrade in clocks, yes, but it seems it's more parallelism you need.

I believe [although I also stand to be corrected] that you can run a dual-socket mobo like a C602 with only the one CPU, meaning you'll have plenty of scope for upgrading in future if need be. Even going for a couple of hex-cores over the 8-cores would work out considerably cheaper.

A quick bit of research suggests you can run these boards with one CPU, so this could be a viable option if you can split the load over one or two 6-core chips.
 
Is there not an option in the software to limit how many cores it uses? That way you could leave one free?

I would have thought a 3930K setup would be your best bet. As you say, 8-core Xeons are clocked much lower, so per-core performance suffers, and you'll more than likely just get the same situation happening but with 8 cores running at 70-90% instead.

A dual Xeon system with two fast quad cores might also be an option, though I imagine it will be costly, but it has the benefit of ECC memory, seeing as they'll be in 24/7 operation.

edit: actually there are some 3.1GHz 8-core Xeons which might be well suited, but the price for them is £1.5k...

I would give a 3930K system a try and perhaps overclock it a little to 4GHz if needed; you just want a decent quality motherboard.
 
Firstly, I'd like to thank all of you for posting and taking the time to provide an opinion. :cool:

@ eddyr, I will have to check. Although I have spent some time in the BIOS of each machine to set it up and lock it down, I haven't really looked too deeply into the CPU options.

@ Adolf, (excellent nick btw) I have been thinking along those lines, but I wasn't certain whether a single chip can run the system, or how a second chip would be picked up and used by the OS once it's already running on a single chip. Totally agree that a hex-core in a dual-socket mobo is the practical solution from the commercial perspective. Still, there won't be many beer tokens left after such a purchase :D but it's definitely the more scalable solution going forward.

@ Redmint, thanks for confirming that it's plausible to run a single CPU; I will definitely be looking into it in more detail.

@ Nuclear, everything is plausible, but in this case it's not practical. I chose a 4x DP output GPU (V7900) for a number of reasons, and it was specced into the job over a year ago! These cards are in the £500+ range, and whilst they are hardly breaking a sweat running 4 screens each, breaking the system down evenly would require 2 more machines to distribute the resources across the 8-screen tile. That adds cost, power consumption, and valuable rack space, as well as more complexity both in the system and from the end-user functionality standpoint. It's an option, but a last-resort one; it's not that bad yet ;)

I've tinkered with the system today and managed to settle both machines down a tad, but there's still too much utilisation for my liking, looking a year or two ahead.

I might be able to upload a quick YouTube clip that I've taken with my phone, but no promises, because my head could end up lubricating a guillotine if the clip is seen online by the wrong people ;)

Thanks again for your thoughts guys and gals, much appreciated.

Hum

P.S. Off to Qatar for a week tomorrow, so I won't be thinking or doing much re: this system whilst away. (No, not a holiday, unfortunately.)
 
@ mmj, no, the only option is to define the number of "components per process", i.e. the number of video streams per process. If set to 1, the software spawns a process for each stream being decoded; if that stream hits a decoding issue at any point, only that stream and its process are affected.

The range is from 1 to 16, the latter being 16 components per process. Whilst I'm playing, I have configured one workstation at one extreme and the second at the other. No immediately obvious change in CPU and/or RAM utilisation in either configuration. I will leave it like that for a week and let the client play without telling them; they will let me know if they notice any performance difference between the two.
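The mapping from streams to decoder processes is simple to sketch. This is a hypothetical model of the setting described above, not Genetec's actual implementation:

```python
import math

def decoder_processes(streams: int, components_per_process: int) -> int:
    """Number of decoder processes spawned for a given
    'components per process' setting (hypothetical model)."""
    return math.ceil(streams / components_per_process)

# 36 streams per video-wall workstation, as in this thread
print(decoder_processes(36, 1))    # one process per stream -> 36 processes
print(decoder_processes(36, 16))   # 16 streams per process -> 3 processes
print(decoder_processes(36, 6))    # the fallback setting   -> 6 processes
```

The trade-off is isolation versus overhead: at 1 component per process a decode fault kills only one stream, while at 16 a single faulting process takes a whole block of streams down with it.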

The software vendor said the VMS lets the OS (Win7 64-bit) control and manage multithreading itself, which tells me the VMS isn't optimised for the hardware. Disappointing.

regards, Hum

[EDIT] After logging off and back into the VMS, the 16-components-per-process workstation was quickly brought to its knees at 100% CPU utilisation, so much so that I struggled to open the options and reduce the number to a more manageable 6 out of 16. It appears the higher the number, the more load it puts on the CPU. I'd hazard a guess that even a dual hex-core setup would be pushed to its limits at the 16 components per process setting.
 
I would give a 3930K a try first; it has about 40% more processing power than a 3770, so if the 3770 is only borderline struggling I would expect the 3930K to manage at stock without any overclocking.

Plus, if you overclocked the 3930K to 3.8GHz on all cores, which is its certified turbo clock rate, it would probably manage comfortably and leave plenty of headroom for the future.
 
The situation is that when each of the 4 x 55" 1080p screens is loaded with a 3x3 tile, I have 36 H.264 streams @ 3-5Mbps each being decoded exclusively by the CPU. Unfortunately the GPU is only used for scaling and rendering in this instance; no decoding workload is handed off by the video management software, and I'm seeing 70%+ CPU utilisation with peaks of 90%+ during busy scenes.

That's a heavy load for the CPU but a trivial one for those GPUs. The only solution is to offload decoding to the GPU.
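For illustration, this is what requesting hardware-accelerated H.264 decode looks like outside the VMS, using ffmpeg's DXVA2 hwaccel on Windows. The RTSP URL is a placeholder, and whether the FirePro's UVD engine is actually usable depends on driver support; this just shows the shape of the offload, not anything Genetec exposes:

```python
# Sketch: building an ffmpeg command that decodes H.264 on the GPU via DXVA2,
# rather than on the CPU. The camera URL below is a placeholder.

CAMERA_URL = "rtsp://camera.example/stream1"  # placeholder, not a real camera

cmd = [
    "ffmpeg",
    "-hwaccel", "dxva2",   # ask ffmpeg to decode H.264 on the GPU
    "-i", CAMERA_URL,
    "-f", "null", "-",     # discard output; we only care about decode load
]
print(" ".join(cmd))
```

In practice, though, the decision sits with the video management software: unless the VMS hands frames to DXVA2/UVD itself, the CPU carries the whole decode load, which matches exactly what's being observed in this thread.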
 