New workstation Setup

Hi all,

Long time lurker!!

I'm looking to build a workstation PC for my home office. I've been using a 10-year-old laptop, which is so slow that I'm far from productive. I've now switched to using my gaming PC in the interim whilst I decide what components I want in this new build.

My current PC is pretty good (8700K @ 4.9GHz, GTX 1080 Ti, 16GB of DDR4-3000), but I built it as a base for racing simulators and would rather leave it in the room where my rig is set up than keep lugging it about! It's awesome for my day-to-day work though, so I've decided to build another solely for work (maybe with a little bit of light gaming!).

As this will be a work computer, its main roles will include OCR'ing PDFs (I sometimes have to OCR hundreds of gigs of data, which on my laptop would take weeks!) and searching those PDFs for character strings. It will also be used for CAD design and some light software development.

I've come up with the below spec, but would be grateful for any advice that anyone has to offer!

My basket at Overclockers UK:
Total: £2,235.51 (includes shipping: £12.60)

A couple of questions:
Is the i9 9900K worth the current £100 premium over the i7 9700K?
Ideally I would like 32GB of RAM, but I'm not sure whether, by today's RAM standards, it is worth paying for the extra 16GB?
Is the RTX 2080 overkill? (I'll probably dabble in some flight sim, but not much.)

Thanks in advance!
 
How CPU/GPU intensive are these? Does OCR hammer your storage drives?

What's the budget?
 
It works the CPU quite hard, while the GPU is normally at idle! For storage drives, I normally work on a couple of projects at once and then transfer them to my client's cloud drive. I find that 500GB is more than enough for a couple of projects.

Budget wise, around £2500
 
Save some money on the PSU. A 750W unit is still more than enough for that spec.

My basket at Overclockers UK:
Total: £105.49 (includes shipping: £10.50)


I would buy the board, CPU, SSD and RAM separately. The Gigabyte RGB RAM is pretty expensive seeing as you can get a 16GB kit for £89.99. Plus you have selected a non-windowed case, so it won't even be on show.

Saves about another £75.

So those savings would easily pay for another 2 x 8GB of RAM if you wanted it.



My basket at Overclockers UK:
Total: £915.06 (includes shipping: £11.10)
 
Personally I'd go with a Threadripper. Out of pure interest as well, why in God's name are you not using some kind of indexer for the PDFs? Personally I'd set up a drop folder on a server with some sort of workflow and an indexer.

In fact this is exactly what I do with a mix of tools from Autonomy alongside HP's IDOL indexer. I even scrape the PDFs and auto-file them to our DMS. What I'm saying is automate that job; there are a million products out there that will do it for you.
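
To give a rough idea of the drop-folder part, here is a minimal Python sketch. It is not how Autonomy or IDOL work; the `pending_pdfs` name, the two-folder layout and the convention that an OCR'd copy keeps the input's filename are all assumptions made up for illustration.

```python
from pathlib import Path


def pending_pdfs(drop_dir, done_dir):
    """Return PDFs in drop_dir that have no same-named file in done_dir yet."""
    drop, done = Path(drop_dir), Path(done_dir)
    # Hypothetical convention: the OCR'd output keeps the input's filename.
    finished = {p.name for p in done.glob("*.pdf")}
    # Anything in the drop folder without a finished twin still needs OCR.
    return sorted(p for p in drop.glob("*.pdf") if p.name not in finished)
```

Run something like this on a schedule and hand each returned path to whatever OCR tool you use, so only new scans get processed.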
 
My basket at Overclockers UK:
Total: £2,102.98 (includes shipping: £14.10)

compared to your spec:
1) same i9 9900K build, but with a cheaper board that still has the same VRMs, so you're not losing much
2) 32GB RAM
3) cheaper PSU, fewer watts (but more than enough for a 2080)
4) better storage option: 1TB SSD for your scratch disk, then 2x 2TB HDD for storage (to run in RAID 1). I suspect you'd want the best reliability before it gets uploaded to the cloud
5) changed the cooling to a normal HSF. AIO cooling looks nice, but you have to deal with the possibility that the pump may fail at some point, or that the hose/fittings/pump may leak, whereas a lump of metal with fans won't have that issue.
6) changed the case to a better ventilated one and added an extra 2 fans for the front (the case only comes with 2 fans at the rear)
7) cheaper in total!
 
Thanks for the suggestions, I'll make some changes I think.

I'm not too clued up on Threadripper systems so didn't even look, to be honest!

Most of the data I receive is raw scanned data that hasn't been looked at. It's messy and long-winded, but I'm looking to introduce some of my own workflow.
 
Catch 22,
Ryzen 3000. 12 cores confirmed. Not sure if 16 cores will hit, as the mainstream top end usually equals the lowest core count of the high end, but I could be wrong and a 16-core Ryzen 3000 pops up.

Straight away it'll be cheaper: the CPU has fewer PCIe lanes, and the motherboard is dual channel rather than quad, along with fewer PCIe pathways etc.

Only problem is waiting... 5 months.

Builds listed above are solid as a rock, heck you'll find yourself gaming on that one!
CAD is mainly CPU rendering, so you can just slump in an RTX 2060 and save the cash, unless you have GPU rendering software you use afterwards?
 
Exactly what I'm talking about. Users scan in raw PDFs and no OCR is done at this point. The MFD then drops the PDF into the user's drop folder on an OCR server (VM), as well as emailing it to them or whatever else. As it hits this folder, the OCR software of choice kicks in and OCRs the document. Once done, it's indexed for all content, the content is checked for certain complex phrases, case numbers, invoice numbers etc. and it's stored in DMS folders relevant to the content. Stuff that meets no rules gets dropped into another folder and analysed, and then you create new rules to cater for such a document. This is the sort of workflow you should be thinking about IMO. What it sounds like you're signing up for is a huge amount of time wasted.

Of course this is just an example :)
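
The "check the content against rules, file it, and park whatever matches nothing" step could look roughly like this in Python. Treat it as a sketch only: the two patterns, the folder names and the `route` function are invented for illustration, and a real DMS will have its own rule engine.

```python
import re

# Hypothetical rules: (pattern, destination folder). Both patterns are made-up examples.
RULES = [
    (re.compile(r"\bINV-\d{4,}\b"), "invoices"),
    (re.compile(r"\bcase\s+no\.?\s*\d+", re.IGNORECASE), "cases"),
]


def route(text, rules=RULES, fallback="unmatched"):
    """Return the folder of the first rule whose pattern occurs in the OCR'd text."""
    for pattern, folder in rules:
        if pattern.search(text):
            return folder
    # Nothing matched: park it so a new rule can be written for it later.
    return fallback
```

Whatever lands in the fallback folder is your to-do list for new rules, which is the feedback loop described above.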
 
Think the idea (Tamzzy's) to have a physical back-up is smart as well. You never know when network issues can strike (on your end or the cloud's end) and delay any further work on that small drive until you can upload. Plus exporting to HDD will be faster than the uploads, which you can then do from the HDDs so your SSD remains free to provide its normal speed if you need to continue working with it (not sure if the latter would make a tangible difference, and I also don't know how long the uploads tend to take, but hey).

It does seem, though, as if the PDF-searching task would benefit from the fastest SSD speed available (like new Samsung Plus Polaris), at least if you one day automate that task as Vince suggested? Obviously manual-ish searching won't see a noticeable benefit.

Windows will cache (Standby files in virtual memory etc.) as much as it can into RAM even if you don't see physical RAM usage maxed. And if you have a multiple work-type + gaming scenario like yours, with multiple apps' Standby files getting cached, 32GB can mean not having to restart as often as with 16GB if the system starts feeling sluggish, plus a little boost when opening stuff that's still in the memory cache. You may even find some use/excuse for a RAMDisk (AMD's version of Dataram RAMDisk allows up to 4GB free, and works on Intel systems too). If it's feasible to use for an aspect of your work, that'd speed things up even more.

What might be overkill is the 2080 for "light gaming" (cough cough). Which monitor will you be using?
 
Thanks guys, appreciate the input and recommendations!

I agree that the 2080 is overkill, but I don't plan on upgrading this machine for a few years, so it should last!! I'm using a BenQ 4K 32" monitor.

I'll have a ponder over the next couple of days.

It is so painful, but all of the consultants I work with have always done it this way, and it is considered part of the job (most of them are happy to sit around and let their computers chug away for hours/days on end processing data!!). I'm hoping with a bit of thought I can come up with a much more efficient workflow, even if it is just for me!
 
You know this is exactly why you should make that change! :) Look up a piece of software called PDF Converter Professional by Nuance. It's cheap and can batch OCR jobs and even schedule them, unless you are already using something a lot better like the iManage OCR module or one of a million other corporate variants. If I was going solo on something like this I'd be looking at cutting down crazy time as much as possible. Right here you have a prime example of where there is a huge amount of time to be saved in the workflow and not a huge amount in the hardware beyond a point. Basically anything recent you have is going to do the job quicker than you can do all of the "human" part.

If it were me I'd write a bit of VBA for Outlook to dump PDFs en masse from Outlook folders into system folders and then OCR them en masse, pulling relevant content into some sort of database or indexer where I can query the data. If I was having to manually OCR hundreds of gigs of PDF data at a time I think I'd want to kill myself in the face. I'd even favour writing my own VB.NET app to do the job for me over something manual.
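
For the "indexer where I can query the data" bit, the core idea fits in a few lines. This is a bare-bones Python sketch rather than a real indexing product: `build_index` and `search` are names invented here, and it assumes the OCR step has already handed you each document's text as a plain string.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return names of documents containing every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    # Intersect the per-word hit sets; a word that was never seen gives an empty set.
    hits = [index.get(w, set()) for w in words]
    return set.intersection(*hits)
```

Note this matches all the words anywhere in a document, not an exact phrase, so it narrows down which PDFs to open rather than replacing a proper string search.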
 