RX 5700 XT Undervolting - Confusing results

Got myself a RX 5700 XT and thought I'd spend the weekend tweaking it to find a nice sweet spot.
The stock setting reported by WattMan was 2054MHz@1173mV. Stock TimeSpy score was 8702.

My first mistake was using the Mankind Divided benchmark to test stability. I had the card 'stable' (3 consecutive benchmark runs) at 1990MHz@1020mV with almost 2FPS more than stock, 3°C off max edge temp and 4°C off hotspot, only for it to crash 3DMark TimeSpy :/
I spent the rest of yesterday carefully lowering the speed and/or increasing the voltage and got numbers that would pass TimeSpy GPU tests but would heartbreakingly fail a TimeSpy stress test (1984MHz@1020mV scoring 8951 & 1970MHz@1023mV scoring 8885).

This morning I started fresh by reloading the 1990MHz@1020mV profile and lowering the clock to 1900MHz, letting WattMan reduce the voltage to maintain the ratio, which resulted in 986mV.
This passed TimeSpy with a score of 8745 and passed the stress test too (99.7% frame rate stability) but had worse-than-stock Mankind Divided results. 1900MHz@1V is a common underclock/undervolt so I was happy to have beaten that but not willing to rest on my laurels.
I upped the clockspeed to the point that WattMan scaled voltage up to 1V reaching 1934MHz@1V. This passed TimeSpy GPU tests with a disappointing 8750 (I think my notes are messy) but failed stress test.
1930MHz@1V however passed the TimeSpy stress test with 99.9% stability.

I thought next I would target 1950MHz and this is where things got properly confusing:
1950MHz@1010mV - TimeSpy GPU: 8870, Stress Test: FAIL
1950MHz@1012mV - TimeSpy GPU: 8855, Stress Test: FAIL [Here +2mV is detrimental to score]
1950MHz@1013mV - TimeSpy GPU: 8905, Stress Test: FAIL [Here +1mV is super beneficial to score]
1950MHz@1015mV - TimeSpy GPU: 8887, Stress Test: 99.7% stability PASS, Temp: 58°C, HSpot: 75°C

Naturally at this point I was like "that'll do" and started going through the 8 built-in game benchmarks and 6 synthetics I used to test the card at stock. Results were very close to stock, being either a little worse or a little better, but what struck me was how inconsistent they were between runs and that they tended to trend downwards. For example my stock Sleeping Dogs (High preset) results were 113.60, 113.40 & 113.10 but 1950MHz@1015mV gave results of 116, 113.7, 114.7 and 115.
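
One way to put numbers on that run-to-run inconsistency is to compare the mean and the spread of each set of runs. This is a minimal sketch using only the Sleeping Dogs figures quoted above; `summarize` is a hypothetical helper name, not part of any benchmark tool.

```python
from statistics import mean

def summarize(fps_runs):
    """Return (mean, range) for a list of benchmark results in FPS."""
    return mean(fps_runs), max(fps_runs) - min(fps_runs)

stock = [113.60, 113.40, 113.10]          # stock settings
tweaked = [116.0, 113.7, 114.7, 115.0]    # 1950MHz@1015mV

stock_mean, stock_range = summarize(stock)
tweaked_mean, tweaked_range = summarize(tweaked)

print(f"stock:   mean {stock_mean:.2f} FPS, range {stock_range:.2f}")
print(f"tweaked: mean {tweaked_mean:.2f} FPS, range {tweaked_range:.2f}")
```

On these numbers the tweaked profile averages slightly higher, but its spread is several times wider than stock.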

I reasoned that going up one more mV might improve the stability and maybe even performance.
1950MHz@1016mV - TimeSpy GPU: 8876, Stress Test: FAIL (not just a crash - no display output, had to restart the PC)

I get that additional voltage increases temperature but how can 1mV extra (still remaining way below stock settings) completely destabilize a card?
I'm really at my wits' end. This is SOO much more complicated, stressful, difficult and time-consuming than I expected. I'm willing to persevere as it is still kinda fun and there's no risk of damaging my card but I feel I'm at the point where I need to reach out and get some advice. I have of course tried Googling but what works for someone else's silicon isn't going to work for mine and I keep getting overclocking rather than underclocking/undervolting results.

So please - what am I missing and where am I going wrong?
 
You only really know if something is properly stable by running it over an extended period of days/weeks under a variety of workloads and conditions.
That's really disappointing to hear. My intention had been to do a comparison video between my old card, my new card stock and new card tweaked as a fun project to help others as there is very little information about the PowerColor dual-fan model specifically. To be clear I'm not a YouTube partner so I don't make money from ads.
It was a lot of work but I have the results for old card and stock across 8 built-ins (a few at different quality levels), 6 synthetics and 3 custom game benchmarks averaged across 3 runs. The idea then of getting most of the way through that and then finding out my tweaks are not stable and I have to start again is pretty horrible.

you're definitely thinking in too fine margins on the voltage.
That would make sense. Generally I see people working in 25mV increments i.e. "go down until it crashes and then go up 25mV" but to someone on the autistic spectrum that seems barbaric :p . I just kind of thought that if I put the work in I'd be able to find the exact point where it went from stable to unstable and that would be the optimum.

If the default for that gpu is 1.17 Volts, you're already a long way below that at ~1.02 Volts and should probably be thinking of adding another 10 - 20 millivolts to give yourself a reasonable margin of stability, eg 1.03+ Volts.

I started with the guidance from this video that suggests most cards can do 1970@1050 and then kind of went up/down from there as that was stable in Mankind Divided. The stock voltage was 1.17V but the clock speed was set much higher on the WattMan graph than what I am trying at 2054MHz. Voltage required to raise frequency increases exponentially as I understand it so I figured if I'm taking frequency down a load then I should be able to decrease voltage to a greater extent.
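
As a rough illustration of why dropping both clock and voltage should pay off: the textbook approximation for dynamic power in CMOS logic is P ∝ V² · f. The sketch below applies that to the set points mentioned in this thread. It is a simplified model only (it ignores leakage, and the card rarely runs at the set clock), and `relative_dynamic_power` is a hypothetical helper, not anything WattMan exposes.

```python
def relative_dynamic_power(freq_mhz, volt_mv, ref_freq_mhz, ref_volt_mv):
    """Dynamic power relative to a reference point, using P ~ V^2 * f."""
    return (volt_mv / ref_volt_mv) ** 2 * (freq_mhz / ref_freq_mhz)

STOCK = (2054, 1173)  # stock set point reported by WattMan (MHz, mV)

for freq, volt in [(2054, 1123), (1970, 1050), (1950, 1015)]:
    ratio = relative_dynamic_power(freq, volt, *STOCK)
    print(f"{freq}MHz@{volt}mV: ~{ratio:.0%} of stock dynamic power (model only)")
```

Because voltage enters squared, a modest voltage drop accounts for most of the modelled saving; the frequency drop contributes comparatively little.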

Last night, after I posted, I tried some higher voltages e.g. 1.05 and 1.06 but still found the TimeSpy scores to be varying quite wildly between runs. My stock results were 8700, 8702 and 8703 but I'm getting results that vary from each other by 10s of points. That probably explains the inverse relationship between score and voltage I showed in my initial post - it's noise and instability. I only did single runs to get those results - averaged over a number of runs they'd probably be much closer.

I feel now that doing 3-4 runs of the TimeSpy GPU tests and eyeballing how close the scores are is a better and more practical test of stability than the actual stress test. The stress test seems to do 20 runs of GPU Test 1 and then evaluates based on how close the frame rate is between runs but in doing so takes AGES and isn't discerning enough. I also found that a lot of the time clearly unstable speed/voltages would pass GPU Test 1 but fail late into GPU Test 2 so the stress test only using GPU Test 1 seems a big failing.


Afaik you can't actually change voltage in such increments. When you do the card reverts to a different voltage altogether in order to function properly. Check voltage in real time w/ afterburner etc to validate.
I had noticed that measured max voltage was exceeding the voltage I had set but it seems to still be relative/offset from the voltage set. For example at 1988MHz@1020mV measured voltage hit 1025mV max and at 1950MHz@1015mV measured voltage hit a max of 1018mV while using 3 different programs.

Afterburner seems unable to report voltage unfortunately :( . Voltage monitoring is ticked but is constantly at 0 in the GUI and there is no option to add it to the overlay. I have been leaving GPU-Z in the system tray during a run and then looking at max recorded values on its sensor page.

I will be paying closer attention to voltage though, as it is likely that raising voltage by 1mV is not actually raising measured voltage by the same amount, i.e. the offset is stepped.
I'll also maybe see if the measured voltage is 'accurate' at stock, as that might be another metric for assessing stability. If, as you say, the card is ignoring me, then finding the point where it accepts the voltage goal it is given and doesn't need to exceed it should be a good way to find stable settings?
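
A possible explanation for the stepped offset, offered as an assumption rather than something confirmed in this thread: if the voltage regulator only accepts set points in 6.25mV steps and rounds requests up, then several settings that are 1mV apart would land on the same real voltage. Both the 6.25mV step size and the round-up behaviour are guesses. They happen to match the two set/measured pairs above (1020mV measured as 1025mV, 1015mV measured as 1018mV) if the monitoring tools drop the fraction. `effective_voltage_mv` is a hypothetical helper.

```python
import math

STEP_MV = 6.25  # ASSUMPTION: regulator step size, not confirmed in this thread

def effective_voltage_mv(requested_mv, step_mv=STEP_MV):
    """Round a requested voltage up to the next step (assumed behaviour)."""
    return math.ceil(requested_mv / step_mv) * step_mv

for requested in (1010, 1012, 1013, 1015, 1016, 1020):
    print(f"set {requested}mV -> effective {effective_voltage_mv(requested)}mV")
```

Under this assumption 1013mV, 1015mV and 1016mV would all run at 1018.75mV, so differences between those runs would be noise rather than an effect of the extra millivolt.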
 
Update:
Did more testing at stock and also the 'auto-undervolt' option.
Stock is 2054MHz@1173mV as previously stated. Measured voltage is actually 1175mV. Adding two more runs to the 4 I recorded in my spreadsheet I get TimeSpy GPU scores of 8700, 8702, 8703, 8696 & 8701, so a range of 7 from lowest to highest.
Auto-undervolt keeps the same clockspeed but knocks the voltage down by 50mV to 1123mV. Measured voltage is actually 1125mV. 4 runs of TimeSpy GPU gave 8819, 8820, 8811 & 8820, so a range of 9 from lowest to highest.

Stock is known stable as I have run it through a load of benchmarks and other workloads over a number of days. I would assume the 'auto-undervolt' is conservative and therefore also stable.
They both have in common a measured voltage only 2mV over set voltage and a range in scores of less than 10 over at least 4 runs.
I think that's going to be my goal now :)
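
That goal can be written down as a simple check. A minimal sketch: the thresholds (score range under 10, measured voltage no more than 2mV over the set voltage) come from the stock and auto-undervolt figures above, while the `meets_goal` helper and its parameter names are made up for illustration. Passing it is only a screening heuristic, not proof of stability.

```python
def meets_goal(scores, set_mv, measured_mv, max_range=10, max_overshoot_mv=2):
    """True if scores are tightly grouped and measured voltage tracks the set voltage."""
    score_range = max(scores) - min(scores)
    overshoot = measured_mv - set_mv
    return score_range < max_range and overshoot <= max_overshoot_mv

# TimeSpy GPU scores and voltages from the update above
stock = meets_goal([8700, 8702, 8703, 8696, 8701], set_mv=1173, measured_mv=1175)
auto_uv = meets_goal([8819, 8820, 8811, 8820], set_mv=1123, measured_mv=1125)
print(stock, auto_uv)  # both reference profiles satisfy the goal
```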
 
That said it might help with a reference card or one of the lesser non reference cards like an Msi Mech.
I've got a PowerColor Dual Fan which is certainly a 'lesser' card price and market segmentation-wise but since it has not been reviewed by any of the major reviewers it's hard to tell how it stacks up against the premium cards cooling-wise. I'm getting good temperatures but it's winter so ambient is 16°C-18°C and I have good airflow from 3 140mm fans in the front. I've got no way to noise-normalise anyway to compare to Gamers Nexus' results.

I wanted a non-reference card with horizontal fins so it would vent out the back of my case and not onto my SSD unfortunately positioned on the motherboard directly below it. All the 'premium' cards have vertical fins - presumably as they result in more effective cooling of the card itself but at the expense of throwing out heated air into your case.
That restricted me to the Dual Fan or the Pulse and I favoured the simple aesthetics and lower price of the Dual Fan (£368.99 for the XT at Overclockers ATM!).


Does it really need it? I know lowering the voltage along with other tweaks gave good results with Vega but after an initial play with Navi I didn't see any worthwhile change to do it
I mean the 'auto-undervolt' in two clicks gives improved performance (figures were round the wrong way in my above post - have fixed), thermals and power consumption in my testing. It's not a dramatic difference but it seems like a no-brainer for most RX 5700 XT owners even if they have premium cooling.
I just want to put the time in and go a little further than that voltage-wise and also drop the frequency as it has no chance of getting close to the stock WattMan frequency of 2054MHz.

In this video the guy saw average clockspeed increase from 1855MHz to 1908MHz with drops in temperature and power usage.
I haven't been examining average clockspeed as it would require me to carefully hit hotkeys at the exact start and finish of benchmarks and I'm not prepared to sit in front of my PC for every benchmark run :P

Of course results will vary and some unlucky people will have silicon that requires stock voltages to be stable but that's the silicon lottery.
 
This includes the Powercolor Dual Card & it looks to be pretty good.
Nah I've seen that video - it features the Red Dragon not the Dual Fan.
They look similar but the Red Dragon has a vertically finned heatsink and is slightly larger including slightly larger fans.
There are some very amateur reviews and some non-English language reviews of the Dual Fan but no good temperature information so I hope to improve that situation a little.

I happened to notice the 19.11.3 driver I am using has the known issue:
"Radeon RX 5700 series graphics products may intermittently experience loss of display or video signal during gameplay"
I am worried that this might be what I am seeing sometimes and thinking it's a crash/instability. Notably the 'crash' I got at 1950MHz@1016mV was this kind of 'failure'.
I've reverted back to 19.11.1 which doesn't have it listed as a known issue but I have still had it occur. Possibly the issue was discovered in 19.11.1 but only became a 'known issue' with 19.11.2?
Maybe I should wait for a driver where it is fixed but I don't know how long that will take :/
I think I might just run the full array of tests on the auto-undervolt settings as another point of comparison.
 