I was the one who told you about latency being an issue holding back Ryzen, because the Infinity Fabric runs at RAM frequency. It was very funny: I was trying to spread that message at Overclock.net 3 months ago and was being told I was an idiot. I am laughing up my sleeve now, as everyone is talking about latency and timings.
The extra bandwidth in GB/s above 3200 doesn't really help that much for extra performance. Kits rated at higher frequencies can still be useful, because you can keep latencies low by dropping the kit to run slower but tighter. 4000MHz kits are not worth spending the money on unless you can turn the binning to your advantage by running the RAM at lower frequencies but with much tighter timings. Given that 4000MHz kits are rated at CL18 and CL19, they are not actually that fast to start with compared to the 3200 CL14 or 3600 CL15/CL16 kits, which makes the slower kits a much better buy. When 4500 becomes an available product, the ultra high kits will give Ryzen better performance, as their native latency will be around 8ns, but they will initially be cost prohibitive.
One thing that you didn't mention is that all RAM has a native latency that you can calculate from the timings, and it can be helpful in deciding what direction to take in setting up your system memory. The formula is (CAS setting / kit MT/s rating) x 2000, or you can use (CAS / actual frequency in MHz) x 1000, as DDR4 kits are double data rate kits. The actual frequency that the Infinity Fabric is using is half of what the kit states. The 2000 takes this into account so you don't need to remember to divide the kit speed in two.
18/4000 x 2000 = 9ns
16/3600 x 2000 = 8.89ns
14/3466 x 2000 = 8.08ns
14/3200 x 2000 = 8.75ns
12/3200 x 2000 = 7.5ns
You can see that your kit at 3466 CL14 is better than the kit at 3600 CL16 or 4000 CL18. If you can get your system to boot and run with your RAM kit set at 3200 CL12, you should find even more performance in gaming.
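If you would rather not do the sums by hand, here is a quick sketch of the same formula in Python. The function name is just something I made up for illustration, and the kit list is simply the five settings worked out above.

```python
def native_latency_ns(cas: int, kit_rating_mts: int) -> float:
    """Native (CAS) latency in nanoseconds: (CAS / kit MT/s rating) x 2000.

    The 2000 folds in the divide-by-two for double data rate memory.
    """
    return cas / kit_rating_mts * 2000


# (CAS, kit rating) pairs taken from the worked examples above.
kits = [(18, 4000), (16, 3600), (14, 3466), (14, 3200), (12, 3200)]

# Print the kits from lowest to highest native latency.
for cas, rating in sorted(kits, key=lambda kit: native_latency_ns(*kit)):
    print(f"{rating} CL{cas}: {native_latency_ns(cas, rating):.2f}ns")
```

Sorting by the result makes it easy to see which setting to aim for before you start tweaking in the BIOS.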
If you divide the result by the CAS setting, it will tell you how many nanoseconds each cycle takes. The Infinity Fabric transfers data between CCX modules, memory controllers and PCIe controllers at the rate of 32 bytes per cycle. Remember that if you want to convert everything to nanoseconds to investigate the component parts of system latency, cache memory on Ryzen runs at CPU frequency and not RAM frequency, so the number of cycles cache memory takes to reach a 4ns L2 cache latency is different from the number of cycles it takes the system RAM to do something in 4ns. When you measure system latency, the path is L1 + L2 + L3 cache, then the time it takes the system RAM to do its stuff based on these timings.
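To make the cycles vs nanoseconds point concrete, here is a rough sketch. The 4000MHz CPU clock is only an assumed round number for illustration, not a measurement from any particular chip, and the function names are my own.

```python
def cycle_time_ns(freq_mhz: float) -> float:
    """Length of one clock cycle in nanoseconds at the given frequency."""
    return 1000 / freq_mhz


def cycles_in(duration_ns: float, freq_mhz: float) -> float:
    """How many clock cycles fit into duration_ns at the given frequency."""
    return duration_ns / cycle_time_ns(freq_mhz)


ram_clock_mhz = 3200 / 2   # a 3200 kit actually clocks at 1600MHz
cpu_clock_mhz = 4000       # ASSUMPTION: example CPU clock, not a measured value

# Same answer as dividing the 8.75ns result for 3200 CL14 by the CAS of 14.
print(f"RAM cycle at 3200: {cycle_time_ns(ram_clock_mhz)}ns")

# The same 4ns costs a different number of cycles on each clock.
print(f"4ns at CPU clock: {cycles_in(4, cpu_clock_mhz)} cycles")
print(f"4ns at RAM clock: {cycles_in(4, ram_clock_mhz):.1f} cycles")
```

So when you add up the parts of system latency, convert each part to nanoseconds with its own clock first, then add.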
You may find, though, that as latency decreases with really tight timings, there is a crossover point where the multiplier of multithreaded performance over single core performance starts to drop off. The key is to find the best compromise between latency and multithread performance. The idle time where the primary thread on a core is queued up waiting for memory access because of latency gives the secondary SMT thread access to CPU time to do its work. That is why the multithreaded performance of Ryzen chips tends to scale better over their single core scores than Intel chips do. Intel Hyper-Threading efficiency is about 25% vs Ryzen getting almost 50%. You can check the multiplier using the single and multi core scores in Cinebench.
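Here is one way you could work out that multiplier from a pair of Cinebench scores. This is only my sketch of the arithmetic: the function names are made up, and the scores below are round placeholder numbers, not real benchmark results, so plug in your own. Bear in mind the all-core clock is usually lower than the single-core boost clock, so treat the SMT figure as a rough guide.

```python
def multithread_multiplier(single_score: float, multi_score: float) -> float:
    """How many times faster the multi core run is than the single core run."""
    return multi_score / single_score


def smt_gain(single_score: float, multi_score: float, cores: int) -> float:
    """Extra throughput per physical core beyond 1x, as a fraction.

    0.25 means the second thread adds about 25% on top of each core.
    """
    return multithread_multiplier(single_score, multi_score) / cores - 1


# PLACEHOLDER scores for an 8 core chip, chosen only to keep the maths easy.
single, multi, cores = 100, 1200, 8

print(f"Multiplier: {multithread_multiplier(single, multi):.2f}x")
print(f"SMT gain per core: {smt_gain(single, multi, cores):.0%}")
```

Run it before and after tightening timings: if the single core score goes up but the multiplier falls, you have found the crossover point.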
BTW for Witcher 3 and Watch Dogs 2, you might like to try setting the CPU affinity for the game processes to just use CPU8-CPU15 (the upper eight logical CPUs on a 16 thread chip, since numbering starts at CPU0) and see how that performs compared to the 7700K. You should see a boost in performance.
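If you want to set the affinity from a shortcut or script rather than Task Manager, you need it as a hex bitmask. Here is a small sketch that builds one, assuming a 16 thread chip whose logical CPUs are numbered 0 to 15, so the upper half is CPU 8 through CPU 15. The function name is my own.

```python
def affinity_mask(first_cpu: int, last_cpu: int) -> int:
    """Bitmask with one bit set for each logical CPU in the inclusive range."""
    mask = 0
    for cpu in range(first_cpu, last_cpu + 1):
        mask |= 1 << cpu
    return mask


# Upper eight logical CPUs of a 16 thread chip (ASSUMPTION: numbered 0 to 15).
upper_half = affinity_mask(8, 15)
print(f"{upper_half:X}")  # prints FF00
```

On Windows you can then launch with something like `start /affinity FF00 game.exe` from a command prompt, where game.exe is just a stand-in for the real executable.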