** The Official Nvidia GeForce 'Pascal' Thread - for general gossip and discussions **

Caporegime
Joined
24 Sep 2008
Posts
38,280
Location
Essex innit!
Yeah, something is wrong with those drivers in that test. You don't even reach the graphics part or the graphics + compute part. They would come after kernel number 512, and you crash before that.

Don't bother testing further as it is.

Try this lighter version instead.

https://forum.beyond3d.com/posts/1868993/


It runs only 128 kernels. It also gives output in an easier form, without extra timestamps.

Same for Humbug.

Okies, here are my results:

Compute only:
1. 11.54ms
2. 11.59ms
3. 10.94ms
4. 10.46ms
5. 10.59ms
6. 10.74ms
7. 9.88ms
8. 9.89ms
9. 9.82ms
10. 9.83ms
11. 9.85ms
12. 9.84ms
13. 10.19ms
14. 9.83ms
15. 9.85ms
16. 9.82ms
17. 9.85ms
18. 9.86ms
19. 9.86ms
20. 9.86ms
21. 9.88ms
22. 9.90ms
23. 9.93ms
24. 9.88ms
25. 9.95ms
26. 9.92ms
27. 9.90ms
28. 9.91ms
29. 9.88ms
30. 9.85ms
31. 9.87ms
32. 9.86ms
33. 19.32ms
34. 19.37ms
35. 19.28ms
36. 19.27ms
37. 19.31ms
38. 19.63ms
39. 20.06ms
40. 19.89ms
41. 19.93ms
42. 19.78ms
43. 19.31ms
44. 19.35ms
45. 19.45ms
46. 19.51ms
47. 19.41ms
48. 19.45ms
49. 19.41ms
50. 19.45ms
51. 19.47ms
52. 19.52ms
53. 19.44ms
54. 19.62ms
55. 19.62ms
56. 19.60ms
57. 19.67ms
58. 19.80ms
59. 19.61ms
60. 19.62ms
61. 19.62ms
62. 19.62ms
63. 19.62ms
64. 29.52ms
65. 29.33ms
66. 29.33ms
67. 29.44ms
68. 29.58ms
69. 29.29ms
70. 29.34ms
71. 29.36ms
72. 29.45ms
73. 29.33ms
74. 29.35ms
75. 29.38ms
76. 29.38ms
77. 29.33ms
78. 29.36ms
79. 29.35ms
80. 29.42ms
81. 29.36ms
82. 29.36ms
83. 29.37ms
84. 29.56ms
85. 29.33ms
86. 29.35ms
87. 29.34ms
88. 29.39ms
89. 29.38ms
90. 29.33ms
91. 29.33ms
92. 29.37ms
93. 29.34ms
94. 29.37ms
95. 29.34ms
96. 38.83ms
97. 38.73ms
98. 39.30ms
99. 40.20ms
100. 39.30ms
101. 38.77ms
102. 38.92ms
103. 38.84ms
104. 38.84ms
105. 39.00ms
106. 38.79ms
107. 38.87ms
108. 38.93ms
109. 38.90ms
110. 38.86ms
111. 38.92ms
112. 38.91ms
113. 38.91ms
114. 38.89ms
115. 38.98ms
116. 38.90ms
117. 39.10ms
118. 39.12ms
119. 39.04ms
120. 39.08ms
121. 39.09ms
122. 39.06ms
123. 39.15ms
124. 39.08ms
125. 39.06ms
126. 39.03ms
127. 39.12ms
128. 48.77ms
Graphics only: 16.25ms (103.23G pixels/s)
Graphics + compute:
1. 25.89ms (64.79G pixels/s)
2. 25.88ms (64.82G pixels/s)
3. 25.85ms (64.90G pixels/s)
4. 25.76ms (65.14G pixels/s)
5. 25.82ms (64.97G pixels/s)
6. 25.91ms (64.76G pixels/s)
7. 25.84ms (64.94G pixels/s)
8. 25.86ms (64.88G pixels/s)
9. 25.82ms (64.98G pixels/s)
10. 25.90ms (64.77G pixels/s)
11. 25.89ms (64.80G pixels/s)
12. 25.87ms (64.84G pixels/s)
13. 25.87ms (64.86G pixels/s)
14. 25.86ms (64.88G pixels/s)
15. 25.88ms (64.82G pixels/s)
16. 25.86ms (64.87G pixels/s)
17. 25.82ms (64.97G pixels/s)
18. 25.83ms (64.95G pixels/s)
19. 25.88ms (64.82G pixels/s)
20. 25.84ms (64.94G pixels/s)
21. 25.82ms (64.97G pixels/s)
22. 25.84ms (64.92G pixels/s)
23. 25.85ms (64.91G pixels/s)
24. 25.91ms (64.76G pixels/s)
25. 25.80ms (65.02G pixels/s)
26. 25.78ms (65.08G pixels/s)
27. 25.81ms (65.00G pixels/s)
28. 25.98ms (64.57G pixels/s)
29. 25.80ms (65.03G pixels/s)
30. 25.76ms (65.14G pixels/s)
31. 25.85ms (64.90G pixels/s)
32. 25.82ms (64.97G pixels/s)
33. 35.45ms (47.33G pixels/s)
34. 35.36ms (47.45G pixels/s)
35. 35.32ms (47.50G pixels/s)
36. 35.38ms (47.42G pixels/s)
37. 35.26ms (47.58G pixels/s)
38. 35.22ms (47.63G pixels/s)
39. 35.33ms (47.49G pixels/s)
40. 35.40ms (47.39G pixels/s)
41. 35.42ms (47.37G pixels/s)
42. 35.48ms (47.29G pixels/s)
43. 35.42ms (47.37G pixels/s)
44. 35.41ms (47.38G pixels/s)
45. 35.45ms (47.33G pixels/s)
46. 35.58ms (47.15G pixels/s)
47. 35.39ms (47.40G pixels/s)
48. 35.49ms (47.28G pixels/s)
49. 35.60ms (47.13G pixels/s)
50. 35.41ms (47.38G pixels/s)
51. 35.46ms (47.31G pixels/s)
52. 35.47ms (47.31G pixels/s)
53. 35.59ms (47.15G pixels/s)
54. 35.64ms (47.07G pixels/s)
55. 35.62ms (47.10G pixels/s)
56. 35.88ms (46.76G pixels/s)
57. 35.52ms (47.23G pixels/s)
58. 35.60ms (47.13G pixels/s)
59. 35.63ms (47.09G pixels/s)
60. 35.68ms (47.02G pixels/s)
61. 35.60ms (47.13G pixels/s)
62. 35.68ms (47.02G pixels/s)
63. 35.66ms (47.04G pixels/s)
64. 45.25ms (37.08G pixels/s)
65. 45.30ms (37.04G pixels/s)
66. 45.38ms (36.97G pixels/s)
67. 45.24ms (37.08G pixels/s)
68. 45.36ms (36.98G pixels/s)
69. 45.27ms (37.06G pixels/s)
70. 45.32ms (37.02G pixels/s)
71. 45.35ms (36.99G pixels/s)
72. 45.30ms (37.04G pixels/s)
73. 45.47ms (36.90G pixels/s)
74. 45.35ms (36.99G pixels/s)
75. 45.25ms (37.08G pixels/s)
76. 45.44ms (36.92G pixels/s)
77. 45.27ms (37.06G pixels/s)
78. 45.31ms (37.03G pixels/s)
79. 45.35ms (37.00G pixels/s)
80. 45.31ms (37.03G pixels/s)
81. 45.28ms (37.05G pixels/s)
82. 45.39ms (36.96G pixels/s)
83. 45.24ms (37.09G pixels/s)
84. 45.42ms (36.94G pixels/s)
85. 45.20ms (37.12G pixels/s)
86. 45.33ms (37.01G pixels/s)
87. 45.35ms (36.99G pixels/s)
88. 45.26ms (37.07G pixels/s)
89. 45.41ms (36.94G pixels/s)
90. 45.46ms (36.90G pixels/s)
91. 45.22ms (37.10G pixels/s)
92. 45.28ms (37.05G pixels/s)
93. 45.17ms (37.14G pixels/s)
94. 45.26ms (37.07G pixels/s)
95. 45.30ms (37.04G pixels/s)
96. 54.71ms (30.67G pixels/s)
97. 54.71ms (30.66G pixels/s)
98. 54.64ms (30.70G pixels/s)
99. 54.79ms (30.62G pixels/s)
100. 54.73ms (30.65G pixels/s)
101. 54.68ms (30.68G pixels/s)
102. 54.80ms (30.61G pixels/s)
103. 54.73ms (30.65G pixels/s)
104. 54.98ms (30.51G pixels/s)
105. 54.78ms (30.63G pixels/s)
106. 54.82ms (30.60G pixels/s)
107. 54.74ms (30.65G pixels/s)
108. 54.90ms (30.56G pixels/s)
109. 54.79ms (30.62G pixels/s)
110. 54.80ms (30.61G pixels/s)
111. 54.87ms (30.58G pixels/s)
112. 54.83ms (30.60G pixels/s)
113. 55.17ms (30.41G pixels/s)
114. 54.81ms (30.61G pixels/s)
115. 54.95ms (30.53G pixels/s)
116. 54.84ms (30.60G pixels/s)
117. 55.04ms (30.48G pixels/s)
118. 54.96ms (30.53G pixels/s)
119. 54.93ms (30.54G pixels/s)
120. 55.01ms (30.50G pixels/s)
121. 54.96ms (30.53G pixels/s)
122. 55.10ms (30.45G pixels/s)
123. 54.93ms (30.54G pixels/s)
124. 55.04ms (30.48G pixels/s)
125. 54.94ms (30.54G pixels/s)
126. 55.18ms (30.40G pixels/s)
127. 54.93ms (30.54G pixels/s)
128. 64.70ms (25.93G pixels/s)

Not sure what all that means but there it is :)
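
For anyone else puzzling over the output: the "G pixels/s" figure looks like a fixed amount of pixel work divided by the time taken. The short Python sketch below multiplies a few of the posted times by their posted rates to check that. The helper name `implied_pixels` is made up for illustration, and the idea that the workload is constant is an inference from the numbers above, not something taken from the test's source.

```python
# (time in ms, reported throughput in G pixels/s), copied from the results above
samples = [
    (16.25, 103.23),  # graphics only
    (25.89, 64.79),   # graphics + compute, line 1
    (35.45, 47.33),   # graphics + compute, line 33
    (45.25, 37.08),   # graphics + compute, line 64
    (54.71, 30.67),   # graphics + compute, line 96
    (64.70, 25.93),   # graphics + compute, line 128
]


def implied_pixels(time_ms, gpix_per_s):
    """Pixels drawn = time in seconds * throughput in pixels per second."""
    return (time_ms / 1000.0) * (gpix_per_s * 1e9)


workloads = [implied_pixels(t, r) for t, r in samples]
for (t, r), w in zip(samples, workloads):
    print(f"{t:6.2f} ms at {r:6.2f} G pixels/s -> {w / 1e9:.4f} G pixels")
```

If the products all come out nearly equal, the throughput column carries no extra information beyond the time column, so the times are the thing to compare.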
 
Associate
Joined
27 Oct 2014
Posts
550
Location
Finland.
My reading is that there is some benefit from the asynchronous queue now used in the driver, but it is very minimal. The tasks are still executed serially.

Asynchronous would look like this:

Compute only:
1. 6.72ms
2. 6.72ms
3. 6.72ms
4. 6.72ms
5. 6.71ms
6. 6.72ms
7. 6.71ms
8. 6.71ms
9. 6.71ms
10. 6.71ms
11. 6.71ms
12. 6.72ms
13. 6.71ms
14. 6.71ms
15. 6.72ms
16. 6.72ms
17. 6.72ms
18. 6.72ms
19. 6.71ms
20. 6.72ms
21. 6.72ms
22. 6.72ms
23. 6.72ms
24. 6.72ms
25. 6.72ms
26. 6.72ms
27. 6.72ms
28. 6.72ms
29. 6.72ms
30. 6.72ms
31. 6.72ms
32. 6.72ms
33. 6.72ms
34. 6.72ms
35. 6.72ms
36. 6.72ms
37. 6.72ms
38. 6.73ms
39. 6.72ms
40. 6.72ms
41. 6.72ms
42. 6.72ms
43. 6.72ms
44. 6.72ms
45. 6.72ms
46. 6.72ms
47. 6.72ms
48. 6.72ms
49. 6.72ms
50. 6.72ms
51. 6.72ms
52. 6.72ms
53. 6.72ms
54. 6.72ms
55. 6.72ms
56. 6.72ms
57. 6.73ms
58. 6.72ms
59. 6.72ms
60. 6.72ms
61. 6.72ms
62. 6.72ms
63. 6.72ms
64. 6.72ms
65. 6.72ms
66. 6.72ms
67. 6.72ms
68. 6.72ms
69. 6.72ms
70. 6.72ms
71. 6.72ms
72. 6.72ms
73. 6.72ms
74. 6.73ms
75. 6.75ms
76. 6.73ms
77. 6.73ms
78. 6.72ms
79. 6.72ms
80. 6.72ms
81. 6.72ms
82. 6.72ms
83. 6.72ms
84. 6.73ms
85. 6.73ms
86. 6.72ms
87. 6.73ms
88. 6.72ms
89. 6.72ms
90. 6.72ms
91. 6.73ms
92. 6.73ms
93. 6.73ms
94. 6.73ms
95. 6.73ms
96. 6.73ms
97. 6.72ms
98. 6.72ms
99. 6.73ms
100. 6.73ms
101. 6.72ms
102. 6.73ms
103. 6.73ms
104. 6.73ms
105. 6.73ms
106. 6.72ms
107. 6.72ms
108. 6.73ms
109. 6.73ms
110. 6.72ms
111. 6.73ms
112. 6.73ms
113. 6.73ms
114. 6.73ms
115. 6.73ms
116. 6.73ms
117. 6.73ms
118. 6.73ms
119. 6.73ms
120. 6.73ms
121. 6.73ms
122. 6.73ms
123. 6.73ms
124. 6.73ms
125. 6.73ms
126. 6.73ms
127. 6.73ms
128. 6.73ms
Graphics only: 25.55ms (65.67G pixels/s)
Graphics + compute:
1. 25.69ms (65.31G pixels/s)
2. 25.49ms (65.83G pixels/s)
3. 25.32ms (66.25G pixels/s)
4. 25.42ms (66.01G pixels/s)
5. 25.63ms (65.46G pixels/s)
6. 25.47ms (65.88G pixels/s)
7. 25.16ms (66.69G pixels/s)
8. 25.60ms (65.55G pixels/s)
9. 25.52ms (65.75G pixels/s)
10. 25.44ms (65.96G pixels/s)
11. 25.42ms (66.00G pixels/s)
12. 25.55ms (65.67G pixels/s)
13. 25.56ms (65.65G pixels/s)
14. 25.58ms (65.58G pixels/s)
15. 25.48ms (65.85G pixels/s)
16. 25.39ms (66.07G pixels/s)
17. 25.42ms (65.99G pixels/s)
18. 25.56ms (65.65G pixels/s)
19. 25.49ms (65.83G pixels/s)
20. 25.32ms (66.27G pixels/s)
21. 25.51ms (65.76G pixels/s)
22. 25.49ms (65.81G pixels/s)
23. 25.67ms (65.35G pixels/s)
24. 25.58ms (65.58G pixels/s)
25. 25.48ms (65.85G pixels/s)
26. 25.32ms (66.27G pixels/s)
27. 25.55ms (65.66G pixels/s)
28. 25.60ms (65.53G pixels/s)
29. 25.36ms (66.17G pixels/s)
30. 25.42ms (66.00G pixels/s)
31. 25.38ms (66.10G pixels/s)
32. 25.46ms (65.90G pixels/s)
33. 25.44ms (65.94G pixels/s)
34. 25.48ms (65.83G pixels/s)
35. 25.29ms (66.34G pixels/s)
36. 25.38ms (66.09G pixels/s)
37. 25.55ms (65.66G pixels/s)
38. 27.01ms (62.12G pixels/s)
39. 25.59ms (65.57G pixels/s)
40. 25.35ms (66.19G pixels/s)
41. 25.47ms (65.87G pixels/s)
42. 25.36ms (66.17G pixels/s)
43. 25.41ms (66.02G pixels/s)
44. 25.37ms (66.14G pixels/s)
45. 25.50ms (65.79G pixels/s)
46. 25.42ms (66.00G pixels/s)
47. 25.50ms (65.80G pixels/s)
48. 25.45ms (65.93G pixels/s)
49. 25.40ms (66.05G pixels/s)
50. 25.47ms (65.88G pixels/s)
51. 25.48ms (65.85G pixels/s)
52. 25.46ms (65.90G pixels/s)
53. 25.39ms (66.09G pixels/s)
54. 25.60ms (65.52G pixels/s)
55. 25.49ms (65.83G pixels/s)
56. 25.47ms (65.88G pixels/s)
57. 25.53ms (65.72G pixels/s)
58. 25.49ms (65.82G pixels/s)
59. 25.53ms (65.72G pixels/s)
60. 25.42ms (65.99G pixels/s)
61. 25.40ms (66.06G pixels/s)
62. 25.42ms (66.00G pixels/s)
63. 25.50ms (65.79G pixels/s)
64. 25.22ms (66.52G pixels/s)
65. 25.46ms (65.89G pixels/s)
66. 25.45ms (65.93G pixels/s)
67. 25.49ms (65.81G pixels/s)
68. 25.28ms (66.37G pixels/s)
69. 25.44ms (65.96G pixels/s)
70. 25.42ms (66.00G pixels/s)
71. 25.47ms (65.87G pixels/s)
72. 25.40ms (66.05G pixels/s)
73. 25.61ms (65.50G pixels/s)
74. 25.50ms (65.79G pixels/s)
75. 25.41ms (66.01G pixels/s)
76. 25.34ms (66.20G pixels/s)
77. 25.53ms (65.73G pixels/s)
78. 25.53ms (65.72G pixels/s)
79. 25.45ms (65.93G pixels/s)
80. 27.12ms (61.87G pixels/s)
81. 25.39ms (66.09G pixels/s)
82. 26.99ms (62.17G pixels/s)
83. 27.05ms (62.02G pixels/s)
84. 27.17ms (61.75G pixels/s)
85. 28.78ms (58.30G pixels/s)
86. 28.77ms (58.31G pixels/s)
87. 27.08ms (61.94G pixels/s)
88. 26.87ms (62.43G pixels/s)
89. 27.10ms (61.91G pixels/s)
90. 27.16ms (61.78G pixels/s)
91. 27.02ms (62.08G pixels/s)
92. 26.93ms (62.30G pixels/s)
93. 26.99ms (62.17G pixels/s)
94. 28.75ms (58.36G pixels/s)
95. 30.33ms (55.32G pixels/s)
96. 27.05ms (62.01G pixels/s)
97. 27.04ms (62.04G pixels/s)
98. 28.64ms (58.58G pixels/s)
99. 28.83ms (58.20G pixels/s)
100. 25.58ms (65.58G pixels/s)
101. 27.08ms (61.96G pixels/s)
102. 27.18ms (61.74G pixels/s)
103. 28.76ms (58.34G pixels/s)
104. 30.53ms (54.95G pixels/s)
105. 27.17ms (61.74G pixels/s)
106. 30.35ms (55.28G pixels/s)
107. 32.00ms (52.42G pixels/s)
108. 30.43ms (55.13G pixels/s)
109. 30.34ms (55.30G pixels/s)
110. 28.59ms (58.69G pixels/s)
111. 28.60ms (58.66G pixels/s)
112. 28.46ms (58.95G pixels/s)
113. 28.55ms (58.77G pixels/s)
114. 26.99ms (62.16G pixels/s)
115. 27.11ms (61.90G pixels/s)
116. 32.05ms (52.35G pixels/s)
117. 28.91ms (58.03G pixels/s)
118. 27.42ms (61.18G pixels/s)
119. 30.59ms (54.84G pixels/s)
120. 32.04ms (52.37G pixels/s)
121. 32.00ms (52.43G pixels/s)
122. 28.78ms (58.30G pixels/s)
123. 30.25ms (55.46G pixels/s)
124. 28.63ms (58.60G pixels/s)
125. 32.12ms (52.24G pixels/s)
126. 28.72ms (58.41G pixels/s)
127. 32.01ms (52.41G pixels/s)
128. 32.06ms (52.33G pixels/s)

Notice how graphics + compute takes the same time to finish as a single graphics queue would, while in your example the times are added, with a very slight boost.

Don't pay attention to the times themselves. I ran slightly faster shaders to finish the test sooner, as I don't like to wait; this is no benchmark.

I do hope Pascal will get true asynchronous shaders, but so far it at least looks like Nvidia should be able to run async code without a penalty. Ofc this is just my observation of this test. I'm sure the real gurus over at b3d will find the real truth.
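
A rough way to put that comparison into numbers, as a minimal Python sketch: if the two workloads run back to back, the combined time should sit near the sum of the separate times; if they overlap fully, it should sit near the larger of the two. The function name `classify` and the nearest-of-two rule are assumptions for illustration only, and the inputs are line 1 of each run posted in this thread.

```python
def classify(compute_ms, graphics_ms, combined_ms):
    """Say which simple model the combined time is closer to."""
    serial = compute_ms + graphics_ms          # expected if run back to back
    concurrent = max(compute_ms, graphics_ms)  # expected if fully overlapped
    if abs(combined_ms - serial) < abs(combined_ms - concurrent):
        return "serial"
    return "concurrent"


# Line 1 of the first run: compute 11.54 ms, graphics 16.25 ms, both 25.89 ms
first = classify(11.54, 16.25, 25.89)
# Line 1 of the second run: compute 6.72 ms, graphics 25.55 ms, both 25.69 ms
second = classify(6.72, 25.55, 25.69)

print(first)   # serial
print(second)  # concurrent
```

This only looks at two idealised cases, so it says nothing about how the driver or hardware gets there.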
 
Caporegime
Joined
24 Sep 2008
Posts
38,280
Location
Essex innit!
Cheers and I think I need to go and read that thread to better understand what I am looking at. It doesn't really mean anything to me if I am honest :(
 
Soldato
Joined
7 Feb 2015
Posts
2,864
Location
South West
So in other words, the latest version with apparently working async is doing what I stated: the driver just receives the async commands, then runs the compute consecutively with the graphics, as though the application had no async. It is certainly one way of supporting applications that enable async without the application needing another rendering path or disabling the effects that use async.

I reckon Nvidia did it this way in the end due to the context switching adding too much latency. That, and with a few big games coming out that will be async heavy, they want everything to just work.

Did anyone try the ones I posted? They give you fps results.
 
Caporegime
Joined
17 Mar 2012
Posts
48,768
Location
ARC-L1, Stanton System
It's software emulated, isn't it? Nvidia can run it, but IMO the queue time does matter; it all adds up to what is, in the end, your performance.
 
Caporegime
Joined
24 Sep 2008
Posts
38,280
Location
Essex innit!
I did try yours and even after installing the SDK, I got files missing, so gave up :(
 

bru

Soldato
Joined
21 Oct 2002
Posts
7,359
Location
kent
Isn't the increasing number on NVidia due to the queue size, seeing as Maxwell can do a queue depth of 32 and Fiji can do 128 (I think)? If you asked the program to run 256 or 384, the AMD would step up as well.

Don't worry though, NVidia will be paying developers to remove async from games, well, according to DM anyway. :rolleyes:

It's really pretty simple: if Nvidia could do proper async compute, it wouldn't need lots of broken promises about how it will be working in the next driver, their story wouldn't change month to month and game to game, and they wouldn't be paying to remove async compute from games that had it as standard.
 
Associate
Joined
27 Oct 2014
Posts
550
Location
Finland.
Exactly that, but I was only interested in the very first lines in this case.

Nvidia should be able to do graphics + compute at the same time if they used CUDA for compute. Unfortunately, HyperQ doesn't appear to be compatible with DX12.

I expect to see changes to this in Pascal.
 
Caporegime
Joined
17 Mar 2012
Posts
48,768
Location
ARC-L1, Stanton System
Yes, this is why Maxwell has a longer queue length. It's not that Nvidia can't do A-Sync; they can (so can GCN 1.0, which has a similar setup to Maxwell). It's just that it's far less parallel, so more queuing on fewer threads = lower performance.
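
The stepping discussed here (times rising in blocks of roughly 32 kernels on the first run, flat on the second) can be checked with a short Python sketch. The function `find_steps` and its 5 ms threshold are made up for illustration, and the two lists are rounded stand-ins for the compute-only results posted earlier in the thread, not the exact figures.

```python
def find_steps(times_ms, jump_ms=5.0):
    """Return the 1-based kernel counts where the time rises by more than
    jump_ms compared with the previous line."""
    return [
        i + 1
        for i in range(1, len(times_ms))
        if times_ms[i] - times_ms[i - 1] > jump_ms
    ]


# Rounded stand-in for the first run: plateaus near 9.85, 19.4, 29.3, 38.9 ms
first_run = [9.85] * 32 + [19.4] * 31 + [29.3] * 32 + [38.9] * 32 + [48.77]
# Rounded stand-in for the second run: flat at about 6.72 ms for all 128 lines
second_run = [6.72] * 128

print(find_steps(first_run))   # [33, 64, 96, 128]
print(find_steps(second_run))  # []
```

Whether the second run would also step up beyond 128 kernels cannot be read off these numbers, since the lighter test stops at 128.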
 
Caporegime
Joined
24 Sep 2008
Posts
38,280
Location
Essex innit!
So one thing is clear in these tests.... NVidia can do Async. At least we can now see and prove it. It must be in a similar fashion to AMD doing ROVs and Conservative Rasterization. It is being dealt with by the driver and not the hardware.
 
Soldato
Joined
7 Feb 2015
Posts
2,864
Location
South West
They are not doing async though, just able to accept and then run the async input. Your results show that the compute and graphics tasks are being performed consecutively, as shown by the increase in time with compute + graphics.

But if you look at the AMD results, they show that async is working and that compute and graphics are being performed concurrently, as graphics + compute shows the slower of the graphics or compute times for that run, rather than any addition of the times.
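
To put a figure on how much overlap each result shows, here is a small Python sketch. The name `overlap_fraction` and the formula are an illustrative assumption: time saved compared with running back to back, divided by the shorter of the two tasks, so 0 means no overlap and 1 means the shorter task was hidden completely. The inputs are line 1 of each run posted in this thread.

```python
def overlap_fraction(compute_ms, graphics_ms, combined_ms):
    """Share of the shorter task that was hidden behind the longer one."""
    saved = compute_ms + graphics_ms - combined_ms
    return saved / min(compute_ms, graphics_ms)


# First run, line 1: compute 11.54 ms, graphics 16.25 ms, both 25.89 ms
first = overlap_fraction(11.54, 16.25, 25.89)
# Second run, line 1: compute 6.72 ms, graphics 25.55 ms, both 25.69 ms
second = overlap_fraction(6.72, 25.55, 25.69)

print(f"first run:  {first:.2f}")
print(f"second run: {second:.2f}")
```

Run-to-run noise in the timings feeds straight into this ratio, so small differences should not be read too closely.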
 
Caporegime
Joined
17 Mar 2012
Posts
48,768
Location
ARC-L1, Stanton System
Maxwell does have some paralleled threading ^^^^

So one thing is clear in these tests.... NVidia can do Async. At least we can now see and prove it. It must be in a similar fashion to AMD doing ROVs and Conservative Rasterization. It is being dealt with by the driver and not the hardware.

Right on with AMD, but Nvidia do have hardware capable of A-Sync, more by accident than by design. It's not much good, so software emulation is doing some of the lifting.

My hope is Pascal can match GCN 1.1/1.2 for A-Sync and GCN 1.3 or 4 or whatever designation it ends up with can match Nvidia on Rasterization.

Then we will have the two vendors with equally capable DX12 GPUs and developers can just get on with it and not worry about vendor features, or lack thereof.
 
Caporegime
Joined
24 Sep 2008
Posts
38,280
Location
Essex innit!
They are not doing async though, just able to accept then run the async input. Your results show that the compute and graphics tasks are being performed consecutively. Shown by the increase in time with compute + graphics.

But if you look at the AMD results, it shows that async is working and that compute and graphics are being performed concurrently. as graphics + compute shows the slowest time out of graphics or compute for that run. rather than any addition of the times.

You are contradicting yourself there. You are saying they can't do Async and then say they are doing Async. Whilst I agree it isn't doing it as well as AMD, it is doing it and that was the whole argument. An Oxide developer confirmed that the new drivers meant NVidia could do Async but how it was done needed NVidia to confirm.
 
Soldato
Joined
7 Feb 2015
Posts
2,864
Location
South West
What I mean is that the driver is accepting the async input and sending something to the graphics card to run it, but it is not running the graphics and compute concurrently. It is just running them consecutively, which is shown by the increases in the time.

In other words, like what I originally said: the driver is accepting the async input but then rearranging it and scheduling it to run consecutively. So the hardware is not performing async compute; the driver is making the async input non-async.

Maxwell does have some paralleled threading ^^^^

They do, but it does not work very well when it comes to switching between the graphics queue and the compute queues. Compute queue to compute queue is fine, otherwise we would have heard about problems with CUDA. But there is some context switching overhead and there are issues when jumping between graphics and compute queues.
 
Caporegime
Joined
24 Sep 2008
Posts
38,280
Location
Essex innit!
I am in over my depth really and don't know to what extent Async would be a problem on NVidia, if at all, but Hitman is supposedly using Async according to AMD, and that will show what is what, I expect. Personally, I hope NVidia don't have a problem with it, for obvious reasons, but the same goes for AMD and other DX12 implementations. It would be great if both can "wing it" to make the life of current GPUs longer.
 
Caporegime
Joined
18 Oct 2002
Posts
33,188
You are contradicting yourself there. You are saying they can't do Async and then say they are doing Async. Whilst I agree it isn't doing it as well as AMD, it is doing it and that was the whole argument. An Oxide developer confirmed that the new drivers meant NVidia could do Async but how it was done needed NVidia to confirm.

The developer said it could support async; he didn't say it could do async. Support as in, we don't have to write a completely different path for Nvidia that disables async because it would completely break, as Nvidia offer no support for even accepting the call from the API. If, as everyone suspects, Nvidia merely converts an async call into a sequential list to execute, then it's not doing async.
 
Caporegime
Joined
24 Sep 2008
Posts
38,280
Location
Essex innit!
Well if it can support Async, I am fine with that. I am still waiting for you to show me where I said that the Titan was a compute card.
 