Graphics Cards: VEGA is better than you think

Date: 2021-09-29 Category: Repair Experience

http://www.moepc.net/?post=2672

The VEGA 10 (GCN 5.0) Architecture is at present being judged on the Frontier Edition (Workstation / PRO) Drivers, and while these do include (Consumer / RX) Drivers with the ability to switch between the two... currently neither set of VEGA 10 Drivers actually supports the VEGA 10 Features beyond HBCC.

Yes, the Workstation Drivers do support FP16 / FP32 / FP64, as opposed to the Consumer Drivers that support only FP32 (Native) and FP16 via Atomics.
Atomics allows a Feature to be used that is Supported but you're still Restricted by Driver Implementation as opposed to Direct GPU Optimisation.

FP16 Atomics do not provide the same leverage for Optimisation as a Native FP16 Pipeline.
Essentially, relative to an FP32 Pipeline, we're talking about the difference between +20% and +60% Performance.

Now it should still be noted that, we're not seeing a +100% Performance; because...
The Asynchronous Compute Engines (ACE) are still limited to 4 Pipelines and only support Packed Math Formats, which requires a slightly larger and more complex ACE than an FP32 version... thus you're not strictly getting 8x FP16 or 4x FP32 as in Legitimate Threads, but instead the Packing and Unpacking of the Results is occurring via the CPU (Drivers), so you have added Latency and what can be best described as "Software Threading"
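To make "Packed Math" a bit more concrete, here is a minimal sketch in Python of what packing and unpacking two FP16 values in a single 32-bit word looks like. This is purely illustrative: the function names (`pack_half2`, `unpack_half2`, `packed_add`) are made up for this example and have nothing to do with actual driver or GPU code. It only shows the kind of extra pack / unpack work described above.

```python
import struct
from typing import Tuple


def pack_half2(lo: float, hi: float) -> int:
    """Pack two FP16 values into one 32-bit word (lo in bits 0-15, hi in bits 16-31)."""
    (lo_bits,) = struct.unpack("<H", struct.pack("<e", lo))
    (hi_bits,) = struct.unpack("<H", struct.pack("<e", hi))
    return (hi_bits << 16) | lo_bits


def unpack_half2(word: int) -> Tuple[float, float]:
    """Split a 32-bit word back into its two FP16 values."""
    (lo,) = struct.unpack("<e", struct.pack("<H", word & 0xFFFF))
    (hi,) = struct.unpack("<e", struct.pack("<H", (word >> 16) & 0xFFFF))
    return lo, hi


def packed_add(a: int, b: int) -> int:
    """Add two packed words lane by lane: unpack, add, re-pack.

    Done in software like this, every operation pays for the unpack and
    re-pack steps on top of the two additions themselves.
    """
    a_lo, a_hi = unpack_half2(a)
    b_lo, b_hi = unpack_half2(b)
    return pack_half2(a_lo + b_lo, a_hi + b_hi)


if __name__ == "__main__":
    x = pack_half2(1.5, 2.0)
    y = pack_half2(0.5, 4.0)
    print(hex(x), hex(y), unpack_half2(packed_add(x, y)))
```

A Native FP16 Pipeline would do both lane additions in one instruction without the round trip; the sketch just shows why doing the same thing in software adds steps.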

So yeah, you're looking at ~40% Performance compared to a pure Hardware Solution; still, this is within the same region of performance improvement that NVIDIA achieve through Giga-Threading, which is almost literally Hyper-Threading for CUDA.

As such, you will see marginal benefits (up to 30%) in Non-Predictive Branches (i.e. Games) and 60% in Predictive Branches (i.e. Deep Learning, Rendering, Mining, etc.).

As this is entirely Software Handled, assuming support for Packed Math within the ACE... this is why we're seeing the RX VEGA Frontier Edition is essentially on-par with GCN 3.0 IPC if it were capable of being Overclocked to the same Clock Speeds. So, eh... this provides Decent Performance but keep in mind, essentially what we're seeing is what VEGA is capable of on FIJI (GCN 3.0) Drivers.

In short... what is happening is the Drivers are acting as a Limiter; in essence you have a Bugatti Veyron in "Road" Mode, where it just ends up a more pleasant drive overall... but that's a W16 under the hood. It can do better than the 150MPH that it's currently limiting you to.
The question here ends up being, "Well just how much of a difference will Drivers make?" ... Conservatively speaking, the RX VEGA Consumer Drivers are almost certainly going to provide 20 - 35% Performance Uplift over what the Frontier Edition has showcased on FIJI Drivers.

Yet most of that optimisation will come from FP16 Support, Tile-Based Rendering, Geometry Discard Pipeline, etc. while HBCC will continue to ensure that the GPU isn't starved for Data maintaining very respectable Minimums that are almost certainly making NVIDIA start to feel quite nervous.

Still, this isn't the "Party Trick" of the VEGA Architecture.
Something that most never really noticed was AMD's claim when they revealed the Features of Vega.

Primarily that it supports 2X Thread Throughput. This might seem minor, but what I'm not sure people quite grasped (NVIDIA did, because they got the GTX 1080 Ti and Titan Xp out to market ASAP following the official announcement of said features) is that this is actually perhaps THE most remarkable aspect of the Architecture.
So... what does this mean?

In essence the ACE on GCN 1.0 to 4.0 has 4 Pipelines, each is 128-Bit Wide. This means it processes 64-Bit on the Rising Edge, and 64-Bit on the Falling Edge of a Clock Cycle.
Now each CU (64 Stream Processors) is actually 16 SIMD (Single Instruction, Multiple Data / Arithmetic Logic Units); each SIMD Supports a Single 128-Bit Vector (4x 32-Bit Components, i.e. [X, Y, Z, W]), and because you can process each individual Component ... this is why it's denoted as 64 "Stream" Processors, because 4x16 = 64.

As I note, the ACE has 4 Pipelines that Process, 4x128-Bit Threads Per Clock.
The Minimum Operation Time is 4 Clocks ... as such 4x4 = 16x 128-Bit Asynchronous Operations Per Clock (or 64x 32-Bit Operations Per Clock)
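For anyone who wants to follow the arithmetic, here is a small Python sketch that just reproduces the numbers quoted above. The constant names are made up for this example, and the values are simply the ones stated in this post, not figures taken from AMD documentation.

```python
# Figures as stated in the post (assumptions, not AMD reference values).
SIMD_PER_CU = 16            # SIMD units per Compute Unit
COMPONENTS_PER_VECTOR = 4   # [X, Y, Z, W]
COMPONENT_BITS = 32         # each component is 32-Bit
ACE_PIPELINES = 4           # pipelines per ACE
MIN_OPERATION_CLOCKS = 4    # minimum operation time, in clocks

# One vector is 4 x 32-Bit = 128-Bit.
vector_bits = COMPONENTS_PER_VECTOR * COMPONENT_BITS

# 16 SIMD x 4 components = 64 "Stream" Processors per CU.
stream_processors = SIMD_PER_CU * COMPONENTS_PER_VECTOR

# 4 pipelines x 4 clocks = 16 128-Bit operations,
# or 16 x 4 = 64 32-Bit operations.
vector_ops = ACE_PIPELINES * MIN_OPERATION_CLOCKS
scalar_ops = vector_ops * COMPONENTS_PER_VECTOR

if __name__ == "__main__":
    print(vector_bits, stream_processors, vector_ops, scalar_ops)
```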

GCN 5.0 still has the same 4 Pipelines, but each is now 256-Bit Wide. This means it processes 128-Bit on the Rising Edge, and 128-Bit on the Falling Edge.
Each CU is also now 16 SIMD that support a Single 256-Bit Vector or Double 128-Bit Vector or Quad 64-Bit Vector (4x 64-Bit, 8x 32-Bit, 16x 16-Bit).
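The lane counts above are just the register width divided by the element width. A quick Python sketch, using a hypothetical helper called `lanes` and the 128-Bit and 256-Bit widths quoted in this post, shows where 4x / 8x / 16x come from and why the wider register works out to 2X the elements per vector:

```python
def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of a given width that fit in one register."""
    if register_bits % element_bits != 0:
        raise ValueError("element width must divide the register width")
    return register_bits // element_bits


ELEMENT_WIDTHS = (64, 32, 16)

# 128-Bit vectors (GCN 1.0 to 4.0, as described in the post).
older_gcn = {bits: lanes(128, bits) for bits in ELEMENT_WIDTHS}

# 256-Bit vectors (GCN 5.0, as described in the post).
gcn5 = {bits: lanes(256, bits) for bits in ELEMENT_WIDTHS}

# Ratio of elements per vector at each width.
ratio = {bits: gcn5[bits] // older_gcn[bits] for bits in ELEMENT_WIDTHS}

if __name__ == "__main__":
    print(older_gcn, gcn5, ratio)
```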

It does remain the same SIMD; merely the Functionality is expanded to support Multiple Width Registers, in a very similar approach to AMD64 SIMD on their CPUs; believe it or not, AMD SIMD (SSE) is FASTER than Intel's because of this approach. This is why Intel kept introducing new Slightly Incompatible versions of SSE / AVX / etc.
They're literally doing it to screw over AMD Hardware being better by using their Market Dominance to force a Standard that deliberately slows down AMD Performance, hence why Bulldozer Architecture appeared to be somewhat less capable in a myriad of common scenarios.

Anyway, what this means is Vega remains 100% Compatible and can be run as if it were a current Generation GCN Architecture.
So all of the Stability, Performance Improvements, etc. should translate pretty well and it will act in essence like a 64CU Polaris / Fiji at 1600MHz; and well, that's what we see in the Frontier Edition Benchmarks.

Now a downside of this is, well, it's still strictly speaking using the "Entire" GPU to do this... so the power utilisation numbers appear curiously High for the performance it's providing; but remember it is being used as if under 100% Load, while in reality its Utilisation is actually 50%.
Here's where it begins to make sense as to why, when they originally began showing RX VEGA at Trade Conventions, they were using it in a Crossfire Combination; it is a Subtle hint (to anyone paying attention, again like NVIDIA) at the Ballpark of what a SINGLE RX VEGA will be capable of under a Native Driver when fully Optimised.

And well... its performance is frankly staggering, as it was running Battlefield 1, Battlefront, Doom and Sniper Elite 4 at UHD 5K at 95 FPS+.
For those somewhat less versed in the processing Power Required here:

The Titan Xp is capable of UHD 4K on those games at about 120 FPS; if you were to increase it to UHD 5K it would drop to 52 FPS. At this point it's perhaps dawning on those reading this why NVIDIA have somewhat entered "Full Alert Mode" ... because Volta was aimed at ~20% Performance Improvement, and this was being achieved primarily via just making a larger GPU with more CUDA Cores.
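As a rough sanity check on how much extra work 5K is, here is a Python sketch of the pixel counts. It assumes "UHD 4K" means 3840x2160 and "UHD 5K" means 5120x2880 (the post does not state the exact resolutions), and it assumes frame rate scales inversely with pixel count, which real games do not follow exactly. The helper name `scale_fps` is made up for this example.

```python
# Assumed resolutions; the post only says "UHD 4K" and "UHD 5K".
PIXELS_4K = 3840 * 2160
PIXELS_5K = 5120 * 2880


def scale_fps(fps: float, from_pixels: int, to_pixels: int) -> float:
    """Naive estimate: frame rate inversely proportional to pixel count."""
    return fps * from_pixels / to_pixels


# How many times more pixels 5K pushes than 4K.
pixel_ratio = PIXELS_5K / PIXELS_4K

# Naive 5K estimate for a card doing about 120 FPS at 4K.
naive_5k_fps = scale_fps(120.0, PIXELS_4K, PIXELS_5K)

if __name__ == "__main__":
    print(f"5K has {pixel_ratio:.2f}x the pixels of 4K")
    print(f"naive 5K estimate from 120 FPS at 4K: {naive_5k_fps:.1f} FPS")
```

Under these assumptions 5K is roughly 1.78x the pixels of 4K. Pure pixel scaling would put the 5K figure somewhat above the 52 FPS quoted above, so that number implies worse-than-linear scaling at the higher resolution.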

RX VEGA has the potential to dwarf this in its current state.
Still this also begins to bring up the question... "If AMD have that much performance just going to waste... Why aren't they using it to Crush NVIDIA? Give them a Taste of their own Medicine!"

Simple... they don't need to, and it's actually not advantageous for them to do so.
While doing this might give them the Top-Dog Spot for the next 12-18 months... NVIDIA aren't idiots, and they'll find a way to become competitive; either Legitimately, or via utilising their current Market Share.

And people will somewhat accept them doing this to "Be Competitive", but if AMD aren't being overly aggressive and letting NVIDIA remain in their Dominant Position; while offering value and slowly removing NVIDIA from the Mainstream / Entry Level... well then not only do they know that they can with each successive "Re-Brand" Lower Costs, Improve the Architecture and offer a Meaningful Performance Uplift for their Consumers while remaining Competitive with anything NVIDIA produce.

They can also (which they do appear to be doing) with Workstation GPUs appear to be offering better performance and value in those scenarios... again better than what NVIDIA can offer, and in said Arena NVIDIA don't have the same tools (i.e. Developer Support / GameWorks / etc.) to really do anything about this beyond throwing their toys out of the pram.
As I note here, NVIDIA can't exactly respond without essentially appearing to be petty / vindictive and potentially breaking Anti-Trust (Monopoly) Laws to really strike out against AMD essentially Sandbagging them.

With perhaps the worst part for NVIDIA here being, they can see it plain as bloody day what AMD are doing; but can't do anything about it. Knowing that regardless what they do, AMD can within a matter of weeks put together a next-generation launch (rebrand); push out new drivers that tweak performance and simply match it while undercutting the price by £20-50.
Even at the same price, it will make NVIDIA look like it's losing its edge.

THAT is what Vega and Polaris have both been about for AMD, the same is true with Ryzen, Threadripper and Epyc.
AMD aren't looking at a short term "Win" for a Generation... they're clearly seeking to destroy their competitors' stranglehold on the Industry as a whole.

< • >

Oh and if you don't believe me on how seriously NVIDIA are taking this... the Titan Xp Driver update that unlocked its Professional (Workstation) Functionality essentially brings it in line with the P100 in terms of Performance.

The Titan Xp is $1,200, the Quadro P100 is $4,500 ... they've essentially made that P100 obsolete with a driver update, and basically given up on the $3,000 of pure profit that each Point-of-Sale of said Workstation Card gave them. You don't do that if you've not had an "Oh Shit!" Moment about what the Competition is offering.

via: https://www.reddit.com/r/Amd/com ... tter_than_you_think
via: MoePC.net, address: http://www.moepc.net/?post=2672

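Comment
1080 faster than 980sli

Comment
The higher you hype it, the harder it falls when the time comes.

Comment
Could someone translate this? I can only read my ABCs.

Comment
Ordinary users will have a hard time buying one; the miners have already pre-ordered the lot!!

Comment
No machine translation is one thing, but black text on a red background as well? It's blinding.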

Comment
And well... its performance is frankly staggering as it was running Battlefield 1, Battlefront, Doom and Sniper Elite 4 at UHD 5K at 95 FPS+

The Titan Xp is capable of UHD 4K on those games at about 120 FPS, if you were to increase it to UHD 5K it would drop to 52 FPS

Put simply, at 5K resolution it can thrash the TTXP (Titan Xp).

Comment
Thrashing the TTXP's FP32 with FP16's 2x FLOPS.

Comment
Hyped every day; how about actually launching it?

Comment
Which month does this launch?

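Comment
Since touching its earlier low in mid-July, Bitcoin has rebounded all the way past its previous high and is back above 22000.

Comment
AMD says: does mining count as demand? It does. Is it one of the directions for graphics card applications? It is. Once a graphics card mines, is it a productivity tool? It is. Does a mining graphics card count as neglecting its proper job and only caring about entertainment all day? It doesn't. Has mining broadened the range of graphics card applications? It has. So set a high price; Vega won't lack buyers. Then someone below says: "San Ge, the power consumption is a bit high." San Ge says: "That's why I'll be handling the next-generation Navi personally. You know what it will be designed for."

Comment
AMD = A Mysterious Device

So it makes sense that the new card never manages to launch.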

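Comment
This Reddit reply is probably more or less a transcription of the YouTube video he linked: a man blah-blahing at the screen for 10 minutes.
It mainly covers two new features. One is FP16, followed by an explanation of why the gain from FP16 over FP32 isn't 100%, roughly going over the limiting factors.
I don't understand any of this to begin with, and the article keeps playing logic games and swapping concepts, so it left me completely confused.
The replies under the Reddit post are fun too; lots of dreamers.

Comment
No worries. Look, a certain Mr. Huang is already getting scared; cut the price a bit more and AMD will make its comeback.

Comment
Old Huang gets scared and announces an early launch of the next-generation card, with 60% more performance.

Comment
Oh ho? Better than you think?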

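Comment

The professional card drivers do not support FP16 / FP32 / FP64.

Comment
I'm the strongest at mining.

Comment
Vega and the 1080 Ti get almost exactly the same frame rate. At 24K resolution.

Comment
Key point: twice Fiji's threading throughput per clock, theoretically reaching the throughput of a Fury X CF @ 1.6G. How much performance this alone brings is unclear.

But if games can use FP16, then that's twice the throughput plus 2x the compute, which in theory is twice the performance. Counting the overhead of packing threads on the CPU (driver), that's roughly 60% of Fury X CF @ 1.6G performance.

Can someone work out what performance a Fury X CF would have at 1.6G?

Comment

Even the video itself doesn't say that. On the point you raise, what it stresses is "why it isn't there", not "in theory it is there".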

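Comment
AMD says Vega's arrival has changed the DIY market and changed people's attitude towards graphics cards:
A: Honey, I want to buy a graphics card.
B: What for?
A: Gaming.
B: You good-for-nothing, all you do is play.
After Vega arrived:
A: Honey, I want to buy 3 Vegas.
B: What for?
A: Mining for money. Look at these returns.
B: Why three?
A: Because one of them is for communicating with the other 2, otherwise the money can't be withdrawn.
B: Fine, buy them.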

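Comment

I haven't watched the video, but the Reddit post mentions a 40% CPU overhead (software threads versus hardware threads).

Precisely because it's there in theory, you need to explain why it isn't there in practice. If it couldn't be reached even in theory, what would there be to explain?

Also, I'm only summarising what they wrote; it doesn't mean I agree with their view. Personally I think this is just an English-language wishful-thinking post that doesn't rise above Tieba level.

Comment
The gist of the article is that Vega's native compute units are FP32, and the current driver occupies a whole FP32 unit even when computing FP16. The new driver will split one FP32 unit in two to compute FP16. But because this is done at the software level, efficiency is lost: it can't reach 100%, only about 40% to 60%. Then, after accounting for a bunch of other pipeline losses and so on, there's a 20-35% real-world gaming performance gain over the current driver. That's the gist of the article, not my opinion (I don't know anything).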

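Comment

Ever since DX9 (9.0c, I think), games have been locked into running in FP32 mode; without special tuning and optimisation of the game, running in FP16 is impossible....

Comment

Hmm, that I wouldn't know.

Comment
Didn't Sony and Microsoft say a while back that they want to make full use of half-precision data going forward to improve GPU efficiency?

Comment
Forget half precision; it needs game-side optimisation.

Comment

Both sides are going from 1 FPS to 2 FPS. "You've been buffed, now charge!" I bow to this jinx.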

Comment
RX VEGA has the potential to dwarf this in its current state.
Still this also begins to bring up the question... "If AMD have that much performance just going to waste... Why aren't they using it to Crush NVIDIA? Give them a Taste of their own Medicine!"

Simple... they don't need to, and it's actually not advantageous for them to do so.
While doing this might give them the Top-Dog Spot for the next 12-18 months... NVIDIA aren't idiots, and they'll find a way to become competitive; either Legitimately, or via utilising their current Market Share.

Comment
Translation, please.

Comment
Vega 56 scores about 8500; draw your own conclusions.

Comment
It's been highlighted in red; I smell a conspiracy.

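Comment

I believe it.

Comment

Foreigners sure know how to fantasise.

Comment
What does this mean, Vega's performance is being hidden?

Comment
AMD hyping Vega is the same sort of thing as Intel hyping the 7350K.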

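Comment

Not so.

Look at the 7350K shill pieces, the ones written over at "Pao Village" (炮村): the handful of hand-picked programs where it beats everything, WoW, WinRAR and so on, are hardly niche.

Vega hasn't yet been seen wiping out Old Huang's whole lineup in any widely used software.

To put it bluntly, people who buy a 7350K are probably a fair bit smarter than people who buy Vega....

Comment

Apart from its price the 7350K really has no weak points. If the 7350K's price were halved to 500, wouldn't it be the new wonder CPU for the living room?

As for Vega's strong points..... can you name me one first?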

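Comment

Cuts heating costs in winter

Helps the country absorb surplus electricity, letting it cut overcapacity smoothly, which helps promote national economic policy

Promotes the development of ultra-high-power switching power supply technology and high-power component technology, feeding back into heavy industries such as high-speed rail and electric motors

Lets the CHH computer section pad out a few more threads, one step closer to an IPO; Lunzi (轮子) will be delighted

Proves that, apart from Aamir Khan and the master roti-tossing chefs, none of the Indian guys are all that reliable

Aren't that many strong points enough?? Don't you have any idea which is better, an NVIDIA card or Vega?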

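Comment

Then tell me, should I buy a Vega and take it home to treat my grandma's rheumatism?

Comment

Depends on how your family sees it. I think Vega replacing an infrared therapy device like the Zhou Lin spectrum therapy device shouldn't be much of a problem; after all, it puts out no less heat.

Comment

Cheche (车车车车), tell me: if I want to repair phones, can I buy a Vega and use it to heat the glue inside the phone?
Will it make the phone explode?

Comment
On the greatly improved mining performance alone, Vega already has no trouble selling; whether gamers can even get hold of one is an open question.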

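Comment

By that logic, Vega could replace the tools for bending water-cooling tubes too..... heat guns and hair dryers can all say bye-bye.

Comment

By your logic, wouldn't Vega be a legendary card if it sold for 999 RMB? But the 7350K isn't selling for 500, and Vega isn't 999 either.

Comment

Time Spy graphics score of 8500?

Comment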

FSU