The peak Linpack result is not even twice what my single old quad-core E3-1240 gets...
This is a SAMPLE run script. Change it to reflect the correct number
of CPUs/threads, problem input files, etc..
Fri Nov 30 01:33:56 GMT 2012
Intel(R) Optimized LINPACK Benchmark data
Current date/time: Fri Nov 30 01:33:56 2012
CPU frequency: 3.589 GHz
OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23}
OMP: Info #156: KMP_AFFINITY: 24 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 2 packages x 6 cores/pkg x 2 threads/core (12 total cores)
OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 13 maps to package 0 core 0 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 15 maps to package 0 core 1 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 17 maps to package 0 core 2 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 8 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 19 maps to package 0 core 8 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 9 maps to package 0 core 9 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 21 maps to package 0 core 9 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 11 maps to package 0 core 10 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 23 maps to package 0 core 10 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 1 core 0 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 12 maps to package 1 core 0 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 1 core 1 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 14 maps to package 1 core 1 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 1 core 2 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 16 maps to package 1 core 2 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 1 core 8 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 18 maps to package 1 core 8 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 8 maps to package 1 core 9 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 20 maps to package 1 core 9 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 10 maps to package 1 core 10 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 22 maps to package 1 core 10 thread 1
OMP: Info #144: KMP_AFFINITY: Threads may migrate across 1 innermost levels of machine
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {1,13}
Number of CPUs: 2
Number of cores: 12
Number of threads: 12
Parameters are set to:
Number of tests: 15
Number of equations to solve (problem size) : 1000 2000 5000 10000 15000 18000 20000 22000 25000 26000 27000 30000 35000 40000 45000
Leading dimension of array : 1000 2000 5008 10000 15000 18008 20016 22008 25000 26000 27000 30000 35000 40000 45000
Number of trials to run : 4 2 2 2 2 2 2 2 2 2 1 1 1 1 1
Data alignment value (in Kbytes) : 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1
Maximum memory requested that can be used=16200901024, at the size=45000
============= Timing linear equation system solver =================
Size LDA Align. Time(s) GFlops Residual Residual(norm)
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {0,12}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {3,15}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {2,14}
OMP: Info #147: KMP_AFFINITY: Internal thread 4 bound to OS proc set {5,17}
OMP: Info #147: KMP_AFFINITY: Internal thread 6 bound to OS proc set {7,19}
OMP: Info #147: KMP_AFFINITY: Internal thread 7 bound to OS proc set {6,18}
OMP: Info #147: KMP_AFFINITY: Internal thread 8 bound to OS proc set {9,21}
OMP: Info #147: KMP_AFFINITY: Internal thread 9 bound to OS proc set {8,20}
OMP: Info #147: KMP_AFFINITY: Internal thread 10 bound to OS proc set {11,23}
OMP: Info #147: KMP_AFFINITY: Internal thread 11 bound to OS proc set {10,22}
OMP: Info #147: KMP_AFFINITY: Internal thread 5 bound to OS proc set {4,16}
1000 1000 4 0.017 39.8370 1.158133e-12 3.949530e-02
1000 1000 4 0.010 65.6262 1.158133e-12 3.949530e-02
1000 1000 4 0.010 64.8324 1.158133e-12 3.949530e-02
1000 1000 4 0.010 67.1647 1.158133e-12 3.949530e-02
2000 2000 4 0.061 87.8143 4.878042e-12 4.243299e-02
2000 2000 4 0.059 90.5638 4.878042e-12 4.243299e-02
5000 5008 4 0.736 113.2353 2.221734e-11 3.098029e-02
5000 5008 4 0.735 113.4376 2.221734e-11 3.098029e-02
10000 10000 4 5.099 130.7841 1.002740e-10 3.535762e-02
10000 10000 4 5.160 129.2400 1.002740e-10 3.535762e-02
15000 15000 4 16.166 139.2094 2.076599e-10 3.270679e-02
15000 15000 4 16.127 139.5442 2.076599e-10 3.270679e-02
18000 18008 4 27.437 141.7314 3.217862e-10 3.523954e-02
18000 18008 4 27.420 141.8196 3.217862e-10 3.523954e-02
20000 20016 4 37.579 141.9462 3.679233e-10 3.256927e-02
20000 20016 4 37.754 141.2852 3.679233e-10 3.256927e-02
22000 22008 4 49.121 144.5329 4.668633e-10 3.419590e-02
22000 22008 4 49.140 144.4788 4.668633e-10 3.419590e-02
25000 25000 4 71.456 145.7954 5.337515e-10 3.035254e-02
25000 25000 4 71.515 145.6745 5.337515e-10 3.035254e-02
26000 26000 4 79.948 146.5784 5.681816e-10 2.987669e-02
26000 26000 4 79.958 146.5612 5.681816e-10 2.987669e-02
27000 27000 4 89.251 147.0393 7.510782e-10 3.662639e-02
30000 30000 1 122.044 147.5027 8.718444e-10 3.436820e-02
35000 35000 1 192.415 148.5631 1.091704e-09 3.169050e-02
40000 40000 1 290.798 146.7336 1.486401e-09 3.305806e-02
45000 45000 1 416.193 145.9756 1.749031e-09 3.077237e-02
Performance Summary (GFlops)
Size LDA Align. Average Maximal
1000 1000 4 59.3651 67.1647
2000 2000 4 89.1891 90.5638
5000 5008 4 113.3365 113.4376
10000 10000 4 130.0121 130.7841
15000 15000 4 139.3768 139.5442
18000 18008 4 141.7755 141.8196
20000 20016 4 141.6157 141.9462
22000 22008 4 144.5059 144.5329
25000 25000 4 145.7349 145.7954
26000 26000 4 146.5698 146.5784
27000 27000 4 147.0393 147.0393
30000 30000 1 147.5027 147.5027
35000 35000 1 148.5631 148.5631
40000 40000 1 146.7336 146.7336
45000 45000 1 145.9756 145.9756
End of tests
Done: Fri Nov 30 02:13:19 GMT 2012
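The GFlops column in the log above can be cross-checked from the Size and Time(s) columns. The sketch below assumes the standard LINPACK operation count of 2/3*n^3 + 2*n^2 floating-point operations for an n x n system; the function name `linpack_gflops` is made up for illustration and is not part of the benchmark.

```python
def linpack_gflops(n: int, seconds: float) -> float:
    """GFlops for solving an n x n dense system.

    Assumes the standard LINPACK operation count 2/3*n^3 + 2*n^2.
    """
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    return flops / seconds / 1e9


# (size, time in seconds, GFlops as reported) copied from the log above
rows = [(35000, 192.415, 148.5631), (45000, 416.193, 145.9756)]
for n, seconds, reported in rows:
    print(n, round(linpack_gflops(n, seconds), 3), reported)
```

The recomputed values should agree with the reported ones up to the rounding of the time column.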
Comment
148.5 GFLOPS. As I recall, a 3770K overclocked to 4.4 GHz gets a bit over 60.
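Comment
Then you must be running it the wrong way. Even an E3-1240 reaches 92 GFLOPS.
Linpack does not support Hyper-Threading. (The technical explanation given by an Intel engineer is that the Linpack algorithm is SIMD code and cannot make use of the idle pipeline slots that Hyper-Threading exploits; running more threads than there are physical cores reduces performance.)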
Comment
A 3770K and a 2500K. What is going on here!?
Intel(R) LINPACK 64-bit data - LinX 0.6.2
Current date/time: Fri Nov 30 10:32:50 2012
CPU frequency: 3.500 GHz
Number of CPUs: 8
Number of threads: 8
Parameters are set to:
Number of tests : 1
Number of equations to solve (problem size) : 11530
Leading dimension of array : 11544
Number of trials to run : 10
Data alignment value (in Kbytes) : 4
Maximum memory requested that can be used = 1065053536, at the size = 11530
============= Timing linear equation system solver =================
Size LDA Align. Time(s) GFlops Residual Residual(norm)
11530 11544 4 17.477 58.4852 1.295321e-010 3.445333e-002
11530 11544 4 17.515 58.3594 1.295321e-010 3.445333e-002
11530 11544 4 17.495 58.4240 1.295321e-010 3.445333e-002
11530 11544 4 17.500 58.4093 1.295321e-010 3.445333e-002
11530 11544 4 17.529 58.3127 1.295321e-010 3.445333e-002
11530 11544 4 17.471 58.5043 1.295321e-010 3.445333e-002
11530 11544 4 17.454 58.5631 1.295321e-010 3.445333e-002
11530 11544 4 17.505 58.3929 1.295321e-010 3.445333e-002
11530 11544 4 17.464 58.5273 1.295321e-010 3.445333e-002
11530 11544 4 17.530 58.3078 1.295321e-010 3.445333e-002
Performance Summary (GFlops)
Size LDA Align. Average Maximal
11530 11544 4 58.4286 58.5631
End of tests
Intel(R) LINPACK 64-bit data - LinX 0.6.2
Current date/time: Fri Nov 30 10:34:49 2012
CPU frequency: 3.300 GHz
Number of CPUs: 4
Number of threads: 4
Parameters are set to:
Number of tests : 1
Number of equations to solve (problem size) : 11530
Leading dimension of array : 11544
Number of trials to run : 10
Data alignment value (in Kbytes) : 4
Maximum memory requested that can be used = 1065053536, at the size = 11530
============= Timing linear equation system solver =================
Size LDA Align. Time(s) GFlops Residual Residual(norm)
11530 11544 4 17.289 59.1192 1.189358e-010 3.163488e-002
11530 11544 4 17.031 60.0170 1.189358e-010 3.163488e-002
11530 11544 4 17.197 59.4359 1.189358e-010 3.163488e-002
11530 11544 4 17.029 60.0247 1.189358e-010 3.163488e-002
11530 11544 4 17.187 59.4714 1.189358e-010 3.163488e-002
11530 11544 4 17.029 60.0234 1.189358e-010 3.163488e-002
11530 11544 4 17.031 60.0174 1.189358e-010 3.163488e-002
11530 11544 4 17.196 59.4389 1.189358e-010 3.163488e-002
11530 11544 4 17.034 60.0065 1.189358e-010 3.163488e-002
11530 11544 4 17.058 59.9206 1.189358e-010 3.163488e-002
Performance Summary (GFlops)
Size LDA Align. Average Maximal
11530 11544 4 59.7475 60.0247
End of tests
Comment
How about trying a problem size of 20000? As I recall, that is the size at which my E3-1240 reaches its peak performance:
This is a SAMPLE run script. Change it to reflect the correct number
of CPUs/threads, problem input files, etc..
Fri Nov 30 02:43:13 GMT 2012
Intel(R) Optimized LINPACK Benchmark data
Current date/time: Fri Nov 30 02:43:13 2012
CPU frequency: 3.690 GHz
Number of CPUs: 1
Number of cores: 4
Number of threads: 8
Parameters are set to:
Number of tests: 1
Number of equations to solve (problem size) : 20000
Leading dimension of array : 20008
Number of trials to run : 20000
Data alignment value (in Kbytes) : 4
Maximum memory requested that can be used=3201684256, at the size=20000
============= Timing linear equation system solver =================
Size LDA Align. Time(s) GFlops Residual Residual(norm)
20000 20008 4 56.449 94.4942 4.097986e-10 3.627616e-02
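Comment
Also, as I already said, Linpack does not support Hyper-Threading, so you must make sure that every thread runs on its own physical CPU core! If two threads run on the same physical core, performance will inevitably suffer!
If you are using the Linux script runme_xeon64, remember to change the beginning to the following:
#!/bin/bash
#
# On a machine with Hyper-Threading, using compact here would be a disaster
export KMP_AFFINITY=warnings,scatter
# Set this to the number of physical cores
export MKL_NUM_THREADS=4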
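To pick the value for MKL_NUM_THREADS, the number of physical cores can be counted from /proc/cpuinfo. This is only a minimal sketch, assuming an x86 Linux system whose /proc/cpuinfo exposes the "physical id" and "core id" fields; the helper name `count_physical_cores` and the `SAMPLE` text are invented for illustration.

```python
def count_physical_cores(cpuinfo_text: str) -> int:
    """Count distinct (physical id, core id) pairs in /proc/cpuinfo-style text."""
    cores = set()
    # Each logical processor is described by one blank-line-separated block.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
        if "physical id" in fields and "core id" in fields:
            cores.add((fields["physical id"], fields["core id"]))
    return len(cores)


# Hypothetical excerpt: one package, two cores, Hyper-Threading enabled
SAMPLE = """\
processor : 0
physical id : 0
core id : 0

processor : 1
physical id : 0
core id : 1

processor : 2
physical id : 0
core id : 0

processor : 3
physical id : 0
core id : 1
"""

print(count_physical_cores(SAMPLE))
```

On a real machine the same function could be fed `open("/proc/cpuinfo").read()`.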
Comment
Encode a video and you will see the difference.
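Comment
What do you mean, "old" E3-1240? The X5680 is clearly the older one, isn't it?
If a Haswell-based E3 appears next year, I would guess that even two E5-2680s won't be twice as fast as a single E3 v3.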
Comment
By "old E3-1240" I mean the non-v2 part. It is only one generation newer than the X5680, and it is still just a quad-core.
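Comment
v2 or not, the only difference is clock speed. At the same frequency they are exactly the same. And even the non-v2 E3 is still newer than the X5680; after all, one is from 2011 and the other from 2010.
In a test like Linpack, AVX, which doubles FMA throughput, presumably has every advantage. Once Haswell is out, AVX2, which doubles FMA throughput again, should have a sizeable advantage too.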
Comment
They are not the same at equal clocks... Ivy Bridge is faster than Sandy Bridge at the same frequency.
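Comment
In 轮子's tests, a stock i7-2600 is slightly faster than the E3-1230 v2. The gap is larger on laptops because Ivy Bridge spends more time in Turbo Boost.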
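Comment
Don't keep talking about "curing the upgrade itch" at every turn; just try a Power7 or Fujitsu's SPARC64.
What is the point of running those lousy benchmarks? As far as an individual consumer is concerned, what can they understand besides a few scores?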
Comment
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 13 maps to package 0 core 0 thread 1
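Hey, get this straight: your test crammed two threads onto the same core. Right now half of your cores have two threads each and the other half are not used at all. Two of them
V爷, just go GPGPU.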
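Comment
Nothing wrong with that. Over these years each generation has only gained 10%-15%; adding up to 100% is already pretty good... The improvements were small to begin with.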
Comment
At the same clock, Ivy Bridge is 3%-5% faster than Sandy Bridge, roughly the equivalent of an extra 100 MHz.
Comment
It should be fine: since only 12 threads were requested, threads 13 through 24 should not exist.
Comment
I have absolutely no idea what V爷 is talking about.
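Comment
That is inevitable with Linpack: matrix computation is the best fit there is for streaming (SIMD) instructions. As long as the code is recent enough, a CPU that supports AVX will have an enormous advantage.
Run some programs that do not use AVX and you will see the real gap.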
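Comment
No. Look at the mapping in the log: two processes are mapped to the same core. I don't know whether the operating system will reschedule the processes, but going by this OMP assignment you really did put two processes on the same core. Also, when I say "thread" here I mean CPU threads, the things each core has two of when HT is enabled.
Nobody can guarantee that with only 12 processes you will get exactly one thread per core.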
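Comment
An ancient Nehalem-architecture part like the X5680 cannot gain any edge over IVB or even SNB; the latter have considerably better memory efficiency...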
Comment
For large-scale computing, single-CPU performance matters relatively little; multiple CPUs are the way to go.
Comment
SNB's SIMD instructions are twice as wide as the previous generation's.
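A rough way to see what the doubled SIMD width means for the numbers in this thread is to compare theoretical double-precision peaks. The sketch below assumes 4 DP flops per cycle per core for the X5680 (128-bit SSE, one add plus one multiply per cycle) and 8 for the E3-1240 (256-bit AVX), and it assumes the "CPU frequency" values printed in the logs hold under load; both are assumptions, and `peak_gflops` is a name made up for illustration.

```python
def peak_gflops(cores: int, ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak = cores * clock (GHz) * DP flops per cycle per core."""
    return cores * ghz * flops_per_cycle


# Assumed per-core widths: 4 flops/cycle with SSE, 8 flops/cycle with AVX
dual_x5680 = peak_gflops(cores=12, ghz=3.589, flops_per_cycle=4)
e3_1240 = peak_gflops(cores=4, ghz=3.690, flops_per_cycle=8)

# Best measured results copied from the logs earlier in the thread
print("2x X5680:", dual_x5680, "efficiency:", 148.5631 / dual_x5680)
print("E3-1240: ", e3_1240, "efficiency:", 94.4942 / e3_1240)
print("peak ratio:", dual_x5680 / e3_1240)
```

Under these assumptions the two X5680s have only about one and a half times the theoretical peak of the single E3-1240, which is in line with the measured results being less than twice as high.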
Comment
Read the log carefully: the earlier part is the initial affinity. What is actually used during the computation is:
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {0,12}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {3,15}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {2,14}
OMP: Info #147: KMP_AFFINITY: Internal thread 4 bound to OS proc set {5,17}
OMP: Info #147: KMP_AFFINITY: Internal thread 6 bound to OS proc set {7,19}
OMP: Info #147: KMP_AFFINITY: Internal thread 7 bound to OS proc set {6,18}
OMP: Info #147: KMP_AFFINITY: Internal thread 8 bound to OS proc set {9,21}
OMP: Info #147: KMP_AFFINITY: Internal thread 9 bound to OS proc set {8,20}
OMP: Info #147: KMP_AFFINITY: Internal thread 10 bound to OS proc set {11,23}
OMP: Info #147: KMP_AFFINITY: Internal thread 11 bound to OS proc set {10,22}
OMP: Info #147: KMP_AFFINITY: Internal thread 5 bound to OS proc set {4,16}
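The check being argued about here can be done mechanically. The sketch below parses the two kinds of KMP_AFFINITY lines quoted in this thread and reports any physical core that more than one OpenMP thread is bound to. The regular expressions are written against the line format shown in the log above, and the function names and the short `LOG` excerpt are chosen for illustration only.

```python
import re

MAP_RE = re.compile(r"OS proc (\d+) maps to package (\d+) core (\d+) thread (\d+)")
BIND_RE = re.compile(r"Internal thread (\d+) bound to OS proc set \{([\d,]+)\}")


def cores_per_thread(log_text: str) -> dict:
    """Map each OpenMP internal thread to the set of (package, core) it may run on."""
    proc_to_core = {}
    for m in MAP_RE.finditer(log_text):
        proc, pkg, core, _hw_thread = map(int, m.groups())
        proc_to_core[proc] = (pkg, core)
    bindings = {}
    for m in BIND_RE.finditer(log_text):
        thread = int(m.group(1))
        procs = [int(p) for p in m.group(2).split(",")]
        bindings[thread] = {proc_to_core[p] for p in procs}
    return bindings


def shared_cores(bindings: dict) -> dict:
    """Return the cores that more than one OpenMP thread is bound to."""
    seen = {}
    for thread, cores in bindings.items():
        for core in cores:
            seen.setdefault(core, []).append(thread)
    return {core: threads for core, threads in seen.items() if len(threads) > 1}


# Short excerpt of the log lines quoted in this thread
LOG = """\
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 13 maps to package 0 core 0 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 1 core 0 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 12 maps to package 1 core 0 thread 1
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {1,13}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {0,12}
"""

bindings = cores_per_thread(LOG)
print(bindings)
print(shared_cores(bindings))
```

For this excerpt each internal thread is bound to the two hardware threads of a single core, and no core is shared between OpenMP threads.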
Comment
Oh shit, indeed. You are right.
Comment
Have you sold the 5680 yet???
Comment
It doesn't support multiple cores, does it?
Comment
Build yourself a whole supercomputer and you will be satisfied.