Intel Skylake performance review: Finally! After Broadwell launched quietly on LGA 1150 following months of delay, the Skylake processors arrive with their share of new features. Leading the range, the Core i7-6700K and Core i5-6600K launch first. On the menu for this new "tock", which follows Haswell and the move to 14nm: an improved architecture on both the CPU and iGPU sides, and a new LGA 1151 platform paired with the Z170 Express chipset.
Reverse paper launch!
Before getting down to business, we want, for once, to take a short detour behind the scenes. Launching a new product is a well-rehearsed exercise in the computer industry, CPUs and GPUs included. Manufacturers usually give the press technical and marketing documentation ahead of time, along with test samples, so that it can give its opinion on products as soon as they become available. The practice really benefits the manufacturer when the product is good, although some believe that what matters is being talked about, for better or worse. Obviously it does not always happen this way, and it can be necessary to obtain information or hardware by other means, before or after the launch, which in that case makes it impossible to publish an opinion before availability. A few years ago, when competition was fiercer, manufacturers went as far as "paper launches": the product and its specifications were announced, sometimes with test samples, but one had to wait a few weeks for availability in stores.
For this launch, Intel's process is the reverse, and rather unusual. While we received a Core i7-6700K at the last minute (last Friday), no technical documentation is available to date. The Skylake architecture will be unveiled to the press at IDF, which takes place in two weeks, with an embargo until September. The launch of the Core i7-6700K, Core i5-6600K and Z170 Express has in fact been brought forward from the rest of the range so as to coincide with the Gamescom video game show, gaming and overclocking being the target Intel has announced for these products. A target for which technical details seem to be an afterthought, at least for Intel.
We do not really agree with this approach, and despite the very short time available we tried to give you as much information as possible today. In the coming weeks we will try to fill in the grey areas left by Intel. Enough chatter, let's get to it!
A transition to 14nm more complicated than expected
From the first rumors about Broadwell, Intel's 14nm process found itself surrounded by questions. For the first time, the manufacturer planned to launch a new processor generation on the mobile platform only, leaving desktop platforms aside. At the time this raised many questions about a possible lack of interest from the manufacturer in these platforms, in favor of a strategy focused on mobility. Looking back today, we can see that the first rumors of 14nm delays followed quickly in 2013, when Broadwell began slipping slowly on roadmaps. In October 2013, Intel officially spoke of a one-quarter delay, blaming 14nm production difficulties, but gave assurances that this would have no impact on the next generation, Skylake. A surprising statement, given that Skylake uses the same late-running 14nm process.
Three months of delay according to Intel at the end of 2013; the red line we added to the graph pointed instead to a six-month delay at the time.
A statement in late 2013 suggested between the lines that the delay would be closer to six months, then finally nine by mid-2014, while leading us to think that the manufacturer was going to sacrifice its margins in order to launch a 14nm product in 2014 at all costs and keep its promises to investors. Confirmation came soon enough: in late 2014 Intel launched a short-lived Core M whose stepping was announced as end-of-life even before its release. Only in early 2015 did the first "real" dual-core Broadwell chips arrive, and the quad-core models landed only at the end of the second quarter. The coming years will show whether Intel deliberately chose to limit Broadwell's scope not through a lack of interest in desktop platforms, but because it realized that its 14nm process would not be able to supply all markets, in volume, in 2014. Conventional launches generally start with quad-core desktop models, which are more complex and therefore harder to produce when yields are low.
In the meantime the manufacturer has decided to give itself some margin, announcing in mid-July that it was setting the Tick-Tock cadence aside and that a third 14nm product, Kaby Lake, would arrive in 2016. We know little about it for now, except that its appearance on the roadmaps is due to the difficulty of ramping up 10nm production. Cannonlake, the 10nm derivative of Skylake, will arrive only in the second half of 2017 (instead of 2016). The fact remains that, in the middle of these past and future delays, the Skylake architecture launched today is on time, since it arrives about 26 months after Haswell!
LGA 1151 and Z170 Express
After a little more than two years of loyal service, LGA 1150 hands over to LGA 1151. Visually nothing really changes: the alignment keys are slightly offset to prevent the insertion of an incompatible processor, and the cooler mounting system remains unchanged from LGA 1156/1155/1150. While the move from LGA 1155 to LGA 1150 was justified, among other reasons, by the integration of the voltage regulator into the CPU, with LGA 1151 Intel reverts to an external regulator! As a reminder, on Haswell the motherboard supplies the processor with 2 voltages through the socket, against 6 previously: VDDQ, which is reused directly to power the DDR3, and VCCIN, which passes through an integrated voltage regulator to be converted into 5 distinct voltages. At the time Intel put forward a simpler platform design and the processor's control over its own power delivery.
LGA 1151 is a step backward, since in addition to VDDQ the motherboard must supply 3 additional voltages through a regulator that is once again external: VCORE for the x86 cores and the LLC cache, from which VRing for the ring bus interconnect is derived, VGT for the iGPU, and VSA for the System Agent (memory controller, DMI, PCI-E). We tried to find out from Intel the reasons for this reversal; we were told that the choice was made to offer more options for heat dissipation on mobile devices, which is quite vague. Is it simply a matter of moving the voltage regulation losses out of the CPU and its package, even if they amount to only a few tenths of a watt? A motherboard manufacturer also told us that the IVR was not good for the CA and that an external regulator was needed in the case of the GT4e iGPU. Again this is quite vague, and we hope to learn more at IDF. What is certain is that this change de facto rules out any compatibility with the previous platform, although Intel has shown in the past that it does not need that kind of excuse.
LGA 1151 also brings plenty that is new on the chipset side, with the Z170 Express inaugurating a new chipset generation. First, the link to the CPU moves to DMI 3.0 instead of the DMI 2.0 introduced in 2011. Throughput doubles to 4 GB/s in each direction, a necessity given the combined throughput of the interfaces available on the chipset, even though they are rarely used simultaneously. While the number of SATA 6 Gbps ports remains limited to 6, Intel increases the number of USB 3.0 ports from 6 to 10! Be careful: in both cases Intel has trimmed the controllers' support for older standards. The SATA controller no longer supports IDE mode and works only in AHCI, while the USB 3.0 controller no longer works in EHCI, only in xHCI. Since the Windows 7 installer does not support xHCI, it is not possible to install Windows 7 from a USB stick unless xHCI drivers are integrated into the ISO beforehand (as explained here, for example).
Finally, no fewer than 20 PCIe Gen3 lanes can be managed by the Z170, against 8 PCIe Gen2 lanes before. These lanes can be used for chips embedded on the motherboard, for PCIe slots, and for SATA Express x2 or M.2 x2 or x4 interfaces for SSDs. Of course, even though the DMI bus has been widened, it is insufficient for using everything simultaneously, since this link is equivalent in throughput to 4 PCIe Gen3 lanes!
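The equivalence between DMI 3.0 and four PCIe Gen3 lanes can be checked with a little arithmetic. The sketch below assumes the usual PCIe Gen3 signaling figures (8 GT/s per lane with 128b/130b encoding), which are not stated in the text above, and ignores protocol overhead; the function name is ours.

```python
# Rough PCIe Gen3 throughput arithmetic.
# Assumed figures (not from the article): 8 GT/s per lane, 128b/130b encoding.
# Protocol overhead is ignored, so real-world numbers will be lower.

GEN3_TRANSFERS_PER_S = 8e9   # 8 GT/s per lane, per direction
GEN3_ENCODING = 128 / 130    # payload bits per transmitted bit


def pcie_gen3_gbytes_per_s(lanes: int) -> float:
    """Approximate one-direction throughput in GB/s for `lanes` Gen3 lanes."""
    bits_per_s = lanes * GEN3_TRANSFERS_PER_S * GEN3_ENCODING
    return bits_per_s / 8 / 1e9


dmi3 = pcie_gen3_gbytes_per_s(4)      # DMI 3.0 treated as a x4 Gen3 link
chipset = pcie_gen3_gbytes_per_s(20)  # the 20 lanes exposed by the Z170

print(f"DMI 3.0 (x4 equivalent): {dmi3:.2f} GB/s")
print(f"20 chipset lanes:        {chipset:.2f} GB/s")
print(f"Oversubscription:        {chipset / dmi3:.0f}x")
```

Under these assumptions the x4 link comes out just under the 4 GB/s quoted above, and the 20 chipset lanes alone could in theory ask for about five times what the DMI link can carry.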
In any case, motherboard manufacturers cannot integrate all these possibilities at the same time, since Intel keeps the I/O Port Flexibility concept introduced with the Z87. The combined number of PCIe lanes, USB 3.0 ports and SATA ports cannot exceed 26 (against 18 on the Z87), and depending on needs a lane may be used for one or another of these interfaces. These Z170 Express changes are a good thing, although ideally we would have liked the number of PCIe Gen3 lanes managed by an LGA 1151 CPU to increase slightly in parallel, for example from 16 to 20, which would allow an M.2 x4 port to be handled without monopolizing the DMI link to the chipset.
New features on the processor side and overclocking
Without documentation, it is unfortunately difficult to know what improvements Intel has made to the front end (fetching and decoding of x86 instructions, branch prediction), the scheduler (reordering of instruction execution) or the execution units. With the help of AIDA64 we measured latency and throughput for the instructions supported by Skylake (here, compared with this file for Haswell), and instructions are generally faster, including a 25% gain on FMA-type instructions.
Speaking of instructions, Skylake ultimately does not support AVX-512 on Core i7 and i5; it will be reserved for Xeons, as we already mentioned. The TSX instructions, disabled on Haswell following a bug, are back as on Broadwell, and the ADX and RDSEED instructions introduced with Broadwell are also present. The only novelty in Skylake as launched today is therefore the Intel MPX instructions, which improve software security by checking at run time that the memory references in a program are not used maliciously (through a buffer overflow, for example).
On the cache side, the big change made by Intel is the return to a 4-way associative L2 cache, against 8-way since the first Core i7. This simplifies cache management and should make it more energy efficient, but with a negative impact on performance. A memory line will now have only 4 possible locations in the cache instead of 8, potentially increasing the risk of conflict with another useful line and therefore of conflict cache misses.
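To make the effect of associativity on conflicts concrete, here is a minimal sketch of how lines map to sets. It assumes a 256 KB L2 with 64-byte lines and plain modulo indexing; neither figure is given in the text above, and real hardware may index differently. All names are illustrative.

```python
from collections import Counter

# Assumptions (not from the article): 64-byte lines, 256 KB of L2 per core,
# and simple modulo set indexing.
LINE_BYTES = 64
L2_BYTES = 256 * 1024


def set_index(address: int, ways: int) -> int:
    """Set selected by `address` in an L2 with the given associativity."""
    sets = L2_BYTES // (ways * LINE_BYTES)
    return (address // LINE_BYTES) % sets


def lines_that_do_not_fit(addresses, ways: int) -> int:
    """Count lines that exceed the number of ways available in their set."""
    per_set = Counter(set_index(a, ways) for a in addresses)
    return sum(max(0, count - ways) for count in per_set.values())


# Six lines spaced 64 KiB apart land in the same set in both layouts.
addresses = [i * 64 * 1024 for i in range(6)]

print("4-way, lines evicted by conflict:", lines_that_do_not_fit(addresses, ways=4))
print("8-way, lines evicted by conflict:", lines_that_do_not_fit(addresses, ways=8))
```

In this toy access pattern the 8-way layout holds all six lines at once, while the 4-way layout has to evict two of them, which is the kind of conflict miss described above.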
AIDA64 measurements at equal frequency (4 GHz) show slightly lower L1 cache throughput but equivalent latency, at 1 ns or 4 cycles. On the L2 side there is a strong increase (+66%) in write throughput and a drop of about one cycle in latency. L3 write throughput is also up (+18%) but its latency rises from 42 to 45 cycles. DDR3-2133 access latency also increases compared with Haswell.
Beyond the caches, memory management is also changing, with the memory controller supporting DDR4, which we described at length at the launch of Haswell-E, in addition to DDR3. Officially DDR3 is supported only in its 1.35V version ("DDR3L") at DDR3-1600, but in practice we encountered no problems with classic 1.50V memory, or even 1.65V (we nevertheless advise you to stay at 1.50V). DDR4 can run at DDR4-2133, or more through overclocking.
Speaking of overclocking, Intel introduces some new features on Skylake, the main one being a complete decoupling of the PCIe and DMI frequencies from BCLK. They were completely linked on LGA 1155, which limited bus overclocking to a few percent; on LGA 1150 Intel introduced ratios for a little more freedom. This time the decoupling is complete: whatever BCLK speed is selected, the PCIe and DMI buses stay at 100 MHz, giving free rein to overclockers' wishes, provided of course that suitable ratios are used for the elements that remain tied to it (CPU, iGPU, DDR).
A welcome return on paper, but in practice it will be truly useful only if it allows overclocking of "non-K" processors, and nothing so far says that it will. Otherwise it will merely be a handy extra for record hunters of all kinds willing to spend time to gain a few benchmark points. On the multiplier side, Skylake can also reach x83, against x80 on Haswell, while memory ratios allow a frequency step granularity of 100/133 MHz against 200/266 MHz before, up to DDR4-4133 (and of course more by increasing BCLK).
Update of 08/19/2015: we were able to learn more at IDF; here is the news item we published on that occasion for the CPU part:
Finally! As promised, Intel is taking advantage of IDF to begin discussing the details of the Skylake architecture across several sessions held on the first day. We have tried to gather as much information as possible in this long news item, knowing that the manufacturer tends to release details drop by drop from one session to the next; it is not impossible that some details will be unveiled only in the coming days! We will try to note the differences from Haswell; you can refresh your memory by reading our article.
Before we begin, note that the manufacturer has been working on Skylake for several years (more than 5) and that the project, led in Israel (Intel alternates between two teams, one in Israel and one in Oregon), was amended several times to add, successively, 15-watt TDPs (with the Ultrabook launched from 2008), then 4.5W (Core M), as well as packaging changes. Note also, and this is a first, that the details we give below apply to the consumer version of Skylake. The server version will reflect different architectural choices; we know for example that only the server version of Skylake will support AVX512, but the differences should go further.
Frontend, scheduler and execution units
In broad terms there is no major change to the front end, which remains 4-wide (up to 4 x86 instructions decoded simultaneously) as on Sandy Bridge and Haswell. In practice, from these four CISC instructions, up to 6 micro-ops (RISC-like instructions) can still be generated. Upstream of decoding, branch prediction, by contrast, changes significantly. Intel does not go into the details of the algorithm but says it is smarter and able to consider much longer branches than before. Likewise, Intel has enlarged the various buffers across the front-end stages, a change found in almost every new architecture. Decoded micro-ops are now stored in two queues able to hold 64 of them per thread (against a single 56-entry buffer on Haswell), a major change that lets the scheduler try to extract maximum parallelism. The scheduler's job is to dispatch micro-ops to the execution units. It benefits from longer queues and from some changes to its algorithms for handling Hyper-Threading.
The vaguest point for the moment concerns the execution units. Remember that Haswell had 8 ports distributed across multiple execution units (ALUs for integer instructions, AVX/floating-point units, data loads/stores, branches...). For now all we know is that the number of units has increased, without knowing which ones were added. Intel engineers gave two other details: the unit in charge of divisions gains flexibility, while the processing latency of some FPU instructions is said to have been reduced. Intel promised that we would get more details on these points in a future session.
Overall the changes are interesting in that they should in theory make slightly better use of the threads on each core, which can in some cases produce a significant performance increase. In this way the manufacturer has managed to improve its single-core SPEC benchmark results significantly (which led our German colleagues to relay the rumor, obviously false, of a reverse Hyper-Threading late last week). The changes remain very localized, and in absolute terms Intel continues to refine the details of a Core architecture that is certainly excellent but has remained the same in its broad outlines for years.
Instruction set
During our Skylake test we noted a new addition to the instruction set: MPX, Memory Protection Extensions. These instructions add checks on memory addresses to prevent buffer overflow attacks and to keep a process from accessing memory it is not entitled to. We have not yet obtained more details on these instructions.
On the other hand, another addition is what the manufacturer calls SGX, for Software Guard Extensions. These instructions create protected memory areas that are accessible only to the process that created them and which, should the data be corrupted anyway, halt the process concerned to maximize security. Likewise, using a secured memory area (Secure Enclave) disables all debugging facilities on the system.
While the usefulness of these extensions is easy to imagine in certain business situations, on the consumer side we think above all of possible DRM implementations that could use these technologies in the future.
Power management
Much work has also gone into power management, with increased use of power gating on some power-hungry units; this is notably the case for the AVX2 units, which are switched off when not in use. Savings were made at every level, notably in the interconnects and the I/O, to minimize consumption. The engineers worked specifically on idle or "almost" idle scenarios, such as video playback, to maximize battery life in mobile use. The solutions adopted to reach this goal include separate clock domains for the System Agent, the memory controller and the eDRAM I/O (on models equipped with it). Significant work was also done on the power management unit (Power Control Unit) to make it somewhat smarter in many scenarios: it is able to estimate the risk of throttling and reduce frequency in advance to avoid reaching the maximum temperature at which severe throttling is inevitable.
The other choice concerns the use of Duty Cycle Control in place of a frequency change. According to Intel, reducing frequency (through the P-States for the cores) reduces consumption linearly, and it is often more efficient to switch units off and on (somewhat like a PWM controller) while keeping a higher frequency.
Speed Shift
The other major change in the PCU is what Intel calls Speed Shift, a fundamental change in how P-States work. Remember that processor frequency is managed jointly by the processor itself and the operating system. The processor exposes a so-called P-States table (through the ACPI tables) listing the different voltage/frequency pairs it can use.
In conventional operation, the operating system, depending on the load it is handling, explicitly controls P-State changes (which involves a latency of around 30 ms according to Intel) by choosing a level (for example P1, the maximum "non-turbo" frequency). There are however, according to Intel, two exceptions to this rule. The first concerns Turbo frequencies, which vary with the number of active cores. This is managed directly by the processor. The other is throttling, when the temperature exceeds a critical threshold. In that case the processor performs throttling on its own (thankfully!) by switching to so-called thermal control modes.
The idea of Speed Shift is to change the relationship between operating system and processor. First, with Speed Shift the processor now exposes all available frequencies, including the Turbo modes that were until now managed transparently. Next, the operating system gives a kind of overall hint as to whether to favor performance or energy saving (replacing the performance/balanced/etc. modes found for example in Windows 7, and the battery/mains modes on laptops). Finally, by default the PCU can manage P-States by itself, automatically choosing on the fly and fully autonomously the mode that seems most suitable. On top of this, the operating system may decide to intervene, but in a new way: the system sets a minimum and a maximum frequency, again leaving the PCU leeway to optimize automatically according to the load. It is also possible to request a specific frequency, optionally, but this comes in addition to the minimum and maximum frequencies and is in no way guaranteed.
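The contract described above (a minimum, a maximum, an optional desired frequency that is only a hint) can be pictured as a simple clamp. This is a toy model of the interface, not Intel's algorithm: every name below, including the `hw_limit_mhz` parameter standing in for thermal or power limits, is hypothetical.

```python
from typing import Optional


def resolve_frequency(pcu_choice_mhz: int,
                      os_min_mhz: int,
                      os_max_mhz: int,
                      os_desired_mhz: Optional[int] = None,
                      hw_limit_mhz: Optional[int] = None) -> int:
    """Toy model: pick a frequency inside the window set by the OS.

    pcu_choice_mhz -- what the PCU would choose on its own for the load
    os_desired_mhz -- optional OS request, treated as a hint only
    hw_limit_mhz   -- hypothetical cap (thermal/power) that always wins
    """
    # Start from the OS hint if there is one, else from the PCU's own choice.
    target = os_desired_mhz if os_desired_mhz is not None else pcu_choice_mhz
    # Keep the result inside the [min, max] window provided by the OS.
    target = max(os_min_mhz, min(os_max_mhz, target))
    # A hardware limit can still override everything: nothing is guaranteed.
    if hw_limit_mhz is not None:
        target = min(target, hw_limit_mhz)
    return target


# The PCU is free to choose within the window.
print(resolve_frequency(2400, os_min_mhz=800, os_max_mhz=4200))
# A desired frequency above the window is clipped to the maximum.
print(resolve_frequency(2400, 800, 4200, os_desired_mhz=4600))
# A hardware limit makes even an in-window request unreachable.
print(resolve_frequency(2400, 800, 4200, os_desired_mhz=4200, hw_limit_mhz=3900))
```

The point of the sketch is the division of roles: the operating system only bounds and hints, while the final value is decided on the processor side.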
Intel implements advanced algorithms in its PCU that continuously try to estimate whether it is better to limit the frequency in the case of a constant light load or, on the contrary, to push the frequency up so as to switch the units off as soon as possible and save energy in the end.
The PCU's algorithm also tries to detect situations where the user is interacting with the system, to improve its responsiveness. The basic idea, as intended by the system's designers, was to move into turbo mode as quickly as possible when an interaction is detected (wake-up, mouse, etc.) to give the impression of a more responsive system. In practice the PCU tries to detect the loads typical of an interaction and uses several mechanisms to filter out longer loads (video playback) or those so short that they do not merit a frequency increase.
In practice, in any case, Intel says it can reduce consumption without sacrificing performance. This slide shows some results announced by the manufacturer, even though it does not go into further detail! Since Speed Shift fundamentally changes the mode of interaction between processor and operating system, you will not be surprised to learn that it must be supported explicitly, and that support is limited today: only Windows 10 is able to exploit it. On all other operating systems, operation remains as before. It is intriguing that Intel has not released a Linux patch adding Speed Shift support. Other Skylake technologies such as MPX have had Linux support since January. Nevertheless, renovating the very old concept of P-States is an excellent idea; letting the processor take charge of its own frequency seems almost obvious, and it will be very interesting to see the practical impact of this technology on battery life in mobile versions of Skylake.
eDRAM, IVR, Chipset
Another important change concerns the way the eDRAM is interfaced with the processor. With Broadwell, for example, the eDRAM sits behind the LLC and can hold memory used by the IGP or by the cores (tags in the LLC record who is using what).
With Skylake the eDRAM is placed between the LLC and the memory controller, integrating even more transparently into the memory hierarchy. In practice this change makes it possible to cache data coming from anywhere. The L4 can thus cache data from PCI Express or chipset requests. This transparency can however be bypassed for graphics use. The Intel graphics driver has a specific access mode that lets it request where it wants certain data to be cached. It can ask for data to be stored in the L3, in the eDRAM, or on the contrary nowhere. Decisions that the driver, according to Intel, is well placed to judge correctly. The real impact of this change is difficult to estimate, although it should potentially benefit non-graphics applications.
Regarding the removal of the integrated voltage regulator, note that the engineers gave us an answer: the decision to remove it was taken specifically because of the 4.5W models, where the IVR was inefficient. A "better" solution, had they had more time according to Intel's engineers, would have been to remove the IVR only on low-power models and keep it on the others. A hint suggesting that the manufacturer may opt for this split in the future, with Cannonlake.
A final note on the chipset: beyond the changes already mentioned in our article, it offers one new feature: it can now enter a throttling mode in case of overheating. The idea is mainly to avoid situations where the PCH, through overheating of the platform, could endanger the system. This gives Intel a little headroom, especially for the 4.5 and 15W chips that include the PCH in the package, right next to the CPU/graphics die.
Core i7-6700K, Core i5-6600K, ASUS Z170-A and G.SKILL DDR4-3600
For this test we were able to get our hands on the two models launched today, namely a Core i7-6700K and a Core i5-6600K. Both processors have a TDP reported as 95 watts by software tools, but oddly Intel speaks of 91 watts in the marketing material accompanying the launch. Physically a Skylake is very close to its predecessors.
The Core i7-6700K has 4 cores with Hyper-Threading and an 8 MB LLC; the base frequency of the x86 cores is 4 GHz while the iGPU can reach a maximum of 1.15 GHz. The Turbo is 4.2 GHz, but only with one core active, whereas under the same conditions a 4790K could reach 4.4 GHz, and 4.2 GHz with 4 cores active. A choice probably guided by power consumption, the default voltage already being 1.23V at 4 GHz, which is not encouraging for overclocking this 14nm chip (usually, the finer the process, the less the voltage can be pushed without damaging the transistors).
The Core i5-6600K lacks Hyper-Threading and its LLC is reduced to 6 MB. The iGPU keeps a frequency of 1.15 GHz while the 4 x86 cores have a base frequency of 3.5 GHz. This time, however, the Turbo goes a little further, with 3.9 GHz (+400 MHz) on 1 core, 3.8 GHz (+300 MHz) on 2 cores, 3.7 GHz (+200 MHz) on 3 cores and 3.6 GHz (+100 MHz) on 4 cores. The operating voltage is also more reasonable, at 1.16V. The recommended prices are $350 and $243 respectively, almost exactly those of the Core i7-4790K and i5-4690K. As is often the case with new products supplied only through official channels, however, expect a price premium of around 10% during the first weeks. Note that, like LGA 2011 processors, these LGA 1151 K versions do not come with a cooler! Admittedly this is no great loss given the performance it offered until now, but it could still come in handy.
Finally, G.Skill sent us a very fast Ripjaws V DDR4 kit consisting of 2×4 GB of DDR4-3600 at 17-18-18-38 and 1.35V. With such speeds, the gap with DDR3 may widen.
CPU: DDR4 vs DDR3 in practice
While Intel and all manufacturers are promoting DDR4 for this launch, we had the chance to get our hands on a Z170 motherboard that accepts DDR3. Since its BIOS caused us a few problems with the Turbo, we used a Core i7-6700K locked at 4 GHz with Hyper-Threading disabled, for more stable results, to compare numerous memory settings.
We start with theoretical tests, using AIDA64 results. In terms of bandwidth, DDR3 is at its best at DDR3-2400; beyond that the results decline. At equal speed (DDR4-2400 vs DDR3-2400) bandwidths are close, slightly lower for DDR4. DDR4-3600, the fastest, provides about 30% more bandwidth than DDR3-2400.
On the latency side the picture is less rosy. At "equivalent" settings DDR4 is about 1 ns faster, although it should be borne in mind that, as we have seen, the DDR3 latency we measured on Skylake is almost 4 ns worse than on Haswell. But retail DDR4 comes with looser timings in exchange for the frequency increase. So to match DDR3-2400 at 11-11-11, DDR4-2800 needs timings between 16-16-16 and 15-15-15, and the fastest DDR3 is beaten only slightly, and only from DDR4-3000 15-15-15.
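As a rough guide to why looser DDR4 timings offset part of the frequency gain, the CAS delay in nanoseconds and the theoretical peak bandwidth can be computed from the data rate. The sketch below assumes a dual-channel setup with 64-bit channels and uses the first timing of each kit mentioned above; the function names are ours, and CAS is only one component of the total latency AIDA64 measures, so these figures are not expected to match the measured ones.

```python
def cas_latency_ns(data_rate_mts: int, cas_cycles: int) -> float:
    """CAS delay in ns: the memory clock is half the data rate (DDR)."""
    return cas_cycles * 2000 / data_rate_mts


def peak_bandwidth_gbs(data_rate_mts: int, channels: int = 2,
                       bus_bytes: int = 8) -> float:
    """Theoretical peak in GB/s (assumes 64-bit channels, dual channel)."""
    return data_rate_mts * bus_bytes * channels / 1000


# (label, data rate in MT/s, CAS cycles) for the settings discussed above
kits = [
    ("DDR3-2400 CL11", 2400, 11),
    ("DDR4-2800 CL16", 2800, 16),
    ("DDR4-2800 CL15", 2800, 15),
    ("DDR4-3000 CL15", 3000, 15),
    ("DDR4-3600 CL17", 3600, 17),
]

for label, rate, cl in kits:
    print(f"{label}: CAS {cas_latency_ns(rate, cl):.2f} ns, "
          f"peak {peak_bandwidth_gbs(rate):.1f} GB/s")
```

On paper DDR4-3600 has 50% more peak bandwidth than DDR3-2400, against the roughly 30% measured above, and its CAS delay in nanoseconds is no lower than that of DDR3-2400 CL11 despite the much higher data rate.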
In practice things are more complicated for DDR4. In 7-zip, which depends heavily on memory speed, fast DDR3 brings up to 17.9% more performance relative to DDR3-1600, and only DDR4-3466 and DDR4-3600 manage to reach the same level. x265 is the opposite and does not really benefit from faster memory. Between the slowest and the fastest memory, the gain is only 0.5%, and despite the relative stability of the results we are not far from the benchmark's margin of error! We end with Arma III which, like 7-zip and many games, appreciates fast memory, with about 10% better between DDR3-1600 and the fastest DDR3. DDR4 behaves slightly better here than in 7-zip, since "only" DDR4-3000 is needed to edge past the best DDR3.
While DDR4 offers a bandwidth advantage that could be useful for the iGPU, on the CPU side its contribution is rather limited, especially as the fastest modules will not be cheap. If you already have enough memory and wish to move to Skylake, you can go for a DDR3 motherboard without regret, the only caveat perhaps being whether a 1.65V kit can be used long-term with a Skylake, which we do not know yet. In any case, on the desktop DDR4 leaves us wanting more, but that was also the case for DDR3 and DDR2 at their launches!
CPU: Sandy Bridge vs Ivy Bridge vs Haswell vs Skylake at 4 GHz
Of course we wanted to study the performance gains of this new architecture at equal frequency. To do so, we used the Core i5-2500K, Core i5-3570K, Core i5-4670K, Core i5-5675C and Core i5-6600K, respectively Sandy Bridge, Ivy Bridge, Haswell, Broadwell and Skylake, all with their x86 cores clocked at 4 GHz. Since Haswell, the ring bus and the LLC cache have had their own frequency domain, generally less amenable to frequency increases when overclocking. For this reason we set it to 3.5 GHz for these tests, performed with DDR3-2133 11-11-11-31 1T. We also added Skylake's performance with DDR4-2800 15-15-35-1T, which gives it a larger advantage than seen earlier on the i7, owing to the combined effect of a smaller LLC cache clocked at a lower speed.
Compared with Sandy Bridge, Skylake offers an average gain of 22.3% in applications, and 1.5 points more with DDR4. The gain is quite variable, exceeding 30% in some applications: the two 3ds renderers, x264 and x265. Note that since the version used for this test, x265 has received new versions better optimized for the latest CPU architectures, and the gain then climbs to 60%! Conversely, WinRAR and 7-zip gain less than 10%. Compared with Haswell the gain is 5.5%, with 1-3% in the worst cases (Visual Studio, Stockfish and Houdini).
In games the average gain this time is 21.1% over Sandy Bridge, with 4.9 points more in DDR4 all the same. Again the gains are quite variable: only 7-8% in Watch Dogs and F1 2013, but no less than 45% for X-Plane 10! Compared with Haswell the gain is 8% in DDR3 and 13% in DDR4, with more consistent gains: 5 to 16% better in DDR3, 8 to 19% in DDR4. Compared with Broadwell and its eDRAM the gains are very small, and in some applications the microarchitecture gains do not compensate for the loss of the eDRAM. At equal frequency, Broadwell is even ahead of Skylake in games!
CPU: Overclocking in practice
We look, in steps of 0.05V, for the lowest voltage at 4 GHz, then raise the frequency in 200 MHz steps for as long as an additional 0.05V is sufficient. If it is not, we try to stabilize 100 MHz lower, then raise the frequency again in 100 MHz steps. Stability is validated with Prime95 (256K FFT) for 15 minutes; the temperature reported is the average of the 4 cores over the last minute of the test, bearing in mind that this is an open test bench at a room temperature of 25 °C, with the CPU cooled by a Noctua NH-U12P SE2.
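The procedure above can be sketched as a small search loop. This is our reading of the method, not a tool used for the test: `is_stable` is a hypothetical callback standing in for the 15-minute Prime95 run, the 1100 mV starting point is an arbitrary assumption, and the 1350 mV ceiling corresponds to the 1.35V limit we set ourselves. Voltages are in millivolts to avoid floating-point steps.

```python
from typing import Callable, List, Tuple


def overclock_search(is_stable: Callable[[int, int], bool],
                     base_mhz: int = 4000,
                     floor_mv: int = 1100,    # assumed starting voltage
                     limit_mv: int = 1350,    # voltage ceiling
                     step_mv: int = 50) -> List[Tuple[int, int]]:
    """Return the (MHz, mV) points validated along the way."""
    # Step 1: lowest stable voltage at the base frequency.
    mv = floor_mv
    while mv <= limit_mv and not is_stable(base_mhz, mv):
        mv += step_mv
    if mv > limit_mv:
        return []
    results = [(base_mhz, mv)]

    # Step 2: each extra 50 mV should buy 200 MHz; once it does not,
    # fall back to 100 MHz steps for the rest of the search.
    mhz, coarse = base_mhz, True
    while mv + step_mv <= limit_mv:
        mv += step_mv
        strides = (200, 100) if coarse else (100,)
        reached = next((s for s in strides if is_stable(mhz + s, mv)), None)
        coarse = reached == 200
        if reached is not None:
            mhz += reached
            results.append((mhz, mv))
    return results


# Toy stability model built from the i5-6600K voltages reported in this review
# (minimum stable mV per frequency); any other frequency counts as unstable.
I5_MIN_MV = {4000: 1150, 4200: 1200, 4400: 1250, 4500: 1300}


def toy_is_stable(mhz: int, mv: int) -> bool:
    return mhz in I5_MIN_MV and mv >= I5_MIN_MV[mhz]


for mhz, mv in overclock_search(toy_is_stable):
    print(f"{mhz} MHz at {mv / 1000:.2f} V")
```

In a real run each call to `is_stable` costs 15 minutes, which is why the search takes coarse 200 MHz steps first and only refines to 100 MHz near the limit.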
We start with the Core i5-6600K that runs by default in 3.6 GHz in Prime95, 100 MHz more than the base frequency. With this type of load its voltage rises to 1.2V by default against 1.16v under a reasonable load as Fritz Benchmark. We could lower it to 1.15V while riding the frequency at 4.0 GHz, so you can enjoy a lower consumption despite increased performance. The rise in frequency is by notch 200 MHz and 4.4 GHz up to 0.05v, but to stabilize the 4.5 GHz we went to 1.3v while the 4.6 GHz were not stable at 1.35v, voltage at which we have decided to stop.
Under Prime95 the Core i7-6700K runs at 4 GHz, Turbo bringing no frequency gain. While we measured 1.23 V under Fritz, the default voltage this time rises to 1.31 V under Prime95. Oddly, despite this high voltage, the consumption recorded is barely higher than that measured on the i5-6600K. Two factors are at work. First, on Skylake the load under Prime95 is reduced with HyperThreading, although it remains higher than that of conventional applications. Second, at equal load (4 threads) we measured an advantage of about 8 watts at the wall for the i7 (at 1.15 V and 4 GHz), which probably suffers less leakage current than the i5. Back to overclocking itself. At 4 GHz we could use a voltage of 1.15 V on the i7-6700K, which leads to a very noticeable drop in consumption. For 4.2 and 4.4 GHz we used 1.2 and 1.25 V, as on the i5-6600K; 4.5 GHz, on the other hand, could only be stabilized at 1.35 V. 4.6 GHz could not be stabilized at 1.35 V, our voltage limit. On the basis of these two samples we therefore have a limit of 4.5 GHz, which requires raising the voltage by 0.05 or 0.1 V compared to 4.4 GHz. This seems roughly on par with Haswell for 24/7 overclocking (i5-4690K and i7-4790K), but some samples, Haswell Refresh in particular, were able to go higher, and it remains to be seen whether the same holds for Skylake. It remains, however, below Sandy Bridge, which once mature could exceed 4.5 GHz and even approach 5 GHz.
HD Graphics 530 in practice: OpenCL, QuickSync
If gaming remains a delicate matter, what about OpenCL? This is what we wanted to check in several tests.
For this test we use version 2.0 of Luxmark, a benchmark built on the open source 3D rendering engine LuxRender. Version 3.0 has been available for some time, but driver support, particularly from Intel, is still far from perfect. Only the smallest scene runs on the Intel drivers; the two larger scenes still crash the software, which runs without problems on other platforms. Drivers, OpenCL included, remain a point for Intel to work on, even if we salute the progress: last month, the Intel drivers could not even run the first Luxmark 3.0 scene. We look at performance in three modes: OpenCL CPU, OpenCL GPU and OpenCL CPU + GPU.
Sony Vegas Pro 13
The video editing software Sony Vegas Pro 13 has two levels of acceleration. The first concerns preview and rendering, which can be accelerated via OpenCL. The second concerns H.264 encoding, handled by the MainConcept codec, which can use a CPU, OpenCL or CUDA mode. In practice, in our tests, the impact of MainConcept acceleration on our test sample is negligible; the bulk of the gains comes from accelerated rendering. We run a rendering / H.264 compression test of a scene in Blu-ray 1080p24 mode, in CPU mode and in accelerated mode (rendering and MainConcept).
DxO OpticsPro 10.4.2
We use version 10 of the DxO photo software, which offers GPU acceleration for display but above all an OpenCL mode that accelerates part of the processing. We apply the DxO Standard preset, which includes various corrections, among them optical lens aberrations. We record the results in CPU mode and in OpenCL mode.
In CPU mode, Skylake is a little faster than Haswell, by around 4% when compared in DDR3-2133. The gap with the AMD APUs is... wide. With OpenCL enabled, the gains vary widely. Processing time is cut by more than a third on the AMD side, which rescues the performance of the APUs. On Haswell the gain is tiny, while on Broadwell processing time drops by 25%, probably helped by the additional units and above all the L4 cache. Skylake benefits a little more than Haswell, cutting its processing time by around 6%. In the end, and this is rare enough to be worth mentioning, it is the entry-level GPUs that bring the largest gains here, particularly those from AMD, while on the Nvidia side the GTX 750 saves the day compared to the models below it. The GT 740 was consistently slower here than the GT 730, and we have no explanation for it.
QuickSync
We wanted to look at what QuickSync acceleration offers on the various Intel platforms. As a reminder, when testing the Braswell platforms we had noted a notable gain in quality for this accelerated H.264 encoding. For this we used Cyberlink MediaEspresso, with our tests run in the software's “Best Quality” mode. Note that while Skylake theoretically provides accelerated H.265 / HEVC encoding, it is not supported by the currently available versions of software such as MediaEspresso and Handbrake.
HD Graphics 530 in practice: H.265, consumption
We also wanted to verify the H.265 acceleration, heralded as one of the new features Intel brings with Skylake. It must be said that the subject is a bit nebulous: we have seen Intel, for example, announce earlier the addition of partial H.265 decoding acceleration to its Haswell and Broadwell drivers, while graphics card manufacturers sometimes also speak of partial acceleration.
H.265 playback in MPC-HC
To test this, we used a 4K H.265 / HEVC test clip with a bitrate of 17 Mb/s (Elecard 4K video about Tomsk, part 3). We use the MPC-HC video player in version 1.7.9, which includes the LAVFilters filters, based on the excellent ffmpeg. Their options allow 4K H.265 hardware decoding to be enabled (it is disabled by default) via DXVA2 in Windows.
We compared the results in CPU mode and in “accelerated” mode on all platforms, recording on the one hand the total CPU utilization of the platform and on the other the power consumption measured at the wall, all during playback.
Let’s be very clear: in practice only one platform fully supports accelerated 4K H.265 playback, and that is Skylake. The result is clear-cut, with consumption falling by 38% and much lower processor utilization. It is not, however, the only platform that “claims” to handle H.265 through DXVA. Take the case of the GTX 750, which declares itself capable of decoding H.265 via its driver (the other two do not). In practice playback stutters and is unusable. What about Haswell / Broadwell? They too declare DXVA acceleration through their driver, but one wonders what is actually accelerated. In both cases we noted an increase of 10 to 15 watts in platform consumption, with CPU usage also rising if anything. If there is any support, it is very partial and completely useless on our test clip. It is a pity that Intel enabled this rather deficient acceleration for older platforms in its drivers: unless it is an outright bug, it serves the manufacturer more than anyone else, while devaluing what true, properly managed hardware acceleration brings, as is the case on Skylake! Update: note that we did not manage to enable DXVA playback on Skylake with videos encoded in the “Main 10” profile, which uses 10 bits per component. We are awaiting confirmation from Intel as to whether this is a hardware limitation or simply a software problem.
We measured the consumption of our various platforms in six scenarios:
– Idle
– Playing a 1080p H.264 Blu-ray in MPC-HC (all hardware acceleration enabled)
– In F1 2014, 1080p, intermediate settings
– Under Luxmark in CPU mode
– Under Luxmark in GPU mode
– Under Luxmark in CPU + GPU mode
We use a Seasonic 660SP power supply, which meets the 80Plus Platinum efficiency standard, and the measurements are those of the whole platform, at the 230 V outlet. They are taken with the lowest memory speed tested on each platform, DDR3-1600 or DDR4-2133.
Note that overall, Skylake's consumption is down compared to Haswell, beyond what the change in announced TDP would suggest. Conversely, it is the Broadwell Core i5 that consumes the most among the three Intel platforms, which is somewhat ironic given that its TDP is announced at only 65 W. The difference is most marked under CPU / GPU or combined loads; it must be said that it performs very well thanks to its L4 cache (see our previous benchmarks). Among the competition, it is the 65 W A8-7600 that, in our graphics tests, has the consumption profile closest to Skylake's. Note that AMD graphics cards add significant extra consumption during accelerated H.264 playback, something also found, to a lesser degree, on the APUs. Nvidia cards and Intel processors do not have this problem.
Finally, we crossed our performance and consumption measurements under Luxmark to compute the performance per watt of the different solutions. We use the DDR3-1600 / DDR4-2133 scores.
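For reference, the ratio itself is simple to compute: the benchmark score divided by the platform power measured at the wall. The sketch below is illustrative only. The function names are hypothetical, the figures are placeholders and not measurements from this review, and normalizing against a baseline platform is an assumption made for readability.

```python
from typing import Dict, Tuple


def perf_per_watt(score: float, wall_watts: float) -> float:
    """Benchmark score divided by platform power measured at the wall."""
    if wall_watts <= 0:
        raise ValueError("wall power must be positive")
    return score / wall_watts


def relative_efficiency(measurements: Dict[str, Tuple[float, float]],
                        baseline: str) -> Dict[str, float]:
    """Express each platform's score per watt relative to the baseline (= 100)."""
    base = perf_per_watt(*measurements[baseline])
    return {name: 100.0 * perf_per_watt(score, watts) / base
            for name, (score, watts) in measurements.items()}


# Placeholder (score, watts) pairs: NOT measurements from this review.
example = {"platform_a": (1000.0, 100.0), "platform_b": (1500.0, 120.0)}
efficiency = relative_efficiency(example, "platform_a")
print(efficiency)
```

Because the power figure is that of the whole platform, the ratio reflects the complete system and not the CPU or GPU alone.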
On pure CPU scores, of the three Intel architectures it is Broadwell that does best here, thanks to its L4 cache, which improves performance in Luxmark. In GPU mode and in combined CPU + GPU mode, on the other hand, Skylake pulls well ahead thanks to the excellent scores it obtains here, scores that stand above the rest of its results. It actually approaches the very good ratio obtained by the R7 360. While in absolute terms the performance per watt of the APUs in GPU mode is decent, with the 7870K on a par with Haswell, in CPU mode they remain very, very far from Intel. To provide a counterpoint to the particular case of Skylake under Luxmark, we also measured performance per watt in F1.
CPU Compilation: Visual Studio and MinGW-w64/GCC
Visual Studio 2013
We compile the open source 3D animation software Blender under Visual Studio 2013. We use the source code of the latest stable version at the time our protocol was created, namely release 2.71. The project is compiled with its default dependencies. Visual Studio 2013 is able to compile independent modules in parallel.
The gains are smaller this time, with 3.8% over Haswell and 30.6% over Sandy Bridge on the i5; they are higher on the i7, with 19% and 34% respectively. This time the i7-6700K is faster than the i7-5820K despite having fewer cores.
MinGW-w64 – GCC 4.7.1
We again use the same Blender 2.71 source code, this time compiled under MinGW-w64 / GCC 4.7.1. The dependencies are identical to those used for Visual Studio. We force multithreaded compilation through the make command. A big thank you to Guillaume for developing these tests!
Under MinGW, 36% performance is gained going from an i5-2500K to an i5-6600K, a gain that drops to 12% if you come from an i5-4670K. On the i7 the gains are 46.2% and 10%, but do not forget that the Sandy Bridge i7 suffers from a sizable default frequency deficit against the 4790K and 6700K. This time the i7-6700K is slightly behind the i7-5820K.
Approximately 7.5 GB of files from a version of Arma II with its expansions are compressed using WinRAR. We use the RAR5 compression format in Ultra mode. Introduced with the latest versions of the software, RAR5 makes better use of multithreading, among other things.
With WinRAR the i5-6600K is 9% faster than the i5-2500K and 8% faster than the i5-4670K. Thanks to its frequency advantage, the i7-6700K pushes its lead to 17% over the i7-2600K, whereas it is 4% over the i7-4790K. This time the i7-5820K is significantly faster than the i7-6700K.
7-Zip is the second compression program used. This time we use LZMA2 compression in Ultra mode, again on a version of Arma II but without the expansions (3.5 GB) to reduce the test time. WinRAR is faster, but once again the point is not to compare the two programs, which would require comparing the sizes of the resulting archives.
Under 7-Zip the i5-6600K is 1.5% faster than its Haswell predecessor and 10% faster than Sandy Bridge. The i7-6700K is only 2% faster than the i7-4790K, and the gap widens to 14% against the i7-2600K. Again the i7-5820K is out of reach.
CPU Encoding: x264 and x265
Our first video encoding test uses x264, specifically a build compiled with GCC 4.9.1 by Komisar, compressing a one-minute 1080p Blu-ray extract with an average bitrate of 23 Mbps. ffmpeg acts as the frame server, and we use single-pass encoding in CRF 20 mode with the Slower preset. We thank Sagittarius in passing for the exchanges on the subject. The exact command line is --preset slower --tune grain --crf 20 --ssim --psnr.
Skylake achieves gains of around 9% over Haswell under x264 on both the i5 and the i7, and of 44% and 55% over Sandy Bridge. Despite these gains, the i7-5820K is still a little too fast for the i7-6700K.
x265 v1.2 + 507
Next comes x265, which encodes video in H.265, a new high-performance video format: it promises quality equivalent to H.264 at half the bitrate, but with correspondingly heavier decoding and much heavier encoding. x265 is used in a version compiled with GCC 4.9.1 by snayper. ffmpeg again acts as the frame server, for encoding in CRF 16 this time, to take advantage of the increased efficiency of x265. We still use the Slower preset, but adapt it somewhat to reduce the gap with the Slow preset while benefiting from the psychovisual options. The exact command line is --crf 16 --preset slower --me hex --no-rect --no-amp --rd 4 --aq-mode 2 --aq-strength 0.5 --psy-rd 1.0 --psy-rdoq 0.1 --bframes 3 --min-keyint 1 --ipratio 1.1 --pbratio 1.1 --ssim --psnr.
This time the gains offered by Skylake are about 8% over Haswell on both the i5 and the i7, and rise to 41% and 47% against Sandy Bridge. With a more recent build that includes assembly optimizations for AVX2, the gap widens further: the i5-2500K, 4670K and 6600K then reach 3.98, 6.22 and 6.81 fps respectively. Skylake's advantage over Haswell is then 9%, but it reaches 71% over Sandy Bridge!
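The percentages quoted for the newer build follow directly from the frame rates. As a quick check, here is a small sketch; the helper name `gain_percent` is ours and purely illustrative.

```python
def gain_percent(new_fps: float, old_fps: float) -> float:
    """Relative gain of new_fps over old_fps, in percent."""
    return (new_fps / old_fps - 1.0) * 100.0


# Frame rates quoted in the text for the newer, AVX2-optimized x265 build.
fps = {"i5-2500K": 3.98, "i5-4670K": 6.22, "i5-6600K": 6.81}

skylake_vs_haswell = gain_percent(fps["i5-6600K"], fps["i5-4670K"])
skylake_vs_sandy = gain_percent(fps["i5-6600K"], fps["i5-2500K"])
print(f"vs Haswell: {skylake_vs_haswell:.1f}%  vs Sandy Bridge: {skylake_vs_sandy:.1f}%")
```

Rounded to whole percentages, this gives the 9% and 71% figures stated above.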
CPU 3D games: Crysis 3 and Arma III
Crysis 3 opens the 3D games in this comparison. We use a save in a particularly heavy area of the game, through which we move for 20 s to obtain an average framerate. The tests are run in 1920x1080, Very High, without anti-aliasing.
In heavy areas with lots of vegetation, such as the test scene used, Crysis 3 manages to make reasonably good use of more than 4 cores. The i5-6600K is 44% faster than the i5-2500K, but the gain is only 3% over the i5-4670K. On the i7 side, the 6700K delivers 52% more than the 2600K and 8% more than the 4790K. Note that the official version of the game refuses to run on Skylake, the DRM returning an error “8016”!
In Arma III we load a save during a helicopter training mission, in which we fly over the island of Stratis for 20 s. The tests are run in 1920x1080, Ultra, without anti-aliasing.
CPU 3D Games: Watch Dogs and Total War: Rome 2
The framerate in Watch Dogs is measured over a 20 s run through a rather busy part of the game. We found scenes with a framerate 10 to 20% lower, but the automatic save system prevented us from using them in a truly reproducible way, as the environment kept changing. Performance is measured in 1920x1080 with the overall quality level set to Ultra, without anti-aliasing.
Watch Dogs can make use of more than 4 cores. The i5-6600K is 14% faster than the i5-2500K, a gap that is halved against the 4670K. On the i7 side the advantage is 25% over the Sandy Bridge 2600K and 9% over the 4790K.
Total War: Rome II
For Total War: Rome II we simply measure the framerate in the first scene of the game's prologue, in 1920x1080, Extreme, but with AA and SSAO disabled.
Performance Indices CPU
Now for the averages. Although the result of each application has its own interest, we computed performance indices based on the full set of results, giving the same weight to each test. We present two averages: an application index, which covers all tests apart from 3D games, and one specific to 3D games, which are generally less multithreaded.
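An equal-weight index of this kind can be sketched as follows. The text only states that every test carries the same weight, so the details here are assumptions: results are normalized against a reference CPU, time-based results are inverted so that higher is always better, and the ratios are combined with an arithmetic mean. The test names, CPU names and figures are placeholders, not data from this review.

```python
from statistics import mean
from typing import Dict, Iterable


def performance_index(results: Dict[str, Dict[str, float]], reference: str,
                      lower_is_better: Iterable[str] = ()) -> Dict[str, float]:
    """Equal-weight index per CPU, with the reference CPU at 100."""
    timed = set(lower_is_better)
    cpus = list(next(iter(results.values())))
    index = {}
    for cpu in cpus:
        ratios = []
        for test, scores in results.items():
            if test in timed:
                # Times: a shorter run means a higher ratio.
                ratios.append(scores[reference] / scores[cpu])
            else:
                # Rates (fps, points): a higher score means a higher ratio.
                ratios.append(scores[cpu] / scores[reference])
        index[cpu] = 100.0 * mean(ratios)  # every test weighs the same
    return index


# Placeholder results: NOT figures from this review.
results = {
    "encode_fps": {"cpu_ref": 10.0, "cpu_new": 12.0},
    "compile_seconds": {"cpu_ref": 100.0, "cpu_new": 80.0},
}
indices = performance_index(results, "cpu_ref", lower_is_better={"compile_seconds"})
print(indices)
```

Normalizing before averaging keeps a test with large raw numbers from dominating the index, which is what giving the same weight to each test requires.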
In applications, the i5-6600K gains 30.4% in performance over an i5-2500K that is four and a half years old, which is not negligible. The gain is of course smaller against more recent CPUs: 18.7% over the 3570K, 7.9% over the 4670K (although there is now a 4690K) and 4.1% over the 5675C. Comparing the i7-6700K with an i7-4770K launched two years ago, the gain is 19.2%, but Intel has since released the 4790K, and there the gain is only 6.5%; the frequency deficit (4200 vs. 4000 MHz Turbo on 4 cores) does not help. The comparison with a lower-clocked i7-2600K is necessarily flattering: the gain is 42.2%. Not far off in price, although its motherboards are a little more expensive, the i7-5820K is on average 7.1% ahead of the i7-6700K.
In games, on the other hand, the i7-5820K would need to be overclocked to approach the i7-6700K, which is 15.4% faster. Compared to the 4770K and 4790K the gains are 22.8% and 9.6%; going further back the gain is 30.6%, and 40% against an i7-2600K. As the frequency difference is smaller, the gain between an i5-2500K and an i5-6600K comes down to 25.8%. 10% is gained over an i5-4670K and 16.1% over an i5-3570K. Note that the i5-5675C remains the fastest i5 in games thanks to its eDRAM, which automatically acts as an L4 cache for the CPU!
Conclusion
In a survey conducted a few months ago, 42% of you expected a gain of more than 20% between Haswell and Skylake before changing platform. In practice we are far from it, with only 10% better at equal frequency, all the more so as the actual frequencies are slightly lower on the Turbo side. Although more flexible, since it is now possible through the bus as much as through the multiplier, overclocking will not change much: with 4.5 GHz on our two samples, we note no significant improvement in frequency. This is a criterion to take into account, because while the gain between Sandy Bridge and Skylake is about 23% at equal frequency, Sandy Bridge could reach higher overclocked frequencies. The other disappointment, a more expected one, concerns DDR4. Only the fastest kits, DDR4-3000 and above, can compete with DDR3-2400, and these kits are currently more expensive.
The arrival of H.265 decoding is a good thing on the iGPU side, but it is incomplete, as it does not handle the “Main10” format, which should be widely used in the future. Also missing are HDMI 2.0 and HDCP 2.2 support, needed to fully exploit future DRM-protected 4K content. Despite the increased number of EUs and the move to DDR4, performance, although slightly up (10% at best), is still largely inadequate for anything other than games with a light to medium graphics load at 1080p. Only eDRAM allows Intel's iGPUs to start being interesting in 3D, and it is not present on these K versions. Of course the picture is not entirely black, and we must not forget that in absolute terms the performance of the i7-6700K and i5-6600K is excellent and without real competition. We are above all disappointed by the small gains over Haswell in 2 years and over Sandy Bridge in 4 years. By reserving an ever larger share of the die for an iGPU that is not necessarily convincing, Intel denies itself, and deprives us of, probably significant IPC gains or, failing that, an increase in the number of cores, which would be welcome on this platform: for 6 cores you have to move to LGA 2011-v3, where the i7-5820K is, incidentally, not much more expensive than the i7-6700K. Another avenue for Intel in the longer term would of course be to fit eDRAM on these versions, as the gain can be significant on the CPU side too, where it acts as an L4 cache. We must also acknowledge the progress on the platform side: even if voltage regulation takes a step back, since it is once again external, the chipset is brought up to date with the Z170, which has a link to the processor that is twice as fast and extended input/output possibilities (more PCIe lines, now Gen3, and more USB 3). In the end, with the LGA 1151 platform, Intel offers a good product but one without surprises, which will suit those with pre-Sandy Bridge machines.
For those currently on Sandy Bridge, the question remains open and depends mainly on the frequency reached through overclocking, whereas if you have Haswell you can safely pass this year.