<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>vPivot &#187; hyper-threading</title>
	<atom:link href="http://vpivot.com/tag/hyper-threading/feed/" rel="self" type="application/rss+xml" />
	<link>http://vpivot.com</link>
	<description>Scott Drummonds on Virtualization</description>
	<lastBuildDate>Wed, 01 Feb 2012 06:46:55 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Optimizing vSphere for Hyper-threading</title>
		<link>http://vpivot.com/2010/09/13/optimizing-vsphere-for-hyper-threading/</link>
		<comments>http://vpivot.com/2010/09/13/optimizing-vsphere-for-hyper-threading/#comments</comments>
		<pubDate>Mon, 13 Sep 2010 14:23:57 +0000</pubDate>
		<dc:creator>drummonds</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[hyper-threading]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[numa]]></category>
		<category><![CDATA[scheduler]]></category>

		<guid isPermaLink="false">http://vpivot.com/?p=647</guid>
		<description><![CDATA[VMware&#8217;s Jeff Buell has been looking into High Performance Computing (HPC) in support of a new addition to the office of the CTO.  Jeff just posted an article on VROOM! showing outstanding memory bandwidth in vSphere virtual machines.  No one should be surprised by this&#8211;virtual machine memory bandwidth has rarely been a problem.  But Jeff [...]]]></description>
			<content:encoded><![CDATA[<p>VMware&#8217;s Jeff Buell has been looking into High Performance Computing (HPC) in support of <a href="http://communities.vmware.com/community/cto/high-performance">a new addition to the office of the CTO</a>.  Jeff just posted <a href="http://blogs.vmware.com/performance/2010/09/hpc-application-performance-on-esx-41-stream.html">an article on VROOM! showing outstanding memory bandwidth</a> in vSphere virtual machines.  No one should be surprised by this&#8211;virtual machine memory bandwidth has rarely been a problem.  But Jeff did discuss an advanced configuration parameter that should pique everyone&#8217;s curiosity: NUMA.preferHT.</p>
<p><span id="more-647"></span>Hyper-threading presents an interesting dilemma to any software running on Nehalem-based processors.  For some multithreaded workloads, an operating system scheduler can spread threads across multiple NUMA nodes or co-locate them to a single node.  Consider the following figure, which depicts a single 8-way virtual machine being scheduled to all of the eight physical cores on a server.</p>
<div id="attachment_664" class="wp-caption aligncenter" style="width: 282px"><a href="http://vpivot.com/wp-content/uploads/2010/09/Screen-shot-2010-09-17-at-2.12.25-PM.png"><img class="size-full wp-image-664" title="8-way Virtual Machine Using Eight Cores" src="http://vpivot.com/wp-content/uploads/2010/09/Screen-shot-2010-09-17-at-2.12.25-PM.png" alt="" width="272" height="287" /></a><p class="wp-caption-text">This figure depicts the eight vCPUs of a single virtual machine being spread across two NUMA nodes&#39; eight cores.</p></div>
<p>In this case the threads (vCPUs for vSphere) are each given their own physical core.  The benefit is that the vCPUs get unfettered access to their physical cores and the resulting additional computational power.  The drawback is that common memory is remote for half the vCPUs and will have to go through the other NUMA node.  This means memory-intensive workloads might run slower.</p>
<div id="attachment_665" class="wp-caption aligncenter" style="width: 366px"><a href="http://vpivot.com/wp-content/uploads/2010/09/Screen-shot-2010-09-17-at-2.12.35-PM.png"><img class="size-full wp-image-665" title="8-way Virtual Machine Using Four Cores" src="http://vpivot.com/wp-content/uploads/2010/09/Screen-shot-2010-09-17-at-2.12.35-PM.png" alt="" width="356" height="286" /></a><p class="wp-caption-text">This figure depicts the eight vCPUs of a single virtual machine being consolidated to one NUMA node&#39;s four cores.</p></div>
<p>This second configuration places the same virtual machine&#8217;s eight vCPUs on a single NUMA node.  This means physical cores are shared but all memory access is local.  The vCPUs are contending for fewer CPU cycles, although they are benefiting from Hyper-threading.  This will result in less computational power than dedicated physical cores.  On the other hand, assuming the virtual machine was sized to fit in a single node, 100% of memory access will go to fast, local memory.  This could produce better performance for memory-intensive workloads.</p>
<p>vSphere will prefer to spread virtual CPUs across NUMA nodes (option one above) to gain the benefit of more physical cores.  But if you are running an application where memory throughput is more important than processor speed, you should consider testing a change to vSphere&#8217;s default behavior.  You can do this by setting the ESX 4.1 advanced parameter NUMA.preferHT to 1.  This will configure the scheduler to prefer consolidating threads on the logical processors of a single NUMA node instead of using more physical cores across multiple nodes.</p>
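<p>As a rough sketch only, this is how the setting might be read and changed from the ESX 4.1 service console with <code>esxcfg-advcfg</code>.  The option path <code>/Numa/PreferHT</code> is an assumption based on how advanced settings are usually named, not something confirmed in this article, so check it on your own host (for example in the vSphere Client&#8217;s Advanced Settings dialog) before relying on it.</p>
<pre><code># Show the current value; the path /Numa/PreferHT is assumed, verify it on your host
esxcfg-advcfg -g /Numa/PreferHT

# Prefer logical processors on a single NUMA node over more physical cores
esxcfg-advcfg -s 1 /Numa/PreferHT

# Return to the default behavior (spread vCPUs across physical cores)
esxcfg-advcfg -s 0 /Numa/PreferHT
</code></pre>
<p>Whichever way you change the value, measure the workload before and after; the better setting depends on whether the application is bound by memory throughput or by processor speed.</p>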
<p>It would be nice if VMware provided definitive guidance on when virtual machines should be configured to prefer more physical cores (the default setting) or local memory access (NUMA.preferHT=1).  But this guidance would be dependent on application, CPU, virtual machine size, consolidation ratios and utilization.  The complexity of this guidance likely means that we will not see an authoritative word on this any time soon.  But that does not stop you from experimenting on your own and sharing results.  I would love to see any results of experiments posted here.</p>
]]></content:encoded>
			<wfw:commentRss>http://vpivot.com/2010/09/13/optimizing-vsphere-for-hyper-threading/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>vSphere 4.0, Hyper-Threading, and Terminal Services</title>
		<link>http://vpivot.com/2010/03/17/vsphere-4-0-hyper-threading-and-terminal-services/</link>
		<comments>http://vpivot.com/2010/03/17/vsphere-4-0-hyper-threading-and-terminal-services/#comments</comments>
		<pubDate>Wed, 17 Mar 2010 22:23:28 +0000</pubDate>
		<dc:creator>drummonds</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[benchmarking]]></category>
		<category><![CDATA[hyper-threading]]></category>
		<category><![CDATA[intel]]></category>
		<category><![CDATA[scheduler]]></category>
		<category><![CDATA[terminal services]]></category>
		<category><![CDATA[vsphere]]></category>

		<guid isPermaLink="false">http://vpivot.com/?p=333</guid>
		<description><![CDATA[I recently wrote a blog article detailing Hyper-Threading (HT) and its effect on vSphere.  As an astute reader pointed out, a recent update to Project VRC&#8217;s terminal services analysis suggests disappointment with HT on vSphere.  We spent a lot of time looking at those results to understand why they contradicted the body of performance data, which [...]]]></description>
			<content:encoded><![CDATA[<p>I recently wrote <a href="http://vpivot.com/2010/03/06/hyper-threading-on-vsphere/">a blog article detailing Hyper-Threading (HT) and its effect on vSphere</a>.  As an astute reader pointed out, a recent update to <a href="http://www.virtualrealitycheck.net/">Project VRC</a>&#8217;s terminal services analysis suggests disappointment with HT on vSphere.  We spent a lot of time looking at those results to understand why they contradicted the body of performance data, which shows HT offering a 10-30% gain on vSphere.  What we discovered led us to create a vSphere patch that would allow users to improve performance in some benchmarking environments.</p>
<p><span id="more-333"></span>Among the many results presented by VRC, the configurations that most perplexed us were the two and four virtual machine configurations, each with four vCPUs per virtual machine.  The configuration with two virtual machines looked good and matched our internal numbers.  In this configuration there are a total of eight vCPUs on the host, so each vCPU maps to its own physical core on the Xeon 5500 series processor.  The problem arose when the virtual machine count was increased to four, resulting in 16 total vCPUs.  In this configuration each vCPU is paired with one logical, Hyper-Threaded core.  Project VRC showed this configuration supporting no more desktops than the two-VM configuration, which suggests no value to Hyper-Threading on this configuration.</p>
<p>It took us some time to understand the reason for these results, but we eventually identified a very specific condition where ESX&#8217;s scheduler enforces fairness in scheduling vCPUs at the cost of throughput.  ESX&#8217;s scheduler has long been the subject of intensive scrutiny by a large number of VMware engineers to guarantee fair access to the processor for each virtual machine.  It is because of this fairness that VMware&#8217;s customers can rely on CPU resource controls.  But, when fairness goes too far, throughput may be sub-optimal.</p>
<p>Hyper-Threading presents particular problems to fairness because of the non-linear performance it delivers.  A thread will run at one speed when it has full access to a physical core, at another speed when it is sharing a core, and at a third speed when sharing a core with a different thread.  As a result, ESX&#8217;s scheduler will sometimes pause a thread to enforce fairness.  These pauses are more common when Hyper-Threading is present, to account for its lack of uniformity in thread performance.  If the host lacks vCPUs that are ready to run, the result is CPU utilization below saturation, leaving CPU cycles unused.</p>
<p>Three conditions, taken together, can trigger this behavior:</p>
<ol>
<li>A Xeon 5500 series processor is present with Hyper-Threading enabled,</li>
<li>CPU utilization is near saturation, and</li>
<li>There is a roughly one-to-one mapping between vCPUs and logical processors.</li>
</ol>
<p>In this scenario, VMware vSphere favors fairness over throughput and sometimes pauses one vCPU to dedicate a whole core to another vCPU, eliminating gains provided by Hyper-Threading.  In cases outside of these three conditions, the performance of VMware vSphere 4 meets the high expectations of VMware&#8217;s R&amp;D team and its customers.  Of course, production environments rarely (never?) have a one-to-one ratio of vCPUs to logical processors.  Such a ratio occurs, for example, when there are only four 4-way virtual machines on a Xeon 5500 system.</p>
<p>But environments such as Project VRC&#8217;s are simplifications of production environments meant to understand the capabilities of virtual platforms.  VMware has provided a patch to Project VRC that will allow them to improve throughput in their environment.  We are going to release this patch and its documentation to the general public within a couple of weeks.  I do not expect that any of VMware&#8217;s customers will benefit from the changes it allows, but I will later document the patch and its usage for anyone who cares to experiment.</p>
]]></content:encoded>
			<wfw:commentRss>http://vpivot.com/2010/03/17/vsphere-4-0-hyper-threading-and-terminal-services/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Hyper-Threading on vSphere</title>
		<link>http://vpivot.com/2010/03/06/hyper-threading-on-vsphere/</link>
		<comments>http://vpivot.com/2010/03/06/hyper-threading-on-vsphere/#comments</comments>
		<pubDate>Sat, 06 Mar 2010 18:05:38 +0000</pubDate>
		<dc:creator>drummonds</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[cpu]]></category>
		<category><![CDATA[hyper-threading]]></category>
		<category><![CDATA[intel]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[scheduler]]></category>
		<category><![CDATA[vmkernel]]></category>
		<category><![CDATA[vmmark]]></category>
		<category><![CDATA[vsphere]]></category>

		<guid isPermaLink="false">http://vpivot.com/?p=328</guid>
		<description><![CDATA[I continue to receive many questions from our customers on the expected performance gains of the new version of Hyper-Threading in Intel&#8217;s Core i7 processors. The answer requires a little bit of discussion on Hyper-Threading, a little bit on ESX, and comes with some performance data. If you are still interested, read on. On VI3, [...]]]></description>
			<content:encoded><![CDATA[<p>I continue to receive many questions from our customers on the expected performance gains of the new version of Hyper-Threading in Intel&#8217;s Core i7 processors.  The answer requires a little bit of discussion on Hyper-Threading, a little bit on ESX, and comes with some performance data.  If you are still interested, read on.</p>
<p><span id="more-328"></span>On VI3, many of VMware&#8217;s customers disabled Hyper-Threading on their older, Netburst architecture Intel processors.  Intel has vaguely described the new Hyper-Threading as more efficient than the previous generation and I believe this to be due to a shorter pipeline and an improved ability to context switch pipeline stage data.  Long pipelines&#8211;such as the Netburst era Xeons of model numbers x1xx and x2xx&#8211;are more likely to suffer bubbles during context switches and are therefore penalized versus shorter pipeline products, such as the Core i7.  Furthermore, by pushing and restoring pipeline stage data during a hardware context switch, the new HT can reduce pipeline bubbles.</p>
<p>But the gains vSphere users experience as a result of the new Hyper-Threading also come from changes in ESX.  ESX&#8217;s scheduler must make decisions as to when to co-locate two worlds on a physical core to take advantage of Hyper-Threading.  In some conditions the scheduler will perform this co-location and in others it will allow a world to run on the core by itself.  The decision to execute worlds concurrently instead of serially on a physical core can be informally called the scheduler&#8217;s <em>trust</em> of Hyper-Threading.  The vSphere scheduler <em>trusts</em> Hyper-Threading more than the VI3 scheduler did.  This amplifies the effect of HT.</p>
<p>I am now going to bore you with a disclaimer before I give you any data showing the effect of Hyper-Threading.  The value of HT will vary from workload to workload and the ultimate authority on HT&#8217;s value is the end-user.  The following numbers are the result of informal analysis at VMware and should be used only as a guide in your own analysis.  Please do not make purchasing decisions on this information, which is devoid of the detail we would normally commit to a white paper.</p>
<table id="newspaper-a">
<tbody>
<tr>
<th>Workload</th>
<th>Observed Throughput Gain Due to HT</th>
</tr>
<tr>
<td>VMmark</td>
<td>24%</td>
</tr>
<tr>
<td>SPECjbb</td>
<td>10%</td>
</tr>
<tr>
<td>Consolidated SQL</td>
<td>19%</td>
</tr>
</tbody>
</table>
<p>In addition to the gains we informally cite here, I can say that we have not yet seen a workload where the new Hyper-Threading slows down consolidated performance.  As far as we can tell, the new Hyper-Threading should be left enabled in 100% of virtualized environments.</p>
]]></content:encoded>
			<wfw:commentRss>http://vpivot.com/2010/03/06/hyper-threading-on-vsphere/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
	</channel>
</rss>

