<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>vPivot &#187; intel</title>
	<atom:link href="http://vpivot.com/tag/intel/feed/" rel="self" type="application/rss+xml" />
	<link>http://vpivot.com</link>
	<description>Scott Drummonds on Virtualization</description>
	<lastBuildDate>Wed, 01 Feb 2012 06:46:55 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>A Performance Tip for ESX 3.0 and ESX 3.5</title>
		<link>http://vpivot.com/2010/04/20/a-performance-tip-for-esx-3-0-and-esx-3-5/</link>
		<comments>http://vpivot.com/2010/04/20/a-performance-tip-for-esx-3-0-and-esx-3-5/#comments</comments>
		<pubDate>Tue, 20 Apr 2010 22:37:28 +0000</pubDate>
		<dc:creator>drummonds</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[amd]]></category>
		<category><![CDATA[intel]]></category>
		<category><![CDATA[monitor]]></category>
		<category><![CDATA[sql]]></category>

		<guid isPermaLink="false">http://vpivot.com/?p=415</guid>
		<description><![CDATA[Do you have any running instances of ESX 3.5 or older?  Are those instances running on processors that are no more than a couple of years old?  If so, I have a tip for you: update your hosts to ESX 4.0. Seriously, upgrade to vSphere already. It&#8217;s been out for a year! All kidding aside, [...]]]></description>
			<content:encoded><![CDATA[<p>Do you have any running instances of ESX 3.5 or older?  Are those instances running on processors that are no more than a couple of years old?  If so, I have a tip for you: update your hosts to ESX 4.0.  Seriously, upgrade to vSphere already.  It&#8217;s been out for a year!</p>
<p><span id="more-415"></span></p>
<p>All kidding aside, yesterday I wrote an article about a sneaky trick to leverage improved hardware assist performance on ESX 3.5 virtual machines that defaulted to binary translation.  I learned very late in the day that the recommended guidance only works for AMD processors.  The text below is from the original post but has since been updated to reflect my newly discovered information.</p>
<h2>Begin Updated Article</h2>
<p>Do you have any running instances of ESX 3.5 or older?  Are those instances running on AMD processors that are no more than a couple of years old?  If so, I have a tip for you: force hardware assist in those virtual machines.  In most situations application performance will improve by 10% or more.  Details follow.</p>
<p>ESX&#8217;s monitor presents virtual hardware to virtual machines&#8217; guest operating systems.  VMware&#8217;s multi-mode monitor uses three technologies to do this: hardware assist, para-virtualization, and binary translation.  Hardware assist has gotten much faster over the years, as this figure demonstrates.</p>
<div id="attachment_418" class="wp-caption alignnone" style="width: 570px"><a href="http://vpivot.com/wp-content/uploads/2010/04/vmexit_latency.png"><img class="size-full wp-image-418" title="VMEXIT Latencies" src="http://vpivot.com/wp-content/uploads/2010/04/vmexit_latency.png" alt="" width="560" height="294" /></a><p class="wp-caption-text">The latency of the VMEXIT instruction is shown on Intel VT systems.  The longer this instruction takes to execute, the worse the virtual machine performs.</p></div>
<p>Johan De Gelas included his take on monitor mode performance when he reported &#8220;Virtualization Round Trip Latency&#8221; in a <a href="http://it.anandtech.com/show/2964/the-intel-xeon-5670-six-improved-cores">recent article on the new Xeon 5600</a>.  His results reiterate the trend I have been sharing with my audiences for over a year now.</p>
<p>Because hardware assist was once so slow, older versions of ESX would utilize our faster-performing binary translation in many situations.  But virtualization assist in today&#8217;s processors&#8211;and here I am talking about Intel and AMD processors manufactured in the past two years&#8211;is generally faster than binary translation. This means your virtual machines running on ESX 3.5 on shiny new processors may not be reaching their full potential performance.</p>
<p>The fix is simple: force hardware assist for your ESX 3.0 and ESX 3.5 virtual machines running on newer AMD processors.  You can do this with the following line in your virtual machines&#8217; VMX files:</p>
<blockquote><p>monitor.virtual_mmu = hardware</p></blockquote>
<p>A reboot is then required.  But this setting only works for AMD processors, where RVI (AMD&#8217;s hardware memory management unit) is available.  With this setting both AMD-V and RVI are forced on.  This setting is ignored on Intel processors, whose hardware MMU is not leveraged on ESX 3.5.</p>
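<p>As a rough illustration, here is a minimal Python sketch that applies the setting to the text of a VMX file.  It assumes VMX entries are stored one per line as <code>key = "value"</code> pairs (the quoting is an assumption; the line above is shown unquoted), and the function name <code>force_hardware_mmu</code> is hypothetical, not part of any VMware tool.</p>

```python
def force_hardware_mmu(vmx_text):
    """Return VMX text with monitor.virtual_mmu forced to hardware.

    Assumption: each VMX entry is one line of the form key = "value".
    """
    key = "monitor.virtual_mmu"
    new_line = key + ' = "hardware"'
    out = []
    replaced = False
    for line in vmx_text.splitlines():
        # The option name is everything before the first equals sign.
        name = line.split("=", 1)[0].strip().lower()
        if name == key:
            if not replaced:
                out.append(new_line)  # overwrite the existing value
                replaced = True
            # any later duplicate of the key is dropped
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)  # the option was absent, so append it
    return "\n".join(out) + "\n"


sample = 'displayName = "sql01"\nmemsize = "4096"\n'
print(force_hardware_mmu(sample))
```

<p>The function is idempotent, so running it twice on the same file leaves a single entry.  The virtual machine still needs the reboot described above before the new monitor mode takes effect.</p>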
<p>These changes are not needed with vSphere because its default monitor modes favor hardware assist more than ESX 3.5 and earlier.  You can see vSphere&#8217;s default monitor modes in our wonderful <a href="http://www.vmware.com/files/pdf/perf-vsphere-monitor_modes.pdf">monitor modes paper</a>.</p>
<p>As one example of the magnitude of performance increase this small change can produce, look to the <a href="http://www.vmware.com/files/pdf/perf_vsphere_sql_scalability.pdf">SQL Server performance paper</a> we released last year.  Here is one graph I lifted from that document.</p>
<p><a href="http://vpivot.com/wp-content/uploads/2010/04/monitor_performance.png"><img class="alignnone size-full wp-image-419" title="SQL Server Performance of Different Monitor Modes" src="http://vpivot.com/wp-content/uploads/2010/04/monitor_performance.png" alt="" width="584" height="333" /></a></p>
<p>This figure shows AMD-V improving performance by 18% over binary translation on virtual machines running SQL Server 2008.  More gain is possible when the hardware memory management unit is utilized.</p>
<p>Because of ESX 3.5&#8217;s continued wide deployment, it may very well be running millions of virtual machines.  Many of those virtual machines are running on newer AMD processors that can benefit from this change.  Go forth and reconfigure those virtual machines to claim the performance to which you are entitled!</p>
]]></content:encoded>
			<wfw:commentRss>http://vpivot.com/2010/04/20/a-performance-tip-for-esx-3-0-and-esx-3-5/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>vSphere 4.0, Hyper-Threading, and Terminal Services</title>
		<link>http://vpivot.com/2010/03/17/vsphere-4-0-hyper-threading-and-terminal-services/</link>
		<comments>http://vpivot.com/2010/03/17/vsphere-4-0-hyper-threading-and-terminal-services/#comments</comments>
		<pubDate>Wed, 17 Mar 2010 22:23:28 +0000</pubDate>
		<dc:creator>drummonds</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[benchmarking]]></category>
		<category><![CDATA[hyper-threading]]></category>
		<category><![CDATA[intel]]></category>
		<category><![CDATA[scheduler]]></category>
		<category><![CDATA[terminal services]]></category>
		<category><![CDATA[vsphere]]></category>

		<guid isPermaLink="false">http://vpivot.com/?p=333</guid>
		<description><![CDATA[I recently wrote a blog article detailing Hyper-Threading (HT) and its effect on vSphere.  As an astute reader pointed out, a recent update to Project VRC&#8217;s terminal services analysis suggests disappointment with HT on vSphere.  We spent a lot of time looking at those results to understand why they contradicted the body of performance data, which [...]]]></description>
			<content:encoded><![CDATA[<p>I recently wrote <a href="http://vpivot.com/2010/03/06/hyper-threading-on-vsphere/">a blog article detailing Hyper-Threading (HT) and its effect on vSphere</a>.  As an astute reader pointed out, a recent update to <a href="http://www.virtualrealitycheck.net/">Project VRC</a>&#8217;s terminal services analysis suggests disappointment with HT on vSphere.  We spent a lot of time looking at those results to understand why they contradicted the body of performance data, which shows HT offering a 10-30% gain on vSphere. What we discovered led us to create a vSphere patch that would allow users to improve performance in some benchmarking environments.</p>
<p><span id="more-333"></span>Among the many results presented by VRC, the configurations that most perplexed us were the two and four virtual machine configurations, each with four vCPUs per virtual machine.  The configuration with two virtual machines looked good and matched our internal numbers.  In this configuration there are a total of eight vCPUs on the host, each of which maps to its own physical core on the Xeon 5500 series processor.  The problem arose when the virtual machine count was increased to four, resulting in 16 total vCPUs.  In this configuration each vCPU is paired with one logical, Hyper-Threaded core.  Project VRC showed this configuration supporting no more desktops than the two-VM configuration, which suggests no value to Hyper-Threading on this configuration.</p>
<p>It took us some time to understand the reason for these results, but we eventually identified a very specific condition where ESX&#8217;s scheduler enforces fairness in scheduling vCPUs at the cost of throughput.  ESX&#8217;s scheduler has long been the subject of intensive scrutiny by a large number of VMware engineers to guarantee fair access to the processor for each virtual machine.  It is because of this fairness that VMware&#8217;s customers can rely on CPU resource controls.  But, when fairness goes too far, throughput may be sub-optimal.</p>
<p>Hyper-Threading presents particular problems to fairness because of the non-linear performance it delivers.  A thread will run at one speed when it has full access to a physical core, at another speed when it is sharing a core, and at a third speed when sharing a core with a different thread.  As a result, ESX&#8217;s scheduler will sometimes pause a thread to enforce fairness.  These pauses are more common when Hyper-Threading is present to account for its lack of uniformity in thread performance.  If the host lacks vCPUs that are ready to run, the result is CPU utilization below saturation, leaving CPU cycles unused.</p>
<p>Three specific conditions combine to trigger this behavior:</p>
<ol>
<li>A Xeon 5500 series processor is present with Hyper-Threading enabled,</li>
<li>CPU utilization is near saturation, and</li>
<li>There is a roughly one-to-one mapping between vCPUs and logical processors.</li>
</ol>
<p>In this scenario, VMware vSphere favors fairness over throughput and sometimes pauses one vCPU to dedicate a whole core to another vCPU, eliminating gains provided by Hyper-Threading.  In cases outside of these three conditions, the performance of VMware vSphere 4 meets the high expectations of VMware&#8217;s R&amp;D team and its customers.  Of course production environments rarely (never?) have a one-to-one ratio of vCPUs to logical processors.  This occurs when there are only four 4-way virtual machines on a Xeon 5500 system, for example.</p>
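<p>The three conditions can be sketched as a simple check.  This is only an illustration: it assumes a Xeon 5500 host with two logical processors per core when Hyper-Threading is on, and the function name, the 90% saturation threshold, and the 25% ratio tolerance are hypothetical values chosen for the example, not numbers used by the ESX scheduler.</p>

```python
def fairness_pauses_likely(total_vcpus, physical_cores, ht_enabled,
                           cpu_utilization,
                           saturation_threshold=0.9, ratio_tolerance=0.25):
    """Rough check for the three conditions listed above.

    Assumes a Xeon 5500 series host. Both thresholds are illustrative
    guesses, not values taken from the ESX scheduler.
    """
    if not ht_enabled:
        return False  # condition 1: Hyper-Threading must be enabled
    logical_processors = physical_cores * 2  # two hardware threads per core
    ratio = total_vcpus / logical_processors
    # condition 3: roughly one vCPU per logical processor
    near_one_to_one = ratio_tolerance >= abs(ratio - 1.0)
    # condition 2: CPU utilization is near saturation
    near_saturation = cpu_utilization >= saturation_threshold
    return near_one_to_one and near_saturation


# The two Project VRC configurations on an eight-core host:
print(fairness_pauses_likely(2 * 4, 8, True, 0.95))  # two 4-way VMs
print(fairness_pauses_likely(4 * 4, 8, True, 0.95))  # four 4-way VMs
```

<p>With two 4-way virtual machines there are 8 vCPUs for 16 logical processors, so the check does not fire.  With four 4-way virtual machines the ratio is exactly one-to-one and all three conditions hold.</p>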
<p>But environments such as Project VRC&#8217;s are simplifications of production environments meant to understand the capabilities of virtual platforms.  VMware has provided a patch to Project VRC that will allow them to improve throughput in their environment.  We are going to release this patch and its documentation to the general public within a couple of weeks.  I do not expect that any of VMware&#8217;s customers will benefit from the changes it allows, but I will later document the patch and its usage for anyone who cares to experiment.</p>
]]></content:encoded>
			<wfw:commentRss>http://vpivot.com/2010/03/17/vsphere-4-0-hyper-threading-and-terminal-services/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Hyper-Threading on vSphere</title>
		<link>http://vpivot.com/2010/03/06/hyper-threading-on-vsphere/</link>
		<comments>http://vpivot.com/2010/03/06/hyper-threading-on-vsphere/#comments</comments>
		<pubDate>Sat, 06 Mar 2010 18:05:38 +0000</pubDate>
		<dc:creator>drummonds</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[cpu]]></category>
		<category><![CDATA[hyper-threading]]></category>
		<category><![CDATA[intel]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[scheduler]]></category>
		<category><![CDATA[vmkernel]]></category>
		<category><![CDATA[vmmark]]></category>
		<category><![CDATA[vsphere]]></category>

		<guid isPermaLink="false">http://vpivot.com/?p=328</guid>
		<description><![CDATA[I continue to receive many questions from our customers on the expected performance gains of the new version of Hyper-Threading in Intel&#8217;s Core i7 processors. The answer requires a little bit of discussion on Hyper-Threading, a little bit on ESX, and comes with some performance data. If you are still interested, read on. On VI3, [...]]]></description>
			<content:encoded><![CDATA[<p>I continue to receive many questions from our customers on the expected performance gains of the new version of Hyper-Threading in Intel&#8217;s Core i7 processors.  The answer requires a little bit of discussion on Hyper-Threading, a little bit on ESX, and comes with some performance data.  If you are still interested, read on.</p>
<p><span id="more-328"></span>On VI3, many of VMware&#8217;s customers disabled Hyper-Threading on their older, Netburst architecture Intel processors.  Intel has vaguely described the new Hyper-Threading as more efficient than the previous generation and I believe this to be due to a shorter pipeline and an improved ability to context switch pipeline stage data.  Long pipelines&#8211;such as those in the Netburst-era Xeons of model numbers x1xx and x2xx&#8211;are more likely to suffer bubbles during context switches and are therefore penalized versus shorter pipeline products, such as the Core i7.  Furthermore, by pushing and restoring pipeline stage data during a hardware context switch, the new HT can reduce pipeline bubbles.</p>
<p>But the gains vSphere users experience as a result of the new Hyper-Threading also come from changes in ESX.  ESX&#8217;s scheduler must make decisions as to when to co-locate two worlds on a physical core to take advantage of Hyper-Threading.  In some conditions the scheduler will perform this co-location and in others it will allow a world to run on the core by itself.  The decision to execute worlds concurrently instead of serially on a physical core can be informally called the scheduler&#8217;s <em>trust</em> of Hyper-Threading.  The vSphere scheduler <em>trusts</em> Hyper-Threading more than the VI3 scheduler did.  This amplifies the effect of HT.</p>
<p>I am now going to bore you with a disclaimer before I give you any data showing the effect of Hyper-Threading.  The value of HT will vary from workload to workload and the ultimate authority on HT&#8217;s value is the end-user.  The following numbers are the result of informal analysis at VMware and should only be used as a guide in your own analysis.  Please do not make purchasing decisions on this information, which is devoid of the detail we would normally commit to a white paper.</p>
<table id="newspaper-a">
<tbody>
<tr>
<th>Workload</th>
<th>Observed Throughput Gain Due to HT</th>
</tr>
<tr>
<td>VMmark</td>
<td>24%</td>
</tr>
<tr>
<td>SPECjbb</td>
<td>10%</td>
</tr>
<tr>
<td>Consolidated SQL</td>
<td>19%</td>
</tr>
</tbody>
</table>
<p>In addition to the gains we informally cite here, I can say that we have not yet seen a workload where the new Hyper-Threading slows down consolidated performance.  As far as we can tell, the new Hyper-Threading should be left enabled in 100% of virtualized environments.</p>
]]></content:encoded>
			<wfw:commentRss>http://vpivot.com/2010/03/06/hyper-threading-on-vsphere/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Newer Processors and Virtualization Performance</title>
		<link>http://vpivot.com/2009/09/16/newer-processors-and-virtualization-performance/</link>
		<comments>http://vpivot.com/2009/09/16/newer-processors-and-virtualization-performance/#comments</comments>
		<pubDate>Wed, 16 Sep 2009 20:08:33 +0000</pubDate>
		<dc:creator>drummonds</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[amd]]></category>
		<category><![CDATA[cpu]]></category>
		<category><![CDATA[ept]]></category>
		<category><![CDATA[intel]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[monitor]]></category>
		<category><![CDATA[rvi]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[vmkernel]]></category>

		<guid isPermaLink="false">http://vpivot.com/?p=18</guid>
		<description><![CDATA[[New content has been added to this update of an old article from the performance community.] Newer processors are much more important to virtualized environments than to their non-virtualized counterparts. Generational improvements have not just increased the raw compute power; they have also reduced virtualization overheads. This blog entry will describe three key changes [...]]]></description>
			<content:encoded><![CDATA[<p><em>[New content has been added to this update of an <a href="http://communities.vmware.com/blogs/drummonds/2009/06/02/newer-processors-and-virtualization-performance">old article from the performance community</a>.]</em></p>
<p>Newer processors are much more important to virtualized environments than to their non-virtualized counterparts. Generational improvements have not just increased the raw compute power; they have also reduced virtualization overheads.  This blog entry will describe three key changes that have particularly impacted virtual performance.</p>
<h2><span id="more-18"></span>Hardware Assist Is Faster</h2>
<p>In 2008, with the launch of the Opteron 1300, 2300 and 8300 parts, AMD became the first CPU vendor to produce a hardware memory management unit equipped to support virtualization.  They called this technology Rapid Virtualization Indexing (RVI).  This year Intel did the same with Extended Page Tables (EPT) on its Xeon 5500 line.  Both vendors have been providing the ability to virtualize privileged instructions since 2006, with continually improving results.  Consider the following graph showing the latency of one key instruction from Intel:</p>
<p><img class="jive-image-thumbnail jive-image" src="http://communities.vmware.com/servlet/JiveServlet/downloadImage/38-3171-5926/vmexit_latencies.png" alt="vmexit_latencies.png" width="620" /></p>
<p>This instruction, VMEXIT, is called each time the guest exits to the kernel.  The graph shows the latency (delay) of this instruction, which represents a wait time incurred by the guest.  Clearly Intel has made great strides in reducing VMEXIT&#8217;s wait time from its Netburst parts (Prescott and Cedar Mill) to its Core architecture (Merom and Penryn) and on to its current generation, Core i7 (Nehalem).  AMD processors have shown commensurate gains with AMD-V.</p>
<p>In a recent <a href="http://www.vmware.com/files/pdf/perf_vsphere_sql_scalability.pdf">white paper detailing SQL Server on vSphere</a>, the following graph showed the gains derived by using AMD-V in the Opteron 8324 (Shanghai).</p>
<div id="attachment_33" class="wp-caption alignnone" style="width: 609px"><img class="size-full wp-image-33" title="Monitor Mode and SQL Server Performance" src="http://vpivot.com/wp-content/uploads/2009/06/picture-3.png" alt="Binary translation, AMD-V, and AMD-V plus RVI are measured using SQL Server." width="599" height="343" /><p class="wp-caption-text">Binary translation, AMD-V, and AMD-V plus RVI are measured using SQL Server.</p></div>
<p>This graph shows the practical value of the great gains that CPU manufacturers have made with virtualization assist.  Hardware assist can now be regularly relied upon for great performance.</p>
<h2>Pipelines Are Shorter</h2>
<p>The longest pipelines in the x86 world were in Intel&#8217;s Netburst processors.  These processors&#8217; pipelines had twice as many stages as their counterparts at AMD and twice as many as the generation of Intel CPUs that followed.  The increased pipeline length would have enabled support for 8 GHz silicon, had it arrived.  Instead, silicon switching speeds hit a wall at 4 GHz and Intel (and its customers) were forced to suffer the drawbacks of large pipelines.</p>
<p>Large pipelines are not necessarily a problem for desktop environments, where single threaded applications used to dominate the market.  But in the enterprise, application thread counts were larger.  Furthermore, consolidation in virtual environments drove thread counts even higher.  With more contexts in the processor, the number of pipeline stalls and flushes increased, and efficiency fell.</p>
<p>Because of decreased efficiency of consolidated workloads on processors with long pipelines, VMware has often recommended that performance-intensive VMs be run on processors no older than 2-3 years.  This excludes Intel&#8217;s Netburst parts.  VI3 and vSphere will do a fine job at virtualizing your less-demanding applications on any supported processors.  But you should use newer parts for applications that hold your highest performance expectations.</p>
<h2>Caches Are Larger</h2>
<p>A cache is highly effective when it fully contains the software&#8217;s working set.  The addition from the hypervisor of even a small amount of code will change the working set and reduce cache hit rate.  I&#8217;ve attempted to illustrate this concept with the following simplified view of the relationship between cache hit rates, application working set, and cache sizes:</p>
<div id="attachment_34" class="wp-caption alignnone" style="width: 610px"><img class="size-full wp-image-34" title="Cache Size, Working Set, and Performance" src="http://vpivot.com/wp-content/uploads/2009/06/cache_size_perf.png" alt="Performance drops with small cache systems for even small increases to working set size." width="600" height="400" /><p class="wp-caption-text">Performance drops with small cache systems for even small increases to working set size.</p></div>
<p>This graph is based on a model that greatly simplifies working sets and the hypervisor&#8217;s impact on them.  Assuming that ESX increases the working set by 256 KB, this graph shows the decrease in cache hit rate due to the contributions of the hypervisor.  Notice that with very small caches and very small application working sets, the cache hit rate suffers greatly due to the addition of even 256 KB of virtualization code.  And even up to 2 MB, a 10% decrease in cache hit rate can be seen in some applications.  With a 256 KB contribution by the kernel, cache hit rates do not change significantly with cache sizes of 4 MB and beyond.</p>
<p>In some cases a 10% improvement in cache hit rate can double application throughput.  This means that a doubling of cache size can profoundly affect the performance of virtual applications as compared to native.  Given ESX&#8217;s small contribution to the working set, you can see why we at VMware recommend that customers run their performance-intensive workloads on CPUs with 4 MB caches or larger.</p>
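<p>To make the 256 KB effect concrete, here is a toy Python model.  It is not the model behind the graph above; it simply assumes that the hit rate equals the fraction of the working set that fits in cache, and that the application working set exactly fills the cache before the hypervisor is added (a worst case).  The function names are hypothetical.</p>

```python
HYPERVISOR_KB = 256  # working-set contribution assumed in the text above


def hit_rate(cache_kb, working_set_kb):
    # Toy assumption: the part of the working set that fits in cache
    # always hits and the remainder always misses.
    return min(1.0, cache_kb / working_set_kb)


def hit_rate_drop(cache_kb, app_working_set_kb, hypervisor_kb=HYPERVISOR_KB):
    """Absolute drop in hit rate after adding the hypervisor's working set."""
    native = hit_rate(cache_kb, app_working_set_kb)
    virtual = hit_rate(cache_kb, app_working_set_kb + hypervisor_kb)
    return native - virtual


# Worst case: the application working set exactly fills the cache.
for cache_kb in (512, 1024, 2048, 4096):
    drop = hit_rate_drop(cache_kb, cache_kb)
    print(f"{cache_kb:>5} KB cache: hit rate drops by {drop:.1%}")
```

<p>Under these assumptions a 2 MB cache loses about 11 points of hit rate and a 4 MB cache about 6, which is in line with the trend described above.  When the application working set is much smaller than the cache, the drop in this model is zero.</p>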
]]></content:encoded>
			<wfw:commentRss>http://vpivot.com/2009/09/16/newer-processors-and-virtualization-performance/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

