
Red Hat Enterprise Linux 5 virtualization on HP DL585: AMD Barcelona with Rapid Virtualization Indexing

by Sanjay Rao

This article is a follow-up to "Red Hat Enterprise Linux 5.1 utilizes nested paging on AMD Barcelona Processor to improve performance of virtualized guests."

With new hardware releases, customers are faced with situations in which they want to take advantage of increased speeds but are forced to stay on older hardware because their operating environments are not supported on the newer hardware. Virtualizing their operating environment helps them get past this issue. Virtualization also helps them:

  • Consolidate hardware to
    • improve utilization
    • reduce floor space requirements
    • reduce power consumption
  • Take advantage of hardware speed up without having to upgrade the software environment
  • Reduce downtime for upgrades
  • Create development and test environments

RHEL 5 virtualization lets customers virtualize their existing systems and take advantage of the benefits mentioned above.

There are two modes to virtualize servers:

  • Para-virtualized servers. In this mode, the virtualized server has direct access to the hardware/hypervisor and delivers performance close to bare metal. Systems running Red Hat Enterprise Linux 4.5 and newer can be deployed as para-virtualized guests.
  • Fully-virtualized servers. In this mode, the virtualized server interacts with the hypervisor through a hardware abstraction layer. The hypervisor presents its hardware as generic hardware through the abstraction layer so most operating systems can be run in a virtual mode on it. However, all the I/O, network, and memory requests from the guest have to be translated by the hypervisor. This translation results in very poor performance by systems deployed as fully virtualized guests.

The I/O and network performance on fully virtualized guests can be improved by implementing para-virtualized drivers within these guests.

But memory access can still be an issue, particularly with process-based applications where processes need to modify virtual memory. Specifically, the guest operating system (guest OS) in the virtual machine creates page tables that translate virtual memory addresses to guest pseudo-physical addresses, which means that older CPUs cannot directly use them. To deal with that limitation, the hypervisor has to create so-called shadow page tables, which translate the same virtual memory addresses to real physical addresses. Maintaining these shadow page tables can cause performance and scalability problems with certain workloads. One particular cause of performance issues is the fact that the hypervisor needs to intercept each page table write by the guest OS in order to keep the guest page tables and the shadow page tables in sync.
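The bookkeeping described above can be pictured with a small toy model. This is only an illustrative sketch, not how any real hypervisor is implemented: the class name `ShadowPagingVM`, its methods, and the use of single-level dictionaries in place of real multi-level page tables are all assumptions made for the example.

```python
class ShadowPagingVM:
    """Toy model of a guest whose page tables are shadowed by the hypervisor.

    All names here are hypothetical; page tables are modelled as flat dicts
    that map page numbers to page numbers.
    """

    def __init__(self, pseudo_to_machine):
        # Hypervisor-owned map: guest pseudo-physical page -> machine page.
        self.pseudo_to_machine = dict(pseudo_to_machine)
        # Guest-owned table: virtual page -> guest pseudo-physical page.
        self.guest_pt = {}
        # Hypervisor-owned shadow: virtual page -> machine page.
        # This is the only table the (older) CPU can actually walk.
        self.shadow_pt = {}
        # Number of guest page table writes the hypervisor had to intercept.
        self.traps = 0

    def guest_write_pte(self, vpage, pseudo_page):
        # Every guest page table write traps to the hypervisor so that the
        # shadow table can be kept in sync with the guest table.
        self.traps += 1
        self.guest_pt[vpage] = pseudo_page
        self.shadow_pt[vpage] = self.pseudo_to_machine[pseudo_page]

    def cpu_translate(self, vpage):
        # The CPU resolves addresses through the shadow table only.
        return self.shadow_pt[vpage]


vm = ShadowPagingVM({0: 100, 1: 101, 2: 102})
for vpage, pseudo_page in [(7, 1), (8, 2), (7, 0)]:
    vm.guest_write_pte(vpage, pseudo_page)

print(vm.cpu_translate(7), vm.cpu_translate(8), vm.traps)
```

In this model the trap counter grows with every page table update, which is the cost that process-heavy workloads pay repeatedly.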

The AMD Barcelona processor has a feature called Rapid Virtualization Indexing (RVI), which allows the processor to directly use the virtual to pseudo-physical address page tables created by the guest OS. This works because the CPU translates these pseudo-physical addresses to real machine physical addresses using a second set of page tables, which define this translation for each virtual machine. Because the CPU can do both of the memory address translations, no shadow page tables are required and the guest OS can alter its page tables directly, without trapping to the hypervisor. Context switches in the guest OS also benefit from RVI: without it, the shadow page table gets flushed on most guest context switches, and there is a cost incurred to re-populate it.
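The two-stage translation can be sketched in a few lines. This is a simplified illustration under stated assumptions: the function name `nested_translate`, the 4 KB page size, and the flat dictionaries standing in for the guest and nested page tables are all hypothetical, and real hardware walks multi-level tables.

```python
PAGE_SIZE = 4096  # assumed page size for the illustration


def nested_translate(vaddr, guest_pt, nested_pt):
    """Translate a guest virtual address to a machine physical address.

    guest_pt:  guest-owned table, virtual page -> pseudo-physical page
    nested_pt: hypervisor-owned table, pseudo-physical page -> machine page
    """
    vpage, offset = divmod(vaddr, PAGE_SIZE)
    pseudo_page = guest_pt[vpage]          # first stage: guest page table
    machine_page = nested_pt[pseudo_page]  # second stage: per-VM nested table
    return machine_page * PAGE_SIZE + offset


guest_pt = {2: 5}
nested_pt = {5: 40, 6: 41}

before = nested_translate(2 * PAGE_SIZE + 12, guest_pt, nested_pt)

# The guest remaps its own page table entry directly. No shadow copy exists,
# so in this model nothing else has to be updated or intercepted.
guest_pt[2] = 6
after = nested_translate(2 * PAGE_SIZE + 12, guest_pt, nested_pt)

print(before, after)
```

The point of the sketch is that the guest table and the nested table stay independent: a guest update changes the final translation without any hypervisor-side copy to keep in sync.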

All systems running Red Hat Enterprise Linux 4.4 or older have to be run as fully virtualized guests, as do all other operating systems (e.g. Windows, Solaris, etc.).

A series of tests on RVI were carried out on an HP DL585 system using an OLTP workload in an Oracle database. Oracle was chosen as the database for the testing because it is a process-based workload, and it is also a widely used database. The testing was carried out with 8 and 16 CPUs to understand the effects of vertical scaling on fully virtualized guests and the benefits of RVI.

Hardware used for the testing

System: HP DL585
Memory: 32 GB
CPUs: 16 AMD Barcelona @ 2.3 GHz

Test results from 8 CPU testing

System               20 users   40 users   80 users
Dom0                   100.00     109.03     112.73
FV – no RVI or PV       11.11       6.56       4.26
FV – RVI                15.91      18.59      18.08
FV PV – no RVI          14.34       7.39       4.38
FV PV – RVI             74.35      85.07      89.98

For comparison, the transactions per minute generated by 20 users on Dom0 was baselined at 100, and all other numbers are relative to that value. The table and the graph above show the substantial advantage that PV drivers and RVI together provide to a fully virtualized guest. Line 2 in the chart (FV without RVI or PV drivers) shows a steep drop in transactions per minute relative to line 1, the Dom0 baseline: throughput falls to about 11% of the baseline at 20 users and to about 4% at 80 users. Turning on RVI alone (line 3) regains some of the performance, but without PV drivers the guest stays below 20% of the baseline. Line 4 shows that adding PV drivers without RVI also leaves performance far behind. Finally, line 5 shows that with both RVI and PV drivers, the guest reaches roughly 74% to 80% of Dom0 throughput at the same user count.
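To compare each configuration with Dom0 at the same user count, rather than with the single 20-user baseline, the table values can be divided column by column. The sketch below uses the 8 CPU figures from the table; the variable and function names are illustrative only.

```python
# Relative transactions per minute from the 8 CPU table (20, 40, 80 users).
results_8cpu = {
    "Dom0": [100.00, 109.03, 112.73],
    "FV - no RVI or PV": [11.11, 6.56, 4.26],
    "FV - RVI": [15.91, 18.59, 18.08],
    "FV PV - no RVI": [14.34, 7.39, 4.38],
    "FV PV - RVI": [74.35, 85.07, 89.98],
}


def percent_of_dom0(results):
    """Express every row as a percentage of Dom0 at the same user count."""
    dom0 = results["Dom0"]
    return {
        name: [100.0 * value / base for value, base in zip(values, dom0)]
        for name, values in results.items()
    }


pct_8cpu = percent_of_dom0(results_8cpu)
for name, values in pct_8cpu.items():
    print(name, " ".join(f"{v:6.1f}%" for v in values))
```

Computed this way, the guest with both RVI and PV drivers sits between about 74% and 80% of Dom0, while the guest with neither drops below 4% of Dom0 at 80 users.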

Test results from 16 CPU testing

System               20 users   40 users   80 users
Dom0                   100.00     113.07     115.97
FV – no RVI or PV        5.13       2.92       2.14
FV – RVI                12.94      11.34      11.26
FV PV – no RVI           4.22       2.22       1.64
FV PV – RVI             66.79      83.30      89.62

The results from the 16 CPU testing show similar trends to the 8 CPU testing, but the gap to Dom0 gets wider as the user count increases when RVI and PV drivers are not used.
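One way to see the widening gap is to divide the throughput of the guest with RVI and PV drivers by that of the guest with neither, at each user count. The sketch below does this for the 16 CPU table; the names are illustrative and the ratio is simply derived from the published relative figures.

```python
user_counts = [20, 40, 80]

# Relative transactions per minute from the 16 CPU table.
fv_plain_16cpu = [5.13, 2.92, 2.14]         # FV - no RVI or PV
fv_pv_rvi_16cpu = [66.79, 83.30, 89.62]     # FV PV - RVI


def gain_factors(accelerated, plain):
    """Ratio of accelerated to plain throughput at each user count."""
    return [a / p for a, p in zip(accelerated, plain)]


gains_16cpu = gain_factors(fv_pv_rvi_16cpu, fv_plain_16cpu)
for users, gain in zip(user_counts, gains_16cpu):
    print(f"{users} users: about {gain:.0f}x")
```

The ratio grows from about 13x at 20 users to about 42x at 80 users, because the unassisted guest loses throughput as load rises while the guest with RVI and PV drivers keeps scaling.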

Figure 3 below compares the 8 and 16 CPU runs. All the numbers are relative to the 8 CPU result without PV drivers and RVI, which has been baselined at 1. The figure shows that as the guest size increases, the performance of the FV guest without RVI and PV drivers decreases. With RVI and PV drivers, performance inside the guest approaches 80% of the performance on Dom0 at the highest user count.

Comparison between 8 CPU and 16 CPU FV guests

Conclusion

From the results it is clear that when a process-based workload runs in a fully virtualized guest, performance takes a big hit because of the way page table and TLB information is maintained and the way I/O works inside the guest. But by adding PV drivers on an AMD Barcelona-based system that uses RVI, guest performance recovers to roughly 67% to 80% of Dom0 performance in these tests. With this feature, AMD lets customers take advantage of hardware speed-up without having to upgrade their software environments, by letting them run their systems as fully virtualized guests on Barcelona-based systems.
