Memory Resource Management in VMware ESX Server
Carl Waldspurger. Memory Resource Management in VMware ESX Server in Proceedings of the 5th Symposium on Operating Systems Design and Implementation, 2002.
Reviews due Thursday, 10/16
Comments
Summary: The ESX virtual machine server is a thin layer designed to multiplex hardware resources among VMs. Its memory management uses an additional level of page indirection for VM virtualization. It overcommits memory and relies on ballooning to reclaim memory, content-based page sharing, and 'shares' to pick revocable pages.
Problem: Memory is often underutilized when groups of VMs run over a VMM. Overcommitting memory is a possible solution. However, overcommitment implies the need to reclaim memory from guest OSes. The VMM can make a high-level decision about which VM should lose memory, but it lacks the knowledge to select a victim page from within a VM. An additional constraint is that guests must run unmodified.
Contributions: The paper describes three mechanisms that help overcommitment of memory:
1) The paper introduces ballooning, a hook within a VM that forces the guest OS's own page replacement policy to select a victim when the VMM needs pages.
2) They propose low-overhead content-based page sharing among VMs.
3) For the VMM's high-level decisions they introduce 'shares' and an 'idle memory tax'.
Performance techniques: Overcommitment means that the sum of the memory allocated to each VM is greater than total machine memory. The balloon driver's overhead is small because it polls only once per second. Page sharing is done on a best-effort basis rather than by a complete scan for copies of all pages; pages that result in a hash collision are simply ignored. Hashed but unshared pages are not marked COW, and they are checked again when a possible match turns up later. The VMM uses an 'idle memory tax' rather than a purely share-based mechanism.
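The hint-based sharing described above can be sketched as a toy model. This is only an illustration under stated assumptions, not ESX Server's implementation: the `SharingTable` class, its method names, and the use of SHA-1 as the hash are all hypothetical, and the stale-hint check compares contents directly where a real system would more likely rehash the hinted page.

```python
import hashlib


class SharingTable:
    """Toy model of content-based page sharing with hint entries (hypothetical)."""

    def __init__(self):
        self.shared = {}  # hash -> [content, refcount] for pages shared copy-on-write
        self.hints = {}   # hash -> id of a page that was scanned but left writable

    @staticmethod
    def page_hash(content):
        # SHA-1 is only a stand-in; any well-distributed hash works for the sketch.
        return hashlib.sha1(content).hexdigest()

    def scan(self, page_id, memory):
        """Try to share memory[page_id]; return 'shared', 'hint' or 'collision'."""
        content = memory[page_id]
        h = self.page_hash(content)

        if h in self.shared:
            entry = self.shared[h]
            if entry[0] == content:   # full compare guards against hash collisions
                entry[1] += 1
                return "shared"
            return "collision"        # same hash, different bytes: leave unshared

        hinted = self.hints.get(h)
        if hinted is not None and hinted != page_id and memory.get(hinted) == content:
            # The hinted page was never write-protected, so it is re-checked here
            # before the two pages are collapsed into one shared copy.
            del self.hints[h]
            self.shared[h] = [content, 2]
            return "shared"

        # No usable match: remember this page as a hint without marking it COW.
        self.hints[h] = page_id
        return "hint"
```

Scanning two zero-filled pages in turn records the first as a hint and shares the second with it; a hinted page that was modified in the meantime is simply replaced by the newer hint.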
Tradeoffs: When ballooning is not feasible, demand paging is enforced. In general, the big tradeoff here is between CPU cycles (overhead of both the VMM and guests) and memory consumption.
Another part where techniques may apply:
It looks like the ballooning mechanism could be applied to CPU scheduling as well, where a driver inside an idle VM could yield its share back to the VMM. As the authors point out, the memory reduction obtained by sharing can be useful in general for a standalone OS (although at a price in CPU cycles).
Weaknesses:
Ballooning itself implies a modification of the behavior of a guest OS. For instance performance tests running inside a VM may be affected by this.
Since ballooning is done by polling, there may be times when all guests may require more memory, and no guest is ready to yield it.
Posted by: Daniel Luchaup | October 16, 2008 09:26 AM
In this paper, the authors present VMware ESX Server, a thin software layer designed to multiplex hardware resources efficiently among virtual machines running unmodified commodity operating systems. More specifically, they describe several mechanisms for managing memory.
In a virtualized system, multiple operating systems may be running with varying memory needs. Memory management is crucial because it can affect the performance of the whole system. Existing VMMs required modifications to the guest operating systems, which is not always easy. ESX Server is designed to run existing operating systems without modification while still providing efficient memory management mechanisms that improve the performance of the system.
In my opinion one of the main contributions of the paper is the ballooning technique, which is used to force a guest operating system to invoke its own memory management algorithms. It can also be used to free memory for general use within the OS. To achieve that, a small balloon module is loaded into the guest OS as a pseudo-device driver or kernel service. As we can see, we don’t need to modify the guest OS in order to use this technique. Another contribution of ESX Server is that it exploits sharing opportunities, so that server workloads running in VMs on a single machine often consume less memory than they would running on separate physical machines. More specifically, it uses content-based sharing: it identifies page copies by their contents, and it uses hashing to reduce the cost of comparing pages. Finally, the paper introduces the idle memory tax to reclaim idle memory. This technique solves the problem of idle clients with many shares hoarding memory unproductively while active clients with few shares suffer under severe memory pressure.
The techniques used to achieve performance are ballooning, page remapping, shared page hashing, and an idle memory tax that maximizes the reclamation of idle memory. These techniques could also be applied to applications such as databases to help them manage their memory. The tradeoff of content-based sharing is its computation overhead (comparing the contents of pages) versus the memory it saves.
Posted by: Avrilia Floratou | October 16, 2008 08:44 AM
Summary
The paper discusses the important mechanisms and policies employed by ESX Server to manage memory resources efficiently. Content-based page sharing to eliminate redundant copies of pages, ballooning to reclaim pages from a VM, and an idle memory tax to use memory efficiently without sacrificing performance isolation are presented as the key aspects of ESX Server's memory management policies.
Problem attempted
The objective is to build a software layer for multiplexing hardware resources efficiently among virtual machines. Besides, this software layer must manage hardware directly without depending on any host OS.
Contributions
1. ESX server is designed such that no change is required in any guest OS code. This requires the ESX server to give the guest OS running in each VM the illusion that it has a zero-based physical address space. ESX adds an additional level of address translation to achieve this. Specifically, it maintains a pmap data structure for each VM to translate physical page numbers to machine page numbers.
2. ESX uses the ballooning technique to reclaim the pages the guest considers least valuable without needing explicit knowledge from the guest OS about which pages those are. It does this by deliberately creating memory pressure in the guest with a balloon driver, a pseudo-device driver. This forces the guest OS to invoke its native memory management algorithms and swap out the pages it considers least important.
3. Pages that have the same content are shared, which removes any dependence on the guest OS to identify shareable pages. The high cost of comparing pages by content is reduced by computing a hash value for each page and comparing the contents of two pages only if their hash values match. The cost of scanning for copies remains, and it is reduced by scanning during idle cycles.
4. Maintaining performance isolation has generally meant poor memory utilization, since idle clients can hold memory unproductively. The paper introduces the idle memory tax to deal with this problem. The tax rate is a configurable parameter specifying the maximum fraction of idle pages that can be reclaimed from a client. This method charges a client more for an idle page than for an actively used one. ESX Server uses statistical sampling to measure the fraction of memory in active use in a VM, so the sampling rate can be tuned to the desired accuracy and overhead.
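To make the idle memory tax in point 4 concrete, here is a minimal sketch of an adjusted shares-per-page ratio. The formula is my recollection of the paper's formulation (ratio = S / (P * (f + k * (1 - f))) with k = 1 / (1 - tax rate)) rather than something stated in this review; the function names, the dictionary layout, and the 75% rate used in the example are illustrative assumptions.

```python
def shares_per_page(shares, pages, active_fraction, tax_rate):
    """Adjusted shares-per-page ratio; idle pages cost k times more than active ones."""
    if not 0.0 <= tax_rate < 1.0:
        raise ValueError("tax_rate must be in [0, 1)")
    k = 1.0 / (1.0 - tax_rate)  # idle page cost; k == 1 means no tax at all
    return shares / (pages * (active_fraction + k * (1.0 - active_fraction)))


def pick_victim(clients, tax_rate):
    """Reclaim from the client with the lowest adjusted ratio (hypothetical helper)."""
    return min(
        clients,
        key=lambda c: shares_per_page(c["shares"], c["pages"], c["active"], tax_rate),
    )["name"]


# Two VMs with identical shares and allocations; only their activity differs.
clients = [
    {"name": "busy", "shares": 100, "pages": 1000, "active": 1.0},
    {"name": "idle", "shares": 100, "pages": 1000, "active": 0.2},
]
victim = pick_victim(clients, tax_rate=0.75)
```

With a tax rate of 0 the two ratios are equal and the policy degenerates to pure share-based allocation; with a positive rate the mostly idle VM gets the lower ratio and becomes the preferred victim.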
Flaws
The CPU overhead of scanning pages to identify identical copies is claimed to be low, but not enough data is provided to support the claim. I believe this CPU overhead will be quite significant.
Tradeoffs
Rather than fixing a single mechanism for each trade-off, the paper exposes parameters so the trade-off can be set based on current performance. The sampling rate and the idle memory tax rate are parameters that can be dynamically adjusted to a desirable level.
Where else it can be used
The technique of statistical sampling can be used by an OS whenever it needs to make an informed decision without unduly affecting performance. Decisions seldom depend on perfect values and hence information collected through statistical sampling is good enough to achieve the desired result.
Content-based sharing indicates an opportunity to avoid redundant copies even within a single VM, and general-purpose operating systems could use it to reduce the redundant copies they maintain.
What it makes easier
Experiments reported suggest a surprisingly large percentage of memory gained through content based page sharing. This approach also does not entail any change to guest OS thus making it a very generic approach to page sharing.
Posted by: Balasubramanian Sivan | October 16, 2008 07:57 AM
Memory resource Management in VMWare ESX Server
Summary
The paper presents novel techniques for optimized, isolated, dynamic memory management across guest operating systems in the VMware ESX Server virtual machine monitor (VMM) and evaluates their performance. It also combines these techniques into an effective memory management solution for supporting overcommitted memory workloads in a VMM.
Description of the problem being solved
The paper tries to provide an optimized, isolated, dynamic memory management solution across guest operating systems in a VMM that performs effectively for overcommitted memory workloads.
Contributions of the paper
1) Using balloon drivers/kernel modules that integrate well with the guest operating system's own policies (e.g. even elaborate policies such as application profiling to find the least needed pages) when determining which pages can be reclaimed is a novel idea that performs effectively. Combining this with demand paging for cases where ballooning is not possible provides a complete solution.
2) Using memory virtualization with a content-based copy-on-write page sharing mechanism to provide transparent page sharing (inspired by Disco) is again an effective optimization, particularly useful for sharing zero pages. The idea of page remapping used to provide efficient I/O is also novel.
3) The concepts of shares and working sets, used to dynamically partition the available memory among guest OSes and encourage its active use, are interesting in their analogy to the financial world of offered prices and taxes, and are novel. Using different sampling rates to achieve varied dynamic response rates to changing memory needs in a VMM is also novel in its application to this problem and is very effective, as shown by the results.
Flaws in the paper
1) I think the paper could have discussed the effectiveness of the dynamic reallocation algorithm under more varied real-world VMM settings to show that the algorithm is generic and always works.
Techniques used to achieve performance
1) Monitoring/Sampling of memory resources usage to dynamically reallocate resources to Guest OSes ( users of resources)
Tradeoff made
1) Taxing guest OSes that pay less and don't actively use their allocated resources to benefit guest OSes that pay more or use resources actively
Another part of OS where this technique could be applied
I think this concept of minimal, effective monitoring/sampling (with low performance overhead for the sampling tasks themselves) to periodically reallocate resources among users and achieve maximal usage can be applied to all kinds of resource management where resource needs/usage vary with time. For example, in a cloud database service used by many users with varied query loads and storage loads, similar techniques could be applied to manage physical storage (hard disk) and compute (CPU) resources.
Posted by: Leo Prasath Arulraj | October 16, 2008 07:33 AM
Memory Resource Management in VMware ESX Server
This paper describes VMware ESX Server, a software layer that multiplexes hardware among the virtual machines in a system. It virtualizes the Intel IA-32 architecture and supports Microsoft Windows 2000 Advanced Server and Red Hat Linux. It explains the novel mechanisms introduced to share memory efficiently between virtual machines: ballooning, idle memory tax, content-based page sharing and hot I/O page remapping.
The goal of this research is efficient sharing of resources between virtual machines in the same system. Many of the mechanisms exist so that ESX can run unmodified OSes, and many of the design decisions aim at keeping the system as simple as possible.
A ballooning mechanism is implemented to force the OS to use less memory by using its own page replacement mechanisms in a manner transparent to the OS, so ESX Server does not need to predict which page from the VM is best to evict. A mechanism for sharing pages between different VMs is implemented; the novelty is that instead of comparing the content of every single page, a hashing mechanism is used and only pages with the same hash value are compared. A mechanism for taxing idle pages is implemented: when memory is needed, guests that are not using their memory actively are penalized more. An I/O page remapping mechanism allows a much better use of memory for I/O data transfers.
It adds an extra level of indirection to manage memory pages. The VM works with virtual memory addresses that are translated to physical pages, and ESX translates physical pages to machine pages. Demand paging is used by ESX Server whenever ballooning is not sufficient, but when is ballooning not enough, and when does demand paging become necessary? A hash table is used for the content-based page sharing mechanism; how big is that table?
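The extra level of indirection mentioned above (virtual to "physical" in the guest, then "physical" to machine in ESX) can be sketched roughly as follows. The class and method names are hypothetical, and the sketch ignores shadow page tables and TLBs; it only shows that each VM sees a zero-based physical space while the hypervisor is free to choose, and later change, the backing machine page.

```python
class Hypervisor:
    """Hands out machine page numbers (MPNs); purely illustrative."""

    def __init__(self, machine_pages):
        self.free_mpns = list(range(machine_pages))

    def allocate(self):
        return self.free_mpns.pop(0)


class VirtualMachine:
    def __init__(self, hypervisor):
        self.hypervisor = hypervisor
        self.guest_page_table = {}  # guest-managed: virtual page -> "physical" page
        self.pmap = {}              # hypervisor-managed: "physical" page -> machine page

    def map_page(self, vpn, ppn):
        self.guest_page_table[vpn] = ppn
        if ppn not in self.pmap:
            # Back the guest "physical" page with a real machine page on first use.
            self.pmap[ppn] = self.hypervisor.allocate()

    def translate(self, vpn):
        ppn = self.guest_page_table[vpn]  # first level, controlled by the guest OS
        return self.pmap[ppn]             # second level, invisible to the guest


hv = Hypervisor(machine_pages=8)
vm_a, vm_b = VirtualMachine(hv), VirtualMachine(hv)
vm_a.map_page(vpn=0x10, ppn=0)
vm_b.map_page(vpn=0x10, ppn=0)  # same guest-physical page 0, different machine page
```

Because only `pmap` ties a guest-physical page to a machine page, the hypervisor can remap, share, or reclaim the machine page without touching the guest's own page table.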
Posted by: Paula Aguilera | October 16, 2008 04:38 AM
Summary:
This paper presents the memory management techniques used in VMware ESX server. Notable features of the paper are - ballooning to reclaim pages, idle memory taxing for efficient memory utilization, content-based page sharing for reducing redundancy and copy overheads for multiple virtual machines.
Problem:
This paper presents the memory resource management mechanism used in the commercially available VMware ESX server. Two novel mechanisms/problems studied in the paper are - a high-level resource management policy to compute target memory allocation for each VM, and secondly how to use a background activity which tries to share identical pages between VMs.
Contributions:
VMware ESX Server represents one of the major paradigms for virtualization today (running directly on the hardware). One of the important techniques presented in this paper is the balloon driver, which lets ESX Server control the amount of memory that can be reclaimed from a VM. Another interesting idea of the paper is content-based page sharing. However, I believe that the implementation using random scanning of pages is particularly biased towards large page clusters.
Another important idea in the paper is page remapping which helps to reduce the IO copying overheads particularly in large memory systems. Idle memory taxing takes into account the activity of all VMs, thereby not simply cutting down their shares in proportion to their total memory shares.
Flaws:
Most of the strategies used in the paper optimize the "common case" workloads, and do not consider the pathological cases. One example is the randomized page replacement policy, which the authors claim avoids interference with the native guest OS memory management algorithms. I seriously doubt that the randomized approach will completely eliminate this possibility. In my opinion, it is actually more biased depending on the kind of workload. The same holds for random scanning to identify potentially shareable pages. The authors do mention that they will consider feedback-driven workload management techniques in the future.
Techniques used:
Three major techniques used in the paper for memory management are - ballooning for reclaiming the pages considered least valuable by the guest OS, content-based page sharing (using a Bob Hash), and an idle memory tax for providing efficient memory utilization and performance isolation guarantees.
Tradeoffs:
Both ballooning and content-based sharing trade worst-case performance for common-case performance. The authors claim their heuristic approaches work well for the common case, and that worst cases like popping the balloon driver are rare (though they provide a fallback mechanism for this case).
Alternative uses:
Content-based sharing and Copy on Write concepts are well used for memory buffers (fbufs and mbufs). They are also used for networking applications to reduce copying overheads in modern web servers (eg. sendfile implementation).
Posted by: Mohit Saxena | October 16, 2008 01:50 AM
Summary:
This paper describes several novel ESX Server mechanisms and policies for managing memory in a virtual machine environment. In order to efficiently manage memory, the paper also presents some techniques like ballooning and memory sharing between guest operating systems in the VMware ESX server.
The problem the paper was trying to deal with:
The main goal of this paper is to develop a virtual machine in which other operating systems can run while not requiring any modifications to the guest system. In order to achieve this goal, those issues related to memory virtualization including memory allocation, and memory sharing are discussed in this paper, and some novel solutions including new ballooning technique, an idle memory tax, content-based transparent page sharing, page remapping are also presented.
Contributions
1. The paper presents a novel idea called ballooning which causes a guest operating system to invoke its own memory management routines to free-up memory so that it can be given to other virtual machines.
2. The paper makes contributions on dealing with reducing the amount of memory needed to run multiple virtual machines.
3. An idle memory tax is used when allocating memory among VMs; it was introduced to solve an open problem in share-based management of space-shared resources, enabling both performance isolation and efficient memory utilization.
4. The paper also presents a new idea, page remapping, which helps reduce I/O copying overheads in large-memory systems.
5. The paper presents a content-based transparent page sharing technique which exploits sharing opportunities within and between VMs without any guest OS involvement.
Flaws:
One flaw concerns page replacement, since page replacement policies can greatly influence the performance of an operating system. I don’t think it is a good idea to use a randomized page replacement policy.
Performance technique:
In order to improve the efficiency of memory management, some novel techniques are proposed in this paper, including a new ballooning technique, an idle memory tax, content-based transparent page sharing, and page remapping. A higher-level dynamic reallocation policy is also presented to coordinate these diverse techniques and efficiently support virtual machine workloads that overcommit memory. Memory management is a very important factor in the performance of a virtual machine system, and the memory management policies presented in this paper can clearly improve the performance of the system.
Posted by: Tao Wu | October 16, 2008 12:41 AM
Summary:
The paper talks about memory management in VMware ESX Server. ESX Server is a thin software layer designed to multiplex hardware resources efficiently among virtual machines.
Problem:
Currently most servers are underutilized, which allows them to be consolidated as virtual machines on a single physical server. To allow efficient multiplexing of resources, the system should be able to overcommit resources such as memory.
Contribution:
1. In a hosted design such as VMware Workstation, when a VM reads sectors from its virtual disk, the monitor issues a read() system call to the underlying Linux host OS to retrieve the corresponding data. In contrast, ESX Server manages system hardware directly, providing significantly higher I/O performance and complete control over resource management.
2. In contrast to VMware Workstation, it does not run atop a third-party operating system, but instead includes its own kernel.
3. Introduces a novel ballooning technique for memory management.
4. It allows content-based page sharing.
5. Mechanisms such as proportional shares and min and max sizes ensure that memory allocation is handled according to the desired resource management policy.
6. With the idle memory tax penalty, a guest OS that asks for more memory but doesn’t use it will be the first to have memory reclaimed when memory is scarce.
Flaws:
1. It is not clear what will happen if the guest OS doesn’t support ballooning or has no balloon driver.
Tradeoff:
1. By overcommitting memory, ESX Server may improve utilization, but it may perform poorly when memory is scarce. So we are trading worst-case performance for utilization.
2. By allowing memory to be shared between similar guest OSes running the same applications, we are compromising the isolation feature of the VMM.
Another part of OS where technique can be applied:
The same memory management techniques can be applied to applications like databases that run multiple instances on the same server machine.
Posted by: Nikhil Teletia | October 16, 2008 12:31 AM
Summary
The VMware ESX Server paper discusses a virtual machine monitor running underneath a number of unmodified operating systems (virtual machines). In implementing this design, new techniques including ballooning to reclaim pages, page sharing, and an idle memory tax are presented to improve memory sharing between virtual machines.
Problem to Solve
Sharing memory resources between virtual machines has been done by a number of different systems; however, VMware is trying to share memory efficiently without modifying the virtual machines.
Contributions
- A ballooning technique that runs as a driver in the operating system and communicates with the hypervisor to expand in size releasing memory resources.
- An idle memory tax that balances a mix of share-based isolation and the ability to reclaim idle memory.
- Measuring idleness through using a uniform distribution of samples which approximates the percentage of idleness.
- Transparent page sharing using a hash of the contents for efficiency.
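The sampling idea in the list above can be illustrated with a short sketch. This is an assumption-laden toy, not the ESX mechanism: `was_touched` is a hypothetical callback standing in for "the guest faulted on this page during the sampling period", and the sample sizes are arbitrary.

```python
import random


def estimate_active_fraction(num_pages, was_touched, sample_size, rng):
    """Estimate the fraction of actively used pages from a uniform random sample."""
    sample = rng.sample(range(num_pages), sample_size)  # uniform, without replacement
    touched = sum(1 for page in sample if was_touched(page))
    return touched / sample_size


# Hypothetical guest: 1000 pages, of which the first 250 are in active use.
active_pages = set(range(250))
rng = random.Random(42)
rough = estimate_active_fraction(1000, active_pages.__contains__, 100, rng)
exact = estimate_active_fraction(1000, active_pages.__contains__, 1000, rng)
```

A small sample gives only an approximation of the active fraction, while sampling every page recovers the exact value; the sample size is the knob that trades accuracy against overhead.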
Flaws
The conclusion presents VMware ESX as having an improved page remapping technique to reduce I/O copying; however, section 7 does not explain how low-memory remapping is handled once low memory fills up. Section 7 says that “hot” high pages will be remapped to low pages, but for the case when low pages are fully utilized, the authors state only that they are currently exploring various techniques.
Techniques
The techniques used to achieve performance are the addition of an operating-system-level driver, page remapping, shared page hashing, and an idle tax to maximize active memory usage. The trade-off with this hypervisor memory resource manager is that a group of smaller improvement techniques is used as the price of not modifying the operating system code. For example, both the hypervisor and the operating system contain page swapping algorithms, which forces the hypervisor to choose pages randomly and less optimally to avoid conflict. Each technique could be applied in other ways; for example, transparent page sharing and hashing could be implemented in buffer caches to reduce duplication.
Posted by: Cory Casper | October 16, 2008 12:03 AM
Summary
This paper talks about the popular VMware ESX Server and presents several mechanisms and policies for managing memory efficiently among the commodity guest operating systems that run over it. ESX Server provides significantly higher I/O performance and control over resources than hosted virtual machine architectures.
Problems:
In virtual machines, multiple operating systems and applications may be running with varying memory needs. Virtual machines provide an environment in which the memory promised to guest OSes is overcommitted. If the policies and mechanisms for managing memory among the operating systems aren’t efficient, applications can suffer significantly. VMware ESX Server is presented as a thin layer above the hardware that multiplexes hardware resources properly, but this paper focuses on memory resource management.
Contributions:
• Ballooning, used to make a guest operating system invoke its own memory management algorithm and move out of memory the pages it considers less valuable at that time.
• Running the same guest OS and similar applications makes it likely that identical pages exist in memory. The algorithm presented to compare pages in memory and reclaim all copies except one, marking the remaining one copy-on-write, is a new contribution.
• The concept of an idle memory tax, where clients holding idle pages are charged more than clients using their pages actively. Memory is reclaimed preferentially from clients with idle memory. A sampling technique that tracks which pages are touched is used to estimate the memory actively used by an operating system, which helps in reclaiming memory.
• Page remapping is leveraged to reduce I/O copying overheads in large memory systems.
• A higher-level dynamic reallocation policy coordinates these diverse techniques to efficiently support virtual machine workloads that overcommit memory.
Flaws:
The paper is very well-written. The evaluation has been done largely on virtual machines that share many pages; it should also have been done with distinct applications and distinct operating systems, so that shared pages are minimal.
There’s no experimental evaluation to demonstrate the benefit of I/O page remapping.
Performance:
The VMware ESX Server is a thin layer designed to multiplex hardware resources amongst virtual machines. The design of ESX Server differs significantly from VMware Workstation, which uses a hosted virtual machine architecture that takes advantage of a pre-existing operating system for portable I/O device support. The ESX architecture removes a lot of overhead and provides better performance. The various techniques (ballooning, idle memory tax, content-based page sharing and hot I/O page remapping) efficiently multiplex memory amongst guest operating systems.
There’s a tradeoff made in measuring idle memory. The technique used, statistical sampling of the working set, introduces page faults and therefore some overhead. Content-based page sharing also adds extra overhead for copy-on-write. There’s an extra overhead for every new page even if it has no match in memory, because when a new page is brought in, the hash values are always compared. But all these overheads are paid to achieve a VMM with overcommitted memory and efficient multiplexing of memory.
Posted by: Rachita Dhawan | October 16, 2008 12:00 AM
Summary
The paper describes various memory management mechanisms and policies used by the VMWare ESX server to multiplex the available machine memory between contending virtual machines.
Problem Addressed
The problem addressed here is that of efficiently overcommitting memory. Server consolidation and other motivations drive running multiple virtual machines on the same physical machine, and quite often the total memory committed to the virtual machines exceeds the total available memory on the machine. Hence, the virtual machine monitor has to dynamically reallocate memory depending on the requirements of the hosted VMs.
Contributions
- Introducing a ballooning module into the hosted VM to "hint" to the VM to perform paging actions. The technique is particularly interesting to me because "physical" pages claimed by the ballooning driver can just be mapped to null.
- The ballooning also has the advantage that the guest OS gets to decide which page to push out/in. This is the correct behaviour because the VMM can decide only at the granularity of the VM; the guest's voice is needed for the decision at the granularity of the page. Also because of this delegation, double paging is reduced.
- Content based page sharing across VMs: this allows for redundancy elimination and hence reclaims more pages. And the implementation through hashing manages to achieve this in O(n) complexity.
- The idle memory tax allows taking memory away from VMs that are just sitting on it and giving it to those that are actively using their share. Without it, the memory of all VMs would just shrink in proportion to their shares.
- The statistical sampling approach proposed is a nice, low-overhead way of estimating the amount of idle memory. This is what makes idle-memory-tax-based reclamation feasible.
- Identifying "hot" pages in high memory and remapping them to low memory helps prevent overhead incurred by guest os in copying through bounce buffers.
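The ballooning behaviour described in the first two bullets can be mimicked with a toy guest that uses an LRU policy. Everything here is a hypothetical sketch (the `GuestOS` and `BalloonDriver` classes, the LRU choice, the page counts); it only shows that the driver asks for pinned pages and the guest, not the VMM, decides which of its own pages to push out.

```python
from collections import OrderedDict


class GuestOS:
    """Toy guest with an LRU replacement policy (an assumption for the sketch)."""

    def __init__(self, num_pages):
        self.free = list(range(num_pages))
        self.resident = OrderedDict()  # ppn -> owner, least recently used first
        self.swapped = []              # owners whose pages the guest paged out

    def alloc(self, owner, pinned=False):
        if not self.free:
            if not self.resident:
                raise MemoryError("nothing left to evict")
            # The guest's own policy picks the victim, not the hypervisor.
            victim_ppn, victim_owner = self.resident.popitem(last=False)
            self.swapped.append(victim_owner)
            self.free.append(victim_ppn)
        ppn = self.free.pop()
        if not pinned:
            self.resident[ppn] = owner  # pinned pages are never eviction candidates
        return ppn


class BalloonDriver:
    def __init__(self, guest):
        self.guest = guest
        self.pinned = []

    def inflate(self, pages):
        """Grab pinned guest pages; the returned PPNs can be unmapped by the VMM."""
        grabbed = [self.guest.alloc("balloon", pinned=True) for _ in range(pages)]
        self.pinned.extend(grabbed)
        return grabbed

    def deflate(self, pages):
        """Hand pages back to the guest for general use."""
        for _ in range(min(pages, len(self.pinned))):
            self.guest.free.append(self.pinned.pop())


guest = GuestOS(num_pages=4)
for name in "abcd":
    guest.alloc(name)
driver = BalloonDriver(guest)
reclaimed = driver.inflate(2)
```

Inflating the balloon by two pages makes the guest page out its two least recently used pages, and the two page numbers the driver now holds are the ones the hypervisor could safely leave unmapped.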
Flaws
- The fallback mechanism in the absence of ballooning is randomized page replacement. This can mitigate problems such as double paging but isn't quite a perfect solution.
- How effective random checking for shared pages is remains unclear. I would have liked an evaluation on a static snapshot of a machine that analyzes how much redundancy was missed.
Technique used
There are basically three main techniques used here. One is ballooning which is a gray-box technique to induce guest OSes to undertake the required paging action. The second is redundancy elimination by locating identical "physical" pages and mapping them to the same "machine" page. And the final one is penalizing entities which amass resources without using them.
Tradeoffs
- ballooning: modification/intrusion to the guest's environment versus delegating the paging job to the appropriate entity.
- content based sharing & idle memory tax: computation overhead versus the reduction in memory required.
Another area where the technique applies
- An approach like idle memory tax could be used with OSes to decide on which processes to page out.
Posted by: Varghese Mathew | October 15, 2008 11:59 PM
Summary
This paper talks about the popular VMware ESX Server and presents several mechanisms and policies for managing memory in an efficient way amongst commodity host operating systems that run over it. The VMware server provides significantly higher I/O performance and control over resources than hosted virtual machine architectures.
Problems:
In virtual machines, multiple operating system and applications might be running with varying memory needs. Virtual machines provide an environment to guest OS, where the memory is always overcommitted to them. If the policies and mechanism to deal with memory resource management amongst the operating systems isn’t efficient, applications can suffer significantly. Here, VMware ESX Server is presented as a thin layer above the hardware to multiplex hardware resources properly. But this paper talks about the memory resource management.
Contributions:
• Ballooning technique, used to make guest operating system to invoke its own memory management algorithm and move out the pages from the memory that are considered less valuable at that time.
• The presence of same guest OS and similar applications supports the idea of similar pages in the memory. The algorithm presented to compare pages in the memory and page out all copies except one, marking it as copy on write is a new contribution.
• Concept of idle memory tax, where clients having idle pages are charged more than client using pages efficiently. Memory is reclaimed from clients having idle memory. Active memory technique is used to touch upon pages and evaluate the memory usage by an operating system. This technique helps in claiming memory.
• Page remapping is leveraged to reduce I/O copying overheads in large memory systems.
• A higher-level dynamic reallocation policy coordinates these diverse techniques to efficiently support virtual machine workloads that overcommit memory.
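The ballooning bullet above can be made concrete with a toy model. This is only a sketch under stated assumptions: `ToyGuest`, its LRU policy, and the `inflate`/`deflate` methods are hypothetical names invented for illustration, not the ESX balloon driver interface. What it shows is the key idea that the balloon only shrinks the guest's usable memory, while the guest's own replacement policy chooses which pages to page out.

```python
from collections import OrderedDict


class ToyGuest:
    """Hypothetical guest OS with a fixed number of pages and LRU paging."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        self.resident = OrderedDict()  # resident pages, least recently used first
        self.swapped = set()           # pages the guest itself paged out
        self.balloon = 0               # pages currently pinned by the balloon

    def _evict_to_fit(self):
        # Usable memory shrinks as the balloon grows; evict LRU pages to fit.
        while len(self.resident) > self.num_pages - self.balloon:
            victim, _ = self.resident.popitem(last=False)
            self.swapped.add(victim)

    def touch(self, page):
        """Access a page, faulting it in and evicting by LRU if needed."""
        if page in self.resident:
            self.resident.move_to_end(page)
            return
        self.swapped.discard(page)
        self.resident[page] = None
        self._evict_to_fit()

    def inflate(self, pages):
        """Pin `pages` more pages in the balloon; return how many the VMM gains."""
        pages = min(pages, self.num_pages - self.balloon)
        self.balloon += pages
        self._evict_to_fit()  # the guest's policy, not the VMM, picks victims
        return pages

    def deflate(self, pages):
        """Release up to `pages` balloon pages back to the guest."""
        pages = min(pages, self.balloon)
        self.balloon -= pages
        return pages
```

For example, a four-page guest that touches a, b, c, d and then a again will, when the balloon inflates by two pages, page out b and c (its least recently used pages) and keep d and a resident.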
Flaws:
The paper is very well-written. The evaluation was done largely on virtual machines that share many pages; it should also have covered distinct applications and distinct operating systems, where few pages can be shared.
There's no experimental evaluation demonstrating the benefit of I/O page remapping.
Performance:
The VMware ESX Server is a thin layer designed to multiplex hardware resources amongst virtual machines. The design of ESX Server differs significantly from VMware Workstation, which uses a hosted virtual machine architecture that takes advantage of a pre-existing operating system for portable I/O device support. The ESX architecture removes a lot of overhead and provides better performance. The various techniques (ballooning, idle memory tax, content-based page sharing and hot I/O page remapping) efficiently multiplex memory amongst guest operating systems.
A tradeoff is made in measuring idle memory. The technique used, statistical sampling of the working set, introduces page faults that incur some overhead. Content-based page sharing also adds overhead for copy-on-write. There is extra overhead for every new page even if it has no match in memory, because when a new page is brought in, its hash value is always compared. But all these overheads are paid to achieve a VMM with overcommitted memory and efficient multiplexing of memory.
Posted by: Rachita Dhawan | October 15, 2008 11:58 PM
Summary: The authors of this paper present several techniques to efficiently manage and overcommit memory resources in VMMs, to decrease the amount of hardware resources needed and increase the performance of individual VMs in the VMware ESX Server. They introduce ballooning and the idle memory tax as ways to help control the memory usage of a VM without requiring changes to the guest OS.
Problem to Solve: The authors are trying to improve upon existing techniques used to effectively virtualize hardware memory resources in VMMs. Specifically they are targeting the ability to overcommit memory to increase the number of VMs that can be handled on a single system.
Contributions: First, the authors introduce ballooning as a technique to force guest OSes to invoke their own memory management techniques to free up pages for reclaiming. In the opposite case, deflating the balloon lets pages be swapped back in to reduce memory contention. Secondly, the idle memory tax is introduced as a way to encourage VMs to give up memory they hold but are not using. This allows VMs which need the memory to access and use it. Another contribution is content-based page sharing, which identifies identical pages via page hashes and hints. This allows additional pages to be shared copy-on-write between VMs, which previously would have gone unrecognized as the same.
Flaws: One flaw of this paper is the lack of diversity in OSes being run in the VMM in the evaluation of real-world page sharing. Although they test different guest OS types, they don't intermix them to see how this affects the shared and reclaimed memory percentages. In reality many VMMs would probably not have only a single OS among their VMs. Another flaw is that sampling may not always produce an accurate depiction of pages that are in use. This would occur if a VM continued a repetitive cycle of certain commands and the sampling happened to always fall at the same place in the cycle.
What tradeoff is made: A high degree of page sharing helps system performance only while the pages are read; writing to a shared page triggers a page fault. The fault causes the page to be copied to a private page that is then written, which comes with a performance cost. Another tradeoff is that clients can easily request more space, but another client is the victim that must give up this memory. If the victim client was getting ready to use this memory then it would incur the overhead of fetching back the memory that it just lost.
Another part of the OS where the technique could be applied: Idle memory tax could be expanded to apply to not only memory, but caches as well. By applying this to caches it would force VMs to quickly swap out data that they don’t need instead of risking losing it. This should decrease the time to look for requested information in the cache because the cache would be less full.
Posted by: Holly Esquivel | October 15, 2008 11:21 PM
Summary
In this paper, the author introduces many new concepts used by the
VMware ESX Server to manage the physical memory of the guest
OS. Techniques such as ballooning, dynamic page reclamation, and the idle
memory tax are used to avoid wastage of resources by the OS. Content-based page sharing
and I/O page remapping are used to make the VMM memory management more
efficient.
Problem
One of the primary uses of VMware is server consolidation.
The servers might be made to run on systems that have less than the
required memory, knowing that they rarely use up their entire
memory space. This makes it very important to use the available memory
efficiently and to prevent wastage by high-priority OSes.
Contributions
- Ballooning is a novel technique to force the OS into swapping out
its idle pages on command by the VMM. It requires almost no change in
the guest OS and yet provides a communication path between ESX Server
and the OS. Also this avoids second guessing the correct page
replacement technique for a given OS
- Using the content of pages to share them between OSes is
very useful when you are running similar services on similar
OSes (the common case in server consolidation).
- ESX Server samples the page accesses to measure the idle pages under
each VM and only swaps out pages from VM that have the most idle
resources while still honoring the shares of the VM to an extent.
Flaws
- Sampling to determine the idle OS pages aggravates the situation
when there is a sudden increase in load. To catch sudden increases,
the sampling has to be done at a fast rate, which would mean more
page faults (when the load is high) to obtain the touched page count.
Technique
ESX Server uses the available free memory to decide which memory
management technique to use for reclaiming memory from the
VM's. It uses a driver in the OS to fool it into swapping out
pages. It also shares pages without compromising on isolation or changing
the guest OS. But this increase in efficiency is obtained at the cost
of maintaining additional hash tables and minor impact on performance.
The content based sharing can probably be used for caching in Internet
hosting servers (e.g. Akamai).
Posted by: Tushar Khot | October 15, 2008 11:18 PM
The memory management system in VMware ESX is an exercise in grey-box manipulation of an operating system. The authors maintain two goals which appear to contradict each other: that the guests should not be modified in any way, and that the VMM must yet interact non-pathologically with the guests' virtual memory policies to make the physical system more available.
Only one of the methods they employ, ballooning, requires any interaction with the guest at all. To make page reallocation interact well with an operating system's expectations of a physical machine, the "balloon" is introduced as a loadable module (or driver) to the system in question. This balloon establishes a private communication channel with the VMM and consumes or releases memory to influence the guest's behavior. This is a really clever compromise between an unmodified guest and a well-behaving guest.
All the other methods for improving behavior are optimizations. The idle-memory tax helps put pages to use where they are lacking in number, and content-based sharing avoids wasting memory on duplicates. The latter, per my intuition, probably improves VM-to-VM context-switching time. Though they did not address this directly, they were able to quantify a 0.5% improvement in throughput. I/O mapping is another optimization which has more to do with the layout of memory and its effects in hardware; after identifying a hot I/O-mapped page, the VMM may move it to "low memory" in order to decrease the overhead in subsequent accesses.
The catch to all of these methods, ballooning included, is that they are heuristic (and thus probably not optimal). Furthermore, these methods really don't guarantee non-pathological performance in the worst case (unlocalized random access to memory pages), but for common workloads are evidently just fine. Tuning these procedures to be more optimal by applying different strategies would probably be wasted effort, as simpler changes might get 90% of the benefit and cost less in CPU time to run. I would expect any publishable future work to center around simple, isolated ideas complementing the effects of those given here.
Posted by: Tack | October 15, 2008 11:12 PM
Summary
In “Memory Resource Management in VMware ESX Server,” Carl A. Waldspurger describes how the VMware ESX Server deals with allocating, sharing, and multiplexing physical memory to the various operating systems which it runs.
Problem
Modern computers have ample processing power to support multiple simultaneous configurations. Doing this, costs can be minimized and resources can be used more efficiently as a variety of systems are virtualized on one set of hardware. However, this virtualization brings with it a variety of challenges. For example, operating systems make many assumptions about their memory resources. If multiple operating systems are to run on one set of hardware, the available memory needs to be multiplexed and dynamically reallocated as the loads on the various operating systems change. Since operating systems commonly assume full memory management of a static amount of memory, effectively managing memory presents a number of challenges.
Contributions
• Creating a “ballooning” technique that forces a guest operating system to manage memory itself without significant modifications to the guest OS. Paging memory at both levels could cause a “double-paging” problem, where the virtual machine monitor and the guest OS both try to page the same memory in and out. Instead, a “balloon” program is inserted into the guest OS and inflated: the guest OS allocates memory to the balloon, the virtual machine monitor learns which pages the balloon holds, and it can reclaim those pages knowing that, because they are pinned in the guest, the guest OS will not try to swap them out and cause double paging.
• Mapping identical memory copies across virtual machines to the same physical memory locations, effectively compressing the total amount of memory needed while preserving the speed of access to the common memory for all machines.
• Developing a method for measurement of idle memory on a virtual machine and then employing reclamation techniques to reclaim and reallocate that memory where it may be better used.
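The page-sharing bullet above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not ESX Server's implementation: `PageSharer` and its methods are hypothetical names, a truncated SHA-1 stands in for whatever 64-bit hash the real system uses, and a write to any page simply allocates a fresh private frame. It shows the three steps the reviews describe: hash the page, confirm a hash match with a full content comparison, and break sharing with copy-on-write.

```python
import hashlib


class PageSharer:
    """Toy content-based page sharing across VMs (hypothetical names)."""

    def __init__(self):
        self.frames = {}    # frame id -> [content bytes, reference count]
        self.by_hash = {}   # 64-bit content hash -> frame id
        self.mapping = {}   # (vm, guest page number) -> frame id
        self.next_frame = 0

    @staticmethod
    def _hash(content):
        # Assumption: truncated SHA-1 as a stand-in for a 64-bit page hash.
        return hashlib.sha1(content).digest()[:8]

    def _alloc(self, content):
        frame = self.next_frame
        self.next_frame += 1
        self.frames[frame] = [content, 1]
        return frame

    def _release(self, frame):
        entry = self.frames[frame]
        entry[1] -= 1
        if entry[1] == 0:
            key = self._hash(entry[0])
            if self.by_hash.get(key) == frame:
                del self.by_hash[key]
            del self.frames[frame]

    def map_page(self, vm, ppn, content):
        """Map a not-yet-mapped guest page, sharing a frame if contents match."""
        key = self._hash(content)
        frame = self.by_hash.get(key)
        # A hash match is only a hint; a full comparison confirms it.
        if frame is not None and self.frames[frame][0] == content:
            self.frames[frame][1] += 1
        else:
            frame = self._alloc(content)
            self.by_hash.setdefault(key, frame)
        self.mapping[(vm, ppn)] = frame

    def write(self, vm, ppn, content):
        """Copy-on-write: the writer gets a private frame, other sharers keep theirs."""
        self._release(self.mapping[(vm, ppn)])
        self.mapping[(vm, ppn)] = self._alloc(content)
```

Mapping the same zero-filled page into two VMs uses one frame; when one VM writes to it, that VM gets its own frame and the other VM's copy is untouched, which is how isolation is preserved.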
Flaws
• Failing to address the issue that modern operating systems tend to use up all available memory as a cache (something that is becoming more aggressive in a recent operating system such as Microsoft Windows Vista, which actually fills the memory purposefully by prefetching based off older usage patterns). The described techniques will necessarily have a performance cost even if all the memory management techniques which the VMware ESX Server employed were exceedingly efficient.
Performance Techniques
To achieve performance, the designers of the VMware ESX server memory management software have implemented methods to allow memory to be dynamically reallocated from idle virtual operating systems to more heavily loaded ones (or from ones that have a lesser service level agreement to ones that have a higher one).
The system of virtualization presented here, with its attention to sharing memory among virtual operating systems, sacrifices performance to gain efficiency through consolidation of hardware. This technique can be applied in other areas, such as creating larger buffers at network interconnect points to allow for a slower network to be used to support additional network traffic at the expense of spending more time transmitting network traffic.
Posted by: Mark Sieklucki | October 15, 2008 10:01 PM
Summary
This paper discusses various memory resource management mechanisms between virtual machines in VMware ESX Server.
Problem
For cost reasons or otherwise, we wish to run multiple virtual machines on one physical machine. We wish to do so in such a way that each virtual machine has the illusion of having a certain amount of hardware memory, and the sum of these amounts across all virtual machines is greater than the actual amount of hardware memory. Furthermore, we need to be able to do this without modifying the guest operating systems.
Techniques/Contributions
Specifically, the problem constraints raise three questions. They are:
(i) How do we reclaim pages used by one VM for another VM?
(ii) Can we reduce memory usage by sharing memory for redundant pages?
(iii) Can we make performance guarantees and ensure performance isolation while efficiently utilizing memory?
(i) A simple solution is to use a paging mechanism. Indeed, ESX Server requires the allocation of disk swap space for the maximum number of pages that might need to be swapped to disk at any one time. However, the paging mechanism is used as a fallback. ESX Server first attempts to use a “ballooning” technique. A “balloon” module is loaded in the guest OS which attempts to allocate pinned pages of memory; this information is then relayed to ESX Server, allowing ESX Server to reclaim the memory for use by another VM.
(ii) Yes, obviously, but it can be difficult to do efficiently (e.g. we could do pair-wise comparisons between the contents of pages, but it would be slow). ESX Server randomly scans pages and computes a 64-bit hash value for each. When two hashes match it is likely but not guaranteed that the two pages are the same, so a full comparison is done only then. The results show that this reduces memory usage. It is even possible to share pages within a single virtual machine, although more sharing results with multiple virtual machines.
(iii) There is an inherent tradeoff between performance isolation and efficient use of memory. We can guarantee isolation by guaranteeing a VM a certain “share” of memory, which is essentially a weight. We can allow idle pages to be reclaimed if we take away memory from VMs that have idle pages, but then we lose isolation. The authors propose a model based on economics that attempts to compromise between the two. Under ideal conditions, where all memory is being utilized, allocation is based on shares. The ratio “shares per page allocated” is viewed as a price. If memory is in high demand, pages are reallocated to those willing to pay a higher price. However, if there are idle pages an “idle memory tax” is imposed that decreases a VM’s effective number of shares. This raises the additional question of how to measure what is idle memory. The authors propose a sampling approach to give reasonable results with minimal performance overhead.
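The pricing idea in (iii) can be written down as a short calculation. The sketch below uses the adjusted shares-per-page ratio as best recalled from the paper, ρ = S / (P · (f + k · (1 − f))) with k = 1 / (1 − τ), where S is shares, P is allocated pages, f is the fraction of pages in active use, and τ is the tax rate. The function names, the tuple layout, and the 75% default tax rate are assumptions made for illustration; check them against the paper before relying on them.

```python
def adjusted_shares_per_page(shares, pages, active_fraction, tax_rate=0.75):
    """Shares-per-page 'price', with idle pages charged k times more than active ones.

    Assumption: tax_rate lies in [0, 1); 0.75 is an assumed default.
    """
    if not 0.0 <= tax_rate < 1.0:
        raise ValueError("tax_rate must be in [0, 1)")
    k = 1.0 / (1.0 - tax_rate)  # cost multiplier for an idle page
    cost = pages * (active_fraction + k * (1.0 - active_fraction))
    return shares / cost


def pick_victim(vms, tax_rate=0.75):
    """Return the VM paying the lowest price; memory is revoked from it first.

    `vms` maps a VM name to a (shares, pages, active_fraction) tuple.
    """
    return min(
        vms,
        key=lambda name: adjusted_shares_per_page(*vms[name], tax_rate=tax_rate),
    )
```

With two VMs holding equal shares and equal allocations, one fully active and one only 20% active, the mostly idle VM has the lower adjusted ratio and is chosen as the victim. With a tax rate of 0 the ratio reduces to plain shares per page, which is the pure share-based isolation case the review describes.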
Posted by: Sam Javner | October 15, 2008 09:54 PM
Introduction:
This paper describes the different memory management techniques used by the VMware ESX virtual machine monitor. The paper explains four different techniques which increase performance while maintaining isolation.
What were they trying to solve:
Virtual machines often have two conflicting objectives to meet: efficient use of the hardware while maintaining strict isolation between guests. This is especially true for memory, since the total amount of memory needed by all the guest instances often is greater than the total memory available. Also, in pure virtualization, since each guest is unaware of the VMM, resource management policies of the guest may conflict with those of the VMM, leading to performance problems (e.g. double paging).
Contributions:
A balloon driver placed in the guest OS acts as a way for the VMM to signal the guest OS whenever it wants to reclaim pages. This avoids problems like double paging, and prevents unnecessary paging-out of pages by both VMM and guest OS.
Pages which have the same content are not duplicated in hardware memory. This is very useful when there are multiple instances of the same OS running at the same time. Most of the read-only code sections can be shared between instances.
Mechanisms such as proportional shares and min and max sizes ensure that memory allocation is handled according to some desired resource management policy. The min-funding revocation algorithm uses the shares-per-page metric to decide which guest OS to target for revocation.
The idle memory tax penalizes guests who allocate memory and don't use it. Statistical sampling is used to estimate idleness. The sampling is robust enough to adapt quickly to changing memory needs.
Repeated I/O operations from "high" memory automatically cause the ESX to remap the physical page to some low-memory machine page.
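The statistical sampling used to estimate idleness can be illustrated as follows. This is a simplified sketch, not the ESX mechanism: the real system observes accesses by invalidating mappings for the sampled pages and counting the resulting faults, whereas here a caller-supplied `was_touched` predicate stands in for that. The function name, parameters, and the sample size of 100 are assumptions for illustration.

```python
import random


def estimate_active_fraction(num_pages, was_touched, sample_size=100, rng=None):
    """Estimate the fraction of a VM's pages in active use from a random sample.

    `was_touched(page)` is a hypothetical callback reporting whether the guest
    accessed that page during the sampling period.
    """
    if num_pages <= 0:
        raise ValueError("num_pages must be positive")
    rng = rng or random.Random()
    sample = rng.sample(range(num_pages), min(sample_size, num_pages))
    touched = sum(1 for page in sample if was_touched(page))
    return touched / len(sample)
```

The cost is proportional to the sample size rather than to the VM's memory size, which is why the overhead stays small; the price is sampling error, which shrinks as the sample grows.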
Flaws:
Balloons break the "pure virtualization" abstraction by having entities in the guest which know about the virtualization. It's not clear what would happen if there is a guest without a balloon driver available.
It's unclear which guest gets charged for the shared page.
Techniques used to improve performance
Breaking the pure virtualization abstraction for more communication.
Removing redundancy by sharing pages with same content.
Tracking I/O requests involving high memory and automatically remapping such addresses to low memory, thus avoiding costly copies.
Tradeoffs:
Isolation vs Overall performance is the central tradeoff here. Strict partitioning of memory ensures complete isolation but at the cost of performance.
Another part of the OS where the technique could be applied:
Similar techniques can be used for memory management between certain applications which manage their own memory, like databases, and the OS.
Posted by: priyananda | October 15, 2008 09:01 PM