Garbage Collection - How it works in JVM
In the previous blog post, I wrote about the Java architecture and the JVM components. In that post, I briefly covered the Garbage Collector section of the Execution Engine component of the Java Virtual Machine (JVM).
In this article, let's take a deep dive into the Garbage Collector (GC): how it works, the types of GC available in Java, and their advantages. We will also explore the newer experimental garbage collectors available in recent Java releases.
What is Garbage Collection?
Garbage Collection is the process of reclaiming unused runtime memory by destroying objects that are no longer in use.
Without such reclamation, the memory used by a program keeps growing until there is nothing left to allocate and the system runs out of memory. Such a system is said to suffer from memory leaks. We will look at the various out-of-memory scenarios as we go further.
In C and C++, the programmer is responsible for both creating and destroying objects, and releases memory manually with free() in C and the delete operator in C++. In Java, garbage collection happens automatically during the lifetime of the program. This removes the need to de-allocate memory by hand and therefore avoids many kinds of memory leaks. Java programs compile into bytecode that runs on the Java Virtual Machine (JVM), and garbage collection is the process by which the JVM performs automatic memory management for them.
The heap memory consists of two types of objects:
Live - These objects are in use and are referenced from somewhere.
Dead - These objects are no longer used or referenced from anywhere. These objects are identified by the garbage collector and are deleted to free up the memory.
Advantages of GC
It makes Java memory-efficient because the garbage collector removes unreferenced objects from heap memory.
It is done automatically by the garbage collector (a part of the JVM), so we don't need to make any extra effort.
Dereferencing an Object
There are various ways in which the references to an object can be released.
By setting the reference to null
By assigning the reference to another object
By using an anonymous object
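The three cases above can be sketched in a few lines of Java. This is only an illustration: the Employee class and the release() method are made-up names for this example, and nothing here forces a collection. It only shows when an object stops being reachable and so becomes eligible for GC.

```java
import java.util.ArrayList;
import java.util.List;

class DereferenceDemo {
    /** Hypothetical class used only for this illustration. */
    static final class Employee {
        final String name;
        Employee(String name) { this.name = name; }
    }

    /** Applies the three techniques and returns the references a, b and c as they end up. */
    static List<Employee> release() {
        // 1. Setting the reference to null: the "A" object is no longer reachable.
        Employee a = new Employee("A");
        a = null;

        // 2. Assigning the reference to another object: the "B" object is no longer reachable.
        Employee b = new Employee("B");
        Employee c = new Employee("C");
        b = c;

        // 3. Anonymous object: never stored in a variable, so it is
        //    unreachable as soon as the expression has been evaluated.
        new Employee("D");

        List<Employee> refs = new ArrayList<>();
        refs.add(a);
        refs.add(b);
        refs.add(c);
        return refs;
    }

    public static void main(String[] args) {
        List<Employee> refs = release();
        System.out.println(refs.get(0));                // prints "null"
        System.out.println(refs.get(1) == refs.get(2)); // prints "true"
    }
}
```

After release() returns, the objects named "A", "B" and "D" have no references left, so the collector is free to reclaim them whenever it next runs.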
How does GC work in Java?
Java GC is an automatic process. The programmer does not explicitly mark objects for deletion. The JVM takes care of garbage collection, and each JVM implementation can provide its own version of GC. Whatever the implementation, it has to work with the objects present in heap memory, identify (mark) the unreachable ones, and destroy them, usually with compaction.
Garbage collectors work on the concept of Garbage Collection Roots (GC Roots), such as local variables of running threads and static fields, to tell live objects from dead ones. The implementation involves three phases:
Mark objects as alive - GC identifies all the live objects in memory by traversing the object graph, starting from the GC roots. Every object that GC visits is marked as alive. Objects that are not reachable from the GC roots are candidates for collection.
Sweep dead objects - After marking, the memory space is occupied by both live and dead objects. The sweep phase releases the memory fragments that contain the dead objects.
Compact remaining objects - The dead objects removed in the sweep phase may not have been next to each other, so the memory space ends up fragmented. The memory is compacted so that the remaining (live) objects sit in one contiguous block at the start of the heap.
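To make the three phases concrete, here is a toy simulation in Java. It is a sketch under simplifying assumptions, not how any real JVM collector is implemented: the "heap" is just an array of slots, and the Obj class and all method names are invented for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class ToyMarkSweepCompact {
    /** A toy heap object: a name, outgoing references, and a mark bit. */
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    /** Mark: flag every object reachable from the GC roots. */
    static void mark(List<Obj> roots) {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.marked) continue;   // already visited, also handles cycles
            o.marked = true;
            pending.addAll(o.refs);
        }
    }

    /** Sweep: free every slot that holds an unmarked (dead) object. */
    static void sweep(Obj[] heap) {
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && !heap[i].marked) heap[i] = null;
        }
    }

    /** Compact: slide the survivors to the start of the heap and clear their mark bits. */
    static void compact(Obj[] heap) {
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            Obj o = heap[i];
            if (o == null) continue;
            heap[i] = null;
            o.marked = false;
            heap[next++] = o;
        }
    }

    /** Renders the heap slots, using "-" for a free slot. */
    static String layout(Obj[] heap) {
        StringBuilder sb = new StringBuilder();
        for (Obj o : heap) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(o == null ? "-" : o.name);
        }
        return sb.toString();
    }

    /** Builds a small object graph, runs one full cycle, and returns the final layout. */
    static String demo() {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d"), e = new Obj("e");
        a.refs.add(c);
        c.refs.add(e);
        b.refs.add(d);   // b and d reference each other,
        d.refs.add(b);   // but nothing reachable from a root points at them
        Obj[] heap = {a, b, c, d, e};
        mark(Arrays.asList(a));   // a is the only GC root
        sweep(heap);
        compact(heap);
        return layout(heap);
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints "a c e - -"
    }
}
```

Note that b and d are collected even though they still reference each other. Reachability from the roots is what counts, not whether an object has any incoming reference. A real collector would also have to update addresses when it moves objects; the toy version skips that because it relies on ordinary Java references.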
Java Memory Architecture (Heap Structure)
Java garbage collectors implement a generational garbage collection strategy that categorizes objects by age. The heap memory area in the JVM is divided into three sections: the Young Generation, the Old Generation, and the Permanent Generation.
Newly created objects start in the Young Generation, which is further subdivided into:
Eden Space - All new objects start here, and their initial memory is allocated here.
Survivor Space (FromSpace and ToSpace) - Objects are moved here from Eden after surviving one garbage collection cycle.
When objects are garbage collected in this generation, it is a Minor GC event. When the Eden space fills up with objects, a Minor GC is performed: all the dead objects are deleted and the live objects are moved to one of the survivor spaces (From or To). The size of the Young Generation is set using the -Xmn flag.
Eden has all objects (live and dead)
Minor GC occurs - all dead objects are removed from Eden. All live objects are moved to T1 (FromSpace). Eden and T2 are now empty.
New objects are created and added to Eden. Some objects in Eden and T1 become dead.
Minor GC occurs - all dead objects are removed from Eden and T1. All live objects are moved to T2 (ToSpace). Eden and T1 are now empty.
At any time, one of the two survivor spaces is empty.
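The back-and-forth between the two survivor spaces can be sketched as a small simulation. This is a deliberately simplified model with invented names (ToyYoungGeneration, Obj, minorGc): liveness is a plain boolean flag instead of real reachability, and promotion to the Old Generation is left out.

```java
import java.util.ArrayList;
import java.util.List;

class ToyYoungGeneration {
    /** A toy object; "live" stands in for "still reachable from a GC root". */
    static final class Obj {
        final String name;
        boolean live = true;
        int age;                       // number of Minor GCs survived
        Obj(String name) { this.name = name; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();   // survivor space that currently holds objects
    List<Obj> to = new ArrayList<>();     // survivor space that is currently empty

    /** New objects always start in Eden. */
    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o);
        return o;
    }

    /** Minor GC: copy live objects into the empty survivor space, then swap roles. */
    void minorGc() {
        copyLive(eden);
        copyLive(from);
        eden.clear();                  // dead objects are simply dropped
        from.clear();
        List<Obj> tmp = from;          // the space just filled becomes "from",
        from = to;                     // the space just emptied becomes "to"
        to = tmp;
    }

    private void copyLive(List<Obj> space) {
        for (Obj o : space) {
            if (o.live) {
                o.age++;
                to.add(o);
            }
        }
    }

    static List<String> names(List<Obj> space) {
        List<String> result = new ArrayList<>();
        for (Obj o : space) result.add(o.name);
        return result;
    }

    public static void main(String[] args) {
        ToyYoungGeneration young = new ToyYoungGeneration();
        Obj a = young.allocate("a");
        Obj b = young.allocate("b");
        young.allocate("c");
        b.live = false;
        young.minorGc();
        System.out.println(names(young.from));   // prints "[a, c]"

        young.allocate("d");
        Obj e = young.allocate("e");
        a.live = false;
        e.live = false;
        young.minorGc();
        System.out.println(names(young.from));   // prints "[d, c]"
        System.out.println(names(young.to));     // prints "[]"
    }
}
```

After every minorGc() call, Eden and one survivor space are empty, which matches the steps listed above.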
Objects that are long-lived are moved from the Young Generation to the Old Generation. This is also known as the Tenured Generation, and it contains objects that have remained in the survivor spaces for a long time. When objects are garbage collected in this generation, it is a Major GC event.
The size of the Old Generation is not set directly. The -Xms and -Xmx flags set the initial and maximum size of the whole heap, and the Old Generation gets what remains after the Young Generation is sized.
There are other important flags, such as -XX:InitialTenuringThreshold, -XX:MaxTenuringThreshold and -XX:TargetSurvivorRatio, which help tune how the tenured and survivor spaces are used.
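One way to check what the sizing flags actually did is to ask the running JVM. The sketch below uses only standard library calls; the class name HeapSettings and the toMiB helper are made up for this example. Treat the numbers as approximate: the value reported by maxMemory() is usually close to -Xmx but can be slightly lower with some collectors.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class HeapSettings {
    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Flags such as -Xms, -Xmx and -Xmn show up here if they were passed on the command line.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();

        System.out.println("JVM arguments       : " + jvmArgs);
        System.out.println("Max heap (MiB)      : " + toMiB(rt.maxMemory()));
        System.out.println("Committed heap (MiB): " + toMiB(rt.totalMemory()));
        System.out.println("Free committed (MiB): " + toMiB(rt.freeMemory()));
    }
}
```

Running it as, for example, java -Xms256m -Xmx1g -Xmn128m HeapSettings should list those three flags in the first line of output and report a maximum heap close to 1024 MiB.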
Permanent Generation (Permgen Space)
The 'Permgen' is used to store the following information: the Constant Pool (memory pool), Field & Method Data, and Code. Each holds what its name suggests.
The -XX:PermSize and -XX:MaxPermSize flags set the initial and maximum size of the Permanent Generation. Note that PermGen was removed in Java 8 and replaced by Metaspace, which is sized with -XX:MetaspaceSize and -XX:MaxMetaspaceSize instead.
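To see which memory areas a particular JVM really has, you can list its memory pools through the standard java.lang.management API. This is a small sketch (the class name MemoryPools is invented here). The pool names are JVM- and collector-specific, so the exact output is not shown: on a Java 7 HotSpot JVM you would expect a pool with "Perm Gen" in its name, while on Java 8 and later you would expect "Metaspace" instead.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

class MemoryPools {
    /** Returns one line per memory pool: its type (heap or non-heap) and its name. */
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            lines.add(pool.getType() + " : " + pool.getName());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describePools()) {
            System.out.println(line);
        }
    }
}
```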
Types of Garbage Collectors
1. Serial GC (-XX:+UseSerialGC):
GC on Young Generation and Old Generation
This is the simplest implementation of GC and is designed for small applications running in single-threaded environments. It does all its work on a single thread, using a copying collection for the young generation and a mark-sweep-compact cycle for the tenured generation. This is a good fit for client systems and for systems with a low memory footprint and a small CPU.
2. Parallel GC (-XX:+UseParallelGC):
GC on Young Generation and Old Generation
This was the default GC in the JVM up to Java 8 (G1 has been the default since Java 9) and is also known as the Throughput Collector. It uses N threads for garbage collection, which can be configured using -XX:ParallelGCThreads=N; by default, N is derived from the number of CPU cores. In its original form, it uses these N threads for GC in the Young Generation but only one thread in the Old Generation.
3. Parallel Old GC (-XX:+UseParallelOldGC):
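As a rough illustration of how the default thread count relates to the number of cores, here is a sketch of the heuristic HotSpot has historically used when -XX:ParallelGCThreads is not set: one thread per core up to 8 cores, and 5/8 of a thread for each core beyond that. Treat the formula as an assumption rather than a guarantee, since it can differ between JVM versions and inside containers. The class and method names are invented for this example.

```java
class ParallelGcThreads {
    /**
     * Estimates the default number of parallel GC threads for a given core count.
     * Assumed heuristic: all cores up to 8, then 5/8 of each additional core.
     */
    static int defaultGcThreads(int cpus) {
        return cpus <= 8 ? cpus : 8 + (cpus - 8) * 5 / 8;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("Available processors : " + cpus);
        System.out.println("Estimated GC threads : " + defaultGcThreads(cpus));
    }
}
```

Under this heuristic a 4-core machine gets 4 GC threads and a 16-core machine gets 13. To know for certain what your JVM chose, check the value it reports for ParallelGCThreads rather than relying on the estimate.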
GC on Young Generation and Old Generation
This has been the default version of Parallel GC since Java 7u4. It is the same as Parallel GC, except that it uses N threads for GC in both the Old and Young Generations.
4. Concurrent Mark and Sweep (CMS) GC (-XX:+UseConcMarkSweepGC):
GC on Old Generation
This is also known as the concurrent low-pause collector. Multiple threads are used for Minor GC, using the same algorithm as Parallel GC. Major GC is multi-threaded, like Parallel Old GC, but CMS runs concurrently alongside the application threads to minimize "stop the world" events. The CMS collector uses more CPU than other GCs. If you can spare more CPU in exchange for shorter pauses, the CMS garbage collector is a better choice than the parallel collector. CMS does not normally compact the Old Generation, so the heap can become fragmented over time. CMS was deprecated in Java 9 and removed in Java 14.
5. Garbage First (G1) GC (-XX:+UseG1GC):
GC on Young and Old Generation (By Dividing Heap into Equal Size Regions)
This is a parallel, concurrent, and incrementally compacting low-pause garbage collector. G1 was introduced in Java 7 with the long-term goal of replacing CMS GC, and it has been the default collector since Java 9. It divides the heap into multiple equal-sized regions and then performs GC, usually starting with the regions that have the least live data (and so the most garbage), hence "Garbage First".
Apart from the Eden, Survivor, and Old memory regions, there are two more types of regions present in the G1GC:
Humongous - used for large objects (larger than 50% of the size of a region)
Available - the unused or non-allocated space
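The humongous rule is simple enough to write down as a one-line check. The sketch below assumes the threshold is exactly half a region and that only objects strictly larger than it count as humongous; the class and method names are invented for this example. The region size itself depends on the heap size and can be set with -XX:G1HeapRegionSize.

```java
class HumongousCheck {
    /** True if an object of the given size would need humongous regions, assuming a half-region threshold. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    public static void main(String[] args) {
        long regionBytes = 1024L * 1024L;   // assume 1 MiB regions for this example
        System.out.println(isHumongous(600L * 1024L, regionBytes));   // prints "true"
        System.out.println(isHumongous(100L * 1024L, regionBytes));   // prints "false"
    }
}
```

So with 1 MiB regions a 600 KiB array is humongous, while the same array with 4 MiB regions is an ordinary allocation. This is why increasing the region size is a common remedy when an application allocates many large arrays.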
6. Epsilon GC (-XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC)
It is a do-nothing GC that was introduced as an experimental feature in Java 11. It handles memory allocation but does not implement any actual memory reclamation mechanism. Once the available Java heap is exhausted, the JVM shuts down. It can be used for applications whose memory footprint is known exactly.
7. Shenandoah (-XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC)
It is a new GC that was released as an experimental feature in JDK 12. Its advantage over G1 is that it does more of its garbage collection cycle work concurrently with the application threads. Shenandoah can compact live objects, clean garbage, and release RAM back to the OS almost immediately after detecting free memory.
8. ZGC (-XX:+UnlockExperimentalVMOptions -XX:+UseZGC)
ZGC is another GC that was released as an experimental feature in JDK 11 and has been improved in JDK 12. It is intended for applications that require low latency and use a very large heap. Its primary goals are low latency, scalability, and ease of use. To achieve this, ZGC does its expensive garbage collection work concurrently, so the Java application keeps running apart from very brief pauses.
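With this many collectors to choose from, it helps to confirm which one a JVM is actually running. The standard java.lang.management API exposes the active collectors and their counters. The sketch below uses an invented class name (ActiveCollectors), and the collector names it prints are JVM-specific, so no exact output is shown.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ActiveCollectors {
    /** Returns the names of the collectors registered in this JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() and getCollectionTime() return -1 if the value is undefined.
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", total time (ms)=" + gc.getCollectionTime());
        }
    }
}
```

Starting the same program with different flags from the list above (for example -XX:+UseSerialGC and then -XX:+UseG1GC) should print different collector names, which is a quick way to verify that a flag took effect.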
The major benefit of GC is that it keeps the code simple: programmers need not worry about proper memory assignment and release cycles. Once the code stops using an object, the memory it occupies is automatically reclaimed at some point. For scenarios in which the garbage collector is negatively impacting performance, Java offers many options for tuning the garbage collector to improve its efficiency.