Java Modularity Deep Dive

Background

While working on the Java runtime, our team’s main obstacle turned out to be the widely overlooked Java module system. Its introduction in Java 9 affected some of the most crucial areas: class loading, linking, and native reflection code. The most authoritative sources on this topic are the Java Language Specification and the Java Virtual Machine Specification.

Here I would like to highlight some non-obvious aspects of it for those who really want to look under the hood. I believe that few developers truly understand how it works, which makes it intriguing. The topic is complex, so if you find an error in the article or some wording is unclear, please let me know by email. I will be happy to receive your feedback and update the material.

Modularity

Modularity was introduced with Project Jigsaw to address issues of scalability, strong encapsulation, integrity, and so on. Java’s modularity relies on three key pillars: Module, ModuleLayer, and the omnipresent ClassLoader.

A module is a group of related packages and resources, a unit that enforces strong isolation between different parts of the JDK and/or the application. The relationships between modules are usually expressed declaratively in module-info.java files. This means that modules have to declare their relationships explicitly:

  • When they let other modules use their particular packages (other packages will be unavailable from other modules);
  • When they are dependent on another module;
  • When they allow another module to reflectively access their packages (There is a notable caveat related to backward compatibility in the earliest versions of modular Java);
  • When they provide or consume some services via ServiceLoader, etc.
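The four kinds of declarations above can be sketched with the java.lang.module.ModuleDescriptor builder, which produces the same data a compiled module-info.java carries. This is only an illustration: every module, package, and class name below (com.example.app, Greeter, and so on) is hypothetical, and the snippet builds a descriptor in memory without defining any real module.

```java
import java.lang.module.ModuleDescriptor;
import java.util.List;
import java.util.Set;

/*
 * Hypothetical module-info.java that this sketch mirrors:
 *
 *   module com.example.app {
 *       requires java.sql;
 *       exports com.example.app.api;
 *       opens com.example.app.model to com.example.framework;
 *       uses com.example.spi.Codec;
 *       provides com.example.app.api.Greeter with com.example.app.impl.DefaultGreeter;
 *   }
 */
final class DescriptorSketch {

    /** Builds the same declarations programmatically (all names are made up). */
    static ModuleDescriptor describeApp() {
        return ModuleDescriptor.newModule("com.example.app")
                .requires("java.sql")                       // dependency on another module
                .exports("com.example.app.api")             // package usable by other modules
                .opens("com.example.app.model",             // reflective access for one module
                       Set.of("com.example.framework"))
                .uses("com.example.spi.Codec")              // consumed service
                .provides("com.example.app.api.Greeter",    // provided service
                          List.of("com.example.app.impl.DefaultGreeter"))
                .build();
    }

    public static void main(String[] args) {
        ModuleDescriptor d = describeApp();
        System.out.println("module   " + d.name());
        d.requires().forEach(r -> System.out.println("requires " + r.name()));
        d.exports().forEach(e -> System.out.println("exports  " + e.source()));
        d.opens().forEach(o -> System.out.println("opens    " + o.source() + " to " + o.targets()));
        d.uses().forEach(u -> System.out.println("uses     " + u));
        d.provides().forEach(p -> System.out.println("provides " + p.service() + " with " + p.providers()));
    }
}
```

Note that the package of the provider class (com.example.app.impl here) becomes part of the module even though it is neither exported nor opened, which is exactly the kind of encapsulation the module system is after.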

The introduction of the module system had a heavy impact on class loading:

  • It should not be possible to define a class that is not supposed to be loaded from the current module;
  • It should not be possible to implement an interface or extend a class that is not accessible from the current module.

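Module relationships are usually two-sided (for example, Module A must read Module B, and Module B must export a package, before A can use that package). Module properties and relationships can be expressed in the following terms: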

  • A module can be named or unnamed; unnamed modules are usually application modules that don’t have any module-info.java declaration;
    • An unnamed module exports all packages to every module;
    • An unnamed module can read and use any module;
    • Each class loader has its own unnamed module;
  • Module A requires/reads Module B - meaning Module A can access packages in Module B that Module B wishes to expose to Module A; this is basically a dependency;
  • Module A can export packages to Module B - making public classes from the packages available to Module B at both compile time and run time;
  • Module A can open packages to Module B - making classes from the packages, including their non-public members, reflectively available to Module B at run time;
    • In JDK 9 through JDK 15 (so including JDK 11), by default (for backward compatibility’s sake), the packages that existed in JDK 8 are open for reflective access from code on the class path. The option --illegal-access=deny disables this behavior. JDK 16 made deny the default, and JDK 17 removed the relaxed mode completely;
  • Module A provides a service - registering an implementation of a service S, making it available via ServiceLoader;
  • Module B uses service S - meaning Module B needs the service S (usually an interface or an abstract class) and can obtain it via ServiceLoader (all rules above are still in place, e.g., if Module B can’t read Module A, it won’t be able to instantiate the provider and therefore won’t be able to obtain the service);
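Most of the properties in this list can be checked at run time through java.lang.Module. The sketch below is a minimal illustration under two assumptions: it is run from the class path (so the class itself sits in an unnamed module), and the JDK still has the private field String.value and the internal package jdk.internal.misc, as JDK 9 and later do. The class and method names are my own.

```java
import java.lang.reflect.Field;

final class ModuleQueries {

    static Module javaBase() {
        return Object.class.getModule();
    }

    static Module systemUnnamed() {
        return ClassLoader.getSystemClassLoader().getUnnamedModule();
    }

    /** True if java.base opens java.lang to the module of this class. */
    static boolean javaLangIsOpenToUs() {
        return javaBase().isOpen("java.lang", ModuleQueries.class.getModule());
    }

    /** Attempts deep reflection on a private field; returns false instead of throwing. */
    static boolean canReachStringInternals() {
        try {
            Field value = String.class.getDeclaredField("value");
            return value.trySetAccessible();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException("unexpected JDK layout", e);
        }
    }

    public static void main(String[] args) {
        Module base = javaBase();
        Module unnamed = systemUnnamed();

        System.out.println("java.base is named:            " + base.isNamed());
        System.out.println("unnamed module is named:       " + unnamed.isNamed());
        System.out.println("unnamed reads java.base:       " + unnamed.canRead(base));
        System.out.println("unnamed exports everything:    " + unnamed.isExported("any.package"));
        System.out.println("java.lang exported:            " + base.isExported("java.lang"));
        System.out.println("jdk.internal.misc exported:    " + base.isExported("jdk.internal.misc"));
        // The next two lines depend on the JDK version and on flags such as --add-opens.
        System.out.println("java.lang open to this class:  " + javaLangIsOpenToUs());
        System.out.println("String.value made accessible:  " + canReachStringInternals());
    }
}
```

The last two values should always agree with each other: for a private member, deep reflection succeeds only when the package is open to the caller's module. Whether they are both true or both false depends on the JDK version and on command-line options such as --add-opens.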

The module system boundaries are maintained both in the Java world and in the native code at runtime, e.g., during class definition or linking.

Modules are organized hierarchically in ModuleLayers. The boot layer is available via ModuleLayer.boot(). Most modules are defined in a layer (unnamed modules, for example, are not in any layer). Layers are interconnected, but a layer only knows about its parent layers, so when you enumerate a layer’s related layers via parents(), you walk up toward its ancestors, never down to its children.
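A small sketch of walking the layer hierarchy with the public API may help here. The helper names are mine, and the remark about the boot layer having the empty layer as its only parent reflects how current JDKs bootstrap, so treat it as an observation rather than a guarantee.

```java
import java.util.List;
import java.util.stream.Collectors;

final class LayerWalk {

    /** Sorted names of the modules defined directly in the given layer. */
    static List<String> moduleNames(ModuleLayer layer) {
        return layer.modules().stream()
                .map(Module::getName)
                .sorted()
                .collect(Collectors.toList());
    }

    /**
     * Counts ancestors by walking upward through parents(), once per path.
     * There is no corresponding API for walking down to child layers.
     */
    static int countAncestors(ModuleLayer layer) {
        int count = 0;
        for (ModuleLayer parent : layer.parents()) {
            count += 1 + countAncestors(parent);
        }
        return count;
    }

    public static void main(String[] args) {
        ModuleLayer boot = ModuleLayer.boot();
        System.out.println("modules in boot layer:   " + moduleNames(boot).size());
        System.out.println("ancestors of boot layer: " + countAncestors(boot));
        System.out.println("java.base in boot layer: " + (Object.class.getModule().getLayer() == boot));
        // Unnamed modules are not in any layer, so getLayer() returns null for them.
        System.out.println("layer of unnamed module: "
                + ClassLoader.getSystemClassLoader().getUnnamedModule().getLayer());
    }
}
```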

Run-time Package

The module a class belongs to is not explicitly defined by the developer. When client code asks the JVM to define a class, it doesn’t provide the runtime with the module of this class. The module is calculated by the JVM itself during the definition process using the so-called run-time package. Its value is based on the combination of the package name and the defining class loader.

Imagine you have loaded class C from module M with class loader L1. If you try to load the nested class C.N using a different class loader L2, and N tries to use classes from modules that are accessible only to M, class linking will fail. Why? Because N’s run-time package will be based on the correct package name but the wrong class loader, meaning N will land in L2’s unnamed module U. Although U can read any module, those modules never exported the required packages to U, so U’s hands are tied, and the linking process obediently throws an IllegalAccessError.
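The two ingredients of a run-time package can be observed from plain Java. The sketch below is a simplified illustration with names of my own choosing: it prints the (package name, defining loader) pair for a class, and shows that two distinct loaders never share an unnamed module, which is why a class defined by the "wrong" loader ends up in a different module. It does not reproduce the IllegalAccessError scenario itself.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;

final class RuntimePackageSketch {

    /** Renders the (package name, defining loader) pair behind a class's run-time package. */
    static String runtimePackageOf(Class<?> c) {
        ClassLoader loader = c.getClassLoader();
        String loaderName = (loader == null) ? "<boot>" : loader.getClass().getName();
        return c.getPackageName() + " @ " + loaderName;
    }

    /** Creates two empty loaders (stand-ins for L1 and L2) and compares their unnamed modules. */
    static boolean freshLoadersShareUnnamedModule() {
        try (URLClassLoader l1 = new URLClassLoader(new URL[0]);
             URLClassLoader l2 = new URLClassLoader(new URL[0])) {
            return l1.getUnnamedModule() == l2.getUnnamedModule();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runtimePackageOf(String.class)
                + " -> module " + String.class.getModule().getName());
        System.out.println(runtimePackageOf(RuntimePackageSketch.class)
                + " -> module " + RuntimePackageSketch.class.getModule());
        System.out.println("L1 and L2 share an unnamed module: " + freshLoadersShareUnnamedModule());
    }
}
```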

Module Loader Map

Another interesting aspect is the fact that the modules are defined in groups:

  • Boot modules should be loaded by the boot loader (null class loader),
  • Platform modules - by PlatformClassLoader,
  • Application modules - by AppClassLoader.

There is a wonderful class ModuleLoaderMap (jdk.internal.module.ModuleLoaderMap) that knows which class loader each module should be defined with. Interestingly, the class is mostly generated at build time, so the complete version is not what you will find on GitHub; to take a peek, open it in an IDE that can decompile Java code or use the javap utility. ModuleLoaderMap knows which modules are considered boot modules and which are considered platform modules. Everything else is considered an application module.

  • Boot modules typically include core JDK components such as java.base (which contains packages like java.lang and java.net) and java.logging;
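  • Platform modules contain something that might be required by some applications, for example, java.sql; note that, per the java.lang.instrument documentation, the agent class of a Java agent is loaded by the system class loader rather than the platform class loader;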
  • Application modules are basically the client code; the program class containing the main method belongs to the application module;
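ModuleLoaderMap itself is internal, but the resulting grouping is visible through public API. The following sketch (the class and method names are mine) classifies a module by the class loader it was defined with. It assumes the default system class loader is in use; java.sql may be absent from a trimmed run-time image, in which case the helper reports "absent".

```java
final class LoaderGroups {

    /** Maps a class loader to one of the three built-in groups, or "custom". */
    static String groupOf(ClassLoader loader) {
        if (loader == null) {
            return "boot";                       // the boot loader is represented as null
        }
        if (loader == ClassLoader.getPlatformClassLoader()) {
            return "platform";
        }
        if (loader == ClassLoader.getSystemClassLoader()) {
            return "application";
        }
        return "custom";
    }

    /** Looks the module up in the boot layer and reports its loader group. */
    static String groupOfModule(String moduleName) {
        return ModuleLayer.boot().findModule(moduleName)
                .map(m -> groupOf(m.getClassLoader()))
                .orElse("absent");
    }

    public static void main(String[] args) {
        System.out.println("java.base    -> " + groupOfModule("java.base"));
        System.out.println("java.logging -> " + groupOfModule("java.logging"));
        System.out.println("java.sql     -> " + groupOfModule("java.sql"));
        System.out.println("this class   -> " + groupOf(LoaderGroups.class.getClassLoader()));
    }
}
```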

When a module is defined by a class loader, the following holds:

  • Any class in one of the module’s packages that is defined by this class loader is treated as belonging to this module;
  • The service providers declared in the module descriptor are registered in the ServicesCatalog associated with this class loader.

Technical details: During module definition in native code, the JVM creates a module entry in the module table of the class loader data. Then it obtains the package table from the class loader data and creates the package entries there (see modules.cpp). Because each class loader has its own package table, within the scope of one class loader a package maps to exactly one module. The lookup goes through the class loader, which is always possible because a module knows which class loader it was defined with. If a package has no entry in the class loader data’s package table, its classes belong to the class loader’s unnamed module.
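The Java-level counterpart of this step can be sketched with the ModuleLayer API. The code below does not touch the native tables in modules.cpp; it only shows that defining a module with a class loader binds the module's packages to that loader. The module demo.mod and its package demo.pkg are hypothetical and contain no classes, and the in-memory ModuleFinder is a stand-in I wrote for illustration.

```java
import java.lang.module.Configuration;
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReader;
import java.lang.module.ModuleReference;
import java.net.URI;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Stream;

final class DefineModuleSketch {

    /** A finder that serves one synthetic module without any classes or resources. */
    static ModuleFinder syntheticFinder(String moduleName, String packageName) {
        ModuleDescriptor descriptor = ModuleDescriptor.newModule(moduleName)
                .packages(Set.of(packageName))
                .build();
        ModuleReference ref = new ModuleReference(descriptor, null) {
            @Override public ModuleReader open() {
                return new ModuleReader() {
                    @Override public Optional<URI> find(String name) { return Optional.empty(); }
                    @Override public Stream<String> list() { return Stream.empty(); }
                    @Override public void close() { }
                };
            }
        };
        return new ModuleFinder() {
            @Override public Optional<ModuleReference> find(String name) {
                return name.equals(moduleName) ? Optional.of(ref) : Optional.empty();
            }
            @Override public Set<ModuleReference> findAll() { return Set.of(ref); }
        };
    }

    /** Resolves the synthetic module against the boot layer and defines it with one new loader. */
    static Module define(String moduleName, String packageName) {
        ModuleLayer boot = ModuleLayer.boot();
        Configuration cf = boot.configuration().resolve(
                syntheticFinder(moduleName, packageName), ModuleFinder.of(), Set.of(moduleName));
        ModuleLayer layer = boot.defineModulesWithOneLoader(cf, ClassLoader.getSystemClassLoader());
        return layer.findModule(moduleName).orElseThrow(IllegalStateException::new);
    }

    public static void main(String[] args) {
        Module m = define("demo.mod", "demo.pkg");
        System.out.println("module:         " + m.getName());
        System.out.println("packages:       " + m.getPackages());
        System.out.println("defined with:   " + m.getClassLoader());
        // Anything this loader defines outside demo.pkg would fall into its unnamed module.
        System.out.println("unnamed module: " + m.getClassLoader().getUnnamedModule());
    }
}
```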

Conclusion

Understanding the complexities of Java’s module system is essential for developers working with the JVM, especially in large, modularized applications. The interaction between modules, class loading, reflection, and service management can be intricate. A deep understanding of these concepts will help you upgrade your services to newer Java versions, optimize performance, and troubleshoot problems.