.NET Framework For Java Programmers
Objective
After reading this article, Java programmers should be able to decipher and de-jargonize the .NET architecture and relate it to the proposed ECMA standard.
Target Audience
Java programmers and system architects.
Summary
This article outlines Microsoft's proposed standardization of the .NET framework in the ECMA forum as the CLI (Common Language Infrastructure); Microsoft's documentation refers to it as the CLR (Common Language Runtime). The CLR and the JVM are compared with respect to the market forces that shaped the CLR definition. The components of the CLR are then examined, followed by details of Microsoft's implementation of the CLR as the .NET framework.
Throughout, the .NET framework is compared with the Java architecture.
The material is derived from the author's own experience with Java since early 1996, Microsoft's MSDN site and standards documents from sites like ECMA and W3C.org.
Overview
.NET framework is Microsoft's answer to the Java community's objections to the "Windowization" of Java. Microsoft introduces a new language, C#, designed by the Visual J++ team. But in the process it has done away with DCOM and has also changed its flagship language, Visual Basic.
In a nutshell, .NET presently consists of three compiled languages (C#, VB.NET and C++), a Java-like runtime virtual machine environment, and five execution containers hosting this runtime, namely: ASP.NET, the Windows Shell, the VBA scripting host for the Office suite, the Windows Forms Designer and IE (Internet Explorer). Much like Java, it contains a rich set of APIs and libraries.
Enhancements over the Java framework include the use of SOAP (Simple Object Access Protocol) for remoting, and versioning and security scoping using the concept of the Application Assembly (described later). A Common Type System is introduced to make mixed-language programming easier. For example, a VB component can inherit from a C# class.
In the longer term Java and .NET will converge, and therefore an overview of the new framework is presented here from a Java programmer's perspective.
Comparing CLR with JVM
The .NET framework's Common Language Runtime (CLR) is very similar to the Java Virtual Machine (JVM) in terms of garbage collection, security and just-in-time (JIT) compilation.
However, the fundamental difference arises from the difference in perception between Sun's Java design team, headed by James Gosling, and Microsoft's C# designers, spearheaded by Anders Hejlsberg.
Sun viewed the Internet as a heterogeneous network consisting of multiple operating systems. Thus Sun had to design the GUI as the lowest common denominator, supportable by all such platforms. This was also the major reason for Java's failure in client side applications. Java has been successful only on the server side, where there is no great need for a GUI. Having failed in the client side desktop application arena, Sun is now targeting Java at the server side applications market, which is dominated by Unix and Linux flavors with approximately 60% of the server market; the remaining 40% rests with Windows NT.
But this view did not suit Microsoft, which holds about 90% of the client side desktop market. Microsoft wanted to provide a Windows-centric Internet development platform. Thus it added a few Windows-specific features to its Java implementation, similar to what it had done with its C++ implementation. This, along with Microsoft's refusal to support Java RMI, which competed with its own floundering remoting technology called DCOM, resulted in a lawsuit. The lawsuit ended in a settlement in January 2001, under which Microsoft paid Sun USD 20 million. This antagonism made Microsoft break away from Java and float its own language called C#.
The C# team was carved out of Microsoft's J++ team, and its effort finally led to the creation of the .NET framework.
Microsoft intends to leverage its desktop leadership to shape Internet application development by introducing the .NET framework. Thus the framework's supported languages map the Windows GUI closely, much like C++ MFC and J++ WFC (Windows Foundation Classes). In spite of the claims of a platform-independent design, all three supported languages produce Windows .exe files by default.
Microsoft played the standardization game better than Sun. Although it is a US-based company, Microsoft proposed C# and the Common Language Infrastructure (CLI), the backbone of the .NET framework, for standardization by the ECMA (European Computer Manufacturers Association) TC39 Technical Committee in October 2000. Ironically, Sun is also a member of this standing committee, which looks after standardization issues related to computer languages. See http://www.ecma.ch.
Microsoft, together with other companies, has also submitted the Simple Object Access Protocol (SOAP) to the W3C (http://www.w3c.org). SOAP is an XML and HTTP based remote object access protocol. SOAP competes with Java's RMI and Microsoft's own DCOM. RMI has the limitation of being Java specific, and DCOM had limited acceptance outside the Windows community, despite Microsoft's best efforts to port DCOM to Unix platforms. CORBA, another remoting contender, which even has an Internet-specific transport named IIOP, is more or less dead due to poor interoperability between vendors' implementations.
Because SOAP uses HTTP as its transport, it passes easily through firewalls and can therefore span both LANs and the Internet. However, because SOAP is XML based, it burdens both client and server with XML parsing, which is relatively CPU intensive compared with binary protocols like RMI and DCOM.
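To make the XML-over-HTTP trade-off concrete, here is a minimal Java sketch that builds a SOAP 1.1 style request envelope as plain text. The service namespace urn:example:quotes, the GetQuote operation and the symbol parameter are hypothetical; only the envelope namespace is the standard SOAP 1.1 one. A real client would use a SOAP toolkit (such as the Apache implementation mentioned in this article) rather than string concatenation.

```java
// Minimal sketch: builds a SOAP 1.1 request envelope as text.
// The namespace "urn:example:quotes" and the GetQuote operation are hypothetical.
class SoapEnvelopeSketch {
    static final String SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    // Escapes the five XML special characters so parameter values cannot break the markup.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Wraps a single-parameter method call in a SOAP Envelope and Body.
    static String buildRequest(String serviceNs, String method, String param, String value) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<soap:Envelope xmlns:soap=\"" + SOAP_ENV_NS + "\">\n"
             + "  <soap:Body>\n"
             + "    <m:" + method + " xmlns:m=\"" + serviceNs + "\">\n"
             + "      <" + param + ">" + escapeXml(value) + "</" + param + ">\n"
             + "    </m:" + method + ">\n"
             + "  </soap:Body>\n"
             + "</soap:Envelope>\n";
    }

    public static void main(String[] args) {
        String xml = buildRequest("urn:example:quotes", "GetQuote", "symbol", "SUNW");
        System.out.println(xml);
        // The whole call is text: compare its size with the 4-character argument it carries.
        System.out.println("Envelope size: " + xml.length() + " characters");
    }
}
```

Over the HTTP binding this text would simply be the body of a POST request, which is why it travels through firewalls that admit web traffic, and also why the server has to parse several hundred characters of markup to recover a four-character argument.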
The Java platform views the Internet world as one language running on different operating systems (OS), whereas the .NET framework views the world as one OS on which programmers have a choice of multiple languages. Therefore the Java platform bridges multiple operating systems, and the .NET framework bridges multiple languages.
As the above discussion suggests, market forces rather than technical design considerations are largely responsible for the state of the art.
Inside The Common Language Runtime
The Common Language Runtime (CLR) is the runtime environment of the .NET framework; it manages the execution of code and provides services to it.
The CLR has also been proposed as an ECMA standard. However, the ECMA documents refer to the CLR as the Common Language Infrastructure (CLI). It has five components, namely:
- CTS – Common Type System
- CLS – Common Language Specification
- CIL – Common Intermediate Language
- JIT - Just in Time Compiler
- VES – Virtual Execution System
CLI – Common Language Infrastructure
The Common Language Infrastructure (CLI) provides a language-neutral platform for application development and deployment. The CLI supports the Object Oriented Paradigm (OOP) and also provides hooks for modeling procedural and structured languages.
The CLI provides languages with a framework for security, garbage collection and exception handling, and also provides a platform for language interoperability. For example, C# classes can inherit from C++ classes, and VB procedures can use C# components.
Please note that Microsoft's documentation refers to the CLI as the CLR (Common Language Runtime).
After reading through the ECMA standard documents, you will probably develop the feeling, as I did, that the CLI is an attempt to standardize a next-generation Java framework that accommodates older, pre-Internet-era languages like VB and C++.
The five components of the CLI are briefly described below.
CTS – Common Type System
The Common Type System supports both object-oriented languages like Java and procedural languages like C. It deals with two kinds of entities: Objects and Values. Values are the familiar atomic types like integers and chars. Objects are self-describing entities containing both methods and variables.
Objects and Values can be categorized into the following hierarchy:
Types can be of two kinds: Value Types and Reference Types. Value Types can be further categorized into built-in types (for example integer and floating-point types) and user-defined types like Enums.
Reference Types can be divided into three subcategories: self-describing reference types, pointers and interfaces. Pointers can be further subdivided into function pointers, managed pointers and unmanaged pointers.
Value Types can be converted into Reference Types; this conversion is called boxing. Extracting the value from a boxed Value Type is called unboxing.
Casting rules from one type to another, for example the conversion of a char to an integer type, are also defined within the Common Type System.
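Java programmers already know boxing from the wrapper classes. The following sketch is only a Java analogy, not CTS code: it boxes an int into an Object reference, unboxes it with a cast, and performs a char-to-int widening conversion of the kind the CTS casting rules define. (In C# the equivalent of the first two steps is object o = 42; int i = (int) o;.)

```java
// Java analogy for CTS boxing/unboxing and a char-to-int conversion.
class BoxingSketch {
    // Boxing: the primitive value is copied into a heap-allocated Integer object.
    static Object box(int value) {
        return value; // autoboxing, equivalent to Integer.valueOf(value)
    }

    // Unboxing: cast the reference back to its wrapper type and extract the value.
    static int unbox(Object boxed) {
        return (Integer) boxed; // throws ClassCastException if boxed is not an Integer
    }

    // Widening conversion: a char is a 16-bit code unit, so it converts to int losslessly.
    static int charToInt(char c) {
        return c;
    }

    public static void main(String[] args) {
        Object o = box(42);
        System.out.println(o.getClass().getName()); // java.lang.Integer
        System.out.println(unbox(o) + 1);           // 43
        System.out.println(charToInt('A'));         // 65
    }
}
```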
The Common Type System also defines scopes and assemblies. An assembly is a configured set of loadable code modules and other resources that together implement a unit of functionality. A scope is a collection of grouped names of different kinds of value or reference types.
CLS – Common Language Specification
The Common Language Specification (CLS) aids mixed-language programming. It defines a subset of the Common Type System to which all class library providers and language designers targeting the CLR must adhere.
Because the CLS is a subset of the CTS, a component written in one language (say C#) that is to be used from another language (say VB.NET) must expose only the types and constructs defined by the CLS.
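As an illustration of the "subset" idea, here is a hypothetical Java helper that checks whether an exported method signature uses only CLS-compliant built-in types. The excluded types reflect the well-known CLS rule that the unsigned integer types and the signed byte (SByte) are outside the CLS, because not every .NET language supports them. The class and method names are invented for this sketch, and a real compiler checks many more rules (for example, that public identifiers differ by more than case).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: flags parameter or return types that fall outside the CLS.
// Only the built-in System types that the CLS excludes are checked here.
class ClsCheckSketch {
    static final Set<String> NON_CLS_TYPES =
        Set.of("System.SByte", "System.UInt16", "System.UInt32", "System.UInt64", "System.UIntPtr");

    // Returns the offending type names, or an empty list if the signature is CLS-compliant.
    static List<String> nonCompliantTypes(String returnType, String... paramTypes) {
        List<String> offenders = new ArrayList<>();
        if (NON_CLS_TYPES.contains(returnType)) {
            offenders.add(returnType);
        }
        for (String p : paramTypes) {
            if (NON_CLS_TYPES.contains(p)) {
                offenders.add(p);
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        // A C# method "public uint Count(int start)" would be flagged because of its uint return type.
        System.out.println(nonCompliantTypes("System.UInt32", "System.Int32")); // [System.UInt32]
        System.out.println(nonCompliantTypes("System.Int64", "System.String")); // []
    }
}
```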
CIL – Common Intermediate Language
All compilers targeting the CLI must generate an intermediate language representation called Common Intermediate Language (CIL). The CLI uses this intermediate language either to generate native code or to compile it Just In Time (JIT) and execute it on the fly.
Microsoft's documents refer to this standard's implementation as MSIL (Microsoft Intermediate Language).
JIT - Just in Time Compiler
The JIT, or Just in Time compiler, is the part of the runtime execution environment that converts the intermediate language contained in executable files, called assemblies, into native executable code.
The security policy settings are consulted at this stage to decide whether the code being compiled must be type safe. If type safety is required and the code cannot be verified as type safe, an exception is thrown and the JIT process is aborted.
VES – Virtual Execution System
The Virtual Execution System (VES) is more or less equivalent to the JVM (Java Virtual Machine).
The VES loads, links and runs programs written for the Common Language Infrastructure that are contained in Portable Executable (PE) files.
The VES fulfills its loader function by using information contained in the metadata, and uses late binding (or linking) to integrate modules compiled separately, which may even be written in different languages.
The VES also provides services during code execution, including automatic memory management, profiling and debugging support, security sandboxes, and interoperability with unmanaged code such as COM components.
Managed code is Intermediate Language (IL) code, along with metadata, contained in Portable Executable (PE) files, which may be .EXE or .DLL files. It needs a Just in Time (JIT) compiler to convert it into native executable code. There is also a provision for precompiled executables, which are called unmanaged code. The advantage of unmanaged code is that it does not need JIT compilation, but it has the disadvantage of not being portable across different operating system (OS) platforms.
Microsoft's Implementation of CLI is CLR
Microsoft's implementation and adaptation of the above standard has resulted in differences in terminology; for example, Common Intermediate Language (CIL) is called Microsoft Intermediate Language (MSIL), and the Common Language Infrastructure (CLI) is referred to as the Common Language Runtime (CLR).
These changes in naming convention, I believe, are meant to create a brand distinction while implementing the standards. This was probably intended to avoid the kind of clash that occurred between Java the language standard, Java the island, Java the coffee and Java the Sun trademark! But in the long run it will only lengthen the already long list of confusing acronyms and jargon in the programmer's dictionary.
We use CLI and CLR interchangeably; however, it is more correct to say that the CLR is Microsoft's implementation of the CLI.
Apart from scripting languages like JavaScript and VBScript, the .NET framework presently supports three compiled languages, namely VB.NET, VC++ and C# (pronounced C Sharp). These language compilers target this runtime. The output of a type-verifiable compiler is called managed code.
Compilers can also generate unsafe code, which is called unmanaged code. Garbage collection is handled only for managed code.
Managed code has access to Common Language Runtime (CLR) features such as multi-language integration, exception handling across language boundaries, security and versioning, and simplified deployment.
An interesting facility being tried out by Microsoft is cross-language inheritance. For example, a C# class can inherit from a VB class! Each of these features will be discussed in detail later.
The CLR provides services to managed code. The language compilers emit metadata that describes the types, members and references in the code. Metadata is stored along with the code: every loadable Common Language Runtime image contains metadata.
The metadata helps the CLR to locate and load classes, lay out instances in memory, resolve method invocations, generate native code, enforce security and set up run-time context boundaries.
The CLR, much like the Java Virtual Machine (JVM), provides automatic garbage collection for managed code; memory handled this way is called managed data. But unlike the Java VM, the CLR also has a mechanism to syntactically switch off automatic garbage collection, called unmanaged data, where the programmer is responsible for releasing memory.
The CLR has been designed to facilitate cross-language integration. Two kinds of integration are possible: tightly coupled and loosely coupled, which is also called remoting. A tightly coupled inter-language method call is achieved within the CLR; this assumes that the two languages calling each other are both .NET framework compliant, like VC++, VB.NET or C#, or are at least COM compliant. Thus C# programs can talk to Java programs through the ActiveX JavaBeans bridge! This assumes that both the C# and the Java code reside on a single computer.
Remoting, or loosely coupled inter-language interaction, is suitable when the two interacting programs, written in different languages, are on different operating system (OS) platforms, like a C# client residing on Windows CE talking to server side Java code on Solaris. This integration is achieved through an XML based protocol called Simple Object Access Protocol (SOAP), which was proposed by Microsoft and has been taken up by the W3C consortium (http://www.w3c.org). An open source Java implementation of a SOAP gateway is available from Apache.org at http://xml.apache.org. SOAP is independent of the transport layer and uses XML formatted content; HTTP and SMTP transport implementations are currently available from Microsoft and Apache.org for the .NET framework and the Java platform respectively.
All .NET framework components carry information about the components and resources they use, in the form of metadata stored with their code. The runtime uses this information to dynamically link the components, ensuring version integrity and security controls. In theory this makes applications more resilient to version changes. Only time will tell if this innovation is successfully implemented.
Another good feature introduced in this new framework is reduced dependency on the Windows system registry. Registration information and state data are no longer stored in the system registry, but inside the metadata. This should make server side component deployment much easier.
The .NET framework's Common Language Runtime (CLR) claims the ability to compile once and run on any CPU and operating system that supports the runtime. We will see whether this becomes a real possibility in the near future.
Common Intermediate Language (CIL)
The .NET framework's implementation of the Common Intermediate Language (CIL) is called Microsoft Intermediate Language (MSIL). Unless specified otherwise, we will use the terms Intermediate Language (IL), MSIL and CIL interchangeably.
Managed code is produced by one of the three compilers, which translate the source code into Microsoft Intermediate Language (MSIL).
Common Intermediate Language (CIL), and therefore its Microsoft rendering, Microsoft Intermediate Language (MSIL), is said to be a CPU-independent set of instructions that can be efficiently converted to native code.
The MSIL instruction set has instructions for loading, storing and initializing values and for calling methods on objects, as well as many conventional instructions for arithmetic and logical operations, control flow, direct memory access and exception handling. All three languages included in this framework have a Java-like "try catch" exception handling facility.
Just like Java, before managed code is executed, the intermediate language is converted to CPU-specific code by a just in time (JIT) compiler. The runtime supplies one or more JIT compilers for each computer architecture it supports. However, the code can also be compiled into native form during installation.
When a Common Language Specification (CLS) compliant compiler produces Common Intermediate Language (CIL), it also produces metadata describing the Common Type System (CTS) types used in the code, including the definition of each type, the signatures of each type's members, the members that the code references, and other data that the runtime uses at execution time.
The MSIL and metadata are contained in a portable executable (PE) file, which is an extension of the Microsoft Portable Executable (PE) format and the Unix world's Common Object File Format (COFF) used for executable content. These files appear to the user as the familiar .EXE and .DLL files.
One of the fundamental differences between the Java Virtual Machine (JVM) instruction set and Common Intermediate Language (CIL) is that the JVM uses a big endian (most significant byte first) and CIL a little endian (least significant byte first) binary representation. This difference will not be apparent to most programmers; only system level programmers will have to deal with it.
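The byte-order difference is easy to see with java.nio.ByteBuffer, which can write the same int in either order. This only illustrates the two encodings; it does not parse real class files or PE files.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Shows how the 32-bit value 0x0A0B0C0D is laid out in big-endian order (as in Java
// class files) versus little-endian order (as in CIL/PE files).
class EndianSketch {
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        int value = 0x0A0B0C0D;
        System.out.println("big-endian:    " + hex(encode(value, ByteOrder.BIG_ENDIAN)));    // 0A 0B 0C 0D
        System.out.println("little-endian: " + hex(encode(value, ByteOrder.LITTLE_ENDIAN))); // 0D 0C 0B 0A
    }
}
```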
The file format can accommodate either Common Intermediate Language or native code, as well as metadata; a signature pattern enables the operating system to recognize Common Language Runtime images.
The presence of metadata in the executable file makes the components self-describing. This eliminates the need for the additional type libraries or Interface Definition Language (IDL) files used in DCOM and CORBA. The runtime locates and extracts the metadata from the file as necessary during execution.
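Java programmers already rely on the same idea: a class file carries enough metadata for the JVM, and for your own code via reflection, to discover types and members without a separate IDL file. The sketch below uses standard Java reflection purely as an analogy; in .NET the corresponding facilities live in the System.Reflection namespace, which this code does not model.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Java analogy: read a type's self-describing metadata at run time, with no IDL file.
class MetadataSketch {
    // Lists the public methods declared directly on the named class, e.g. "void run(0 params)".
    static List<String> publicMethods(String className) {
        Class<?> type;
        try {
            type = Class.forName(className); // the runtime locates and loads the type by name
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("No such class: " + className, e);
        }
        List<String> result = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (Modifier.isPublic(m.getModifiers())) {
                result.add(m.getReturnType().getSimpleName() + " " + m.getName()
                        + "(" + m.getParameterCount() + " params)");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // getDeclaredMethods does not guarantee any order, so output order may vary for larger classes.
        for (String signature : publicMethods("java.lang.Runnable")) {
            System.out.println(signature);
        }
    }
}
```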
Managed Execution
There are now two kinds of code that can exist inside executable files: the old machine dependent code, like existing ActiveX controls, is called unmanaged code, while the new Intermediate Language code is called managed code.
As mentioned earlier, there are currently three compiled languages, C#, C++ and VB, provided by Microsoft that target the Common Language Runtime (CLR). This runtime is a multi-language execution environment and supports a common base of data types and language features. However, the language compiler determines what subset of the runtime's functionality is available, and the design of the code is influenced by the features exposed by the compiler.
The coding syntax is determined by the compiler, not by the runtime. If a component is required to be completely usable by components written in other languages, its exported types must use only language features that are included in the Common Language Specification (CLS).
Application Domains
Application domains are lightweight processes. They can be visualized as an extension of Java's sandbox security and thread model.
The Common Language Runtime provides a secure, lightweight unit of processing called an application domain. Application domains also enforce security policy.
Lightweight means that multiple application domains run in a single Win32 process, yet they provide a kind of fault isolation: a fault in one application domain does not corrupt other application domains. This helps enhance execution security against viruses and also helps in debugging faulty code.
The Common Language Runtime relies on the type safety and verifiability features of the Common Type System (CTS) to provide fault isolation between application domains. Since type verification can be conducted statically before execution, it is cost efficient and needs less security support from the microprocessor hardware.
Each application can have multiple application domains associated with it, and each application domain has a configuration file containing security permissions. The Common Language Runtime uses this configuration information to provide sandbox security similar to the Java sandbox model.
Although multiple application domains can run within a process, no direct calls are allowed between methods of objects in different application domains. Instead, a proxy mechanism is used to isolate the code spaces.
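Java has no application domains, but the proxy idea can be sketched with java.lang.reflect.Proxy: code on one side of the boundary holds only a proxy, and every call is routed through an invocation handler that could marshal arguments, check permissions or log. The Greeter interface and the "DomainB" label are hypothetical, and the CLR's actual cross-domain marshaling is not reproduced here; this is an analogy under those assumptions.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Java analogy: code in one "domain" never calls the target object directly; it holds a
// proxy, and every call passes through a handler at the boundary. Greeter is hypothetical.
class DomainProxySketch {
    // Records every call that crossed the boundary, standing in for marshaling and checks.
    static final List<String> boundaryLog = new ArrayList<>();

    static Greeter crossDomainProxy(Greeter target, String targetDomain) {
        InvocationHandler handler = (proxy, method, args) -> {
            boundaryLog.add("call " + method.getName() + " into " + targetDomain);
            // A real boundary would copy (marshal) the arguments here instead of sharing them.
            return method.invoke(target, args);
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(), new Class<?>[] {Greeter.class}, handler);
    }

    public static void main(String[] args) {
        Greeter real = name -> "Hello, " + name;
        Greeter remote = crossDomainProxy(real, "DomainB");
        System.out.println(remote.greet("Duke")); // Hello, Duke
        System.out.println(boundaryLog);          // [call greet into DomainB]
    }
}

interface Greeter {
    String greet(String name);
}
```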
Assemblies
An assembly is the functional unit of sharing and reuse in the Common Language Runtime. It is the equivalent of Java's JAR (Java Archive) files.
An assembly is a collection of physical files, which can be packaged in the .CAB format or the newly introduced .MSI file format. Assemblies stored in files in this way are called static assemblies; they include .NET Framework types (interfaces and classes) as well as resources for the assembly (bitmaps, JPEG files, resource files, etc.). They also include metadata that eliminates the need for the IDL file descriptors that were required to describe COM components.
The Common Language Runtime also provides APIs that script engines use to create dynamic assemblies when executing scripts. These assemblies are run directly and are never saved to disk.
Microsoft has greatly diminished the role of the Windows Registry with the introduction of the assembly concept, which is an adaptation of Java's JAR deployment technology.
Assemblies are an adaptation, but not a copy, of Java's JAR technology. They have been improved in some ways; for example, a versioning system has been introduced. However, since the .NET framework is skewed towards the Windows architecture, some of the portability features of Java's JAR files may have been sacrificed.
Again, similar to JAR files, assemblies contain an entity called a manifest. However, the manifest plays a somewhat wider role in the .NET framework. The manifest is metadata describing the interrelationships between the entities contained in the assembly, such as managed code, images and multimedia resources. The manifest also specifies versioning information.
The manifest is basically a deployment descriptor. Java programmers can relate it to the J2EE (Java 2 Enterprise Edition) deployment descriptors for EJB (Enterprise JavaBeans) applications, although, unlike those XML files, the manifest is stored as binary metadata inside the assembly.
Microsoft's documentation stresses that assemblies are "logical DLLs". This may be a reasonable paradigm for VB or C++ programmers, but Java programmers will find it easier to visualize assemblies as an extension of the JAR concept. However, unlike a JAR, which may contain any number of classes with a main method, each assembly can define only one entry point, which can be either DllMain, WinMain or Main.
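For the Java side of this comparison, java.util.jar.Manifest can write and read the attributes that correspond most closely to an assembly's entry point and version: Main-Class names the JAR's default entry point and Implementation-Version records a version string. The class name com.example.App and the version 1.2.0 are made-up values. Unlike a .NET assembly manifest, a JAR manifest is a plain "Name: value" text file and the JVM performs no version binding based on it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Java side of the comparison: a JAR manifest declaring a default entry point and a version.
// "com.example.App" and "1.2.0" are made-up values for illustration.
class ManifestSketch {
    static byte[] writeManifest(String mainClass, String version) {
        Manifest mf = new Manifest();
        Attributes main = mf.getMainAttributes();
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0"); // expected by the JAR specification
        main.put(Attributes.Name.MAIN_CLASS, mainClass);
        main.put(Attributes.Name.IMPLEMENTATION_VERSION, version);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            mf.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    static Manifest readManifest(byte[] bytes) {
        try {
            return new Manifest(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = writeManifest("com.example.App", "1.2.0");
        System.out.print(new String(bytes, StandardCharsets.UTF_8));
        Attributes attrs = readManifest(bytes).getMainAttributes();
        System.out.println("entry point: " + attrs.getValue(Attributes.Name.MAIN_CLASS));
        System.out.println("version:     " + attrs.getValue(Attributes.Name.IMPLEMENTATION_VERSION));
    }
}
```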
As stated earlier, assemblies have a manifest containing version information and a digital signature. This is intended to implement version control and authentication of the software developer. The version and authentication checks are carried out by the runtime while loading the assembly into the code execution area.
Again, much like Java's trusted library concept, .NET assemblies can be placed in a secured area called the global assembly cache. This area is equivalent to Java's trusted class path. Only system administrators can install or uninstall assemblies in the global assembly cache. There is a separate place for downloaded or transient assemblies, called the downloaded assembly cache. Assemblies loaded from the global assembly cache run outside the sandbox, load faster and have more freedom to access file system resources. Assemblies loaded from the downloaded cache area are subject to more security checks and are therefore slower to load; since they run inside the sandbox, they enjoy far fewer privileges.
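The trust rules in the previous paragraph can be restated as a small policy table. This Java sketch is hypothetical: the LoadLocation and Permission enums and the two methods merely encode what the paragraph says (global cache: outside the sandbox with file system access, administrator-only installation; download cache: sandboxed). The real .NET security policy is configurable and much richer.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of the trust rules described above; not the real .NET security policy.
class AssemblyTrustSketch {
    enum LoadLocation { GLOBAL_ASSEMBLY_CACHE, DOWNLOAD_CACHE }

    enum Permission { EXECUTE, FILE_SYSTEM_ACCESS }

    // Assemblies from the global cache run outside the sandbox; downloaded ones stay inside it.
    static Set<Permission> grantedPermissions(LoadLocation location) {
        switch (location) {
            case GLOBAL_ASSEMBLY_CACHE:
                return EnumSet.allOf(Permission.class);
            case DOWNLOAD_CACHE:
            default:
                return EnumSet.of(Permission.EXECUTE);
        }
    }

    // Only administrators may install into (or remove from) the global assembly cache.
    static boolean mayInstall(LoadLocation location, boolean isAdministrator) {
        return location != LoadLocation.GLOBAL_ASSEMBLY_CACHE || isAdministrator;
    }

    public static void main(String[] args) {
        for (LoadLocation loc : LoadLocation.values()) {
            System.out.println(loc + " -> " + grantedPermissions(loc));
        }
    }
}
```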
Assembly manifests also contain information regarding the sharing of code by different applications and application domains.
To summarize, the operating system can have multiple applications running simultaneously; each such application occupies a separate Win32 process and can contain multiple application domains. An application domain can be constructed from multiple assemblies.
Execution
The Common Language Runtime provides the infrastructure that enables execution to take place, as well as a variety of services that can be used during execution. Before a method can be executed, it must be compiled to processor-specific code. Each method for which MSIL has been generated is JIT compiled when it is called for the first time, and then executed. The next time the method is executed, the existing JIT compiled native code is executed. The process of JIT compiling and then executing code is repeated until execution is complete.
As mentioned earlier, this runtime compilation can be avoided by compiling the code into native executable code during installation.
During execution, managed code receives services such as automatic memory management, security, interoperability with unmanaged code, cross-language debugging support, and enhanced deployment and versioning support.
JIT Compilation
Before Intermediate Language (IL) can be executed, it must be converted by a .NET Framework Just In Time (JIT) compiler to native code, which is CPU-specific code that runs on the same computer architecture as the JIT compiler.
Microsoft's designers insist that the runtime never interprets any language; it always executes native code, and only the conversion to native form may be deferred. Even scripting languages like VBScript are now compiled and then executed!
The idea behind JIT compilation recognizes that some code may never be called during execution. Therefore, rather than spending time and memory converting all of the MSIL in a PE (portable executable) file to native code, the runtime converts the Intermediate Language as it is needed during execution and stores the resulting native code so that it is accessible for subsequent calls.
The loader creates and attaches a stub to each of a type's methods when the type is loaded. On the initial call to the method, the stub passes control to the JIT compiler, which converts the MSIL for that method into native code and modifies the stub to direct execution to the location of the native code. Subsequent calls to the JIT compiled method proceed directly to the previously generated native code, so the cost of JIT compilation is paid only once per method.
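The stub-patching scheme can be imitated in a few lines of Java. In this sketch a method's entry point starts out as a stub; the first call runs a stand-in compile step, replaces the stub with the "compiled" body, and later calls go straight to it. The compile step is an assumption for illustration: it just returns the original Java function and counts how often it runs, with no real IL or native code involved.

```java
import java.util.function.IntUnaryOperator;

// Sketch of lazy (JIT-style) compilation with stub patching.
// "compile" is a stand-in: it returns an ordinary Java function and counts how often it runs.
class JitStubSketch {
    private final IntUnaryOperator source;   // plays the role of the method's MSIL
    private IntUnaryOperator entryPoint;     // what callers actually jump to
    private int compileCount = 0;

    JitStubSketch(IntUnaryOperator source) {
        this.source = source;
        // Initially the entry point is a stub that triggers compilation on first use.
        this.entryPoint = arg -> {
            IntUnaryOperator nativeCode = compile();
            entryPoint = nativeCode;               // patch the stub: later calls bypass it
            return nativeCode.applyAsInt(arg);
        };
    }

    private IntUnaryOperator compile() {
        compileCount++;
        return source; // a real JIT would translate IL to machine code here
    }

    int call(int arg) {
        return entryPoint.applyAsInt(arg);
    }

    int compileCount() {
        return compileCount;
    }

    public static void main(String[] args) {
        JitStubSketch square = new JitStubSketch(x -> x * x);
        System.out.println(square.call(3));        // 9, compiled on this first call
        System.out.println(square.call(4));        // 16, runs the already "compiled" code
        System.out.println(square.compileCount()); // 1
    }
}
```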
The compilation process (JIT or at installation time) converts the Intermediate Language (IL) to native code. The code, however, must pass a verification process. Verification examines the Intermediate Language (IL) and metadata to see whether the code is type safe, that is, whether it accesses only authorized memory locations, whether identities are what they claim to be, and whether references to a type are compatible with the type referenced. These checks protect the application from bugs and viruses.
During the verification process, the Intermediate Language (IL) code is examined in an attempt to confirm that the code can access memory locations and call methods only through properly defined types.
Due to design limitations of some programming languages, like C, their compilers may not be able to produce verifiably type-safe code; such code can only be executed from a trusted area.
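To show what "type safe" means to a verifier, here is a deliberately tiny verifier for a made-up stack instruction set. The instructions (PUSH_INT, PUSH_REF, ADD_INT, POP) are invented and far simpler than real CIL, but the principle is the same: it simulates the evaluation stack with types instead of values and rejects code that would, for example, add an integer to an object reference.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Toy verifier for a hypothetical stack-based instruction set (not real CIL opcodes).
// It tracks the *types* on the evaluation stack and rejects ill-typed sequences.
class VerifierSketch {
    enum Type { INT, REF }

    enum Op { PUSH_INT, PUSH_REF, ADD_INT, POP }

    // Returns true if every instruction finds operands of the expected types on the stack.
    static boolean verify(List<Op> code) {
        Deque<Type> stack = new ArrayDeque<>();
        for (Op op : code) {
            switch (op) {
                case PUSH_INT:
                    stack.push(Type.INT);
                    break;
                case PUSH_REF:
                    stack.push(Type.REF);
                    break;
                case ADD_INT:
                    // Needs two integers; treating a reference as an integer is exactly the
                    // kind of unchecked memory access that verification must prevent.
                    if (stack.size() < 2 || stack.pop() != Type.INT || stack.pop() != Type.INT) {
                        return false;
                    }
                    stack.push(Type.INT);
                    break;
                case POP:
                    if (stack.isEmpty()) {
                        return false; // stack underflow
                    }
                    stack.pop();
                    break;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(verify(List.of(Op.PUSH_INT, Op.PUSH_INT, Op.ADD_INT, Op.POP))); // true
        System.out.println(verify(List.of(Op.PUSH_INT, Op.PUSH_REF, Op.ADD_INT)));         // false
    }
}
```

Because this check runs over the code before execution, rejected code never reaches the processor, which is why the text above can say verification is cheap compared with hardware-enforced isolation.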
Runtime Hosts
The runtime is typically started and managed by environments like ASP.NET, IE or the Windows Shell. These hosting environments run managed code on behalf of the user and take advantage of the application isolation features provided by application domains. In fact, it is the host that determines where the application domain boundaries lie and in which application domain user code runs. The Common Language Runtime provides a set of classes and interfaces that hosts use to create and manage application domains.
There are five Common Language Runtime hosts:
- ASP.NET – ASP.NET creates application domains to run user code. Application domains are created per application, as defined by the web server.
- Internet Explorer – IE creates an application domain per site.
- Windows Shell EXE – Each application that is launched from the command line runs in a separate application domain.
- VBA – VBA runs the script code contained in an Office document in an application domain.
- Windows Forms Designer – The Windows Forms Designer places each form the user is building in a separate application domain. When the user edits the form and rebuilds, Windows Forms shuts down the old application domain, recompiles the code and runs it in a new application domain.
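Because it is the host, not the runtime, that decides where application domain boundaries lie, the five policies listed above can be summarized as a function from a unit of work to a domain key. This Java sketch is hypothetical: the HostKind names and key formats are invented to restate the list and are not part of any .NET hosting API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical restatement of the host policies listed above: each host maps a unit of work
// to an application-domain key, and work with the same key shares a domain.
class RuntimeHostSketch {
    enum HostKind { ASP_NET, INTERNET_EXPLORER, WINDOWS_SHELL, VBA, FORMS_DESIGNER }

    private final Map<String, Integer> domains = new HashMap<>();
    private int nextDomainId = 1;

    static String domainKey(HostKind host, String unit) {
        switch (host) {
            case ASP_NET:           return "app:" + unit;      // one domain per web application
            case INTERNET_EXPLORER: return "site:" + unit;     // one domain per site
            case WINDOWS_SHELL:     return "exe:" + unit;      // one domain per launched application
            case VBA:               return "document:" + unit; // script code of one Office document
            default:                return "form:" + unit;     // one domain per form being designed
        }
    }

    // Returns the existing domain for this key, or creates a new one.
    int domainFor(HostKind host, String unit) {
        return domains.computeIfAbsent(domainKey(host, unit), k -> nextDomainId++);
    }

    // Forms Designer rebuild: drop the old domain so the next request gets a fresh one.
    void unload(HostKind host, String unit) {
        domains.remove(domainKey(host, unit));
    }

    public static void main(String[] args) {
        RuntimeHostSketch ie = new RuntimeHostSketch();
        System.out.println(ie.domainFor(HostKind.INTERNET_EXPLORER, "www.ecma.ch")); // 1
        System.out.println(ie.domainFor(HostKind.INTERNET_EXPLORER, "www.w3c.org")); // 2
        System.out.println(ie.domainFor(HostKind.INTERNET_EXPLORER, "www.ecma.ch")); // 1 (same site, same domain)
    }
}
```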
Conclusion
.NET is definitely an improvement over the Java framework, but it is NOT going to displace Java any time soon, although in the coming years Java and .NET will converge.
It currently lacks support for other platforms. Since .NET has been architected by Microsoft, it is less likely to attract the open source support base of free-thinking programmers, which was one of the main reasons for Java's popularity.
Java has been around for more than five years now, and Java programmers have already survived two waves of downturn: first in 1998, when most websites weeded out applets, and second in late 2000, when all the VC-fueled DOTCOM hot-air balloons came down. Scott Adams' Dilbert strips at http://www.dilbert.com have a good fill of VC and DOTCOM cartoons. All Java programmers who are still employed must get a good handle on the .NET architecture to remain employable.
The party is over for DOTCOM, so let's party with DOTNET !!!