Wednesday, June 29, 2005

Migrating a J2EE application: Tomcat to Websphere

A piece of cake, you say! But think again. Much of the hype and sales pitch behind J2EE applications has centered on the fact that they are portable across platforms. Here is how the theory goes. If you build an application against a specification, and that specification is used as a standard, then any platform that supports the specification should be able to run an application built to it. This sounds logical and simple enough. Yet it is difficult to port J2EE applications across all of the major vendors of application servers without an exception. Yes, without an exception!

There are a number of reasons for this. They include:
1 - vendors providing features and functionality over and above what a specification provides for
2 - specifications not being comprehensive in their scope. This results in only parts of an application being portable.
3 - platforms conforming to a specification sometimes run on older versions of virtual machines, operating systems or hardware, negating true portability
4 - developers writing code that is inherently non-portable. For example, making use of system properties increases the chance that an application will be non-portable.

The bottom line is that firms investing in J2EE technologies do not receive all the potential value they might from the technology. It is therefore imperative for any development effort to account for potential portability issues and try to mitigate them.

I will use a case from a recent client to illustrate my point. The source application was built and deployed on a Tomcat application server. It has a three-tiered architecture, with a web front end and a database. The client had acquired the application as part of an acquisition and sought to migrate it to the corporate standard, IBM's Websphere Application Server.

A number of issues were faced during this process. These included:
1 - Exceptions during deployment
2 - JVM mismatch errors
3 - Classpath exceptions during runtime
4 - Library mismatches
5 - Suboptimal performance

While these problems may seem straightforward for an experienced developer to resolve, they are not necessarily so. For example, the latest versions of the Websphere Application Server come packaged with a verification and deployment tool. This tool seeks to prevent unstable applications from being loaded onto the application server. It attempts to verify that the deployment descriptors and the application descriptors conform to the relevant J2EE specification. If a descriptor does not conform, however, the tool simply gives up and displays an exception that is, to say the least, very user-unfriendly. To resolve these problems I had to import the descriptors into an XML utility and fix the descriptor files to make them comply fully with the schema.
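A first-pass check of this kind can be done before deployment with the JAXP parser that ships with Java. The sketch below (a hypothetical helper, not part of any Websphere tooling) only tests well-formedness; a full check would validate against the J2EE schema via javax.xml.validation.SchemaFactory:

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;

public class DescriptorCheck {
    // Minimal well-formedness check for a descriptor held in a string.
    // Real descriptor validation would additionally load the J2EE schema
    // via javax.xml.validation.SchemaFactory and validate against it.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            f.newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
            return true;
        } catch (Exception e) {
            return false;   // parse error: descriptor is broken
        }
    }
}
```

Running the descriptors through a check like this yields a line and column number for the first problem, which is far friendlier than the raw exception the deployment tool prints.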

The second problem was caused by a JVM mismatch. A JVM mismatch?! No, that can't be. I agree, it should not be. The source application had been compiled with a later version of the Java compiler and was being deployed to a target application server that ran an older version of the Java runtime. This caused the runtime to handle the version mismatch most ungracefully. On the bright side, though, we learned a good rule of thumb: compile on earlier versions and deploy on later ones.
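One way to catch such a mismatch before deployment is to inspect the version stamp in the class files themselves: every .class file begins with the magic number 0xCAFEBABE followed by minor and major version fields (major 48 corresponds to J2SE 1.4, 49 to J2SE 5.0). A minimal sketch of such a check:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ClassVersion {
    // Returns the class-file major version (48 = J2SE 1.4, 49 = J2SE 5.0),
    // or -1 if the stream is not a valid class file.
    static int majorVersion(InputStream in) {
        try {
            DataInputStream d = new DataInputStream(in);
            if (d.readInt() != 0xCAFEBABE) return -1;  // wrong magic number
            d.readUnsignedShort();                     // skip minor version
            return d.readUnsignedShort();              // major version
        } catch (IOException e) {
            return -1;
        }
    }
}
```

Scanning the jars in a war file with a utility like this tells you immediately whether the target server's JVM is old enough to refuse them.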

The third issue is well known. Libraries that are packaged within a war file may be picked up by one application server but totally ignored by another. The root of this problem is how application servers load their classes. Different application servers load their resources differently, at times via different classloaders. This causes numerous headaches for the inexperienced J2EE programmer. Fortunately, the Websphere application server allows the administrator to specify additional classpaths, which lets such problems be ironed out.
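When debugging this, it helps to see which loader actually resolved a given class. A small diagnostic like the following (a hypothetical helper, not specific to any server) can be dropped into the application to print the classloader parent chain:

```java
import java.util.ArrayList;
import java.util.List;

public class LoaderChain {
    // Walks the classloader parent chain for a class, from the loader
    // that defined it up to the bootstrap loader (which appears as null
    // and is reported here as "<bootstrap>").
    static List<String> chain(Class<?> c) {
        List<String> names = new ArrayList<String>();
        ClassLoader l = c.getClassLoader();
        while (l != null) {
            names.add(l.getClass().getName());
            l = l.getParent();
        }
        names.add("<bootstrap>");
        return names;
    }
}
```

If a library class shows up under an unexpected loader, or not at all, you know the server's loading policy rather than the war file is at fault.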

The fourth issue is the most difficult to resolve. This is especially so when the target application server hosts multiple applications that depend on the older versions of the libraries. It is no longer a process of simply updating the libraries. One needs to perform a dependency analysis to ensure that no other applications will be broken by changing one or more of the libraries.
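In its simplest form, that dependency analysis can be reduced to a table of which application requires which version of a shared library. The following is a hypothetical sketch (the names and the exact-string version comparison are assumptions for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class LibraryUpgradeCheck {
    // Given each application's required version of one shared library,
    // lists the applications that would break if the server-wide copy
    // were upgraded to newVersion. Versions are compared as exact
    // strings, which is a simplification.
    static List<String> brokenBy(Map<String, String> required, String newVersion) {
        List<String> broken = new ArrayList<String>();
        for (Map.Entry<String, String> e : required.entrySet())
            if (!e.getValue().equals(newVersion))
                broken.add(e.getKey());
        return broken;
    }
}
```

A non-empty result means the upgrade needs per-application classpaths or a library update in the affected applications before the shared copy can change.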

Finally, migrating an application from one application server to another can degrade performance. A number of factors can cause this. The source application may be tuned for one application server, and that optimization may be counterproductive on the target. For example, the manner in which caching, thread synchronization and pooling are done varies across application servers, and performance varies accordingly. Furthermore, some application servers come with JVMs tuned to deliver high performance with them. A sub-optimal configuration can degrade performance significantly.

Migration of a J2EE application from one application server to another is therefore a very complex task. It can, however, be achieved with adequate planning and the use of skilled resources.

(c) 2005 Vivek Pinto. For more details please visit us at Wonomi Technologies.

Migrating data: In search of a lost context

How would you represent the knowledge and expertise that you possess? The answer to this question will vary with each individual. For example, you could write it down in a fashion similar to an encyclopedia, with terms and their meanings to you. Alternatively, you could create an exhaustive how-to guide that lists a number of activities with detailed instructions on how someone would go about achieving an end goal.

Knowledge representation has always been a difficult task. For years, researchers in the field of artificial intelligence have struggled to create expert systems built around rules. These rules provide the system a detailed set of actions to undertake when presented with a given stimulus. The stimulus could be a simple event or a complex situation. For a simple event, for example the press of a key, the system would have to process just that one stimulus. With a complex situation, such as the entry of a new customer's data, the system would have to process multiple pieces of information as well as refer to its historical records to evaluate, for example, the risk associated with accepting the individual as a customer.

Over the last twenty years, database systems have amassed a large amount of data about businesses and their processes. Typically a database, at its core, consists of a data model that seeks to represent all the relevant or meaningful information about a business, its processes, stakeholders, customers and partners. This representation seeks to identify the key attributes and entities and then map the relationships between them. This is a very complex task, and as much an art as a science. Take the simple case of a picture. If the picture were drawn on a single sheet of white paper and consisted of two perpendicular lines, it would be very easy to represent or describe. To describe the picture as accurately as possible, for example, one could state:
  1. The picture consists of two black lines intersecting each other at right angles on a white piece of paper.
  2. The picture is drawn on an A4-size sheet. It consists of two black lines of length 15 cm each running parallel to the sides of the sheet. The intersection point is in the middle of the sheet.
  3. The picture is drawn on an A4-size sheet. It consists of two black lines of length 15 cm each running parallel to the sides of the sheet. The lines intersect each other at a point one-third of the distance from one of their ends. The intersection point is in the middle of the sheet.

As the astute reader has probably noticed, each of these descriptions is valid. However, as one goes down the list, the amount of information in each description increases. A good data modeler needs to decide which of these descriptions will be adequate for the data model. Obviously, the more information the description contains, the better it will be. However, the more detailed the representation, the more space it occupies. Furthermore, it takes more effort and time to create the description (in the case of the picture) or data model (in the case of a database).

Once the decision is made on how to represent the picture in the database, everything that is not represented as data in the data model becomes the context. Some of the contextual information not captured in description 3 above, for example, is the texture of the paper, the artist who drew the picture, the time at which it was drawn, its age and so forth. Each of these pieces of information could become important at some point in the future. For example, if the picture is put up for auction, the identity of the artist who created it would become very important.

A similar situation is found in database technologies. Quite often, the data stored in a database needs to be utilized at a later point in time for a number of reasons. One such reason is the migration of data from one database to another. Typically the system from which data is being obtained is called the source and the system to which data is being moved is called the target. Moving the data from a source to a target that has the same data model would be a trivial task if no changes were needed. However, more often than not, the target database has a different data model, stricter data integrity requirements, higher data quality needs and so forth. If the target system has a higher data quality requirement, the data from the source system will have to be cleansed before it can be moved into the target system.

Data quality refers to the actual data stored within the data model rather than the data model itself. For example, if the data model stores the name of an author, the data could be entered into the system as "vivek pinto" or "vivekpinto" or " " or even "vkpinto". Clearly the third entry is empty, while the fourth is abbreviated and thus of inferior quality. However, the second entry could be of poor quality too. In the absence of contextual information such as the author's first and last names, it would be very difficult to know that the correct entry should be "vivek pinto" if all one has is the second or fourth entry. To complicate matters further, it would be difficult to know which of the two words in the name is the first name and which the last.
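The blank and single-token cases above can be caught mechanically, even though splitting first from last name cannot. A hypothetical cleansing check along these lines (the method and category names are assumptions for illustration):

```java
public class NameQuality {
    // Crude quality check for a stored author-name field: flags blank
    // entries and entries that cannot be split into at least a first
    // and a last name. Cannot distinguish "vkpinto" (abbreviated) from
    // "vivekpinto" (missing space) without outside context.
    static String assess(String name) {
        if (name == null || name.trim().length() == 0) return "empty";
        String[] parts = name.trim().split("\\s+");
        if (parts.length < 2) return "unsplittable";   // e.g. "vivekpinto"
        return "ok";
    }
}
```

Such a check can only raise a flag; deciding what the correct value should be still requires the contextual information that never made it into the data model.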

The issues similar to the situation described above are too numerous to count. However, they are real problems that come up during data migration. A few solutions to the problem have been devised, but they are at best limited. More on that later.



(c) 2005 Wonomi Technologies All rights reserved