Skip to end of metadata
Go to start of metadata

Re-factoring of Castor SQL Engine

Google Summer of Code Work - Ahmad Hassan

This page is dedicated for project work on 'Re-factoring the SQL Engine of Castor' by me (Ahmad Hassan) as a Google Summer of Code student under the supervision of Ralf Joachim.

I will try to update this page with the work I will done on the project.

XIRCLES - The Codehaus

Xircles is the management platform for interacting with the haus. The brief introduction about The Codehaus is:

'The Codehaus' is a collaborative environment for building opensource projects with a strong emphasis on modern languages, focussed on quality components that meet real world needs.The Codehaus was formally registered on 26 February, 2003. Bob had been active in open-source for several years, with projects such as Jaxenand Drools_. For a while,_ The Werken Company hosted his projects along with the projects of a handful of other developers. The Codehaus "brand" was created to allow for a neutral environment for non-Werken people to host their projects. Mostly, this was due to several folks thinking that Xulux was a project of The Werken Company since it was hosted on our server. The Codehaus helped to correct that misconception. Kurt Schrader publically announced the existance of the Codehaus a wee bit before Bob was ready for it. Oh well.

Reference: http://xircles.codehaus.org/pages/History

Starting with The Codehaus

To initiate project work with The Codehaus, I was asked to sign up on the Xircles management platform. Before sign up for xircles, I was requried to subscribe for

JIRA (JIRA is a solution used to track bugs, manage project and distribute help desk tickets etc. The codehaus uses JIRA for project management). Signed up at http://jira.codehaus.org/secure/Dashboard.jspa

Confluence Dashboard (Confluence is an enterprise wiki design to share project information among team member and the world). Signed up at http://docs.codehaus.org/signup.action

Then, I signed up to the Xircles at http://xircles.codehaus.org/signup

For subscription, we need to specify same email address in JIRA, CONFLUENCE and XIRCLES.

Configuring Eclipse with Maven Integration, CheckStyle and Subclipse Plug-ins

I will share with you the steps which I did to configure eclipse with Maven and Checkstyle plug-in.

1) First I installed the Eclipse IDE for JAVA  EE Developers. http://www.eclipse.org/downloads/

I installed eclipse-jee-ganymede-SR2-win32

I then added the additional JVM parameters in order to increase the memory allowed for Eclipse. I made a bat file with the following configuration

D:\eclipse-jee-ganymede-SR2-win32\eclipse\eclipse.exe -jvmargs -Xms128m -Xmx512m -XX:MaxPermSize=128m

2) Then I configured the checkstyle plug-in manually with the eclipse by the following procedure. 

Unzipped the checkstyle plug-in inside the plug-in directory of eclipse plug-in directory and then restart the eclipse. I copied checkstyle plugin directory contents in eclipse plugin directory and also copied checkstyle feature directory contents in the eclipse feature directory. It will pick this plug-in automatically. Reference http://agile.csc.ncsu.edu/SEMaterials/tutorials/install_plugin/

3) Then I moved on to install Maven 2.1.0.

"Maven is a software project management and comprehension tool. Based on the concept of a project object model (POM), Maven can manage a project's build, reporting and documentation from a central piece of information"

I dowloaded Maven 2.1.0 from http://maven.apache.org/download.html and then configured it on my Windows Vista through the following steps.

  • Unzip the distribution archive, i.e. apache-maven-2.0.10-bin.zip to the directory you wish to install Maven 2.0.10.
  • Add the M2_HOME environment variable by opening up the system properties, selecting the "Advanced" tab, and the "Environment Variables" button, then adding the M2_HOME variable in the user variables with the value install_directory\apache-maven-2.0.10. 
  • In the same dialog, add the M2 environment variable in the user variables with the value %M2_HOME%\bin.
  • In the same dialog, create the Path environment variable in the user variables and prepend the value %M2% to add Maven available in the command line.
  • In the same dialog, make sure that JAVA_HOME exists in your user variables or in the system variables and it is set to the location of your JDK, e.g.C:\Program Files\Java\jdk1.5.0_02 and that %JAVA_HOME%\bin is in your Path environment variable.
  • Open new command terminal run mvn --version to verify that it is correctly installed.

4) Then I installed and configured the Eclipse-Maven Plugin so that Eclipse IDE can directly use the Maven functionality. 

I installed the Maven Integration for Eclipse by using the following update site from within Eclipse:

 http://m2eclipse.sonatype.org/update/

I went to  Help > Software Updates > Available Softwares. There I added the upper site link and then installed the plug-in 

5) Then I installed the subclipse plug-in for eclipse. Subclipse is an Eclipse Team Provider plug-in providing support for Subversion within the Eclipse IDE. I installed the version 1.6.x

The instruction I followed can be seen at http://subclipse.tigris.org/install.html

 I also needed to configure the eclipse to use specific virtual machine. For that I created a bat script and passed the follwing parameters in order to specify virtual machine and memory requirement for eclipse.

eclipse.exe -clean -jvmargs -Xms128m -Xmx512m -XX:MaxPermSize=128m -vm "C:\Program Files\Java\jdk1.6.0_13\bin
 

Download and Configure CASTOR in Eclipse environment


I installed the following pre-requisites first

  1. JDK 1.6
  2. Eclipse IDE
  3. Latest Subclipse eclipse plugin
  4. Latest CheckStyle eclipse plugin
  5. Maven
  6. Maven Integration plugin for Eclipse

Then I downloaded and configured the CASTOR Project for SVN Repository

Create Project

 

-

Create a New Project in eclipse from File -> New -> Projects

-

Select "Checkout Projects from SVN" in "SVN" from "Select a wizard" window and click Next (this option will only come if you have installed the subclipse plugin)

-

Select "Create a new repository location" and click Next

-

Enter the URL "https://svn.codehaus.org/castor/castor" and click Next

-

Select the folder "trunk" from the list and click Next

-

In "Check Out As" window the name of the project will be "castor" then click Next

-

At last, you can choose the workspace and click Finish

-

You can see castor project in your "Project Navigator" of eclipse

I followed the reference manual at http://www.castor.org/how-to-setup-castor-project-in-eclipse.html

Castor Specification

Castor consists of two parts.

  1.  XML data binding farmework almost compliant with JAXB
  2. Database persistence 

In addition to this, it has some support modules as well. Most of these modules do also have a set of plain junit test that get executed during 'mvn install'

For database persistence we have 4 sets of tests

  1. The plain junit tests that test contracts of single classes. atm they only cover a small subset of classes
  2. The old tests framework that executes blackbox tests against the database of your choice
  3. A new test framework that will replace the old one after migration of all tests
  4. The test suite is for testing performance

Both test farmeworks (2. and 3.) are refered as cpactf. When changing anything at castor this 3 test suites have to be executed without failure atleast on one database

Patch Submission

 If we prepare a patch for me that is ready to commit then we have to execute  'mvn clean' and 'mvn install' on project root before. In addition we have to execute old cpactf within eclipse to verify patch does not break anything

 Module Specific details

cpa - persistance part
ddlgen - Mapping file for cpa part. File that defines which property is persisted in which table and column. It can generate db scripts and mapping information
xmlctf - framework code for xml test framework. it doesn't include test but some helper classes to run tests
cpactf - test framework for persistance part. directory use castor interfaces
cpaptf. performance test framework
core - some classes that is used by rest of castor components or modules
diff - classes used by xmlctf framework
src - obsolete
jdo extension it - Integration and test framework for persistance part. It uses spring interface to test the spring integration of castor against database.

Castor is going towards JPA complaint

persistance mismatch -  It is a conversion problem. Generally in the database, relationship works like N:1 but if 1:N and N has no knowledge of 1
then it is called persistance mismatch. castor solve this problem.

The execution flow control of object relation mapping done by castor follows the following pattern

entity > get classs > look mapping entry > get properties from class > run query

sqlstatementload is select statement that runs if create fails. one reason would be duplicate identity

Setting up Database for Castor

Suppose the installation directory of Castor is

D:\ECLIPSE_PROJECTS\workspace. I will explain the procedure to setup MYSQL database for Castor.

Go to the command prompt of MYSQL and login with root account.

After that just run the following commands

 create database test;

grant all on test.* to 'test'@'localhost' identified by 'test';

use test;

source D:\ECLIPSE_PROJECTS\workspace\castor\cpactf\src\old\ddl\mysql.sql

 create database cpactf;

grant all on cpactf.* to 'test'@'localhost' identified by 'test';

use cpactf;

D:\ECLIPSE_PROJECTS\workspace\castor /cpactf/src/test/ddl/org/castor/cpa/test/test1002/mysql.sql

source D:\ECLIPSE_PROJECTS\workspace\castor\cpactf\src\test\ddl\org\castor\cpa\test\test1002\mysql.sql

source D:\ECLIPSE_PROJECTS\workspace\castor\cpactf\src\test\ddl\org\castor\cpa\test\test30\mysql.sql

source D:\ECLIPSE_PROJECTS\workspace\castor\cpactf\src\test\ddl\org\castor\cpa\test\test31\mysql.sql

source D:\ECLIPSE_PROJECTS\workspace\castor\cpactf\src\test\ddl\org\castor\cpa\test\test87\mysql.sql

source D:\ECLIPSE_PROJECTS\workspace\castor\cpactf\src\test\ddl\org\castor\cpa\test\test241\mysql.sql

source D:\ECLIPSE_PROJECTS\workspace\castor\cpactf\src\test\ddl\org\castor\cpa\test\test303\mysql.sql

source D:\ECLIPSE_PROJECTS\workspace\castor\cpactf\src\test\ddl\org\castor\cpa\test\test356\mysql.sql

source D:\ECLIPSE_PROJECTS\workspace\castor\cpactf\src\test\ddl\org\castor\cpa\test\test881\mysql.sql

source D:\ECLIPSE_PROJECTS\workspace\castor\cpactf\src\test\ddl\org\castor\cpa\test\test954\mysql.sql

source D:\ECLIPSE_PROJECTS\workspace\castor\cpactf\src\test\ddl\org\castor\cpa\test\test972\mysql.sql

source D:\ECLIPSE_PROJECTS\workspace\castor\cpactf\src\test\ddl\org\castor\cpa\test\test1073\mysql.sql

source D:\ECLIPSE_PROJECTS\workspace\castor\cpactf\src\test\ddl\org\castor\cpa\test\test1158\mysql.sql

source D:\ECLIPSE_PROJECTS\workspace\castor\cpactf\src\test\ddl\org\castor\cpa\test\test1196\mysql.sql

source D:\ECLIPSE_PROJECTS\workspace\castor\cpactf\src\test\ddl\org\castor\cpa\test\test1206\mysql.sql

source D:\ECLIPSE_PROJECTS\workspace\castor\cpactf\src\test\ddl\org\castor\cpa\test\test1217\mysql.sql

source D:\ECLIPSE_PROJECTS\workspace\castor\cpactf\src\test\ddl\org\castor\cpa\test\test1355\mysql.sql

source D:\ECLIPSE_PROJECTS\workspace\castor\cpactf\src\test\ddl\org\castor\cpa\test\test1379\mysql.sql

source D:\ECLIPSE_PROJECTS\workspace\castor\cpactf\src\test\ddl\org\castor\cpa\test\test2177\mysql.sql


source D:\ECLIPSE_PROJECTS\workspace\castor\cpactf\src\test\ddl\org\castor\cpa\test\test2527\mysql.sql

source D:\ECLIPSE_PROJECTS\workspace\castor\cpactf\src\test\ddl\org\castor\cpa\test\test2567\mysql.sql

You can see 55 tables in cpactf and 134 tables in test database. 

In my case localhost does not work. I have to specify 127.0.0.1. So you have to change cpactf/src/test/resources/cpactf-conf.xml and update all the sections with the ip. 
 

Now you are done with the database setup. 

Run test cases for the first time


open cpactf/src/old/resources/jdo/mysql.xml

Replace localhost with ip

open run configuration

add mysql driver to classpath

add castor.mysql.TC70 as program argument

In classpath tab select castor add external jar and provide mysql connector 5 located in m2 repository
goto cpactf/src/main/java default package 

execute MainApp class as java application

Run it with program argunment  to be: castor.mysql
with castor.mysql you execute all tests with castor.mysql.TC70 only TC70

These tests will confirm that castor is setup successfully

Refactoring of Remove Module of SQL Engine

First task we defined in this project is the refactoring of Remove module of SQL Engine. So from now onwards I will look into the code base of Remove Module and will understand the way in which we can refractor it.

Prepared Statement

The prepared statement are used in SQL to enhance to efficiency and to reduce the complexity. Through this approach we can use one SQL Statement many times with different parameter values and the advantage we will gain is that the query will be compiles by DBMS only once.

Code Fragment 1:String updateString = "UPDATE COFFEES SET SALES = 75 " +
"WHERE COF_NAME LIKE 'Colombian'";
stmt.executeUpdate(updateString);

Code Fragment 2:PreparedStatement updateSales = con.prepareStatement(
"UPDATE COFFEES SET SALES = ? WHERE COF_NAME LIKE ? ");
updateSales.setInt(1, 75);
updateSales.setString(2, "Colombian");

updateSales.executeUpdate():

 Reference to the above is: http://life.csu.edu.au/java-tut/jdbc/basics/prepared.html

Nice article about prepare statements and parameter binding http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=/com.ibm.db29.doc.odbc/db2z_bndparm.htm\\

JPA - Java Persistance API

"A major enhancement in EJB technology is the addition of the new Java Persistence API, which simplifies the entity persistence model and adds capabilities that were not in EJB 2.1 technology. The Java Persistence API deals with the way relational data is mapped to Java objects ("persistent entities"), the way that these objects are stored in a relational database so that they can be accessed at a later time, and the continued existence of an entity's state even after the application that uses it ends. In addition to simplifying the entity persistence model, the Java Persistence API standardizes object-relational mapping.

In short, EJB 3.0 is much easier to learn and use than was EJB 2.1, the technology's previous version, and should result in faster development of applications. With the inclusion of the Java Persistence API, EJB 3.0 technology also offers developers an entity programming model that is both easier to use and yet richer"

Reference  http://java.sun.com/developer/technicalArticles/J2EE/jpa/

JDO - Java Data Objects API

"The Java Data Objects (JDO) API is a standard interface-based Java model abstraction of persistence, developed under the auspices of the Java Community Process_. The original JDO 1.0 is Java Specification Request 12 (JSR 12), and the current JDO 2.0 is Java Specification Request 243 (JSR 243). Beginning with JDO 2.0, the development of the API and the Technology Compatibility Kit (TCK) takes place within the_ Apache JDO open-source project.

If you are an application programmer, you can use JDO technology to directly store your Java domain model instances into the persistent store (database). Alternatives to JDO include direct file I/O, serialization, JDBC_,_ Enterprise JavaBeans (EJB), Bean-Managed Persistence (BMP) or Container-Managed Persistence (CMP) entity beans, and the Java Persistence API_."_

 Reference: http://java.sun.com/jdo/index.jsp

 Named SQL Parameter Binding

I did some research in finding a way to introduce named parameter binding in castor. Currently the castor is using prepare statement and positional arguments to this prepared statement afterwards. After going  through various documentations and articles I found that named parameter support has not been supported for all the databases. All JDBC does not support this. JDBC 3.0 supports oracle named parameter binding and spring framework also supports this but MYSQL still supports positional parameter binding. Some useful reference links are:

http://www.javaworld.com/community/node/1181
http://www.oracle.com/technology/sample_code/tech/java/codesnippet/jdbc/namedparam/UsingOracleNamedParameterAPIs.html
http://static.springframework.org/spring/docs/2.5.x/reference/jdbc.html

Java Naming Convention

In my implementation I would be following standard java naming conventions described in the following webpage

http://java.sun.com/docs/codeconv/html/CodeConventions.doc8.html

Start of the program gift - Google Summer of Code 2009

I am extremely happy that as a GSoC student, Google has given me an Electronic Student Membership in the Association of Computing Machinery (http://www.acm.org/). The membership will be active June 1, 2009 - May 31, 2010.

The membership provides, such as Online Books and Courses, Professional Networking, Professional Development, e-Publications (your membership does not include print publications, due to shipping considerations), the ACM Digital Library & Guide to Computing Literature, and Electronic Services.

I am confident that it will be useful resource for me in Future. 

I just registered with *ACM, the world's largest educational and scientific computing society http://www.acm.org/*

I can contacted at:  ahmad.acm@acm.org

Create Patch in Eclipse

As a part of this project, I learn how to create patches in eclipse. These patches are necessary if you are working in a big project and in a team project. This is the way of submitting your work.

In the synchronise view,

Select the project(s) you are changing in the Eclipse Package Explorer view, right mouse, and

select Team > Create Patch... and follow the instructions.

ClassMolder Dependency Problem

I have just downloaded the updated ClassMolder class. There has been new dependency added in this class and because of that I am getting import error

import org.exolab.castor.mapping.xml.NamedNativeQuery

Solution  (from Ralf)

some classes of Castor are generated during build. If one of the source
files of generation (e.g. mapping.xsd) changes it may happen that new
files are generated and you get compile failures until you execute build
once again. org.exolab.castor.mapping.xml.NamedNativeQuery is one of those. After execution of 'mvn clean' and 'mvn install' on project root
followed by a refresh of your eclipse project this errors should disapear.

Solution suggested by Werner is also a good one:

usually a 'mvn clean compile' should do the trick. Here's why:

1) 'mvn clean' cleans all Maven artefacts from the target directory, incl. the cache files that the Maven plugin for the Castor XML code generator
maintains to minimize the amount of the code generation taking place during your builds. In other words, if your XML schemas have not changed, no generation will be performed.

2) 'mvn compile' will compile all available Java class files (taking from src/main/java) into target/classes. Before this build phase, Maven will
trigger code generation through the Maven plugin for the Castor XML code generator, as a result of which new Java sources are being produced (in
target/generated-sources/castor). As this directory is added to the source folders for compilation on the fly, those (generated) files will be compiled as well.

ThreadLocal PrepareStatements to be be used in SQLStatementRemove

A thread-local variable effectively provides a separate copy of its value for each thread that uses it. Each thread can see only the value associated with that thread, and is unaware that other threads may be using or modifying their own copies.public class ThreadLocal {
public Object get();
public void set(Object newValue);
public Object initialValue();
The get() accessor retrieves the current thread's value of the variable; the set() accessor modifies the current thread's value. The initialValue() method is an optional method that lets you set the initial value of the variable if it has not yet been used in this thread
}


Playing with Java final and static Keywords

"final": per-instance value, must be assigned once per instance.
"final" + "static": per-class value, must be assigned once per class.
"static": per-class value, can be assigned any number of times per class.
[neither modifier]: per-instance value, can be assigned any number of times per instance.

private final int j;
The value MUST be assigned ONCE, either here in the declaration or in each constructor. It is NOT automatically assigned a value of 0 (zero). Also, since it is not static, there is a distinct value for each instance object, so it makes sense that it must be assigned in the constructor.

private static int k;
The value may be assigned from anywhere in this class. Since it is not final, the value may be assigned any number of times, but since it is static, this value is stored at the class-level and universally available to all instances. If left unassigned, it is assumed to be 0 (zero).

private static final int i;
The value MUST be assigned ONCE, either here in the declaration or in a "static" class-level block. It is NOT automatically assigned a value of 0 (zero).

Sybase Database and compatiblity issues

From the last couple of days, I am working on finding the compatiblity issues between the traditional sybase database and other modern databases. Before discussing the compatibility issues, I want to share some introduction about sybase.

Sybase became the number two database system behind Oracle, after making a deal with Microsoft to share the source code for Microsoft to remarket on the OS/2 platform as "SQL Server". At the time, Sybase called the database server "Sybase SQL Server". Until version 4.9, Sybase and Microsoft SQL server were virtually identical. Due to disagreements between the two companies over revenue sharing (or lack thereof), Sybase and Microsoft decided to split the code-lines and went their own way, although the shared heritage is very evident in the Transact-SQL (T-SQL) procedural language as well as the basic process architecture. The big difference is that Sybase has a Unix heritage, while Microsoft was adapted and optimized only for the Microsoft Windows NT operating system. Sybase continues to offer versions for Windows, several varieties of Unix, and for Linux.

One of the compatiblity issue is the ANSI SQL Standard. By default sybase doesn't support ANSI SQL Standard and because of that it doesn't take 'is NULL' in the sql statement. It only takes '= null' as a parameter where null needs to be addressed. The JDBC type 4 driver for sybase addresses this issue by following approach

"When the Sybase driver establishes a connection, the driver sets the Sybase database option ansinull to on. Setting ansinull to on ensures that the driver is compliant with the ANSI SQL standard and is consistent with the behavior of other DataDirect Connect for JDBC drivers, which simplifies developing cross-database applications.

By default, Sybase does not evaluate NULL values in SQL equality (=) comparisons in an ANSI SQL-compliant manner. For example, the ANSI SQL specification defines that col1=null always evaluates to false. Using the default database setting (ansinull=off), if the value of col1 in the following statement is NULL, the comparison evaluates to true instead of false:

SELECT * FROM table WHERE col1 = NULL

Setting ansinull to on changes the default database behavior so that SQL statements must use IS NULL instead of =NULL. For example, using the Sybase driver, if the value of col1 in the following statement is NULL, the comparison evaluates to true:

SELECT * FROM table WHERE col1 IS NULL"

Refactoring of SQLStatementRemove

The defined tasks for the refactoring of SQLStatementRemove [http://jira.codehaus.org/browse/CASTOR-2702] were

  1. Shift identity check from SQLStatementRemove to SQLEngine. Follow up http://jira.codehaus.org/browse/CASTOR-2706
  2. Pass ID's array to constructor of SQLStatementRemove. Follow up http://jira.codehaus.org/browse/CASTOR-2704
  3. Shift deletion of records of extended objects from SQLStatementRemove to ClassMolder. Follow up http://jira.codehaus.org/browse/CASTOR-2707
  4. Prepare statement of SQLStatementRemove in a separate method. Follow up http://jira.codehaus.org/browse/CASTOR-2705
  5. Remove synchronization of executeStatement(). Follow up http://jira.codehaus.org/browse/CASTOR-2726 

Refactoring SQLStatementStore

The defined tasks for the refactoring of SQLStatementStore [http://jira.codehaus.org/browse/CASTOR-2713] were

  1. Shift the SQLFieldInfo array contents to constructor. Follow up http://jira.codehaus.org/browse/CASTOR-2714
  2. Shift the SQLColumnInfo array contents to constructor. http://jira.codehaus.org/browse/CASTOR-2715
  3. Separates the PrepareStatement functionality from executeStatement.http://jira.codehaus.org/browse/CASTOR-2716
  4. Shift Identity size check from SQLStatementStore to SQLEngine. http://jira.codehaus.org/browse/CASTOR-2717
  5. Refactor and shift the Storage of extended classes to the ClassMolder. http://jira.codehaus.org/browse/CASTOR-2718
  6. Remove synchronization of executeStatement(). http://jira.codehaus.org/browse/CASTOR-2727
  7. Separating failure check of of the update statement in to a separate new StoreStatementFailureCheck.http://jira.codehaus.org/browse/CASTOR-2733
  8. Separate SQLStatementStore into 2 implementations depending on binding of null values in conditions http://jira.codehaus.org/browse/CASTOR-2725

Literature Review related to JAVA PERSISTENCE API (JPA) and Enterprise Java Beans 3.0 (EJB 3.0)

Since last couple of weeks, I am studying and doing practical work in EJB's and Java persistence API's. I would like to share some useful knowledge I acquired.

JAVA PERSISTENCE API (JPA)

Main goal is

  1. No need to implement special interface or extend for classes
  2. Detaching entity objects from persistence storage
  3. ORM mapping
  4. Support annotations as an alternative to XML for description of metadata
  5. Ability to use entity object outside container

Data Access Object pattern - Object that encapsulates access to the data

Data Transfer Object - Object that encapsulates database data (response)

DAO Layer solution - Some of the DAO layer code is generated automatically using the metadata that developers provide in the descriptor file.

In simple DAO implementation strategy we implement every class explicitly but change of DBMS will require to change this whole code. The better solution is using DAO factories using abstract DAO factory. Each DBMS specific factory will implement the abstract dao interface. The 3rd strategy can be of using generic classes         that handle different DBMS of same type and the data source dependent information should be loaded from config file. Data Access Objects

JDBC does't support object oriented database. It supports RDBMS & SQL. In DAO we have object that encapsulates database access. If we use DAO patter then it return us Data Transfer Objects, DTO Pattern.  DAO Layer, includes the layer that contains objects that represents the database. Previously developer write DAO layer by themself but it was too complex to implement. Then DAO Framework arrises. Framework genrates 80% of the code and we only need to write small amount of code to complete it. If we change underlying RDBMS to Object oriented database then wee just need to modify DAO layer without changing the client code. For that we introduced DAO Factories. So for each DBMS we have separate DAO factory. In generic DAO classes, ALL DBMS dependent SQL should be put in to the configuration file.

Active record pattern - The pattern that combines DAO and DTO. An object that wraps row in database, encapsulates database access and adds domain logic on the data.

From Active record pattern to ORM - ORM is actually an active record pattern but maximum code generation support is added for transaction, caching, mapping(object associations or foreign key associations), mapping between object and table rows, implicit persistence, inheritance relations and lazy loading etc.

Active Record Object of Ruby on Rails- Inheritance and object association previously done through Single table inheritance pattern but later it was done through set of macros. It imposes naming conventions, COC and dry principles. Rubi On rails follow bottom-up approach.

Java Persistence Technologies

  1. JDBC by Microsoft
  2. Toplink Carleton Uni company and contained most of features of today's ORM
  3. EJB - Specification for enterprise components. Includes entity beans for persistence. Bean managed persistence or Container managed persistence.  The differences b/w EJB1 & EJB2 is the better support for object associations, advances in CMP and introduction of local interfaces that facilitates Session facade pattern for entity beans. In EJB2, Container managed Persistence doens't provide flexiblity to developer to choose the method of storing class hierarchy in database. CMP does it automatically and this is not good solution. In CMP, access code is generated automatically.

In Bean managerd Persistance we write manually the access code. EJB entity beans CMP uses ORM and does not based on POJO Model so these EJB must implement special interfaces and extend special classes. Moreover persistence object cannot execute outside the container which makes unit testing difficult. ORM is also not fully specified so implementor has to write code. The query language is EJBQL.

In CMP ejbLoad and ejbStore are empty but in BMP these method are implemented.

IN CMP developer cannot throw exceptions but in BMP developers are responsible for throwing exceptions.

In  CMP, bean interface classes will have abstract methods because they will be implemented by the container. It can happen that we do not have getters and setters for all the class fields because we don't want client to access them.

In interface classes there are CALLBACK/LIFECCYCLE methods that are empty body methods and in BMP we implement them separately..

In CMP it can happen that some methods doesn't appear on the interface but the container implements them additionally in the bean class. In deployment descriptor we sepecify the persistance type as 'Container' or 'Bean'. In CMP there are no attributes. Because classes are abstract. The actual implementation will be written by the container.

In CMP there will not be findXX methods as container implement them. Any method on TravelAgentEJB involves the transaction.

  1. Hibernate - Provides ORM and based on POJO Model. ORM is fully specified and persistence objects can be executed outside container. Object query language is HQL. Hibernate Annotations and Hibernate EntityManager provides JPA implementation over Hibernate core.
  2. iBatis - It is an open source product. Now known as Apache. It i s DAO layer which supports transaction and inheritance. All SQL is externalized to XML and is known as SQLMaps. POJO's can be mapped to input parameters of SQL or output result of query.
  3. Java Data Objects - It is specification for Persistance objects and is inspiration from OODBMS. It is for general Object data source mapping not just ORM. It uses POJO's and supports for transaction, inheritance, associations and caching is available. The query language is JDOQL. To explicitly make class persistence we invoke persistence manager.
  4. JPA - It is part of EJB3 specification but can be used independently. Inspiration in Toplink, hibernate, jdo. It provides ORM, uses POJO's. persistant object can be executed outside container. To explicitly make class persistence we invoke entity manager. The query language is JPQL AND the alternative to deployment descriptors is Java annotations.

OR impedance Mismatch (Map objects to tables)

a) Possible solutions are

  1. Use OODBMS
  2. MANUAL mapping through writing own SQL
  3. Use DAO layer
  4. Use ORM tool
  5. Mix and match - DAO Layer part of which uses ORM

b)Associations or Relationships

Associations can be

Object level 1-1, 1-N, N-1, N-M

Relational level 1-1, N-1 (Foreign key cannot point to collections)

Relational level cannot model unidirection 1-N associations and bidirectional N-1 associations.

Relational static model is incomplete. To generate domain model from relational model, information has to be added.

M-N assications can be handled by

One relataion for each class(R1, R2) plus one relation for join

One N-1 association from Join to R1

One M-1 association from Join to R2

c)Inheritance

  1. table per class - efficiency problem i.e querying one object will involve all the tables
  2. table per concrete (leaf) class - Integrity problems i.e. uniquness of id defined in abstract class is difficult to enforce in derived classes
  3. table per inheritance - Unlikely to normalized. Sparse tables (tables with null values)
  4. table per class Family - take subset of tree as a table.

d)Polymorphism

uses inheritance. It is mechanism that allows different types to expose different behaviour through same interface.

e) Identity

In relational model, two entities are identical if their contents are equal. In OO model, the entites are identitcal if object reference is identitcal. In relational model, tuples are indistinguishable so same entity quried twice will load data into different object instances. The solution is to import OO into relational model and use autogenerated key id's .

f)Composition

Composite delete means that all components should delete if you delete anything. Achieve by using database triggers or maintain coheracne at application level.

g)Containment

OO model has complex data structures like lists but relation world has only relations. So DBMS should model these by allowing theoretically dubious though not normalize in order to allow duplicate values in list.

h)Encapsulation

OO world has access modifiers and interfaces. Only relational equivalent is Views. The problems are

  1. Use of access modifiers     make it difficult to access values while persisting them
  2. Encapsulation is used to restrict access to data but database is shared among applications

i) Strategies for object navigation in relational world

  1. Lazy loading or on demand loading - load what is required
  2. Aggressive loading - load all at once

Approaches to ORM

  1. TOP-DOWN - Relational schema optimized for domain model. Involve the generation of database schema by ORM framework.
  2. BOTTOM-UP - Domain model optimized for relational schema. It involves the generation of application code for database schema. Domain model becomes very complex and relational schema is not easily evolved.
  3. Meet-in-the-Middle - Domain model is optimized for communication with non-developers but relational schema is optimized for reliability and efficiency. It is common approach because of long life of database schemas

Inversion of Control - situation where external system calls your code than your code calling the external system. It is known as Hollywood Principle. E.g. EJB callback methods.

Dependency Injection - To get a given service, components needs to know where the service is located and how to communicate with it. There are two approaches for this

Naive approach - Embed all the service intialization code in the client but the problem is that if service access implementaion changes then all code needed to change.

Dependency Injection Approach - Client declare dependency and then EJB container returns the service reference. Dependency injection is a particular form of Inversion of Control. It ensures that configuration of services is sperated from it's use. Dependencies configures externally in one place for the client. It is more reusable, more testable, and more readable.

Ways to do Dependency injection

  1. Constructor injection
  2. Setter injection
  3. Interface Injection

JPA entity - Grouping of state treated as a unit and can be persisted. The entities have associated metadata which enables persistence layer to manage entity. Annotated with @Entity.Attribute annotated with @Id  and a mutator method getXX annotated with @Id. If primary key is composite then @EmbeddedID or @IdClass instead of @Id. Elements can be overriden using element specification. Entities are managed by Entity manager. Each entity manager is associated with persistence context. Persistence context is a set of managed entity instances that exists in a particular data store. Entity manager can by container managed or application managed. Entity manager can create, remove entity instances, attach and detach entities and allow JPQL quries to run on entities.

In JPA, entity is different from bean. In entity we can add annotations @entity to say that this is entity and we can annotate attribute to say that it is primary key.

JPA Persistene Unit - A persistence unit represents the data contained in a single data store. It comprises the collection of entities that are managed together and corresponding configuration information i.e. data source and persistence provider. Persistence.xml contains the definition of one or more persistence units.

EJB 3.0 Specification

In EJB 3.0

  • No home interfaces
  • Business interface is POJI (Plain Old Java Interface) i.e doesnot extend    EjbObject
  • Bean class is POJO i.e. doesnot implement EnterpriseBean type
  • Simplified container callbacks - can be implemented either by callback method or callback listener classes. NO NEED TO IMPLEMENT EMPTY CALLBACKS.
  • Annotations & dependency injection
  • Interception (session & message-driven beans) - can implement call before and after bean methods. Can re-implement security and transaction classes to override container implementation.
  1. Disconnect objects. In ejb2 the java objects were connected to database. We couldn't cut it from the database and directly send it to the network. We had to convert it into some other object before sending. But JPA we can have disconnected objects which we can send across network or we can also connect it to database.

IN EJB3, @REMOTE before Hello Class declaration and implements it with Hello interface defines the relationship of bean with the associated interface. There are two ways. Either write @remote in the interface or write in the bean class. Benefit i s that we do not need to secify remote interface and we donot need home interface, deployment descriptor. @Remove annotation before checkout() means that if container calls remove method then it will first execute checkout method.

The home interfaces are not required in EJB3 because the beans are created as they look-up i.e getting reference through dependency injection where ever you needed. Bean class specifies dependencies not lookup and because of this it is possible to test EJB's outside the container.

Entties can be detached and send to remote server. Remote server can make changes to local copy and then sent back to orginal server where updated copy is merged then.

Some annotations

@Table(name = «mytable»)

@Id(generate = GeneratorType.AUTO)

@Column(name = ~first_column~)

Private void getID (long ID) {}

@Table (name= «a»)

@SecondaryTable (name = «b», pkJoinColumns =@PrimaryKeyJoinColumn(name=»c», referencedColumnName = «d»)

)

@column (name = «city», secondaryTable = «address»)

Some good Terminology

Lazy loading - University has many classrooms. If university information and classroom information is stored in the database. If we retrieve record for the university then whether we need to load the info for all classrooms at the same time. In aggresive loading, if network is far away and there are not many classroom then retrieive one by one (lazily) then it will slow down because every time they need separate connection. So we load them at once in aggressive loading.

CoC - Convention over configuration. We follow some conventions then we do not need configuration files/information. For example follwing the naming convention

DRY - Try not to say same thing twice. In EJB dry was there i.e create method in both interface and bean class.

Ruby-on-Rails - Good framwork for developing web applications. Rails means that we following conventions and because of this it has rapid developement. In EJB2, relationship between beans is specified in the descriptor files. In EJB3 the connecting and disconnecting objects is done through Persistence Manager.

JDO is more general then JPA, JDO supports both object oriented and relational databases. JPA is only for relational databases. JPA generates more code automatically because it accepts more metadata. But if we use JDO over relational database then we need to write more code by ourself.

Addressing M:N issue

In relational db's, e can only model N:1. Which means that one connection is mapped to many foreign keys. A foreign key points single key but not to collection. In order to address this issue, we introduce a new table which only defines the relation between two tables. This is a way to handle M:N relation. For the first table the relation with middle table is 1:N and for second table with middle table is 1:M.

In Table per class approach, if we need to change column in top hierarchy, then we have to make changes in all the tables under it.

Concrete classes means that the classes of which we make instances. i.e we have student class and subclass bsc and msc then bsc and msc are concrete classes and student is abstract class.

Table per concrete class problems: Lets say we have unique id defined abstract class. In that case it is difficult to force constraint in the concrete class. It is hard to make some thing unique in different tables. If we have one table then it would be simple. Sparse tables means that there are lot of field which would be null.

Table per family - compromise between table per class and table per hiearchy. It is kind of sub package.

Composition means that if we delete the university from the database then classroom should also be removed automatically. This is called composite relationship. Classrrom knows which which university are attached but university doens't know that which class room it has in the relational model. So it not easy to take care of composite relations.

Query by example - Different ways of querying to the database. If we want to search name Simon in db then we only do first name = search name and leave all other null. This is query by example.

Query by API - In that we have methods to find the specific record.

Some Concept specific to J22 EJB3.0 Application

War part is the web part. JSP stores in the webpages. Servlet stores in stored packages

Index.jsp will appear first because we specify in web.xml that welcome-file-list

@EJB dependency injection Say to servlet container that in order to achieve your goal we need an object of this type.

Only the methods specified in the interface will be accessed by the servlet.

Persistance unit gives real access to database. Persistance manager is a mean to access the real database. Through this, the system knows where to store this data. Entity manager is part of JPA. Entity manager encapsulates all the access to the database. EJB does it internally.

Persistance unit is just a configuration for the database.

Spring  framework- Sun developed JEE. In the first version it was hard to implement all this. So because of this, the guy expert on JEE thought that this is impossible in the start to implement all JEE features. He provides a kind of way to implement this kind of application without following complications of JEE. Which is known as Spring framework. It supports transaction, remote access. It provides same functionality of JEE but with much easier and less complex manner. Spring framework does all the things that are done by JEE but in a less complex way and is light weight. Spring framework doesn't use any concrete persistance engine like JPA

JNDI is the way of associating the name to an object. These names are stored in name server JNDI. JNDI is to look objects by name. It is problematic if in future we change the name then we need to change the whole code where ever it is used. But if we use dependency injection then we just declare the dependency. So is much better way for referencing objects.

The query language used in JPA is the JPQL. Java persistence query language.

We specify *.do extension in the url that we refer from jsp page. This is not a convention but it is the commonly used extension.

Controller knows how to invoke business logic and how to present the results through jsp page.

Control flow in EJB Application

JSP > SERVLET > SESSION BEAN (BUSINESS LOGIC) > SERVLET (RESULTS) > JSP (Presentation of results)

Hope that would be useful for the understanding of JPA and internal working for Castor.

CallableStatements for executing Stored Procedures using prepareCall

We only need prepareCall if the procedure has out parameters, since CallableStatement has methods to retrieve parameter values whereas PreparedStatement doesn't. If it doesn't have out parameters, you can use either of them.

Let's suppose we have following procedureCREATE PROCEDURE demoSp(IN inputParam VARCHAR(255), \
INOUT inOutParam INT)
BEGIN
DECLARE z INT;
SET z = inOutParam + 1;
SET inOutParam = z;

SELECT inputParam;

SELECT CONCAT('zyxw', inputParam);
END

====================================================

cStmt.setString(1, "abcdefg");
====================================================
Register output in the following two ways
cStmt.registerOutParameter(2, Types.INTEGER);

cStmt.registerOutParameter("inOutParam", Types.INTEGER);
 
=====================================================
Setting input parameters

cStmt.setString(1, "abcdefg");
 
Reference: http://dev.mysql.com/doc/refman/5.1/en/connector-j-usagenotes-basic.html

Run CASTOR test suit against HSQL Database

For running tests against HSQL, you need to go to 'RUN' menu and choose 'Run Configuration' option.

In run configuration change the 'VM Arguments' to -Ddatabase=hsql-mem

The test20 under cpactf test suit is written for testing against hsql. you have to explicitly run test20 by right click and run as JUNIT test.

Best Regards, Ahmad

Accomplished Tasks

The following main tasks have been accomplished in this project

Refacoring SQLStatementCreate

http://jira.codehaus.org/browse/CASTOR-2477

Refactoring SQLStatementStore

http://jira.codehaus.org/browse/CASTOR-2713

Refactoring SQLStatementRemove

http://jira.codehaus.org/browse/CASTOR-2702\\

There are plenty of sub task which have been addressed in the above tasks. You can see the details of each subtask through the task URL

Almost 47 subtasks had been addressed and 47 patches have been resulted to address the issues. 

Regards, Ahmad