Tuesday, December 27, 2011

JAXB, SAX, DOM Performance

This post investigates the performance of unmarshalling an XML document to Java objects using a number of different approaches. The XML document is very simple. It contains a collection of Person entities. 

There is a corresponding Person Java object for the Person entity in the XML
...

and a PersonList object to represent a collection of Persons. 

The approaches investigated were:
  • Various flavours of JAXB
  • SAX
  • DOM
In all cases, the objective was to get the entities in the XML document to the corresponding Java objects. The JAXB annotations on the Person and PersonList POJOS are used in the JAXB tests. The same classes can be used in SAX and DOM tests (the annotations will just be ignored). Initially the reference
implementations for JAXB, SAX and DOM were used. The Woodstox STAX parsing was then used. This would have been called in some of the JAXB unmarshalling tests.

The tests were carried out on my Dell Laptop, a Pentium Dual-Core CPU, 2.1 GHz running Windows 7.

Test 1 - Using JAXB to unmarshall a Java File.


Test 1 illustrates how simple the progamming model for JAXB is. It is very easy to go from an XML file to Java objects. There is no need to get involved with the nitty gritty details of marshalling and parsing.

Test 2 - Using JAXB to unmarshall a Streamsource

Test 2 is similar Test 1, except this time a Streamsource object wraps around a File object. The Streamsource object gives a hint to the JAXB implementation to stream the file.


Test 3 - Using JAXB to unmarshall a StAX XMLStreamReader

Again similar to Test 1, except this time an XMLStreamReader instance wraps a FileReader instance which is unmarshalled by JAXB.


Test 4 - Just use DOM
This test uses no JAXB and instead just uses the JAXP DOM approach. This means straight away more code is required than any JAXB approach.
Test 5 - Just use SAX Test 5 uses no JAXB and uses SAX to parse the XML document. The SAX approach involves more code and more complexity than any JAXB approach. The Developer has to get involved with the parsing of the document.

The tests were run 5 times for 3 files which contain a collection of Person entities. The first first file contained 100 Person entities and was 5K in size. The second contained 10,000 entities and was 500K in size and the third contained 250,000 Person entities and was 15 Meg in size. In no cases was any XSD used, or any validations performed. The results are given in result tables where the times for the different runs are comma separated.

TEST RESULTS
The tests were first run using JDK 1.6.26, 32 bit and the reference implementation for SAX, DOM and JAXB shipped with JDK was used.

Unmarshall Type 100 Persons time (ms) 10K Persons time (ms)  250K Persons time (ms)
JAXB (Default)  48,13, 5,4,4 78, 52, 47,50,50 1522, 1457, 1353, 1308,1317
JAXB(Streamsource) 11, 6, 3,3,2 44, 44, 48,45,43 1191, 1364, 1144, 1142, 1136
JAXB (StAX) 18, 2,1,1,1 111, 136, 89,91,92 2693, 3058, 2495, 2472, 2481
DOM 16, 2, 2,2,2 89,50, 55,53,50 1992, 2198, 1845, 1776, 1773
SAX 4, 2, 1,1,1 29, 34, 23,26,26 704, 669, 605, 589,591
JDK 1.6.26 Test comments

  1. The first time unmarshalling happens is usually the longest.
  2. The memory usage for the JAXB and SAX is similar. It is about 2 Meg for the file with 10,000 persons and 36 - 38 Meg file with 250,000.  DOM Memory usage is far higher.  For the 10,000 persons file it is 6 Meg, for the 250,000 person file it is greater than 130 Meg. 
  3. The performance times for pure SAX are better. Particularly, for very large files.
The exact same tests were run again, using the same JDK (1.6.26) but this time the Woodstox implementation of StAX parsing was used.

 
Unmarshall Type 100 Persons time (ms) 10K Persons time (ms)  250K Persons time (ms)
JAXB (Default)  168,3,5,8,3 294, 43, 46, 43, 42 2055, 1354, 1328, 1319, 1319
JAXB(Streamsource) 11, 3,3,3,4 43,42,47,44,42 1147, 1149, 1176, 1173, 1159
JAXB (StAX) 30,0,1,1,0 67,37,40,37,37 1301, 1236, 1223, 1336, 1297
DOM 103,1,1,1,2 136,52,49,49,50 1882, 1883, 1821, 1835, 1822
SAX 4, 2, 2,1,1 31,25,25,38,25 613, 609, 607, 595, 613
JDK 1.6.26 + Woodstox test comments

  1. Again, the first time unmarshalling happens is usually proportionally longer.
  2. Again, memory usage for SAX and JAXB is very similar. Both are far better
    than DOM.  The results are very similar to Test 1.
  3. The JAXB (StAX) approach time has improved considerably. This is due to the
    Woodstox implementation of StAX parsing being used.
  4. The performance times for pure SAX are still the best. Particularly
    for large files.
The the exact same tests were run again, but this time I used JDK 1.7.02 and the Woodstox implementation of StAX parsing.

Unmarshall Type 100 Persons time (ms) 10,000 Persons time (ms)  250,000 Persons time (ms)
JAXB (Default)  165,5, 3, 3,5 611,23, 24, 46, 28 578, 539, 511, 511, 519
JAXB(Streamsource) 13,4, 3, 4, 3 43,24, 21, 26, 22 678, 520, 509, 504, 627
JAXB (StAX) 21,1,0, 0, 0 300,69, 20, 16, 16 637, 487, 422, 435, 458
DOM 22,2,2,2,2 420,25, 24, 23, 24 1304, 807, 867, 747, 1189
SAX 7,2,2,1,1 169,15, 15, 19, 14 366, 364, 363, 360, 358
JDK 7 + Woodstox test comments:

  1.  The performance times for JDK 7 overall are much better.   There are some anomolies - the first time the  100 persons and the 10,000 person file is parsed.
  2. The memory usage is slightly higher.  For SAX and JAXB it is 2 - 4 Meg for the 10,000 persons file and 45 - 49 Meg for the 250,000 persons file.  For DOM it is higher again.  5 - 7.5 Meg for the 10,000 person file and 136 - 143 Meg for the 250,000 persons file.
Note: W.R.T. all tests
  1. No memory analysis was done for the 100 persons file. The memory usage was just too small and so it would have pointless information.
  2. The first time to initialise a JAXB context can take up to 0.5 seconds. This was not included in the test results as it only took this time the very first time. After that the JVM initialises context very quickly (consistly < 5ms). If you notice this behaviour with whatever JAXB implementation you are using, consider initialising at start up.
  3. These tests are a very simple XML file. In reality there would be more object types and more complex XML. However, these tests should still provide a guidance.
Conclusions:
  1. The peformance times for pure SAX are slightly better than JAXB but only for very large files. Unless you are using very large files the performance differences are not worth worrying about. The progamming model advantages of JAXB win out over the complexitiy of the SAX programming model.  Don't forget JAXB also provides random accses like DOM does. SAX does not provide this.
  2. Performance times look a lot better with Woodstox, if JAXB / StAX is being used.
  3. Performance times with 64 bit JDK 7 look a lot better. Memory usuage looks slightly higher.

Wednesday, December 7, 2011

Ant versus Maven

There are many ways to organise build systems for Java projects. The two most predominant are probably still Ant and Maven.  Debates between the two tend to go around in circles with the balance now swinging towards maven - since IDE support has got better (particularly Eclipse).  My own view is if you do not have a good architecture which is modular in nature and which separates concerns that should be separated you'll run into trouble no matter what you use.  The emphasis should always be on good architecture first and foremost.   

That said, I made this short video which illustrates some of the arguments you hear from Maven-ites and Ant-ists.  It is a debate between Maeve and Anthony.  Maeve is arguing for Maven; Anthony is arguing for Ant. Obviously, it's impossible to cover every single argument but the video includes some of the principle ones. Get some popcorn and enjoy.





Saturday, December 3, 2011

Musing on mis-usings: 'Powerful use, Damaging misuse'.

There's an old phrase attributed to the former British Prime Minister Benjamin Disraeli which states there are three types of lies: "lies, damn lies and statistics".  The insinuation here is that statistics are so easy to make up they are unreliable.  However, statistics are extensively used in empiracle science so surely they have some merit? In fact, they have a lot of merit.  But only when they are used corrrectly.  The problem is they are easy to misuse.  And when misused, misinformation happens which in turn does more more harm than good.

There are strong parallels to this narrative in the world of software engineering. Object orientated lanuages introduced the notion of inheritance, a clever idea to promote code reuse. However, inheritance - when misused - can easily lead to complex hierarchies and can make it difficult to change objects. The misuse of inheritance can reek havoc and since all it takes to use inheritance (in Java) is to be able to spell the word "extends", it's very easy to reek such havoc if you don't know what you are doing. A similar story can be told with polymorphism and with design patterns. We all know the case of someone hell bent on using a pattern and thinking more about the pattern than the problem they are trying to solve.  Even if they understand the difference between a Bridge and an Adapter it is still quite possible that some part of the architecture may be over engineered. Perhaps it's worth bearing in mind that every single one of the GOF design pattern is already in JDK, so if you really want it in your architecture you don't have to look very far - otherwise only use when it makes sense to use it.

This 'Powerful use, damaging misuse' anti-pattern is ubiquitous in Java systems.  Servlet Filters are a very handy feature for manipulating requests and reponses, but that's all they are meant to do.  There is nothing in the language to stop a developer treating the Filter as a classical object, adding public APIs and business logic to the Filter.  Of course the filter is never meant to be used this way and when they are trouble inevitably happens.  But the key point is that it's easy for a developer to take such a powerful feature, misuse it and damage architectures.  'Powerful use, damaging misuse' happens very easy with Aspects, even Exceptions (we have all seen cases where exceptions were thrown and it would have made more sense to just return a boolean) and with many other features. 

When it is so easy to make mistakes, inevitably they will happen.  The Java compiler isn't going to say - 'wait a sec do you really understand this concept?' and codestyle tools aren't sophisticated enough to spot misuse of advanced concepts. In addition, no company has the time to get the most senior person to review every line of code.  And even the most Senior Engineer will make mistakes. 

Now, much of what has been written here is obvious and has already been well documentated.   Powerful features generally have to be well understood to be properly used.  The question I think worth asking is if there is any powerful feature or engineering concept in a Java centric architecture which is not so easy to misuse? I suggest there is at least one, namely:  Encapsulation.  Firstly, let's consider if encapsulation didn't exist.  Everything would be public or global (as in Javascript).  As soon as access scope narrows, encapsulation is happening which is usually a good thing. Is it possible to make an architecture worse by encapsulating behaviour?  Well it's damn hard to think of a case where it could.  If you make a method private, it may be harder to unit test. But is it really?  It's always easy to unit test the method which calls it, which will be in the same class and logical unit.

There's a lesson to be learnt here. As soon as you design anything which something else uses, whether it be a core component in your architecture, a utility library class or a REST API you are going to tell the world about, ask youself:
  1. How easy is it for people to misuse this? Is it at the risky levels of inheritance or the safer levels of encapsulation?
  2. What are the consequences of misuse?
  3. And what can you do to minimise misuse and its consequences? 
Aim to increase 'powerful use' and minimise 'damaging misuse'!


Sunday, November 13, 2011

REST Spring 3.0 - Simple Introduction

This post is going to be a very simple introduction to the RestTemplate API introduced in Spring 3.0.
Dogs know how to REST!
The RestTemplate is similar to other Spring templates such as JmsTemplate and JdbcTemplate in that Spring eliminates a lot of boot strap code and thus makes your code much cleaner.  When applications use the RestTemplate they do not need to worry about HTTP connections, that is all encapsulated by the template. They also get a range of APIs from the RestTemplate which correspond to the well know HTTP methods (GET, PUT, POST, DELETE, HEAD, OPTIONS).  These APIs are overloaded to cater for things  like different ways of passing parameters to the actual REST API.  Ok, so let's look a very simple example.  Suppose we want to invoke a well know Twitter REST API to check the Twitter timeline for the user "dublintech".  This API would takes form: http://twitter.com/statuses/user_timeline.xml?id=dublintech

Fire it into your web browser and you'll see you get back the timeline for dublintech in XML.  But you don't want to fire it into your browser, you want to invoke the API in a Java program!  Let's do that.


That's it. Isn't that incredibly simple. No HTTP Connection handling to worry about. Import one Spring 3.0 class, use it and that's it!  Yes, this post was a very simple introduction. In the real world, if you were using this API you'd obviously be doing something a little more sophisticated for example using XPath to get specific information out of the XML returned. But we don't need to cover that.  The point of this post was really to point how easy it is to use some of the REST 3.0 Spring features. In future posts we'll be looking at more advanced features.

 For now - keep it simple!
References:
1. http://static.springsource.org/spring/docs/3.0.x/javadoc-api/
2. http://blog.springsource.org/2009/03/27/rest-in-spring-3-resttemplate/
3. https://dev.twitter.com/docs/api Twitter REST API
4. Roy Fielding's original REST Paper: http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm

Friday, October 14, 2011

Javascript Closure's and Currying!

Closures are a powerful and important feature of Javascript. Before trying to understand the syntax of how to create one it's important to understand just why they are an important feature of the language. Firstly closures are a way of creating objects. But, so what? Javascript already provides two ways to do this: Object literals and Constructor functions. So what's wrong with these mechanisms? Well they do not provide any sort of encapsulation. Get a handle to an object created via the literal object approach or the constructor function approach and you can pretty much can change anything in the object.

Watch this poor literal object's encapsulation dreams smash...
myObject = {
    myProperty:"I wish I was encapsulated"
}
console.log(myObject.myProperty);  //outputs I wish I was encapsulated
myObject.myProperty = "sorry mate you're note";
console.log(myObject.myProperty);  //outputs sorry mate you're note
As for the Constructor function...
var Person = function(name) {
    this.name = name;
    this.speak = function () {
        return "I am " + this.name;
    }
}

var tony = new Person("Tony");
console.log(tony.speak());  // outputs I am Tony

// Tony's encapsulation dreams are going to get smashed.
tony.name = "Fatso";
console.log(tony.speak());  // outputs I am fatso
This is awful. Put yourselves in Tony's shoes and think about how he feels. This lack of encapsulation causes problems. Some programmers used the convention of an underscore before a property (as in _name) to try to indicate they wished they could make a property private but they couldn't. The underscore was saying: "please, please don't touch me".  But who wants to engineer logical systems around emotional pleas?

Closures provide a mechanism for encapsulating objects. You want to encapsulate Tony's name (and every other person's name) do this:
var person = function(name) {
    console.log(">> setting name to " + this.name);
    return {
        getName: function(){
           return name;
        } 
    }
}
var tony = person("Tony");  // outputs setting name to Tony
console.log(tony.getName());  // outputs Tony
tony.name = "Fatso"   // it won't change the name in the closure
console.log(tony.getName()); // still outputs Tony
WOW! at last some encapsulation. Let me try to explain... The outer function returns an Object literal. This lives longer than the outer function. The outer function effectively ends as soon as person("tony") is finished. The object literal lives longer because it is returned. Properties and methods in the object literal can access variables in the outer function. Just like the way any inner function can access variables in the outer function. Now, when the object literal is returned it "closes" over the values of the variables in the outer function it can access. Effectively, getting keeping a copy of them. It would be the exact same for an inner function if it was returned. What is returned is called the closure. The outside world cannot directly access what the closure closes over. It can only access what the closure itself advertises to the outside world. In this case, that is just one function getName(). This all means we can make encapsulated person objects.

Currying
Currying is the process of reducing the number of arguments passed to a function by setting some argument(s) to predefined values.  Consider this function...

function outputNumbers(begin, end) {
    var i;
    for (i = begin; i <= end; i++) {
        print(i);
    }
}
outputNumbers(0, 5);  // outputs 0, 1, 2, 3, 4, 5
outputNumbers(1, 5);  // outputs 1, 2, 3, 4, 5
Now suppose we want a similar function with a fixed "begin" value.  Let's say the "begin" value was always 1.  We could do:
function outputNumbersFixedStart(start) {
    return function(end) {
        return outputNumbers(start, end);
    }
}

And then defined a variable to be this new function...
var outputFromOne = outputNumbersFixedStart(1);
outputFromOne(3);  1, 2, 3
outputFromOne(5);  1, 2, 3, 4, 5

As can be seen, the number of arguments has been reduced from 2 to 1. And this has been enabled because the argument that has been eliminated has been set to a fixed value, in this case its value is 1.

Currying works by creating a closure that holds the original function and the arguments to curry.  A generic solution to currying is to add a "curry" function to every function in our code by augmenting function's prototype to include a curry function.  
Function.prototype.curry = function() {
    if (arguments.length<1) {
        return this; //nothing to curry with - return function
    }
    var that = this;
    var slice = Array.prototype.slice;
    var args = slice.apply(arguments);
    return function() {
        var innerFunctionSlice = slice.apply(arguments);
        return that.apply(null, args.concat(slice.apply(arguments)));
    }
}
This curry function can be applied to any function.
var outputFromZeroCurried = outputNumbers.curry(0);  
var outputFromOneCurried = outputNumbers.curry(1);   

outputFromZeroCurried(3);  // outputs 0, 1, 2, 3
outputFromOneCurried(5);  // outputs 1, 2, 3, 4, 5
Note: One thing to bear in mind in the curry prototype function is that the value of arguments in the outer function is different to the value of arguments in inner function.  When outputNumbers.curry(0) is invoked, arguments in the outer function is {0}. When outputFromZeroCurried(3) is invoked, arguments for inner function is {3}, but arguments for the outer function is still {0}. Essentially, 'arguments' in the outer function refers to arguments passed to the outer function; 'arguments' for the inner function refers to arguments for the inner fuction.

This smells like a Curry!

Monday, October 10, 2011

Are you confused by "this" Javascript?

George Bush found many things confusing!
The keyword "this" is important in javascript but it is also confusing.  The reason is because it's not always easy to ascertain what it actually means. It has different values depending on where it is in a piece of code and more importantly what context the code is invoked in.

In 'Javascript: The Good Parts', Javascript Guru Doug Crockford describes four different invocation patterns, the "this" parameter is initialised differently in all four patterns: the method invocation pattern, the function invocation patter, the constructor invocation pattern and the apply invocation pattern

The Method Invocation Pattern
When a function is property of an object in javascript it is a method. When the method is invoked, the 'this' refers to that object.  For example, in the code below the property getName is an anonymous function.  If the anonymous function wishes to refer to the name variable it uses the 'this' notation.
var myObject = {
    name: "Staveley",
    getName: function () {
        return this.name;
    }
};

console.log(myObject.getName());// Staveley.

It is worth stressing the importance of 'this' in the example above.  If it is not used, the name will be the global one.
name="GlobalName";
var myObject = {
    name: "Alex",
    getName: function () {
        return name; // refers to name in global namespace.
    }
};
console.log(myObject.getName());// Outputs GlobalName not Alex.

The Function Invocation Pattern
When a function is defined not as a method but just as a function (i.e. there is no property defining it in an object),  the keyword 'this' refers to the the global object.
var getNameFunction = function() {
    name: "Alex";
    console.log(">>getNameFunction(),name=" + this.name);  
};
// Invoking getNameFuction() outputs: undefined.  
// Because, this is equal to the global object 
// which has no name property.
getNameFunction(); 

When another object has a property which calls the getNameFunction, the 'this' inside the
getNameFunction refers to that other object.
var obama = {
    name: "obama"
};

obama.getName = getNameFunction;
obama.getName();  // outputs obama!

But, an inner function does not share the method's access to the object as its 'this' is bound to the wrong value.
obama.getFullName = function() {
    var outputName = function() {
        console.log("barack " + this.name);
    };
    outputName();
} 
// Invoking getFullName() outputs barack undefined. Why? 
// Because the innerfunction's 'this' refers to the global variable. 
// The global variable has no name property.
obama.getFullName(); 

The solution is to assign another variable to the value this. By convention this variable takes the name 'that'.
obama.getFullName = function () {
    var that = this;
    var outputName = function() {
        console.log("barack " + that.name);
    };
    outputName();
};
obama.getFullName(); // outputs barak obama

Inner functions confused Obama!
The Constructor Invocation Pattern
Objects can be easily defined using object literal syntax. However, sometimes when many objects of the same type need to be defined there is a need for consistency. There is also a need to avoid repetition. Two good reasons to use constructor functions! 
var Car = function(colour, reg){
    this.colour= colour;  // this refers to the object being created.
    this.reg = reg;
};

var myRedCar = new Car("Red", "00D901");
var myGreenCar = new Car("Green", "00D902");
var myBlueCar = new Car("Blue", "00D903");  

// Now test those assignments worked. 
console.log(myRedCar.colour + ", " + myRedCar.reg);   // outputs Red, 00D901
console.log(myGreenCar.colour + ", " + myGreenCar.reg); // outputs Green, 00D902
console.log(myBlueCar.colour + ", " + myBlueCar.reg);  // outputs Blue, 00D903

Now suppose we want to add a method common to all 'Car' objects. We do:
Car.prototype.getColour = function() {
    console.log(">> getColour(), colour = " + this.colour);
};

In this case, the 'this' refers to the object created with the new prefix
myRedCar.getColour();  // outputs Red
myGreenCar.getColour(); // outputs Green
myBlueCar.getColour();  // outputs Blue

The Apply Invocation Pattern
When Apply (or Call) is used we are allowed to choose the value of 'this'.  To do this,
we simply specify it as an argument.
myRedCar.getColour.apply(myGreenCar);  // outputs green
myGreenCar.getColour.apply(myRedCar); // outputs red

References:
1. Brilliant tutorial on Javascript objects: http://helephant.com/2008/08/17/how-javascript-objects-work/
2. Douglas Crockford: http://javascript.crockford.com/javascript.html


Wednesday, October 5, 2011

Problems debugging Java after ANT compile

ANT    
Ok, this one is pretty easy but worth posting. I lost a few hours because of it and I don't want the same to happen to you! Suppose you want to be able to pass a switch into an ANT target which performs a javac to tell it to include debug information or not.

Your properties file will contain properties such as:
The ANT compile target then uses these properties:

However, the debug information is not there when you are debugging.  You run
ant -v compile to get more information. You see:

Now, as we can see the properties are echo'd as expected.  But, the "-g:none" indicates the compile won't include debug information.  Before you tear your hair out, relax! You have just made a very silly mistake we all make at sometime. When ANT reads property files it reads all property values literally. This means there is a difference between "true" and "true ", i.e. boolean properties should not have trailing spaces otherwise they will not be read as boolean properties. This is what happened here.  Ouch!
Ouch - trailing space!

As it states in http://ant.apache.org/manual/Tasks/property.html, regarding property files: "Trailing spaces are not stripped. It may have been what you wanted."

So rip that trailing space and re-run the ant compile target.

You should see:


which means the compiler is going to add lines, vars and source debug information.  Happy debugging!


References:
1. http://ant.apache.org/manual/Tasks/property.html









DB2 cheat sheet

Some useful DB2 commands


DB2 System Commands
  • DB2LEVEL -- checks version of DB2 installed.
  • DB2ILIST -- lists all instances installed
  • DB2CMD -- opens a command line processor
  • DB2CC -- opens db2 control center
  • DB2LICM -l -- gets db2 type. 
Command Line Processor Commands
  • DB2 LIST NODE DIRECTORY -- Lists all nodes
  • DB2 CATALOG TCPIP NODE DB2NODE REMOTE MACHINE215 SERVER 50000 -- catalogs node.  In this case, node is db2Node on the machine with name machine215. Port is 50000.
  • DB2 LIST DATABASE DIRECTORY -- list databases
  • DB2 GET DB CFG FOR SAMPLE -- get configuration info for the SAMPLE db.
  • DB2 CONNECT TO alexDB USER myuser USING mypass -- connect to db. In this case, database is alexdb, usern is myuser and password is mypass.
  • DB2 DISCONNECT alexdb  -- disconnects
  • DB2 LIST APPLICATIONS SHOW DETAIL -- shows all running db's
  • DB2 GET DBM CFG -- view authentication paramater (e.g. something like server_encrypt)
  • DB2 UPDATE DBM CFG USING AUTHENTICATION SERVER_ENCRYPT -- alter the authentication mechanism to server_encrypt 
  • DB2 GET AUTHORIZATIONS -- get authorisation level. 
Database commands via Command Line Processor (CLP)
  • DB2 GET DATABASE CONFIGURATION -- gets current database configuration
  • DB2 VALUES CURRENT USER - - gets the current user
  • DB2 VALUES CURRENT SCHEMA -- gets the current schema
  • DB2 VALUES CURRENT QUERY OPTIMIZATION -- get query optimization level.
Schemas
  • DB2 SELECT SCHEMANAME FROM SYSCAT.SCHEMATA -- list all schemas
  • DB2 VALUES CURRENT SCHEMA -- gets the current schema
  • DB2 SET SCHEMA ALEXSCHEMA -- set schema
Tables
  • DB2 LIST TABLES FOR schema_name -- list all tables for particular schema
  • DB2 LIST TABLES SHOW DETAIL; -- show detail about tables
  • DECLARE GLOBAL TEMPORARY TABLE -- declares a temporary table
  • CREATE TABLE MQT AS (SELECT c.cust_name, c.cust_id, a.balance FROM customer c, account a WHERE c._cust_name IN ('Alex') AND a.customer_id - c.cust_id) DATA INITIALLY DEFERRED REFRESH DEFERRED -- Creates a materialised query table. In this case the MQT is based on a join query from the customer and account table.
Tablespaces
  • DB2 LIST TABLESPACES SHOW DETAIL -- show detail about table spaces
  • SELECT * FROM SYSCAT.TABLESPACES;  -- show what syscat has about tablespaces
  • SELECT tbspace, bufferpoolid from syscat.tablespaces;  -- get tablespace and bufferpoolid
  • SELECT TABNAME FROM SYSCAT.TABLES WHERE TBSPACE=2; -- Check what TABLES are in tablespace where id = 2.

Constraints
  • SELECT * FROM SYSCAT.TABCONST;  -- Table constraints
  • SELECT * FROM SYSCAT.CHECKS;  -- Colum checks
  • SELECT * FROM SYSCAT.COLCHECKS; -- Column constraints 
  • SELECT * FROM SYSCAT.REFERENCES; --  Referential constraints
Sequences
  • CREATE SEQUENCE STESTRESULT AS INTEGER INCREMENT BY 1 START WITH 1 NO MINVALUE NO MAXVALUE NO CYCLE CACHE 10 ORDER;  -- Create Sequence starting with 1 which cache 10 values
  • SELECT * FROM SYSCAT.SEQUENCES; -- Gets systcat info on sequences
  • VALUES NEXT VALUE FOR MYSEQ; -- Gets next value from sequence myseq
  • ALTER SEQUENCE MYSEQ RESTART WITH 11 INCREMENT BY 1 MAXVALUE 10000 CYCLE CACHE 12 ORDER -- Changes MySeq sequence

Locksize
  • SELECT TABNAME, LOCKSIZE FROM SYSCAT.TABLES WHERE TABNAME = ' EMPLOYEES';  -- Check locksize which can be tablespace, table, partition, page, row - (usually row).

Bufferpools
  • SELECT bpname, npages, pagesize from syscat.bufferpools -- get useful buffer pool info.
  • SELECT buffer.bufferpoolid, buffer.bpname, buffer.npages, buffer.pagesize, tablespace.tbspace, tablespace.tbspaceid from syscat.bufferpools buffer, syscat.tablespaces tablespace where tablespace.bufferpoolid = buffer.bufferpoolid;  -- gets buffer pool and corresponding tablespace info.

Indexes
  • SELECT * FROM SYSCAT.INDEXES --  show all indexes
  • SELECT COLNAMES, TABNAME, INDEXTYPE, CLUSTERRATIO, CLUSTERFACTOR FROM SYSCAT.INDEXES WHERE TABNAME = 'TPERSON';  -- some useful columns

Functions
  • SELECT * FROM SYSCAT.FUNCTIONS;  -- check what functions DB has.

SYSDUMMY1 commands
  • SELECT CURRENT DATE FROM SYSIBM.SYSDUMMY1; -- gets current date.
  • SELECT HEX(36) FROM SYSIBM.SYSDUMMY1;  -- same as VALUES HEX(36)
  • SELECT XMLCOMMENT ('This is an XML comment') FROM SYSIBM.SYSDUMMY1;

Runstats
  • RUNSTATS ON TABLE TAUSER1.TOSUSER FOR INDEXES ALL;  -- runstats for all indexes
Checking the last time runstats was run...
  • SELECT CARD, STATS_TIME FROM SYSCAT.TABLES WHERE TABNAME = 'TOSUSER';
  • SELECT NLEAF, NLEVELS, FULLKEYCARD, STATS_TIME, TABNAME, INDNAME FROM SYSCAT.INDEXES WHERE TABNAME = 'TOSUSER';
The following catalog columns can be queried to determine if RUNSTATS has been performed on the tables and indexes:
  • If the CARD column of the SYSCAT.TABLES view displays a value of -1, or the STATS_TIME column displays a NULL value for a table, then the RUNSTATS utility has not run for that table.
  • If the NLEAF, NLEVELS and FULLKEYCARD columns of the SYSCAT.INDEXES view display a value of -1, or the STATS_TIME column displays a NULL value for an index, then the RUNSTATS utility has not run for that index.

Monday, October 3, 2011

The A to Z of DB2

In nearly every software architecture there is a relational database - somewhere.  And in nearly every relational database there is a range of concepts and buzzwords.  Some unique to a particular database vendor but many not.  In this post we run through some concepts / buzzwords for DB2 - alphabetically!
Authority Levels
A DB2 Authority Level is a security level representing a collection of privileges and higher-level database manager maintenance and utility operations.  SYSADM, SYSCTRL, SYSMAINT, SYSMON are instance level authorities and can only be assigned to a group. DBADM, SECADM and LOAD are database level authorities.
SYSADM is the only authority which can:
  •  update the DMB CFG file
  •  grant SYS* authorities to other groups. Grant DBADM authority to users / groups
  •  access data within any database
  •  do anything SYSCTRL can do
SYSCTRL:
  •  only access data in database if given privilege
  •  can create/drop databases, tablespaces
  •  do anything SYSMAINT can do
SYSMAINT can:
  •  db2start/ db2stop / backup / restore/ runstats
DBADM can:
  • do anything for a particular database
  • grant load authority to other users
Authentication Types
The following Authentication types are available:
SERVER: authentication takes place on the Server
SERVER_ENCRYPT: authentication takes place on the Server username / password is encrypted on the client before being sent.
CLIENT, KERBEROS, KRB_SERVER_ENCRYPT, DATA_ENCRYPT, DATA_ENCRYPT_CMP, GSSPLUGIN, GSS_SERVER_ENCRYPT.
If a Client uses the SERVER_ENCRYPT and the Server uses SERVER authentication type and error will occur when client tries to connect to the Server.

Common Table Expressions
Common Table Expressions exist only for the life of the SQL statement that created them. They are used for special cases e.g. recursion in a query.  
The syntax is: WITH [tablename] ...

Command Editor
Interactive GUI for SQL commands

Configuration Assistant
The Configuration Assistant enables users to configure clients so that they can access databases stored on remote DB2 servers. The configuration assistant allows users to catalog, uncatalog databases, bind applications, set DB2 registry parameters, changes passwords, test database connections.

Connect Enterprise Edition
Connect Enterprise Edition is an add on product that allows data to be moved between Linux, Unix, Windows, iSeries and zSeries based DB2 servers.

Constraints
Constraints are used to enforce business rules (e.g. attributes in a column cannot be null).  The SQL used to create constraints is stored in the System Catalog.

Control Center
Performs admin for instances / databases / bufferpools / tablespaces/ views / indexes... Catalog / Uncatalog databases. And all sorts of other DB tasks.

Cursor Operations
Update and delete operations can be performed using position operations or search operations.  In a position operation the cursor must be first created, opened and then positioned.  When a cursor is declared with 'WITH HOLD' option, it will remain open across transactions until it is explicitly closed.

Cursor Usage Steps
The steps to use a cursor are: DECLARE CURSOR, OPEN, FETCH, CLOSE.

Database Manager Configuration File
Stores the names of the groups which have been given instance level authorities (SYSADM, SYSCTRL, SYSMAINT, SYSMON)

Decimal
There four ways to define decimal types: DECIMAL(percision, scale), DEC(precision, scale), NUMERIC(precision, scale) and NUM(precision, scale)

DCS directory
The DCS directory stores database information for remote databases on z/OS iSeries.

Declared temporary tables
User defined tables to hold non persistent data.   They are created by the application and destroyed when the application terminates.

Design Advisor
The design advisor makes recommendations for new indexes, deletion of indexes, Materialized Query Tables, Multi Dimensional Clustering

Developer Workbench (Development Center in v8)
Used to create, edit, debug, deploy, test DB2 stored procedures and user defined functions. Also to develop and run XML queries.

Distinct Type
A distinct data type is a user-defined data type that is derived from one of the built in data types in DB2. Example of syntax creation: CREATE DISTINCT TYPE euro AS DECIMAL (9,3) WITH COMPARISON.  Disinct types are strongly typed; they cannot be used as an argument for a built-in data type in a built-in function, even if they are derived from them (and vice versa).  If the WITH COMPARISON syntax is used during creation, it means that comparison functions (<>, <, > , <=, >=, >) and ORDER BY, GROUP BY clauses can be used for the distinct type. Two casting functions are created anytime a distinct type is created.  This is used to convert to / from its base type and has the same name as the distinct type.

Extenders
Extenders are advanced data types that are not part of the built-in datatypes.  There are 6 types of extenders in DB2.
  • DB2 AVI extender
  • DB2 Text extender
  • DB2 Net search extender
  • DB2 XML extender
  • DB2 Spatial extender
  • DB2 Geodetic extender - can treat earth like globe rather than flat map.

Foreign Key Constraints for Delete:
ON DELETE CASCADE: When entity is deleted from parent table, any entity which has a FK to the parent entity will be deleted.
ON DELETE SET NULL: When entity is deleted from parent table, FK will be set to null
ON DELETE RESTRICT: When entity is deleted from parent table, FK values must point to same value
ON DELETE NO ACTION: When entity is deleted from parent table, FK values must point to something valid but can change

Foreign Key Constrains for Update:
ON UPDATE RESTRICT: When entity is updated in parent table, the FK values must have to have the same values
ON UPDATE NO ACTION: When the entity is updated in the parent table, the FKs values can change but must always point at something.

Grant All (table)
GRANT ALL privileges for table means: ALTER, SELECT, INSERT, UPDATE, DELETE, INDEX, REFERENCES privileges.  Note there is no 'ADD' privilege.

Graphic
Graphic is a fixed length double-byte character type.

Group By
Used to specify columns that are to be grouped together and to provide input into aggregate functions such as SUM() and AVG()

Group By Cube 
Used to group in multiple dimensions e.g. SELECT workdept, gender, AVG(salaray) AS avg_salary FROM employee GROUP BY CUBE (workdept, gender);

Having
The having clause is used to by further selection criteria to a GROUP BY clause. It refers to data that has already been grouped by a GROUP BY clause.  It uses same syntax as WHERE clause and can only be used in by the GROUP BY clause.

Health Center
The Health Center tool is used to select instance and database objects that you want to monitor

Identity Column
Identity columns must be a numeric data type with a scale of 0.

Indexes 
The creation of an index provides logical ordering of rows in a table in ascending order of the index.

Isolation levels
Repeatable read isolation level will lock rows scanned in a query.
Read stability isolation level will only lock the rows returned in the result data set.
Cursor stability isolation level will only lock the result set that the cursor is currently pointing to.
The uncommitted read isolation level will not lock any rows during normal read processing (unless another user tries to alter or drop the table being read).

Journal
The DB2 journal is an interactive GUI that tracks historical information about tasks, database actions and operations

License Center
the License Center Allows users to view information about licenses

Like (table creation)
CREATE TABLE ... LIKE ... - creates table which has exact same name, datatype and nullability characteristics.

Locks
Locks can only be applied to table spaces, tables and rows.

Lock conversion
The act of changing a lock to a more restrictive state.  In most cases, lock conversation happens for row level locks, e.g. if an Update(U) lock is held and an Exclusive(X) lock is needed, the update(U)lock will be converted to an Exclusive lock

Lock escalation
Lock escalation is when the size of a lock changes.  For example from Row to Table size. This is usually to free up some space in the Lock list.

Lock list
The specific amount of memory set aside to hold a structure that DB2 uses to manage locks.

Lock state (or mode)
DB2 locks can have various states: Intent None, Intent share, Next Key Share, Share, Intent Exclusive, Share with Intent Exclusive, Update, Next Key Exclusive, Exclusive, Weak Exclusive, Super Exclusive.  DB2 determines the lock state by looking at isolation level and the SQL being executed.

Materialised Query Tables:
User defined tables whose definition is based on the result of a query used for query optimization.

Null
For predicates use IS NULL. In result sets, - means null.  Unique indexes can one null value. Unique constraints can have never have a null value.  Nulls can't be in used with IN clauses.

Operating System Support

DB2 Type / OSLinuxWindowsSolarisMobile OSAIXHP-UXSystem i
DB2 EveryplaceNoNoNoYesNoNoNo
DB2 PEYesYesNoNoNoNoNo
DB2 Express CYesYesNoNoNoNoNo
DB2 Express YesYesYesNoNoNoNo
DB2 i5 / OsNoNoNoNoNoNoYes
DB2 WSEYesYesYesNoYesYesNo
DB2 ESEYesYesYesNoYesYesNo


Packages
A package is an object that contains the information needed to process SQL statements associated with a source code file of an application program.

Privilege: Alter (Table)
The alter table privilege allows user to add columns to a table, add / change table comments, create a table pk, unique/check constraint, triggers for table. 

Privilege: Control 
The control privilege that applies to Table, View, Nicknames, Packages and Indexes. It includes every privilege including the privilege to drop the object from the DB.  The owner of a table automatically receives control privilege (and all other privileges).  Only users with SYSADM or DBADM authority are allowed to explictily grant CONTROL privilege on any object.  A user with CONTROL privilege can grant any table privilege except CONTROL privilege.

Privilege: Connect 
(Database)
Connect Privilege applies to a database. It allows users to connect.

Privilege: References (Table)
Reference table privilege allows a user to create and drop foreign key constrains that reference a table in a parent relationship. This can be granted for entire table of just specific columns in 
the table.

Privilege: Usage (sequences)
Allows the PREVIOUS VALUE and NEXT VALUE expression associated with the sequence to be changed.

Privileges: Packages 
The BINDADD privilege at database allows a user to create packages in the database.

Privileges (Explicit / Implicity)
Explicit privileges have to be granted explicitly. Implictly privileges do not. For example, CONTROL is granted to anyone who creates a Table or View.  CREATEIN, ALTERIN, DROPIN is granted to anyone who creates a schema.

Privileges: View / Nickname
There is no Alter, Index or Reference privilege for View. Otherwise they have have the same privileges as Tables. Nickhaves have the same privileges possible as Tables.

Privileges:Index
There is only one Index privilege - it is Control.

Replication Center
Facilitates data replication between a DB2 database and any other relational database

Routine
A routine is a user defined function or stored procedure

Satellite Admin Center
Allows users set up and administer a group of DB2 Servers

Sequences
Sequences, identity columns and triggers can be used to generate values for a column. But, only sequences can be referenced in an INSERT statement.

SET operators
A set operator is is used to combined two or more queries into a single query. Examples: UNION, UNION ALL, INTERSECT, INTERSECT ALL, EXCEPT, EXCEPT ALL

Spatial Extenders
Spatial extender treats the world as flat map; the DB2 geodetic extender treats the world as a globe.

SQL Performance Monitor 
To analyse database operations performed againsts a DB2 for i5/OS database

Storage
Char = (number of characters * 1) bytes required; 1 and 254 characters
Varchar = (number of characters + 4) bytes requires; 32,672 characters
LONG VARCHAR = (number of characters + 24) bytes required; 32,672 characters (table space agnostic)
GRAPHIC = (number of characters * 2) bytes required; 127 characters
VARGRAPHIC = (number of characters * 2) + 4 bytes required; 16, 336
LONG VARGRAPHIC = (number of characters * 2) + 24 bytes required; 16,350 characters (table space agnostic)

Structured Data Type
A structured data type is a user-defined type that contains one or more attributes, each of which has a name and a data type of its own.  A SDT can also be created as a subtype of another structured type.   SDT are created by the CREATE TYPE sql statement.

Table Locks
Share mode - Other transactions are allowed read data but not change the data that is locked.
Exclusive mode - Other transactions can neither read nor modify the data that is locked.

Task Center
Allows users schedule, run tasks and send notifications about them

Time / Timestamp
Timestamp can store date time. Time can only store time.

Triggers
  • A trigger can be activated whenever an insert, update or delete operation is performed (not a select).
  • A trigger event can be activated before, after or instead of the trigger event
  • Trigger granularity: They can be activated for every row updated (FOR EACH ROW) or just for every statement (FOR EACH STATEMENT)
  • To stop trigger events setting off other triggers use the NO CASCADE option
  • A trigger event can reference old or new data using the 'REFERENCE OLD AS' or 'REFERENCE NEW AS' syntax
  • A trigger can send signals.This can be used to prevent actions, for example: SIGNAL SQLSTATE '75002' SET MESSAGE TEXT 'Deletes not allowed'.
  • The SQL used to create Triggers is stored in the system catalog.

Typed tables
User defined tables whose column definitions are based on the attributes of a user defined structured data type.

Universal Developer's Edition
Contains tools to build application supported Linux, UNIX, Windows and DRDA Application Requestor

View Tables
Do not contain real data but instead refer to data in real tables. Only the view definition itself is stored in the database.
Useful for controlling access to data.   The WITH LOCAL CHECK OPTION can be used to enforce data constraints for inserts, updates.
The WITH CASCADED CHECK OPTION can be used to cascade constraints to subsequent views.  Characteristics of View tables are stored in the system catalog not the SQL that created them.

Visual Explain
Gives visual representation of data access plan

XML Columns
XML columns are used to store documents as a hierarchial set of entities. The XML data type does not have a fixed length.

P.S.  Well done Seán O'Brien on another super performance! Well done Ireland beating Italy and making the World Cup Quarter finals!

Seán O'Brien