Saturday, September 12, 2015

Custom User types in GORM

Recently, I wanted to model a Merchant which, like many things in a domain model, had an Address. I thought it made sense for the Address to be embedded inside the Merchant. Reasons:
  • It had no lifecycle outside the Merchant. If the Merchant dies, so should the Address.
  • It only ever belonged to one and only one Merchant.
So it was pretty obviously a composition relationship.

Now, it is possible to model composition relationships in GORM. See here. However, this approach comes with the caveat that the Address must be a GORM object. I didn't want the Address to be a GORM object, because GORM objects are powerful in Grails. With all their dynamic finders and GORM APIs they are essentially a DAO on steroids. A developer who gets their hands on one can do lots of things (not always good things). I didn't want or need any of this. In addition, a good architecture makes it difficult for developers to make mistakes when they are working under pressure at speed. That means that when you are making design decisions you need to think about the power you need to give, should give and will give.
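For reference, the composition approach the GORM docs describe is the embedded mapping. A minimal sketch (assuming Grails 2.x conventions) looks something like this; note that if Address sits under grails-app/domain it gets its own table and all the GORM machinery, which is exactly the caveat above:

class Merchant {
    String displayName
    Address address

    // Address columns end up in the merchant table
    static embedded = ['address']
}

class Address {
    String city
    String country
    String zip
}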

So with that in mind, I looked into wiring up a custom Hibernate user type for Address. This would just be a data structure that would model the address, could be reused outside the Merchant (thus promoting consistency and, again, good design) and wouldn't come with the power of GORM. There is some documentation in the GORM docs for custom types, but there isn't a full working example. I had a look at some Hibernate examples and then managed to put this together and get it working.

Here is my address object.

import groovy.transform.Immutable

// @Immutable generates the private final fields, getters, and the map and
// tuple constructors, so the class stays a simple immutable data structure.
@Immutable
class Address {
    String city
    String country
    String state
    String street1
    String street2
    String street3
    String zip
}
Here is my AddressUserType object:
import java.sql.PreparedStatement
import java.sql.ResultSet
import java.sql.SQLException

import org.hibernate.HibernateException
import org.hibernate.engine.spi.SessionImplementor
import org.hibernate.type.StringType
import org.hibernate.usertype.UserType

import groovy.util.logging.Slf4j

@Slf4j
class AddressUserType implements UserType {

    public int[] sqlTypes() {
        // seven VARCHAR columns: city, country, state, street1, street2, street3, zip
        return ([StringType.INSTANCE.sqlType()] * 7) as int[]
    }

    public Class getReturnedClass() {
        return Address.class;
    }

    public Object nullSafeGet(ResultSet rs, String[] names, SessionImplementor session, Object owner) throws SQLException {
        assert names.length == 7
        log.debug(">>nullSafeGet(names=${names})")
        // StringType.get() already handles the null check for each column
        String city = StringType.INSTANCE.get(rs, names[0], session)
        String country = StringType.INSTANCE.get(rs, names[1], session)
        String state = StringType.INSTANCE.get(rs, names[2], session)
        String street1 = StringType.INSTANCE.get(rs, names[3], session)
        String street2 = StringType.INSTANCE.get(rs, names[4], session)
        String street3 = StringType.INSTANCE.get(rs, names[5], session)
        String zip = StringType.INSTANCE.get(rs, names[6], session)

        List values = [city, country, state, street1, street2, street3, zip]
        return values.every { it == null } ? null :
                new Address(city: city, country: country, state: state,
                            street1: street1, street2: street2, street3: street3, zip: zip)
    }

    void nullSafeSet(PreparedStatement st, Object value, int index, SessionImplementor session) throws HibernateException, SQLException {
        if (value == null) {
            // null out all seven columns
            (0..6).each { offset ->
                StringType.INSTANCE.set(st, null, index + offset, session)
            }
        }
        else {
            final Address address = (Address) value
            StringType.INSTANCE.set(st, address.getCity(), index, session)
            StringType.INSTANCE.set(st, address.getCountry(), index + 1, session)
            StringType.INSTANCE.set(st, address.getState(), index + 2, session)
            StringType.INSTANCE.set(st, address.getStreet1(), index + 3, session)
            StringType.INSTANCE.set(st, address.getStreet2(), index + 4, session)
            StringType.INSTANCE.set(st, address.getStreet3(), index + 5, session)
            StringType.INSTANCE.set(st, address.getZip(), index + 6, session)
        }
    }


    @Override
    public boolean isMutable() {
        return false;
    }

    @Override
    public boolean equals(Object x, Object y) throws HibernateException {
        // Groovy's == is null-safe and delegates to equals()
        return x == y
    }

    @Override
    public int hashCode(Object x) throws HibernateException {
        assert (x != null);
        return x.hashCode();
    }

    @Override
    public Object deepCopy(Object value) throws HibernateException {
        return value;
    }

    @Override
    public Object replace(Object original, Object target, Object owner)
            throws HibernateException {
        return original;
    }

    @Override
    public Serializable disassemble(Object value) throws HibernateException {
        return (Serializable) value;
    }

    @Override
    public Object assemble(Serializable cached, Object owner)
            throws HibernateException {
        return cached;
    }

    public Class returnedClass() {
        return Address.class;
    }
}
And here is my Merchant which has an Address.
class Merchant {
    UUID id;

    String color;
    String displayName;
    //...
    //...

    Address address
    
    static mapping = {
        // column order must match the order AddressUserType reads and writes them:
        // city, country, state, street1, street2, street3, zip
        address type: AddressUserType, {
            column name: "city"
            column name: "country"
            column name: "state"
            column name: "street1"
            column name: "street2"
            column name: "street3"
            column name: "zip"
        }
    }

}
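For illustration, the same user type could be dropped into another (hypothetical) domain class, without Address itself ever becoming a GORM object:

class Supplier {
    UUID id
    String name

    Address address

    static mapping = {
        address type: AddressUserType, {
            column name: "city"
            column name: "country"
            column name: "state"
            column name: "street1"
            column name: "street2"
            column name: "street3"
            column name: "zip"
        }
    }
}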
As stated, with this approach, the Address data structure could be used in other GORM objects. Until the next time take care of yourselves.

Monday, August 3, 2015

Book Review: Cloud Based Architectures

Migrating to Cloud-Native Application Architectures, Matt Stine.

In the last few years, it can be very easy to believe marketing departments simply aren’t paid unless they use the word "Cloud" as often as possible.  Its usage has become so ubiquitous that it has inevitably become ambiguous, so one very astute thing author Matt Stine does is outline what he means by "Cloud" (in particular cloud architectures) in the opening pages of this book.  Essentially, cloud architectures can be summarised by the three S’s.

  • Speed: ability to provision and release resources (computing, networking and storage) with ease thereby achieving faster project development time
  • Safety: making software architectures more resilient (fault tolerance, fault isolation etc) 
  • Scalability: One of the many non-functional characteristics that has become even more important with the proliferation of mobile computing.  More mobile devices mean more application usage, which means more back-end APIs getting hit.  Architectures must be able to respond to this demand, otherwise applications fail.

Twelve Factor App

Throughout the book, approaches and patterns for achieving cloud based architectures are introduced. Now, if you have worked on anything distributed that was well architected, you will definitely experience a little bit of deja vu.  For example, the Twelve Factor App is a collection of patterns developed by engineers at Heroku containing twelve (you guessed it) principles to guide a good architecture fit for the cloud.  You are sure to have used codebase (make deployable units have their own code base), dependencies (use proper tooling for managing dependencies), config (externalise configuration) and backing services (databases, caches etc. should be consumed identically by all resources) before.  But patterns are supposed to be common solutions to common problems, and good architecture is good architecture, so it is not unexpected that some of the described approaches won’t sound completely new.

Microservices

How do Microservices help enable the 3 S’s?
  • Speed: Deployment is much simpler and faster with independent code chunks.
  • Security: Not too much here, but it is fairly intuitive that if you break something large into small chunks it is much easier to control access to the sensitive parts.
  • Scale:  Much easier to only have to scale the parts of an architecture you need to scale as opposed to having to scale the entire architecture - which is what happens in a Monolithic application.

Cloud Migration

For organisations to move to Cloud based architectures, several changes are needed:
  • DevOps rather than silos: this is to facilitate speed of delivery.
  • Continuous Delivery:  one way to make sure the problems of long release cycles are avoided.
  • Decentralised autonomy:  yes, it is a nice idea to give everyone more power and influence. It will certainly speed things up. But, in my own view, some things (core APIs, major technical architectural decisions) should only be handled by people of a certain technical background who are prepared to be fully accountable for them.  Would you let every person working on the design of a bridge or on a heart transplant have the exact same say?
  • Inverse Conway manoeuvre: software companies should do their architecture first and then re-align their organisations to fit it, so that they avoid the anti-pattern of architectures that merely resemble an existing organisational structure.   This sounds nice in theory but it may not be so easy to achieve (possibly easier in very small companies that never have a strict organisational structure anyway).

Break up the Monolith

So if you have bought into the cloud based approach, and since it’s unlikely your project is green field, you need to figure out how to break up the Monolith.  Some ideas include:
  • Bounded Contexts: these allow inconsistent definitions of concepts as long as they are consistent inside a well defined context, which makes it much easier to decompose data models. For example, in the Security context a User may always refer to something that can be authenticated and has particular roles, but in a Management context it would be something that has an image, an address and some application capabilities (checking out a shopping cart).
  • Identifying bounded contexts in the monolith and making them microservices.
  • Containerisation: using something like Docker can provide many advantages over using VMs.
  • Writing new features as microservices.
  • Making the Monolith look like microservices by using anti-corruption layers.

Spring cloud and Netflix

By the sounds of it, Spring Cloud and the Netflix OSS projects are an absolute must to check out. Amongst other things, they help:
  • Achieve dynamic configuration
  • Implement sophisticated service discovery
  • Provide an alternative approach to load balancing that involves the client side more.
In addition, Netflix's Hystrix library (which also provides some very useful metrics) employs some clever fault tolerance patterns:
  • circuit breakers - if a system notices that another system is broken it stops calling it (a minimal sketch follows below)
  • bulkheads - failure is limited by strict partitioning
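To make the circuit breaker idea concrete, here is a minimal sketch of wrapping a remote call in a Hystrix command (the book price endpoint and class names are made up for illustration):

import com.netflix.hystrix.HystrixCommand
import com.netflix.hystrix.HystrixCommandGroupKey

// Repeated failures or timeouts in run() will open the circuit, after which
// Hystrix stops calling the remote system and goes straight to the fallback.
class BookPriceCommand extends HystrixCommand<BigDecimal> {

    private final String isbn

    BookPriceCommand(String isbn) {
        super(HystrixCommandGroupKey.Factory.asKey("BookService"))
        this.isbn = isbn
    }

    @Override
    protected BigDecimal run() {
        // stand-in for the real remote call
        new BigDecimal(new URL("http://bookservice.example.com/price/${isbn}").text.trim())
    }

    @Override
    protected BigDecimal getFallback() {
        // returned when the call fails, times out, or the circuit is open
        BigDecimal.ZERO
    }
}

// usage: BigDecimal price = new BookPriceCommand("978-1491924570").execute()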

Any negatives?

So lots of positives about this book.  Plenty of well explained information in a short number of pages. That is a major achievement.   I have alluded to a few minor points above, but they are very minor.  It would be very difficult to criticise this book in any substantial manner especially when it is free.    

However, it is important for the reader to bear a few considerations in mind.
  • Business realities: approaches such as Continuous Delivery just may not fit your business plan.  Why would you want to provide a feature someone hasn’t paid for?  (In fairness, the author does make this very point, but it really is worth re-iterating.)
  • Technical realities: one problem I have with many software approaches is that they assume everyone is at a given skill level. In an ideal world the only people who work in software engineering would be very good engineers, but the reality is there is always a very wide range of skill levels on any project, from the super meticulous to complete spoofers.  This means that some things can sound very elegant in theory but be a lot more difficult to achieve in practice.  When considering moving to things like DevOps, there will definitely be benefits. But are organisations going to find it easier or harder to make engineers accountable? I would imagine there would be serious risks of it being the latter if it is not properly thought through.
  • Some concepts, e.g. client side load balancing, sound very interesting. But they are only introduced rather than critically analysed.  Much more research and thought would be required (for me anyway) before considering adopting them.


Sunday, July 5, 2015

Postgres indexes

Recently, I had a situation where I needed to think how I was using Postgres indexes. I had a simple Book table with the following schema...
>\d book

                  Table "shopping.book"
       Column        |          Type          | Modifiers 
---------------------+------------------------+-----------
 id                  | uuid                   | not null
 version             | bigint                 | not null
 amount_minor_units  | integer                | not null
 currency            | character varying(255) | not null
 author              | character varying(255) | not null
 publisher           | character varying(255) |    
The author and publisher columns were just String pointers to actual Author and Publisher references that lived on another system, meaning that classical foreign keys couldn't be used and these dudes were just normal columns.

I needed to get an idea how the table would perform with lots of data, so first up some simple SQL to put in lots of test data:

 
> CREATE EXTENSION "uuid-ossp";
> insert into book (id, version, amount_minor_units, currency, author, publisher) 
select uuid_generate_v4(), 2, 22, 'USD', 'author' || x.id, 'publisher' || x.id from generate_series(1,1000000) AS x(id); 
This table was going to be hit lots of times with this simple query:
select * from book where author = 'Tony Biggins' and publisher='Books unlimited';
To get the explain plan, I did:
dublintech=> EXPLAIN (FORMAT JSON) select * from book where author = 'Tony Biggins' and publisher = 'Books unlimited';
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 [                                                                                                                                +
   {                                                                                                                              +
     "Plan": {                                                                                                                    +
       "Node Type": "Seq Scan",                                                                                                   +
       "Relation Name": "book",                                                                                                   +
       "Alias": "book",                                                                                                           +
       "Startup Cost": 0.00,                                                                                                      +
       "Total Cost": 123424.88,                                                                                                   +
       "Plan Rows": 1,                                                                                                            +
       "Plan Width": 127,                                                                                                         +
       "Filter": "(((author)::text = 'Tony Biggins'::text) AND ((publisher)::text = 'Books unlimited'::text))"+
     }                                                                                                                            +
   }                                                                                                                              +
 ]
(1 row)
As can be seen, Postgres is doing a Seq Scan, aka a table scan. I wanted to speed things up. There was only one index on the table, which was for the id. This was just a conventional B-Tree index and would be useless in this query since the id wasn't even in the where clause. Some of the options I was thinking about:
  • Create an index on author or publisher
  • Create an index on author and create an index on publisher
  • Create a combination index on both author and publisher.
Hmmm... let the investigations begin. Start by indexing just author.
dublintech=> create index author_idx on book(author);
dublintech=> EXPLAIN (FORMAT JSON) select * from book where publisher = 'publisher3' and author='author3';
                                  QUERY PLAN                                   
-------------------------------------------------------------------------------
 [                                                                            +
   {                                                                          +
     "Plan": {                                                                +
       "Node Type": "Index Scan",                                             +
       "Scan Direction": "Forward",                                           +
       "Index Name": "author_idx",                                  +
       "Relation Name": "book",                                               +
       "Alias": "book",                                                       +
       "Startup Cost": 0.42,                                                  +
       "Total Cost": 8.45,                                                    +
       "Plan Rows": 1,                                                        +
       "Plan Width": 127,                                                     +
       "Index Cond": "((author)::text = 'author3'::text)",+
       "Filter": "((publisher)::text = 'publisher3'::text)"                     +
     }                                                                        +
   }                                                                          +
 ]
(1 row)
As can be seen, Postgres performs an index scan and the total cost is much lower than for the same query using a table scan. What about the multiple column index approach? Surely, since both columns are used in the query it should be faster again, right?
dublintech=> create index author_publisher_idx on book(author, publisher);
CREATE INDEX
dublintech=> EXPLAIN (FORMAT JSON) select * from book where publisher = 'publisher3' and author='author3';
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 [                                                                                                                        +
   {                                                                                                                      +
     "Plan": {                                                                                                            +
       "Node Type": "Index Scan",                                                                                         +
       "Scan Direction": "Forward",                                                                                       +
       "Index Name": "author_publisher_idx",                                                                     +
       "Relation Name": "book",                                                                                           +
       "Alias": "book",                                                                                                   +
       "Startup Cost": 0.42,                                                                                              +
       "Total Cost": 8.45,                                                                                                +
       "Plan Rows": 1,                                                                                                    +
       "Plan Width": 127,                                                                                                 +
       "Index Cond": "(((author)::text = 'author3'::text) AND ((publisher)::text = 'publisher3'::text))"+
     }                                                                                                                    +
   }                                                                                                                      +
 ]
(1 row)
This time Postgres uses the multi-column index, but the query doesn't go any faster. But why? Recall how we populated the table.
insert into book (id, version, amount_minor_units, currency, author, publisher) 
select uuid_generate_v4(), 2, 22, 'USD', 'author' || x.id, 'publisher' || x.id from generate_series(1,1000000) AS x(id); 
There are lots of rows, but every row has a unique author value and a unique publisher value. That means the author index alone should perform just as well for this query. An analogy would be: you go into a music shop looking for a new set of loudspeakers someone has told you to buy, with a particular cost and a particular power output (number of watts). When you enter the shop, you see the speakers are nicely ordered by cost and, you know what, no two sets of loudspeakers have the same cost. Think about it. Are you going to find the speakers any faster if you just use the cost, or if you use both the cost and the power output?

Now, imagine the case if lots of the loudspeakers were the same cost. Then of course using both the cost and the power will be faster.
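Back in the database, you can see this uniqueness directly in the planner statistics (assuming the table has been analysed; an n_distinct of -1 means Postgres estimates every value in the column is distinct):

select attname, n_distinct from pg_stats where tablename = 'book' and attname in ('author', 'publisher');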

Now, let's take this point to the extreme in our test data. Suppose all the authors were the same. The author index becomes useless, and if we don't have the author / publisher combination index we would go back to a table scan.

-- drop the combination index and just leave the author index on the table
dublintech=> drop index author_publisher_idx;
DROP INDEX
dublintech=> update book set author='author3';
dublintech=> EXPLAIN (FORMAT JSON) select * from book where publisher = 'publisher3' and author='author3';
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 [                                                                                                                    +
   {                                                                                                                  +
     "Plan": {                                                                                                        +
       "Node Type": "Seq Scan",                                                                                       +
       "Relation Name": "book",                                                                                       +
       "Alias": "book",                                                                                               +
       "Startup Cost": 0.00,                                                                                          +
       "Total Cost": 153088.88,                                                                                       +
       "Plan Rows": 1,                                                                                                +
       "Plan Width": 123,                                                                                             +
       "Filter": "(((publisher)::text = 'publisher3'::text) AND ((author)::text = 'author3'::text))"+
     }                                                                                                                +
   }                                                                                                                  +
 ]
(1 row)
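One practical note: the plans above depend on the planner having up-to-date statistics, so after a bulk update like this it may be necessary to refresh them before the new plan shows up:

analyze book;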
So we can conclude from this that single column indexes can perform just as well as combination indexes for combination searches when there is a huge degree of variance in the data of that single column. However, when there isn't, they won't perform as well and a combination index should be used. Yes, I have tested by going to extremes, but that is the best way to make the principles clear. And please note: for the case when there is maximum variance in the data, adding another index on the other column in the where clause (publisher) made no difference. This is as expected.

Ok, let's stick with the case where there is massive variance in the data values in the column, and suppose the query only ever involves exact matching. In this case, all author values are guaranteed to be unique and you never have any interest in doing anything like less than or greater than. So why not use a hash index instead of a B-Tree index?

dublintech=> create index author_hash on book using hash (author);
dublintech=> EXPLAIN (FORMAT JSON) select * from book where publisher = 'publisher3' and author='author3';
                                  QUERY PLAN                                   
-------------------------------------------------------------------------------
 [                                                                            +
   {                                                                          +
     "Plan": {                                                                +
       "Node Type": "Index Scan",                                             +
       "Scan Direction": "NoMovement",                                        +
       "Index Name": "author_hash",                                 +
       "Relation Name": "book",                                               +
       "Alias": "book",                                                       +
       "Startup Cost": 0.00,                                                  +
       "Total Cost": 8.02,                                                    +
       "Plan Rows": 1,                                                        +
       "Plan Width": 127,                                                     +
       "Index Cond": "((author)::text = 'author3'::text)",+
       "Filter": "((publisher)::text = 'publisher3'::text)"                     +
     }                                                                        +
   }                                                                          +
 ]
(1 row)
Interesting, we have gone faster again. Not a massive difference this time around, but an improvement nonetheless that could become more relevant with more data growth and / or when a more complex query with more computation is required. We can safely conclude from this part that if you are only interested in exact matches then the hash index beats the B-Tree index. Until the next time take care of yourselves.

References:
  • http://www.postgresql.org/docs/9.1/static/using-explain.html
  • http://www.postgresql.org/docs/9.3/static/indexes-bitmap-scans.html

Tuesday, June 23, 2015

Problems with Cobertura and Sonar 5.1

Recently, I was having some bother trying to use Sonar 5.1 with my Grails 2.4.4 project. I was using the usual Groovy stuff: Gmetrics, Codenarc and Cobertura. For the Sonar database I was using Postgres 9.4.

The logfile for the Sonar runner just gave me this:

build 22-Jun-2015 07:44:30 INFO: ------------------------------------------------------------------------
build 22-Jun-2015 07:44:30 INFO: EXECUTION FAILURE
build 22-Jun-2015 07:44:30 INFO: ------------------------------------------------------------------------
build 22-Jun-2015 07:44:30 Total time: 9.153s
build 22-Jun-2015 07:44:30 Final Memory: 30M/1039M
build 22-Jun-2015 07:44:30 INFO: ------------------------------------------------------------------------
error 22-Jun-2015 07:44:30 ERROR: Error during Sonar runner execution
error 22-Jun-2015 07:44:30 ERROR: Unable to execute Sonar
error 22-Jun-2015 07:44:30 ERROR: Caused by: Unable to save file sources
error 22-Jun-2015 07:44:30 ERROR: Caused by: -1
Not much use! I thought there was some permission problem, since "Unable to save file sources" usually means that! But there were no permission issues. I then disabled the Cobertura part of the analysis and things were ok, so it was something wrong with the Cobertura part. I then:
  • enabled verbose logging -- sonar.verbose=true
  • enabled full stack trace logging -- using the -e switch
  • enabled full debug logging -- using the -X switch
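Concretely, the runner invocation ended up looking something like this (a sketch; your wrapper script and property names may differ):

sonar-runner -Dsonar.verbose=true -e -X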
This provided a few more clues:
error 22-Jun-2015 11:09:06 ERROR: Error during Sonar runner execution
build 22-Jun-2015 11:09:06 INFO: ------------------------------------------------------------------------
error 22-Jun-2015 11:09:06 org.sonar.runner.impl.RunnerException: Unable to execute Sonar
error 22-Jun-2015 11:09:06  at org.sonar.runner.impl.BatchLauncher$1.delegateExecution(BatchLauncher.java:91)
error 22-Jun-2015 11:09:06  at org.sonar.runner.impl.BatchLauncher$1.run(BatchLauncher.java:75)
error 22-Jun-2015 11:09:06  at java.security.AccessController.doPrivileged(Native Method)
error 22-Jun-2015 11:09:06  at org.sonar.runner.impl.BatchLauncher.doExecute(BatchLauncher.java:69)
error 22-Jun-2015 11:09:06  at org.sonar.runner.impl.BatchLauncher.execute(BatchLauncher.java:50)
error 22-Jun-2015 11:09:06  at org.sonar.runner.api.EmbeddedRunner.doExecute(EmbeddedRunner.java:102)
error 22-Jun-2015 11:09:06  at org.sonar.runner.api.Runner.execute(Runner.java:100)
error 22-Jun-2015 11:09:06  at org.sonar.runner.Main.executeTask(Main.java:70)
error 22-Jun-2015 11:09:06  at org.sonar.runner.Main.execute(Main.java:59)
error 22-Jun-2015 11:09:06  at org.sonar.runner.Main.main(Main.java:53)
error 22-Jun-2015 11:09:06 Caused by: java.lang.IllegalStateException: Unable to save file sources
error 22-Jun-2015 11:09:06  at org.sonar.batch.index.SourcePersister.persist(SourcePersister.java:84)
error 22-Jun-2015 11:09:06  at org.sonar.batch.phases.DatabaseModePhaseExecutor.executePersisters(DatabaseModePhaseExecutor.java:165)
error 22-Jun-2015 11:09:06  at org.sonar.batch.phases.DatabaseModePhaseExecutor.execute(DatabaseModePhaseExecutor.java:133)
error 22-Jun-2015 11:09:06  at org.sonar.batch.scan.ModuleScanContainer.doAfterStart(ModuleScanContainer.java:264)
error 22-Jun-2015 11:09:06  at org.sonar.api.platform.ComponentContainer.startComponents(ComponentContainer.java:92)
error 22-Jun-2015 11:09:06  at org.sonar.api.platform.ComponentContainer.execute(ComponentContainer.java:77)
error 22-Jun-2015 11:09:06  at org.sonar.batch.scan.ProjectScanContainer.scan(ProjectScanContainer.java:235)
error 22-Jun-2015 11:09:06  at org.sonar.batch.scan.ProjectScanContainer.scanRecursively(ProjectScanContainer.java:230)
error 22-Jun-2015 11:09:06  at org.sonar.batch.scan.ProjectScanContainer.doAfterStart(ProjectScanContainer.java:220)
error 22-Jun-2015 11:09:06  at org.sonar.api.platform.ComponentContainer.startComponents(ComponentContainer.java:92)
error 22-Jun-2015 11:09:06  at org.sonar.api.platform.ComponentContainer.execute(ComponentContainer.java:77)
error 22-Jun-2015 11:09:06  at org.sonar.batch.scan.ScanTask.scan(ScanTask.java:57)
error 22-Jun-2015 11:09:06  at org.sonar.batch.scan.ScanTask.execute(ScanTask.java:45)
error 22-Jun-2015 11:09:06  at org.sonar.batch.bootstrap.TaskContainer.doAfterStart(TaskContainer.java:135)
error 22-Jun-2015 11:09:06  at org.sonar.api.platform.ComponentContainer.startComponents(ComponentContainer.java:92)
error 22-Jun-2015 11:09:06  at org.sonar.api.platform.ComponentContainer.execute(ComponentContainer.java:77)
error 22-Jun-2015 11:09:06  at org.sonar.batch.bootstrap.GlobalContainer.executeTask(GlobalContainer.java:158)
error 22-Jun-2015 11:09:06  at org.sonar.batch.bootstrapper.Batch.executeTask(Batch.java:95)
error 22-Jun-2015 11:09:06  at org.sonar.batch.bootstrapper.Batch.execute(Batch.java:67)
error 22-Jun-2015 11:09:06  at org.sonar.runner.batch.IsolatedLauncher.execute(IsolatedLauncher.java:48)
error 22-Jun-2015 11:09:06  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
error 22-Jun-2015 11:09:06  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
error 22-Jun-2015 11:09:06  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
error 22-Jun-2015 11:09:06  at java.lang.reflect.Method.invoke(Method.java:606)
error 22-Jun-2015 11:09:06  at org.sonar.runner.impl.BatchLauncher$1.delegateExecution(BatchLauncher.java:87)
error 22-Jun-2015 11:09:06  ... 9 more
error 22-Jun-2015 11:09:06 Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
error 22-Jun-2015 11:09:06  at java.util.ArrayList.elementData(ArrayList.java:371)
error 22-Jun-2015 11:09:06  at java.util.ArrayList.get(ArrayList.java:384)
error 22-Jun-2015 11:09:06  at com.google.protobuf.RepeatedFieldBuilder.getBuilder(RepeatedFieldBuilder.java:245)
error 22-Jun-2015 11:09:06  at org.sonar.server.source.db.FileSourceDb$Data$Builder.getLinesBuilder(FileSourceDb.java:2911)
error 22-Jun-2015 11:09:06  at org.sonar.batch.index.SourceDataFactory.
Now, I could see earlier in the log that the Cobertura analysis had finished. I could also see that the Cobertura coverage.xml had generated ok (this is the file which collates the code coverage info). The next step after creating the coverage.xml file was for the Sonar runner to parse it and send a request to Postgres. Something had to be going wrong at the parsing stage, since connecting to Postgres was definitely not an issue (remember, everything was fine when Cobertura was disabled). So there must have been something odd in the coverage.xml file which meant the Sonar runner failed to parse it. As stated, the coverage.xml file details which line numbers in each class have and haven't been covered. An abbreviated sample:

<class name="com.dublintech.me.SomeFile" filename="com/dublintech/me/SomeFile.groovy">
    <lines>
        <line number="16" hits="2" branch="false"/>
        <line number="17" hits="0" branch="false"/>
        ...
    </lines>
</class>
...
So what kind of things could make the parsing barf? What if there was some odd line number in the coverage.xml file? Hmmm... To check this, I ran the following grep:
> grep "line number" coverage.xml
This gave back way too much. What about any negative line numbers?
>grep "line number=\"\-" coverage.xml
Nope, none. Ok, go back to the exception and look at this line:
java.lang.ArrayIndexOutOfBoundsException: -1
Hmmm... if a line number was 0, could it make some array parsing in the Sonar runner throw an index out of bounds?
>grep "line number=\"0" coverage.xml
Hit! Time to grep lines before and after and get more info about this file.
>grep -C20 "line number=\"0" coverage.xml
This gave me the culprit. It made no sense to me why Cobertura was saying that line number 0 had 0 hits. It was still possible to open the Cobertura html report and view the analysis; Sonar was just barfing when parsing it. So I excluded this file from the Cobertura analysis by adding the following to my build config:
coverage {
    xml = true
    exclusions = [
        "**/com/dublintech/me/MyOddFile*"
    ]
}
I then re-ran and hey presto, everything was working. The offending file was no longer in the coverage.xml file, which meant the Sonar runner could parse it and everything was ok.

I like sonar, I like a stable build and I like rapid feedback so yeah I was a happy person when it was working again!

Saturday, May 30, 2015

Using separate Postgres Schemas for the same database in a Grails App

Recently, I wanted to use the same Postgres Database but split my persistence layer into separate components which used separate schemas. The motivation was to promote modular design, separate concerns and stop developers tripping up over each other. Vertical domain models can be difficult to achieve but not impossible.

In my shopping application, I had a user component, a shopping component and a product component. Now this is pretty easy if you are using separate databases, but sometimes it's nice to just get the separation of concerns using separate schemas in the same database, since using the same database can make things like DR, log shipping, replication etc easier.

While I could find documentation for separate databases, I found it difficult to find Grails documentation advising on my specific problem - how to use separate schemas within the same Postgres database. So here is how I ended up doing it.
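One assumption worth stating up front: with dbCreate set to validate, Hibernate won't create the schemas for you, so they need to exist in Postgres before the app starts. Something roughly like:

-- run once against testdb ("user" needs quoting because it is a reserved word)
CREATE SCHEMA "user";
CREATE SCHEMA shopping;
CREATE SCHEMA product;
GRANT ALL ON SCHEMA "user", shopping, product TO db_user;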

Here is my DataSource.groovy.
String db_url = "jdbc:postgresql://localhost:5432/testdb"
String usernameVar = "db_user"
String passwordVar = "db_secret"
String dbCreateVar = "validate"
String dialectVar = "net.kaleidos.hibernate.PostgresqlExtensionsDialect"

dataSource_user {
    pooled = true
    jmxExport = true
    dialect = dialectVar
    driverClassName = "org.postgresql.Driver"
    username = usernameVar
    password = passwordVar
    url = db_url
    dbCreate = dbCreateVar
}

hibernate_user {
    cache.use_second_level_cache = false
    cache.use_query_cache = false
    cache.region.factory_class = 'net.sf.ehcache.hibernate.EhCacheRegionFactory' // Hibernate 3
    singleSession = true // configure OSIV singleSession mode
    default_schema = "user"
}

dataSource_shopping {
    pooled = true
    jmxExport = true
    dialect = dialectVar
    driverClassName = "org.postgresql.Driver"
    username = usernameVar
    password = passwordVar
    url = db_url
    dbCreate = dbCreateVar
}

hibernate_shopping {
    cache.use_second_level_cache = false
    cache.use_query_cache = false
    cache.region.factory_class = 'net.sf.ehcache.hibernate.EhCacheRegionFactory' // Hibernate 3
    singleSession = true // configure OSIV singleSession mode
    default_schema = "shopping"
}

dataSource_product {
    pooled = true
    jmxExport = true
    dialect = dialectVar
    driverClassName = "org.postgresql.Driver"
    username = usernameVar
    password = passwordVar
    url = db_url
    dbCreate = dbCreateVar
}

hibernate_product {
    cache.use_second_level_cache = false
    cache.use_query_cache = false
    cache.region.factory_class = 'net.sf.ehcache.hibernate.EhCacheRegionFactory' // Hibernate 3
    singleSession = true // configure OSIV singleSession mode
    default_schema = "product"
}
Note: there are some obvious optimisations possible in the config above, but keeping it verbose makes the explanation simpler.

I then mapped each GORM object to the appropriate schema.

class Cart {
    // ...
    // ...
    static mapping = {
        datasource 'shopping'
        // ... 
    }
}

class Address {
    // ...
    // ...

    static mapping = {
        datasource 'user'
    }
}

class Stock {
    // ...
    // ...

    static mapping = {
        datasource 'product'
    }
}
I then started my app, said "yippee, this works", had a little tea break and moved on to the next problem. As can be seen, the trick is to use a separate hibernate closure per datasource, specify the schema in there, and name the closures using the same naming convention as you would for separate databases - but point all of the dataSource closures at the same database.
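As a quick sanity check, something like the following (a sketch; the constructor arguments are hypothetical) will put one row into each schema, since each save() goes through the domain's named dataSource:

// e.g. in BootStrap.groovy or a Grails console session
new Cart(/* shopping properties */).save(flush: true, failOnError: true)   // -> shopping.cart
new Address(/* user properties */).save(flush: true, failOnError: true)    // -> "user".address
new Stock(/* product properties */).save(flush: true, failOnError: true)   // -> product.stock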