Sunday, December 8, 2019

ArchUnit

If Software Architecture is done to a reasonable standard, we should expect to see:
  • Well-designed patterns that can fulfill both functional and non-functional requirements
  • No excessive coupling: concerns are properly separated and everything is testable.
If we get that, we should have confidence that the software stays maintainable as it evolves. The tricky part is that, all too often, Architectural rules start off great on a whiteboard (or a PowerPoint slide) but get lost in the code because they are too difficult to enforce.

ArchUnit is a super mechanism to impose Architectural rules and patterns on your code base. It has been around a few years, but I only discovered it this year. I came across it when I was trying to think of ways to ensure the proverbial "utils" package did not turn into the proverbial "dumping ground". Ideally, we'd have no utils packages ever. But in the real world they nearly always exist. The utils package shouldn't have many efferent dependencies.

So for example, suppose you have a package called shoppingcart. Then you need some sort of utility function to add the totals of two carts, remove special offers, add loyalty discounts, blah blah blah. The last thing you want to see is someone checking that into the utils package with dependencies towards the shoppingcart package. If it is so shoppingcart focused, it should really just be in the shoppingcart package. If this happens, very soon your utils package will have dependencies on everything and everything will have dependencies on it. Disaster. What is the point in packages if anything can just depend on anything? They will cease to provide any name-spacing or encapsulation benefits.

So, how can ArchUnit help? Very simply: you define Architectural rules like a JUnit test. Wait a sec... it is a JUnit test. The efferent (outward) and afferent (inward) dependencies for your utils package are very simply expressed as:

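import com.tngtech.archunit.junit.ArchTest;
import com.tngtech.archunit.lang.ArchRule;
import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.classes;

@ArchTest
public static final ArchRule utilPackageDependenciesRules = classes().that().resideInAPackage("com.company.application.util")
           // Efferent (outward): util may only depend on the allowed packages
           .should().onlyDependOnClassesThat().resideInAnyPackage(getAllowedDependencies("com.company.application.exception"))
           // Afferent (inward): only these packages may depend on util
           .andShould().onlyHaveDependentClassesThat().resideInAnyPackage("com.company.application.shoppingcart",
 "com.company.application.payment");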
So that's it. Repeat for every package and you have dependency control that runs like any other JUnit test, so it slots easily into your CI/CD pipeline. If you have architected your packages well, you no longer have to raise these points at code reviews; the rules are part of your CI instead. As your software evolves, new packages come along and dependency rules change, simply change the rules, which are expressed in a nice fluent Java API. Someone new joins the team and wants to get up to speed on the Architectural package rules? Simple, just direct them to the Architectural tests.
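The rule above calls a getAllowedDependencies helper that is not shown in this post. One plausible shape for it, purely as a sketch: it prepends packages that every class is allowed to use to the rule-specific ones. The class name AllowedDependencies and the contents of ALWAYS_ALLOWED are assumptions for illustration; only the ".." package pattern syntax comes from ArchUnit itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AllowedDependencies {

    // Assumption: packages every class may depend on. In ArchUnit package
    // patterns, ".." matches any number of sub-packages.
    private static final List<String> ALWAYS_ALLOWED =
            List.of("java..", "com.company.application.util..");

    private AllowedDependencies() {
    }

    // Returns the always-allowed packages followed by the rule-specific ones,
    // as an array so it can be passed to a String... (varargs) parameter.
    static String[] getAllowedDependencies(String... extraPackages) {
        List<String> allowed = new ArrayList<>(ALWAYS_ALLOWED);
        allowed.addAll(Arrays.asList(extraPackages));
        return allowed.toArray(new String[0]);
    }
}
```

Without something like the "java.." entry, a rule of this kind would flag every class that touches String or Object, which is probably why the original wraps the package list in a helper.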

Not only does ArchUnit give you the ability to express package rules, you can also define your own rules aka conditions and then apply them to whatever code you want. For example, suppose you want a condition that an object is immutable. You naturally therefore want no setters. That could be expressed by this condition.
    // Reports a violation for every public method whose name starts with "set"
    static ArchCondition<JavaClass> noPublicSettersCondition =
         new ArchCondition<JavaClass>("class has no public setters") {
             @Override
             public void check(JavaClass item, ConditionEvents events) {
                 for (JavaMethod javaMethod: item.getMethods()) {
                     if (javaMethod.getName().startsWith("set") && 
                       javaMethod.getModifiers().contains(JavaModifier.PUBLIC)) {
                         String message = String.format(
                             "Public method %s looks like a setter, which is not allowed", javaMethod.getName());
                         events.add(SimpleConditionEvent.violated(item, message));
                     }
                 }
             }
         };
You could then apply this condition to any custom Exception a developer may write. It wouldn't be good if an Exception had a setter, would it?
    @ArchTest
    public static final ArchRule noExceptionsHaveSetters = classes().that()
      .areAssignableTo(RuntimeException.class).should(noPublicSettersCondition);
    
Suppose you keep noticing that Loggers defined in classes either aren't private, aren't static or aren't final. Don't waste time talking about it in code reviews. ArchUnit it!
    @ArchTest
    public final ArchRule loggers_should_be_private_static_final =
            fields().that().haveRawType(TaLogger.class)
                    .should().bePrivate()
                    .andShould().beStatic()
                    .andShould().beFinal()
                    .because("we agreed on this convention");
So the goal here is to conceptualise good rules that will help your code remain testable and maintainable, and then enforce them in a way that is easy to check and understand. ArchUnit really is a great library.

Sunday, June 9, 2019

Defining a Resource

Fielding's dissertation  describes a Resource as:

"Any information that can be named" ... "a document or image, a temporal service (e.g. “today’s weather in Los Angeles”), a collection of other resources, a non-virtual object (e.g. a person), and
so on. In other words, any concept that might be the target of an author’s hypertext
reference must fit within the definition of a resource. A resource is a conceptual mapping
to a set of entities, not the entity that corresponds to the mapping at any particular point in
time."

Defining a Resource is both a Science and an Art. It requires both Domain knowledge and API Architectural skills. The points below serve as a checklist which may help you determine the shape of your Resource, what data it should contain and how it should be presented to consumers of your API.

The Resource must contain a Business Description

  • The business description should be 3 - 4 sentences in simple prose which explain what the Resource is. 
  • A developer with a moderate knowledge of  your system should be able to understand the description
  • Any caveats of the Resource should be made clear

The Resource should be useful on its own


This is similar to the maxim of defining the boundary of a micro-service, where a micro-service should be considered to be useful on its own.  Similarly, a Resource should be useful on its own.

For example, instead of:
/street-address/{id}

RESPONSE

{
    "street1": "String",
    "street2": "String"
}
and
/address-extra/{id}

RESPONSE 

{
    "city": "String",
    "country": "String"
}
It should be:
/address/{id}

RESPONSE

{
    "street1": "String",
    "street2": "String",
    "city": "String",
    "country": "String"
}
If a Resource is not useful on its own and always necessitates a subsequent request, client code will inevitably become more complex and the second request will incur a performance cost.

Use an Appropriate Noun


Use of a simple noun over a compound noun is preferred. For example, Address is better than AddressInfo or AddressDetail. This is a general rule; there will always be exceptions.

If using multiple Resources to represent different views of the same data, for example Address and AddressDetail, use the simple noun (e.g. Address) first. Then, if the second representation is more detailed, use ResourceNameDetail, or if it is less detailed, use ResourceNameSummary. For example, suppose there is a requirement to introduce an Address type Resource:
  1. Address is introduced first
  2. If a subsequent view of Address is needed that is more detailed, the new Resource should be called AddressDetail
  3. If a subsequent view of Address is needed that is less detailed, the new Resource should be called AddressSummary

If it is only used in a READ does it need to be a Resource?


If a Resource is only ever used in a Read request and never a Write (Create, Partial Update, Full Update, Delete, ...) request, it is questionable whether it needs to be defined as a Resource with its own URI. It could just be added to the parent payload. If there is a concern that the payload then becomes too complex, the parent could provide a sparse query, where the client decides per API request what it wants returned.
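As a rough sketch of what a sparse query could look like on the server side, the method below trims a resource down to the attributes the client asked for. How the client names those attributes (for example a hypothetical fields query parameter) is an assumption, as are the class and method names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class SparseFields {

    private SparseFields() {
    }

    // Keeps only the requested attributes, in the resource's iteration order.
    // Assumption: an empty request means "no filter", so everything is returned.
    static Map<String, Object> select(Map<String, Object> resource, Set<String> requested) {
        if (requested.isEmpty()) {
            return new LinkedHashMap<>(resource);
        }
        Map<String, Object> sparse = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : resource.entrySet()) {
            if (requested.contains(entry.getKey())) {
                sparse.put(entry.getKey(), entry.getValue());
            }
        }
        return sparse;
    }
}
```

The data that would otherwise have needed its own read-only URI stays an attribute of the parent, and clients that don't want it simply leave it out of the request.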

Resources should conform to the uniform interface


The uniform interface is a very important part of good API design.  It is not just about using special verbs for different requests but also ensuring the data shape is consistent.

If creates, reads, updates, deletes etc are done in a consistent way, it means code is more consistent, reusable and more maintainable.

This means:
GET /addresses/{id}
and
GET /addresses
must return the same address data structure to represent an Address.
GET /addresses/{id}
RESPONSE
{
    "id":"546",
    "street1": "String",
    "street2": "String",
    "city": "String",
    "country": "String"
}
and
GET /addresses

RESPONSE
{
    "elements": [
         {
              "id":"546",
              "street1": "String",
              "street2": "String",
              "city": "String",
              "country": "String"
         },
         ...
     ]
}
Similarly, for write payloads, the Data Structure should be the same.  So, a partial update to change street1 would be:

PATCH /addresses/{id}
REQUEST

{
    "street1": "Walkview"
}

RESPONSE
{
    "id":"546",
    "street1": "Walkview",
    "street2": "Meadowbrook",
    "city": "Dublin",
    "country": "Ireland"
}
and not something like
PATCH /addresses/{id}
REQUEST

{
    "newStreet1Value": "Walkview"
}

From a Resource perspective, the data structure must be consistent. A different data structure means a different Resource, which should be named differently and have its own path.
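To make the point concrete, here is a minimal sketch of a PATCH handler that only accepts attribute names the Resource already has, so a payload such as "newStreet1Value" is rejected instead of quietly introducing a second data shape. The class name and the decision to treat "id" as read-only are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AddressPatch {

    private AddressPatch() {
    }

    // Applies a partial update whose keys must be attributes of the resource.
    // Unknown keys are rejected; "id" is treated as read-only (assumption).
    // The current representation is left untouched and a new map is returned.
    static Map<String, String> apply(Map<String, String> current, Map<String, String> patch) {
        Map<String, String> updated = new LinkedHashMap<>(current);
        for (Map.Entry<String, String> change : patch.entrySet()) {
            String field = change.getKey();
            if (!current.containsKey(field) || field.equals("id")) {
                throw new IllegalArgumentException("Not a writable Address attribute: " + field);
            }
            updated.put(field, change.getValue());
        }
        return updated;
    }
}
```

The returned map has exactly the same keys as the one a GET would return, which is what lets clients reuse one Address type for reads and writes.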

Don't expose everything


If your DB model is quite sophisticated, you can be sure not all attributes need to be exposed at an API level. Some fields may only be persisted for back office processing and should never make it to any UI.

When adding an attribute to a Resource, consider:

  • only include fields that you are sure the client is interested in
  • if you are not sure, leave the attribute out. It is a much smaller problem to add an attribute later on than to remove an attribute that has already been exposed.
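One way to apply this in code is to keep the persistence model and the API model as separate types with an explicit mapping between them. The sketch below assumes Java 16+ records; the entity's auditUser and riskScore fields are invented stand-ins for back office data, and all type names are hypothetical.

```java
final class AddressMapping {

    // Hypothetical persistence model: includes back-office-only columns.
    record AddressEntity(long id, String street1, String street2, String city,
                         String country, String auditUser, String riskScore) {
    }

    // Hypothetical API model: only attributes clients are known to need.
    record AddressResource(String id, String street1, String street2,
                           String city, String country) {
    }

    private AddressMapping() {
    }

    // Explicit field-by-field mapping, so a new column is never exposed
    // unless someone deliberately adds it here.
    static AddressResource toResource(AddressEntity entity) {
        return new AddressResource(String.valueOf(entity.id()), entity.street1(),
                entity.street2(), entity.city(), entity.country());
    }
}
```

Adding an attribute later is a small, deliberate change to AddressResource and toResource; nothing reaches the API just because it was added to the database.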

API Models shouldn't blindly mirror DB Relational model or OO Models


In database modelling, approaches such as normalizing data or collapsing inheritance hierarchies are used. In Object Oriented design, techniques such as polymorphism and inheritance hierarchies are used to promote things like code reuse and to reduce coupling.

Resource modelling does not have to follow these techniques. The consumer of an API doesn't care if the data is all in one table or normalized over multiple tables. In general, the API should return data in a format that is easy to use and does not require much additional mapping by the client before it becomes useful.

Use Hierarchical data to Avoid repetition


One of the advantages of hierarchical data over flat formats such as CSV is that it provides a mechanism to avoid repetition.  For example, consider a flat data structure which contains a list of persons and what team they are in.  In CSV this is:

team, firstname, lastname
Liverpool, Mo, Salah
Liverpool, Andy, Robertson

In JSON this could be:
{
    "team": "Liverpool",
    "players": [
        {
            "firstName":"Mo",
            "lastName":"Salah"
        },
        {
            "firstName":"Andy",
            "lastName":"Robertson"
        },
        },
         ...
     ]
}
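The move from the flat rows to the hierarchical shape can be sketched as a simple grouping step. This is only an illustration: it assumes simple CSV rows with no header, no quoted fields and no embedded commas, and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TeamGrouping {

    private TeamGrouping() {
    }

    // Groups "team, firstname, lastname" rows so each team name appears once,
    // with its players listed underneath in their original order.
    static Map<String, List<String>> playersByTeam(List<String> csvRows) {
        Map<String, List<String>> teams = new LinkedHashMap<>();
        for (String row : csvRows) {
            String[] columns = row.trim().split("\\s*,\\s*");
            String player = columns[1] + " " + columns[2];
            teams.computeIfAbsent(columns[0], team -> new ArrayList<>()).add(player);
        }
        return teams;
    }
}
```

However many players a team has, its name is stored once as the map key rather than once per row.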

Use Hierarchical Data to Make context clear


Another advantage of hierarchical data is that it helps provide context. To understand a flat data structure you need to know what the query was that generated the data to understand the meaning of it.  For example, consider a bunch of rows that contain a date range.

name, fromDate, toDate, holidays
Tony, 2018-01-01, 2018-02-02, true
Tony, 2018-02-03, 2018-03-01, false

You could make assumptions that there is a new row when there is a change in Tony being on holidays.  But, what if there is another column?

name, fromDate, toDate, holidays, sick
Tony, 2018-01-01, 2018-02-02, true, false
Tony, 2018-02-03, 2018-03-01, false, true

Are the date ranges corresponding to holidays, sickness or both?

If we get more data back, it might become clearer...
name, fromDate, toDate, holidays, sick
Tony, 2018-01-01, 2018-02-02, true, false
Tony, 2018-02-03, 2018-03-01, false, true
Tony, 2018-03-02, 2018-04-01, false, false
Now it looks like it's sickness that the date range corresponds to and it's only a coincidence that it lines up with a holiday period. However, when we get more data back this theory also fails.
name, fromDate, toDate, holidays, sick
Tony, 2018-01-01, 2018-02-02, true, false
Tony, 2018-02-03, 2018-03-01, false, true
Tony, 2018-03-02, 2018-04-01, false, false
Tony, 2018-04-02, 2018-05-01, true, false

It gets even more complicated when we just don't have some information. For example:
name, fromDate, toDate, holidays, sick
Tony, 2018-01-01, 2018-02-02, true, false
Tony, 2018-02-03, 2018-03-01, false, true
Tony, 2018-03-02, 2018-04-01, false, false
Tony, 2018-04-02, 2018-05-01, true, false
Tony, 2018-05-02, 2018-06-01, null, false
Tony, 2018-06-02, 2018-07-01, null, false
Tony, 2018-07-02, 2018-07-08, true, false
Tony, 2018-07-08, 2018-07-09, true, null

The limitation with flat data structures is not only lack of normalisation but that they can only go so far in making the data self-describing.

When it isn't clear what data means, it is inevitable processing the data will be buggy.

We could represent the same person data in hierarchical format as:
{
    "name":"tony",
    "holidays": [
         {
            "fromDate":"2018-01-01",
            "toDate":"2018-02-02"
         },
         {
             "fromDate":"2018-04-02",
             "toDate":"2018-05-01"
         }, 
         {
             "fromDate":"2018-07-02",
             "toDate":"2018-07-09"
         }
     ],
     "sick": [ 
         {
             "fromDate":"2018-02-03",
             "toDate":"2018-03-01"
         }
     ]
}
Now the data is much more self-describing, and it is clear when a date range is for a holiday and when it is for a sick period.
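A client-side model of this payload shows the benefit: each list has exactly one meaning, so a question like "was this person on holiday on a given day?" needs no guessing about what a row represents. The sketch assumes Java 16+ records and treats fromDate and toDate as inclusive, which the payload itself does not state; the type names are hypothetical.

```java
import java.time.LocalDate;
import java.util.List;

final class Absences {

    // A date range mirroring the fromDate/toDate pairs in the payload.
    // Assumption: both ends are inclusive.
    record DateRange(LocalDate fromDate, LocalDate toDate) {
        boolean contains(LocalDate day) {
            return !day.isBefore(fromDate) && !day.isAfter(toDate);
        }
    }

    // Hypothetical type for the person payload: holidays and sick periods
    // are separate lists, so a range can never be misread as the other kind.
    record Person(String name, List<DateRange> holidays, List<DateRange> sick) {
        boolean onHoliday(LocalDate day) {
            return holidays.stream().anyMatch(range -> range.contains(day));
        }
    }

    private Absences() {
    }
}
```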

Resource Relationships

Resources on their own only describe themselves. A Resource model describes relationships between Resources.  This will give an indication of:
  • dependencies between Resources. What Resources are needed for a particular Resource to exist or what is impacted when a particular Resource changes: updated or deleted.
  • Data navigation - in a large domain model, it is much easier to understand and follow if navigational and directional sense is provided to consumers of the model, especially when navigation across (Resources loosely connected) can be differentiated from navigation down (Resources strongly connected).
Hypermedia links aren't only used to achieve HATEOAS.  Resources that describe what they are linked to using hypermedia links demonstrate a very powerful mechanism to express the Resource model. Advantages include:
  • A large domain model is split into more manageable pieces.  Typically users are only interested in a particular part of the model.  When Resources self describe their own relationships, it means a large complex model is split up into more digestible chunks and users get the information they need quicker. 
  • The Resource model is self-describing and kept in sync with code. Everything is co-located.

Make clear Parent - Child relationships

A Child Resource describes its Parent through URL hierarchical name spacing. A Parent Resource that has children of one or many types should make this clear by providing links to the children. For example, if a Team Resource has Players child Resources, the Team payload should make this clear.

REQUEST
https://api.server.com/teams/4676
RESPONSE

{
    "id":"34533",
    ...,
    "_links": {
          "self":"https://api.server.com/teams/4676",
          "players":"https://api.server.com/teams/4676/players"
    }
}
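Generating these links in one place keeps the Resource model and the code in sync. A minimal sketch, assuming the base URL from the example above and hypothetical class and method names:

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TeamLinks {

    // Assumption: base URL taken from the example payload.
    private static final String BASE_URL = "https://api.server.com";

    private TeamLinks() {
    }

    // Builds the "_links" object for a team: itself plus its child collection.
    static Map<String, String> forTeam(String teamId) {
        String self = BASE_URL + "/teams/" + teamId;
        Map<String, String> links = new LinkedHashMap<>();
        links.put("self", self);
        // The child collection lives under the parent's path
        links.put("players", self + "/players");
        return links;
    }
}
```

Because the players link is derived from the team's own URL, the parent-child hierarchy is visible both in the path and in the payload.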

Make clear Peer relationships

This is similar to above except it is for Resources that exist in a different hierarchical name space. So for example, suppose the team is in division 1.  A link should be included in the team's division attribute.
REQUEST
https://api.server.com/teams/4676

RESPONSE
{
    "id":"34533",
    "division": {
        "name":"Division 1",
        "_links": {
              "self":"https://api.server.com/divisions/1"
        }
     },
     ..., 
    "_links": {
        "self":"https://api.server.com/teams/4676",
        "players":"https://api.server.com/teams/4676/players"
    }
}

Make clear Links to Other Representations

If data is modeled to have multiple Resources which represent different representations of the data, the Resources should also include links to each other.
REQUEST
https://api.server.com/teams/4676

RESPONSE
{
    "id":"34533",
    "division": {
        "name":"Division 1",
        "_links": {
              "self":"https://api.server.com/divisions/1"
        }
     },
     ..., 
    "_links": {
        "self":"https://api.server.com/teams/4676",
        "players":"https://api.server.com/teams/4676/players",
        "teamDetails":"https://api.server.com/teamDetails/4676"
    }
}