Saturday, March 28, 2020

Clean Unit Testing

It's easy to write so-called "unit tests" that use JUnit and some mocking library. They may produce code coverage that keeps some stakeholders happy, even though the tests aren't really unit tests and provide questionable value. It can also be very easy to write tests that are, in theory, unit tests but are more complex than the underlying code and hence just add to the total software entropy.

This particular type of software entropy has the unpleasant characteristic of making it even harder to restructure the underlying software or to accommodate new requirements. In effect, the test has negative value.

Doing unit testing properly is a lot harder than people think. In this article, I outline several tips that aim to improve the readability, maintainability and the quality of your unit tests.

Note: for the code snippets, Spock is used. For those who don't know Spock, consider it a very powerful DSL around JUnit which adds some nice features and cuts down on verbosity.

Reason for Failure 


A unit test should only fail if there is a problem with the code under test. A unit test for the class DBService should only fail if there is a bug in DBService, not if there is a bug in any other class it depends on. Thus, in the unit test for DBService, the only instantiated object should be DBService. Every other object that DBService depends on should be stubbed or mocked.
Otherwise, you are testing code beyond DBService. While you might incorrectly think this is more bang for the buck, it means locating the root cause of problems will take longer. If the test fails, it could be because there is a problem with any of several classes, but you don't know which one. Whereas, if it can only fail because the code under test is wrong, then you know exactly where the problem is.
Furthermore, thinking this way will improve the Object-Oriented nature of your code. The tests will only test the responsibilities of the class. If its responsibilities aren't clear, or it can't do anything without another class, or the class is so trivial the test is pointless, it raises the question of whether there is something wrong with the class in terms of its responsibilities.
The only exception to mocking or stubbing a dependent class is when you are using a well-known class from the Java library, e.g. String. There is not much point stubbing or mocking that. The same goes for a dependent class that is just a simple immutable POJO, where there is not much value in stubbing or mocking it.

Stubbing and Mocking 


The terms mocking and stubbing are often used interchangeably, as if they were the same thing. They are not the same thing. In summary, if your code under test depends on an object and never invokes a method on it that has side effects, that object should be stubbed.
Whereas, if it depends on an object on which it does invoke methods that have side effects, then that object should be mocked. Why is this important? Because your test should be checking for different things depending on the types of relationships it has with its dependencies.
Let's say your object under test is BusinessDelegate. BusinessDelegate receives requests to edit BusinessEntities. It performs some simple business logic and then invokes methods on a DbFacade (a facade class in front of a database). So, the code under test looks like this:
public class BusinessDelegate {
    private DbFacade dbFacade;
    // ...

    public void edit(BusinessEntity businessEntity) {
        // Read some attributes on the business entity
        String newValue = businessEntity.getValue();

        // Some business logic, data mapping, and/or validation
        // that produces the index and data passed to the facade
        // ...

        dbFacade.update(index, data);
    }
}
Regarding the BusinessDelegate class, we can see two relationships.


  1. A read-only relationship with BusinessEntity. The BusinessDelegate calls a few getters on it and never changes its state or invokes any methods that have side effects.
  2. A relationship with DbFacade where it asks DbFacade to do something that we assume will have side effects. It is not the responsibility of BusinessDelegate to ensure the update happens; that is DbFacade's job. The only responsibility of BusinessDelegate is to ensure the update method is invoked with the correct parameters.
So clearly, regarding the unit test for BusinessDelegate, BusinessEntity should be stubbed and DbFacade should be mocked. If we were using the Spock testing framework we could see this very clearly:
class BusinessDelegateSpec extends Specification {
    @Subject
    BusinessDelegate businessDelegate
    def dbFacade

    def setup() {
        dbFacade = Mock(DbFacade)
        businessDelegate = new BusinessDelegate(dbFacade)
    }

    def "edit(BusinessEntity businessEntity)"() {
        given:
            def businessEntity = Stub(BusinessEntity)
            // ... (set up the expected index and data)
        when:
            businessDelegate.edit(businessEntity)
        then:
            // verify the side-effecting call, with the arguments edit() passes
            1 * dbFacade.update(index, data)
    }
}
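For readers not using Spock, the same distinction can be sketched in plain Java with hand-written test doubles. This is only an illustration under assumptions: the interface shapes, the single-argument update(String) signature and the trimming "business logic" are invented here and are not taken from the real classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shapes for the collaborators; the real classes may differ.
interface BusinessEntity {
    String getValue();
}

interface DbFacade {
    void update(String data);
}

class BusinessDelegate {
    private final DbFacade dbFacade;

    BusinessDelegate(DbFacade dbFacade) {
        this.dbFacade = dbFacade;
    }

    void edit(BusinessEntity businessEntity) {
        // Stand-in business logic: tidy the value before persisting it.
        String data = businessEntity.getValue().trim();
        dbFacade.update(data);
    }
}

// A mock records the side-effecting calls so the test can verify them.
class RecordingDbFacade implements DbFacade {
    final List<String> updates = new ArrayList<>();

    @Override
    public void update(String data) {
        updates.add(data);
    }
}

class StubVersusMockExample {
    static List<String> run() {
        // Stub: only returns canned data; nothing is verified on it.
        BusinessEntity businessEntity = () -> " new value ";
        // Mock: its recorded interactions are what the test verifies.
        RecordingDbFacade dbFacade = new RecordingDbFacade();

        new BusinessDelegate(dbFacade).edit(businessEntity);
        return dbFacade.updates;
    }
}
```

The stub needs no assertions of its own, while the mock's recorded calls are the whole point of the test.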
Having a good understanding of the difference between stubs and mocks improves OO quality dramatically. Instead of just thinking about what an object does, the relationships and dependencies between objects get much more focus. It is now possible for unit tests to help enforce design principles that would otherwise just get lost.

Stub and Mock in the Right Place 


The curious among you might be wondering why, in the above code sample, dbFacade is declared at class level while businessEntity is declared at method level. The answer is that unit test code is much more readable the more it mirrors the code under test. In the actual BusinessDelegate class, the dependency on dbFacade is at the class level and the dependency on BusinessEntity is at the method level.
In the real world, whenever a BusinessDelegate is instantiated a DbFacade dependency will exist, so any time BusinessDelegate is instantiated for a unit test it is fine for the DbFacade dependency to exist too.
Sound reasonable? Hope so. There are two further advantages of doing this:
  • A reduction in code verbosity. Even using Spock, unit tests can become verbose. If you move class-level dependencies out of the individual tests, you will reduce test code verbosity. If your class has a dependency on four other classes at the class level, that is a minimum of four lines of code removed from each test.
  • Consistency. Developers tend to write unit tests their own way. That is fine if they are the only people reading their code, but this is rarely the case. Therefore, the more consistency we have across the tests, the easier they are to maintain. So if you read a test you have never read before and at least see variables being stubbed and mocked in specific places for specific reasons, you will find the unit test code easier to read.

Variable Declaration Order 


This follows on from the last point. Declaring the variables in the right place is a great start; the next thing is to declare them in the same order in which they appear in the code. So, suppose we have something like the below.
public class BusinessDelegate {

    private BusinessEntityValidator businessEntityValidator;
    private DbFacade dbFacade;
    private ExceptionHandler exceptionHandler;

    @Inject
    BusinessDelegate(BusinessEntityValidator businessEntityValidator, DbFacade dbFacade, ExceptionHandler exceptionHandler) {
        // ...
        // ...
    }

    public BusinessEntity read(Request request, Key key) {
        // ...
    }
}
It is much easier to read the test code if the stubs and mocks are defined in the same order in which the class declares them.
class BusinessDelegateSpec extends Specification {
    @Subject BusinessDelegate businessDelegate
    // class-level dependencies in the same order
    def businessEntityValidator
    def dbFacade
    def exceptionHandler

    def setup() {
        businessEntityValidator = Stub(BusinessEntityValidator)
        dbFacade = Mock(DbFacade)
        exceptionHandler = Mock(ExceptionHandler)
        businessDelegate = new BusinessDelegate(businessEntityValidator, dbFacade, exceptionHandler)
    }

    def "read(Request request, Key key)"() {
        given:
            def request = Stub(Request)
            def key = Stub(Key)
        when:
            businessDelegate.read(request, key)
        then:
            // ...
    }
}

Variable Naming 


And if you thought the last point was pedantic, you'll be glad to know this one is too. The variable names used to represent the stubs and mocks should be the same names that are used in the actual code. Even better, if you can name the variable the same as the type in the code under test and not lose any business meaning, then do that. In the last code sample, the parameter variables are named request and key, and their corresponding stubs have the same names. This is much easier to follow than doing something like this:
// ...
public void read(Request info, Key someKey) {
    // ...
}
// corresponding test code
def "read(Request request, Key key)"() {
    given:
        def aRequest = Stub(Request)
        def myKey = Stub(Key)  // you will get dizzy soon!
        // ...
}

Avoid Over Stubbing 


Too much stubbing (or mocking) usually means something has gone wrong. Let's consider the Law of Demeter. Imagine some telescopic method call...
List queryBusinessEntities(Request request, Params params) {
    // check params are allowed
    Params paramsToUpdate = queryService.getParamResolver().getParamMapper().getParamComparator().compareParams(params);
    // ...
    // ...
}
It is not enough to stub queryService. Now whatever is returned by getParamResolver() has to be stubbed, and that stub has to have getParamMapper() stubbed, which in turn has to have getParamComparator() stubbed. Even with a nice framework like Spock, which minimizes verbosity, you will have to write four lines of stubbing for what is one line of Java code.
def "queryBusinessEntities()"() {
    given:
        def params = Stub(Params)
        def paramResolver = Stub(ParamResolver)
        queryService.getParamResolver() >> paramResolver
        def paramMapper = Stub(ParamMapper)
        paramResolver.getParamMapper() >> paramMapper
        def paramComparator = Stub(ParamComparator)
        paramMapper.getParamComparator() >> paramComparator
        Params paramsToUpdate = Stub(Params)
        paramComparator.compareParams(params) >> paramsToUpdate
    when:
        // ...
    then:
        // ...
}
Yuck! Look at what that one line of Java does to our unit test. It gets even worse if you are not using something like Spock. The solution is to avoid telescopic method calling and try to use only direct dependencies. In this case, just inject the ParamComparator directly into our class. Then the code becomes...
List queryBusinessEntities(Request request, Params params) {
    // check params are allowed
    Params paramsToUpdate = paramComparator.compareParams(params);
    // ...
    // ...
}
and the test code becomes:
def setup() {
    // ...
    // ...
    paramComparator = Stub(ParamComparator)
    businessEntityDelegate = new BusinessEntityDelegate(paramComparator)
}

def "queryBusinessEntities()"() {
    given:
        def params = Stub(Params)
        Params paramsToUpdate = Stub(Params)
        paramComparator.compareParams(params) >> paramsToUpdate
    when:
        // ...
    then:
        // ...
}
All of a sudden people should be thanking you for feeling less dizzy.
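The effect of injecting the direct collaborator can also be sketched in plain Java. The types below are simplified stand-ins chosen for illustration (Params is reduced to a String and ParamComparator to a single-method interface), so treat this as an assumption-laden sketch rather than the real classes.

```java
// Hypothetical stand-ins: Params is reduced to a String, and ParamComparator
// to a single-method interface so a lambda can act as the stub.
interface ParamComparator {
    String compareParams(String params);
}

class BusinessEntityDelegate {
    private final ParamComparator paramComparator;

    // The direct collaborator is injected; no chain of getters is needed.
    BusinessEntityDelegate(ParamComparator paramComparator) {
        this.paramComparator = paramComparator;
    }

    String queryBusinessEntities(String params) {
        String paramsToUpdate = paramComparator.compareParams(params);
        return "query:" + paramsToUpdate;
    }
}

class DirectDependencyExample {
    static String run() {
        // One line of stubbing for one line of production code.
        ParamComparator paramComparator = params -> "allowed";
        return new BusinessEntityDelegate(paramComparator).queryBusinessEntities("raw");
    }
}
```

With the chained version, the same test would need a stub for every intermediate object before it could reach compareParams().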

Gherkin Syntax 


Bad unit tests have horrible things like asserts all over the place: at the top, in the middle and at the bottom. It can very quickly get nauseating trying to figure out which ones are important, which ones are redundant and which ones require which bit of setup. Schematic things are easier to follow. That is the real advantage of the Gherkin syntax. The scenario is always set up in the given:, the when: is the action under test and the then: is what we expect. Even better, using something like Spock means you have a nice, neat DSL so that the given:, when: and then: can all be co-located in the one test method.

Narrow When Wide Then 

If a unit test is testing four methods, is it a unit test? Consider the below test:
def "test several methods"() {
    given:
        // ...
    when:
        def name = personService.getName()
        def dateOfBirth = personService.getDateOfBirth()
        def country = personService.getCountry()
    then:
        name == "tony"
        dateOfBirth == "1970-04-04"
        country == "Ireland"
}
First up, if Jenkins tells you this failed, you are going to have to root around and figure out what part of the class is wrong. Because the test doesn't focus on a specific method, you don't know immediately which method is failing. Second up, if it is getName() that is failing, how do you know getDateOfBirth() and getCountry() are working? The test stops on the first failure. So when the test fails, you don't even know if you have one method not working or three methods not working. You can go around telling everyone you have 99% code coverage and one test failing. But how much was that one test really doing?
Furthermore, what's easier to fix? A small test or a long test? Ideally, a test should check a single interaction with the thing you are testing. Now, this doesn't mean you can only have one assert, but you should have a narrow when and a wide then.
So let's take the narrow when first. Ideally, it is one line of code only, and that one line of code matches the method you are unit testing.
def "getName()"() {
    given:
        // ...
    when:
        def name = personService.getName()
    then:
        name == "tony"
}

def "getDateOfBirth()"() {
    given:
        // ...
    when:
        def dateOfBirth = personService.getDateOfBirth()
    then:
        dateOfBirth == "1970-04-04"
}

def "getCountry()"() {
    given:
        // ...
    when:
        def country = personService.getCountry()
    then:
        country == "Ireland"
}
Now we have exactly the same code coverage, but if getName() fails while getCountry() and getDateOfBirth() pass, we know immediately that the problem is with getName() and not with getCountry() or getDateOfBirth(). Test granularity is an entirely different metric from code coverage. Ideally there should be a minimum of one unit test for every non-private method, and more when you factor in negative tests etc. It is perfectly fine to have multiple asserts in a unit test. For example, suppose we had a method that delegated to other classes.
Consider a method resyncCache() which in its implementation calls two other methods on a cacheService object, clear() and reload().
def "resyncCache()"() {
    given:
        // ...
    when:
        personService.resyncCache()
    then:
        1 * cacheService.clear()
        1 * cacheService.reload()
}
In this scenario, it would not make sense to have two separate tests. The "when" is the same and if either fails, you know immediately which method you have to look at. Having two separate tests just means twice the effort with little benefit. The subtle thing to get right here is to ensure your asserts are in the right order. They should be in the same order as code execution. So, clear() is invoked before reload(). If the test fails at clear(), there is not much point going on to check reload() anyway, as the method is broken. If you don't follow the assertion order tip, and assert on reload() first and that is reported as failing, you won't know whether the clear() invocation, which is supposed to happen first, even happened. Thinking this way will help you become a Test Ninja!
The ordering tip for mocking and stubbing applies equally to asserts. Assert in chronological order. It's pedantic, but it will make test code much more maintainable.
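The idea of checking interactions in execution order can be sketched in plain Java with a hand-written recording mock. The CacheService interface and the verify() helper below are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collaborator with the two side-effecting methods from the text.
interface CacheService {
    void clear();
    void reload();
}

// Hand-written mock that records calls in the order they happen.
class RecordingCacheService implements CacheService {
    final List<String> calls = new ArrayList<>();

    @Override public void clear()  { calls.add("clear"); }
    @Override public void reload() { calls.add("reload"); }
}

class PersonService {
    private final CacheService cacheService;

    PersonService(CacheService cacheService) {
        this.cacheService = cacheService;
    }

    void resyncCache() {
        cacheService.clear();
        cacheService.reload();
    }
}

class AssertionOrderExample {
    static List<String> run() {
        RecordingCacheService cacheService = new RecordingCacheService();
        new PersonService(cacheService).resyncCache();
        return cacheService.calls;
    }

    // Checks in execution order: a missing clear() is reported before reload().
    static String verify(List<String> calls) {
        if (calls.isEmpty() || !calls.get(0).equals("clear")) {
            return "clear() was not invoked first";
        }
        if (calls.size() < 2 || !calls.get(1).equals("reload")) {
            return "reload() was not invoked second";
        }
        if (calls.size() != 2) {
            return "unexpected extra calls";
        }
        return "ok";
    }
}
```

Because the checks run in the same order as the production code, the first reported failure points at the earliest step that went wrong.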

Parameterization 

Parameterization is a very powerful capability that can greatly reduce test code verbosity and rapidly increase branch coverage. The Unit Test Ninja should always be able to spot when to use it!
An obvious indication that a number of tests could be grouped into one test and parameterized is that they have the same when blocks, except for different input parameters.
For example, consider the below.
def "addNumbers(), even numbers"() {
    given:
      // ...
    when:
      def answer = mathService.addNumbers(4, 4);
    then:
      // ...
}

def "addNumbers(), odd numbers"() {
    given:
      // ...
    when:
      def answer = mathService.addNumbers(5, 5);
    then:
      // ...
}
As we can see here the when is the same except the input parameters. This is a no-brainer for parameterization.
@Unroll("number1=#number1, number2=#number2")  // unroll will provide the exact values in the test report
def "addNumbers()"(int number1, int number2, int expected) {
    given:
      // ...
    when:
      def answer = mathService.addNumbers(number1, number2)
    then:
      // the expected column is named differently from the local variable answer
      answer == expected
    where:
      number1   | number2   || expected
      4         | 4         || 8
      5         | 5         || 10
}
Immediately we get a 50% reduction in code. We have also made it much easier to add further permutations by just adding another row to the where table. So, while it may seem very obvious that these two tests should have been one parameterized test, it is only obvious if the maxim of having a narrow when is adhered to. The narrow "when" coding style makes the exact scenario being tested much easier to see. If a wide when is used, with lots of things happening, the scenario is not easy to see and therefore spotting tests to parameterize is harder.
Usually, the only time not to parameterize tests that have the same when: block is when the expectations have a completely different structure. Expecting an int in every case is the same structure; expecting an exception in one scenario and an int in another means you have two different structures. In such scenarios, it is better not to parameterize. A classic example of this is mixing a positive and a negative test.
Suppose our addNumbers() method throws an exception if it receives a float. That is a negative test and should be kept separate. A then: block should never contain an if statement. It is a sign that a test is becoming too flexible, and a separate test with no if statements would make more sense.
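The same split can be sketched without Spock as a table-driven check in plain Java. The MathService below is a hypothetical implementation written for this example; the rule that non-integer input is rejected comes from the text, but the Number-based signature and the exception type are assumptions.

```java
class MathService {
    // Assumed rule: anything that is not an Integer is rejected.
    int addNumbers(Number number1, Number number2) {
        if (!(number1 instanceof Integer) || !(number2 instanceof Integer)) {
            throw new IllegalArgumentException("only ints are supported");
        }
        return number1.intValue() + number2.intValue();
    }
}

class ParameterizedExample {
    // Positive cases share one structure: two inputs and an expected int.
    static final int[][] ROWS = {
        // number1, number2, expected
        {4, 4, 8},
        {5, 5, 10},
    };

    // Returns how many rows of the table produced an unexpected answer.
    static int failingPositiveRows() {
        MathService mathService = new MathService();
        int failures = 0;
        for (int[] row : ROWS) {
            if (mathService.addNumbers(row[0], row[1]) != row[2]) {
                failures++;
            }
        }
        return failures;
    }

    // The negative case expects an exception, a different structure, so it
    // stays in its own test rather than becoming an if inside the loop.
    static boolean rejectsFloat() {
        try {
            new MathService().addNumbers(1.5f, 2);
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }
}
```

Adding a further positive permutation is one more row in ROWS; the negative test is untouched by it.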

Summary 

Clean unit testing is very important if you want to have maintainable code, release regularly and enjoy your Software Engineering more.


Sunday, December 8, 2019

Arch Unit

If Software Architecture is done to a reasonable standard, we should expect to see:
  • Well-designed patterns that can fulfill both functional and non-functional requirements
  • No crazy coupling; concerns are properly separated and everything is testable.
If we get that, we should have confidence that the software stays maintainable as it evolves. The tricky part is that, all too often, Architectural rules start off great on a whiteboard (or a PowerPoint slide) but just get lost in code, because they are too difficult to enforce.

ArchUnit is a super mechanism to impose Architectural rules and patterns on your code base. It has been around for a few years, but I only discovered it this year. I came across it when I was trying to think of ways to ensure the proverbial "utils" package did not turn into the proverbial "dumping ground". Ideally, we'd have no utils packages ever. But in the real world they nearly always exist. The utils package shouldn't have many efferent dependencies. So for example, suppose you have a package called shoppingcart. Then you need some sort of utility function to add the total of two carts, remove special offers, add loyalty discounts, blah blah blah. The last thing you want to see is someone checking that into the utils package with dependencies towards the shoppingcart package. Because, if it is so shoppingcart focused, it should really just be in the shoppingcart package. If this happens, very soon your utils package will have dependencies on everything and everything will have dependencies on it. Disaster. What is the point of packages if anything can just depend on anything? They will cease to provide any name-spacing or encapsulation benefits.

So, how can ArchUnit help? Very simply: you define Architectural rules like a JUnit test. Wait a sec... it is a JUnit test. The efferent (outward) and afferent (inward) dependency rules for your utils package are very simply expressed as:

@ArchTest
public static final ArchRule utilPackageDependenciesRules = classes().that().resideInAPackage("com.company.application.util")
        .should().onlyDependOnClassesThat().resideInAnyPackage(getAllowedDependencies("com.company.application.exception"))
        .andShould().onlyHaveDependentClassesThat().resideInAnyPackage("com.company.application.shoppingcart",
                "com.company.application.payment");
So that's it. Now repeat for every package and you have code control that runs like any other JUnit test, so it will run easily as part of your CI, CD etc. Now, if you have architected your packages well, you don't have to bring dependencies up at code reviews. Instead, the rules are part of your CI. As your software evolves, new packages come along and dependency rules change, simply change the rules, which are expressed in nice fluent Java APIs. Someone new joins the team and wants to get up to speed on the Architectural package rules? Simple, just direct them to the Architectural tests.
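To make the idea behind such a rule concrete, here is a minimal plain-Java sketch of an efferent dependency check. It is not how ArchUnit works internally (ArchUnit analyses the compiled classes itself); in this sketch the package dependencies are simply handed in as a map, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PackageRuleSketch {
    // Returns one message per dependency that the rules do not allow.
    // Keys are package names; values are the packages they depend on.
    static List<String> violations(Map<String, Set<String>> actualDependencies,
                                   Map<String, Set<String>> allowedDependencies) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : actualDependencies.entrySet()) {
            String from = entry.getKey();
            // A package with no rule is allowed no outward dependencies.
            Set<String> allowed = allowedDependencies.getOrDefault(from, Set.of());
            for (String to : entry.getValue()) {
                if (!allowed.contains(to)) {
                    result.add(from + " must not depend on " + to);
                }
            }
        }
        return result;
    }
}
```

A util package that is only allowed to depend on exception, but actually also depends on shoppingcart, would yield a single violation naming shoppingcart.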

Not only does ArchUnit give you the ability to express package rules, you can also define your own rules (conditions) and then apply them to whatever code you want. For example, suppose you want a condition that an object is immutable. You naturally therefore want no setters. That could be expressed by this condition.
    static ArchCondition<JavaClass> noPublicSettersCondition =
         new ArchCondition<JavaClass>("class has no public setters") {
             @Override
             public void check(JavaClass item, ConditionEvents events) {
                 // flag every public method whose name starts with "set"
                 for (JavaMethod javaMethod : item.getMethods()) {
                     if (javaMethod.getName().startsWith("set") &&
                       javaMethod.getModifiers().contains(JavaModifier.PUBLIC)) {
                         String message = String.format(
                             "Public method %s is not allowed: it looks like a setter", javaMethod.getName());
                         events.add(SimpleConditionEvent.violated(item, message));
                     }
                 }
             }
         };
You could then apply the noPublicSettersCondition to any custom Exception a developer may write. It wouldn't be good if an Exception had a setter, would it?
    @ArchTest
    public static final ArchRule noExceptionsHaveSetters = classes().that()
      .areAssignableTo(RuntimeException.class).should(noPublicSettersCondition);
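For intuition, roughly the same check can be sketched with the JDK's reflection API alone. This is a simplified illustration and not ArchUnit's mechanism; the two exception classes are hypothetical and exist only to exercise the check.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

class SetterCheckSketch {
    // Names of public methods declared directly on the class that start with "set".
    static List<String> publicSetters(Class<?> type) {
        List<String> setters = new ArrayList<>();
        for (Method method : type.getDeclaredMethods()) {
            if (method.getName().startsWith("set") && Modifier.isPublic(method.getModifiers())) {
                setters.add(method.getName());
            }
        }
        return setters;
    }
}

// Hypothetical exception classes used only to exercise the check.
class ImmutableOrderException extends RuntimeException {
    ImmutableOrderException(String message) {
        super(message);
    }
}

class MutableOrderException extends RuntimeException {
    private String detail;

    public void setDetail(String detail) {
        this.detail = detail;
    }

    public String getDetail() {
        return detail;
    }
}
```

Here publicSetters(MutableOrderException.class) reports setDetail, while the immutable variant reports nothing. Inherited methods are deliberately ignored, since getDeclaredMethods() only returns what the class itself declares.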
    
Suppose you keep noticing that Loggers defined in classes either aren't private, aren't static or aren't final. Don't waste time talking about it in code reviews. ArchUnit it!
    @ArchTest
    public final ArchRule loggers_should_be_private_static_final =
            fields().that().haveRawType(TaLogger.class)
                    .should().bePrivate()
                    .andShould().beStatic()
                    .andShould().beFinal()
                    .because("we agreed on this convention");
So the goal here is to conceptualise good rules that will help your code remain testable and maintainable, and then enforce them in a way that is easy to check and understand. ArchUnit really is a great library.

Sunday, June 9, 2019

Defining a Resource

Fielding's dissertation describes a Resource as:

"Any information that can be named" ... "a document or image, a temporal service (e.g. “today’s weather in Los Angeles”), a collection of other resources, a non-virtual object (e.g. a person), and so on. In other words, any concept that might be the target of an author’s hypertext reference must fit within the definition of a resource. A resource is a conceptual mapping to a set of entities, not the entity that corresponds to the mapping at any particular point in time."

Defining a Resource is both a Science and an Art. It requires both Domain knowledge and API Architectural skills. The points below serve as a checklist which may help you determine the shape of your Resource, what data it should contain and how it should be presented to consumers of your API.

The Resource must contain a Business Description

  • The business description should be 3 to 4 sentences of simple prose which explain what the Resource is.
  • A developer with a moderate knowledge of your system should be able to understand the description.
  • Any caveats of the Resource should be made clear.

The Resource should be useful on its own


This is similar to the maxim of defining the boundary of a micro-service, where a micro-service should be considered to be useful on its own.  Similarly, a Resource should be useful on its own.

For example, instead of:
/street-address/{id}

RESPONSE

{
    "street1": "String",
    "street2": "String"
}
and
/address-extra/{id}

RESPONSE 

{
    "city": "String",
    "country": "String"
}
It should be:
/address/{id}

RESPONSE

{
    "street1": "String",
    "street2": "String",
    "city": "String",
    "country": "String"
}
If a Resource on its own is not useful and always necessitates a subsequent request, client code will inevitably become more complex and the second request will incur a performance cost.

Use an Appropriate Noun


Use of a simple noun over a compound noun is preferred.  For example, Address is better than AddressInfo or AddressDetail.  This is a general rule, there will always be exceptions.

If using multiple Resources to represent different views of the same data, for example Address and AddressDetail, use the simple noun (e.g. Address) first. Then, if the second representation is more detailed, use ResourceNameDetail, or if it is less detailed, use ResourceNameSummary. For example, suppose there is a requirement to introduce an Address type Resource:
  1. Address is introduced first
  2. If a subsequent view of Address is needed that is more detailed, the new Resource should be called AddressDetail
  3. If a subsequent view of Address is needed that is less detailed, the new Resource should be called AddressSummary

If it is only used in a READ does it need to be a Resource?


If a Resource is only ever used in a Read request and never in a Write (Create, Partial Update, Full Update, Delete, ...) request, it is questionable whether it needs to be defined as a Resource with its own URI. It could just be added to the parent payload. If there is a concern that the payload then becomes too complex, the parent could provide a sparse query, where the client decides per API request what it wants returned.
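A sparse query can be sketched as a simple field filter applied to the parent payload. This is a minimal sketch under assumptions: the text does not say how the client names the fields it wants, so the idea of a requested-field set (for instance taken from a hypothetical fields query parameter) is invented here for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class SparseFieldsSketch {
    // Keeps only the requested fields; an empty request returns everything.
    static Map<String, Object> select(Map<String, Object> resource, Set<String> requestedFields) {
        if (requestedFields.isEmpty()) {
            return new LinkedHashMap<>(resource);
        }
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : resource.entrySet()) {
            if (requestedFields.contains(entry.getKey())) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }
}
```

The read-only data lives inside the parent Resource, and a client that does not need it simply leaves it out of the requested fields.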

Resources should conform to the uniform interface


The uniform interface is a very important part of good API design.  It is not just about using special verbs for different requests but also ensuring the data shape is consistent.

If creates, reads, updates, deletes etc are done in a consistent way, it means code is more consistent, reusable and more maintainable.

This means:
GET /addresses/{id}
and
GET /addresses
must return the same address data structure to represent an Address.
GET /addresses/{id}
RESPONSE
{
    "id":"546",
    "street1": "String",
    "street2": "String",
    "city": "String",
    "country": "String"
}
and
GET /addresses

RESPONSE
{
    "elements": [
         {
              "id":"546",
              "street1": "String",
              "street2": "String",
              "city": "String",
              "country": "String"
         },
         ...
     ]
}
Similarly, for write payloads, the Data Structure should be the same.  So, a partial update to change street1 would be:

PATCH /addresses/{id}
REQUEST

{
    "street1": "Walkview"
}

RESPONSE
{
    "id":"546",
    "street1": "Walkview",
    "street2": "Meadowbrook",
    "city": "Dublin",
    "country": "Ireland"
}
and not something like
PATCH /addresses/{id}
REQUEST

{
    "newStreet1Value": "Walkview"
}

From a Resource perspective, the data structure must be consistent. A different data structure means a different Resource, which should be named differently and have its own path.
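One way to picture the consistency rule for partial updates is a merge that only accepts keys already present in the Resource's own data structure. The sketch below is illustrative only: an Address is reduced to a map of strings, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PartialUpdateSketch {
    // Applies a PATCH body whose keys are the Resource's own attribute names.
    // The current representation is left untouched; a new one is returned.
    static Map<String, String> patch(Map<String, String> current, Map<String, String> changes) {
        Map<String, String> updated = new LinkedHashMap<>(current);
        for (Map.Entry<String, String> change : changes.entrySet()) {
            if (!current.containsKey(change.getKey())) {
                // e.g. "newStreet1Value" is not part of the Address structure.
                throw new IllegalArgumentException("Unknown attribute: " + change.getKey());
            }
            updated.put(change.getKey(), change.getValue());
        }
        return updated;
    }
}
```

A request body of {"street1": "Walkview"} is accepted and returns the full Address in its usual shape, while {"newStreet1Value": "Walkview"} is rejected because it introduces a different data structure.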

Don't expose everything


If your DB model is quite sophisticated, you can be sure not all attributes need to be exposed at an API level. Some fields may only be persisted for back office processing and should never make it to any UI.

When adding an attribute to a Resource, consider:

  • only include fields that you are sure the client is interested in
  • if you are not sure, leave the attribute out. It is a much smaller problem to add an attribute later on than to remove an attribute that has already been exposed.

API Models shouldn't blindly mirror DB Relational model or OO Models


In database modelling, approaches such as normalizing data or collapsing inheritance hierarchies are used. In Object-Oriented design, techniques such as polymorphism, inheritance hierarchies etc are used to promote things like code reuse and to reduce coupling.

Resource modelling does not have to follow these techniques. The consumer of an API doesn't care if the data is all in one table, or normalized over multiple tables. In general, the API should return data in a format that is easy to use and does not require much additional mapping by the client before it can become useful.

Use Hierarchical data to Avoid repetition


One of the advantages of hierarchical data over flat formats such as CSV is that it provides a mechanism to avoid repetition.  For example, consider a flat data structure which contains a list of persons and what team they are in.  In CSV this is:

team, firstname, lastname
Liverpool, Mo, Salah
Liverpool, Andy, Robertson

In JSON this could be:
{
    "team": "Liverpool",
    "players": [
        {
            "firstName":"Mo",
            "lastName":"Salah"
        },
        {
            "firstName":"Andy",
            "lastName":"Robertson"
        },
         ...
     ]
}
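The move from the flat rows to the hierarchy can be sketched as a simple grouping step. The row layout {team, firstName, lastName} follows the CSV above; the class and method names are hypothetical, and players are reduced to a single name string to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TeamGroupingSketch {
    // Each row is {team, firstName, lastName}, as in the CSV above.
    // The team name is stored once as a key instead of once per player.
    static Map<String, List<String>> playersByTeam(List<String[]> rows) {
        Map<String, List<String>> teams = new LinkedHashMap<>();
        for (String[] row : rows) {
            teams.computeIfAbsent(row[0], team -> new ArrayList<>())
                 .add(row[1] + " " + row[2]);
        }
        return teams;
    }
}
```

Two rows for the same team produce one key with two players beneath it, which is exactly the repetition the hierarchical format avoids.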

Use Hierarchical Data to Make context clear


Another advantage of hierarchical data is that it helps provide context. To understand a flat data structure you need to know what the query was that generated the data to understand the meaning of it.  For example, consider a bunch of rows that contain a date range.

name, fromDate, toDate, holidays
Tony, 2018-01-01, 2018-02-02, true
Tony, 2018-02-03, 2018-03-01, false

You could make assumptions that there is a new row when there is a change in Tony being on holidays.  But, what if there is another column?

name, fromDate, toDate, holidays, sick
Tony, 2018-01-01, 2018-02-02, true, false
Tony, 2018-02-03, 2018-03-01, false, true

Are the date ranges corresponding to holidays, sickness or both?

If we get more data back maybe it might be clearer...
name, fromDate, toDate, holidays, sick
Tony, 2018-01-01, 2018-02-02, true, false
Tony, 2018-02-03, 2018-03-01, false, true
Tony, 2018-03-02, 2018-04-01, false, false
Now it looks like it's sickness that the date range corresponds to and it's only a coincidence that it lines up with a holiday period. However, when we get more data back this theory also fails.
name, fromDate, toDate, holidays, sick
Tony, 2018-01-01, 2018-02-02, true, false
Tony, 2018-02-03, 2018-03-01, false, true
Tony, 2018-03-02, 2018-04-01, false, false
Tony, 2018-04-02, 2018-05-01, true, false

It gets even more complicated when we just don't have some information.  For example:
name, fromDate, toDate, holidays, sick
Tony, 2018-01-01, 2018-02-02, true, false
Tony, 2018-02-03, 2018-03-01, false, true
Tony, 2018-03-02, 2018-04-01, false, false
Tony, 2018-04-02, 2018-05-01, true, false
Tony, 2018-05-02, 2018-06-01, null, false
Tony, 2018-06-02, 2018-07-01, null, false
Tony, 2018-07-02, 2018-07-08, true, false
Tony, 2018-07-08, 2018-07-09, true, null

The limitation with flat data structures is not only lack of normalisation but that they can only go so far in making the data self-describing.

When it isn't clear what data means, it is inevitable that processing the data will be buggy.

We could represent the same person data in hierarchical format as:
{
    "name":"tony",
    "holidays": [
         {
            "fromDate":"2018-01-01",
            "toDate":"2018-02-02"
         },
         {
             "fromDate":"2018-04-02",
             "toDate":"2018-05-01"
         }, 
         {
             "fromDate":"2018-07-02",
             "toDate":"2018-07-09"
         }
     ],
     "sick": [ 
         {
             "fromDate":"2018-02-03",
             "toDate":"2018-03-01"
         }
     ]
}
Now the data is much more self-describing and it is clear when a date range is for a holiday and when it is for a sick period.
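To illustrate how little guesswork a consumer needs with the hierarchical form, here is a minimal Python sketch. The helper names covers and statuses_on are hypothetical, and treating both ends of a date range as inclusive is an assumption, not something the data itself states.

```python
from datetime import date

# The hierarchical person data from above, as a Python dict.
person = {
    "name": "tony",
    "holidays": [
        {"fromDate": "2018-01-01", "toDate": "2018-02-02"},
        {"fromDate": "2018-04-02", "toDate": "2018-05-01"},
        {"fromDate": "2018-07-02", "toDate": "2018-07-09"},
    ],
    "sick": [
        {"fromDate": "2018-02-03", "toDate": "2018-03-01"},
    ],
}


def covers(period, day):
    """Return True if day falls inside the period (assumption: both ends inclusive)."""
    start = date.fromisoformat(period["fromDate"])
    end = date.fromisoformat(period["toDate"])
    return start <= day <= end


def statuses_on(person, day):
    """List which kinds of period ("holidays", "sick") cover the given day."""
    return [
        kind
        for kind in ("holidays", "sick")
        if any(covers(period, day) for period in person.get(kind, []))
    ]


print(statuses_on(person, date(2018, 2, 10)))  # ['sick']
```

There is no need to know which query produced the data or to infer what a row boundary means: each date range sits under the key that explains it.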

Resource Relationships

Resources on their own only describe themselves. A Resource model describes relationships between Resources.  This will give an indication of:
  • dependencies between Resources. What Resources are needed for a particular Resource to exist or what is impacted when a particular Resource changes: updated or deleted.
  • Data navigation - in a large domain model, it is much easier to understand and follow if navigational and directional sense is provided to consumers of the model.  Especially when navigation across (Resources loosely connected) can be differentiated from navigation down (Resources strongly connected).
Hypermedia links aren't only used to achieve HATEOAS.  Resources that describe what they are linked to using hypermedia links demonstrate a very powerful mechanism to express the Resource model. Advantages include:
  • A large domain model is split into more manageable pieces.  Typically users are only interested in a particular part of the model.  When Resources self describe their own relationships, it means a large complex model is split up into more digestible chunks and users get the information they need quicker. 
  • The Resource model is self-describing and kept in sync with code. Everything is co-located.
Make clear Parent - Child relationships
A Child Resource describes its Parent through URL hierarchical name spacing. A Parent Resource that has children of one or many types should make this clear by providing links to the children.  For example, if a Team Resource has Players child Resources, the Team payload should make this clear.

REQUEST
https://api.server.com/teams/4676
RESPONSE

{
    "id":"34533",
    ...,
    "_links": {
          "self":"https://api.server.com/teams/4676",
          "players":"https://api.server.com/teams/4676/players"
    }
}
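As a rough sketch of how a server might produce such links, the Python below derives child links from the parent's own URL. The function team_links and its child_types parameter are hypothetical names used only for illustration; the base URL is the one from the example above.

```python
BASE_URL = "https://api.server.com"  # taken from the example above


def team_links(team_id, child_types=("players",)):
    """Build the _links section for a Team: itself plus one link per child type."""
    self_url = f"{BASE_URL}/teams/{team_id}"
    links = {"self": self_url}
    for child in child_types:
        # Children live under the parent's URL, so the hierarchy is visible in the path.
        links[child] = f"{self_url}/{child}"
    return links


print(team_links("4676"))
```

Because every child link is built from the parent's self link, the parent-child relationship is expressed twice: once in the URL name spacing and once in the payload.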

Make clear Peer relationships

This is similar to above except it is for Resources that exist in a different hierarchical name space. So for example, suppose the team is in division 1.  A link should be included in the team's division attribute.
REQUEST
https://api.server.com/teams/4676

RESPONSE
{
    "id":"34533",
    "division": {
        "name":"Division 1",
        "_links": {
              "self":"https://api.server.com/divisions/1"
        }
     },
     ..., 
    "_links": {
        "self":"https://api.server.com/teams/4676",
        "players":"https://api.server.com/teams/4676/players"
    }
}

Make clear Links to Other Representations

If data is modeled to have multiple Resources which represent different representations of the data, the Resources should also include links to each other.
REQUEST
https://api.server.com/teams/4676

RESPONSE
{
    "id":"34533",
    "division": {
        "name":"Division 1",
        "_links": {
              "self":"https://api.server.com/divisions/1"
        }
     },
     ..., 
    "_links": {
        "self":"https://api.server.com/teams/4676",
        "players":"https://api.server.com/teams/4676/players",
        "teamDetails":"https://api.server.com/teamDetails/4676"
    }
}
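From the consumer's side, a payload like this can be walked to discover the Resource model. The sketch below is a minimal, hypothetical helper (collect_links is not part of any library) and it assumes links always sit under a "_links" key, as in the examples above. The elided "..." attributes are simply left out of the sample payload.

```python
def collect_links(resource, path=""):
    """Walk a payload and yield (location, relation, url) for every link found."""
    for key, value in resource.items():
        if key == "_links":
            for relation, url in value.items():
                yield (path or "/", relation, url)
        elif isinstance(value, dict):
            # Recurse into nested attributes such as "division".
            yield from collect_links(value, f"{path}/{key}")


team = {
    "id": "34533",
    "division": {
        "name": "Division 1",
        "_links": {"self": "https://api.server.com/divisions/1"},
    },
    "_links": {
        "self": "https://api.server.com/teams/4676",
        "players": "https://api.server.com/teams/4676/players",
        "teamDetails": "https://api.server.com/teamDetails/4676",
    },
}

for location, relation, url in collect_links(team):
    print(location, relation, url)
```

Links found at "/" belong to the team itself (children and other representations), while links found under "/division" point across to a peer Resource.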

Wednesday, December 19, 2018

What's the case for your API?

Disclaimer: In pure REST, APIs are opaque and the URL should be whatever was sent as a link in the response to a previous request. But I'm not talking pure REST, I'm talking about more pragmatic APIs which involve some concepts from REST as well as general API best practices.

When writing an API, it starts simple. You identify the obvious resources and end up with endpoints such as:
/api.mycompany.com/tweet

Eventually, your API will have to capture more sophisticated concepts and model more complex resources that cannot be expressed in short single nouns.  Some real world examples include:
  • Enabling request validation via a Request Validator resource (AWS API Gateway API)
  • Performing a custom search via a Custom Search resource (Google Custom Search API)
  • Running powerful checks against code via a Check Runs resource (Github API)
In English grammar, nouns that are really two nouns joined in some way are called compound nouns, and they follow one of three patterns:
  1. All one word: haircut, toothpaste
  2. Two words: rain forest, ice cream
  3. Hyphenated: self-esteem, brother-in-law
In the API world there are different options to choose from, but it is better for consistency that your APIs just pick one approach and stick to it. So firstly, what are the options for compound nouns from an API perspective?

Camel Case


Camel case is the practice of writing each word in the phrase with a capital letter.  There are two variations:
  1. Initial upper case (also known as Pascal case) is where the first letter is also a capital, for example: IceCream.  Pascal case is popular in programming languages for naming classes e.g. Java. 
  2. Initial lower case is where the initial letter is always lower case, for example: iceCream.  This approach is popular in programming languages (again Java is a good example)  for naming variables.  When people say camel case, they are usually referring to the initial lower case format.

Kebab Case

In Kebab Case, the individual words are separated by hyphens. Ice cream is expressed as ice-cream.  This approach is used in the Lisp programming language and in lots of URLs (for example, every blog post in www.blogger.com e.g. http://dublintech.blogspot.com/2018/08/oauth-20-authorisation-code-grant.html).  The observant amongst you will note the word "dash" is sometimes used in technical references instead of "hyphen".  So, what's the difference?  In English grammar, the hyphen is the thing used to join two words to make one, whereas the dash is the thing usually used to add some sort of stylistic emphasis to the end of a sentence such as: "I might have an interesting point here - you never know".

In programming we don't care whether the term is hyphen or dash. They are used interchangeably and mean the same thing.

The kebab case approach became popular in Web URIs because search engines knew that the hyphen meant separate words and could index the URI properly.  This convention used by search engines meant hyphens became a de facto standard for URIs.

Snake Case

In this approach, an underscore is used to separate words.  Ice cream becomes ice_cream. This approach is used in Python and Ruby for anything other than a class name or static constant.

Join words

In this approach the words are just joined. There is no -, no _ and no capitalisation of anything. This is not a popular approach with developers because it's difficult to read.
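The four styles above can be sketched in a few lines of Python. The function names below are hypothetical, and the input is assumed to be a list of plain words that has already been split.

```python
def camel_case(words, initial_upper=False):
    """iceCream by default; IceCream (Pascal case) when initial_upper is True."""
    parts = [word.lower().capitalize() for word in words]
    if not initial_upper and parts:
        parts[0] = parts[0].lower()
    return "".join(parts)


def kebab_case(words):
    """Words separated by hyphens: ice-cream."""
    return "-".join(word.lower() for word in words)


def snake_case(words):
    """Words separated by underscores: ice_cream."""
    return "_".join(word.lower() for word in words)


def joined(words):
    """Words simply joined with no separator or capitalisation: icecream."""
    return "".join(word.lower() for word in words)


words = "ice cream".split()
print(camel_case(words))                      # iceCream
print(camel_case(words, initial_upper=True))  # IceCream
print(kebab_case(words))                      # ice-cream
print(snake_case(words))                      # ice_cream
print(joined(words))                          # icecream
```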

APIs

Should we use camelCase, kebab-case or snake_case in an API?  Well unfortunately, Mr. Fielding's dissertation did not go into such detail.  So what are people actually doing?  And is the approach used consistent across the API's URL and the JSON body?  Let's take a look.

AWS

AWS have different API styles for different services.  The API Gateway REST API reference shows that the JSON payload uses camel case but the URL uses no separator at all. It's just:
/restapis/{id}/requestvalidators/{requestvalidatorId}

Google

Surprise, surprise, Google also have lots of APIs. The Google Custom Search API is similar to the AWS API Gateway API.  The compound noun in the URL is just the one word and the JSON body is camel case.

The Google Gmail API has camel case in request body and in some URLs, for example the forwarding addresses API.  

The Google YouTube API will sometimes use kebab case in the URL e.g. yt-analytics but in other cases will use a single word e.g. youtubepartner.  But the JSON payload is camel case.

Github

The Github API is a good reminder that, where possible, you should sidestep this issue by avoiding compound nouns altogether. It does so by using some creative name spacing.

However, with some more rooting around you'll find a compound noun such as check run expressed using kebab case in the URL and the JSON body using snake case.

Stripe

Stripe use snake case in the URL and in the JSON body.  For example, the PaymentIntents API:

 https://api.stripe.com/v1/payment_intents 

and JSON body...
{
  "id": "pi_Aabcxyz01aDfoo",
  "object": "payment_intent",
  "allowed_source_types": [
    "card"
  ],
  "amount": 1099,
  "amount_capturable": 1000,

Paypal

Paypal have more compound nouns than the other APIs checked. For resources such as billing agreement, the API will use kebab case in the URL but then use snake case in the JSON payloads.

Twitter

Twitter use snake case in the URL e.g. /saved_searches/ and snake case in the JSON payloads.

Facebook

Facebook's Graph API tends to avoid resource naming in URLs and in JSON bodies it is snake case.

By this stage, you should be getting a little bit confused. So let's recap via the table below.

API | URL | JSON Body
AWS API Gateway | No separator | camelCase
Facebook Graph API | N/A | snake_case
Github | snake_case and kebab-case | snake_case
Google Custom Search | No separator | camelCase
Google Gmail | camelCase | camelCase
LinkedIn | camelCase | camelCase
Paypal | kebab-case | snake_case
Stripe | snake_case | snake_case
Twitter | snake_case | snake_case


Everyone is different, what should I do?

So there is a lack of consistency across the industry.  However, there are points worth making:
  1. In general compound nouns are best avoided.  In all the APIs checked (except PayPal), they appear in under 5% of the APIs.  This means developers don't get upset when their favourite approach is not used.
  2. The only Web API in the selection above that had more than 5% of its APIs using compound nouns was PayPal and they went for kebab-case in URIs.
  3. kebab-case is never used in any JSON body.  The syntax is allowed.  So what drives this trend? It's more than likely because JavaScript Web UIs are possibly the most popular clients invoking APIs and, similarly, the most popular back end language serving the API is Java, and both of those dudes don't allow the - in any of their declarations.
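The last point can be illustrated with a short sketch. It uses Python rather than JavaScript or Java, on the assumption that the analogy holds: Python identifiers cannot contain a hyphen either. JSON itself is happy with a kebab-case key, but the key cannot then be used as a name in code.

```python
import json

# JSON allows any string as a key, so this parses without complaint.
payload = json.loads('{"ice-cream": 1, "ice_cream": 2, "iceCream": 3}')

for key in payload:
    # str.isidentifier() reports whether the key could be used as a
    # variable or attribute name in Python source code.
    print(key, key.isidentifier())
```

Only the kebab-case key is rejected as an identifier, which is the practical reason it maps so awkwardly onto fields and variables.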

Is there anyone else saying anything?


In the excellent REST API Design Rulebook, industry expert Mark Masse suggests:
  1. Make your APIs lower case, which rules out camel case
  2. Use kebab case when expressing compound terms in your URIs
  3. Avoid using underscores as they may not display well since some browsers render hyperlinks with an underline added

Make a decision

  1. Avoid compound nouns if you can.  This isn't always possible. Sticking to a ubiquitous language is important and helpful.  If you have a complex business application you will have lots of compound nouns. 
  2. If you can't avoid compound nouns and more than 5% of the APIs are going to involve compound nouns, use kebab case for your URIs. Why?  Because if you have a complex business domain it's not only developers you need to think about.  Lots of BAs, Product Architects and curious Managers will also be looking at your APIs. Kebab-case is the easiest to read for everyone.
  3. For JSON body, I think it is okay to use camelCase because this is the easiest to map back to JavaScript and Java code.  It is also a recommendation from Google to use camelCase in JSON.
  4. If you have to use camelCase in your URIs, consider using the first letter capital approach for the URIs as the URIs are supposed to mark resources, not attributes. Resources are more analogous to Java Classes, which also use the initial letter capital format, whereas the JSON payload attributes are analogous to Java attributes, which use initial lower case.
Until the next time, take care of yourselves.