Sunday, June 9, 2019

Defining a Resource

Fielding's dissertation  describes a Resource as:

"Any information that can be named" ... "a document or image, a temporal service (e.g. “today’s weather in Los Angeles”), a collection of other resources, a non-virtual object (e.g. a person), and
so on. In other words, any concept that might be the target of an author’s hypertext
reference must fit within the definition of a resource. A resource is a conceptual mapping
to a set of entities, not the entity that corresponds to the mapping at any particular point in
time."

Defining a Resource is both a Science and an Art. It requires both Domain knowledge and API Architectural skills.   The following points detailed below serve as a checklist which may help you determine the shape of your Resource, what data it should contain and how it should be presented to consumers of your API.

The Resource must contain a Business Description

  • The business description should be 3 - 4 sentences in simple prose which explain what the Resource is. 
  • A developer with a moderate knowledge of  your system should be able to understand the description
  • Any caveats of the Resource should be made clear

The Resource should be useful on its own


This is similar to the maxim of defining the boundary of a micro-service, where a micro-service should be considered to be useful on its own.  Similarly, a Resource should be useful on its own.

For example, instead of:
/street-address/{id}

RESPONSE

{
    "street1": "String",
    "street2": "String"
}
and
/address-extra/{id}

RESPONSE 

{
    "city": "String",
    "country": "String"
}
It should be:
/address/{id}

RESPONSE

{
    "street1": "String",
    "street2": "String",
    "city": "String",
    "country": "String"
}
If a Resource on its own is not useful and always necessitates a subsequent request, it means code will inevitably become more complex as well as there being a performance impact incurred from the second request

Use an Appropriate Noun


Use of a simple noun over a compound noun is preferred.  For example, Address is better than AddressInfo or AddressDetail.  This is a general rule, there will always be exceptions.

If using multiple Resources to represent different views of the same data, for example: Address and AddressDetail, use the simple noun e.g Address first.  Then if the second representation is more detailed use ResourceNameDetail or if it is less detailed use ResourceNameSummary.  For example, suppose there is a requirement to introduce an Address type Resource:
  1. Address is introduced first
  2. If a subsequent view of Address is needed that is more detailed, the new Resource should be called AddressDetail
  3. If a subsequent view of Address is needed that is less detailed, the new Resource should be called AddressSummary

If it is only used in a READ does it need to be a Resource?


If a Resource is only ever used in a Read request and never a Write (Create, Partial Update, Full Update, Delete, ...) request it is questionable if it needs to be defined as a Resource with its own URI.  It could just be added to the parent payload and if there is a concern that payload then becomes too complex, the parent could just provide a sparse query - where the client can decide per API request what it wants returned.

Resources should conform to the uniform interface


The uniform interface is a very important part of good API design.  It is not just about using special verbs for different requests but also ensuring the data shape is consistent.

If creates, reads, updates, deletes etc are done in a consistent way, it means code is more consistent, reusable and more maintainable.

This means:
GET /addresses/{id}
and
GET /addresses
must return the same address data structure to represent an Address.
GET /addresses/{id}
RESPONSE
{
    "id":"546",
    "street1": "String",
    "street2": "String",
    "city": "String",
    "country": "String"
}
and
GET /addresses

RESPONSE
{
    "elements": [
         {
              "id":"546",
              "street1": "String",
              "street2": "String",
              "city": "String",
              "country": "String"
         },
         ...
     ]
}
Similarly, for write payloads, the Data Structure should be the same.  So, a partial update to change street1 would be:

PATCH /addresses/{id}
REQUEST

{
    "street1": "Walkview"
}

RESPONSE
{
    "id":"546",
    "street1": "Walkview",
    "street2": "Meadowbrook",
    "city": "Dublin",
    "country": "Ireland"
}
and not something like
PATCH /addresses/{id}
REQUEST

{
    "newStreet1Value": "Walkview"
}

From a Resource perspective, the data structure must be consistent. A different data structure means a different Resource, which should be named differently and have its own path.

Don't expose everything


If your DB model is quite sophisticated, you can be sure not all attributes need to be exposed at an API level. Some fields may only be getting persisted for back office processing and should never presented make it to any UI.

When adding an attribute to a Resource, consider:

  • to only include fields that you are sure the client is interested in 
  • if you are not sure, leave the attribute out. It is much smaller problem to add an attribute later on, then to remove an attribute that has already been exposed.

API Models shouldn't blindly mirror DB Relational model or OO Models


In database modelling approaches such as normalizing data or collapsing inheritance hierarchies are used.  In Object Orientated design, techniques such as polymorphism, inheritance hierarchies etc are used to promote things like code reuse and to reduce coupling.

Resource modelling does not have to follow theses techniques. The consumer of an API doesn't care if the data is all in one table, or normalized over multiple tables.  In general, the API returns data in a format that is easy to use and does not require much additional mapping by the client before it can become useful.

Use Hierarchical data to Avoid repetition


One of the advantages of hierarchical data over flat formats such as CSV is that it provides a mechanism to avoid repetition.  For example, consider a flat data structure which contains a list of persons and what team they are in.  In CSV this is:

team, firstname, lastname
Liverpool, Mo, Salah
Liverpool, Andy, Roberston

In JSON this could be:
{
    "team": "Liverpool",
    "players": [
        {
            "firstName":"Mo",
            "lastName":"Salah"
        },
        {
            "firstName":"Andy",
            "lastName":"Roberston"
        },
         ...
     ]
}

Use Hierarchical Data to Make context clear


Another advantage of hierarchical data is that it helps provide context. To understand a flat data structure you need to know what the query was that generated the data to understand the meaning of it.  For example, consider a bunch of rows that contain a date range.

name, fromDate, toDate, holidays
Tony, 2018-01-01, 2018-02-02, true
Tony, 2018-02-03, 2018-03-01, false

You could make assumptions that there is a new row when there is a change in Tony being on holidays.  But, what if there is another column?

name, fromDate, toDate, holidays, sick
Tony, 2018-01-01, 2018-02-02, true, false
Tony, 2018-02-03, 2018-03-01, false, true

Are the date ranges corresponding to holidays, sickness or both?

If we get more data back maybe it might be clearer...
name, fromDate, toDate, holidays, sick,
Tony, 2018-01-01, 2018-02-02, true, false
Tony, 2018-02-03, 2018-03-01, false, true
Tony, 2018-03-02, 2018-04-01, false, false
Now it looks like it's sickness that the date range corresponds to and its only a coincidence it lines up with a holiday period. However, when we get more data back this theory also fails.
name, fromDate, toDate, holidays, sick,
Tony, 2018-01-01, 2018-02-02, true, false
Tony, 2018-02-03, 2018-03-01, false, true
Tony, 2018-03-02, 2018-04-01, false, false
Tony, 2018-04-02, 2018-05-01, true, false

It gets even more complicated when just don't have some information.  For example:
name, fromDate, toDate, holidays, sick,
Tony, 2018-01-01, 2018-02-02, true, false
Tony, 2018-02-03, 2018-03-01, false, true
Tony, 2018-03-02, 2018-04-01, false, false
Tony, 2018-04-02, 2018-05-01, true, false
Tony, 2018-05-02, 2018-06-01, null, false
Tony, 2018-06-02, 2018-07-01, null, false
Tony, 2018-07-02, 2018-07-08, true, false
Tony, 2018-07-08, 2018-07-09, true, null

The limitation with flat data structures is not only lack of normalisation but that they can only go so far in making the data self-describing.

When it isn't clear what data means, it is inevitable processing the data will be buggy.

We could represent the same person data in hierarchical format as:
{
    "name":"tony",
    "holidays": [
         {
            "fromDate":"2018-01-01",
            "toDate":"2018-02-02"
         },
         {
             "fromDate":"2018-04-02",
             "toDate":"2018-05-01"
         }, 
         {
             "fromDate":"2018-07-02",
             "toDate":"2018-07-09"
         }
     ],
     "sick": [ 
         {
             "fromDate":"2018-02-03",
             "toDate":"2018-03-01"
         }
     ]
}
Now, the data is much more self describing and it is clear when a date range is for a holiday and when it is for a sick period.

Resource Relationships

Resources on their own only describe themselves. A Resource model describes relationships between Resources.  This will give an indication of:
  • dependencies between Resources. What Resources are needed for a particular Resource to exist or what is impacted when a particular Resource changes: updated or deleted.
  • Data navigation - in a large domain model, it is much easier to understand and follow if navigational and directional sense is provided to consumers of the model.  Especially, when to navigation across (Resources loosely connected) can be be differentiated from navigation down (Resources strongly connected)
Hypermedia links aren't only used to achieve HATEOAS.  Resources that describe what they are linked to using hypermedia links demonstrate a very powerful mechanism to express the Resource model. Advantages include:
  • A large domain model is split into more manageable pieces.  Typically users are only interested in a particular part of the model.  When Resources self describe their own relationships, it means a large complex model is split up into more digestible chunks and users get the information they need quicker. 
  • The Resource model is self-describing and kept in sync with code. Everything is co-located.
Make clear Parent - Child relationships
A Child Resource describes its Parent URL hierarchical name spacing. A Parent Resource has children of one or many types should make this clear by providing links to the children.  For example, if a Team Resource has Players child Resources.  The Team payload should make this clear.

REQUEST
https://api.server.com/teams/4676
RESPONSE

{
    "id":"34533",
    ...,
    "_links": {
          "self":"https://api.server.com/teams/4676",
          "players":"https://api.server.com/teams/4676/players"
    }
}

Make clear Peer relationships

This is similar to above except it is for Resources that exist in a different hierarchical name space. So for example, suppose the team is in division 1.  A link should be included in the team's division attribute.
REQUEST
https://api.server.com/teams/4676

RESPONSE
{
    "id":"34533",
    "division": {
        "name":"Division 1",
        "_links": {
              "self":"https://api.server.com/divisions/1"
        }
     },
     ..., 
    "_links": {
        "self":"https://api.server.com/teams/4676",
        "players":"https://api.server.com/teams/4676/players"
    }
}

Make clear Links to Other Representations

If data is modeled to have multiple Resources which represent different representations of the data, the Resources should also include links to each other.
REQUEST
https://api.server.com/teams/4676

RESPONSE
{
    "id":"34533",
    "division": {
        "name":"Division 1",
        "_links": {
              "self":"https://api.server.com/divisions/1"
        }
     },
     ..., 
    "_links": {
        "self":"https://api.server.com/teams/4676",
        "players":"https://api.server.com/teams/4676/players",
        "teamDetails":"https://api.server.com/teamDetails/4676"
    }
}