Tuesday, 16 April 2013

Hibernate exception timings vs SELECT COUNT(*) to check entity existence

This is something that I've been curious about for some time. How much faster is it, really, to let an exception be thrown rather than doing a SELECT COUNT(*) to check if an entity exists?

My initial suspicion is that the SELECT COUNT(*) would in fact be slower due to the database transactions. However, building up the exception stack for the NoEntityFound is no small feat either. It's also worth considering whether the numbers fluctuate with the amount of data in the table at that point.
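To get a feel for how much of an exception's cost is the stack trace itself, here is a minimal sketch using only the JDK. The names CheapNotFoundException and StackTraceCost are hypothetical and have nothing to do with Hibernate's own exception types; the sketch only contrasts an ordinary RuntimeException with one that opts out of stack trace capture.

```java
import java.util.function.Supplier;

/** Hypothetical exception that skips stack trace capture (not a Hibernate type). */
final class CheapNotFoundException extends RuntimeException {
    CheapNotFoundException(String message) {
        // writableStackTrace = false: the stack is not walked when the exception is created.
        super(message, null, false, false);
    }
}

final class StackTraceCost {
    // Accumulating into a field keeps the JIT from discarding the allocations.
    static int sink;

    /** Returns the nanoseconds taken to create `iterations` exceptions via `factory`. */
    static long nanosFor(int iterations, Supplier<RuntimeException> factory) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += factory.get().getMessage().length();
        }
        return System.nanoTime() - start;
    }
}
```

Comparing StackTraceCost.nanosFor(100_000, () -> new RuntimeException("missing")) with the same call using new CheapNotFoundException("missing") gives a rough idea of the stack-building overhead. It is a crude measurement with no warm-up, so treat the numbers as indicative only.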

Obviously, the analysis could get much more complicated, and it's not my intention to go into the realms of indexes, partitioning, whether the entity presents a view of two tables, or Hibernate caching. It's more a general curiosity.

So let's begin. Here's my entity.

Now I'm going to use an extremely simple HibernateUtil class to get me the session, shown below.

To fire the exception, I'm going to use the .load() method from org.hibernate.Session. I'll also run the following code to see how long checking with SELECT COUNT(*) would take.
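As a rough sketch of how two existence checks like these could be timed side by side, the class below uses only the JDK. ExistenceTimer and its methods are hypothetical names, and the Runnable and LongSupplier arguments are stand-ins for the real Hibernate calls (a .load() that throws when the row is missing, and a count query), which are assumed rather than shown.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

final class ExistenceTimer {

    /** Exception-driven check: the entity exists unless the loader throws. */
    static boolean existsViaException(Runnable loader) {
        try {
            loader.run();
            return true;
        } catch (RuntimeException e) {
            return false;
        }
    }

    /** Count-driven check: the entity exists if the count is above zero. */
    static boolean existsViaCount(LongSupplier counter) {
        return counter.getAsLong() > 0;
    }

    /** Runs `check` the given number of times and returns the elapsed milliseconds. */
    static long timeMillis(int iterations, BooleanSupplier check) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            check.getAsBoolean();
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }
}
```

Each approach would then be wrapped in a lambda and passed to timeMillis with the same iteration count, so the two figures are directly comparable.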

Here are the results! Using the load method with the entity above takes 1328 milliseconds. SELECT COUNT(*) came in last with 1454 milliseconds. All hail load!

Now, as both methods were pretty damn close, something that does seem important is which one is more readable. To be honest, calling load and catching the exception just seems to make sense, but you could easily argue that exceptions shouldn't control the flow of your application if you're only checking the existence of an object. Currently, I have 608 entries in the table that the Player entity is hooked up to. Below are the results for the different times.
.load()    SELECT COUNT(*)    number of values in table
Interestingly, what we can see from the results above is that the amount of data in the table has pretty much no bearing on the time either query takes to run.

Sunday, 7 April 2013

Parsing ESPN API using Java and Google GSON

For my first post, I'll explain how to parse the ESPN API.  The API documentation can be found at

Firstly, you'll need to request an API key. Then you can start querying the REST API to retrieve the JSON response. In the following example, I'm simply going to query for all the players in the sport of "Soccer" (damn that Americanism) who play in the Premier League in England.

From reading the documentation, this is the URL that I need to send requests to (with [apiKey] replaced with the correct value).[apiKey]

Something to bear in mind first of all: each query returns only 50 results, so you're forced to make multiple queries if you want more. The starting point is set with a parameter known as offset. So for instance, to get results 51-100, the following query would pull these back. We'll come back to this later, as it can cause some slight issues.[apiKey]&offset=51
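Building the paged request URL can be sketched as below. Only the offset parameter comes from the description above; the class name EspnUrls, the query parameter name apikey, and whatever base URL is passed in are assumptions made for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class EspnUrls {

    /** Appends the API key and offset to a base URL that has no query string yet. */
    static String withOffset(String baseUrl, String apiKey, int offset) {
        if (offset < 0) {
            throw new IllegalArgumentException("offset must be >= 0");
        }
        // "apikey" is an assumed parameter name; "offset" is the one described in the post.
        return baseUrl + "?apikey=" + encode(apiKey) + "&offset=" + offset;
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, EspnUrls.withOffset("https://api.example.invalid/athletes", "abc123", 51) would build the request for the second batch of results, with the example.invalid address being a placeholder rather than the real endpoint.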

Now that I've got the description out of the way, I'll start on the code. It should be noted that I'm using GSON to parse the JSON, so you'll need to add the GSON Maven dependency (groupId com.google.code.gson, artifactId gson).

Once this has been done and you've run a Maven install to get the jars downloaded, you can start querying. The code below simply downloads the JSON from the ESPN API.
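A minimal sketch of such a download step, using only java.net.HttpURLConnection from the JDK, might look like this. The class name JsonFetcher, the timeouts, and the Accept header are assumptions; the URL to pass in is whatever request URL you have built for your API key.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class JsonFetcher {

    /** Reads the whole stream as UTF-8 text and closes it. */
    static String readAll(InputStream in) throws IOException {
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            StringBuilder body = new StringBuilder();
            char[] buffer = new char[4096];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                body.append(buffer, 0, read);
            }
            return body.toString();
        }
    }

    /** Issues a GET request and returns the response body as a string. */
    static String fetch(String url) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setRequestMethod("GET");
        connection.setRequestProperty("Accept", "application/json");
        connection.setConnectTimeout(10_000); // assumed timeout values
        connection.setReadTimeout(10_000);
        try {
            int status = connection.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
            return readAll(connection.getInputStream());
        } finally {
            connection.disconnect();
        }
    }
}
```

The string returned by fetch is the raw JSON, ready to be handed to a JSON parser.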

Now we can start parsing this JSON. Because the JSON produced has a lot of redundant data, I decided against parsing it into objects and just queried it raw. This will give us a JsonArray of Athletes and the associated data that can be pulled about them. It should be noted at this point that the response varies based on your API allowance (free, partner, paid etc). I'll leave the relevant ESPN API page here

Now that we've got the JsonArray we want, we can just loop through it to return the data about each player. A couple of things may look odd about this code. Firstly, the sleep is there because the API limits how often you can call it; as I write this, the free allowance is limited to one call every 3 seconds. Secondly, the loop runs up to 650 because of the offset I referenced earlier in this post. This means you need to query the players 50 at a time. That seems a little computationally expensive, as I'd have thought it would be easier just to return the 602 players at once rather than doing the heavy lifting of 13 RESTful calls.
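The paging and rate limiting described here can be sketched as follows. PagedCrawler is a hypothetical name, fetchPage stands in for whatever call downloads one page of JSON, and the offsets are assumed to start at 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

final class PagedCrawler {

    /**
     * Fetches pages of `pageSize` results until `total` results are covered,
     * pausing `delayMillis` between calls to stay under the rate limit.
     */
    static List<String> crawl(int total, int pageSize, long delayMillis,
                              IntFunction<String> fetchPage) throws InterruptedException {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<String> pages = new ArrayList<>();
        for (int offset = 0; offset < total; offset += pageSize) {
            if (offset > 0) {
                Thread.sleep(delayMillis); // wait between calls, not before the first one
            }
            pages.add(fetchPage.apply(offset));
        }
        return pages;
    }
}
```

With 602 players, a page size of 50 and a 3000 millisecond delay, this makes 13 calls (offsets 0 to 600) and spends about 36 seconds sleeping.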

Then, when we put this all together, we get the class below, which loops through every player and gives you their first name.

You can download the full project on GitHub.