Sunday, September 26, 2010

Do it on the device, or do it on the server?

This weekend, I thought I'd extend my little Android usage tracking application to support more ISPs than the one it already handles (Internode). As my phone is (sadly) on Optus, I thought I'd add support for that.

Internode was easy to add, as they have a documented API for accessing usage counters, which is ideal for consumption by a program. Optus, on the other hand, only provides a web application for checking usage, necessitating the use of a web scraper. A web scraper is an application that pretends to be a user on a web page, making all the appropriate calls (and fudging any JavaScript calls that are necessary) to get the results it needs. It then parses the (often non-compliant) HTML that comes back to extract the data. I have no problem doing this, and have done so on several occasions before, but it is not easy work and can be quite fiddly. Parsing the HTML is often the most difficult part, as it is usually not well-formed XML, so you can't just hand it to an XML parser and work with the DOM.

In short order, I had a working prototype that used JTidy to clean up the HTML into something I could use properly, and then XPath to extract the elements of the document I needed. It works great, except that cleaning up the document and parsing it into a DOM takes a really long time on a resource-constrained device such as a phone. It takes about 20 seconds per document on my development emulator, which is too slow to produce a good mobile experience, especially if you have to parse multiple documents as I do.
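To make the XPath half of that pipeline concrete, here is a minimal sketch using only the standard Java XML APIs. It assumes the clean-up step (JTidy in my prototype) has already turned the page into well-formed XHTML. The class name, the `extractUsage` method and the `td id="usage"` markup are made up for illustration; they are not what the Optus pages actually contain.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class UsageXPath {

    // Parses already-tidied XHTML into a DOM and pulls out one value with XPath.
    // The element id "usage" is a hypothetical example, not the real page layout.
    static String extractUsage(String xhtml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // This is the expensive step on a phone: building the whole DOM.
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate("//td[@id='usage']/text()", doc).trim();
        } catch (Exception e) {
            // Malformed input lands here, which is why the clean-up step is needed.
            throw new IllegalStateException("Could not parse usage page", e);
        }
    }

    public static void main(String[] args) {
        String tidied = "<html><body><table><tr>"
                + "<td id=\"usage\"> 512 MB </td>"
                + "</tr></table></body></html>";
        System.out.println(extractUsage(tidied));
    }
}
```

The extraction itself is a one-liner once the DOM exists; the cost is in getting from messy HTML to that DOM in the first place.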

So now I'm faced with a choice.  I could write a man-in-the-middle service: the phone sends the user's login details to it, it performs the parsing on the user's behalf, and it sends the results back to the phone. There are a number of drawbacks to this:
  1. The user is sending their login details to a third party, which is a security no-no.
  2. It introduces a single point of failure into the equation.  If my app gets popular, the middleman service could get slammed.  If Optus decides that they don't like what I'm doing, they could easily block it.
  3. It means I need to host a service, which means additional expense.
I don't want to do this, so what I'm left with is a hackier solution: using regular expressions to find what I want in the HTML documents retrieved from the provider.  This will take me longer to code, will be more prone to failure, and is just generally nasty.  I'm not happy.  Devices these days are very powerful, and there should be no need for intermediary servers to help with processing.
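For comparison, the regular expression approach might look something like the sketch below. Again, the class name, the method and the `id=usage` cell are hypothetical stand-ins for whatever the real page contains. Note that it copes with sloppy markup (unquoted attributes, upper-case tags, no DOM needed), but the pattern is tied to the exact layout of the page and will break quietly the moment the provider changes it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UsageRegex {

    // Matches a <td> whose id attribute is "usage" (quoted or not) and captures
    // the text up to the next tag. The id is an assumed example, not real markup.
    private static final Pattern USAGE = Pattern.compile(
            "<td[^>]*\\bid\\s*=\\s*[\"']?usage[\"']?[^>]*>\\s*([^<]*?)\\s*<",
            Pattern.CASE_INSENSITIVE);

    // Returns the captured text, or null if the page no longer looks as expected.
    static String extractUsage(String html) {
        Matcher m = USAGE.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Deliberately messy HTML of the kind an XML parser would reject outright.
        String messy = "<TABLE><TR><TD class=x id=usage> 512 MB <br></td></TABLE>";
        System.out.println(extractUsage(messy));
    }
}
```

This skips the clean-up and DOM construction entirely, which is the whole appeal on a phone, at the price of a pattern that has to be babysat.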

Of course, this would all be much easier if the providers published web service interfaces to their data, rather than just web applications.  This has been the mantra of SOA and internet-connected businesses since the terms were coined.  It doesn't even cost them that much more to do it, and would lead to better designed web applications, but that's a subject for another rant.  Optus doesn't do this because there's no economic incentive for them to do so.  They gain nothing directly from publishing a usable web service interface, so they can't be bothered... bah!

To be fair to Optus, they aren't the only ones that don't get it.  As far as I know, no ISP or telco other than Internode provides a decent interface.