Monday, January 10, 2011

Our experience with App Engine

We have been developing on App Engine for almost 2 years now, so we feel like it is time to give some feedback about our experience.

When we started weespr.com, we wanted to have a platform which would eventually scale given the nature of our business.

To give some context, our application automatically scans people's online activity and creates magazine digests on a regular basis with content from various feeds. It is data intensive, relying heavily on the processing of photos and text.

We had previous - good - experiences with Google technologies like GWT and we wanted to stick to Java.
So App Engine was a natural candidate. To be honest, we did not do a lot of due diligence on the platform limitations, and we even took a big risk at that time since Java support had just been released when we started developing. But the fact that we wanted to build using GWT and needed a good Java platform with scalability and without much administration tipped the balance in favor of GAE. And the cloud is the future, and Google was behind it. That was enough for us to make the move.

Love at first sight

We liked it. Very much. The ease of use, the wonderful Eclipse plugin (GWT + GAE), the deployment capabilities, etc. In no time we had a complex skeleton of the website up and running and were cranking out code, focusing on the business logic. No license to buy, no server to deploy, no software to install.

The morning after...

Then we started to face the limitations of the platform, and had to spend some cycles working around them.
These are no surprise to people familiar with the platform. To name a few which impacted us:
  • JDO support
  • 1 MB entity size limit
  • 30 seconds request limitations
  • 10 MB response limit
  • limited Images API

JDO support

JDO support on App Engine is not very robust - to say the least. If you do basic stuff, it works fine. If you start doing some advanced JDO operations, it either does not support them or - worse - is buggy. For example, avoid indexed lists like the plague. We had to spend lots of time hunting down bugs. At the end of the day, we stuck with JDO but avoided fancy high-level JDO features as much as possible.


1 MB entity size
The 1 MB entity size was a big issue since we had to render composites of people's photos. The resolution of our image renders is ~ 2600 * 1600. Depending on the contents, a render can in rare cases go over 1 MB. Up until recently, our app could not render such pages.
The new images API now allows custom JPEG compression as well as 32 MB API calls. So we can now compress the images as needed until they fit in 1 MB - which gives us enough quality for our needs.
Also we are limited to storing our image renders in JPEG instead of PNG because of size, which prevents us from doing some fancy transparency.
Definitely having a datastore with support of 1+MB entities would be great.
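
The "compress until it fits" idea can be sketched in a few lines of Java. This is only a minimal illustration, not our production code: the encoder callback is a hypothetical stand-in for whatever produces JPEG bytes at a given quality (on App Engine that would be the images API), and the starting quality, floor and step are assumptions you would tune.

```java
import java.util.function.IntFunction;

/** Sketch: lower the JPEG quality step by step until the encoded image fits a size limit. */
final class FitToLimit {
    /** 1 MB, the entity size limit discussed above, used as the target size. */
    static final int ONE_MB = 1024 * 1024;

    /**
     * @param encoder hypothetical callback mapping a JPEG quality (1-100) to encoded bytes
     * @return the first encoding that fits in maxBytes, or null if even minQuality is too large
     */
    static byte[] compressToFit(IntFunction<byte[]> encoder, int maxBytes,
                                int startQuality, int minQuality, int step) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive");
        }
        // Try the best quality first and only degrade as much as needed.
        for (int quality = startQuality; quality >= minQuality; quality -= step) {
            byte[] encoded = encoder.apply(quality);
            if (encoded.length <= maxBytes) {
                return encoded;
            }
        }
        return null; // caller decides what to do: downscale, split, or give up
    }
}
```

In practice the target should sit a little below 1 MB, since the entity holding the image also carries its own properties.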

30 seconds requests
The 30-second limit was a big limiting factor on background requests. We have a number of cron jobs which need to scan people's activity on a regular basis, contacting 3rd party websites like Facebook or Picasa. Sometimes, this can take 30+ seconds. We had to break down our work into smaller chunks with a map reduce approach.
Again, this is another limitation which was recently lifted (10-minute background jobs with version 1.4.0).
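
Breaking a long job into smaller pieces mostly comes down to splitting the list of things to process into batches that each fit comfortably within the time limit. Here is a minimal sketch of that step in plain Java; the class name and the batch size are made up for illustration, and on App Engine each batch would typically be handed to its own task.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: split a long list of work items into fixed-size batches. */
final class WorkChunker {
    static <T> List<List<T>> chunk(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            // Copy the slice so each batch is independent of the original list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }
}
```

The batch size is a trade-off: small enough that the slowest 3rd party call still finishes in time, large enough that you do not drown in tiny tasks.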

However, there is still a limit for user requests. This still impacts us when users upload photos: they have a maximum of 30 seconds, after which the request times out. For most people, who have fast connections, this is good enough. But for people uploading very large photos, it can be an issue. We could use blobstore instead, but we need control over the request handler to be able to give the user progress/feedback on the upload - so blobstore is not an option.

What we would need here is the ability to make long user requests. Having dedicated instances (like EC2) would not be the best solution as they would not scale automatically. What we need is a different pool of instances which handle long user requests, dedicated to our application obviously - but which would scale automatically like the rest of App Engine.

10 MB response limit
The 10 MB response limit is still an issue for us. It impacts the downloading of the magazines in PDF format. We currently cap the number of pages at 28 because of that. If we had a way to write programmatically to blobstore, this would not be an issue any more (App Engine engineers, if you read this blog...). We would still have to deal with the 30-second limit.

Images API
That is probably the one which impacted us the most in terms of development cycles. The pages we put together in the magazine are composite renders of photos and text. The images API handles composite renders of photos well, but there is no support whatsoever for text rendering.
So we had to develop our own classes from scratch to do text rendering. We load font information (TTF) from compressed jar files (to overcome the 3000-file limit on application files) and do pixel rendering in memory in bitmap format. The rendered bitmaps are then handled by the images API. We had to develop algorithms to deal with text alignment, word wrapping etc. At the end of the day, it works well.
This is really low-level stuff, which is not the level of business we like to focus on. If App Engine supported the Swing or AWT libraries, we could have avoided that work.
On top of the lack of advanced image processing, there are a bunch of annoying bugs in the development server which make our life more difficult than it should be.
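
To give an idea of what word wrapping involves, here is a minimal greedy sketch in Java. It is an illustration only, not the classes we wrote: the measure callback is a hypothetical function returning the rendered width of a string (in a real renderer it would sum glyph widths from the font data), and alignment, hyphenation and kerning are all left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

/** Sketch: greedy word wrapping against a caller-supplied width measure. */
final class WordWrap {
    /**
     * @param measure hypothetical callback returning the rendered width of a string
     * @return lines whose measured width stays within maxWidth where possible;
     *         a single word wider than maxWidth gets a line of its own
     */
    static List<String> wrap(String text, int maxWidth, ToIntFunction<String> measure) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only when the input is blank
            }
            String candidate = current.length() == 0 ? word : current + " " + word;
            if (current.length() > 0 && measure.applyAsInt(candidate) > maxWidth) {
                // The word does not fit: close the current line and start a new one.
                lines.add(current.toString());
                current = new StringBuilder(word);
            } else {
                current = new StringBuilder(candidate);
            }
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }
}
```

Passing String::length as the measure treats every character as one unit wide, which is handy for testing; a proportional font only changes the callback, not the loop.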

Now as far as photo processing goes, there is not much offered on the platform. Things like free rotation of pictures? Nope. Advanced effects? Don't even think about it. At the end of the day we decided to leverage Picnik for the individual photo processing. The integration was a breeze, and we now have best of breed photo processing. Photo processing is not our core business, so we live happily with that.

But development goes on

And it goes on quickly. Some of the goodies of the platform are totally awesome. Take the email support. With a few lines of code, every weespr user has a custom @weespr.com address they can use to send content directly into their magazines. They can send pics all day long from their mobile phones, and it scales. No maintenance. No email server configuration.

Task queues ? It's like having trusted workers who will get the job done for you, no strings attached. App Engine without task queues is like bread without butter.

We could go on and on about the pleasure of developing on the platform. Once you get familiar with it, there is no going back to a non-services platform.

On a day to day basis

Then comes day to day maintenance of the app.

Versions
App versioning is great. We have been releasing new versions of apps once every 2-3 days on average. That's a lot of releases. When things went south, we just had to revert to another release. One simple click. You can't beat that.

Logging
Logging support sucks. Big time. There is an automatic rollover of logs based on severity, which means that if your app logs like crazy, chances are that you only have a couple of hours' worth of logs - at best. So when an error happens, there is usually no way to analyze the logs. We ended up sending emails to developers when exceptions happen, with detailed info in the email. Then we go and try to reproduce the exception.
For the same reasons, log analysis is not really possible. We work around it by deferring analysis using tasks.
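
The "email the exception to the developers" workaround starts with turning the exception into a readable report. Here is a minimal sketch of that step in plain Java; the class name and the report layout are invented for illustration, and actually sending the text (for example through the mail service) is left out.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

/** Sketch: format an exception and some request context as an email body. */
final class ErrorReport {
    /**
     * @param requestInfo free-form context chosen by the caller, e.g. method and path
     */
    static String format(Throwable error, String requestInfo) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        out.println("Request: " + requestInfo);
        out.println("Exception: " + error.getClass().getName());
        out.println("Message: " + error.getMessage());
        out.println();
        // The full stack trace, including causes, is what makes the report useful.
        error.printStackTrace(out);
        out.flush();
        return buffer.toString();
    }
}
```

Since the logs may have rolled over by the time someone looks, it is worth putting into the report everything needed to reproduce the problem.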



Billing
That part is great, simply because it costs hardly anything to run a website on App Engine. Our only cost so far is storage (as renderings of pages in the magazines quickly add up). Paying on App Engine means you have usage. We pay, that's good news:)

App maintenance

Downtime on App Engine ? Maintenance ? Yes there is. But we don't care too much. We have nice handlers which automatically put the website in maintenance mode when App Engine is not available. App Engine has had some reliability issues during the last few months, with unplanned outages. Our apps behaved nicely, showing user-friendly maintenance messages without us having to do anything. We don't have any server admin - try to beat that with MySQL / Apache / load balancers... Hard drives failing ? We are not even sure how / where our data is stored. It could be in the trunk of a Googler's car with WiFi and we could not care less.
We've previously run startups with regular hosting providers, and the overall level of reliability we get with App Engine is an order of magnitude above (no dealing with corrupted filesystems/databases, rogue processes taking 100% CPU, shared bandwidth on email servers etc).

Data maintenance

Booh... The datastore viewer is bad. And slow. And bad. And slooooooooow.... Don't expect to do anything fancy besides looking up individual entries. Modifying data in place works so-so. Sometimes it works, sometimes it fails silently.
When we need to run some reports on the data, however, there is a handy library, mapreduce. It can only run reports which are hard coded in your application. If you need to run new reports on the fly, you will need to write your own additional layer on top of it or just redeploy the app with the new reporting logic.

Java vs Python

Java is a bit like a foster child on App Engine compared to Python. Every new feature comes to Python before Java, and some features never get Java support. For example, if you want to clean up indexes you can only do so with the Python SDK. It is quite surprising that Google engineers are not putting Java first. For serious applications Java is a much better choice than Python, especially when talking about App Engine for business. However, it still feels like Java is playing catch-up with Python in terms of tools (the APIs are pretty much the same).

JRE limitations

That is probably the biggest remaining problem today on App Engine. Lots of standard JRE classes are not available which prevents lots of useful libraries from being used. Especially native processing around image manipulation. We sincerely wish App Engine engineers would focus more resources on this.

Is App Engine for everyone ?

Probably not. First, it requires a different way of thinking about applications. Think workflows, think denormalization, think idempotence. Your average PHP coder will not become a cloud warrior overnight.
If you run critical apps or handle critical data, it certainly is not the right platform either.
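
"Think idempotence" deserves a concrete picture: a task can end up running more than once (for instance when it is retried after a failure), so running it twice must have the same effect as running it once. Here is a minimal sketch in plain Java, under loud assumptions: the in-memory set stands in for a durable "already processed" record that a real app would keep in the datastore, and the counter stands in for the real side effect.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch: a task handler that is safe to run more than once for the same task id. */
final class IdempotentWorker {
    // Stand-in for a durable record of processed task ids (hypothetical; not persistent).
    private final Set<String> processed = ConcurrentHashMap.newKeySet();
    // Stand-in for the real side effect, e.g. rendering a page or sending an email.
    private final AtomicInteger sideEffects = new AtomicInteger();

    /** @return true if the work was done now, false if this task id was already handled */
    boolean handle(String taskId) {
        if (!processed.add(taskId)) {
            return false; // duplicate delivery: do nothing
        }
        sideEffects.incrementAndGet();
        return true;
    }

    int sideEffectCount() {
        return sideEffects.get();
    }
}
```

With a real datastore, the "check and mark" step would need to be done atomically, for example inside a transaction.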

But for the other 90% of businesses out there, or the vast majority of startups... You really need to think twice before going to another platform. Sorry, sysadmins out there, but the future does not look good for your profession.


Conclusion

20 months ago we took a leap of faith adopting App Engine. After lots of development, working around limitations of the platform and sleepless nights debugging JDO libraries... our verdict is simple.

We thank the cloud and App Engine. Big time. We have been able to focus on the business logic of our app instead of plumbing, and even taking into account the limitations we faced, we still would not go back to another platform. Why ? Because the limitations are being knocked down one by one by the platform. And the platform just gets better and better. The latest release, 1.4.0, was a major accomplishment.

Did we need the scalability offered by App Engine so far ? Not yet. The free maintenance and the reliability of the platform have been more valuable to us so far.
But knowing that we can scale makes it so easy to move forward in terms of business... that it's still a big + on our list.

We look forward to the development of the platform (hint for Google engineers: read our woes above;)

Thanks App Engine.

17 comments:

  1. Best. Appengine. Post. Ever!

    I've been using GAE/J for the past year. I'm in love with the platform like you guys and yes, the limitations can get on your nerves at times but the advantages just make me drool at GAE.

    You have put forward all the points so clearly that I would recommend anyone planning to start working on GAE to first read this!

  2. I've been using Windows Azure and SQL Azure for the past 2 years, and I don't think GAE is better than it.

  3. @pradeep no one's claiming anything like that in this post!

  4. @pradeep I know nothing about azure. Tell us why it's better.

  5. I'm a .Netter so I would make no value statements about running Java on Azure, but Microsoft are making a big effort. Here's the SDK http://www.windowsazure4j.org/

    and an example:
    http://blog.smarx.com/posts/programming-language-interoperability-in-windows-azure

    Reasons why you might consider it:
    1. SQL Azure - a proper HA triple replicated SQL database, other than that Table, blob and Queue store are similar to what you find elsewhere
    2. AppFabric - there's some neato tools hiding in there like federated identity, on premises projections, caching services etc
    3. The model bridges between the "bare metal VM" model of EC2 and Rackspace, and the "I don't actually know what it is" of GAE and Heroku.

    As for price - if you join the Bizspark program you get enough resource to run a moderate app for free (and a ton of MS software you're probably not interested in, like Windows and SQL licenses)

  6. Thanks for the detailed feedback, this is a great endorsement of the App Engine Platform, including its limitations and suggestions about the features you would need.

    Please ping me by email, you may be interested in some features we have in trusted testers right now.

    ---
    Patrick Chanezon- Google Developer Relations Manager - Cloud & Tools

  7. How were you able to make custom email addresses that receive email on @weespr.com addresses? The app engine documentation says that it only supports incoming mail on string@appid.appspotmail.com.

    I would love to know. Great write up.

  8. We forward all incoming mail from our weespr.com domain to appspotmail.com

  9. What did you use for the custom user accounts? (meaning, you didn't use google ID's for login - the 'out of the box' solution)

  10. For accounts, we use 3 different types:
    1 - facebook connect
    2 - openid / google accounts (through google apps integration http://www.google.com/enterprise/marketplace/viewListing?productListingId=5459+16262700581173018933)
    3 - custom weespr accounts

  11. Thanks for sharing. I'd love to hear more details about, "We have nice handlers which will automatically put the website in maintenance mode when App Engine is not available."

    Are you detecting when GAE goes down and managing redirects to a static page from all your handlers? I'd love to hear more specifics on this.

  12. Yes. This little piece of code runs before anything else:
    // Check whether datastore writes are currently available
    CapabilitiesService capabilitiesService = CapabilitiesServiceFactory.getCapabilitiesService();
    CapabilityStatus dsStatus = capabilitiesService.getStatus(Capability.DATASTORE_WRITE).getStatus();
    if (dsStatus == CapabilityStatus.DISABLED) {
        // Writes are disabled: send the user to the static maintenance page
        response.sendRedirect("/maintenance.html");
        return;
    }

  13. Great article and I appreciate your conclusion at the end.
    But I miss some statements about the performance of weespr. Did you run load tests, especially regarding the data storage? There are a lot of blog articles out there saying that the performance is slow. However, those articles are more than a year old. That is why I'm very interested in your opinion.

    Many thanks
    Marcel

  14. Load tests are not really relevant on App Engine, since it scales as long as you design your database model correctly. For websites like weespr this is usually easy, as each user's data can be siloed to that user.
    For latency, the best way to measure real-time performance is probably to look at the dashboard status page:
    http://code.google.com/status/appengine

  15. Thank you for the article, especially on the limitations.

    When you mentioned there are bugs around the JDO support, what is the workaround? Is there a lower-level API that you use as a workaround? Like falling back from Hibernate HQL down to direct SQL?

    Thanks,
    Joseph

  16. Yes, you can use the low level API:
    http://code.google.com/appengine/docs/java/datastore/queries.html

  17. as of 1.4.2 Java SDK can now clear out indexes

    "You can now vacuum datastore indexes with the Java SDK."
    http://code.google.com/p/googleappengine/wiki/SdkForJavaReleaseNotes
