Design Essay #1 – Disconnected Data-Driven Apps

Note to regular readers: This is a software development-related post. Occasionally I do these to help process and clarify my thoughts on a project. If software design isn’t your cup of tea (or coffee), feel free to skip today’s post – I won’t be offended. Actually, I won’t even know!

I’m presently mulling over the design of a new app that will feature both an iOS app and a browser-based UI sharing a common web server backend. The browser UI/UX (User Interface/User Experience) is easy to think through, but the iOS app, not so much. It’s not a matter of how the controls on the interface will be laid out, or what the screen-to-screen workflows need to look like. It’s more a matter of the user experience when the iDevice is disconnected from the network.

I want the application to feel very responsive whether connected on WiFi, 3/4G, or completely disconnected. This implies a local copy of the data on the device, and that implies that data needs to be kept in sync with the server backend. This all leads very quickly to every programmer’s least favorite topic – data synchronization. It’s a simple concept to talk about, but it’s never easy to actually implement. It’s a nontrivial process that can be extremely complicated depending on the requirements of the UX.

Let’s start with a populated database. Our browser-based user gets a standard CRUD (Create/Read/Update/Delete) interface to interact with the data. Other than worrying about record-locking so that multiple browser-based users don’t run each other over trying to simultaneously update the same record, we’re pretty much good to go. Handling record locks is a well-known pattern with a straightforward implementation.

Our iOS user, however, will need a local copy of the database. Getting the first copy downloaded is easy. We ask the server to dump the data in JSON or XML format and then have our app throw it into a local Core Data database. If this was a view-only app, we’d be finished, other than to provide a mechanism to regularly refresh the data, such as at application startup, return from background, or when the user hits a refresh button. We would certainly implement a way to receive only updated records in the future, and that could be handled by adding a “last_updated” field to the table schemas, and asking only for those records that were updated since the last time our local database was refreshed.

At some point though, the iOS users will demand equality with their browser-based brethren. Rather than risk bringing an Occupy Wall Street-style event to our development lab, we are compelled to provide CRUD functionality on iOS. This means data is going to be changing in multiple places, and we’ll have to do careful, regular housekeeping to keep everything in sync.

The basic blocking and tackling here will involve sending CRUD transactions to the web server for processing and acknowledgement. If we send a new or updated record to the server, we’ll want a response that tells us how things went back at the server. When things don’t go well, we have to decide what to do. If the failure was caused by a connectivity problem, we need to mark the update as not sent and keep it in queue of unprocessed requests to be retried at a later time. If the failure was some sort of system or database problem back at the server, the response should help our iOS app make an informed decision on what to do.  And what sorts of problems might we face?  Database down or bad credentials?  Server out of disk space? Duplicate record errors? Non-null field violations?  I’m not sure I can guess them all, but I do know we need to have a default action to handle  unexpected edge cases.  And what do we tell our iOS user when things turn evil?  Hmm….the Apple Way would be to just hide it all from the user and try to figure out a way to handle it silently. Definitely non-trivial and perhaps not worth the development effort. I lean toward being straight up with the user and telling them that the operation failed and they will need to try it again later – in a very nice way, of course.

Let’s look at another situation: User A is using a browser. User B is using an iPad and has just refreshed his local copy of the data. Both decide to update the same customer record. What are the possibilities?

  1. They change different fields (e.g. user A changes “Street Address” and user B changes “Credit Card Expiration Date”)
  2. They change the same field (e.g. both A and B change “Last Name”)

In case #1, we probably want to preserve both changes, though we really can’t be 100% certain. In case #2, both A & B may have both changed the last name (i.e. both changed “Smith” to “Jones”), but what if they changed it to two different last names? Which change should we keep?  Should we say we’ll keep the latest change? We can, but that doesn’t guarantee we’re keeping the right information. We remember that we can’t lock the record as we did when our two browser users we’re vying of the same record because our iOS user could be using the app while disconnected. In fact, he can’t even check for record locks. And even if he were connected, the network delays of lock processing would probably result in a very poor user experience. So…what to do?

The robust solution would be to build a mechanism that allows the users to see where multiple updates on the same record have resulted in a conflict, and then allow the user to pick which one will be kept. That takes the difficult (impossible?) decision out of our code and puts it in the user’s lap. This could be implemented by adding two fields to the table schemas. A boolean “conflicted” field could be used to indicate that a record encountered a conflict condition when an attempt was made to process it into the database. An complementary “conflict_id” field could be used to point to the other record that is party to the conflict. When a conflict is detected, the record in question would be added to the table with these two fields set appropriately, and our CRUD interface logic would have to pick these out, highlight them for the user, and provide some actions to let the user clean things up.

In the spirit of getting something running quickly, my druthers would be to use a phased approach to get to The Promised Land above.

Phase I: Implement full CRUD capability for the browser UI, and view-only on iOS. The database back-end is a well-known pattern that can be laid down quickly and get the browser users online. Doing view-only on iOS first lets us work through the minutiae of the Core Data implementation without the complexities of dealing with disconnected CRUD activities and conflict resolution.

Phase II: Implement full CRUD on iOS for just one of the tables and develop the backend API needed to asynchronously process updates from the app.

Phase III: Build the conflict resolution mechanism for the same table.

Phase IV: Implement full CRUD for the rest of the tables by simply extended the work done in II and III. At this point most of the guess work should be over, and things should proceed quickly.

I can’t imagine having any readers that made it this far, but if you did, and you’re a coder, AND you’ve been down this path before, I’d love to hear from you.  How have, or how would you tackle this conundrum?