Why incremental ONIX changes are best, and hard

When first discussing an ONIX feed with a potential recipient, amid all of the minutiae of their preferences is one key question: Incremental?

Even in an email there’s a certain tone about it – the tone of an organisation that thought it would be fine to get complete catalogue feeds from small publishers ten years ago, and is now up to their midriffs in files and data, and please can you let us know only when something has actually changed? Thanks!

The Problem

It’s a pretty serious situation to be in, when you are receiving 8,000 product records from a trading partner. ONIX is an XML standard, and XML processing is not entirely low cost. With modern processor power and efficient software libraries you wouldn’t really notice the processing of a single ONIX record, but combine the reading of the data from hundreds of files with a comparison against the data you have stored for those products to see if they need updating and you can be into some serious processing time. And the thing about time is that there is only so much of it in a day. Having once worked for a company whose main product took around twenty-four hours to process a daily file, I can tell you that that situation is absolutely paralysing.

Fun fact: this company had no ability to restore their database from backups, because switching it to a mode that would allow consistent backups to be taken potentially threatened the processing window. I say potentially because they had no way of testing whether it would or wouldn’t without trying it, and they couldn’t afford the time to do so. They still actually took backups, even though they could never be used, presumably because cognitive dissonance wouldn’t let them stop, even though the act of taking a backup that could never be used added time on to their daily processing. It’s OK – I’m happier now.

But change detection is hard. Consonance is written in Ruby on Rails, and therefore by default we have create and update timestamps on every product record, every price record, every work, marketing text, cover image, imprint, contributor, sales agent, supplier, etc., along with audit trails that tell us exactly what was changed, when, and by whom. In some parallel universe there definitely exists a civilisation that can use that data to tell you whether an ONIX record has changed since a particular historical point in time, possibly using seven-dimensional quantum blockchains.

Yet this is many people’s first plan when approaching the problem. “Got a timestamp?” is the give-away question, so let’s consider why that approach fails in all but the most trivial situations.

What was the change?

Firstly, knowing that a row of data was last changed two days ago is not enough to tell you whether the data has changed since you last sent it a week ago. The timestamp doesn’t tell you whether the row also changed fifteen seconds before that last change, and therefore whether the latest change simply corrected an earlier mistake. A price changed from $10.00 to $101.00 and then back to $10.00 is no change at all, but your timestamps don’t know.
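To make the price example concrete, here is a minimal Ruby sketch. Everything in it is illustrative: `PriceRow` is a hypothetical stand-in for a database row with a Rails-style `updated_at` column, and the dates are made up.

```ruby
# Hypothetical stand-in for a price row with a Rails-style updated_at column.
PriceRow = Struct.new(:amount, :updated_at)

# The naive check: has the row been touched since we last sent it?
def changed_by_timestamp?(row, last_sent_at)
  row.updated_at > last_sent_at
end

# The check we actually want: does the value differ from what we sent?
def changed_by_value?(row, amount_when_sent)
  row.amount != amount_when_sent
end

last_sent_at = Time.utc(2024, 1, 1)
row = PriceRow.new("10.00", Time.utc(2023, 12, 1))
amount_when_sent = row.amount

# A typo and its correction, fifteen seconds apart, both after the last send.
row.amount = "101.00"
row.updated_at = Time.utc(2024, 1, 6, 9, 0, 0)
row.amount = "10.00"
row.updated_at = Time.utc(2024, 1, 6, 9, 0, 15)

puts "timestamp says changed: #{changed_by_timestamp?(row, last_sent_at)}"
puts "value actually changed: #{changed_by_value?(row, amount_when_sent)}"
```

The timestamp check flags the row for resending, while the value check correctly reports that nothing is different from what the recipient already has.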

That in itself means that you can’t use timestamps alone, but if you have access to data change logs then maybe you can detect whether the data actually changed. We use an open source Ruby library that allows us to ask “what was the data like for this record as of one week ago?” Not only can we tell if there were any actual data changes, we can tell exactly which items of data changed.
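The idea of an as-of comparison can be sketched in plain Ruby. This is not the library we use, nor its API: the `Change` struct, the `record_as_of` and `net_changes_since` helpers, and the sample data are all hypothetical, and the sketch assumes a change log that stores the old and new value of each field.

```ruby
# Hypothetical audit-log entry: which field changed, when, and its old and new values.
Change = Struct.new(:at, :field, :from, :to)

# Rebuild the record as it stood at `time` by undoing, newest first,
# every change made after that moment.
def record_as_of(current, changes, time)
  changes.select { |c| c.at > time }
         .sort_by(&:at)
         .reverse
         .reduce(current.dup) { |record, c| record.merge(c.field => c.from) }
end

# Fields whose value now differs from the value at `time`, as field => [then, now].
def net_changes_since(current, changes, time)
  then_record = record_as_of(current, changes, time)
  current.each_with_object({}) do |(field, now_value), diff|
    diff[field] = [then_record[field], now_value] if then_record[field] != now_value
  end
end

current = { price: "10.00", title: "A Fine Book (2nd ed.)" }
changes = [
  Change.new(Time.utc(2024, 1, 6, 9, 0, 0),  :price, "10.00", "101.00"),
  Change.new(Time.utc(2024, 1, 6, 9, 0, 15), :price, "101.00", "10.00"),
  Change.new(Time.utc(2024, 1, 7, 14, 0, 0), :title, "A Fine Book", "A Fine Book (2nd ed.)"),
]

# Only the title is reported; the price typo and its correction cancel out.
p net_changes_since(current, changes, Time.utc(2024, 1, 1))
```

Comparing the state then with the state now, rather than counting log entries, is what makes the corrected typo disappear from the result.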

That still doesn’t quite get us to a solution, for two further reasons.

Do we care about that change?

As I wrote in a previous article, every ONIX recipient has an à la carte requirement, and a change in the main Thema code for a work may not affect the ONIX to be sent to a particular recipient. While it’s arguable that ignoring some of these differences doesn’t do much harm to recipients in general – they’re only receiving a bit of extra data, after all – my feeling is that it would tend to do harm to particular recipients disproportionately. And those would be the recipients who should be the least trouble, who don’t actually need much detailed metadata (rights management agencies, for example).

So as a point of principle, I feel that we should care about these changes.
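One way to respect each recipient’s requirements, sketched here under heavy assumptions rather than as a description of how Consonance works, is to compare what this recipient would receive now against what they were last sent. The recipient profiles, field names, `render_for` helper and sample values below are all hypothetical, and a real feed would render proper ONIX XML rather than key=value lines.

```ruby
require "digest"

# Hypothetical recipient profiles: the fields each one takes in their feed.
RECIPIENT_FIELDS = {
  retailer: [:isbn, :title, :price, :main_thema_code],
  rights_agency: [:isbn, :title],
}.freeze

# Stand-in for rendering a recipient's record: only their fields, in a fixed order.
def render_for(recipient, product)
  RECIPIENT_FIELDS.fetch(recipient).map { |f| "#{f}=#{product.fetch(f)}" }.join("\n")
end

# Fingerprint of exactly what this recipient would be sent.
def digest_for(recipient, product)
  Digest::SHA256.hexdigest(render_for(recipient, product))
end

# Resend only if the recipient's own view of the product has changed.
def needs_resend?(recipient, product, last_sent_digest)
  digest_for(recipient, product) != last_sent_digest
end

# Placeholder product data; the ISBN and Thema codes are illustrative only.
before = { isbn: "9780000000002", title: "A Fine Book", price: "10.00", main_thema_code: "FB" }
after  = before.merge(main_thema_code: "FBA")

RECIPIENT_FIELDS.each_key do |recipient|
  sent = digest_for(recipient, before)
  puts "#{recipient}: resend? #{needs_resend?(recipient, after, sent)}"
end
```

With these profiles, the Thema code change triggers a resend to the retailer but not to the rights agency, which never received that field in the first place.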
