Harnessing self-organisation in your database.

Gavin Ruddy
gav@pontneo.com
12 January 2015

From a user's perspective, getting good results from a database is like any research. There's finding the particular thing you're looking for because the question is already clear to you. And then there's discovering something that helps you ask better questions. Efficient progress depends on getting the balance between the two right.

Searching for things by text matching or using the taxonomies set up for you is of course amazingly powerful, but generally these top-down techniques are skewed towards finding answers. Sometimes that's all you need, but sometimes it can be annoyingly blunt and good progress requires lots of extra effort and skill. As people, we're capable of all sorts of logical leaps that these techniques often don't manage very well. So where's the room for improvement? I think that, like in our own brains, allowing the data to self-organise more is the way forward. That might sound a bit woolly, but in fact it's not: it's really about optimising some of the elements of search and navigation that usually change slowly or are just left to chance.

So what does self-organisation actually mean? Self-organisation is organisation that emerges bottom-up from a system's own internal dynamics, rather than organisation imposed by an external source that the system itself doesn't affect. In the case of a pile of articles, for example, the text and taxonomies are fixed things imposed from somewhere else. There's no direct feedback between the use and the organisation of that data, so the organisation doesn't respond or evolve without intervention. Whatever structures might appear in the way the articles relate to each other (even in click data, e.g. Fig 1) ultimately reflect the ways the users are given access to the data. On the other hand, things like the social web sections of lots of websites (the 'people who viewed this also viewed these' or the 'most read content' sections, etc.) automatically respond and evolve as people click through them, affecting in turn the way the users use the data, leading to more evolution, round and round in a loop. Even though these are often still fairly rudimentary, it is this sort of direct, unconstrained feedback that enables data to self-organise.

Fig 1: Click data - Science journals mapped by web clicks between them (2009)

The very fact that you can see subtle structure developing in click data (e.g. Fig 1) implies that the self-organisation of data could be taken a lot further. For example, click data might show articles in a database clustering, reflecting some non-random behaviour of the users. Within this clustering, each article develops a certain non-random proximity to other articles, and this can be used to

return articles in search results even where text matching fails
better estimate the relevance (and hence the order) of articles in search results
send articles automatically to places they're most likely to be interesting
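To make the idea of click proximity a little more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions, not a description of any particular system: the function names `build_proximity` and `expand_and_rank`, the session format (a list of article ids in click order) and the `neighbour_weight` parameter are all hypothetical. It counts how often two articles are clicked one after the other, then uses those counts to pull in and order articles that the text match alone would have missed.

```python
from collections import Counter, defaultdict


def build_proximity(sessions):
    """Count consecutive clicks between pairs of articles (symmetric)."""
    proximity = defaultdict(Counter)
    for session in sessions:
        for a, b in zip(session, session[1:]):
            if a != b:
                proximity[a][b] += 1
                proximity[b][a] += 1
    return proximity


def expand_and_rank(text_hits, proximity, neighbour_weight=0.5, limit=5):
    """Rank text-matched articles plus their click neighbours.

    Assumption: a direct text match scores 1.0, and each neighbour gets a
    share of neighbour_weight proportional to its click proximity.
    """
    scores = Counter()
    for hit in text_hits:
        scores[hit] += 1.0
        total = sum(proximity[hit].values())
        for neighbour, count in proximity[hit].items():
            scores[neighbour] += neighbour_weight * count / total
    return [article for article, _ in scores.most_common(limit)]


# Three short browsing sessions over articles a, b, c and d.
sessions = [["a", "b", "c"], ["a", "b"], ["d", "b"]]
proximity = build_proximity(sessions)

# Only "a" matched the query text, but "b" is returned too because
# users who read "a" tend to click through to "b".
print(expand_and_rank(["a"], proximity))
```

The same proximity scores could in principle drive the third use as well, by pushing an article towards the places where its nearest neighbours are already being read.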

Each cluster of articles is a dynamic category with an identity that emerges from its content, so it has

a dynamic meaning
a certain level of attention
things like freshness etc.

and this can provide useful context for each of the articles it contains. Each cluster also relates to other clusters, so you can generate navigation that integrates the way the data is used like

dynamic taxonomy/semantics
dynamic maps of the data

and all of these mechanisms allow the user to get a better sense of where they are, what's going on around them and even how much influence they're having. The more these mechanisms are used to help guide traffic through the data, the stronger the feedback becomes and the more efficiently the data can self-organise.
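As a rough illustration of clusters as dynamic categories, the sketch below groups articles that are linked by enough clicks and then summarises each group. Everything here is an assumption made for the example: the names `find_clusters` and `describe_cluster`, the `min_clicks` threshold, the use of simple connected components as the clustering method, and the definitions of attention (number of clicks landing in the cluster) and freshness (time of the most recent click).

```python
from collections import defaultdict


def find_clusters(links, min_clicks=2):
    """Group articles into connected components over strong click links.

    links maps an (article, article) pair to a click count.
    """
    graph = defaultdict(set)
    for (a, b), count in links.items():
        if count >= min_clicks:
            graph[a].add(b)
            graph[b].add(a)

    clusters, seen = [], set()
    for start in graph:
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(graph[node] - members)
        seen |= members
        clusters.append(members)
    return clusters


def describe_cluster(members, click_log):
    """Summarise a cluster from a log of (article, timestamp) clicks."""
    times = [t for article, t in click_log if article in members]
    return {"attention": len(times), "freshness": max(times, default=None)}


links = {("a", "b"): 3, ("b", "c"): 2, ("c", "d"): 1, ("e", "f"): 5}
click_log = [("a", 10), ("c", 25), ("e", 7)]

for cluster in find_clusters(links):
    print(sorted(cluster), describe_cluster(cluster, click_log))
```

Because the links and the click log keep changing as people use the data, re-running this periodically would let the categories, and their attention and freshness, drift along with the users.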

Natural systems are almost always what's called 'critically self-organised', where the organisation emerging from within the system is intense enough to be dominant, and the resulting dynamic equilibrium is well known for generating consistent, meaningful and complex behaviour (including even things like novelty). These natural systems are like ninjas at finding and maintaining efficiency, so I think they're an excellent model for how to organise things optimally.

You can see these sorts of dynamic approaches appearing more and more around the web (they're a great way to get things like ads to the right place at the right time). The key things are to identify and enable appropriate feedbacks and to allow these feedbacks to maximise their influence. It doesn't necessarily need to involve new user interactions (it could be funnelled through search results, e.g. see the click-through filter referred to below) or need things that are complicated to implement, like cluster maps (which, by frightening users off, can make it harder for the data to self-organise anyway). Essentially the organisation just needs to revolve around a process by which it continually converges on a more efficient state.
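A feedback loop funnelled through search results can be sketched in a few lines. This is a hypothetical toy, not the design of the click-through filter listed below: the class name `ClickFeedbackRanker`, the `decay` and `boost` parameters and the scoring formula are all assumptions made for illustration. Each click on a result nudges that article up for the same query, while older clicks fade, so the ordering keeps adjusting to how the results are actually used.

```python
from collections import defaultdict


class ClickFeedbackRanker:
    """Re-rank text-search results using decayed click counts per query."""

    def __init__(self, decay=0.9, boost=1.0):
        self.decay = decay    # assumed: how quickly old clicks fade
        self.boost = boost    # assumed: how strongly clicks affect rank
        self.clicks = defaultdict(float)  # (query, article) -> weight

    def record_click(self, query, article):
        # Age the existing evidence for this query, then add the new click.
        for key in list(self.clicks):
            if key[0] == query:
                self.clicks[key] *= self.decay
        self.clicks[(query, article)] += 1.0

    def rank(self, query, text_scores):
        # Multiply each text-match score by a click-based factor.
        def score(article):
            weight = self.clicks.get((query, article), 0.0)
            return text_scores[article] * (1.0 + self.boost * weight)

        return sorted(text_scores, key=score, reverse=True)


ranker = ClickFeedbackRanker()
text_scores = {"x": 1.0, "y": 0.8}

print(ranker.rank("heart", text_scores))   # text matching alone
ranker.record_click("heart", "y")
print(ranker.rank("heart", text_scores))   # after one click on "y"
```

The decay is what keeps the loop from freezing: without it, early clicks would dominate forever and the organisation would stop responding to new behaviour.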

More about this

Infographic: 5 useful things to do with click data.
Better Search: Design for a click-through filter that dynamically reorganises search results.
Working demo click-through filter based on medical data.
Better Search: Prototype click-through filter results analysis.
Solr Lucene Click-Through Filter Plugin: details & query structure.
