1. A Data Model for Social Networks

    Until today, the history of data models in social network programming has been determined by the relational database paradigm. The reasons are both historical and economic. When communities such as Facebook or MySpace started, relational databases were the dominant systems in the industry, and little was known about alternative approaches. Economically, time-to-market pressure forced developers to use the most common, LAMP-stack-like software. There was neither enough time nor enough money to risk developing new tools, or even to look around for approaches that fit the domain better.

    Later, the pressure of scalability forced companies more and more to leave the canonical way of data modeling. The holy grail of normalization was failing. Partitioning led to denormalization, and denormalization led to questioning the relational model altogether. This was the beginning of the NoSQL movement.

    Say we could start from scratch. What would be the right data model for a Social Network? Since a network is all about relationships between entities, I suppose it should be based on the graph model. The main entity is a person, which will probably be modeled as a node. Different users may have different relationships with each other, so if we model relationships as edges, we end up with a multi-relational graph. Note that even groups - such as discussion groups - are relationships between persons. Such relations may be strong or weak; they carry ratings, weights and similar properties. Further, persons own data such as location, date of birth, blog posts or images. These can be modeled more or less as properties. In the end we arrive at a weighted, multi-relational property graph. And indeed, this is the data model of modern graph databases such as Neo4j, Sones, InfiniteGraph and others.
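
    To make the term concrete, here is a minimal in-memory sketch of a weighted, multi-relational property graph. It is not the API of Neo4j or any other product; the names `Person`, `Relationship`, `SocialGraph`, `relate` and `neighbors` are made up for illustration, and the weight range of 0 to 1 is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Person:
    """A node: one person plus the data that person owns."""
    name: str
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class Relationship:
    """A directed, typed, weighted edge between two persons."""
    source: str
    target: str
    kind: str      # relationship type, e.g. "friend" or "colleague"
    weight: float  # strength of the tie; range 0..1 is an assumption


class SocialGraph:
    """Hypothetical container; persons are keyed by name for simplicity."""

    def __init__(self) -> None:
        self.persons: dict[str, Person] = {}
        self.edges: list[Relationship] = []

    def add_person(self, name: str, **properties: Any) -> Person:
        person = Person(name, dict(properties))
        self.persons[name] = person
        return person

    def relate(self, source: str, target: str, kind: str,
               weight: float = 1.0) -> Relationship:
        if source not in self.persons or target not in self.persons:
            raise KeyError("both endpoints must be added first")
        edge = Relationship(source, target, kind, weight)
        self.edges.append(edge)
        return edge

    def neighbors(self, name: str, kind: str | None = None) -> list[Relationship]:
        """Outgoing edges of a person, optionally filtered by relationship type."""
        return [e for e in self.edges
                if e.source == name and (kind is None or e.kind == kind)]


graph = SocialGraph()
graph.add_person("alice", location="Berlin")
graph.add_person("bob", location="Hamburg")
# Two edges between the same pair of persons: this is what
# makes the graph multi-relational.
graph.relate("alice", "bob", "friend", weight=0.9)
graph.relate("alice", "bob", "colleague", weight=0.3)
```

    Keeping the edge type and the weight on the edge itself, rather than in a separate table, is what distinguishes this from the normalized relational layout discussed above.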

    But as far as I can see, all those systems simply do not go far enough. In the context of scalability, what if a node could be more than a mere abstraction in a still homogeneous storage? What if a node could even process its own data and relationships? What if a node together with its data - properties and relations - could be bound to a CPU, at least for a given time frame? Not much is known about Google’s Pregel system, but it seems to be built in this direction.

    What I have in mind is a graph system where each node has its own data, indexes and storage, can be scheduled to do some calculations with this data, and can propagate the results to its neighbors over weighted channels. With such a system it should be easy to build a Social Network of any scale, run searches in this network and even retrieve structural information about communities and the like.
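
    The following is a toy, single-process sketch of that idea, not a description of how Pregel or any existing system works. Each node keeps its own data and inbox, a scheduler gives every node one step per round, and results travel to neighbors attenuated by the channel weight. The names `Node`, `step` and `run`, and the "score" being propagated, are assumptions chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    data: dict = field(default_factory=dict)      # node-local storage
    channels: dict = field(default_factory=dict)  # neighbor name -> channel weight
    inbox: list = field(default_factory=list)     # values received last round

    def step(self) -> list[tuple[str, float]]:
        """Process the local inbox; return (target, value) messages to send."""
        if not self.inbox:
            return []
        best = max(self.inbox)
        self.inbox.clear()
        if best <= self.data.get("score", 0.0):
            return []  # nothing new learned, stay silent
        self.data["score"] = best
        # Propagate the result, attenuated by each channel's weight.
        return [(target, best * weight)
                for target, weight in self.channels.items()]


def run(nodes: dict[str, Node], max_rounds: int = 10) -> int:
    """Schedule every node once per round; stop when no messages are sent."""
    for round_no in range(max_rounds):
        outgoing: list[tuple[str, float]] = []
        for node in nodes.values():
            outgoing.extend(node.step())
        if not outgoing:
            return round_no
        # Deliver only after all nodes have stepped, so rounds stay synchronous.
        for target, value in outgoing:
            nodes[target].inbox.append(value)
    return max_rounds


nodes = {
    "a": Node("a", channels={"b": 0.5}),
    "b": Node("b", channels={"c": 0.5}),
    "c": Node("c"),
}
nodes["a"].inbox.append(1.0)  # seed a value at node "a"
rounds = run(nodes)
scores = {name: node.data.get("score") for name, node in nodes.items()}
```

    In a real system the loop in `run` would be replaced by many independent processing units, but the contract of a node stays the same: read local state, compute, send messages to neighbors.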

    Architecturally, such a system could be built in multiple layers. At the bottom sits a distributed file system or database capable of storing a huge number of relatively small data sets, each locally indexed, which results in small indexes and fast local lookups. On top of this we could have a network of processing nodes connected by message-passing channels. Each such node may be scheduled to allocate some computing resources for a short time to fulfill a local request. Searches may be propagated over neighborhoods, either retrieving data of interest for a given node or uncovering structural properties of the network.
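
    A search that spreads over neighborhoods can be sketched as a hop-limited breadth-first traversal in which every node only consults its own small index. This is a centralized simulation under simple assumptions: `local_indexes` maps a node to its private term index, `links` maps a node to its neighbors, and `neighborhood_search` and `max_hops` are names invented here.

```python
from collections import deque


def neighborhood_search(local_indexes, links, start, term, max_hops):
    """Return {node: items} for every node within max_hops of start
    whose local index contains term."""
    hits = {}
    visited = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        # Fast local lookup: only this node's own index is touched.
        found = local_indexes.get(node, {}).get(term)
        if found:
            hits[node] = list(found)
        if hops == max_hops:
            continue  # hop budget used up, do not propagate further
        for neighbor in links.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return hits


local_indexes = {
    "alice": {"jazz": ["post-1"]},
    "bob": {"jazz": ["post-7", "post-9"], "chess": ["post-8"]},
    "carol": {"chess": ["post-3"]},
    "dave": {"jazz": ["post-4"]},
}
links = {"alice": ["bob"], "bob": ["carol"], "carol": ["dave"]}

result = neighborhood_search(local_indexes, links, "alice", "jazz", max_hops=2)
```

    With `max_hops=2` the query reaches alice, bob and carol, so dave's matching post is out of range. The hop limit is what keeps a propagated search local instead of flooding the whole network.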

    After all, the right data and computing model for a social network seems to be an elastic particle cloud with only local data and interconnections, where messages are propagated over neighborhoods as in a network of neurons. Neat. And what if that cloud is built upon the greatest message-passing system known to exist on our planet - the Internet? Isn’t the real Social Network a peer-to-peer connection of all those little processing units out there, from the workstation down to the smartphone? What if the Internet itself is supposed to be the one and only Social Network abstraction?
