1. Analysis of Facebook Graph

    Very interesting paper: The Anatomy of the Facebook Social Graph

    Reading this very interesting analysis of the structure of the Facebook graph I come to the conclusion that the main features qualitatively may be common to all graphs of social networks. For example, I’ve seen another social graph one order of magnitude smaller having very similar features. It would be really interesting to compare such graphs to other graphs, especially directed social graphs like Twitter or Google+ in terms of different measures. It would also be interesting to compare these graphs to offline social networks. Another interesting question could be to analyze the evolution of such graphs over time, for example using methods of the theory of random graphs.

  2. A Data Model for Social Networks

    Until today, the history of data models in social network programming is determined by the paradigm of relational databases. This is both for historical and economical reasons. In the times when such communities like Facebook or MySpace started, relational databases were the dominating systems in the industry. Little was known about alternative approaches. Economically, the time-to-market pressure forced developers to use the most common, LAMP-stack like software. There was neither enough time nor money to risk development of new tools or even to look around for approaches fitting better into the domain.

    Later, the pressure of scalability forced the companies more and more to leave the canonical way of data modeling. The holy grail of normalization was failing. Partitioning lead to denormalization and denormalization lead to the challenging of relational data at all. This was at the beginning of the NoSQL movement.

    Say, we could start from scratch. What would be the right data model for a Social Network? Since a network is all about relationships of entities, I suppose, it should be based on the graph model. The main entity is a person, this will probably be modeled as a node. Different users may have different relationships with each other, so, if we model relationships with edges, we end up with a multi-relational graph. Note that even groups - such as discussion groups - are relationships between persons. Such relations may be strong or weak, they have rates or weights and similar properties.  Further, persons own data like location, date of birth, blog posts or images etc. These can be modeled more or less as properties. Finally, we end up with a weighted, multi-relational property graph. And indeed, this is the data model of modern graph databases such as Neo4j, Sones, InfiniteGraph and others.

    But as far as I can see, all those systems simply go not far enough. In context of scalability, what if a node could be more than a mere abstraction in a still homogenous storage? What, if a node could even process it’s own data and relationships? What, if a node together with its data - properties and relations - could be bound to a CPU, for a given time frame at least? It is not much known about Google’s Pregel system, but it seems to be build in this direction.

    What I have in mind is a graph system where a node has all it’s own data, indexes and storage and can be scheduled to do some calculations with these data and propagate results to its neighbors over weighted channels. With such a system it should be easy to build a Social Network of any scale, do some searches in this network an even retrieve structural information about communities or the like.

    Architectural, such a system could be build multi-layered, with a distributed file system or data base, capable to store a huge number of relatively small data sets all locally indexed, resulting in small indexes and fast local lookups. On top of this we could have a network of processing nodes interrelated by message passing channels. Each such node may be scheduled to allocate some computing resource for a small amount of time to fulfill a local request. Searches may be propagated over neighborhoods retrieving either data of interest for a given node or uncover structural properties of the network.

    After all, the right data and computing model for a social network seems to be an elastic particle cloud with only local data and interconnections, where messages are propagated over neighborhoods like in a network of neurons. Neat. And what, if that cloud is build upon the greatest message passing system known to exist on our planet - the Internet? Isn’t the real Social Network a peer-to-peer connection of all those little processing units out there, from the workstation down to the smartphone? What, if the Internet itself is supposed to be the one and only Social Network abstraction?

  3. Overlapping Graph Partitioning

    Graph partitioning is known to be a hard problem and in algorithm theory many partitioning problems are NP-complete. With the raise of non-relational databases in general and graph databases in particular the partitioning problem nevertheless turns more and more into a practical challenge.

    A search on a graph database is in essence the exploitation of a graph theoretical notion of structure or measure of a graph. One may think of two types of structure, global and local. A global property for example is the chromatic number of a graph. On the opposite, a given vertex of a graph may be a member of a certain cluster or community of that graph. This is a local property since even the definition of “community” is local - dependent of a notion of density making sense only in a local context.

    In practical applications almost always only local searches based on traversals starting from a given node are useful. Determining global properties just takes too much time, at least for huge graphs such as social networks. Keeping that in mind, why not just partitioning a graph into local neighbourhoods big enough to solve a certain problem?

    Such a partitioning scheme could not consist of disjoint parts but will probably be overlapping. And also, such a partitioning would be dependent on the particular problem. But this is a general feature of a graph database - one have to model the data into a hand crafted graph to solve a certain class of problems and the same model may not be well suited for another class. 

    Overlapping local partitionings or mappings are well known in mathematics such as the theory of manifolds where one have charts and atlases. The local structure is conserved in a chart but the global totality may only be visible over the whole atlas.

    Most practical problems with graph partitionings occur when a traversal reaches a frontier or when one have to decide for an insert into which partition to place the new item. If overlappings are allowed these problems may be less important.

    Overlapping partitionings might be a good strategy to organize data used to solve local problems. How to build up such partitionings and how to prevent them to grow to the size of the whole graph may be again dependent on the particular problem.

  4. Graph Minors

    Recently I learned a very beautiful result in graph theory, the graph minor theorem by Robertson & Seymour presented and proved in a series of 20 papers between 1983 and 1988. This theory is very rich and it has an immediate algorithmic consequence since there always exists a polynomial time algorithm to test a graph for a given minor.

    One may define a class of graph properties called “hereditary” being closed under taking minors. Such a property can be expressed by a number of forbidden minors of a graph, minors it must not have. Surprisingly, the number of such minors is always finite.

    Now, if you want to show your graph having some property you can do it like this:

    1. Show the property to be hereditary.

    2. Determine the forbidden minors (the harder part).

    3. Show your graph not having these minors (algorithm always exists and is polynomial).

    A simple example is a planarity test, proved by Wagner already in 1937. “A graph is planar iff it contains neither K5 nor K3,3 as a minor.” This is especially useful to show a graph not to be planar. All you have to do is to show it contains one of K5 or K3,3 as a minor! The theory of Robertson & Seymour is an extension of Wagner’s results but much deeper and with much more influence.

Powered by Tumblr; designed by Adam Lloyd and Ingo Schramm.