The dynamics and social organization of innovation in the field of oncology/Stochastic block models

Summary edit

Block models are generative models that create networks from groups (blocks) of nodes by prescribing specific patterns of connection between each of the groups. It can, for example, describe assortative patterns - things that go together, usually called "communities" - but is not limited to them, enabling descriptions that are richer than mere proximity.

Block models can be formulated as stochastic models, prescribing probabilities for nodes in each group to connect to nodes in other groups. From these probabilities one can sample entire networks. Any network, therefore, will have some probability of being sampled from that model. Some networks will be more likely to be sampled - those whose connectivity patterns align with the model's - other will be less likely to be sampled from that specific model.

Thus, given a network, one can inverse the above procedure and, instead of generating a network from a model, search within a family of models or a model's parameters - the groups it proposes - which model or set of parameters is more likely to have given birth to the network at hand. By the very nature of block models, once this is done, it provides a classification of the network's nodes in groups according to their connectivity pattern.

This inversion from "model to network" towards "network to model" proceeds by searching for models that minimize the amount of information necessary to describe the entire network given the model, plus the model itself. That is, the information about the parameters of the model, plus the information about the details of the network that were not taken into account by the model with those parameters. This total information is called the description length of a model for some network. Two qualities of this procedure are that it is non-parametric, in the sense that parameters are inferred from data, and that it distinguishes noise from signal, thus avoiding over-fitting the model. It reflects a goal of seeking explanations to phenomena that balance the fit on the data with the complexity of the explanation, in a quantitatively precise way.

It should be noted that earlier block models were quite restrictive, having groups of fixed degree and a predetermined number of groups. Today there are extensions that allow for the degree sequence in each group to be taken into account, for the existence of overlapping groups, as well as to derive the number of groups from the data, or provide a nested (hierarchical) structure of blocks. This means you not only learn the groups, but you learn how the groups are grouped themselves in a meta-network, and so on (meta-meta-network!), until you end up with a single group at the root.