Teaser #2: Given Names and the Co-Authorship Relation

Science is universal, independent of cultural prejudices and political boundaries – and, of course, independent of an author’s name. Or is it? Try calculating the average similarity between your name and your co-authors’ names, then between your name and the names of your co-authors’ co-authors, and so on.

We did these calculations for Paul Erdős’ collaboration network. Paul Erdős is known for having published papers with more collaborators than any other mathematician (the collaboration network considered here contains 572 direct collaborators and 6383 at distance two). For reference, we additionally calculated the average name similarity for 1000 randomly relabeled collaboration networks (keeping Paul Erdős’ node fixed), shown in grey in the plot below. The given error ranges correspond to the 95% confidence interval.
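The calculation described above can be sketched roughly as follows. This is a minimal illustration and not the code used for the plot: the function names, the toy graph and the placeholder `toy_sim` similarity are all hypothetical, and the percentile-based interval is only one assumed way of deriving a 95% range from the relabeled networks.

```python
import random
from collections import deque
from statistics import mean

def nodes_by_distance(graph, source, max_dist):
    """Breadth-first search; returns {distance: [nodes]} for 1..max_dist."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_dist:
            continue  # do not expand beyond the requested distance
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    layers = {}
    for node, d in dist.items():
        if d > 0:
            layers.setdefault(d, []).append(node)
    return layers

def avg_similarity(source_name, nodes, names, sim):
    """Average similarity between the source name and the names of `nodes`."""
    return mean(sim(source_name, names[n]) for n in nodes)

def relabeled_baseline(graph, source, names, sim, k, runs=1000, seed=0):
    """Shuffle the names of all nodes except `source` and recompute the
    distance-k average; returns (mean, (low, high)) using an assumed
    2.5%/97.5% percentile interval over the runs."""
    rng = random.Random(seed)
    others = [n for n in graph if n != source]
    layer = nodes_by_distance(graph, source, k)[k]
    samples = []
    for _ in range(runs):
        shuffled = [names[n] for n in others]
        rng.shuffle(shuffled)
        relabeled = dict(zip(others, shuffled))
        relabeled[source] = names[source]  # the source node keeps its name
        samples.append(avg_similarity(names[source], layer, relabeled, sim))
    samples.sort()
    low = samples[int(0.025 * runs)]
    high = samples[int(0.975 * runs) - 1]
    return mean(samples), (low, high)

def toy_sim(a, b):
    """Placeholder only: 1.0 if the names share a first letter, else 0.0."""
    return 1.0 if a[0].lower() == b[0].lower() else 0.0

# Hypothetical toy network: adjacency lists plus a given name per node.
graph = {"erdos": ["a", "b"], "a": ["erdos", "c"], "b": ["erdos"], "c": ["a"]}
names = {"erdos": "Paul", "a": "Pal", "b": "Anna", "c": "Peter"}

layers = nodes_by_distance(graph, "erdos", 2)
observed_d1 = avg_similarity(names["erdos"], layers[1], names, toy_sim)
baseline_d1, (low_d1, high_d1) = relabeled_baseline(
    graph, "erdos", names, toy_sim, k=1, runs=200
)
```

For a real experiment, `toy_sim` would be replaced by an actual name similarity measure and the graph by the co-authorship network of interest.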

First of all, we note only a small difference in magnitude in the average name similarity at distance 1 and distance 2. Nevertheless, given the 95% confidence interval, even for Paul Erdős the tendency of co-authors to have more similar names cannot be neglected. For co-authors at distance 2, author names even exhibit a very slight tendency to be less similar than those of the corresponding randomly chosen co-authors.

Of course, we must take care not to confuse correlation with causation. Naming your child “Paul” won’t increase the probability of a collaboration with Paul Erdős (especially since he died in 1996). Nevertheless, if you consider your own collaboration network, more astonishing results may be observed…

P.S.: The name “Paul” is itself special: it is one of the most popular names in Wikipedia and is accordingly more related to other names than the average name is. The impact of the source name’s distributional properties can be factored out by separately calculating the average pairwise name similarity among the authors at distance k. In the case of Paul Erdős, the direct co-authors’ names have an average pairwise similarity score of 0.64, in contrast to 0.51 at distance two (an estimate, as the calculation has not finished yet).
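The pairwise variant might look like the sketch below. It is only an illustration under assumptions: `toy_sim` is a hypothetical stand-in for a real name similarity, and the sampled estimator is one plausible way to get an estimate when the exact computation is slow, not necessarily how the figure above was obtained. With 6383 names at distance two there are roughly 20 million unordered pairs, compared with about 163,000 for the 572 direct co-authors.

```python
import random
from itertools import combinations
from statistics import mean

def toy_sim(a, b):
    """Placeholder only: 1.0 if the names share a first letter, else 0.0."""
    return 1.0 if a[0].lower() == b[0].lower() else 0.0

def avg_pairwise_similarity(given_names, sim):
    """Exact average over all unordered pairs (needs at least two names)."""
    return mean(sim(a, b) for a, b in combinations(given_names, 2))

def sampled_pairwise_similarity(given_names, sim, samples=10000, seed=0):
    """Estimate of the same average from randomly drawn pairs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        a, b = rng.sample(given_names, 2)  # two distinct list positions
        total += sim(a, b)
    return total / samples

# Hypothetical list of given names at some distance k.
layer_names = ["Paul", "Peter", "Anna"]
exact = avg_pairwise_similarity(layer_names, toy_sim)
estimate = sampled_pairwise_similarity(layer_names, toy_sim, samples=2000)
```

Neither function involves the source name at all, which is why its distributional properties drop out of the comparison.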

15th Discovery Challenge

organized in conjunction with ECML PKDD 2013
