Teaser #1: How similar are your friends’ names?

Try calculating the overall average similarity of names (e.g. based on co-occurrences in Wikipedia) and the average similarity of your own and your friends’ names. Do these two averages differ significantly?

Here are the results for the 20DC13 team:

Our team members’ first names are Stephan, Andreas, Robert, Folke and Jürgen (ordered alphabetically by last name).

We constructed the name co-occurrence graph based on sentences within the English Wikipedia, as described in our papers. Each name can then be represented by its “context” vector, i.e., the corresponding row of the co-occurrence graph’s adjacency matrix. We then calculated the similarity between two names as the cosine similarity between the corresponding context vectors. (This is, by the way, the similarity measure implemented in nameling; the respective top 100 most similar name pairs are available for download.)
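To make the similarity measure concrete, here is a minimal sketch of cosine similarity between context vectors. The names and co-occurrence counts are hypothetical toy values, not taken from Wikipedia or from nameling, and the function name `cosine_similarity` is chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length context vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption: a name without any co-occurrences is treated
        # as having similarity 0 to every other name.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy rows of a co-occurrence adjacency matrix:
# entry j of a row counts the sentences shared with context name j.
context = {
    "Anna": [4, 0, 2, 1],
    "Maria": [2, 0, 1, 0],
    "Bob": [0, 3, 0, 5],
}

for first, second in [("Anna", "Maria"), ("Anna", "Bob")]:
    score = cosine_similarity(context[first], context[second])
    print(first, second, round(score, 6))
```

Since co-occurrence counts are non-negative, the score always lies between 0 and 1, and names that appear in similar contexts score close to 1.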

Well, here are the pair-wise similarity scores for the 20DC13 team:

Name1 Name2 Similarity
Stephan Andreas 0.901121
Stephan Robert 0.789887
Stephan Folke 0.549801
Stephan Jürgen 0.806095
Andreas Robert 0.688174
Andreas Folke 0.558555
Andreas Jürgen 0.849864
Robert Folke 0.465395
Robert Jürgen 0.569373
Folke Jürgen 0.474674

On average, our team’s pair-wise similarity score is accordingly 0.665294. The overall average pair-wise similarity is 0.02914, so our team’s score is more than 22 times the overall average. Additionally, we repeatedly selected random groups of names of the same size as our team (100,000 repetitions) and calculated the respective average group similarity, resulting in the following histogram:

Histogram of Average Random Team Similarity

So yes, our team’s average name similarity is significantly larger than expected by chance!
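The numbers above can be re-derived with a short sketch. The first part only uses the ten pair-wise scores from the table and the overall average of 0.02914. The second part outlines the random-group experiment; the helper names (`average_group_similarity`, `empirical_p_value`) and the add-one correction are assumptions made for illustration, and summarising the histogram as an empirical p-value is one possible reading, not necessarily how the result was originally computed.

```python
import random
from itertools import combinations
from statistics import mean

# Pair-wise similarity scores from the table above.
team_pairs = {
    ("Stephan", "Andreas"): 0.901121,
    ("Stephan", "Robert"): 0.789887,
    ("Stephan", "Folke"): 0.549801,
    ("Stephan", "Jürgen"): 0.806095,
    ("Andreas", "Robert"): 0.688174,
    ("Andreas", "Folke"): 0.558555,
    ("Andreas", "Jürgen"): 0.849864,
    ("Robert", "Folke"): 0.465395,
    ("Robert", "Jürgen"): 0.569373,
    ("Folke", "Jürgen"): 0.474674,
}
OVERALL_AVERAGE = 0.02914  # overall average pair-wise similarity from the post

team_average = mean(team_pairs.values())
ratio = team_average / OVERALL_AVERAGE
print(f"team average: {team_average:.6f}, ratio to overall: {ratio:.1f}")


def average_group_similarity(names, similarity):
    """Mean similarity over all unordered pairs within a group of names."""
    return mean(similarity(a, b) for a, b in combinations(names, 2))


def empirical_p_value(all_names, similarity, group_size, observed,
                      repetitions, rng):
    """Share of random groups whose average similarity reaches `observed`.

    `all_names` must be a list; `similarity` is any function of two names,
    e.g. cosine similarity of their context vectors. The add-one
    correction (an assumption) keeps the estimate above zero.
    """
    hits = 0
    for _ in range(repetitions):
        group = rng.sample(all_names, group_size)
        if average_group_similarity(group, similarity) >= observed:
            hits += 1
    return (hits + 1) / (repetitions + 1)
```

With the full name list and the real similarity function, a call such as `empirical_p_value(names, similarity, 5, team_average, 100_000, random.Random(0))` would mirror the 100,000 repetitions described above; here `names` and `similarity` are placeholders for data that is not part of this post.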

Happy number crunching!
.folke

One comment

  1. Hi!

    That’s pretty interesting, and it shows that the similarity between names of people from the same country, generation and education level is much higher than between random names. I wonder about the other teams who organized this challenge before: how similar are their names? For instance, a French-Greek team organized the 2012 challenge. I’m pretty sure the similarities within the “Greek” part and within the “French” part will be higher than that of the team as a whole. What do you think?