Nothing changed this week, since there were no new result submissions. We are waiting for your recommender result submissions for the next leaderboard, which will be online on April 12th.
| Pos | Diff | Team Name | Score |
|---|---|---|---|
| 1 | | TeamUFCG | 0.0262 |
| 2 | | TomFu | 0.0158 |
Our first baby name heroes are the teams TeamUFCG and TomFu. We are waiting for your recommender result submissions for the next leaderboard, which will be online on April 5th.
| Pos | Diff | Team Name | Score |
|---|---|---|---|
| 1 | | TeamUFCG | 0.0262 |
| 2 | | TomFu | 0.0158 |
Once again, it’s time for some number crunching fun… Today, I looked at the interrelationship of first names within Twitter’s Follower graph and got some beautiful results.
For the analysis, I used an excerpt of the Follower graph, consisting of 1,486,403 users and 72,590,619 links (as described here), as well as the name co-occurrence graph based on the English Wikipedia corpus, which is used for calculating name similarities in Nameling (as described in the Nameling papers). The first names of Twitter users were extracted from the users’ profile data, where a user may provide her or his full name. Of course, many users just entered some fantasy name. Accordingly, the first token of the provided name string that matched an entry in our list of known names was chosen as the user’s first name. This process introduces some noise into the data, but due to the vast number of considered pairs of users, this effect should be negligible.
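The name extraction step can be illustrated with a few lines of Python. This is only a rough sketch and not the code used for the analysis: the function name `extract_first_name`, the tiny `KNOWN_NAMES` set, and the punctuation stripping and lowercasing are all assumptions made for the example.

```python
from typing import Optional, Set

# Hypothetical, tiny stand-in for the real list of known first names.
KNOWN_NAMES: Set[str] = {"anna", "peter", "maria", "john"}


def extract_first_name(profile_name: str,
                       known_names: Set[str] = KNOWN_NAMES) -> Optional[str]:
    """Return the first token of the profile name that is a known first name.

    Tokens are split on whitespace, stripped of surrounding punctuation and
    lowercased before the lookup (an assumption of this sketch).
    Returns None if no token matches.
    """
    for token in profile_name.split():
        candidate = token.strip(".,;:!?()[]\"'").lower()
        if candidate in known_names:
            return candidate
    return None


if __name__ == "__main__":
    for raw in ["Dr. Anna Smith", "xX_darklord_Xx", "cool peter!"]:
        print(repr(raw), "->", extract_first_name(raw))
```

Profiles without any matching token simply yield `None` and would be dropped from the analysis in this sketch.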
Now, relative to 3,078 randomly chosen users (our Linux cluster is still crunching on more), I calculated the average name similarity of direct neighbours in the Follower graph, the average name similarity between pairs of users at a (shortest path) distance of two, …of three, and so on. For reference, I also added the total average name similarity for all considered pairs of users, as depicted by the grey dashed line. Finally, the error bars correspond to the 95% confidence interval.

As we can see, users who are located closer to each other within the follower graph tend to have more similar names than distant users. Additionally, the average name similarity decreases monotonically with the shortest path distance in the follower graph. Moreover, users at a distance of up to three tend to have more similar names than average, whereas users at shortest path distances above three tend to have less similar names than average.
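For readers who want to try this on their own data, the procedure described above can be sketched roughly as follows. This is a toy illustration, not the code that ran on the cluster. It assumes the graph is given as an adjacency dictionary whose edges are followed in the stored direction, that `similarity` is any function on two names (a made-up one is used here), and that the confidence interval is a simple normal approximation.

```python
from collections import defaultdict, deque
from math import sqrt
from statistics import mean, stdev


def bfs_distances(graph, source):
    """Shortest-path distance (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def similarity_by_distance(graph, names, similarity, sources):
    """Collect name similarities of (source, other) pairs, keyed by distance."""
    by_distance = defaultdict(list)
    for source in sources:
        for node, d in bfs_distances(graph, source).items():
            if d > 0:  # skip the source itself
                by_distance[d].append(similarity(names[source], names[node]))
    return by_distance


def mean_with_ci(values, z=1.96):
    """Mean and half-width of a normal-approximation 95% confidence interval."""
    m = mean(values)
    half_width = z * stdev(values) / sqrt(len(values)) if len(values) > 1 else 0.0
    return m, half_width


# Toy data (hypothetical): three users in a chain a -> b -> c.
toy_graph = {"a": ["b"], "b": ["c"], "c": []}
toy_names = {"a": "anna", "b": "anne", "c": "bob"}


def toy_similarity(x, y):
    """Made-up similarity: 1.0 if both names share the first letter, else 0.0."""
    return 1.0 if x[0] == y[0] else 0.0


if __name__ == "__main__":
    grouped = similarity_by_distance(toy_graph, toy_names, toy_similarity, ["a"])
    for d in sorted(grouped):
        print(d, mean_with_ci(grouped[d]))
```

Running one breadth-first search per sampled user keeps the cost manageable, which is why sampling a few thousand start users is attractive on a graph of this size.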
Stay tuned for more results (e.g. considering the ReTweet graph and ReTweet frequency) and happy number crunching!
Aloha! Привет! Hola! Shalom! السلام علیکم Olá! Salut! Hallo! Cześć! Hi! 你好 Pozdravljeni! …
Participants from 16 countries have already registered for the challenge. Don’t miss the chance and join! First results will be published on April 1st. We are currently preparing the leaderboard, which will then be updated every Friday.
As deviating naming habits emerge from different cultural contexts, we are also looking forward to inspiring conversations at the workshop!
Today we were made aware of some inconveniences in the Perl script for splitting your training and test set from the public challenge data:
The script is still not user-friendly, but at least it prints out some messages now. We added an FAQ entry which exemplifies the process of splitting the public data.
Science is universal, independent of cultural prejudices and political boundaries – and of course, independent of an author’s name. Or not? Try calculating the average similarity of your name with all of your co-authors’ names, and the average name similarity with your co-authors’ co-authors, and so on.
Read more on our analysis of Paul Erdős’ Collaboration network.
Try calculating the overall average similarity of names (e.g. based on co-occurrences in Wikipedia) and the average similarity between your name and all your friends’ names. Do these average similarity scores differ significantly?
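As a starting point, here is a small sketch of such a comparison. It assumes cosine similarity over co-occurrence count vectors, which is one common choice and not necessarily the measure used in Nameling. The counts in `cooc` are invented for the example, and the helper names are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def cosine(u, v):
    """Cosine similarity of two sparse count vectors given as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Invented co-occurrence counts: name -> {co-occurring name: count}.
cooc = {
    "anna": {"maria": 3, "peter": 1},
    "anne": {"maria": 3, "peter": 1},
    "bob": {"john": 2},
    "john": {"bob": 2, "peter": 1},
}


def overall_average(cooc):
    """Average similarity over all unordered pairs of names."""
    return mean(cosine(cooc[a], cooc[b]) for a, b in combinations(sorted(cooc), 2))


def friends_average(cooc, me, friends):
    """Average similarity between my name and each of my friends' names."""
    return mean(cosine(cooc[me], cooc[f]) for f in friends)


if __name__ == "__main__":
    print("overall:", overall_average(cooc))
    print("friends:", friends_average(cooc, "anna", ["anne"]))
```

Whether the difference between the two averages is significant would still need a proper test on real data, for example a permutation test over randomly chosen "friends".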
You can find challenge-related publications on our new literature page. Contributions are open via BibSonomy. You will have to join the 20DC13 group.
The challenge is open for participation now!
We are pleased to organize this year’s ECML PKDD Discovery Challenge, tackling the task of recommending given names. The challenge comprises two phases:
Of course, there will be prizes, which we will announce later!