News | 15th Discovery Challenge

April 5, 2013 by Juergen

Pos	Diff	Team Name	Score
1		TeamUFCG	0.0262
2		TomFu	0.0158

Teaser #3: I follow whom my name is alike

Once again, it’s time for some number crunching fun… Today, I looked at the interrelationship of first names within Twitter’s Follower graph and got some beautiful results.

For the analysis, I used an excerpt of the Follower graph, consisting of 1,486,403 users and 72,590,619 links (as described here), as well as the name co-occurrence graph based on the English Wikipedia corpus, which is used for calculating name similarities in Nameling (as described in the Nameling papers). The first names of Twitter users were extracted from the users’ profile data, where a user may provide her or his full name. Of course, many users just entered some fantasy name. Accordingly, the first token of the provided name string that matched an entry in our list of known names was chosen as the user’s first name. This process introduces some noise into the data, but due to the vast number of considered pairs of users, this effect should be negligible.
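The first-name extraction described above can be sketched roughly as follows. This is only an illustration, not the script used for the analysis: the function name `extract_first_name`, the tokenization rule, the lowercasing, and the tiny `KNOWN_NAMES` set are all assumptions made for the example.

```python
import re

# Hypothetical stand-in for the list of known given names (assumption).
KNOWN_NAMES = {"juergen", "folke", "anna", "paul"}


def extract_first_name(profile_name, known_names=KNOWN_NAMES):
    """Return the first token of profile_name found in known_names, else None."""
    # Assumption: tokens are runs of letters; digits, underscores,
    # punctuation and whitespace all act as separators.
    for token in re.split(r"[\W\d_]+", profile_name):
        candidate = token.lower()
        if candidate in known_names:
            return candidate
    return None
```

For instance, a profile name such as "Dr. Paul Erdős" would yield "paul" under these assumptions, while a pure fantasy name with no known token yields `None` and the user is skipped.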

Now, relative to 3,078 randomly chosen users (our Linux cluster is still crunching on more), I calculated the average name similarity of direct neighbours in the Follower graph, the average name similarity between pairs of users at a (shortest path) distance of two, …of three, and so on. For reference, I also added the total average name similarity for all considered pairs of users, as depicted by the grey dashed line. Finally, the error bars correspond to the 95% confidence interval.
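The computation described above can be sketched as a breadth-first search from each sampled user, grouping name similarities by shortest path distance. This is a minimal sketch under stated assumptions, not the code that ran on the cluster: the graph is a plain dictionary of adjacency lists that is followed as given, `names` maps users to extracted first names, and `similarity` is a placeholder for the Wikipedia-based name similarity.

```python
from collections import defaultdict, deque


def bfs_distances(graph, source):
    """Shortest path distance (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def avg_similarity_by_distance(graph, names, similarity, sources):
    """Average name similarity between each source and the users at distance d."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for s in sources:
        if s not in names:
            continue  # no first name could be extracted for this user
        for node, d in bfs_distances(graph, s).items():
            if d == 0 or node not in names:
                continue
            totals[d] += similarity(names[s], names[node])
            counts[d] += 1
    return {d: totals[d] / counts[d] for d in sorted(counts)}


# Toy data (hypothetical): a path a - b - c with a crude placeholder similarity.
toy_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
toy_names = {"a": "anna", "b": "anne", "c": "bob"}
toy_similarity = lambda x, y: 1.0 if x[0] == y[0] else 0.0
result = avg_similarity_by_distance(toy_graph, toy_names, toy_similarity, ["a"])
```

The overall average (the reference line in the plot) would be the sum of all `totals` divided by the sum of all `counts`.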

As we can see, users who are located closer to each other within the follower graph tend to have more similar names than distant users. Additionally, the average name similarity decreases monotonically with the shortest path distance in the follower graph. Moreover, users at a distance of up to three tend to have more similar names than average, whereas users with shortest path distances above three tend to have less similar names than average.

Stay tuned for more results (e.g., considering the ReTweet graph and ReTweet frequency) and happy number crunching!

March 21, 2013 by .folke

Participants from 16 countries!

Aloha! Привет! Hola! Shalom! السلام علیکم Olá! Salut! Hallo! Cześć! Hi! 你好 Pozdravljeni! …

Participants from 16 countries have already registered for the challenge. Don’t miss the chance and join! First results will be published on April 1st. We are currently preparing the leaderboard, which will then be updated every Friday.

As naming habits differ across cultural contexts, we are also looking forward to inspiring conversations at the workshop!

March 21, 2013 by .folke

Script for splitting training and test data updated

Today we were made aware of some inconveniences in the Perl script for splitting your training and test set from the public challenge data:

the output file names were not consistent with the description (fixed)
the already anonymous user IDs were anonymized a second time (fixed)
a different date representation was expected (fixed)

The script is still not user-friendly, but at least it prints some messages now. We added an FAQ entry that walks through the process of splitting the public data.

March 18, 2013 by .folke

Teaser #2: Given Names and the Co-Authorship Relation

Science is universal, independent of cultural prejudices and political boundaries – and of course, independent of an author’s name. Or not? Try calculating the average similarity of your name with all of your co-authors’ names, and the average name similarity with your co-authors’ co-authors, and so on.

Read more on our analysis of Paul Erdős’ Collaboration network.

March 8, 2013 by .folke

Teaser #1: How similar are your friends’ names?

Try calculating the overall average similarity of names (e.g. based on co-occurrences in Wikipedia) and the average similarity of you and all your friends’ names. Do these average similarity scores differ significantly?
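A minimal sketch of the two averages in question is given below. The helper names and the toy similarity function are hypothetical; in practice the similarity would come from name co-occurrences (e.g. in Wikipedia), and a proper significance test would still have to be chosen on top of these two numbers.

```python
from itertools import combinations
from statistics import mean


def overall_average(all_names, similarity):
    """Average similarity over all unordered pairs of distinct entries."""
    return mean(similarity(a, b) for a, b in combinations(all_names, 2))


def friends_average(own_name, friend_names, similarity):
    """Average similarity between your own name and each friend's name."""
    return mean(similarity(own_name, f) for f in friend_names)


# Toy example (hypothetical data and placeholder similarity).
toy_similarity = lambda x, y: 1.0 if x[0] == y[0] else 0.0
everyone = ["anna", "anne", "bob"]
overall = overall_average(everyone, toy_similarity)
friends = friends_average("anna", ["anne", "bob"], toy_similarity)
```

Comparing `friends` with `overall` gives a first impression of whether your friends’ names are closer to yours than names are to each other in general.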

March 6, 2013 by .folke

First, there will be an offline competition, where participants predict future search activities based on a training data set which is derived from the name search website nameling.
Then, there will also be an online competition, where participants integrate their recommender systems into the nameling website.

Of course there will be prizes that we will announce later!

15th Discovery Challenge

organized in conjunction with ECML PKDD 2013
