...research!

Besides helping future parents find and choose a suitable given name, Nameling also serves as an academic testbed for new given name recommendation techniques.

We encourage the community to participate, and we support other researchers by providing access to an anonymized dataset of the usage patterns that accrue in the running system.

The published usage data consists of activities such as the following:

User ID  Activity Type         Name                              Timestamp
1        ENTER_SEARCH          abdul                             1339779605
2        ENTER_SEARCH          Nik                               1339779618
2        ENTER_SEARCH          Moritz                            1339779629
2        LINK_CATEGORY_SEARCH  Developmental genes and proteins  1339779647
3        ENTER_SEARCH          henriette                         1339779674
3        LINK_SEARCH           Reinhard                          1339789647
3        ADD_FAVORITE          Maite                             1339789901

In this example, user 1 entered the name "abdul" in Nameling's search form on Fri, 15 Jun 2012 17:00:05 GMT (Unix time 1339779605). User 2 entered the names "Nik" and "Moritz" and then followed the category link "Developmental genes and proteins". Finally, user 3 entered the name "henriette", followed the link to the name "Reinhard", and added the name "Maite" to the list of favorite names.
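The timestamp column can be turned into a readable date with a few lines of Python. This is only a minimal sketch; the helper name unix_to_utc is ours and not part of the dataset or its scripts.

```python
from datetime import datetime, timezone


def unix_to_utc(ts: int) -> datetime:
    """Convert a Unix timestamp (seconds since 1970-01-01 UTC) to an aware datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


# Timestamp of the first activity in the example above.
print(unix_to_utc(1339779605).isoformat())
# 2012-06-15T17:00:05+00:00
```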

Please note that the activities within the activity log files are not necessarily in chronological order and that no data cleansing took place. All data is UTF-8 encoded and published as a tab-separated plain text file.
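As an illustration, a log in this format could be read and put into chronological order as sketched below. The sketch assumes four tab-separated columns in the order shown in the example (user ID, activity type, name, timestamp). Because the data is not cleansed, it simply skips rows that do not fit that shape, which is our choice and not a documented rule. The names Activity and parse_activity_log are hypothetical.

```python
import csv
from typing import Iterable, List, NamedTuple


class Activity(NamedTuple):
    user_id: int
    activity_type: str
    name: str
    timestamp: int  # Unix time in seconds


def parse_activity_log(lines: Iterable[str]) -> List[Activity]:
    """Parse tab-separated activity lines and return them sorted by timestamp."""
    activities = []
    # QUOTE_NONE: treat quote characters in names as ordinary text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 4:
            continue  # assumption: rows with a different column count are skipped
        user_id, activity_type, name, timestamp = row
        try:
            activities.append(
                Activity(int(user_id), activity_type, name, int(timestamp))
            )
        except ValueError:
            continue  # e.g. a header row or a non-numeric ID or timestamp
    # The dump is not guaranteed to be chronological, so sort explicitly.
    activities.sort(key=lambda a: a.timestamp)
    return activities


sample = [
    "3\tLINK_SEARCH\tReinhard\t1339789647\n",
    "1\tENTER_SEARCH\tabdul\t1339779605\n",
    "2\tLINK_CATEGORY_SEARCH\tDevelopmental genes and proteins\t1339779647\n",
]
for activity in parse_activity_log(sample):
    print(activity.user_id, activity.activity_type, activity.name)
```

To read an actual dump, an open file can be passed in place of the sample list, for example one opened with open(path, encoding="utf-8", newline=""), where path points to the extracted log file.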

Dumps for Research Purposes

Please understand that data privacy is an important issue, especially in the context of searching for a baby name, and that we want to prevent commercial use of Nameling's search profiles. Before you get access to the dataset, you therefore have to sign our license agreement and send a digitized copy by e-mail to nameling at cs.uni-kassel.de. Upon receipt of your signed license agreement, we will send you instructions on how to access the dataset.

We are very interested in results obtained with the help of this dataset, so please inform us about your publications. When referring to this data in publications, please cite the following article:

Folke Mitzlaff, Stephan Doerfel, Andreas Hotho, Robert Jäschke, and Juergen Mueller: Summary of the 15th discovery challenge: Recommending given names. 15th Discovery Challenge of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, volume 1120, pages 7–24, Aachen, Germany, January 2014. CEUR-WS. URL ceur-ws.org/Vol-1120. Download: PDF.

If you want to refer to the system, please use the following publication:

Folke Mitzlaff and Gerd Stumme: Namelings - Discover Given Name Relatedness Based on Data from the Social Web. 4th International Conference on Social Informatics, SocInfo 2012, volume 7710 of Lecture Notes in Computer Science, pages 531–534, Berlin / Heidelberg, Germany, 2012. Springer. DOI: 10.1007/978-3-642-35386-4_39. Download: PDF.

Downloads

File                              Size     Description
20DC13_Offline_Challenge.tar.bz2  3.55 MB  20DC13 - The train and test data from the offline challenge.
20DC13_Online_Challenge.tar.bz2   8.79 MB  20DC13 - The usage and approximated location data from the online challenge as well as the mapping between the user IDs from offline to online challenge.
20DC13_Scripts.tar.bz2            3.47 KB  20DC13 - All scripts needed to reproduce the applied data preprocessing and evaluation.
20DC13_Supplements.tar.bz2        55.1 MB  20DC13 - Supplementary dataset containing the name list, top 100 similar names, and approximate geo locations.
20DC13_Top6_Submissions.tar.bz2   31.7 MB  20DC13 - Submitted recommendations of the top 6 participants who also submitted a paper to the workshop.
activitylog_20120812.tar.bz2      2.43 MB  Nameling research dump from 2012-08-12 containing 38,723 users with 361,751 activities.

Related Links