ECML PKDD Discovery Challenge 2009

Welcome to this year's ECML PKDD Discovery Challenge. Last year's challenge was the first to use a dataset from the Web 2.0. Researchers from all over the world tried to solve its two tasks, spam detection and tag recommendation. Since a lot of researchers missed the first challenge but continue to work on similar tasks, we were asked for a second round to give them a chance to apply their latest research results this year. As last year's spam detection task was very well addressed, we decided to focus on a setup with three different tag recommendation tasks.

Latest News

Back from ECML PKDD

After a long journey on Friday we are back from Bled where we enjoyed a great week. We conclude this year's challenge by publishing even more results and files and announcing the winners.

The Winners

We are proud to present the winners of this year's ECML PKDD Discovery Challenge. To see the second and third places for each task, please have a look at the results page for Tasks 1 and 2 and Task 3.

Task 1 Content-Based Tag Recommendations

Tag Sources for Recommendation in Collaborative Tagging Systems, by Marek Lipczak, Yeming Hu, Yael Kollet, and Evangelos Milios (PDF)

Task 2 Graph-Based Recommendations

Factor Models for Tag Recommendation in BibSonomy, by Steffen Rendle and Lars Schmidt-Thieme (PDF)

Task 3 Online Tag Recommendations

Tag Sources for Recommendation in Collaborative Tagging Systems, by Marek Lipczak, Yeming Hu, Yael Kollet, and Evangelos Milios (PDF)


The slides of our opening remarks and closing remarks are now online. They include an announcement of the winners and some detailed evaluation.

Final Results

The final results of the challenge are available online. Since some participants did not attend the workshop, the list now includes 16 submissions for Task 1 and 13 submissions for Task 2. As discussed at the workshop, we also put your submitted result files online. They are linked to your submission ID on the results page.

The results for Task 3 (online tag recommendations) are also online. The page contains a more detailed evaluation of timeouts, precision/recall, click rates, etc. Most of the diagrams are described in our paper Testing and Evaluating Tag Recommenders in a Live System.

Thank You

Last but not least, we would like to thank you all for participating in this year's challenge and creating such an interesting workshop. We hope you enjoyed the challenge and took a step forward in your research.

Proceedings published

The proceedings are now published as Volume 497. The complete PDF is also available here. You can cite a paper from the proceedings using this post as a template.

Latest details on paper and poster presentation

It's time for some information regarding the poster presentation:

  • The maximum size of the poster is 0.9 m x 1.9 m.
  • You can prepare ONE slide for the minute madness! Please give it to us during the first coffee break at the latest.
We'll keep you updated about further things you should know!

Latest details on paper presentation in Bled

We plan to hold the workshop in Bled on Monday, September 7th, 2009. This will probably give you a broader audience than on Friday, and it also fits nicely with the official announcement of the challenge winners on Monday evening at the conference opening ceremony, which nicely completes the workshop.

Paper presentation

Until we know the final number of participants, we can't promise that everybody gets time for a talk. However, we promise that the first three participants will get a talk (we plan for 20 minutes plus 10 minutes of discussion). Depending on the total number of participants, the remaining papers get a talk or a poster presentation.

Since there are some indications that some of the better placed participants won't attend the conference (and thus will be removed from the final result list), we encourage everybody to submit a paper and register for the conference! Chances are good that you'll get a talk, and even if not, we will be glad to see you all at a possible poster presentation where we have a lot of time to discuss your work!

The reason why we are so strict about the need to attend the workshop is that we think the idea of a workshop is to meet each other and exchange ideas. This is hopefully also the reason why you participated in the challenge and submitted a paper: because you're interested in meeting people who work on similar topics, discussing problems and ideas, and maybe even arranging future cooperation.

Workshop proceedings

All accepted papers will be published in the workshop proceedings, which will be available online and in printed form (to be confirmed). There is no distinction between papers which have been presented as a poster and those presented as a talk. Only papers from registered participants will be published in the proceedings.

Announcement of results and accepted papers

After some busy days we can finally present the results of this year's Discovery Challenge. As mentioned on that page, the ranking is subject to change, since participation and presentation of a paper/poster at the conference are a condition for being on the final list. The names of the top teams will also be announced then.

Furthermore, we have sent out the notifications for your papers. Please read the brief reviews carefully and follow the suggestions of the reviewer - in particular, if you got a "conditional accept".

The schedule is now as follows:

Accepted Papers

Please register for the conference by July 20th with the regular fee. (The reduced fee is no longer available on the registration page.) The registration agency got the list of authors of accepted papers from us and will ensure you get the reduced fee. If you choose to transfer the amount to the agency yourself, transfer only the reduced fee!

Conditionally Accepted Papers

We need an updated version of your paper by July 20th, 2009, 9 AM CEST. Based on this version we will decide by 12 AM CEST whether your paper is finally accepted. You then have time to register until the end of that day to get the early registration fee (see the procedure above for accepted papers for further details). Of course, you can register earlier, with the drawback that if your paper is not accepted you're nevertheless registered (and invited to visit the workshop, of course!).

Final Version for All Accepted Papers

Please improve your paper, in particular by following the suggestions in the brief reviews, by August 7th (a weekend follows that day, so again, in principle you have time until Monday, August 10th, 9 AM CEST).

Tomorrow: announcement of results and accepted papers

The number of paper submissions (27) overwhelmed us, so we have to postpone the announcement of the results and the accepted papers to tomorrow. Thanks for your participation and the work you put into writing the papers! Overall, there were more than 20 result submissions for each task, with the best f1m of the second task being almost twice as high as the best f1m of the first task.

Just a quick note regarding the intended paper acceptance procedure tomorrow: some papers will get a conditional accept only. This means the reviews will contain brief information about the improvements the authors have to make to get their papers accepted. The earlier the authors make these improvements, the earlier they will know for sure whether their paper is accepted.
Additionally, we need to know as soon as possible which of you will attend the conference, so that we can ensure the early registration fee. Thus, please discuss in your group who will travel to Bled in September, and don't hesitate to register already!

Original tas files of test data released

After a busy night and more than forty submissions, the original tas files for the test data of task 1 and task 2 are now available online.

You can use them to calculate your performance using the evaluation procedure.

We'll keep you updated about the results and the performance of the submissions!

Test data released

The test datasets for the challenge are now available online. You can upload your results using our submission form.

This is the dataset for task 1: 2009-07-01_task1.tgz.

This is the dataset for task 2: 2009-07-01_task2.tgz.

Good luck!

Awards for the Winners

We proudly announce that we now have awards for the winners of each of the three tasks!

The winning teams of task 1 and 2 each will get one mobile phone (E71) sponsored by NOKIA. A nice thing about that model is that it works reliably all over the world. Happy calling!

The winning team of the online challenge (task 3) will get a chance to hover an aircraft through large, not so windy rooms: a remote-controlled helicopter is their prize for having the best f1 measure. This prize is sponsored by the Tagora project.

Many thanks to the sponsors for providing these prizes!

Please note: as a requirement to get one of the awards, you need to a) have the best f1 measure (for the first five tags) in the corresponding task, b) submit a sound paper explaining your method (only tasks 1 & 2), and c) appear at ECML PKDD 2009 in September in Bled to give a talk and receive the prize.

Finally, the result submission form for tasks 1 and 2 is now online.
Please familiarize yourself with it, as you need to use it to submit your result files. You can already upload files for testing purposes - we will use only the latest file for evaluation.

Description for Online Challenge available

The description of how you can participate in the third task - the online challenge - is now online. We carefully checked that the description is complete and correct; however, in such a complex task mistakes easily happen. Thus, please send us an email if you have questions or find errors or unclear descriptions! In particular, we encourage interested participants to comment on the task and the description - the setting as such is pretty new to us and we are looking forward to an interesting competition!

Interested parties should contact us during this month (June, not later than June 30th) so that we can estimate the expected number of participants and discuss where to run the algorithms (on our machines or remotely on yours). We have also set up a test platform where you can now test your setting.

Finally, we have detailed the evaluation procedure: We will compute precision and recall for each post in the test data set (regarding the first five recommended tags only!) and then average both over all posts. The final F1-Measure will be computed using the averaged recall and precision as f1m = (2 * precision * recall) / (precision + recall).
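The averaging described above can be sketched in a few lines of Python. This is only an illustration, not the official evaluation program: the function name `evaluate`, the dictionary-based input format, and the handling of edge cases (duplicate recommendations are dropped, a post without recommendations counts as precision 0) are assumptions made for this example.

```python
def evaluate(recommended, truth, k=5):
    """Return (precision, recall, f1m) averaged over all posts in `truth`.

    recommended: dict mapping a post id to an ordered list of recommended tags
    truth:       dict mapping a post id to the tags the user actually assigned
    Both input formats are assumptions made for this sketch.
    """
    precisions, recalls = [], []
    for post_id, true_tags in truth.items():
        true_set = set(true_tags)
        # Keep only the first k distinct recommendations, preserving order.
        top_k = list(dict.fromkeys(recommended.get(post_id, [])))[:k]
        hits = sum(1 for tag in top_k if tag in true_set)
        # Assumption: no recommendation (or no true tag) contributes 0.
        precisions.append(hits / len(top_k) if top_k else 0.0)
        recalls.append(hits / len(true_set) if true_set else 0.0)

    if not precisions:
        return 0.0, 0.0, 0.0

    # Average precision and recall over all posts first ...
    precision = sum(precisions) / len(precisions)
    recall = sum(recalls) / len(recalls)
    # ... and only then combine the two averages into the F1 measure.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1m = (2 * precision * recall) / (precision + recall)
    return precision, recall, f1m
```

Note that f1m is computed once from the averaged precision and recall; it is not an average of per-post F1 values, so the two quantities generally differ.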

Key dates fixed

We have now fixed the dates for paper submission (July 10th) and notification of winners (July 14th). Please note that you have to submit a paper describing your approach two days after submitting the results. The submission of a paper is a prerequisite to get publicly announced as winner of the challenge and to present the results at the ECML PKDD workshop.

Although the early registration deadline of ECML PKDD is July 1st, we will ensure that participants whose papers are accepted can also register with the reduced fee.

Evaluation details

Today we released details on the evaluation for the offline tasks. We describe the test data file formats, the result data file formats, the evaluation measures and provide sample data and the evaluation program.

If you have any questions regarding the evaluation, don't hesitate to contact us, or discuss it on the mailing list (closed).

Updated datasets

We have found an error in our cleansing procedure to generate the dumps for the challenge. As described here, we first clean the tags and then remove empty tags or tags which match one of the tags imported, public, systemimported, nn, systemunfiled. Unfortunately, we accidentally checked against the tags imported, public, system:imported, nn, system:unfiled (note the colon in the system tags). We have now fixed the datasets and linked them on the dataset page. The old dumps are still available as 2009-01-01_cleaned_2009-03-18.tgz and 2009-01-01_cleaned_post-core-2_2009-03-18.tgz.
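To illustrate why the colon mattered, here is a minimal sketch of such a two-step cleansing. It is not the script used to build the dumps: the cleaning rule (lowercase the tag and drop every character that is not a letter or digit) and the names `clean_tag` and `clean_tags` are assumptions made for this example.

```python
# Tags to drop after cleaning, as listed in the text above.
REMOVED_TAGS = {"imported", "public", "systemimported", "nn", "systemunfiled"}

# The faulty list: the system tags still contain a colon.
FAULTY_REMOVED_TAGS = {"imported", "public", "system:imported", "nn", "system:unfiled"}


def clean_tag(tag):
    # Assumed cleaning rule: lowercase, keep letters and digits only.
    return "".join(ch for ch in tag.lower() if ch.isalnum())


def clean_tags(tags, removed=REMOVED_TAGS):
    """Clean each tag, then drop empty tags and tags on the removal list."""
    cleaned = (clean_tag(tag) for tag in tags)
    return [tag for tag in cleaned if tag and tag not in removed]


raw = ["system:imported", "Web 2.0", "---", "public"]
fixed = clean_tags(raw)                        # intended behaviour
faulty = clean_tags(raw, FAULTY_REMOVED_TAGS)  # behaviour with the wrong list
```

Under this assumed rule the colon is already gone when the comparison happens, so a removal list that still contains "system:imported" can never match and the system tag stays in the data.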

We apologize for the trouble this has caused!

Challenge opened

The challenge is open for participation now!


This year's discovery challenge consists of three tasks in the area of social bookmarking. All tasks target the support of the user during the tagging process by recommending tags. As we are hosting the social bookmark and publication sharing system BibSonomy, we are able to provide a complete dataset of BibSonomy for the challenge. A training dataset for all tasks is provided at the beginning of the competition. The test dataset will be released 48 hours before the final deadline, except for the online challenge. The presentation of the results will take place at the ECML PKDD workshop where the top teams are invited to present their approaches and results. The winners of each task will be awarded a prize!

To get started with the tasks, we suggest that you familiarize yourself with BibSonomy. A more formal description of the underlying structure, which is called a folksonomy, is given in this paper (pdf here), which also describes the BibSonomy components.
The next step is to subscribe to the mailing list dc09 (closed). We will use the list to distribute news about the challenge and other important information. Furthermore, the list can be used to clarify questions about the dataset and the different tasks. As the welcome message on the list contains information about how to access the dataset, subscribing to this list is essential to participate in the challenge. Update: Since the mailing list is closed now, follow the instructions on how to acquire a BibSonomy dump. You can participate in one, two, or all three of the tasks.


There are three tasks in the area of tag recommendation, each of which focuses on a certain aspect of this problem. All three tasks get the same dataset for training. It is a snapshot of BibSonomy until December 31st, 2008. The dataset is cleaned and consists of two parts, the core part and the complete snapshot. Both datasets are described in detail on the dataset page.

The test dataset will be different for each task.

Task 1: Content-Based Tag Recommendations

The test data for this task contains posts whose user, resource, or tags are not contained in the post-core at level 2 of the training data. Thus, methods which can't produce tag recommendations for new resources or are unable to suggest new tags will very probably not produce good results here.

Task 2: Graph-Based Recommendations

This task is especially intended for methods relying on the graph structure of the training data only. The user, resource, and tags of each post in the test data are all contained in the training data's post-core at level 2.
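The difference between the test data of Task 1 and Task 2 comes down to a membership check against the post-core. The sketch below shows that check only; it assumes the users, resources, and tags of the training data's post-core at level 2 have already been extracted into sets, and the `Post` type and the function names are hypothetical.

```python
from typing import FrozenSet, NamedTuple, Set


class Post(NamedTuple):
    # Hypothetical minimal representation of a post for this sketch.
    user: str
    resource: str
    tags: FrozenSet[str]


def in_post_core(post: Post, core_users: Set[str],
                 core_resources: Set[str], core_tags: Set[str]) -> bool:
    """True if user, resource, and all tags occur in the given post-core."""
    return (post.user in core_users
            and post.resource in core_resources
            and post.tags <= core_tags)


def matching_task(post: Post, core_users: Set[str],
                  core_resources: Set[str], core_tags: Set[str]) -> str:
    # Task 2 test posts lie entirely inside the post-core; a post with an
    # unseen user, resource, or tag fits the Task 1 description instead.
    if in_post_core(post, core_users, core_resources, core_tags):
        return "task2"
    return "task1"
```

A recommender that only knows the graph of the post-core can score every candidate in a "task2" post, while a "task1" post contains at least one element it has never seen.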

Task 3: Online Tag Recommendations

This is a bonus task which will take place after Tasks 1 and 2. The participants shall implement a recommendation service which can be called via HTTP by BibSonomy's recommender when a user posts a bookmark or publication. All participating recommenders are called on each posting process, and one of them is chosen to actually deliver the results to the user. We can then measure the performance of the recommenders in an online setting, where timeouts are important and where we can measure which tags the user clicked.

You will have to implement a REST-based HTTP service which uses parts of BibSonomy's API XML schema (in particular the TagsType and PostsType). You can then run the service yourself remotely or on one of our servers.

Details about this task can be found here.

Key Dates

  • Task description and datasets available online: March 25th
  • Test dataset will be released (by midnight CEST): July 6th
  • Result submission deadline (by midnight CEST): July 8th
  • Workshop paper submission deadline: July 10th
  • Notification of winners, publication of results on webpage, notification of acceptance: July 14th
  • Workshop proceedings (camera-ready) deadline: August 7th
  • ECML PKDD Discovery Challenge Workshop: September 7th


To contact us please send a mail to

The Discovery Challenge is supported by the European Project Tagora - Semiotic Dynamics in Online Social Communities.


To submit your result files, use our submission form.

Your paper must be submitted to the EasyChair submission system in PDF format. Although not required for the initial submission, we recommend following the format guidelines of ECML PKDD (Springer LNCS -- LaTeX Style File), as this will be the required format for accepted papers.

The workshop proceedings will be distributed during the workshop. We plan to issue a post workshop publication of selected papers by Springer Lecture Notes.


Until the final results are available, have a look here.