Welcome to this year's ECML PKDD Discovery Challenge. At last year's challenge, a dataset from the Web 2.0 was used for the first time. Researchers from all over the world tried to solve the two tasks, spam detection and tag recommendation. As many researchers missed the first challenge but continue to work on similar tasks, we were asked to hold a second round to give them a chance to apply their latest research results this year. Since last year's spam detection task was addressed very well, we decided to focus on a setup with three different tag recommendation tasks.
ECML PKDD Discovery Challenge 2009
Back from ECML PKDD
September 14th 2009
After a long journey on Friday we are back from Bled where we enjoyed a great week. We conclude this year's challenge by publishing even more results and files and announcing the winners.
Task 1 Content-Based Tag Recommendations
Tag Sources for Recommendation in Collaborative Tagging Systems PDF
Task 2 Graph-Based Recommendations
Factor Models for Tag Recommendation in BibSonomy PDF
Task 3 Online Tag Recommendations
Tag Sources for Recommendation in Collaborative Tagging Systems PDF
The final results of the challenge are available online. Since some participants did not attend the workshop, the list now includes 16 submissions for Task 1 and 13 submissions for Task 2. As discussed at the workshop, we have also put the submitted result files online; they are linked from your submission id on the results page.
The results for Task 3 (online tag recommendations) are also online. The page contains a more detailed evaluation of timeouts, precision/recall, click rates, etc. Most of the diagrams are described in our paper Testing and Evaluating Tag Recommenders in a Live System.
Last but not least, we would like to thank you all for participating in this year's challenge and making the workshop so interesting. We hope you enjoyed the challenge and took a step forward in your research.
Proceedings published at CEUR-WS.org
September 3rd 2009
Latest details on paper and poster presentation
September 2nd 2009
It's time for some information regarding the poster presentation:
- The poster should be at most 0.9 m x 1.9 m.
- You can prepare ONE slide for the minute madness! Please give it to us at the latest during the first coffee break.
Latest details on paper presentation in Bled
July 17th 2009
We plan to hold the workshop in Bled on Monday, September 7th 2009. This will probably give you a broader audience than Friday would, and it also fits nicely with the official announcement of the challenge winners on Monday evening at the conference opening ceremony, which rounds off the workshop nicely.
Until we know the final number of participants, we can't promise that everybody gets time for a talk. However, we promise the top three participants a talk (we plan on 20 minutes plus 10 minutes of discussion). Depending on the total number of participants, the remaining papers get a talk or a poster presentation.
Since there are indications that some of the better-placed participants won't attend the conference (and thus will be dropped from the final results list), we encourage everybody to submit a paper and register for the conference! Chances are good that you'll get a talk, and even if not, we are glad to see you all at a possible poster presentation, where we have a lot of time to discuss your work!
The reason why we are so strict about the need to attend the workshop is that we think the idea of a workshop is to meet each other and exchange ideas. And this is hopefully also why you participated in the challenge and submitted a paper: because you're interested in meeting people who work on topics similar to yours, discussing problems and ideas, and maybe even settling on future cooperation.
All accepted papers will be published in the workshop proceedings, which will be available online and in printed form (to be confirmed). There is no distinction between papers presented as a poster and those presented as a talk. Only papers from registered participants will be published in the proceedings.
Announcement of results and accepted papers
July 16th 2009
After some busy days we can finally present the results of this year's Discovery Challenge. As mentioned on that page, the ranking is subject to change, since attendance and presentation of a paper/poster at the conference are conditions for being on the final list. At that point, the names of the top teams will also be announced.
Furthermore, we have sent out the notifications for your papers. Please read the brief reviews carefully and follow the suggestions of the reviewer - in particular, if you got a "conditional accept".
The schedule is now as follows:
Please register for the conference by July 20th at the regular fee (the reduced fee is no longer available on the registration page). The registration agency has received the list of authors of accepted papers from us and will ensure that you are charged the reduced fee. If you choose to transfer the amount yourself to the agency, transfer only the reduced fee!
Conditionally Accepted Papers
By July 20th, 9 AM CEST, we need an updated version of your paper. Based on this version we will decide by 12 noon CEST whether your paper is finally accepted. You then have until the end of that day to register at the early registration fee (see the procedure for accepted papers above for further details). Of course, you can register earlier, with the drawback that if your paper is not accepted you are nevertheless registered (and invited to visit the workshop, of course!).
Final Version for All Accepted Papers
Improve your paper, in particular by following the suggestions in the brief reviews, by August 7th (a weekend follows that day, so in principle you have until Monday, August 10th, 9 AM CEST).
Tomorrow: announcement of results and accepted papers
July 14th 2009
The number of paper submissions (27) overwhelmed us, so we have to postpone the announcement of the results and the accepted papers until tomorrow. Thanks for your participation and the work you put into writing the papers! Overall, there were more than 20 result submissions for each task, with the best f1m of the second task being almost twice as high as the best f1m of the first task.
Just a quick note regarding the intended paper acceptance procedure tomorrow: some papers will only get a conditional accept. This means the reviews will contain brief information about the improvements the authors have to make to get their papers accepted. The earlier the authors make these improvements, the earlier they will know for sure whether their paper is accepted.
Additionally, we need to know as soon as possible who of you will attend the conference, so that we can ensure you the early registration fee. So please discuss in your group who will travel to Bled in September, and don't hesitate to register already!
Original tas files of test data released
July 09th 2009
You can use them to calculate your performance using the evaluation procedure.
We'll keep you updated about the results and the performance of the submissions!
Test data released
July 06th 2009
The test datasets for the challenge are now available online. You can upload your results using our submission form.
This is the dataset for task 1: 2009-07-01_task1.tgz.
This is the dataset for task 2: 2009-07-01_task2.tgz. Good luck!
Awards for the Winners
June 24th 2009
We proudly announce that we now have awards for the winners of each of the three tasks!
The winning team of the online challenge (task 3) will get a chance to hover an aircraft through large, not so windy rooms: a remote-controlled helicopter is their prize for having the best f1 measure. This prize is sponsored by the Tagora project.
Many thanks to the sponsors for providing these prizes!
Please note: as a requirement to get one of the awards, you need to a) have the best f1 measure (for the first five tags) in the corresponding task, b) submit a sound paper explaining your method (only tasks 1 & 2), and c) appear at ECML PKDD 2009 in September in Bled to give a talk and receive the prize.
Finally, the result submission form for tasks 1 and 2 is now online.
Please make yourself familiar with it, as you need to use it to submit your result files. You can already upload files for testing purposes - we will use only the latest file for evaluation.
Description for Online Challenge available
June 8th 2009
The description of how you can participate in the third task, the online challenge, is now online. We carefully checked that the description is complete and correct; however, in such a complex task mistakes happen easily. So please send us an email if you have questions, find errors, or come across unclear descriptions! In particular, we encourage interested participants to comment on the task and its description: the setting as such is pretty new to us, and we are looking forward to an interesting competition!
Interested parties should contact us during this month (June, not later than June 30th) so that we can estimate the expected number of participants and discuss where to run the algorithms (on our machines or remotely on yours). We have also set up a test platform where you can now test your setup.
Finally, we have detailed the evaluation procedure: We will compute precision and recall for each post in the test data set (regarding the first five recommended tags only!) and then average both over all posts. The final F1-Measure will be computed using the averaged recall and precision as f1m = (2 * precision * recall) / (precision + recall).
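The evaluation above can be sketched in a few lines. This is a minimal illustration of the described procedure, not the official evaluation program; function and variable names are our own.

```python
# Sketch of the evaluation: precision and recall are computed per post
# over the first five recommended tags, averaged over all posts, and
# f1m is computed from the averaged values.
def evaluate(recommendations, gold):
    """recommendations, gold: lists of tag lists, aligned per post."""
    precisions, recalls = [], []
    for rec, true in zip(recommendations, gold):
        top5 = rec[:5]                      # only the first five tags count
        hits = len(set(top5) & set(true))
        precisions.append(hits / len(top5) if top5 else 0.0)
        recalls.append(hits / len(true) if true else 0.0)
    precision = sum(precisions) / len(precisions)
    recall = sum(recalls) / len(recalls)
    denom = precision + recall
    f1m = (2 * precision * recall) / denom if denom else 0.0
    return precision, recall, f1m
```

For a single post with six recommended tags of which one of the two true tags appears among the first five, this yields precision 1/5, recall 1/2, and f1m from those averages.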
Key dates fixed
May 29th 2009
We have now fixed the dates for paper submission (July 10th) and notification of winners (July 14th). Please note that you have to submit a paper describing your approach two days after submitting the results. The submission of a paper is a prerequisite to get publicly announced as winner of the challenge and to present the results at the ECML PKDD workshop.
Although the early registration deadline of ECML PKDD will be July 1st, we ensure that participants whose papers got accepted can also register with the reduced fee.
Apr 22nd 2009
Today we released details on the evaluation for the offline tasks. We describe the test data file formats, the result data file formats, the evaluation measures and provide sample data and the evaluation program.
Apr 8th 2009
We have found an error in our cleansing procedure to generate the dumps for the challenge. As described here, we first clean the tags and then remove empty tags or tags which match one of the tags imported, public, systemimported, nn, systemunfiled. Unfortunately, we accidentally checked against the tags imported, public, system:imported, nn, system:unfiled (note the colon in the system tags). We have now fixed the datasets and linked them on the dataset page. The old dumps are still available as 2009-01-01_cleaned_2009-03-18.tgz and 2009-01-01_cleaned_post-core-2_2009-03-18.tgz.
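The bug is easy to reproduce in a few lines. The sketch below assumes, purely for illustration, a cleaning step that lowercases tags and strips non-alphanumeric characters (so "system:imported" becomes "systemimported"); the actual cleansing procedure is described on the dataset page.

```python
import re

# Hypothetical cleaning step: lowercase and strip non-alphanumerics,
# so "system:imported" becomes "systemimported".
def clean_tag(tag):
    return re.sub(r"[^0-9a-z]", "", tag.lower())

# Correct stop list: colon-free names, matching the *cleaned* form.
STOP_TAGS = {"imported", "public", "systemimported", "nn", "systemunfiled"}

# Buggy stop list: the colon-containing names never match a cleaned tag,
# so the system tags slip through the filter.
BUGGY_STOP_TAGS = {"imported", "public", "system:imported", "nn", "system:unfiled"}

def filter_tags(tags, stop_tags):
    cleaned = (clean_tag(t) for t in tags)
    return [t for t in cleaned if t and t not in stop_tags]

tags = ["system:imported", "web2.0", "public", "folksonomy"]
print(filter_tags(tags, STOP_TAGS))        # system tags are removed
print(filter_tags(tags, BUGGY_STOP_TAGS))  # "systemimported" survives by mistake
```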
We're sorry for the caused trouble!
Mar 25th 2009
The challenge is open for participation now!
This year's discovery challenge consists of three tasks in the area of social bookmarking. All tasks target the support of the user during the tagging process by recommending tags. As we are hosting the social bookmark and publication sharing system BibSonomy, we are able to provide a complete dataset of BibSonomy for the challenge. A training dataset for all tasks is provided at the beginning of the competition. The test dataset will be released 48 hours before the final deadline, except for the online challenge. The presentation of the results will take place at the ECML PKDD workshop where the top teams are invited to present their approaches and results. The winners of each task will be awarded a prize!
To get started with the tasks we suggest that you make yourself familiar with
A more formal description of the underlying structure, which is called a folksonomy, is given in
where a description of the BibSonomy components is also given.
The next step is to subscribe to the mailing list dc09 (closed). We will use the list to distribute news about the challenge and other important information. Furthermore, the list can be used to clarify questions about the dataset and the different tasks. As the welcome message on the list contains information on how to access the dataset, subscribing to this list is essential for participating in the challenge. Update: Since the mailing list is now closed, follow the instructions on how to acquire a BibSonomy dump. You can participate in one, two, or all three of the tasks.
There are three tasks in the area of tag recommendation, each of which focuses on a certain aspect of the problem. All three tasks share the same training dataset: a snapshot of BibSonomy up to December 31st 2008. The dataset is cleaned and consists of two parts, the core part and the complete snapshot. Both are described in detail on the dataset page.
The test dataset will be different for each task.
Task 1: Content-Based Tag Recommendations
The test data for this task contains posts whose user, resource, or tags are not contained in the post-core at level 2 of the training data. Thus, methods which cannot produce tag recommendations for new resources, or are unable to suggest new tags, will very probably not produce good results here.
Task 2: Graph-Based Recommendations
This task is especially intended for methods relying on the graph structure of the training data only. The user, resource, and tags of each post in the test data are all contained in the training data's post-core at level 2.
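The "post-core at level 2" mentioned above can be computed by iteratively discarding posts with infrequent elements until the set stabilizes. The sketch below is a simplified illustration using only users and resources (the actual core definition also constrains tags); the post representation is our own assumption.

```python
from collections import Counter

# Hypothetical post representation: (user, resource, tags) tuples.
def post_core(posts, k=2):
    """Iteratively remove posts whose user or resource occurs in fewer
    than k posts, until the set is stable. A simplified sketch of the
    'post-core at level k'; tag handling is omitted."""
    posts = list(posts)
    while True:
        users = Counter(u for u, _, _ in posts)
        resources = Counter(r for _, r, _ in posts)
        kept = [(u, r, t) for u, r, t in posts
                if users[u] >= k and resources[r] >= k]
        if len(kept) == len(posts):
            return kept
        posts = kept
```

For example, a user or resource that appears in only one post is dropped, which may in turn push other counts below the threshold, hence the loop until a fixed point is reached.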
Task 3: Online Tag Recommendations
This is a bonus task which will take place after Tasks 1 and 2. Participants shall implement a recommendation service which can be called via HTTP by BibSonomy's recommender whenever a user posts a bookmark or publication. All participating recommenders are called on each posting process, and one of them is chosen to actually deliver its results to the user. We can then measure the performance of the recommenders in an online setting, where timeouts matter and where we can observe which tags the user clicked.
You will have to implement a REST-based HTTP service which uses parts of BibSonomy's API XML schema (in particular the TagsType and PostsType). You can then run the service yourself remotely or on one of our servers.
Details about this task can be found here.
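To give a rough idea of the shape of such a service, here is a minimal sketch using Python's standard-library HTTP server. The XML element names and the placeholder recommender are illustrative assumptions only; the real service must follow BibSonomy's API XML schema (TagsType and PostsType) as described in the task details.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def recommend(post_xml: bytes) -> list:
    # Placeholder recommender: a real one would parse the incoming post
    # (XML following BibSonomy's API schema) and compute suggestions.
    return ["web", "folksonomy", "bookmarking", "social", "tagging"]

def tags_to_xml(tags) -> str:
    # Illustrative element names only -- not the real TagsType schema.
    return "<tags>" + "".join('<tag name="%s"/>' % t for t in tags) + "</tags>"

class RecommenderHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        tags = recommend(body)[:5]  # only the first five tags are evaluated
        payload = tags_to_xml(tags).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/xml")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

def serve(port: int = 8000) -> None:
    # Blocking call; run this to expose the recommender over HTTP.
    HTTPServer(("localhost", port), RecommenderHandler).serve_forever()
```

Keeping the handler fast is essential in the online setting, since recommenders that exceed the timeout are not shown to the user.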
- Task description and datasets available online: March 25th
- Test dataset will be released (by midnight CEST): July 6th
- Result submission deadline (by midnight CEST): July 8th
- Workshop paper submission deadline: July 10th
- Notification of winners, publication of results on webpage, notification of acceptance: July 14th
- Workshop proceedings (camera-ready) deadline: August 7th
- ECML PKDD Discovery Challenge Workshop: September 7th
To contact us please send a mail to email@example.com.
- Folke Eisterlehner, University of Kassel
- Andreas Hotho, University of Würzburg
- Robert Jäschke, University of Kassel
The Discovery Challenge is supported by the European Project Tagora - Semiotic Dynamics in Online Social Communities.
To submit your result files, use our submission form.
Your paper must be submitted to the EasyChair submission system in PDF format. Although not required for the initial submission, we recommend following the format guidelines of ECML PKDD (Springer LNCS -- LaTeX Style File), as this will be the required format for accepted papers.
The workshop proceedings will be distributed during the workshop. We plan a post-workshop publication of selected papers with Springer Lecture Notes.
Until the final results are available, have a look here.