Dr. Stephan Doerfel
- Universität Kassel, Fachgebiet Wissensverarbeitung
- Wilhelmshöher Allee 73
- 34121 Kassel (Germany)
- Room 0445F
- Phone: +49 (0)561 804-6252
- Fax: +49 (0)561 804-6259
I earned my PhD at the University of Kassel in 2017 for my thesis "Supporting Researchers: Analyzing the Scholarly Publication Life Cycle and Social Bookmarking Systems".
My research there includes the investigation of scientometrics on the web (altmetrics), recommender systems (in general and specifically for scholarly literature), and their possible applications.
Further interests include formal concept analysis, social network analysis, and machine learning in general.
I was also a senior developer of the blue social bookmark and publication sharing system BibSonomy. In 2016, I joined Micromata GmbH as a data scientist and big data engineer.
Full list of my publications and posters.
- Zoller, D., Doerfel, S., Jäschke, R., Stumme, G., Hotho, A.: Posted, visited, exported: Altmetrics in the social tagging system BibSonomy. Journal of Informetrics. 10, 732-749 (2016).
  Abstract: In social tagging systems, like Mendeley, CiteULike, and BibSonomy, users can post, tag, visit, or export scholarly publications. In this paper, we compare citations with metrics derived from users’ activities (altmetrics) in the popular social bookmarking system BibSonomy. Our analysis, using a corpus of more than 250,000 publications published before 2010, reveals that overall, citations and altmetrics in BibSonomy are mildly correlated. Furthermore, grouping publications by user-generated tags results in topic-homogeneous subsets that exhibit higher correlations with citations than the full corpus. We find that posts, exports, and visits of publications are correlated with citations and even bear predictive power over future impact. Machine learning classifiers predict whether the number of citations that a publication receives in a year exceeds the median number of citations in that year, based on the usage counts of the preceding year. In that setup, a Random Forest predictor outperforms the baseline on average by seven percentage points.
- Doerfel, S., Jäschke, R., Stumme, G.: The Role of Cores in Recommender Benchmarking for Social Bookmarking Systems. ACM Transactions on Intelligent Systems and Technology. 7, 40:1-40:33 (2016).
  Abstract: Social bookmarking systems have established themselves as an important part in today’s web. In such systems, tag recommender systems support users during the posting of a resource by suggesting suitable tags. Tag recommender algorithms have often been evaluated in offline benchmarking experiments. Yet, the particular setup of such experiments has rarely been analyzed. In particular, since the recommendation quality usually suffers from difficulties like the sparsity of the data or the cold start problem for new resources or users, datasets have often been pruned to so-called cores (specific subsets of the original datasets), however without much consideration of the implications on the benchmarking results. In this paper, we generalize the notion of a core by introducing the new notion of a set-core, which is independent of any graph structure, to overcome a structural drawback in the previous constructions of cores on tagging data. We show that problems caused by some types of cores can be eliminated using set-cores. Further, we present a thorough analysis of tag recommender benchmarking setups using cores. To that end, we conduct a large-scale experiment on four real-world datasets in which we analyze the influence of different cores on the evaluation of recommendation algorithms. We can show that the results of the comparison of different recommendation approaches depend on the selection of core type and level. For the benchmarking of tag recommender algorithms, our results suggest that the evaluation must be set up more carefully and should not be based on one arbitrarily chosen core type and level.
- Doerfel, S., Zoller, D., Singer, P., Niebler, T., Hotho, A., Strohmaier, M.: What Users Actually do in a Social Tagging System: A Study of User Behavior in BibSonomy. ACM Transactions on the Web. 10, 14:1-14:32 (2016).
  Abstract: Social tagging systems have established themselves as an important part in today’s web and have attracted the interest of our research community in a variety of investigations. Henceforth, several aspects of social tagging systems have been discussed and assumptions have emerged on which our community builds their work. Yet, testing such assumptions has been difficult due to the absence of suitable usage data in the past. In this work, we thoroughly investigate and evaluate four aspects about tagging systems, covering social interaction, retrieval of posted resources, the importance of the three different types of entities (users, resources, and tags), as well as connections between these entities’ popularity in posted and in requested content. For that purpose, we examine live server log data gathered from the real-world, public social tagging system BibSonomy. Our empirical results paint a mixed picture about the four aspects. While for some, typical assumptions hold to a certain extent, other aspects need to be reflected in a very critical light. Our observations have implications for the understanding of social tagging systems, and the way they are used on the web. We make the dataset used in this work available to other researchers.
- Doerfel, S., Zoller, D., Singer, P., Niebler, T., Hotho, A., Strohmaier, M.: How Social is Social Tagging? Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web Companion. pp. 251-252. International World Wide Web Conferences Steering Committee, Seoul, Korea (2014).
  Abstract: Social tagging systems have established themselves as an important part in today's web and have attracted the interest of our research community in a variety of investigations. This has led to several assumptions about tagging, such as that tagging systems exhibit a social component. In this work we overcome the previous absence of data for testing such an assumption. We thoroughly study social interaction, leveraging for the first time live log data gathered from the real-world public social tagging system BibSonomy. Our results indicate that sharing of resources constitutes an important and indeed social aspect of tagging.
- Doerfel, S., Hotho, A., Kartal-Aydemir, A., Roßnagel, A., Stumme, G.: Informationelle Selbstbestimmung im Web 2.0 - Chancen und Risiken sozialer Verschlagwortungssysteme. Vieweg + Teubner Verlag (2013).
  Abstract: The new generation of the Internet ("Web 2.0" or "Social Web") is characterized by a very liberal provision of information by its users. Against this background, computer scientists and legal scholars have explored and shaped the opportunities and risks of the new Web 2.0 technologies in close interaction. After taking stock, the technical and legal opportunities and risks are analyzed with respect to typified tasks. Generic concepts for the privacy-compliant design of an application, such as identity management, avoidance of personal references, profiling, and responsibilities, are developed. In parallel, algorithms and methods for these concepts are presented: recommender systems for collaborative tagging systems as well as spam detection methods for such systems. They are evaluated on real data. All results are illustrated using the social bookmarking system BibSonomy. Finally, the book discusses to what extent the doctrine and interpretation of data protection law must change because of the new problems posed by Web 2.0, and whether legislative action may be necessary or advisable.
Projects
These are the projects I have mainly worked on. Through my research, I have further been involved with the projects Venus, EveryAware, and PoSTs.
PUMA
In the DFG-funded project "Akademisches Publikationsmanagement" (PUMA), we used and extended the BibSonomy software to create a new web portal that is run locally at an institution (library, university, etc.) and that integrates with the existing ecosystem there by connecting to the local open access repository, the publication discovery service, or the eLearning platform.
Info 2.0
In the DFG-funded project "Informationelle Selbstbestimmung im Web 2.0" (Info 2.0), we analyzed opportunities and risks in Web 2.0 systems with regard to privacy. A second aspect that was investigated was the consequences of user-generated ratings and reviews of products, and particularly of scholarly publications. The project's results have been published as a book.
BibSonomy
BibSonomy is a scholarly social bookmarking system where researchers manage their collections of publications and web pages. BibSonomy is an open source project, continuously developed by researchers in Kassel, Würzburg, and Hanover. Functioning as a test bed for recommendation and ranking algorithms, and through its publicly available datasets containing traces of user behavior on the Web, BibSonomy has been the subject of various scientific studies.
Full list of my reviewing and teaching activities.
- PC Member: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2017), September 18 - 22, 2017, Skopje, Macedonia.
- PC Member: 3rd International Workshop on Machine learning, Optimization and Big Data (MOD 2017), September 14 - 17, 2017, Volterra, Italy.
- PC Member: 23rd International Symposium on Methodologies for Intelligent Systems (ISMIS 2017), June 26 - 28, 2017, Warsaw, Poland.
- Journal Reviewer: Information Processing & Management (Elsevier) (2017)
- PC Member: 14th International Conference on Formal Concept Analysis (ICFCA 2017), June 12 - 16, 2017, Rennes, France.
- Journal Reviewer: Knowledge-Based Systems (Elsevier) (2017)
- PC Member: Workshop on Knowledge Discovery, Data Mining and Machine Learning (KDML 2016), September 12 - 14, 2016, Potsdam, Germany.
- PC Member: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2016), September 19 - 23, 2016, Riva del Garda, Italy.
- Subreviewer: 25th International Joint Conference on Artificial Intelligence (IJCAI 2016), July 9 - 15, 2016, New York City, USA.
- PC Member: 2nd International Workshop on Machine learning, Optimization and Big Data (MOD 2016), August 26 - 29, 2016, Volterra, Italy.
- Summer semester 2015