Dimension Curse Detector – Disclosure and Assessment of High-Dimensional Concentration Phenomena in Machine Learning
January 2022 – December 2023
LOEWE project in the Exploration funding line
Project lead: Dr. Tom Hanika
The project quantifies and evaluates high-dimensional concentration phenomena in data, which are often associated with the term “dimension curse” or “curse of dimensionality”. The term denotes the interplay of a multitude of effects that occur when machine learning methods are applied to high-dimensional data, for example tumor data in medicine. So far, this phenomenon cannot be computed algorithmically. It is therefore an open question to what extent it has decisively influenced the results of scientific applications.
The goal of the “Dimension Curse Detector” project is to develop a computable approximation of the concentration phenomenon and to apply it as a prototype. In data science and machine learning, the curse of dimensionality is an umbrella term for a multitude of phenomena exhibited by high-dimensional data, in particular the concentration of scoring and distance functions. This aspect is often captured only anecdotally, which has led to a multitude of empirically derived recommendations that are both mathematically unfounded and contradictory. To ensure the scientific and economic use of artificial intelligence methods for future data as well, which can be expected to be high-dimensional, it is necessary to detect and quantify the curse of dimensionality. Only then is it possible, in the spirit of Explainable AI, to make decisions taken on the basis of such data transparent and thus accessible to conscious reflection and discursive argumentation. In this sense, the Dimension Curse Detector is intended to become a tool for designing socially desirable IT applications.
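The concentration of distance functions mentioned in this description can be illustrated with a small experiment. The following is only a minimal sketch and not the measure developed in the project: it assumes uniformly random points in the unit cube and uses the relative contrast (d_max - d_min) / d_min, a common textbook indicator; the function name and parameters are hypothetical.

```python
import math
import random


def relative_contrast(dimension, num_points=200, seed=0):
    """Return (d_max - d_min) / d_min for the Euclidean distances from one
    random query point to num_points uniformly random points in [0, 1]^dimension."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dimension)]
    distances = []
    for _ in range(num_points):
        point = [rng.random() for _ in range(dimension)]
        distances.append(math.dist(query, point))
    return (max(distances) - min(distances)) / min(distances)


if __name__ == "__main__":
    # The contrast between nearest and farthest point shrinks as the
    # dimension grows, which is one facet of distance concentration.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

A small relative contrast means that the nearest and the farthest point are almost equally far away, so distance-based methods lose their ability to discriminate.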
FAIRDIENSTE – Fair Digital Services – Mapping of Controversies
February 2021 – January 2024
Companies that market digital products or services often face the dilemma that their interest in customer data conflicts with customers’ desire for privacy. More and more consumers react sensitively when they are asked to disclose too much data. In business model development, new approaches are needed that take this into account and offer a fair compromise for both sides. The project pursues an interdisciplinary approach that includes both sociological and (business) informatics aspects. Fair business models are investigated that aim at cooperation and value mediation.
For this purpose, it is necessary to examine, among other things, the extent to which a culture of fairness can be economically promoted by means of social media or platform architectures through the outsourcing of value conflicts and associated valuation issues. In support of this approach, the Knowledge & Data Engineering Group will develop a paradigm for the visualization of value conflicts based on mathematical order theory.
REGIO – A Mapping of the Origin and Success of Cooperative Relationships in Regional Research Networks and Innovation Clusters
July 2018 – June 2021
Prolongation: July 2021 – December 2021
Science is shaped by local interaction: participants tend to cooperate within their neighborhood, where neighborhood can refer both to regional nearness and to proximity in other dimensions. The goal of the BMBF research project REGIO is to gain a better understanding of the impact of geographical and thematic proximity on the creation and success of interaction in science and R&D. The leading hypothesis is that the occurrence of research associations and successful regional innovation clusters implies the accumulation of scientific expertise in a geographical region. Which attributes and constellations of protagonists contribute to the occurrence and success of cooperation?
TOPIKOS – Collaborative Low-Effort Topological and Topological-Social Indoor-Mapping
November 2016 – November 2019
Prolongation: November 2019 – December 2020
We investigate the emergence of topological, topical and social maps of short-lived indoor events such as conferences and fairs without effort for participants and without any need for support by the organizers. By considering scenarios in which no explicit localization infrastructure is given, no mapping has been done before the event and users do not need to deviate from their usual behavior, we follow the idea of “common sense geography” as postulated by the DFG-Excellence-Cluster 264 “Topoi”. We focus on topological maps, which represent (indoor) places, i.e., locations, where people come together, and “paths” between them. We extract topics via text mining from documents of participants (e.g., homepages and tweets). Topical maps extend topological maps through semantic information about the topics participants “bring” to certain places. Social maps additionally provide information about who knows each other and who talked to each other at which place.
CIDA – Computational Intelligence & Data Analytics
November 2017 – April 2020
Computational Intelligence & Data Analytics is an area where knowledge is gained from large amounts of data. The methods originate from very different areas of machine learning (ML) and data analysis, such as statistical learning theory, artificial intelligence, soft computing and others. ML allows for a data-driven approach to develop systems that increasingly supplements or partially replaces a conventional model-driven approach. This means that data is analyzed, models are parameterized with data, and new types of applications are developed. The very different application domains are, for instance, energy systems, automobiles, industrial automation, Internet of Things, marketing, quality control, or process control. A successful application of ML methods requires on the one hand the careful and systematic handling of these methods, and on the other hand a kind of professional “creativity”, i.e. the ability to generate innovation.
The aim of this project was to provide systematic training facilities to students in this field. In particular, the project established labs and computer-based exercises for a series of classes, together with an introductory course for machine learning and data analytics.
WISKIDZ – Changes in Academic Career Dynamics in Germany
October 2013 – September 2016
Prolongation: September 2016 – August 2019
The WISKIDZ project contributes to a better understanding of long-term developments in the recruiting behavior in public research and individual career paths after obtaining a doctoral degree. The analyses are based on dissertation data, which are supplemented by information on publications, patents and macro-economic data among others. The project essentially analysed two key issues. First, we aimed to understand changes in recruiting behavior over time with a special focus put on disciplinary idiosyncrasies. We created and analyzed genealogies of doctoral students and their advisors in selected fields (physics, electronics, management and medieval history) from 1945 to the present. Second, the project probed into the interdependencies of academic and non-academic employment opportunities of young researchers. The key points of interest are direct and indirect effects of exogenous changes in the labor market.
We have established the BibSonomy Genealogy for creating a PhD advisor family tree of German researchers. It is based on dissertation metadata of the German National Library and allows users to edit their advisor relationships.
Funded by: Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung) within the funding line “FoWiN” [16FWN001]: Forschung zum Wissenschaftlichen Nachwuchs.
CIL – Collaborative Interactive Learning
February 2017 – June 2019
Technical systems are solving increasingly complex tasks with the help of computers. Originally, these systems were designed for particular tasks and operating conditions and were limited to those at runtime. Nowadays, they are able to adapt to new situations, learn from observations and optimize themselves. For that reason, they are often called smart or intelligent. In the future, there will be more and more applications where not all of the data necessary for learning can be provided at design time, not even for self-learning systems. A simple adaptation (e.g. of parameters) at runtime is not sufficient either. Reasons are, for instance, the required amount of data, the time needed for acquisition, financial costs and, in particular, the fact that during operation these systems are confronted with situations that were not known, and inherently could not have been known, at the time of development. What is required, hence, is a completely new kind of smart system with a lifelong ability to learn (over the entire service life of the system) in uncertain and time-variant environments. These systems need to operate highly autonomously by evaluating their own knowledge, independently procuring resources (humans, other systems, the internet etc.) or connecting with them, rating information from others (e.g. with respect to how current it is) and using different machine learning methods to do so (e.g. Collaborative Learning or Active Learning).
The aim of this project was to investigate a class of entirely new technologies for the development of the systems outlined above, which we call Collaborative Interactive Learning (CIL). These machine learning methods are ‘collaborative’ in the sense that several systems cooperate among themselves and with humans in order to jointly solve problems (including those they could not solve on their own). They are also ‘interactive’, as there is an active and regular flow of knowledge and information, not only between these technical systems but also between systems and humans in various ways. Further, we differentiate between a dedicated (D-CIL) and an opportunistic version (O-CIL) of CIL. In D-CIL, the learning processes as well as the tasks and the groups of humans and systems involved are well-defined. In O-CIL, on the other hand, tasks are variable and groups are open to diverse participants. In O-CIL, systems use all sources of information, even if they are quite uncertain and possibly only sporadically available. The guiding scientific question of the project was therefore how to develop and research entirely new concepts, technologies and mechanisms for CIL (or D-CIL and O-CIL) across several scientific disciplines. Potential applications of D-CIL or O-CIL have been identified in many areas: cyber-physical systems that learn from each other, teams of autonomous robots, cooperating autonomous vehicles, distributed systems for intrusion detection in computer networks, the design of cooperation mechanisms for solving tasks using Collaboration Engineering processes, crowdsourcing to draw on the expertise of an indefinite number of people, etc.
The project “Fundamental Collaborative Interactive Learning” was funded by the University of Kassel (funding program for further profiling of the University of Kassel from 2017 to 2022 – Line “Zukunft”).
FEE – Early Detection and Decision Support for Critical Situations in Production Environments: Development of Assistance Systems to Support Plant Operators in Critical Situations
September 2014 – August 2017
A high degree of automation in processing plants allows economical operation even in countries with high labor costs such as Germany. However, it reduces the operators’ experience with the process dynamics and can lead to information overload in critical situations (due to “alarm flood”). When control is lost, human lives and the environment are endangered. This can cause serious damage to assets and costly production downtime.
The goal of the BMBF research project FEE was therefore to detect critical situations in the plant at an early stage, and to develop assistance functions that support plant operators in decision making during critical situations.
For this purpose, appropriate real-time big data methods were developed that utilize the available heterogeneous mass data from the plants. Early warnings are provided to the operator in order to enable proactive instead of reactive actions. Furthermore, assistance functions were developed that support the operators in deciding on their intervention strategy.
PUMA – Academic Publication Management
August 2009 – July 2011
March 2013 – February 2015
Even though many researchers consider the open access idea important, the concrete positioning of content in institutional repositories (IR) often fails due to the fact that – from the authors’ perspective – the effort of data entry is not accompanied by direct benefits. In this DFG project the IR-input was therefore integrated into the work processes of the scientist, who can – at the same time – position the publication he has created in the university research report, update the list of publications on his website and transfer the entry to a cooperative publication management system.
The input is also supported by automatically gathering and offering metadata from different data sources (Sherpa Romeo list, OPAC, Library networks, cooperative publication management systems) at the time of entering the data. For this integration a unique digital author identification (DAI) was introduced in the project. The PUMA platform was developed as a showcase of the open access repository platform DSpace and was connected to the library system PICA and the cooperative publication management system BibSonomy. The system is open for adjustment “out of the box” to other popular IR software, university research reports and university bibliographies. The results were made available to other libraries as open source software.
EveryAware – Enhance Environmental Awareness through Social Information Technologies
March 2011 – February 2014
There is now overwhelming evidence that the current organization of our economies and societies is seriously damaging biological ecosystems and human living conditions in the very short term, with potentially catastrophic effects in the long term. A grassroots approach can help enact novel policies, with a key contribution from information and communication technologies. Nowadays, low-cost sensing technologies allow citizens to directly assess the state of the environment, and social networking tools allow effective data and opinion collection and real-time information dissemination processes.
The project developed a unified framework by creating a new technological platform that combines sensing technologies, networking applications and data processing tools; the Internet and existing mobile communication networks provided the infrastructure. Case studies involving different numbers of participants tested the scalability of the platform, aiming to involve as many citizens as possible while leveraging the low cost and high usability of the sensing devices. The integration of participatory sensing with monitoring of subjective opinions is novel and crucial, as it exposes the mechanisms by which the local perception of an environmental issue, corroborated by quantitative data, evolves into socially shared opinions, eventually driving behavioral changes. Critically, enabling this level of transparency allows for effective communication of desirable environmental strategies to the general public and to institutional agencies.
VENUS – Design of Socio-Technical Networking Applications in Situative Ubiquitous Computing Systems
January 2010 – December 2013
VENUS is a research cluster at the interdisciplinary Research Center for Information System Design (ITeG) at Kassel University, funded by the State of Hesse as part of the program for excellence in research and development (LOEWE).
Many areas of private and personal life are already pervaded by IT applications. The Internet has become a part of everyday life for many people. More and more mobile phones allow high-speed Internet access. Social networks have influenced the nature of connections between people and will continue to enrich our lives with new forms of communication, coordination and interaction. The computerization and networking of everyday life is progressing continuously and rapidly.
The visionary Mark Weiser wrote: Ubiquitous computing technologies “weave themselves into the fabric of everyday life until they are indistinguishable from it”. Thus, the provision and processing of information will be part of the surrounding infrastructure. Information and services will be ubiquitously available. The technology moves into the background and offers customized services adapted to the needs of the user.
From a technical perspective, ubiquitous computing (UC) leads to context-aware applications that adapt dynamically to their runtime environment in order to provide the user with services that are tailored to the particular situation. Hence, ubiquitous computing and self-adaptivity go hand in hand. This implies a variety of technical and non-technical consequences. The ubiquitous availability of services and the associated self-adaptation of applications create new challenges that clearly are not only technical in nature.
The goal of VENUS was to explore the design process of future networked, ubiquitous systems, which are characterized by situation awareness and self-adaptive behavior. The project explored and extended the foundations of such systems and developed in particular a design methodology that supports the development of socially acceptable ubiquitous computing applications, i.e. applications that not only satisfy the functional requirements but also comply with the given user requirements in terms of usability, trust, legal regulations and so on. Thus, VENUS focused on the interactions between the new technology, the individual user and the society. The long-term goal of VENUS was the creation of a comprehensive interdisciplinary development methodology for the design of ubiquitous computing systems.
Commune – Detection of Groups of Interest in Collaborative Tagging Systems
January 2010 – June 2011
July 2011 – February 2013
The amount of available information is growing exponentially with the shift towards the information society. The former ideal of the omniscient knowledge holder is being superseded by the ability to quickly find and access information that is instantly available worldwide. Students at schools and universities learn that it is not necessary to know everything, but to know where to find the information needed, which is reflected in the colloquial phrase “googling” for information.
This leads to the central problem that no single person is able to survey today’s oversupply of information. Therefore, information is preselected and presorted prior to access and perception. This raises the question of who pre-processes the information. Central and non-transparent preselection of information is the worst case from a democratic point of view, as it opens the door to politically motivated censorship (e.g., Google censorship in China).
In contrast to central information provision, in the “Web 2.0” information is provided and retrieved according to democratic principles. Each individual may provide, review and tag knowledge. But this gives rise to many new problems. A sociologist, for example, searching for the tag “migration” has different expectations than a computer scientist, and reviews by one may be irrelevant or even misleading for the other. Such distinctions between individuals do not depend on profession alone. A society comprises as many groups of interest as it is diverse. Based on generated, reviewed and accessed information, technical methods for automatically detecting groups of interest can be engineered. Such methods may then be used to provide interest-weighted views on the stored and managed knowledge base. Even the preferences and interests of fringe groups, which would otherwise be marginalized as statistical outliers, can be taken into account.
The Hertie Chair for Knowledge & Data Engineering runs the collaborative publication and bookmark sharing and tagging system BibSonomy. Researchers and students use the system for providing and accessing information, but it also serves as a testbed for new methods of presenting and pre-processing information. The project set out to develop and test new methods for automatically detecting groups of interest there. In particular, this gives rise to new views on the literature managed by BibSonomy, but the methods are generally applicable to other systems. This supports research and teaching, as bibliographies for a specific field, which are commonly provided by individual lecturers, are superseded by a selection and rating of literature based on a corresponding group of interest. A student new to a certain field of research can therefore search for appropriate literature in a targeted way.
In the context of this research project, new algorithms for detecting groups of interest in collaborative tagging systems were developed and evaluated. Existing methods had to be adapted to new data structures, and new methods were developed accordingly. Different methods had to be evaluated objectively. As there are no gold standards for groups of interest in collaborative tagging systems, new measures for assessing the quality of a given partition into groups of interest had to be developed. The best methods for detecting groups of interest were implemented in BibSonomy and evaluated in a live setting.
Informational Self-Determination in the Web 2.0
April 2009 – September 2010
October 2010 – September 2012
The new generation of the internet (“Web 2.0” or “social internet”) is characterized by a very liberal provision of information by its users. Against this background, this DFG project’s goal was to explore and to shape the opportunities and risks of the new Web 2.0 technologies in a selected scenario and in close interaction between scientists and lawyers.
After a review of the situation and the subsequent creation of medium-term scenarios, the project analyzed the technical and legal opportunities and risks related to typed roles. Generic concepts were developed for the design of applications complying with data protection law (identity management, avoidance of personal reference and educational profile, responsibilities). Based on these concepts, algorithms and procedures for two specific tasks were developed: recommender systems for cooperative tagging systems and collaborative spam detection methods for such systems. They were evaluated using real data. The most successful approaches were implemented in the collaborative publication management system BibSonomy and were evaluated in live operation. Finally, the project analyzed to what extent the new problems raised by Web 2.0 require the doctrine and interpretation of data protection law to be modified, and whether legislative activities are necessary or advisable.
Webzubi – A Web 2.0 Platform for the Creation of Innovative Job Training for Industrial/Technical Trainees
April 2009 – March 2012
Development of ontology learning methods and evaluation of the Webzubi platform
The Web 2.0 provides excellent prospects for improving apprenticeships through interactive communication and learning platforms. Until now, the new elements of the Web 2.0 have not been used in the project partners’ training management. To increase motivation and thereby the quality of education of commercial-technical apprentices, a new Web 2.0 platform was implemented. The target audience consists of commercial-technical apprentices from DB Mobility Logistics AG and associated partners. This pilot project reaches more than 3,000 commercial-technical apprentices altogether. With the help of Web 2.0 technologies, the apprentices are to be prepared for the increasing interoperation in professional life. The challenge for the University of Kassel in this BMBF-funded project was the development of semantics-based navigation and recommendation components.
Industry Project with K+S IT-Services GmbH
March 2005 – December 2012
Investigation and implementation of search engines with K+S IT-Services GmbH
TAGora – Semiotic Dynamics in Online Social Communities
June 2006 – August 2009
Our research group was a member of the EU project TAGora. The focus of the project was the investigation of Web 2.0 applications, which enable users to create user-specific content themselves.
Novel user structures emerge through the mapping of social structures onto the internet, differing from the data models which have been investigated so far. Aspects include the emergence of semiotic relations and their development over time. In order to investigate and develop approaches and solutions for models and analysis methods, the work was carried out collaboratively in an interdisciplinary context.
Research partners were: University of Roma (La Sapienza), Sony CSL, University of Koblenz-Landau and University of Southampton.
NEPOMUK – Networked Environment for Personalized, Ontology-based Management of Unified Knowledge
January 2006 – December 2008
As a member of the research centre L3S Hanover we took part in the EU project NEPOMUK. The aim of this project was to extend computer desktops with semantic capabilities in order to improve collaboration and the exchange of information within and between working groups. The vision was the “Social Semantic Desktop”, which unites the abilities of the semantic web with those of social network analysis (SNA).
Within a consortium of researchers, industry and a growing community, the knowledge engineering research area dealt in particular with the discovery and structuring of communities and, in this context, examined methods from the field of SNA. Thus, relations between users, resources and metadata were used to recognize users with similar interests. This improved the exchange of knowledge between users unknown to each other. The implementation developed during the project is available and attracted high visibility, among others at heise online and Technology Review.
June 2006 – May 2007
Through the assignment of tags to relevant resources (e.g. URLs), folksonomies allow users to categorize their own knowledge individually and, at the same time, to use the gathered data collectively. Although tags give a rough preview that can be used for navigation and orientation within the huge amount of data, a systematic search, as known from conventional search engines for the World Wide Web, is a useful complement for information retrieval. For this reason the project, which was supported by a Microsoft award, investigated the implementation and further development of methods based on link popularity, adapted to the tripartite structure of the folksonomy. In addition to facilitating a “ranked” search for information, this makes it possible to compare the amounts of information as well as users’ behaviour, e.g. for trend discovery.
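To make the idea of link popularity on a tripartite folksonomy more concrete, the following is a minimal sketch and not the algorithm developed in the project. It assumes that every tag assignment (user, tag, resource) is turned into undirected, weighted edges between its three elements and then applies a plain PageRank-style iteration; the function name, the damping value and the sample data are hypothetical.

```python
from collections import defaultdict


def folksonomy_rank(assignments, damping=0.85, iterations=50):
    """Spread weight over the undirected graph of users, tags and resources.

    assignments: iterable of (user, tag, resource) triples.
    Returns a dict mapping (kind, name) nodes to weights that sum to 1.
    """
    # Each triple connects its user, tag and resource pairwise; repeated
    # co-occurrences increase the edge weight.
    weights = defaultdict(lambda: defaultdict(float))
    for user, tag, resource in assignments:
        triple = (("user", user), ("tag", tag), ("resource", resource))
        for a in triple:
            for b in triple:
                if a != b:
                    weights[a][b] += 1.0

    nodes = list(weights)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Uniform random-jump part plus weight passed along the edges.
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            total = sum(weights[n].values())
            for m, w in weights[n].items():
                new_rank[m] += damping * rank[n] * w / total
        rank = new_rank
    return rank


if __name__ == "__main__":
    sample = [
        ("alice", "python", "r1"),
        ("bob", "python", "r1"),
        ("carol", "python", "r2"),
        ("carol", "java", "r3"),
    ]
    ranks = folksonomy_rank(sample)
    for node in sorted(ranks, key=ranks.get, reverse=True):
        print(node, round(ranks[node], 4))
```

Sorting the resulting weights per node type yields a ranked list of tags, users or resources; a topic-specific ranking would additionally need a preference vector instead of the uniform random jump used here.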
KDubiq – Knowledge Discovery in Ubiquitous Environments
December 2005 – May 2008
KDubiq was the first Coordination Action (CA) for Ubiquitous Knowledge Discovery, 100% funded by the European Union under IST (Information Society Technology), FET Open (Future and Emerging Technologies) in the 6th Framework Programme under the number IST-6FP-021321. We were actively involved as a chair of the working group 4 (WG4).
The purpose of this working group was to investigate possible data types for ubiquitous KDD systems and elaborate an overview on the current state of the art in representing and processing data in ubiquitous knowledge discovery applications with a particular focus on Web 2.0 mining and sensor networks.
PROLEARN – Technology Enhanced Professional Learning
April 2004 – December 2007
Our research group was a member of the PROLEARN Network of Excellence financed by the IST (Information Society Technology) programme of the European Commission dealing with technology enhanced professional learning. Our mission was to bring together the most important research groups in the area of professional learning and training, as well as other key organisations and industrial partners, thus bridging the currently existing gap between research and practical experience in this area.
PROLEARN’s goal was to achieve a greater focus on questions of European importance and a better integration of research efforts. To this end, PROLEARN initiated and improved cooperation between various actors from academia and industry in the area of technology enhanced learning.
COMO – COncepts and MOdels
December 2005 – November 2007
The COMO project was a German-Russian cooperation for the investigation of conceptual structures and structures of abstract models within knowledge engineering. Proposals for a conceptual reorganization of logic with a view to practical use were to be linked with work on abstract models, in order to solve open questions regarding the pragmatic application of current algebraic-logical theories.
On the one hand, the practical problems of communication associated with the use of different semantics (ontologies) are of particular importance. On the other hand, problems of granularity in system descriptions in connection with modal and temporal logic were investigated using general notions developed in conceptual system theory.
PADLR – Personalized Access to Distributed Learning Resources
April 2001 – February 2005
Funded by BMBF and MWK Niedersachsen
The research unit Knowledge and Data Engineering of the University of Kassel is a partner in the Research Center L3S and worked there on a module of the project Personalized Access to Distributed Learning Resources (PADLR), where it developed a so-called Courseware Watchdog. This serves to find teaching materials on the WWW or in the P2P network Edutella and to present them to the user.
Learning material can be collected with the help of an ontology-based focused web crawler and by connecting to Edutella. Subjective Clustering extends well-known algorithms with ontology-based background knowledge and thus permits the description of preferences and the production of subjective views. A visualization based on Formal Concept Analysis offers intelligent browsing capabilities. Strategies for ontology evolution make it possible to reflect modifications within the ontology which are in the interest of the learner.
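Formal Concept Analysis, on which the visualization is based, derives a hierarchy of concepts from a table of objects and their attributes. A minimal sketch, assuming a tiny hypothetical context of learning materials and topics; the data, the function name and the brute-force enumeration are illustrative only and are not taken from the Courseware Watchdog:

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) of a small context.

    context: dict mapping each object to its set of attributes.
    Brute force over all attribute subsets, so only suitable for tiny inputs.
    """
    objects = set(context)
    attributes = set().union(*context.values())

    def extent(attrs):
        # All objects that have every attribute in attrs.
        return frozenset(o for o in objects if attrs <= context[o])

    def intent(objs):
        # All attributes shared by every object in objs.
        if not objs:
            return frozenset(attributes)
        return frozenset(set.intersection(*(set(context[o]) for o in objs)))

    concepts = set()
    for size in range(len(attributes) + 1):
        for attrs in combinations(sorted(attributes), size):
            ext = extent(set(attrs))
            concepts.add((ext, intent(ext)))
    return concepts


if __name__ == "__main__":
    materials = {
        "slides_a": {"logic", "intro"},
        "slides_b": {"logic", "advanced"},
        "video_c": {"intro"},
    }
    for ext, itt in sorted(formal_concepts(materials), key=lambda c: len(c[0])):
        print(sorted(ext), sorted(itt))
```

Ordering the concepts by inclusion of their extents gives the concept lattice, which is the structure a browsing interface of this kind would display.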
February 2002 – December 2004
KDNet was an open network of participants from science, industry and administration. The principal purpose of this international project was to integrate problems from everyday business practice into research discussions and cooperation regarding the future of Knowledge Discovery and Data Mining. The project was funded by the European Commission as a Network of Excellence in the 5th Framework Programme.