## Knowledge & Data Engineering Group (KDE), EECS, University of Kassel

The Knowledge & Data Engineering Group of the Department of Electrical Engineering and Computer Science develops methods for knowledge discovery and knowledge representation in data (approximation and exploration of knowledge, order structures in knowledge, ontology development) and for the analysis of (social) network data and the associated knowledge processes (metrics in networks, anomaly detection, characterization of social networks). One focus is the exact algebraic modelling of the structures involved, another the evaluation and development of network measures. In addition to foundational research in order and lattice theory, description logics, graph theory, and ontology, the group also investigates applications, for example in social media and in scientometrics.

The group is a member of the Research Center for Information System Design (ITeG) at the University of Kassel, the INCHER research center at the University of Kassel, the L3S Research Center, and the Hessian Center for Artificial Intelligence (hessian.AI).

Try our social bookmarking system BibSonomy and our name search engine Nameling!

### PhD position: application deadline 22 March 2023

We are looking for a research associate (all genders) to join the research project “Towards Ordinal Data Science” at the earliest possible date.

### Our latest publications

- Ganter, B., Hanika, T., Hirth, J.: Scaling Dimension, https://arxiv.org/abs/2302.09101, (2023).
- Hanika, T., Hirth, J.: Conceptual Views on Tree Ensemble Classifiers. CoRR. abs/2302.05270, (2023). https://doi.org/10.48550/arXiv.2302.05270
- Felde, M., Stumme, G.: Interactive collaborative exploration using incomplete contexts. Data & Knowledge Engineering. 143, 102104 (2023). https://doi.org/10.1016/j.datak.2022.102104
- Stubbemann, M., Hanika, T., Schneider, F.M.: Intrinsic Dimension for Large-Scale Geometric Learning. Transactions on Machine Learning Research. (2023). https://openreview.net/forum?id=85BfDdYMBY
- Hanika, T., Hirth, J.: On the lattice of conceptual measurements. Information Sciences. 613, 453–468 (2022). https://doi.org/10.1016/j.ins.2022.09.005
  We present a novel approach for data set scaling based on scale-measures from formal concept analysis, i.e., continuous maps between closure systems, for which we derive a canonical representation. Moreover, we prove that scale-measures can be lattice ordered using the canonical representation. This enables exploring the set of scale-measures by the use of meet and join operations. Furthermore we show that the lattice of scale-measures is isomorphic to the lattice of sub-closure systems that arises from the original data. Finally, we provide another representation of scale-measures using propositional logic in terms of data set features. Our theoretical findings are discussed by means of examples.
- Hirth, J., Hanika, T.: Formal Conceptual Views in Neural Networks, http://arxiv.org/abs/2209.13517, (2022).
  Explaining neural network models is a challenging task that remains unsolved in its entirety to this day. This is especially true for high dimensional and complex data. With the present work, we introduce two notions for conceptual views of a neural network, specifically a many-valued and a symbolic view. Both provide novel analysis methods to enable a human AI analyst to grasp deeper insights into the knowledge that is captured by the neurons of a network. We test the conceptual expressivity of our novel views through different experiments on the ImageNet and Fruit-360 data sets. Furthermore, we show to which extent the views allow to quantify the conceptual similarity of different learning architectures. Finally, we demonstrate how conceptual views can be applied for abductive learning of human comprehensible rules from neurons. In summary, with our work, we contribute to the most relevant task of globally explaining neural networks models.
- Stubbemann, M., Hanika, T., Schneider, F.M.: Intrinsic Dimension for Large-Scale Geometric Learning, https://arxiv.org/abs/2210.05301, (2022).
- Felde, M., Koyda, M.: Interval-Dismantling for Lattices, https://arxiv.org/abs/2208.01479, (2022).
  Dismantling allows for the removal of elements of a set, or in our case lattice, without disturbing the remaining structure. In this paper we have extended the notion of dismantling by single elements to the dismantling by intervals in a lattice. We utilize theory from Formal Concept Analysis (FCA) to show that lattices dismantled by intervals correspond to closed subrelations in the respective formal context, and that there exists a unique kernel with respect to dismantling by intervals. Furthermore, we show that dismantling intervals can be identified directly in the formal context utilizing a characterization via arrow relations and provide an algorithm to compute all dismantling intervals.
- Schäfermeier, B., Hirth, J., Hanika, T.: Research Topic Flows in Co-Authorship Networks. Scientometrics. (2022). https://doi.org/10.1007/s11192-022-04529-w
  In scientometrics, scientific collaboration is often analyzed by means of co-authorships. An aspect which is often overlooked and more difficult to quantify is the flow of expertise between authors from different research topics, which is an important part of scientific progress. With the Topic Flow Network (TFN) we propose a graph structure for the analysis of research topic flows between scientific authors and their respective research fields. Based on a multi-graph and a topic model, our proposed network structure accounts for intratopic as well as intertopic flows. Our method requires for the construction of a TFN solely a corpus of publications (i.e., author and abstract information). From this, research topics are discovered automatically through non-negative matrix factorization. The thereof derived TFN allows for the application of social network analysis techniques, such as common metrics and community detection. Most importantly, it allows for the analysis of intertopic flows on a large, macroscopic scale, i.e., between research topic, as well as on a microscopic scale, i.e., between certain sets of authors. We demonstrate the utility of TFNs by applying our method to two comprehensive corpora of altogether 20 Mio. publications spanning more than 60 years of research in the fields computer science and mathematics. Our results give evidence that TFNs are suitable, e.g., for the analysis of topical communities, the discovery of important authors in different fields, and, most notably, the analysis of intertopic flows, i.e., the transfer of topical expertise. Besides that, our method opens new directions for future research, such as the investigation of influence relationships between research fields.
- Dürrschnabel, D., Hanika, T., Stubbemann, M.: FCA2VEC: Embedding Techniques for Formal Concept Analysis. In: Missaoui, R., Kwuida, L., and Abdessalem, T. (eds.) Complex Data Analytics with Formal Concept Analysis. pp. 47–74. Springer International Publishing (2022). https://doi.org/10.1007/978-3-030-93278-7_3
- Felde, M., Stumme, G.: Attribute Exploration with Multiple Contradicting Partial Experts. In: Braun, T., Cristea, D., and Jäschke, R. (eds.) Graph-Based Representation and Reasoning. pp. 51–65. Springer International Publishing, Cham (2022). https://doi.org/10.1007/978-3-031-16663-1_5
  Attribute exploration is a method from Formal Concept Analysis (FCA) that helps a domain expert discover structural dependencies in knowledge domains which can be represented as formal contexts (cross tables of objects and attributes). In this paper we present an extension of attribute exploration that allows for a group of domain experts and explores their shared views. Each expert has their own view of the domain and the views of multiple experts may contain contradicting information.
- Dürrschnabel, D., Hanika, T., Stumme, G.: Discovering Locally Maximal Bipartite Subgraphs, http://arxiv.org/abs/2211.10446, (2022).
  Induced bipartite subgraphs of maximal vertex cardinality are an essential concept for the analysis of graphs. Yet, discovering them in large graphs is known to be computationally hard. Therefore, we consider in this work a weaker notion of this problem, where we discard the maximality constraint in favor of inclusion maximality. Thus, we aim to discover locally maximal bipartite subgraphs. For this, we present three heuristic approaches to extract such subgraphs and compare their results to the solutions of the global problem. For the latter, we employ the algorithmic strength of fast SAT-solvers. Our three proposed heuristics are based on a greedy strategy, a simulated annealing approach, and a genetic algorithm, respectively. We evaluate all four algorithms with respect to their time requirement and the vertex cardinality of the discovered bipartite subgraphs on several benchmark datasets.
- Hanika, T., Schneider, F.M., Stumme, G.: Intrinsic dimension of geometric data sets. Tohoku Mathematical Journal. 74, 23–52 (2022). https://doi.org/10.2748/tmj.20201015a
  The curse of dimensionality is a phenomenon frequently observed in machine learning (ML) and knowledge discovery (KD). There is a large body of literature investigating its origin and impact, using methods from mathematics as well as from computer science. Among the mathematical insights into data dimensionality, there is an intimate link between the dimension curse and the phenomenon of measure concentration, which makes the former accessible to methods of geometric analysis. The present work provides a comprehensive study of the intrinsic geometry of a data set, based on Gromov's metric measure geometry and Pestov's axiomatic approach to intrinsic dimension. In detail, we define a concept of geometric data set and introduce a metric as well as a partial order on the set of isomorphism classes of such data sets. Based on these objects, we propose and investigate an axiomatic approach to the intrinsic dimension of geometric data sets and establish a concrete dimension function with the desired properties. Our model for data sets and their intrinsic dimension is computationally feasible and, moreover, adaptable to specific ML/KD-algorithms, as illustrated by various experiments.
- Stubbemann, M., Stumme, G.: LG4AV: Combining Language Models and Graph Neural Networks for Author Verification. In: Bouadi, T., Fromont, E., and Hüllermeier, E. (eds.) Advances in Intelligent Data Analysis XX. pp. 315–326. Springer International Publishing, Cham (2022). https://doi.org/10.1007/978-3-031-01333-1_25
  The verification of document authorships is important in various settings. Researchers are for example judged and compared by the amount and impact of their publications and public figures are confronted by their posts on social media. Therefore, it is important that authorship information in frequently used data sets is correct. The question whether a given document is written by a given author is commonly referred to as authorship verification (AV). While AV is a widely investigated problem in general, only few works consider settings where the documents are short and written in a rather uniform style. This makes most approaches impractical for bibliometric data. Here, authorships of scientific publications have to be verified, often with just abstracts and titles available. To this point, we present LG4AV which combines language models and graph neural networks for authorship verification. By directly feeding the available texts in a pre-trained transformer architecture, our model does not need any hand-crafted stylometric features that are not meaningful in scenarios where the writing style is, at least to some extent, standardized. By the incorporation of a graph neural network structure, our model can benefit from relations between authors that are meaningful with respect to the verification process.
@inproceedings{10.1007/978-3-031-01333-1_25,

abstract = {The verification of document authorships is important in various settings. Researchers are for example judged and compared by the amount and impact of their publications and public figures are confronted by their posts on social media. Therefore, it is important that authorship information in frequently used data sets is correct. The question whether a given document is written by a given author is commonly referred to as authorship verification (AV). While AV is a widely investigated problem in general, only few works consider settings where the documents are short and written in a rather uniform style. This makes most approaches impractical for bibliometric data. Here, authorships of scientific publications have to be verified, often with just abstracts and titles available. To this point, we present LG4AV which combines language models and graph neural networks for authorship verification. By directly feeding the available texts in a pre-trained transformer architecture, our model does not need any hand-crafted stylometric features that are not meaningful in scenarios where the writing style is, at least to some extent, standardized. By the incorporation of a graph neural network structure, our model can benefit from relations between authors that are meaningful with respect to the verification process.},

address = {Cham},

author = {Stubbemann, Maximilian and Stumme, Gerd},

booktitle = {Advances in Intelligent Data Analysis XX},

editor = {Bouadi, Tassadit and Fromont, Elisa and H{ü}llermeier, Eyke},

keywords = {regio},

pages = {315--326},

publisher = {Springer International Publishing},

title = {LG4AV: Combining Language Models and Graph Neural Networks for Author Verification},

year = 2022

}

%0 Conference Paper

%1 10.1007/978-3-031-01333-1_25

%A Stubbemann, Maximilian

%A Stumme, Gerd

%B Advances in Intelligent Data Analysis XX

%C Cham

%D 2022

%E Bouadi, Tassadit

%E Fromont, Elisa

%E Hüllermeier, Eyke

%I Springer International Publishing

%P 315--326

%T LG4AV: Combining Language Models and Graph Neural Networks for Author Verification

%U https://link.springer.com/chapter/10.1007/978-3-031-01333-1_25

%X The verification of document authorships is important in various settings. Researchers are for example judged and compared by the amount and impact of their publications and public figures are confronted by their posts on social media. Therefore, it is important that authorship information in frequently used data sets is correct. The question whether a given document is written by a given author is commonly referred to as authorship verification (AV). While AV is a widely investigated problem in general, only few works consider settings where the documents are short and written in a rather uniform style. This makes most approaches impractical for bibliometric data. Here, authorships of scientific publications have to be verified, often with just abstracts and titles available. To this point, we present LG4AV which combines language models and graph neural networks for authorship verification. By directly feeding the available texts in a pre-trained transformer architecture, our model does not need any hand-crafted stylometric features that are not meaningful in scenarios where the writing style is, at least to some extent, standardized. By the incorporation of a graph neural network structure, our model can benefit from relations between authors that are meaningful with respect to the verification process.

%@ 978-3-031-01333-1

- Schäfermeier, B., Stumme, G., Hanika, T.: Mapping Research Trajectories, https://arxiv.org/abs/2204.11859, (2022).
@misc{https://doi.org/10.48550/arxiv.2204.11859,

author = {Schäfermeier, Bastian and Stumme, Gerd and Hanika, Tom},

keywords = {trajectory_mapping},

publisher = {arXiv},

title = {Mapping Research Trajectories},

year = 2022

}

%0 Generic

%1 https://doi.org/10.48550/arxiv.2204.11859

%A Schäfermeier, Bastian

%A Stumme, Gerd

%A Hanika, Tom

%D 2022

%I arXiv

%R 10.48550/ARXIV.2204.11859

%T Mapping Research Trajectories

%U https://arxiv.org/abs/2204.11859

- Schäfermeier, B., Stumme, G., Hanika, T.: Towards Explainable Scientific Venue Recommendations, http://arxiv.org/abs/2109.11343, (2021).

Selecting the best scientific venue (i.e., conference/journal) for the submission of a research article constitutes a multifaceted challenge. Important aspects to consider are the suitability of research topics, a venue's prestige, and the probability of acceptance. The selection problem is exacerbated through the continuous emergence of additional venues. Previously proposed approaches for supporting authors in this process rely on complex recommender systems, e.g., based on Word2Vec or TextCNN. These, however, often elude an explanation for their recommendations. In this work, we propose an unsophisticated method that advances the state-of-the-art in two aspects: First, we enhance the interpretability of recommendations through non-negative matrix factorization based topic models; Second, we surprisingly can obtain competitive recommendation performance while using simpler learning methods.
@misc{schafermeier2021towards,

abstract = {Selecting the best scientific venue (i.e., conference/journal) for the submission of a research article constitutes a multifaceted challenge. Important aspects to consider are the suitability of research topics, a venue's prestige, and the probability of acceptance. The selection problem is exacerbated through the continuous emergence of additional venues. Previously proposed approaches for supporting authors in this process rely on complex recommender systems, e.g., based on Word2Vec or TextCNN. These, however, often elude an explanation for their recommendations. In this work, we propose an unsophisticated method that advances the state-of-the-art in two aspects: First, we enhance the interpretability of recommendations through non-negative matrix factorization based topic models; Second, we surprisingly can obtain competitive recommendation performance while using simpler learning methods.},

author = {Schäfermeier, Bastian and Stumme, Gerd and Hanika, Tom},

keywords = {venue_recommendations},

note = {cite arxiv:2109.11343},

title = {Towards Explainable Scientific Venue Recommendations},

year = 2021

}

%0 Generic

%1 schafermeier2021towards

%A Schäfermeier, Bastian

%A Stumme, Gerd

%A Hanika, Tom

%D 2021

%T Towards Explainable Scientific Venue Recommendations

%U http://arxiv.org/abs/2109.11343

%X Selecting the best scientific venue (i.e., conference/journal) for the submission of a research article constitutes a multifaceted challenge. Important aspects to consider are the suitability of research topics, a venue's prestige, and the probability of acceptance. The selection problem is exacerbated through the continuous emergence of additional venues. Previously proposed approaches for supporting authors in this process rely on complex recommender systems, e.g., based on Word2Vec or TextCNN. These, however, often elude an explanation for their recommendations. In this work, we propose an unsophisticated method that advances the state-of-the-art in two aspects: First, we enhance the interpretability of recommendations through non-negative matrix factorization based topic models; Second, we surprisingly can obtain competitive recommendation performance while using simpler learning methods.

- Schaefermeier, B., Stumme, G., Hanika, T.: Topic space trajectories. Scientometrics. 126, 5759–5795 (2021).

The annual number of publications at scientific venues, for example, conferences and journals, is growing quickly. Hence, even for researchers it becomes harder and harder to keep track of research topics and their progress. In this task, researchers can be supported by automated publication analysis. Yet, many such methods result in uninterpretable, purely numerical representations. As an attempt to support human analysts, we present topic space trajectories, a structure that allows for the comprehensible tracking of research topics. We demonstrate how these trajectories can be interpreted based on eight different analysis approaches. To obtain comprehensible results, we employ non-negative matrix factorization as well as suitable visualization techniques. We show the applicability of our approach on a publication corpus spanning 50 years of machine learning research from 32 publication venues. In addition to a thorough introduction of our method, our focus is on an extensive analysis of the results we achieved. Our novel analysis method may be employed for paper classification, for the prediction of future research topics, and for the recommendation of fitting conferences and journals for submitting unpublished work. An advantage in these applications over previous methods lies in the good interpretability of the results obtained through our methods.
@article{schafermeier2020topic,

abstract = {The annual number of publications at scientific venues, for example, conferences and journals, is growing quickly. Hence, even for researchers it becomes harder and harder to keep track of research topics and their progress. In this task, researchers can be supported by automated publication analysis. Yet, many such methods result in uninterpretable, purely numerical representations. As an attempt to support human analysts, we present topic space trajectories, a structure that allows for the comprehensible tracking of research topics. We demonstrate how these trajectories can be interpreted based on eight different analysis approaches. To obtain comprehensible results, we employ non-negative matrix factorization as well as suitable visualization techniques. We show the applicability of our approach on a publication corpus spanning 50 years of machine learning research from 32 publication venues. In addition to a thorough introduction of our method, our focus is on an extensive analysis of the results we achieved. Our novel analysis method may be employed for paper classification, for the prediction of future research topics, and for the recommendation of fitting conferences and journals for submitting unpublished work. An advantage in these applications over previous methods lies in the good interpretability of the results obtained through our methods.},

author = {Schaefermeier, Bastian and Stumme, Gerd and Hanika, Tom},

journal = {Scientometrics},

keywords = {myown},

month = {jul},

number = 7,

pages = {5759--5795},

publisher = {Springer},

title = {Topic space trajectories},

volume = 126,

year = 2021

}

%0 Journal Article

%1 schafermeier2020topic

%A Schaefermeier, Bastian

%A Stumme, Gerd

%A Hanika, Tom

%D 2021

%I Springer

%J Scientometrics

%N 7

%P 5759-5795

%R 10.1007/s11192-021-03931-0

%T Topic space trajectories

%U https://doi.org/10.1007/s11192-021-03931-0

%V 126

%X The annual number of publications at scientific venues, for example, conferences and journals, is growing quickly. Hence, even for researchers it becomes harder and harder to keep track of research topics and their progress. In this task, researchers can be supported by automated publication analysis. Yet, many such methods result in uninterpretable, purely numerical representations. As an attempt to support human analysts, we present topic space trajectories, a structure that allows for the comprehensible tracking of research topics. We demonstrate how these trajectories can be interpreted based on eight different analysis approaches. To obtain comprehensible results, we employ non-negative matrix factorization as well as suitable visualization techniques. We show the applicability of our approach on a publication corpus spanning 50 years of machine learning research from 32 publication venues. In addition to a thorough introduction of our method, our focus is on an extensive analysis of the results we achieved. Our novel analysis method may be employed for paper classification, for the prediction of future research topics, and for the recommendation of fitting conferences and journals for submitting unpublished work. An advantage in these applications over previous methods lies in the good interpretability of the results obtained through our methods.

- Schaefermeier, B., Stumme, G., Hanika, T.: Topological Indoor Mapping through WiFi Signals (2021).

The ubiquitous presence of WiFi access points and mobile devices capable of measuring WiFi signal strengths allow for real-world applications in indoor localization and mapping. In particular, no additional infrastructure is required. Previous approaches in this field were, however, often hindered by problems such as effortful map-building processes, changing environments and hardware differences. We tackle these problems focussing on topological maps. These represent discrete locations, such as rooms, and their relations, e.g., distances and transition frequencies. In our unsupervised method, we employ WiFi signal strength distributions, dimension reduction and clustering. It can be used in settings where users carry mobile devices and follow their normal routine. We aim for applications in short-lived indoor events such as conferences.
@article{schaefermeier2021topological,

abstract = {The ubiquitous presence of WiFi access points and mobile devices capable of measuring WiFi signal strengths allow for real-world applications in indoor localization and mapping. In particular, no additional infrastructure is required. Previous approaches in this field were, however, often hindered by problems such as effortful map-building processes, changing environments and hardware differences. We tackle these problems focussing on topological maps. These represent discrete locations, such as rooms, and their relations, e.g., distances and transition frequencies. In our unsupervised method, we employ WiFi signal strength distributions, dimension reduction and clustering. It can be used in settings where users carry mobile devices and follow their normal routine. We aim for applications in short-lived indoor events such as conferences.},

author = {Schaefermeier, Bastian and Stumme, Gerd and Hanika, Tom},

keywords = {wifi},

note = {cite arxiv:2106.09789. Comment: 18 pages},

title = {Topological Indoor Mapping through WiFi Signals},

year = 2021

}

%0 Journal Article

%1 schaefermeier2021topological

%A Schaefermeier, Bastian

%A Stumme, Gerd

%A Hanika, Tom

%D 2021

%T Topological Indoor Mapping through WiFi Signals

%U http://arxiv.org/abs/2106.09789

%X The ubiquitous presence of WiFi access points and mobile devices capable of measuring WiFi signal strengths allow for real-world applications in indoor localization and mapping. In particular, no additional infrastructure is required. Previous approaches in this field were, however, often hindered by problems such as effortful map-building processes, changing environments and hardware differences. We tackle these problems focussing on topological maps. These represent discrete locations, such as rooms, and their relations, e.g., distances and transition frequencies. In our unsupervised method, we employ WiFi signal strength distributions, dimension reduction and clustering. It can be used in settings where users carry mobile devices and follow their normal routine. We aim for applications in short-lived indoor events such as conferences.

- Schäfermeier, B., Hanika, T., Stumme, G.: Distances for WiFi Based Topological Indoor Mapping. 16th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (MobiQuitous), November 12–14, 2019, Houston, TX, USA (2019).

For localization and mapping of indoor environments through WiFi signals, locations are often represented as likelihoods of the received signal strength indicator. In this work we compare various measures of distance between such likelihoods in combination with different methods for estimation and representation. In particular, we show that among the considered distance measures the Earth Mover's Distance seems the most beneficial for the localization task. Combined with kernel density estimation we were able to retain the topological structure of rooms in a real-world office scenario.
@inproceedings{schafermeier2019distances,

abstract = {For localization and mapping of indoor environments through WiFi signals, locations are often represented as likelihoods of the received signal strength indicator. In this work we compare various measures of distance between such likelihoods in combination with different methods for estimation and representation. In particular, we show that among the considered distance measures the Earth Mover's Distance seems the most beneficial for the localization task. Combined with kernel density estimation we were able to retain the topological structure of rooms in a real-world office scenario.},

author = {Schäfermeier, Bastian and Hanika, Tom and Stumme, Gerd},

booktitle = {16th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (MobiQuitous), November 12--14, 2019, Houston, TX, USA},

keywords = {wifi},

title = {Distances for WiFi Based Topological Indoor Mapping},

year = 2019

}

%0 Conference Paper

%1 schafermeier2019distances

%A Schäfermeier, Bastian

%A Hanika, Tom

%A Stumme, Gerd

%B 16th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (MobiQuitous), November 12--14, 2019, Houston, TX, USA

%D 2019

%R 10.1145/3360774.3360780

%T Distances for WiFi Based Topological Indoor Mapping

%X For localization and mapping of indoor environments through WiFi signals, locations are often represented as likelihoods of the received signal strength indicator. In this work we compare various measures of distance between such likelihoods in combination with different methods for estimation and representation. In particular, we show that among the considered distance measures the Earth Mover's Distance seems the most beneficial for the localization task. Combined with kernel density estimation we were able to retain the topological structure of rooms in a real-world office scenario.

%@ 978-1-4503-7283-1/19/11