## Knowledge & Data Engineering Group (Fachgebiet Wissensverarbeitung, KDE), EECS, University of Kassel

The Knowledge & Data Engineering Group of the Department of Electrical Engineering and Computer Science develops methods for knowledge discovery and knowledge representation in data (approximation and exploration of knowledge, order structures in knowledge, ontology engineering) and analyzes (social) network data and the knowledge processes connected with them (metrics in networks, anomaly detection, characterization of social networks). One focus is on the exact algebraic modeling of the structures used and on the evaluation and development of new network measures. In addition to foundational research in order and lattice theory, description logics, graph theory, and ontology, the group also studies applications, for example in social media and in scientometrics.

The group is a member of the University of Kassel's research centers ITeG (Wissenschaftliches Zentrum für Informationstechnik-Gestaltung) and INCHER, of the L3S Research Center, and of the Hessian Center for Artificial Intelligence (hessian.AI).

Try our social bookmarking system BibSonomy and our name search engine Nameling!

### PhD position: application deadline 30 June 2023

We are looking for a research associate (all genders) to join the research project “Towards Ordinal Data Science” at the earliest possible date.

### Our latest publications

- Hirth, J., Horn, V., Stumme, G., Hanika, T.: Automatic Textual Explanations of Concept Lattices. In: Ojeda-Aciego, M., Sauerwald, K., and Jäschke, R. (eds.) Graph-Based Representation and Reasoning. pp. 138–152. Springer Nature Switzerland, Cham (2023).

  Lattices and their order diagrams are an essential tool for communicating knowledge and insights about data. This is in particular true when applying Formal Concept Analysis. Such representations, however, are difficult to comprehend by untrained users and in general in cases where lattices are large. We tackle this problem by automatically generating textual explanations for lattices using standard scales. Our method is based on the general notion of ordinal motifs in lattices for the special case of standard scales. We show the computational complexity of identifying a small number of standard scales that cover most of the lattice structure. For these, we provide textual explanation templates, which can be applied to any occurrence of a scale in any data domain. These templates are derived using principles from human-computer interaction and allow for a comprehensive textual explanation of lattices. We demonstrate our approach on the spices planner data set, which is a medium sized formal context comprised of fifty-six meals (objects) and thirty-seven spices (attributes). The resulting 531 formal concepts can be covered by means of about 100 standard scales.
@inproceedings{10.1007/978-3-031-40960-8_12,

abstract = {Lattices and their order diagrams are an essential tool for communicating knowledge and insights about data. This is in particular true when applying Formal Concept Analysis. Such representations, however, are difficult to comprehend by untrained users and in general in cases where lattices are large. We tackle this problem by automatically generating textual explanations for lattices using standard scales. Our method is based on the general notion of ordinal motifs in lattices for the special case of standard scales. We show the computational complexity of identifying a small number of standard scales that cover most of the lattice structure. For these, we provide textual explanation templates, which can be applied to any occurrence of a scale in any data domain. These templates are derived using principles from human-computer interaction and allow for a comprehensive textual explanation of lattices. We demonstrate our approach on the spices planner data set, which is a medium sized formal context comprised of fifty-six meals (objects) and thirty-seven spices (attributes). The resulting 531 formal concepts can be covered by means of about 100 standard scales.},

address = {Cham},

author = {Hirth, Johannes and Horn, Viktoria and Stumme, Gerd and Hanika, Tom},

booktitle = {Graph-Based Representation and Reasoning},

editor = {Ojeda-Aciego, Manuel and Sauerwald, Kai and Jäschke, Robert},

keywords = {sai},

pages = {138--152},

publisher = {Springer Nature Switzerland},

title = {Automatic Textual Explanations of Concept Lattices},

year = 2023

}

%0 Conference Paper

%1 10.1007/978-3-031-40960-8_12

%A Hirth, Johannes

%A Horn, Viktoria

%A Stumme, Gerd

%A Hanika, Tom

%B Graph-Based Representation and Reasoning

%C Cham

%D 2023

%E Ojeda-Aciego, Manuel

%E Sauerwald, Kai

%E Jäschke, Robert

%I Springer Nature Switzerland

%P 138--152

%T Automatic Textual Explanations of Concept Lattices

%@ 978-3-031-40960-8

- Stumme, G., Dürrschnabel, D., Hanika, T.: Towards Ordinal Data Science, https://doi.org/10.48550/arXiv.2307.09477 (2023).
@misc{DBLP:journals/corr/abs-2307-09477,

author = {Stumme, Gerd and D{{ü}}rrschnabel, Dominik and Hanika, Tom},

journal = {CoRR},

keywords = {publist},

title = {Towards Ordinal Data Science},

volume = {abs/2307.09477},

year = 2023

}

%0 Generic

%1 DBLP:journals/corr/abs-2307-09477

%A Stumme, Gerd

%A D{{ü}}rrschnabel, Dominik

%A Hanika, Tom

%D 2023

%J CoRR

%R 10.48550/arXiv.2307.09477

%T Towards Ordinal Data Science

%U https://doi.org/10.48550/arXiv.2307.09477

%V abs/2307.09477

- Felde, M., Koyda, M.: Interval-dismantling for lattices. International Journal of Approximate Reasoning. 159, 108931 (2023).

  Dismantling allows for the removal of elements from a poset, or in our case lattice, without disturbing the remaining structure. In this paper we have extended the notion of dismantling by single elements to the dismantling by intervals in a lattice. We utilize theory from Formal Concept Analysis (FCA) to show that lattices dismantled by intervals correspond to closed subrelations in the respective formal context, and that there exists a unique core with respect to dismantling by intervals. Furthermore, we show that dismantling intervals can be identified directly in the formal context utilizing a characterization via arrow relations and provide an algorithm to compute all dismantling intervals.
@article{FELDE2023108931,

abstract = {Dismantling allows for the removal of elements from a poset, or in our case lattice, without disturbing the remaining structure. In this paper we have extended the notion of dismantling by single elements to the dismantling by intervals in a lattice. We utilize theory from Formal Concept Analysis (FCA) to show that lattices dismantled by intervals correspond to closed subrelations in the respective formal context, and that there exists a unique core with respect to dismantling by intervals. Furthermore, we show that dismantling intervals can be identified directly in the formal context utilizing a characterization via arrow relations and provide an algorithm to compute all dismantling intervals.},

author = {Felde, Maximilian and Koyda, Maren},

journal = {International Journal of Approximate Reasoning},

keywords = {myown},

pages = 108931,

title = {Interval-dismantling for lattices},

volume = 159,

year = 2023

}

%0 Journal Article

%1 FELDE2023108931

%A Felde, Maximilian

%A Koyda, Maren

%D 2023

%J International Journal of Approximate Reasoning

%P 108931

%R 10.1016/j.ijar.2023.108931

%T Interval-dismantling for lattices

%U https://www.sciencedirect.com/science/article/pii/S0888613X23000622

%V 159

- Hanika, T., Hirth, J.: Conceptual views on tree ensemble classifiers. International Journal of Approximate Reasoning. 159, 108930 (2023).

  Random Forests and related tree-based methods are popular for supervised learning from table based data. Apart from their ease of parallelization, their classification performance is also superior. However, this performance, especially parallelizability, is offset by the loss of explainability. Statistical methods are often used to compensate for this disadvantage. Yet, their ability for local explanations, and in particular for global explanations, is limited. In the present work we propose an algebraic method, rooted in lattice theory, for the (global) explanation of tree ensembles. In detail, we introduce two novel conceptual views on tree ensemble classifiers and demonstrate their explanatory capabilities on Random Forests that were trained with standard parameters.
@article{HANIKA2023108930,

abstract = {Random Forests and related tree-based methods are popular for supervised learning from table based data. Apart from their ease of parallelization, their classification performance is also superior. However, this performance, especially parallelizability, is offset by the loss of explainability. Statistical methods are often used to compensate for this disadvantage. Yet, their ability for local explanations, and in particular for global explanations, is limited. In the present work we propose an algebraic method, rooted in lattice theory, for the (global) explanation of tree ensembles. In detail, we introduce two novel conceptual views on tree ensemble classifiers and demonstrate their explanatory capabilities on Random Forests that were trained with standard parameters.},

author = {Hanika, Tom and Hirth, Johannes},

journal = {International Journal of Approximate Reasoning},

keywords = {xai},

pages = 108930,

title = {Conceptual views on tree ensemble classifiers},

volume = 159,

year = 2023

}

%0 Journal Article

%1 HANIKA2023108930

%A Hanika, Tom

%A Hirth, Johannes

%D 2023

%J International Journal of Approximate Reasoning

%P 108930

%R 10.1016/j.ijar.2023.108930

%T Conceptual views on tree ensemble classifiers

%U https://www.sciencedirect.com/science/article/pii/S0888613X23000610

%V 159

- Stubbemann, M., Hanika, T., Schneider, F.M.: Intrinsic Dimension for Large-Scale Geometric Learning. Transactions on Machine Learning Research (2023).
@article{stubbemann2022intrinsic,

author = {Stubbemann, Maximilian and Hanika, Tom and Schneider, Friedrich Martin},

journal = {Transactions on Machine Learning Research},

keywords = {myown},

title = {Intrinsic Dimension for Large-Scale Geometric Learning},

year = 2023

}

%0 Journal Article

%1 stubbemann2022intrinsic

%A Stubbemann, Maximilian

%A Hanika, Tom

%A Schneider, Friedrich Martin

%D 2023

%J Transactions on Machine Learning Research

%T Intrinsic Dimension for Large-Scale Geometric Learning

%U https://openreview.net/forum?id=85BfDdYMBY

- Dürrschnabel, D., Stumme, G.: Maximal Ordinal Two-Factorizations, http://arxiv.org/abs/2304.03338 (2023).

  Given a formal context, an ordinal factor is a subset of its incidence relation that forms a chain in the concept lattice, i.e., a part of the dataset that corresponds to a linear order. To visualize the data in a formal context, Ganter and Glodeanu proposed a biplot based on two ordinal factors. For the biplot to be useful, it is important that these factors comprise as much data points as possible, i.e., that they cover a large part of the incidence relation. In this work, we investigate such ordinal two-factorizations. First, we investigate for formal contexts that omit ordinal two-factorizations the disjointness of the two factors. Then, we show that deciding on the existence of two-factorizations of a given size is an NP-complete problem which makes computing maximal factorizations computationally expensive. Finally, we provide the algorithm Ord2Factor that allows us to compute large ordinal two-factorizations.
@misc{durrschnabel2023maximal,

abstract = {Given a formal context, an ordinal factor is a subset of its incidence relation that forms a chain in the concept lattice, i.e., a part of the dataset that corresponds to a linear order. To visualize the data in a formal context, Ganter and Glodeanu proposed a biplot based on two ordinal factors. For the biplot to be useful, it is important that these factors comprise as much data points as possible, i.e., that they cover a large part of the incidence relation. In this work, we investigate such ordinal two-factorizations. First, we investigate for formal contexts that omit ordinal two-factorizations the disjointness of the two factors. Then, we show that deciding on the existence of two-factorizations of a given size is an NP-complete problem which makes computing maximal factorizations computationally expensive. Finally, we provide the algorithm Ord2Factor that allows us to compute large ordinal two-factorizations.},

author = {Dürrschnabel, Dominik and Stumme, Gerd},

keywords = {two_dimension_extension},

note = {cite arxiv:2304.03338. Comment: 14 pages, 6 figures, 2 algorithms},

title = {Maximal Ordinal Two-Factorizations},

year = 2023

}

%0 Generic

%1 durrschnabel2023maximal

%A Dürrschnabel, Dominik

%A Stumme, Gerd

%D 2023

%T Maximal Ordinal Two-Factorizations

%U http://arxiv.org/abs/2304.03338

- Dürrschnabel, D., Stumme, G.: Greedy Discovery of Ordinal Factors, http://arxiv.org/abs/2302.11554 (2023).

  In large datasets, it is hard to discover and analyze structure. It is thus common to introduce tags or keywords for the items. In applications, such datasets are then filtered based on these tags. Still, even medium-sized datasets with a few tags result in complex and for humans hard-to-navigate systems. In this work, we adopt the method of ordinal factor analysis to address this problem. An ordinal factor arranges a subset of the tags in a linear order based on their underlying structure. A complete ordinal factorization, which consists of such ordinal factors, precisely represents the original dataset. Based on such an ordinal factorization, we provide a way to discover and explain relationships between different items and attributes in the dataset. However, computing even just one ordinal factor of high cardinality is computationally complex. We thus propose the greedy algorithm in this work. This algorithm extracts ordinal factors using already existing fast algorithms developed in formal concept analysis. Then, we leverage to propose a comprehensive way to discover relationships in the dataset. We furthermore introduce a distance measure based on the representation emerging from the ordinal factorization to discover similar items. To evaluate the method, we conduct a case study on different datasets.
@misc{durrschnabel2023greedy,

abstract = {In large datasets, it is hard to discover and analyze structure. It is thus common to introduce tags or keywords for the items. In applications, such datasets are then filtered based on these tags. Still, even medium-sized datasets with a few tags result in complex and for humans hard-to-navigate systems. In this work, we adopt the method of ordinal factor analysis to address this problem. An ordinal factor arranges a subset of the tags in a linear order based on their underlying structure. A complete ordinal factorization, which consists of such ordinal factors, precisely represents the original dataset. Based on such an ordinal factorization, we provide a way to discover and explain relationships between different items and attributes in the dataset. However, computing even just one ordinal factor of high cardinality is computationally complex. We thus propose the greedy algorithm in this work. This algorithm extracts ordinal factors using already existing fast algorithms developed in formal concept analysis. Then, we leverage to propose a comprehensive way to discover relationships in the dataset. We furthermore introduce a distance measure based on the representation emerging from the ordinal factorization to discover similar items. To evaluate the method, we conduct a case study on different datasets.},

author = {Dürrschnabel, Dominik and Stumme, Gerd},

keywords = {ordinal_factor_analysis},

note = {cite arxiv:2302.11554. Comment: 11 pages, 6 figures, 2 tables, 3 algorithms},

title = {Greedy Discovery of Ordinal Factors},

year = 2023

}

%0 Generic

%1 durrschnabel2023greedy

%A Dürrschnabel, Dominik

%A Stumme, Gerd

%D 2023

%T Greedy Discovery of Ordinal Factors

%U http://arxiv.org/abs/2302.11554

- Stubbemann, M., Hille, T., Hanika, T.: Selecting Features by their Resilience to the Curse of Dimensionality (2023).
@article{stubbemann2023selecting,

author = {Stubbemann, Maximilian and Hille, Tobias and Hanika, Tom},

keywords = {selecting},

title = {Selecting Features by their Resilience to the Curse of Dimensionality},

year = 2023

}

%0 Journal Article

%1 stubbemann2023selecting

%A Stubbemann, Maximilian

%A Hille, Tobias

%A Hanika, Tom

%D 2023

%T Selecting Features by their Resilience to the Curse of Dimensionality

- Hirth, J., Horn, V., Stumme, G., Hanika, T.: Ordinal Motifs in Lattices, http://arxiv.org/abs/2304.04827 (2023).

  Lattices are a commonly used structure for the representation and analysis of relational and ontological knowledge. In particular, the analysis of these requires a decomposition of a large and high-dimensional lattice into a set of understandably large parts. With the present work we propose *ordinal motifs* as analytical units of meaning. We study these ordinal substructures (or standard scales) through (full) scale-measures of formal contexts from the field of formal concept analysis. We show that the underlying decision problems are NP-complete and provide results on how one can incrementally identify ordinal motifs to save computational effort. Accompanying our theoretical results, we demonstrate how ordinal motifs can be leveraged to retrieve basic meaning from a medium sized ordinal data set.
@misc{hirth2023ordinal,

abstract = {Lattices are a commonly used structure for the representation and analysis of relational and ontological knowledge. In particular, the analysis of these requires a decomposition of a large and high-dimensional lattice into a set of understandably large parts. With the present work we propose /ordinal motifs/ as analytical units of meaning. We study these ordinal substructures (or standard scales) through (full) scale-measures of formal contexts from the field of formal concept analysis. We show that the underlying decision problems are NP-complete and provide results on how one can incrementally identify ordinal motifs to save computational effort. Accompanying our theoretical results, we demonstrate how ordinal motifs can be leveraged to retrieve basic meaning from a medium sized ordinal data set.},

author = {Hirth, Johannes and Horn, Viktoria and Stumme, Gerd and Hanika, Tom},

keywords = {publist},

title = {Ordinal Motifs in Lattices},

year = 2023

}

%0 Generic

%1 hirth2023ordinal

%A Hirth, Johannes

%A Horn, Viktoria

%A Stumme, Gerd

%A Hanika, Tom

%D 2023

%R 10.48550/arXiv.2304.04827

%T Ordinal Motifs in Lattices

%U http://arxiv.org/abs/2304.04827

- Ganter, B., Hanika, T., Hirth, J.: Scaling Dimension. In: Dürrschnabel, D. and López-Rodríguez, D. (eds.) Formal Concept Analysis - 17th International Conference, ICFCA 2023, Kassel, Germany, July 17-21, 2023, Proceedings. pp. 64–77. Springer (2023).
@inproceedings{DBLP:conf/icfca/GanterHH23,

author = {Ganter, Bernhard and Hanika, Tom and Hirth, Johannes},

booktitle = {Formal Concept Analysis - 17th International Conference, {ICFCA} 2023, Kassel, Germany, July 17-21, 2023, Proceedings},

editor = {Dürrschnabel, Dominik and López-Rodríguez, Domingo},

keywords = {sai},

pages = {64--77},

publisher = {Springer},

series = {Lecture Notes in Computer Science},

title = {Scaling Dimension},

volume = 13934,

year = 2023

}

%0 Conference Paper

%1 DBLP:conf/icfca/GanterHH23

%A Ganter, Bernhard

%A Hanika, Tom

%A Hirth, Johannes

%B Formal Concept Analysis - 17th International Conference, {ICFCA} 2023, Kassel, Germany, July 17-21, 2023, Proceedings

%D 2023

%E Dürrschnabel, Dominik

%E López-Rodríguez, Domingo

%I Springer

%P 64--77

%R 10.1007/978-3-031-35949-1_5

%T Scaling Dimension

%U https://doi.org/10.1007/978-3-031-35949-1_5

%V 13934

- Felde, M., Stumme, G.: Attribute Exploration with Multiple Contradicting Partial Experts. In: Braun, T., Cristea, D., and Jäschke, R. (eds.) Graph-Based Representation and Reasoning. pp. 51–65. Springer International Publishing, Cham (2022).

  Attribute exploration is a method from Formal Concept Analysis (FCA) that helps a domain expert discover structural dependencies in knowledge domains which can be represented as formal contexts (cross tables of objects and attributes). In this paper we present an extension of attribute exploration that allows for a group of domain experts and explores their shared views. Each expert has their own view of the domain and the views of multiple experts may contain contradicting information.
@inproceedings{10.1007/978-3-031-16663-1_5,

abstract = {Attribute exploration is a method from Formal Concept Analysis (FCA) that helps a domain expert discover structural dependencies in knowledge domains which can be represented as formal contexts (cross tables of objects and attributes). In this paper we present an extension of attribute exploration that allows for a group of domain experts and explores their shared views. Each expert has their own view of the domain and the views of multiple experts may contain contradicting information.},

address = {Cham},

author = {Felde, Maximilian and Stumme, Gerd},

booktitle = {Graph-Based Representation and Reasoning},

editor = {Braun, Tanya and Cristea, Diana and J{ä}schke, Robert},

keywords = {myown},

pages = {51--65},

publisher = {Springer International Publishing},

title = {Attribute Exploration with Multiple Contradicting Partial Experts},

year = 2022

}

%0 Conference Paper

%1 10.1007/978-3-031-16663-1_5

%A Felde, Maximilian

%A Stumme, Gerd

%B Graph-Based Representation and Reasoning

%C Cham

%D 2022

%E Braun, Tanya

%E Cristea, Diana

%E J{ä}schke, Robert

%I Springer International Publishing

%P 51--65

%R 10.1007/978-3-031-16663-1_5

%T Attribute Exploration with Multiple Contradicting Partial Experts

%@ 978-3-031-16663-1

- Schäfermeier, B., Hirth, J., Hanika, T.: Research Topic Flows in Co-Authorship Networks, https://doi.org/10.1007/s11192-022-04529-w (2022).

  In scientometrics, scientific collaboration is often analyzed by means of co-authorships. An aspect which is often overlooked and more difficult to quantify is the flow of expertise between authors from different research topics, which is an important part of scientific progress. With the Topic Flow Network (TFN) we propose a graph structure for the analysis of research topic flows between scientific authors and their respective research fields. Based on a multi-graph and a topic model, our proposed network structure accounts for intratopic as well as intertopic flows. Our method requires for the construction of a TFN solely a corpus of publications (i.e., author and abstract information). From this, research topics are discovered automatically through non-negative matrix factorization. The thereof derived TFN allows for the application of social network analysis techniques, such as common metrics and community detection. Most importantly, it allows for the analysis of intertopic flows on a large, macroscopic scale, i.e., between research topic, as well as on a microscopic scale, i.e., between certain sets of authors. We demonstrate the utility of TFNs by applying our method to two comprehensive corpora of altogether 20 Mio. publications spanning more than 60 years of research in the fields computer science and mathematics. Our results give evidence that TFNs are suitable, e.g., for the analysis of topical communities, the discovery of important authors in different fields, and, most notably, the analysis of intertopic flows, i.e., the transfer of topical expertise. Besides that, our method opens new directions for future research, such as the investigation of influence relationships between research fields.
@misc{schafermeier2022research,

abstract = {In scientometrics, scientific collaboration is often analyzed by means of co-authorships. An aspect which is often overlooked and more difficult to quantify is the flow of expertise between authors from different research topics, which is an important part of scientific progress. With the Topic Flow Network (TFN) we propose a graph structure for the analysis of research topic flows between scientific authors and their respective research fields. Based on a multi-graph and a topic model, our proposed network structure accounts for intratopic as well as intertopic flows. Our method requires for the construction of a TFN solely a corpus of publications (i.e., author and abstract information). From this, research topics are discovered automatically through non-negative matrix factorization. The thereof derived TFN allows for the application of social network analysis techniques, such as common metrics and community detection. Most importantly, it allows for the analysis of intertopic flows on a large, macroscopic scale, i.e., between research topic, as well as on a microscopic scale, i.e., between certain sets of authors. We demonstrate the utility of TFNs by applying our method to two comprehensive corpora of altogether 20 Mio. publications spanning more than 60 years of research in the fields computer science and mathematics. Our results give evidence that TFNs are suitable, e.g., for the analysis of topical communities, the discovery of important authors in different fields, and, most notably, the analysis of intertopic flows, i.e., the transfer of topical expertise. Besides that, our method opens new directions for future research, such as the investigation of influence relationships between research fields.},

author = {Schäfermeier, Bastian and Hirth, Johannes and Hanika, Tom},

journal = {Scientometrics},

keywords = {topic-models},

month = {October},

title = {Research Topic Flows in Co-Authorship Networks},

year = 2022

}

%0 Generic

%1 schafermeier2022research

%A Schäfermeier, Bastian

%A Hirth, Johannes

%A Hanika, Tom

%D 2022

%J Scientometrics

%R 10.1007/s11192-022-04529-w

%T Research Topic Flows in Co-Authorship Networks

%U https://doi.org/10.1007/s11192-022-04529-w

- Stubbemann, M., Stumme, G.: LG4AV: Combining Language Models and Graph Neural Networks for Author Verification In: Bouadi, T., Fromont, E., and Hüllermeier, E. (eds.) Advances in Intelligent Data Analysis XX. pp. 315–326. Springer International Publishing, Cham (2022). The verification of document authorships is important in various settings. Researchers are for example judged and compared by the amount and impact of their publications and public figures are confronted by their posts on social media. Therefore, it is important that authorship information in frequently used data sets is correct. The question whether a given document is written by a given author is commonly referred to as authorship verification (AV). While AV is a widely investigated problem in general, only few works consider settings where the documents are short and written in a rather uniform style. This makes most approaches impractical for bibliometric data. Here, authorships of scientific publications have to be verified, often with just abstracts and titles available. To this point, we present LG4AV which combines language models and graph neural networks for authorship verification. By directly feeding the available texts in a pre-trained transformer architecture, our model does not need any hand-crafted stylometric features that are not meaningful in scenarios where the writing style is, at least to some extent, standardized. By the incorporation of a graph neural network structure, our model can benefit from relations between authors that are meaningful with respect to the verification process.
@inproceedings{10.1007/978-3-031-01333-1_25,

abstract = {The verification of document authorships is important in various settings. Researchers are for example judged and compared by the amount and impact of their publications and public figures are confronted by their posts on social media. Therefore, it is important that authorship information in frequently used data sets is correct. The question whether a given document is written by a given author is commonly referred to as authorship verification (AV). While AV is a widely investigated problem in general, only few works consider settings where the documents are short and written in a rather uniform style. This makes most approaches impractical for bibliometric data. Here, authorships of scientific publications have to be verified, often with just abstracts and titles available. To this point, we present LG4AV which combines language models and graph neural networks for authorship verification. By directly feeding the available texts in a pre-trained transformer architecture, our model does not need any hand-crafted stylometric features that are not meaningful in scenarios where the writing style is, at least to some extent, standardized. By the incorporation of a graph neural network structure, our model can benefit from relations between authors that are meaningful with respect to the verification process.},

address = {Cham},

author = {Stubbemann, Maximilian and Stumme, Gerd},

booktitle = {Advances in Intelligent Data Analysis XX},

editor = {Bouadi, Tassadit and Fromont, Elisa and H{ü}llermeier, Eyke},

keywords = {regio},

pages = {315--326},

publisher = {Springer International Publishing},

title = {LG4AV: Combining Language Models and Graph Neural Networks for Author Verification},

year = 2022

}

%0 Conference Paper

%1 10.1007/978-3-031-01333-1_25

%A Stubbemann, Maximilian

%A Stumme, Gerd

%B Advances in Intelligent Data Analysis XX

%C Cham

%D 2022

%E Bouadi, Tassadit

%E Fromont, Elisa

%E H{ü}llermeier, Eyke

%I Springer International Publishing

%P 315--326

%T LG4AV: Combining Language Models and Graph Neural Networks for Author Verification

%U https://link.springer.com/chapter/10.1007/978-3-031-01333-1_25

%@ 978-3-031-01333-1

- Dürrschnabel, D., Hanika, T., Stumme, G.: Discovering Locally Maximal Bipartite Subgraphs, http://arxiv.org/abs/2211.10446, (2022). Induced bipartite subgraphs of maximal vertex cardinality are an essential concept for the analysis of graphs. Yet, discovering them in large graphs is known to be computationally hard. Therefore, we consider in this work a weaker notion of this problem, where we discard the maximality constraint in favor of inclusion maximality. Thus, we aim to discover locally maximal bipartite subgraphs. For this, we present three heuristic approaches to extract such subgraphs and compare their results to the solutions of the global problem. For the latter, we employ the algorithmic strength of fast SAT-solvers. Our three proposed heuristics are based on a greedy strategy, a simulated annealing approach, and a genetic algorithm, respectively. We evaluate all four algorithms with respect to their time requirement and the vertex cardinality of the discovered bipartite subgraphs on several benchmark datasets
@misc{durrschnabel2022discovering,

abstract = {Induced bipartite subgraphs of maximal vertex cardinality are an essential concept for the analysis of graphs. Yet, discovering them in large graphs is known to be computationally hard. Therefore, we consider in this work a weaker notion of this problem, where we discard the maximality constraint in favor of inclusion maximality. Thus, we aim to discover locally maximal bipartite subgraphs. For this, we present three heuristic approaches to extract such subgraphs and compare their results to the solutions of the global problem. For the latter, we employ the algorithmic strength of fast SAT-solvers. Our three proposed heuristics are based on a greedy strategy, a simulated annealing approach, and a genetic algorithm, respectively. We evaluate all four algorithms with respect to their time requirement and the vertex cardinality of the discovered bipartite subgraphs on several benchmark datasets},

author = {Dürrschnabel, Dominik and Hanika, Tom and Stumme, Gerd},

keywords = {myown},

note = {cite arxiv:2211.10446. Comment: 12 pages, 3 figures, 3 tables},

title = {Discovering Locally Maximal Bipartite Subgraphs},

year = 2022

}

%0 Generic

%1 durrschnabel2022discovering

%A Dürrschnabel, Dominik

%A Hanika, Tom

%A Stumme, Gerd

%D 2022

%T Discovering Locally Maximal Bipartite Subgraphs

%U http://arxiv.org/abs/2211.10446

- Hanika, T., Schneider, F.M., Stumme, G.: Intrinsic dimension of geometric data sets Tohoku Mathematical Journal. 74, 23–52 (2022). The curse of dimensionality is a phenomenon frequently observed in machine learning (ML) and knowledge discovery (KD). There is a large body of literature investigating its origin and impact, using methods from mathematics as well as from computer science. Among the mathematical insights into data dimensionality, there is an intimate link between the dimension curse and the phenomenon of measure concentration, which makes the former accessible to methods of geometric analysis. The present work provides a comprehensive study of the intrinsic geometry of a data set, based on Gromov's metric measure geometry and Pestov's axiomatic approach to intrinsic dimension. In detail, we define a concept of geometric data set and introduce a metric as well as a partial order on the set of isomorphism classes of such data sets. 
Based on these objects, we propose and investigate an axiomatic approach to the intrinsic dimension of geometric data sets and establish a concrete dimension function with the desired properties. Our model for data sets and their intrinsic dimension is computationally feasible and, moreover, adaptable to specific ML/KD-algorithms, as illustrated by various experiments.
@article{10.2748/tmj.20201015a,

abstract = {The curse of dimensionality is a phenomenon frequently observed in machine learning (ML) and knowledge discovery (KD). There is a large body of literature investigating its origin and impact, using methods from mathematics as well as from computer science. Among the mathematical insights into data dimensionality, there is an intimate link between the dimension curse and the phenomenon of measure concentration, which makes the former accessible to methods of geometric analysis. The present work provides a comprehensive study of the intrinsic geometry of a data set, based on Gromov's metric measure geometry and Pestov's axiomatic approach to intrinsic dimension. In detail, we define a concept of geometric data set and introduce a metric as well as a partial order on the set of isomorphism classes of such data sets. Based on these objects, we propose and investigate an axiomatic approach to the intrinsic dimension of geometric data sets and establish a concrete dimension function with the desired properties. Our model for data sets and their intrinsic dimension is computationally feasible and, moreover, adaptable to specific ML/KD-algorithms, as illustrated by various experiments.},

author = {Hanika, Tom and Schneider, Friedrich Martin and Stumme, Gerd},

journal = {Tohoku Mathematical Journal},

keywords = {publist},

number = 1,

pages = {23--52},

publisher = {Tohoku University, Mathematical Institute},

title = {{Intrinsic dimension of geometric data sets}},

volume = 74,

year = 2022

}

%0 Journal Article

%1 10.2748/tmj.20201015a

%A Hanika, Tom

%A Schneider, Friedrich Martin

%A Stumme, Gerd

%D 2022

%I Tohoku University, Mathematical Institute

%J Tohoku Mathematical Journal

%N 1

%P 23 -- 52

%R 10.2748/tmj.20201015a

%T {Intrinsic dimension of geometric data sets}

%U https://doi.org/10.2748/tmj.20201015a

%V 74

- Schäfermeier, B., Stumme, G., Hanika, T.: Mapping Research Trajectories, https://arxiv.org/abs/2204.11859, (2022).
@misc{https://doi.org/10.48550/arxiv.2204.11859,

author = {Schäfermeier, Bastian and Stumme, Gerd and Hanika, Tom},

keywords = {trajectory_mapping},

publisher = {arXiv},

title = {Mapping Research Trajectories},

year = 2022

}

%0 Generic

%1 https://doi.org/10.48550/arxiv.2204.11859

%A Schäfermeier, Bastian

%A Stumme, Gerd

%A Hanika, Tom

%D 2022

%I arXiv

%R 10.48550/ARXIV.2204.11859

%T Mapping Research Trajectories

%U https://arxiv.org/abs/2204.11859

- Dürrschnabel, D., Hanika, T., Stubbemann, M.: FCA2VEC: Embedding Techniques for Formal Concept Analysis In: Missaoui, R., Kwuida, L., and Abdessalem, T. (eds.) Complex Data Analytics with Formal Concept Analysis. pp. 47–74. Springer International Publishing, Cham (2022). Embedding large and high dimensional data into low dimensional vector spaces is a necessary task to computationally cope with contemporary data sets. Superseding `latent semantic analysis' recent approaches like `word2vec' or `node2vec' are well established tools in this realm. In the present paper we add to this line of research by introducing `fca2vec', a family of embedding techniques for formal concept analysis (FCA). Our investigation contributes to two distinct lines of research. First, we enable the application of FCA notions to large data sets. In particular, we demonstrate how the cover relation of a concept lattice can be retrieved from a computationally feasible embedding. Secondly, we show an enhancement for the classical node2vec approach in low dimension. For both directions the overall constraint of FCA of explainable results is preserved. We evaluate our novel procedures by computing fca2vec on different data sets like, wiki44 (a dense part of the Wikidata knowledge graph), the Mushroom data set and a publication network derived from the FCA community.
@inbook{Dürrschnabel2022,

abstract = {Embedding large and high dimensional data into low dimensional vector spaces is a necessary task to computationally cope with contemporary data sets. Superseding `latent semantic analysis' recent approaches like `word2vec' or `node2vec' are well established tools in this realm. In the present paper we add to this line of research by introducing `fca2vec', a family of embedding techniques for formal concept analysis (FCA). Our investigation contributes to two distinct lines of research. First, we enable the application of FCA notions to large data sets. In particular, we demonstrate how the cover relation of a concept lattice can be retrieved from a computationally feasible embedding. Secondly, we show an enhancement for the classical node2vec approach in low dimension. For both directions the overall constraint of FCA of explainable results is preserved. We evaluate our novel procedures by computing fca2vec on different data sets like, wiki44 (a dense part of the Wikidata knowledge graph), the Mushroom data set and a publication network derived from the FCA community.},

address = {Cham},

author = {D{ü}rrschnabel, Dominik and Hanika, Tom and Stubbemann, Maximilian},

booktitle = {Complex Data Analytics with Formal Concept Analysis},

editor = {Missaoui, Rokia and Kwuida, L{é}onard and Abdessalem, Talel},

keywords = {vector_space_embeddings},

pages = {47--74},

publisher = {Springer International Publishing},

title = {FCA2VEC: Embedding Techniques for Formal Concept Analysis},

year = 2022

}

%0 Book Section

%1 Dürrschnabel2022

%A D{ü}rrschnabel, Dominik

%A Hanika, Tom

%A Stubbemann, Maximilian

%B Complex Data Analytics with Formal Concept Analysis

%C Cham

%D 2022

%E Missaoui, Rokia

%E Kwuida, L{é}onard

%E Abdessalem, Talel

%I Springer International Publishing

%P 47--74

%R 10.1007/978-3-030-93278-7_3

%T FCA2VEC: Embedding Techniques for Formal Concept Analysis

%U https://doi.org/10.1007/978-3-030-93278-7_3

%@ 978-3-030-93278-7

- Felde, M., Koyda, M.: Interval-Dismantling for Lattices, https://arxiv.org/abs/2208.01479, (2022). Dismantling allows for the removal of elements of a set, or in our case lattice, without disturbing the remaining structure. In this paper we have extended the notion of dismantling by single elements to the dismantling by intervals in a lattice. We utilize theory from Formal Concept Analysis (FCA) to show that lattices dismantled by intervals correspond to closed subrelations in the respective formal context, and that there exists a unique kernel with respect to dismantling by intervals. Furthermore, we show that dismantling intervals can be identified directly in the formal context utilizing a characterization via arrow relations and provide an algorithm to compute all dismantling intervals.
@preprint{felde2022intervaldismantling,

abstract = {Dismantling allows for the removal of elements of a set, or in our case lattice, without disturbing the remaining structure. In this paper we have extended the notion of dismantling by single elements to the dismantling by intervals in a lattice. We utilize theory from Formal Concept Analysis (FCA) to show that lattices dismantled by intervals correspond to closed subrelations in the respective formal context, and that there exists a unique kernel with respect to dismantling by intervals. Furthermore, we show that dismantling intervals can be identified directly in the formal context utilizing a characterization via arrow relations and provide an algorithm to compute all dismantling intervals.},

author = {Felde, Maximilian and Koyda, Maren},

keywords = {myown},

note = {cite arxiv:2208.01479. Comment: 12 pages, 5 figures, 1 algorithm},

title = {Interval-Dismantling for Lattices},

year = 2022

}

%0 Generic

%1 felde2022intervaldismantling

%A Felde, Maximilian

%A Koyda, Maren

%D 2022

%R 10.48550/arXiv.2208.01479

%T Interval-Dismantling for Lattices

%U https://arxiv.org/abs/2208.01479

- Schaefermeier, B., Stumme, G., Hanika, T.: Topic space trajectories Scientometrics. 126, 5759–5795 (2021). The annual number of publications at scientific venues, for example, conferences and journals, is growing quickly. Hence, even for researchers it becomes harder and harder to keep track of research topics and their progress. In this task, researchers can be supported by automated publication analysis. Yet, many such methods result in uninterpretable, purely numerical representations. As an attempt to support human analysts, we present topic space trajectories, a structure that allows for the comprehensible tracking of research topics. We demonstrate how these trajectories can be interpreted based on eight different analysis approaches. To obtain comprehensible results, we employ non-negative matrix factorization as well as suitable visualization techniques. We show the applicability of our approach on a publication corpus spanning 50 years of machine learning research from 32 publication venues. In addition to a thorough introduction of our method, our focus is on an extensive analysis of the results we achieved. 
Our novel analysis method may be employed for paper classification, for the prediction of future research topics, and for the recommendation of fitting conferences and journals for submitting unpublished work. An advantage in these applications over previous methods lies in the good interpretability of the results obtained through our methods.
@article{schafermeier2020topic,

abstract = {The annual number of publications at scientific venues, for example, conferences and journals, is growing quickly. Hence, even for researchers it becomes harder and harder to keep track of research topics and their progress. In this task, researchers can be supported by automated publication analysis. Yet, many such methods result in uninterpretable, purely numerical representations. As an attempt to support human analysts, we present topic space trajectories, a structure that allows for the comprehensible tracking of research topics. We demonstrate how these trajectories can be interpreted based on eight different analysis approaches. To obtain comprehensible results, we employ non-negative matrix factorization as well as suitable visualization techniques. We show the applicability of our approach on a publication corpus spanning 50 years of machine learning research from 32 publication venues. In addition to a thorough introduction of our method, our focus is on an extensive analysis of the results we achieved. Our novel analysis method may be employed for paper classification, for the prediction of future research topics, and for the recommendation of fitting conferences and journals for submitting unpublished work. An advantage in these applications over previous methods lies in the good interpretability of the results obtained through our methods.},

author = {Schaefermeier, Bastian and Stumme, Gerd and Hanika, Tom},

journal = {Scientometrics},

keywords = {myown},

month = {jul},

number = 7,

pages = {5759--5795},

publisher = {Springer},

title = {Topic space trajectories},

volume = 126,

year = 2021

}

%0 Journal Article

%1 schafermeier2020topic

%A Schaefermeier, Bastian

%A Stumme, Gerd

%A Hanika, Tom

%D 2021

%I Springer

%J Scientometrics

%N 7

%P 5759-5795

%R 10.1007/s11192-021-03931-0

%T Topic space trajectories

%U https://doi.org/10.1007/s11192-021-03931-0

%V 126

- Schaefermeier, B., Stumme, G., Hanika, T.: Topological Indoor Mapping through WiFi Signals (2021). The ubiquitous presence of WiFi access points and mobile devices capable of measuring WiFi signal strengths allow for real-world applications in indoor localization and mapping. In particular, no additional infrastructure is required. Previous approaches in this field were, however, often hindered by problems such as effortful map-building processes, changing environments and hardware differences. We tackle these problems focussing on topological maps. 
These represent discrete locations, such as rooms, and their relations, e.g., distances and transition frequencies. In our unsupervised method, we employ WiFi signal strength distributions, dimension reduction and clustering. It can be used in settings where users carry mobile devices and follow their normal routine. We aim for applications in short-lived indoor events such as conferences.
@article{schaefermeier2021topological,

abstract = {The ubiquitous presence of WiFi access points and mobile devices capable of measuring WiFi signal strengths allow for real-world applications in indoor localization and mapping. In particular, no additional infrastructure is required. Previous approaches in this field were, however, often hindered by problems such as effortful map-building processes, changing environments and hardware differences. We tackle these problems focussing on topological maps. These represent discrete locations, such as rooms, and their relations, e.g., distances and transition frequencies. In our unsupervised method, we employ WiFi signal strength distributions, dimension reduction and clustering. It can be used in settings where users carry mobile devices and follow their normal routine. We aim for applications in short-lived indoor events such as conferences.},

author = {Schaefermeier, Bastian and Stumme, Gerd and Hanika, Tom},

keywords = {wifi},

note = {cite arxiv:2106.09789. Comment: 18 pages},

title = {Topological Indoor Mapping through WiFi Signals},

year = 2021

}

%0 Journal Article

%1 schaefermeier2021topological

%A Schaefermeier, Bastian

%A Stumme, Gerd

%A Hanika, Tom

%D 2021

%T Topological Indoor Mapping through WiFi Signals

%U http://arxiv.org/abs/2106.09789

- Schäfermeier, B., Stumme, G., Hanika, T.: Towards Explainable Scientific Venue Recommendations, http://arxiv.org/abs/2109.11343, (2021). Selecting the best scientific venue (i.e., conference/journal) for the submission of a research article constitutes a multifaceted challenge. Important aspects to consider are the suitability of research topics, a venue's prestige, and the probability of acceptance. The selection problem is exacerbated through the continuous emergence of additional venues. Previously proposed approaches for supporting authors in this process rely on complex recommender systems, e.g., based on Word2Vec or TextCNN. These, however, often elude an explanation for their recommendations. In this work, we propose an unsophisticated method that advances the state-of-the-art in two aspects: First, we enhance the interpretability of recommendations through non-negative matrix factorization based topic models; Second, we surprisingly can obtain competitive recommendation performance while using simpler learning methods.
@misc{schafermeier2021towards,

abstract = {Selecting the best scientific venue (i.e., conference/journal) for the submission of a research article constitutes a multifaceted challenge. Important aspects to consider are the suitability of research topics, a venue's prestige, and the probability of acceptance. The selection problem is exacerbated through the continuous emergence of additional venues. Previously proposed approaches for supporting authors in this process rely on complex recommender systems, e.g., based on Word2Vec or TextCNN. These, however, often elude an explanation for their recommendations. In this work, we propose an unsophisticated method that advances the state-of-the-art in two aspects: First, we enhance the interpretability of recommendations through non-negative matrix factorization based topic models; Second, we surprisingly can obtain competitive recommendation performance while using simpler learning methods.},

author = {Schäfermeier, Bastian and Stumme, Gerd and Hanika, Tom},

keywords = {venue_recommendations},

note = {cite arxiv:2109.11343},

title = {Towards Explainable Scientific Venue Recommendations},

year = 2021

}

%0 Generic

%1 schafermeier2021towards

%A Schäfermeier, Bastian

%A Stumme, Gerd

%A Hanika, Tom

%D 2021

%T Towards Explainable Scientific Venue Recommendations

%U http://arxiv.org/abs/2109.11343

%X Selecting the best scientific venue (i.e., conference/journal) for the submission of a research article constitutes a multifaceted challenge. Important aspects to consider are the suitability of research topics, a venue's prestige, and the probability of acceptance. The selection problem is exacerbated through the continuous emergence of additional venues. Previously proposed approaches for supporting authors in this process rely on complex recommender systems, e.g., based on Word2Vec or TextCNN. These, however, often elude an explanation for their recommendations. In this work, we propose an unsophisticated method that advances the state-of-the-art in two aspects: First, we enhance the interpretability of recommendations through non-negative matrix factorization based topic models; Second, we surprisingly can obtain competitive recommendation performance while using simpler learning methods.

- Schäfermeier, B., Hanika, T., Stumme, G.: Distances for WiFi Based Topological Indoor Mapping. 16th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (MobiQuitous), November 12--14, 2019, Houston, TX, USA (2019). For localization and mapping of indoor environments through WiFi signals, locations are often represented as likelihoods of the received signal strength indicator. In this work we compare various measures of distance between such likelihoods in combination with different methods for estimation and representation. In particular, we show that among the considered distance measures the Earth Mover's Distance seems the most beneficial for the localization task. Combined with kernel density estimation we were able to retain the topological structure of rooms in a real-world office scenario.
@inproceedings{schafermeier2019distances,

abstract = {For localization and mapping of indoor environments through WiFi signals, locations are often represented as likelihoods of the received signal strength indicator. In this work we compare various measures of distance between such likelihoods in combination with different methods for estimation and representation. In particular, we show that among the considered distance measures the Earth Mover's Distance seems the most beneficial for the localization task. Combined with kernel density estimation we were able to retain the topological structure of rooms in a real-world office scenario.},

author = {Schäfermeier, Bastian and Hanika, Tom and Stumme, Gerd},

booktitle = {16th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (MobiQuitous), November 12--14, 2019, Houston, TX, USA},

keywords = {wifi},

title = {Distances for WiFi Based Topological Indoor Mapping},

year = 2019

}

%0 Conference Paper

%1 schafermeier2019distances

%A Schäfermeier, Bastian

%A Hanika, Tom

%A Stumme, Gerd

%B 16th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (MobiQuitous), November 12--14, 2019, Houston, TX, USA

%D 2019

%R 10.1145/3360774.3360780

%T Distances for WiFi Based Topological Indoor Mapping

%X For localization and mapping of indoor environments through WiFi signals, locations are often represented as likelihoods of the received signal strength indicator. In this work we compare various measures of distance between such likelihoods in combination with different methods for estimation and representation. In particular, we show that among the considered distance measures the Earth Mover's Distance seems the most beneficial for the localization task. Combined with kernel density estimation we were able to retain the topological structure of rooms in a real-world office scenario.

%@ 978-1-4503-7283-1/19/11