Viszards Session at Sunbelt 2009
This year's Viszards Session is set in the new area of social bookmarking and is about visualising the content of the publication sharing system BibSonomy, which is hosted by the Knowledge and Data Engineering Group of the University of Kassel.
To get started with the tasks, we suggest that you familiarise yourself with BibSonomy. A more formal description of the underlying structure, called a folksonomy, is given in this paper (pdf here), which also describes the BibSonomy components. Your next step is to subscribe to the mailing list viszards09. We will use the list to distribute news about the data and other relevant information. The list can also be used to clarify questions about the dataset and the different tasks. Because the welcome message on the list explains how to access the dataset, subscribing to the list is essential for participating in the Viszards session.
To access the dataset please subscribe to the viszards09 mailing list. The welcome message will contain all information to access the dataset.
The dataset was created from a MySQL database using the mysqldump command. The CREATE statements for the corresponding tables (each file = one table) can be found in the file tables.sql, together with the LOAD DATA statements that insert the data into the database. For the latter to work, you must adapt the paths to the data files at the end of tables.sql.
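Adapting the paths by hand is quick, but it can also be scripted. The following Python sketch rewrites the directory part of every path in a LOAD DATA statement. It assumes the paths in tables.sql are written in single quotes directly after the INFILE keyword; the function name rewrite_load_paths and the example directories are hypothetical and not part of the dataset.

```python
import re
from pathlib import PurePosixPath

# Matches "LOAD DATA [LOW_PRIORITY|CONCURRENT] [LOCAL] INFILE '<path>'".
# Assumption: paths are single-quoted, as is common in hand-written SQL.
_INFILE = re.compile(
    r"(LOAD\s+DATA\s+(?:(?:LOW_PRIORITY|CONCURRENT)\s+)?(?:LOCAL\s+)?INFILE\s+)'([^']*)'",
    re.IGNORECASE,
)


def rewrite_load_paths(sql: str, data_dir: str) -> str:
    """Point every LOAD DATA statement at data_dir, keeping the file name."""

    def repl(match: "re.Match[str]") -> str:
        # Keep only the last path component, whatever separator was used.
        basename = re.split(r"[/\\]", match.group(2))[-1]
        return f"{match.group(1)}'{PurePosixPath(data_dir) / basename}'"

    return _INFILE.sub(repl, sql)


# Inline demonstration with a made-up statement (not copied from tables.sql).
sample = "LOAD DATA LOCAL INFILE '/old/dir/bookmark' INTO TABLE bookmark;"
print(rewrite_load_paths(sample, "/home/me/viszards"))
```

To apply it to the real file, read tables.sql into a string, pass it through rewrite_load_paths together with the directory that holds your copy of the data files, and write the result back.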
The dataset consists of seven files:
These are tab-separated files, where each line represents a row and the fields of each row are delimited by a tab character. Please note that the fields themselves can contain line breaks, which are escaped by MySQL. The best way to load the data into a MySQL database is to use the LOAD DATA statement.
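If you prefer to read the files without a MySQL server, the line breaks inside fields need special care, because splitting on newlines alone would cut such rows in two. The following Python sketch is one way to do it. It assumes MySQL's default export format: fields terminated by a tab, rows terminated by a newline, a backslash as escape character, and \N for NULL. The function name parse_rows is hypothetical, and you should check the result against a few rows you know.

```python
from typing import Iterator, List, Optional

# Escape sequences understood by LOAD DATA on input (assumed default settings).
_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "b": "\b", "0": "\0", "Z": "\x1a"}
_NULL = object()  # internal marker for the \N sequence


def _finish(parts: list) -> Optional[str]:
    """Turn the collected pieces of one field into a string, or None for NULL."""
    if len(parts) == 1 and parts[0] is _NULL:
        return None
    return "".join("N" if p is _NULL else p for p in parts)


def parse_rows(text: str) -> Iterator[List[Optional[str]]]:
    """Yield rows from tab-separated text with backslash-escaped characters."""
    row: List[Optional[str]] = []
    field: list = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            # An escaped newline or tab belongs to the field, not the layout.
            field.append(_NULL if nxt == "N" else _ESCAPES.get(nxt, nxt))
            i += 2
            continue
        if ch == "\t":
            row.append(_finish(field))
            field = []
        elif ch == "\n":
            row.append(_finish(field))
            field = []
            yield row
            row = []
        else:
            field.append(ch)
        i += 1
    if field or row:  # last row without a trailing newline
        row.append(_finish(field))
        yield row
```

When reading from disk, open the file with newline="" so that Python does not translate line endings before the parser sees them. The character encoding of the files is not stated here, so pass whichever encoding turns out to match the data.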
The fields of each row correspond to the following columns:
Tag Assignments: Fact table; who attached which tag to which resource/content
Files bookmark and bookmark_spam
Dimension table for bookmark data
Files bibtex and bibtex_spam
Dimension table for BibTeX data
Mapping of non-spammer / spammer for each user. This file can be used for spam classification.
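As an illustration of how the mapping can be combined with the other tables, the following Python sketch removes the rows of flagged users from a list of already parsed rows. The column positions and the encoding of the flag ("1" for spammer) are assumptions made for this example only; consult the column descriptions above for the actual layout. The function names load_spam_flags and drop_spammers are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence


def load_spam_flags(
    rows: Iterable[Sequence[str]], user_col: int = 0, flag_col: int = 1
) -> Dict[str, bool]:
    """Build a user -> is_spammer lookup.

    Assumption: the flag column holds "1" for spammers; adjust if it differs.
    """
    return {row[user_col]: row[flag_col] == "1" for row in rows}


def drop_spammers(
    rows: Iterable[Sequence[str]], spam_flags: Dict[str, bool], user_col: int = 0
) -> List[Sequence[str]]:
    """Keep only rows whose user is not flagged; unknown users are kept."""
    return [row for row in rows if not spam_flags.get(row[user_col], False)]


# Made-up rows for demonstration; they are not taken from the dataset.
flags = load_spam_flags([["alice", "0"], ["bob", "1"]])
assignments = [["alice", "python", "r1"], ["bob", "cheap", "r2"]]
print(drop_spammers(assignments, flags))
```

For spam classification the same lookup can serve as the label source, with features computed from the tag assignment, bookmark, and BibTeX tables.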
Size of Files
Number of lines in files:
To contact us please send a mail to firstname.lastname@example.org.
Tagora - Semiotic Dynamics in Online Social Communities.