The dataset can be loaded into a MySQL database. The CREATE statements for the corresponding tables (one table per file) can be found in the file tables.sql.
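As a rough illustration of the loading step, the sketch below builds one MySQL LOAD DATA statement per file. It rests on assumptions not stated here: that each table has the same name as its file, that lines end with a newline character, and that the server permits LOCAL INFILE. The helper name load_statement is hypothetical, and the tables from tables.sql must already exist.

```python
def load_statement(path, table):
    """Build a MySQL LOAD DATA statement for one tab-separated file.

    Assumption: the table was created beforehand from tables.sql and its
    column order matches the column order in the file.
    """
    # Escape backslashes and single quotes so the path is a valid SQL string literal.
    quoted = path.replace("\\", "\\\\").replace("'", "\\'")
    return (
        f"LOAD DATA LOCAL INFILE '{quoted}' "
        f"INTO TABLE `{table}` "
        "FIELDS TERMINATED BY '\\t' "
        "LINES TERMINATED BY '\\n';"
    )


# Assumption: file name and table name coincide (one table per file).
for name in ("tas", "bookmark", "bibtex"):
    print(load_statement(name, name))
```

The printed statements can be pasted into a mysql client session; adjust the paths to wherever the unpacked files are stored.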
The dataset consists of seven files:
These are tab-separated files which have the following columns:
Tag ASsignments (tas): fact table recording which user attached which tag to which resource/content
Files bookmark and bookmark_spam
Dimension table for bookmark data
Files bibtex and bibtex_spam
Dimension table for BibTeX data
Mapping of non-spammer / spammer for each user. This file can be used for spam classification.
Size of Files
Number of lines in files:
For the tag recommender competition, the tas table of the test dataset will not contain tags, because predicting them is the task. Instead, it contains exactly one line per post, with the tag set to null. No information about the actual number of tag assignments will be given.
You can download a version of the training tas file converted to the described format here: tas_testing_recommender.gz.
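The conversion to the test format can be sketched in a few lines of Python, which may be useful for building your own validation split. This is only an illustration under assumptions: the column positions tag_col and post_col are hypothetical parameters that have to be read off tables.sql, and a single column is assumed to identify a post. The function name mask_tags is likewise made up for this example.

```python
import csv


def mask_tags(lines, tag_col, post_col, placeholder="null"):
    """Yield one row per post, with the tag replaced by a placeholder.

    tag_col and post_col are zero-based column positions. They are
    assumptions to be taken from tables.sql, not values given in this text.
    """
    seen = set()
    # QUOTE_NONE: treat quote characters inside tags as ordinary text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip empty lines
        post = row[post_col]
        if post in seen:
            continue  # keep only the first line of each post
        seen.add(post)
        row[tag_col] = placeholder
        yield row


# Made-up sample rows: user, tag, post id (this column layout is an assumption).
sample = ["u1\tpython\t10\n", "u1\tcode\t10\n", "u2\tweb\t11\n"]
for row in mask_tags(sample, tag_col=1, post_col=2):
    print("\t".join(row))
```

Because only the first line of each post is kept, the output does not reveal how many tags a post originally had, which matches the description of the test tas table.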