The TechTC-100 Test Collection contains 100 labeled datasets whose categorization difficulty (as measured by baseline SVM accuracy) is uniformly distributed between 0.6 and 0.92. This test collection was used for the experiments in feature selection for text categorization described in (Gabrilovich and Markovitch, 2004).
The data acquisition procedure and the format of the data files for this collection are described in full on the main TechTC page.
The "Detailed statistics" section below contains further information on the individual datasets.
Evgeniy Gabrilovich and Shaul Markovitch
"Text Categorization with Many Redundant Features: Using Aggressive Feature Selection to Make SVMs Competitive with C4.5"
The 21st International Conference on Machine Learning (ICML), pp. 321-328, Banff, Alberta, Canada, July 2004
Please also inform your readers of the current location of the data: http://techtc.cs.technion.ac.il/techtc100/techtc100.html
Detailed statistics

No. | Category id1 | Category id2 | Number of documents | SVM (100%) | C4.5 (100%) | KNN (100%) | SVM (0.5%) | C4.5 (0.5%) | KNN (2%) |
---|---|---|---|---|---|---|---|---|---|
1 | 1622 | 42350 | 163 | 0.8625 | 0.7725 | 0.838 | 0.74375 | 0.750 | 0.729 |
2 | 6920 | 8366 | 140 | 0.919 | 0.897 | 0.927 | 0.897 | 0.897 | 0.907 |
3 | 8308 | 8366 | 144 | 0.8285 | 0.743 | 0.793 | 0.8285 | 0.807 | 0.862 |
4 | 10341 | 10755 | 145 | 0.77075 | 0.811 | 0.708 | 0.8195 | 0.792 | 0.794 |
5 | 10341 | 14271 | 145 | 0.6805 | 0.78775 | 0.618 | 0.81275 | 0.820 | 0.796 |
6 | 10341 | 14525 | 158 | 0.58975 | 0.5065 | 0.590 | 0.564 | 0.615 | 0.514 |
7 | 10341 | 61792 | 153 | 0.7565 | 0.80275 | 0.717 | 0.79575 | 0.822 | 0.810 |
8 | 10341 | 186330 | 147 | 0.70125 | 0.8 | 0.729 | 0.8475 | 0.854 | 0.826 |
9 | 10341 | 194927 | 159 | 0.73725 | 0.7695 | 0.737 | 0.80125 | 0.769 | 0.821 |
10 | 10350 | 10539 | 157 | 0.69875 | 0.75 | 0.673 | 0.7885 | 0.808 | 0.760 |
11 | 10350 | 13928 | 148 | 0.75 | 0.66575 | 0.681 | 0.83325 | 0.792 | 0.815 |
12 | 10350 | 194915 | 154 | 0.7305 | 0.78025 | 0.678 | 0.85525 | 0.849 | 0.875 |
13 | 10385 | 14525 | 156 | 0.829 | 0.65725 | 0.798 | 0.8945 | 0.836 | 0.852 |
14 | 10385 | 25326 | 153 | 0.87175 | 0.8515 | 0.831 | 0.87175 | 0.885 | 0.865 |
15 | 10385 | 269078 | 153 | 0.80425 | 0.892 | 0.734 | 0.946 | 0.919 | 0.848 |
16 | 10385 | 299104 | 147 | 0.799 | 0.54525 | 0.780 | 0.736 | 0.722 | 0.753 |
17 | 10385 | 312035 | 145 | 0.65 | 0.53625 | 0.655 | 0.48575 | 0.519 | 0.543 |
18 | 10539 | 10567 | 152 | 0.65775 | 0.855 | 0.638 | 0.8685 | 0.921 | 0.723 |
19 | 10539 | 11346 | 155 | 0.84225 | 0.875 | 0.763 | 0.9475 | 0.921 | 0.827 |
20 | 10539 | 20673 | 152 | 0.67775 | 0.81575 | 0.664 | 0.86825 | 0.895 | 0.855 |
21 | 10539 | 61792 | 161 | 0.65625 | 0.78125 | 0.569 | 0.80625 | 0.775 | 0.773 |
22 | 10539 | 85489 | 155 | 0.80925 | 0.671 | 0.789 | 0.783 | 0.790 | 0.770 |
23 | 10539 | 186330 | 155 | 0.65775 | 0.645 | 0.651 | 0.763 | 0.763 | 0.792 |
24 | 10539 | 194915 | 165 | 0.7375 | 0.8535 | 0.658 | 0.89625 | 0.897 | 0.803 |
25 | 10539 | 300332 | 164 | 0.76825 | 0.85225 | 0.725 | 0.9085 | 0.896 | 0.860 |
26 | 10567 | 11346 | 139 | 0.794 | 0.9045 | 0.713 | 0.9635 | 0.964 | 0.941 |
27 | 10567 | 12121 | 138 | 0.7795 | 0.934 | 0.747 | 0.9635 | 0.956 | 0.912 |
28 | 10567 | 46076 | 142 | 0.65 | 0.9 | 0.593 | 0.94275 | 0.936 | 0.879 |
29 | 11346 | 17360 | 140 | 0.8235 | 0.93375 | 0.765 | 0.956 | 0.934 | 0.919 |
30 | 11346 | 22294 | 125 | 0.85025 | 0.8925 | 0.825 | 0.93325 | 0.908 | 0.933 |
31 | 11498 | 14517 | 125 | 0.9165 | 0.89175 | 0.892 | 0.95 | 0.909 | 0.917 |
32 | 13928 | 18479 | 151 | 0.75025 | 0.8245 | 0.737 | 0.811 | 0.798 | 0.845 |
33 | 13928 | 71892 | 154 | 0.892 | 0.8245 | 0.838 | 0.9055 | 0.892 | 0.837 |
34 | 13928 | 186330 | 146 | 0.71425 | 0.743 | 0.736 | 0.80725 | 0.786 | 0.825 |
35 | 13928 | 300332 | 155 | 0.83525 | 0.7735 | 0.809 | 0.90775 | 0.855 | 0.888 |
36 | 13928 | 312035 | 146 | 0.7215 | 0.74275 | 0.679 | 0.7645 | 0.772 | 0.766 |
37 | 14271 | 20186 | 145 | 0.67375 | 0.87675 | 0.521 | 0.90275 | 0.951 | 0.807 |
38 | 14271 | 46076 | 143 | 0.6785 | 0.735 | 0.693 | 0.78575 | 0.786 | 0.806 |
39 | 14271 | 194927 | 152 | 0.69625 | 0.79075 | 0.722 | 0.84475 | 0.831 | 0.847 |
40 | 14271 | 312035 | 140 | 0.7575 | 0.79425 | 0.735 | 0.79425 | 0.787 | 0.835 |
41 | 14517 | 20673 | 127 | 0.89525 | 0.89475 | 0.790 | 0.93525 | 0.944 | 0.903 |
42 | 14517 | 186330 | 130 | 0.86275 | 0.8505 | 0.825 | 0.911 | 0.936 | 0.836 |
43 | 14525 | 61792 | 159 | 0.73725 | 0.8015 | 0.731 | 0.827 | 0.808 | 0.750 |
44 | 14525 | 194927 | 165 | 0.66875 | 0.75 | 0.688 | 0.80625 | 0.819 | 0.870 |
45 | 14630 | 18479 | 157 | 0.8075 | 0.86025 | 0.795 | 0.90375 | 0.859 | 0.891 |
46 | 14630 | 20186 | 157 | 0.59625 | 0.94875 | 0.468 | 0.92925 | 0.949 | 0.833 |
47 | 14630 | 94142 | 154 | 0.77625 | 0.7985 | 0.829 | 0.90125 | 0.887 | 0.882 |
48 | 14630 | 300332 | 161 | 0.90625 | 0.88125 | 0.844 | 0.9 | 0.888 | 0.872 |
49 | 14630 | 312035 | 152 | 0.72325 | 0.8515 | 0.717 | 0.89875 | 0.885 | 0.773 |
50 | 14630 | 814096 | 163 | 0.8625 | 0.875 | 0.838 | 0.9375 | 0.925 | 0.906 |
51 | 17360 | 20186 | 145 | 0.597 | 0.875 | 0.563 | 0.90275 | 0.931 | 0.852 |
52 | 17360 | 46875 | 150 | 0.6895 | 0.73675 | 0.656 | 0.88525 | 0.858 | 0.872 |
53 | 18479 | 20186 | 152 | 0.64475 | 0.93425 | 0.533 | 0.921 | 0.928 | 0.875 |
54 | 18479 | 20673 | 144 | 0.75 | 0.87475 | 0.660 | 0.8475 | 0.847 | 0.845 |
55 | 18479 | 46076 | 150 | 0.6625 | 0.757 | 0.710 | 0.8245 | 0.791 | 0.799 |
56 | 18479 | 186330 | 147 | 0.729 | 0.72475 | 0.709 | 0.80575 | 0.785 | 0.766 |
57 | 20186 | 22294 | 130 | 0.78125 | 0.89075 | 0.704 | 0.93 | 0.907 | 0.880 |
58 | 20186 | 61792 | 153 | 0.592 | 0.89175 | 0.582 | 0.84225 | 0.928 | 0.731 |
59 | 20673 | 46076 | 142 | 0.70725 | 0.793 | 0.729 | 0.88575 | 0.907 | 0.872 |
60 | 20673 | 269078 | 147 | 0.71525 | 0.8815 | 0.625 | 0.9375 | 0.938 | 0.903 |
61 | 20673 | 312035 | 139 | 0.6395 | 0.78675 | 0.618 | 0.8385 | 0.838 | 0.812 |
62 | 22294 | 25575 | 127 | 0.9275 | 0.88275 | 0.911 | 0.95975 | 0.936 | 0.870 |
63 | 22294 | 46076 | 128 | 0.8065 | 0.92075 | 0.742 | 0.9435 | 0.952 | 0.883 |
64 | 25575 | 47456 | 143 | 0.88575 | 0.8785 | 0.879 | 0.9 | 0.890 | 0.876 |
65 | 25575 | 275169 | 151 | 0.8785 | 0.88525 | 0.811 | 0.91225 | 0.892 | 0.882 |
66 | 25936 | 94142 | 144 | 0.86425 | 0.7715 | 0.843 | 0.8645 | 0.836 | 0.820 |
67 | 46076 | 61792 | 151 | 0.676 | 0.80425 | 0.642 | 0.7975 | 0.798 | 0.807 |
68 | 46875 | 61792 | 158 | 0.72425 | 0.79475 | 0.680 | 0.88475 | 0.859 | 0.802 |
69 | 47418 | 814096 | 155 | 0.77625 | 0.86825 | 0.698 | 0.90125 | 0.911 | 0.881 |
70 | 47456 | 497201 | 131 | 0.8515 | 0.82025 | 0.836 | 0.9065 | 0.888 | 0.906 |
71 | 58108 | 85489 | 147 | 0.8405 | 0.764 | 0.854 | 0.8405 | 0.778 | 0.809 |
72 | 61792 | 814096 | 159 | 0.91025 | 0.87175 | 0.853 | 0.94875 | 0.923 | 0.873 |
73 | 69753 | 85489 | 156 | 0.875 | 0.86175 | 0.908 | 0.9275 | 0.908 | 0.890 |
74 | 85489 | 90753 | 154 | 0.865 | 0.80425 | 0.903 | 0.838 | 0.831 | 0.845 |
75 | 186330 | 46076 | 145 | 0.65 | 0.7155 | 0.672 | 0.8215 | 0.764 | 0.832 |
76 | 186330 | 94142 | 144 | 0.89975 | 0.74275 | 0.886 | 0.84275 | 0.814 | 0.850 |
77 | 186330 | 195558 | 139 | 0.93375 | 0.91175 | 0.956 | 0.93375 | 0.883 | 0.902 |
78 | 186330 | 300332 | 151 | 0.7975 | 0.798 | 0.804 | 0.8515 | 0.798 | 0.858 |
79 | 186330 | 314499 | 146 | 0.88575 | 0.74525 | 0.893 | 0.87125 | 0.829 | 0.857 |
80 | 194915 | 20186 | 157 | 0.40375 | 0.561 | 0.547 | 0.3975 | 0.603 | 0.442 |
81 | 194915 | 67777 | 153 | 0.90775 | 0.9145 | 0.888 | 0.97375 | 0.954 | 0.934 |
82 | 194915 | 194927 | 164 | 0.63125 | 0.70625 | 0.613 | 0.7625 | 0.706 | 0.639 |
83 | 194915 | 324745 | 152 | 0.68275 | 0.84475 | 0.548 | 0.84475 | 0.852 | 0.875 |
84 | 194927 | 20186 | 159 | 0.67325 | 0.8845 | 0.673 | 0.83325 | 0.904 | 0.759 |
85 | 194927 | 46875 | 164 | 0.725 | 0.78125 | 0.656 | 0.8375 | 0.825 | 0.833 |
86 | 194927 | 61792 | 160 | 0.7245 | 0.7435 | 0.744 | 0.73075 | 0.782 | 0.696 |
87 | 194927 | 299104 | 156 | 0.83525 | 0.4815 | 0.717 | 0.7895 | 0.762 | 0.778 |
88 | 194927 | 312035 | 154 | 0.79075 | 0.7975 | 0.798 | 0.80425 | 0.798 | 0.800 |
89 | 269078 | 46076 | 153 | 0.8245 | 0.9055 | 0.771 | 0.95275 | 0.939 | 0.871 |
90 | 269078 | 324745 | 150 | 0.8055 | 0.872 | 0.715 | 0.90975 | 0.896 | 0.903 |
91 | 299104 | 46076 | 147 | 0.729 | 0.61825 | 0.743 | 0.785 | 0.750 | 0.796 |
92 | 299104 | 58108 | 149 | 0.91225 | 0.84475 | 0.865 | 0.865 | 0.825 | 0.888 |
93 | 299104 | 312035 | 144 | 0.69275 | 0.53875 | 0.650 | 0.7215 | 0.614 | 0.764 |
94 | 300332 | 85489 | 151 | 0.8245 | 0.57125 | 0.865 | 0.7975 | 0.628 | 0.783 |
95 | 316970 | 85489 | 145 | 0.85025 | 0.79275 | 0.822 | 0.83575 | 0.843 | 0.839 |
96 | 324745 | 61792 | 148 | 0.7015 | 0.85425 | 0.646 | 0.90975 | 0.910 | 0.784 |
97 | 324745 | 85489 | 142 | 0.8235 | 0.875 | 0.846 | 0.89725 | 0.912 | 0.831 |
98 | 332386 | 61792 | 159 | 0.846 | 0.7235 | 0.842 | 0.782 | 0.737 | 0.863 |
99 | 332386 | 85489 | 153 | 0.838 | 0.76275 | 0.854 | 0.7975 | 0.757 | 0.811 |
100 | 364836 | 71892 | 142 | 0.8825 | 0.87925 | 0.875 | 0.912 | 0.902 | 0.912 |
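
As a rough illustration of how the rows above can be read programmatically, the following Python sketch parses pipe-delimited rows and compares SVM accuracy on all features with SVM accuracy after feature selection. The short column names in `COLUMNS` and the helper `parse_row` are hypothetical, chosen only for this example. The sketch also assumes that the percentages in the column headers denote the fraction of features retained. Only the first three rows of the table are embedded as sample data.

```python
from statistics import mean

# Hypothetical short names for the ten table columns (not part of the collection).
COLUMNS = ["no", "id1", "id2", "n_docs",
           "svm_all", "c45_all", "knn_all",
           "svm_sel", "c45_sel", "knn_sel"]

# First three rows of the table, copied verbatim as sample input.
SAMPLE = """\
1 | 1622 | 42350 | 163 | 0.8625 | 0.7725 | 0.838 | 0.74375 | 0.750 | 0.729 |
2 | 6920 | 8366 | 140 | 0.919 | 0.897 | 0.927 | 0.897 | 0.897 | 0.907 |
3 | 8308 | 8366 | 144 | 0.8285 | 0.743 | 0.793 | 0.8285 | 0.807 | 0.862 |
"""


def parse_row(line):
    """Turn one pipe-delimited table row into a dict keyed by COLUMNS."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}")
    # The first four columns are integers; the remaining six are accuracies.
    values = [int(c) for c in cells[:4]] + [float(c) for c in cells[4:]]
    return dict(zip(COLUMNS, values))


rows = [parse_row(line) for line in SAMPLE.splitlines() if line.strip()]

# Change in SVM accuracy when moving from all features to the selected subset.
svm_delta = [r["svm_sel"] - r["svm_all"] for r in rows]
mean_svm_delta = mean(svm_delta)

for r, d in zip(rows, svm_delta):
    print(f"dataset {r['no']:>3}: SVM {r['svm_all']:.3f} -> {r['svm_sel']:.3f} ({d:+.3f})")
print(f"mean change over {len(rows)} datasets: {mean_svm_delta:+.3f}")
```

To process the whole collection, `SAMPLE` can be replaced with all 100 rows of the table. The same comparison applies to the C4.5 and KNN columns by swapping the dictionary keys.
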
Evgeniy Gabrilovich
gabr@cs.technion.ac.il
Last updated on August 24, 2011