Preface - This is an experimental page designed to simplify the steps needed to evaluate the performance of a clustered microcalcification detection algorithm. We are trying to determine whether sample data sets, combined with some evaluation tools, are valuable in promoting the comparison of algorithms. Here we have extracted a set of cases from the database that each have at least one malignant lesion containing clustered microcalcifications. The cases listed on the training side of the table can be used to optimize an algorithm, and the cases on the testing side of the table can be used to measure its performance.
The Digital Database for Screening Mammography (DDSM) here at the University of South Florida is made up of 2620 cases, each of which contains four mammograms. The cases were collected from four mammography centers and were scanned on one of four digitizers. Some cases in the database represent normal screening exams in which nothing unusual was found. Others contain cancers and benign lesions. Each non-normal case was examined by one of three radiologists, who provided pixel-level ground truth for each abnormality.
The central goal in the development of this database was to provide a common dataset of mammograms in a digital format with associated ground truth that could be used to aid in quantitative evaluation of computer-aided-detection algorithms for detecting breast cancer.
The Data Sets
Sampling a set of cases from the DDSM database to use for evaluating a microcalcification cluster detection algorithm required making some choices. While researchers like to divide the problem of cancer detection into pieces (e.g., spiculated mass detection, detection of clustered microcalcifications), mammography screening exams are not easily divided along these lines. Lesions containing clustered microcalcifications can appear in a mammogram alongside other mammographic abnormalities. Therefore, these cases may contain other abnormalities in addition to malignant clustered microcalcifications.
We decided to select a set of cases from the DDSM that each had at least one malignant lesion with clustered microcalcifications. The selected cases were acquired from two institutions. The cases with case numbers starting with a 4 all came from one institution and were all scanned on a HOWTEK MultiRAD 850 scanner. The other cases came from a different institution and were all scanned on a HOWTEK 960 scanner. A different radiologist marked the ground truth at each institution.
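The case-number rule above can be written as a small lookup. This is only a sketch: the function name `scanner_for_case` is hypothetical, and the rule is assumed to hold for the cases in this sample only, not for the DDSM as a whole.

```python
import re


def scanner_for_case(case_name: str) -> str:
    """Return the digitizer used for a case in this sample.

    Assumption: the rule "case numbers starting with 4 were scanned on the
    HOWTEK MultiRAD 850, all others on the HOWTEK 960" applies only to the
    training and testing cases listed on this page.
    """
    match = re.fullmatch(r"case(\d+)", case_name)
    if match is None:
        raise ValueError(f"unrecognized case name: {case_name!r}")
    number = match.group(1)
    if number.startswith("4"):
        return "HOWTEK MultiRAD 850"
    return "HOWTEK 960"
```

Grouping results by scanner in this way may be useful when checking whether an algorithm behaves differently on images from the two digitizers.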
The resulting set of cases was split into a training set and a test set while attempting to balance the lesion subtlety and ACR breast density between the two datasets. The resulting list of cases, with links to the data on our FTP site, can be seen in the table below.
TRAINING (50 cases) | ||
---|---|---|
Use this data for training and testing your algorithm as much as you want. | ||
cancer_06 case1108 |
cancer_06 case1113 |
cancer_07 case1116 |
cancer_06 case1131 |
cancer_06 case1133 |
cancer_06 case1141 |
cancer_06 case1148 |
cancer_06 case1152 |
cancer_06 case1153 |
cancer_06 case1167 |
cancer_06 case1185 |
cancer_06 case1188 |
cancer_06 case1201 |
cancer_06 case1212 |
cancer_07 case1213 |
cancer_07 case1214 |
cancer_07 case1219 |
cancer_07 case1220 |
cancer_07 case1223 |
cancer_07 case1238 |
cancer_07 case1245 |
cancer_07 case1248 |
cancer_11 case1252 |
cancer_07 case1257 |
cancer_08 case1283 |
cancer_08 case1415 |
cancer_08 case1470 |
cancer_08 case1500 |
cancer_08 case1508 |
cancer_08 case1528 |
cancer_08 case1535 |
cancer_10 case1570 |
cancer_10 case1585 |
cancer_10 case1588 |
cancer_10 case1595 |
cancer_10 case1626 |
cancer_11 case1632 |
cancer_11 case1635 |
cancer_11 case1637 |
cancer_11 case1675 |
cancer_11 case1693 |
cancer_11 case1697 |
cancer_12 case4103 |
cancer_12 case4110 |
cancer_12 case4115 |
cancer_12 case4116 |
cancer_12 case4134 |
cancer_12 case4142 |
cancer_13 case4161 |
cancer_12 case4183 |
TESTING (50 cases) | ||
---|---|---|
Once your algorithm has been fixed and its parameters have been set, test your algorithm with this data. | ||
cancer_07 case1235 |
cancer_11 case1236 |
cancer_10 case1250 |
cancer_07 case1256 |
cancer_07 case1258 |
cancer_07 case1261 |
cancer_08 case1489 |
cancer_08 case1517 |
cancer_14 case1520 |
cancer_11 case1531 |
cancer_10 case1590 |
cancer_10 case1591 |
cancer_10 case1596 |
cancer_11 case1614 |
cancer_10 case1621 |
cancer_10 case1629 |
cancer_11 case1636 |
cancer_10 case1644 |
cancer_10 case1654 |
cancer_11 case1663 |
cancer_11 case1721 |
cancer_11 case1728 |
cancer_11 case1729 |
cancer_11 case1731 |
cancer_11 case1766 |
cancer_11 case1780 |
cancer_11 case1816 |
cancer_11 case1819 |
cancer_14 case1852 |
cancer_14 case1872 |
cancer_14 case1874 |
cancer_14 case1875 |
cancer_14 case1894 |
cancer_14 case1897 |
cancer_14 case1900 |
cancer_14 case1903 |
cancer_14 case1905 |
cancer_14 case1907 |
cancer_14 case1928 |
cancer_14 case1929 |
cancer_14 case1930 |
cancer_14 case1983 |
cancer_12 case4105 |
cancer_12 case4113 |
cancer_12 case4117 |
cancer_12 case4127 |
cancer_12 case4147 |
cancer_12 case4176 |
cancer_13 case4179 |
cancer_13 case4182 |
Each case contains four mammograms from a screening exam. The images were scanned on either a HOWTEK 960 or a HOWTEK MultiRAD 850 digitizer with a sampling rate of 43.5 microns per pixel at 12 bits per pixel. The images were preprocessed to crop out much of the image that did not contain imaged breast tissue and to darken regions of the image that contained patient information or technician identifiers by setting pixels in those regions to the value zero. Each image was then compressed using a truly lossless compression algorithm. Some tools are available for decompressing the images, resampling them, mapping them to optical density, and creating masks of the ground truth regions. Click here for more information on this software.
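To illustrate what resampling means for pixel size, here is a minimal sketch that downsamples an image by averaging non-overlapping blocks. It is not the resampling tool distributed with the database; the function names and the choice of block averaging are assumptions made for illustration, and pure Python lists are used only to keep the example self-contained.

```python
SOURCE_PIXEL_MICRONS = 43.5  # sampling rate stated in the text


def block_average(image, factor):
    """Downsample a 2-D image (a list of equal-length rows) by averaging
    non-overlapping factor x factor blocks.

    Trailing rows or columns that do not fill a whole block are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    rows = len(image) // factor
    cols = (len(image[0]) // factor) if image else 0
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            total = 0
            for dr in range(factor):
                row = image[r * factor + dr]
                total += sum(row[c * factor:(c + 1) * factor])
            out_row.append(total / (factor * factor))
        out.append(out_row)
    return out


def resampled_pixel_size(factor):
    """Pixel size in microns after block averaging by the given factor."""
    return SOURCE_PIXEL_MICRONS * factor
```

For example, a factor of 4 turns the 43.5 micron pixels into 174 micron pixels and reduces the pixel count by a factor of 16. For full-size mammograms an array library would be a more practical choice than nested lists.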
Performance Evaluation
To evaluate a CAD algorithm using these cases, one can examine the training cases and use them to optimize the algorithm's parameters. During this process, the test data should not be examined or used in any way. It must remain untouched until the algorithm is ready for testing. That means the algorithm and any required parameters must be fixed. This is very important and cannot be emphasized enough! The performance can then be illustrated with a Free-response Receiver Operating Characteristic (FROC) plot.
An FROC plot shows the fraction of cancers that were detected and how that fraction relates to the average number of false positive detections per image. This illustrates a range of possible operating points for the algorithm. An ideal algorithm would have a true positive fraction of 1.0 at 0.0 false positives per image. Obtaining that performance in practice is not generally considered a realistic goal.
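The points of such a plot can be computed by sweeping a threshold over the detection scores. The sketch below assumes each detection has already been matched against the ground truth, so that it carries either the identifier of the lesion it hit or `None` for a false positive. The matching criterion itself is not specified on this page, and the function name and input format are hypothetical.

```python
def froc_points(detections, n_lesions, n_images):
    """Compute FROC operating points.

    detections: iterable of (score, lesion_id) pairs, where lesion_id is
        None for a false positive (assumed input format).
    n_lesions:  total number of cancers in the ground truth.
    n_images:   total number of images that were processed.

    Returns a list of (false_positives_per_image, true_positive_fraction)
    pairs, one per distinct score threshold, from strictest to loosest.
    """
    ordered = sorted(detections, key=lambda d: d[0], reverse=True)
    points = []
    found = set()          # lesions detected at the current threshold
    false_positives = 0
    i = 0
    while i < len(ordered):
        threshold = ordered[i][0]
        # Accept every detection tied at this score before emitting a point.
        while i < len(ordered) and ordered[i][0] == threshold:
            lesion_id = ordered[i][1]
            if lesion_id is None:
                false_positives += 1
            else:
                found.add(lesion_id)  # repeat hits on one lesion count once
            i += 1
        points.append((false_positives / n_images, len(found) / n_lesions))
    return points


# Example: two cancers, four images, five detections.
example = [(0.9, "a"), (0.8, None), (0.7, "b"), (0.7, "a"), (0.4, None)]
curve = froc_points(example, n_lesions=2, n_images=4)
```

In this example the loosest threshold detects both cancers at 0.5 false positives per image. Counting a lesion only once, however many detections overlap it, keeps the true positive fraction from exceeding 1.0.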
Ordering the Data
You are welcome to download the training and testing cases free of charge, but you should be warned that there is nearly 4.7 GB of data in the training set and nearly 4.6 GB of data in the test dataset. If you would like to order the data on two 8mm data cartridges, you can do so using the following order form.