Preface - This is an experimental page designed to ease the steps needed to evaluate the performance of a mass detection algorithm. We are trying to determine whether sample data sets, combined with some evaluation tools, are valuable in promoting the comparison of algorithms. Here we have extracted a set of cases from the database that all have at least one malignant spiculated mass per case. The cases listed in the training side of the table can be used to optimize an algorithm, and the cases on the testing side of the table can be used to measure its performance.
The Digital Database for Screening Mammography here at the University of South Florida is made up of 2620 cases of data, each of which contains four mammograms. The cases were collected from four mammography centers and were scanned on one of four digitizers. Some cases in the database represent normal screening exams in which nothing unusual was found. Others contain cancers and benign lesions. Each non-normal case was examined by one of three radiologists, who provided pixel-level ground truth for each abnormality.
The central goal in the development of this database was to provide a common dataset of mammograms in a digital format with associated ground truth that could be used to aid in quantitative evaluation of computer-aided-detection algorithms for detecting breast cancer.
The Data Sets

Sampling a set of cases from the DDSM database to use for evaluating a spiculated mass detection algorithm required making some choices. While researchers like to divide the problem of cancer detection into pieces (e.g. spiculated mass detection, detection of clustered microcalcifications, etc.), mammography screening exams are not easily divided along these lines. Masses may and often do have calcifications in them, a mass with spiculated margins in one mammogram may not exhibit spiculated margins in a mammogram taken with a different projection, and in general, spiculated masses may appear in a mammogram along with any other type of mammographic abnormality.
We decided to select a set of cases from the DDSM that each contain at least one malignant spiculated mass. For simplicity, we selected cases that were all scanned on the same scanner and that all had their ground truth marked by the same radiologist.
The resulting set of cases was split into a training set and a test set while attempting to balance the lesion subtlety and ACR breast density between the two sets. The resulting list of cases, with links to the data on our FTP site, can be seen in the tables below.
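We do not reproduce the exact procedure used to make this split. Purely as a rough, hypothetical sketch of one way to keep two sets balanced on subtlety and density (the `subtlety` and `density` field names are assumptions standing in for the ratings recorded in the DDSM ground truth, not part of any published tool), cases can be sorted on those attributes and dealt out alternately:

```python
import random

def balanced_split(cases, seed=0):
    """Split cases into two halves while roughly balancing subtlety and density.

    `cases` is a list of dicts such as
    {"id": "case1118", "subtlety": 3, "density": 2}; the field names are
    hypothetical and should match however you store the DDSM ground truth.
    This is an illustrative sketch, not the procedure used to build the
    tables below.
    """
    rng = random.Random(seed)
    # Sort by the attributes to balance; the random tiebreaker shuffles
    # cases that share the same subtlety/density combination.
    ordered = sorted(cases, key=lambda c: (c["subtlety"], c["density"], rng.random()))
    # Dealing alternately spreads each combination nearly evenly.
    train = ordered[0::2]
    test = ordered[1::2]
    return train, test
```

Dealing from a sorted list spreads each subtlety/density combination almost evenly across the two sets, which is the balancing property described above.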
| TRAINING (39 cases) | | |
|---|---|---|
| Use this data for training and testing your algorithm as much as you want. | | |
| cancer_07 case1118 | cancer_07 case1134 | cancer_06 case1156 |
| cancer_07 case1159 | cancer_07 case1160 | cancer_06 case1163 |
| cancer_07 case1166 | cancer_06 case1174 | cancer_06 case1203 |
| cancer_06 case1212 | cancer_07 case1217 | cancer_11 case1222 |
| cancer_07 case1224 | cancer_08 case1229 | cancer_11 case1236 |
| cancer_11 case1252 | cancer_07 case1262 | cancer_08 case1403 |
| cancer_08 case1417 | cancer_08 case1467 | cancer_08 case1486 |
| cancer_14 case1520 | cancer_08 case1557 | cancer_10 case1587 |
| cancer_10 case1589 | cancer_10 case1592 | cancer_10 case1620 |
| cancer_10 case1622 | cancer_10 case1642 | cancer_11 case1671 |
| cancer_11 case1693 | cancer_10 case1700 | cancer_11 case1701 |
| cancer_11 case1720 | cancer_11 case1726 | cancer_11 case1790 |
| cancer_14 case1896 | cancer_14 case1899 | cancer_14 case1908 |
| TESTING (40 cases) | | |
|---|---|---|
| Once your algorithm has been fixed and its parameters have been set, test your algorithm with this data. | | |
| cancer_06 case1112 | cancer_07 case1114 | cancer_06 case1122 |
| cancer_07 case1127 | cancer_06 case1140 | cancer_07 case1147 |
| cancer_07 case1149 | cancer_06 case1155 | cancer_06 case1168 |
| cancer_06 case1169 | cancer_06 case1171 | cancer_07 case1207 |
| cancer_06 case1211 | cancer_07 case1228 | cancer_07 case1233 |
| cancer_07 case1234 | cancer_07 case1237 | cancer_07 case1247 |
| cancer_07 case1258 | cancer_08 case1401 | cancer_08 case1416 |
| cancer_08 case1468 | cancer_08 case1485 | cancer_08 case1504 |
| cancer_08 case1510 | cancer_10 case1573 | cancer_10 case1577 |
| cancer_10 case1618 | cancer_10 case1628 | cancer_11 case1658 |
| cancer_10 case1669 | cancer_11 case1673 | cancer_11 case1674 |
| cancer_11 case1804 | cancer_11 case1821 | cancer_11 case1827 |
| cancer_14 case1892 | cancer_14 case1906 | cancer_14 case1985 |
| cancer_14 case1999 | | |
Each case contains four mammograms from a screening exam. The images were scanned on a HOWTEK 960 digitizer with a sample rate of 43.5 microns at 12 bits per pixel. The images were preprocessed to crop out much of the area that did not contain imaged breast tissue and to darken regions containing patient information or technician identifiers by setting the pixels in those regions to zero. Each image was then compressed using a truly lossless compression algorithm. Tools are available for decompressing the images, resampling them, mapping them to optical density, and creating masks of the ground truth regions. Click here for more information on this software.
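The provided tools handle these steps for you; the following is only a minimal sketch of the idea in Python. It assumes an image has already been decompressed to a flat file of raw 16-bit pixel values, and the file name, image dimensions, and optical-density calibration constants shown are placeholders rather than the published HOWTEK calibration:

```python
import numpy as np

def read_ddsm_image(raw_path, height, width):
    """Read a decompressed DDSM image stored as raw 16-bit pixels.

    Assumes the image has already been decompressed with the provided
    tools into a flat binary file of unsigned 16-bit values (12
    significant bits), height*width pixels in row-major order. Adjust
    the dtype byte order if the pixel values look wrong on your platform.
    """
    pixels = np.fromfile(raw_path, dtype=np.uint16, count=height * width)
    return pixels.reshape(height, width)

def to_optical_density(image, slope=-0.001, intercept=3.6):
    """Map gray levels to optical density with a linear calibration.

    The slope and intercept here are placeholders, not the HOWTEK
    calibration; substitute the calibration supplied with the DDSM
    software for real experiments.
    """
    od = intercept + slope * image.astype(np.float64)
    return np.clip(od, 0.0, None)

def downsample(image, factor=4):
    """Crude block-average resampling to reduce resolution for faster
    experimentation (e.g. 43.5 microns * 4 = 174 microns per pixel)."""
    h, w = image.shape
    h, w = h - h % factor, w - w % factor
    blocks = image[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

# Hypothetical usage; take the real file name and dimensions from the
# case's .ics file:
# img = read_ddsm_image("case1118.LEFT_CC.raw", height=4696, width=3024)
# od  = downsample(to_optical_density(img))
```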
Performance Evaluation

To evaluate a CAD algorithm using these cases, one can examine the training cases and use them to optimize the parameters of the algorithm. During this process, the test data should not be examined or used in any way. It must remain untouched until the algorithm is ready for testing, which means the algorithm and any required parameters must be fixed. This is very important and cannot be emphasized enough! The performance can then be illustrated with a Free-response Receiver Operating Characteristic (FROC) plot.
An FROC plot shows the fraction of cancers that were detected as a function of the average number of false positive detections per image, illustrating a range of possible operating points for the algorithm. An ideal algorithm would have a true positive fraction of 1.0 at 0.0 false positives per image; obtaining that performance in practice is not generally considered a realistic goal.
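The evaluation software described below computes this for you; the sketch that follows only illustrates the underlying bookkeeping. It assumes each detection has already been matched to a ground truth lesion (or flagged as a false positive) by whatever overlap criterion your protocol uses, which is not shown here, and it sweeps a score threshold to trace out operating points:

```python
def froc_points(detections, n_lesions, n_images):
    """Compute FROC operating points from scored detections.

    `detections` is a list of (score, hit) pairs, where `hit` is the
    index of the ground truth lesion the detection overlaps, or None
    for a false positive. The overlap criterion itself (e.g. centroid
    inside the annotated boundary) is decided elsewhere; the DDSM
    tools apply their own criterion, which this sketch does not
    reproduce.
    """
    points = []
    # Sweep a threshold over every distinct detection score, from strict to lax.
    for threshold in sorted({s for s, _ in detections}, reverse=True):
        kept = [(s, hit) for s, hit in detections if s >= threshold]
        found = {hit for _, hit in kept if hit is not None}
        false_pos = sum(1 for _, hit in kept if hit is None)
        points.append((false_pos / n_images, len(found) / n_lesions))
    return points  # list of (FP per image, true positive fraction)

# Hypothetical example: 3 images, 2 lesions, 5 detections.
dets = [(0.9, 0), (0.8, None), (0.7, 1), (0.4, None), (0.2, None)]
for fp_rate, tpf in froc_points(dets, n_lesions=2, n_images=3):
    print(f"{fp_rate:.2f} FP/image -> {tpf:.2f} of lesions found")
```

Plotting the resulting (FP per image, true positive fraction) pairs gives the FROC curve described above.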
Below is an example FROC curve obtained on these data sets using an algorithm developed by Michael Heath here at the University of South Florida. If you would like, you can download the software and run it on your own computer system to duplicate the results illustrated in this plot. The source code, application programs, and performance evaluation tools included in this software greatly simplify extracting image data from DDSM cases and automate the performance assessment. Click here for more information on this software.
Ordering the Data

You are welcome to download the training and testing cases free of charge, but be warned that there are nearly 4.5 GB of data in each of the two datasets. If you would like to order the data on two 8mm data cartridges, you can do so using the following order form.