This page describes the contents of the directory for a case in the database and provides information regarding the format of the files associated with a case.
Information is provided for each of the following:
    B-3024-1.ics
    B_3024_1.RIGHT_CC.OVERLAY
    B_3024_1.RIGHT_MLO.OVERLAY
    B_3024_1.LEFT_CC.LJPEG
    B_3024_1.LEFT_CC.OVERLAY
    B_3024_1.LEFT_MLO.LJPEG
    B_3024_1.LEFT_MLO.OVERLAY
    B_3024_1.RIGHT_CC.LJPEG
    B_3024_1.RIGHT_MLO.LJPEG
    TAPE_B_3024_1.COMB.16_PGM

Figure 1: Files in the directory for case3024.
In the DDSM database, each case is stored in a separate directory. Figure 1 shows an example of the files in the directory for case3024. As you can see, there are ".ics", ".LJPEG", ".OVERLAY" and ".16_PGM" files. Each type of file is described in the documentation that follows. Please note that "normal" cases will not have any overlay files.
    ics_version 1.0
    filename B-3024-1
    DATE_OF_STUDY 2 7 1995
    PATIENT_AGE 42
    FILM
    FILM_TYPE REGULAR
    DENSITY 4
    DATE_DIGITIZED 7 22 1997
    DIGITIZER LUMISYS
    SELECTED
    LEFT_CC LINES 4696 PIXELS_PER_LINE 3024 BITS_PER_PIXEL 12 RESOLUTION 50 OVERLAY
    LEFT_MLO LINES 4688 PIXELS_PER_LINE 3048 BITS_PER_PIXEL 12 RESOLUTION 50 OVERLAY
    RIGHT_CC LINES 4624 PIXELS_PER_LINE 3056 BITS_PER_PIXEL 12 RESOLUTION 50 OVERLAY
    RIGHT_MLO LINES 4664 PIXELS_PER_LINE 3120 BITS_PER_PIXEL 12 RESOLUTION 50 OVERLAY

Figure 2: The ".ics" file for this case.
The ".ics" file provides information about the case as a whole. It is an ASCII file that lists the date of the study, the patient's age, the date the films were digitized, the type of digitizer used, and the image files. The ".ics" file also gives an ACR breast tissue density rating from 1 to 4, as assessed by an expert radiologist.
For each image, the file gives the size (number of lines and pixels per line), the number of bits per pixel, the scanning resolution (in microns), and whether an overlay file exists. As you can see in Figure 2, all four images have overlays, and these files are listed in Figure 1. If an image description line ended with "NON-OVERLAY" instead of "OVERLAY", that image would not have an overlay file.
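The image description lines can be read with a few lines of Python. The following is a minimal sketch, not the database's own tooling: the function name `parse_ics_images` and the abridged sample text are illustrative assumptions, and the parser works on whitespace-separated tokens so that it does not depend on the exact line breaks in the file.

```python
# Abridged sample based on Figure 2 (only two of the four image lines).
SAMPLE_ICS = """\
ics_version 1.0
filename B-3024-1
PATIENT_AGE 42
DENSITY 4
LEFT_CC LINES 4696 PIXELS_PER_LINE 3024 BITS_PER_PIXEL 12 RESOLUTION 50 OVERLAY
LEFT_MLO LINES 4688 PIXELS_PER_LINE 3048 BITS_PER_PIXEL 12 RESOLUTION 50 OVERLAY
"""


def parse_ics_images(text):
    """Return {view: info} for every image description found in an ".ics" file.

    An image description is assumed to look like:
    VIEW LINES n PIXELS_PER_LINE n BITS_PER_PIXEL n RESOLUTION n OVERLAY|NON-OVERLAY
    """
    tokens = text.split()
    images = {}
    for i, tok in enumerate(tokens):
        if tok != "LINES" or i == 0:
            continue
        view = tokens[i - 1]            # e.g. "LEFT_CC"
        fields = tokens[i:i + 8]        # four KEYWORD VALUE pairs
        info = {fields[k]: int(fields[k + 1]) for k in range(0, 8, 2)}
        info["has_overlay"] = tokens[i + 8] == "OVERLAY"
        images[view] = info
    return images


if __name__ == "__main__":
    for view, info in parse_ics_images(SAMPLE_ICS).items():
        print(view, info)
```

The returned dictionary gives the dimensions needed to interpret each uncompressed image, and the `has_overlay` flag tells which views should have an ".OVERLAY" file.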
The images have all been stored in a format using LOSSLESS JPEG compression. Even with the compression, each image file is very large because the films were scanned with a resolution between 42 and 100 microns. The source code for the program that we used to compress the images is available in the archives at Stanford University. An executable version for SunOS 5.5 is available on our ftp server ftp://figment.csee.usf.edu/pub/DDSM/software/bin/jpeg. This program is used to uncompress the images. Once uncompressed, each image file contains only raw pixel values. Because there is no "header information" in the file, the size of each image must be obtained from the ".ics" file.
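Because the uncompressed file has no header, the dimensions from the ".ics" file are what turn the byte stream into rows of pixels. The sketch below illustrates the idea under two assumptions that this page does not state: that each pixel is stored as an unsigned 16-bit sample, and that the byte order is big-endian (pass `byteorder="little"` if that turns out to be wrong). The function name `load_raw_image` is hypothetical.

```python
import sys
from array import array


def load_raw_image(data, lines, pixels_per_line, byteorder="big"):
    """Split headerless pixel data into rows.

    `lines` and `pixels_per_line` come from the ".ics" file.
    ASSUMPTION: 2 bytes per pixel; the byte order is a guess and is configurable.
    """
    pixels = array("H")  # unsigned 16-bit samples (assumed storage width)
    expected = lines * pixels_per_line * pixels.itemsize
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    pixels.frombytes(data)
    if sys.byteorder != byteorder:
        pixels.byteswap()  # convert from file byte order to native order
    return [pixels[r * pixels_per_line:(r + 1) * pixels_per_line]
            for r in range(lines)]


if __name__ == "__main__":
    # Tiny synthetic 2 x 3 image, not real data.
    demo = bytes([0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0x0F, 0xFF])
    for row in load_raw_image(demo, lines=2, pixels_per_line=3):
        print(list(row))
```

The size check is the useful part: if the byte count does not equal LINES x PIXELS_PER_LINE x 2, either the dimensions or the assumed storage width is wrong for that file.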
    TOTAL_ABNORMALITIES 1
    ABNORMALITY 1
    LESION_TYPE CALCIFICATION TYPE PLEOMORPHIC-FINE_LINEAR_BRANCHING DISTRIBUTION REGIONAL
    ASSESSMENT 5
    SUBTLETY 4
    PATHOLOGY MALIGNANT
    TOTAL_OUTLINES 4
    BOUNDARY
    8 1368 4 4 4 4 4 4 4 4 2 2 2 2 2 2 2 2 ... 0 0 0 0 0 0 0 0 0 1 #
    CORE
    168 1824 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 ... 1 0 1 1 0 1 1 0 1 1 #
    CORE
    384 1848 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 ... 0 0 0 0 0 0 0 0 0 0 #
    CORE
    368 2192 6 6 6 6 6 6 6 6 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 #

Figure 3: An overlay file with one abnormality and four outlines.
Abnormal cases have between one and four overlay files, depending on the number of images in which the radiologist marked abnormalities. You can tell which images have overlay files by looking in the ".ics" file. Each image whose description line ends with "OVERLAY" (not "NON-OVERLAY") will have an overlay file.
Each overlay file may specify multiple abnormalities, so the first line of the file gives the total number of abnormalities. In the case of multiple abnormalities, each abnormality is then listed one after another.
Each abnormality has information on the lesion type, the assessment, the subtlety, the pathology and at least one outline. The keywords that describe the lesion type are taken from the ACR Bi-RADS lexicon. The assessment code is a value from 1 to 5, and also comes from the ACR Bi-RADS standard. The subtlety rating is not from the Bi-RADS standard. It is a value from 1 to 5, where 1 means "x1," 2 means "x2," 3 means "x3," 4 means "x4," and 5 means "x5." The lesion type, assessment, and subtlety are specified by an experienced radiologist. Similarly, the outlines for the suspicious regions are derived from markings made on the film by an experienced radiologist. In some cases there is more than one outline for the same abnormality. In these situations the "TOTAL_OUTLINES" value is greater than one. Figure 3 shows an example of this. The first boundary will contain all of the other boundaries, and all boundaries after the first one will begin with the word "CORE".
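The structure described above (a list of abnormalities, each with a few keyword fields and one or more outlines) can be read with a small token-based parser. This is a sketch under stated assumptions: the function name `parse_overlay` is hypothetical, the sample text uses shortened, made-up chain codes rather than real data, and only some of the keywords are captured.

```python
# Hypothetical, shortened sample in the layout of Figure 3 (chain codes invented).
SAMPLE_OVERLAY = """\
TOTAL_ABNORMALITIES 1
ABNORMALITY 1
LESION_TYPE CALCIFICATION TYPE PLEOMORPHIC-FINE_LINEAR_BRANCHING DISTRIBUTION REGIONAL
ASSESSMENT 5
SUBTLETY 4
PATHOLOGY MALIGNANT
TOTAL_OUTLINES 2
BOUNDARY
10 20 2 2 4 4 6 6 0 0 #
CORE
11 21 2 4 6 0 #
"""

INT_KEYS = ("ASSESSMENT", "SUBTLETY", "TOTAL_OUTLINES")
TEXT_KEYS = ("LESION_TYPE", "PATHOLOGY")


def parse_overlay(text):
    """Return a list of abnormalities, each a dict with an "outlines" list."""
    tokens = text.split()
    abnormalities = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "ABNORMALITY":
            abnormalities.append({"outlines": []})
        elif tok in INT_KEYS:
            abnormalities[-1][tok] = int(tokens[i + 1])
        elif tok in TEXT_KEYS:
            abnormalities[-1][tok] = tokens[i + 1]
        elif tok in ("BOUNDARY", "CORE"):
            end = tokens.index("#", i)          # "#" terminates the chain code
            numbers = [int(t) for t in tokens[i + 1:end]]
            abnormalities[-1]["outlines"].append({
                "kind": tok,
                "start": (numbers[0], numbers[1]),  # (column, row)
                "chain": numbers[2:],
            })
            i = end
        i += 1
    return abnormalities


if __name__ == "__main__":
    for abnormality in parse_overlay(SAMPLE_OVERLAY):
        print(abnormality["PATHOLOGY"], len(abnormality["outlines"]), "outline(s)")
```

Comparing `len(abnormality["outlines"])` with the parsed "TOTAL_OUTLINES" value is a cheap consistency check on each file.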
Each boundary is specified as a chain code, which is found on the line after the keyword "BOUNDARY" or "CORE" as discussed above. The first two values on that line are the starting column and row, in that order. The chain code follows these two numbers, and a "#" character marks its end. Each chain code value corresponds to a step direction as follows:
Chain code value |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |
X coordinate     |  0 |  1 |  1 |  1 |  0 | -1 | -1 | -1 |
Y coordinate     | -1 | -1 |  0 |  1 |  1 |  1 |  0 | -1 |
       -X->
  |   7 | 0 | 1
  Y   6 | X | 2
  V   5 | 4 | 3
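To make the direction table concrete, the sketch below walks a chain code and returns the pixel coordinates it visits. The offsets are copied from the table above (X grows to the right, Y grows downward); the function name `decode_chain` and the short example chain are illustrative and not taken from a real overlay file.

```python
# (dx, dy) for chain code values 0..7, taken from the direction table.
STEPS = [(0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1)]


def decode_chain(start_col, start_row, chain):
    """Return the list of (column, row) points visited by a chain code."""
    x, y = start_col, start_row
    points = [(x, y)]
    for code in chain:
        dx, dy = STEPS[code]
        x += dx
        y += dy
        points.append((x, y))
    return points


if __name__ == "__main__":
    # A small closed square: right, right, down, down, left, left, up, up.
    for point in decode_chain(10, 20, [2, 2, 4, 4, 6, 6, 0, 0]):
        print(point)
```

For a closed outline the last point returned equals the starting point, which is a quick way to check that a chain code was read completely.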
The ".16_PGM" files are concatenated sub-sampled images. They are stored in 16-bit PGM (Portable Gray Map) format. The popular "xv" program can be used to display these images if the -hist option is specified (e.g., xv -hist TAPE_B_3024_1.COMB.16_PGM). A small version of the result of displaying the image in that way is shown in Figure 4. Since global histogram equalization does not give very good results, the displayed image may look poor even though the file itself contains good information. The purpose of these files is just to provide a "quick" look at the images.