Reading/Writing Your Own TIFF and BMP Files

I first started writing code to produce TIFF files fifteen or twenty years ago, because I wanted to create a lens resolution test chart. So that it could be printed on convenient paper sizes like 8" x 10" or 8½" x 11", I made it 7" x 9½", which at 300 DPI is 2100 x 2850 pixels. Memory was still relatively expensive back then, so rather than hold an image array of that size, I had the program generate the image one line at a time. Each line was written out repeatedly to build up the test pattern, or the spacing between its various elements, so only a 2100-byte line buffer was needed.

Eventually I wanted to read in existing TIFF scans of my slides and negatives and write them back out, and along the way I've developed quite a lot of experience figuring out how to do this. The TIFF 6.0 specification (a PDF file) is pretty good as such documents go. I recommend starting at the beginning and reading through it section by section, because it introduces progressively more sophisticated image types, including the 8-bit greyscale ("B&W") pictures I'm mainly concerned with. At the end of each section it lists the dozen or so TIFF tags required, as a minimum, to produce each kind of file (bilevel/FAX, greyscale, palette-color, etc.).

Reading a TIFF file is much more complicated than writing one because a program needs to be able to recognize and handle all kinds of possible TIFF tags, not just the minimum required, whereas writing out a legit file needs only the minimum, and maybe a couple of common but optional ones. So I'll concentrate on the writing of TIFF image files first.

The program I've been using is FreeBASIC. To use the code I present below, first go to their website, then download and install the program. You want the "IDE" version; IDE stands for Integrated Development Environment. All this means is that when you run the program it opens up its own text editor. Once you type in and/or read in some program code, you can compile it and then run the stand-alone .exe program it creates, all from within the FreeBASIC IDE. You can also easily access the (immense) Help files. The editor recognizes FreeBASIC keywords and shows them in a distinct color and/or highlights them, depending on the Preferences settings, which are accessible under the View menu.

The program both reads and writes your code as plain text files, just like Notepad, even though they have a .bas extension rather than a .txt one. After starting the program, I recommend first hitting the F4 key. This opens a "results" window below the editor, where any errors are shown when you compile your code. (This window can also be opened from the "View" menu at top left.) When compiling, also watch the very lower left corner of the program's window border. It will say "Compiling..." and then "Compilation Complete" when it's done. If you didn't notice this, you might not know anything was happening during compilation.

But before getting to TIFF files, let me switch to how to do greyscale BMP files, because they're actually much easier. Below you'll find the code for a subroutine which writes out the header of a BMP file. That is followed by a complete example program that uses it and then writes out the image data.

The subroutine takes just four numbers: the file number for the file being written out (which would have been assigned when you opened the file for output), the image width, the image height, and a number of padding bytes to add at the end of each row of the image.

This last number requires a little explanation, as it's one of the idiosyncrasies of the BMP format. Each row written out has to contain a number of bytes which is evenly divisible by four. For an 8-bit greyscale image one pixel is one byte, so this is also the number of pixels. So if your image width is 200 pixels, the number of padding bytes needed is zero. If your width were 201, you'd need 3 padding bytes to bring each row up to 204. (Look into the MOD operator in the Help section for how you could do this without having to do it by hand.)
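
The MOD calculation can be sketched as follows. It is written in Python rather than FreeBASIC purely so the arithmetic can be checked on its own, and the function name row_padding is made up for illustration. In FreeBASIC the same expression would be Pad = (4 - (W MOD 4)) MOD 4.

```python
def row_padding(width_bytes: int) -> int:
    """Zero bytes needed to round one BMP row up to a multiple of 4 bytes.

    For an 8-bit greyscale image one pixel is one byte, so width_bytes is
    just the image width in pixels (hypothetical helper, not a library call).
    """
    # The outer "% 4" turns a padding of 4 (row already aligned) into 0.
    return (4 - width_bytes % 4) % 4


for w in (200, 201, 202, 203):
    print(w, row_padding(w))   # 200 0, 201 3, 202 2, 203 1
```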

Once the header is written out, one can then write the image data to the file. This brings us to the next idiosyncrasy of the BMP format: the image data has to be written from bottom to top. Every other format I know of goes from top to bottom. In the example program, a row index counter that counts down from the image height minus one to zero accomplishes this.
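
The bottom-to-top rule, together with the per-row padding, amounts to reversing the order of the rows before writing them. Here is a minimal Python sketch. It assumes the picture is held as a list of byte strings ordered top row first, and the function name bmp_pixel_data is hypothetical.

```python
def bmp_pixel_data(rows, pad):
    """Join image rows in BMP order: bottom row first, each followed by
    `pad` zero bytes. `rows` is a list of bytes objects, top row first."""
    out = bytearray()
    for row in reversed(rows):      # bottom row of the picture goes first
        out += row
        out += b"\x00" * pad        # keeps every row a multiple of 4 bytes
    return bytes(out)


top_first = [b"\x01\x02\x03", b"\x04\x05\x06"]   # a 3 x 2 image
print(bmp_pixel_data(top_first, 1).hex(" "))      # 04 05 06 00 01 02 03 00
```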

The BMP_Header subroutine is presented as a single block of text below. You'll want to highlight it, and then copy and paste it into an empty Notepad window. Save it as BMP_Header.bas so that you can later copy and paste into any program where you want it. (One could also use the #INCLUDE pre-processor directive, but that's a more advanced topic.)

The first thing to notice about this code is that everything following an apostrophe on a line is a comment, which is ignored by the compiler.

Second, FreeBASIC is what's called a strongly typed language. This means every variable has to be defined, or dimensioned (by the DIM command), before it can be used. These definitions are almost always found at the very beginning of the code, before anything is actually done. Each DIM statement names a type followed by a comma-separated list of the variables of that type (e.g., DIM AS SINGLE X, Y).

There are three principal types of variables: integer, floating point (decimal numbers), and strings. Strings are sequences of characters, like text, and are written within a pair of double quotes. Decimal numbers come in two sub-types, SINGLE and DOUBLE, depending on their precision. SINGLEs carry about seven significant digits and DOUBLEs about fifteen; SINGLEs are sufficient for most, but not all, calculations. Integer variables are the most varied in sub-type, because they come in different lengths and, thus, capacities. The ones used here are byte, short or word (2 bytes), and long (4 bytes). In addition, they can be either signed (the default) or unsigned. FreeBASIC has several shorthand ways of referring to these. For example, a USHORT variable is an unsigned short (2-byte or word) integer, which can range from zero up to 65,535 (= 2^16 - 1). These are used a lot when simply counting things.
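
To make those sizes and ranges concrete, here is a small Python sketch (not FreeBASIC). It uses the standard struct module to report the byte size of each fixed-length integer type. Pairing them with the FreeBASIC names is an assumption of this illustration. It also uses ctypes to show that a fixed-size integer pushed past its top value typically wraps around rather than growing. That kind of silent wrap-around is one way that mixing types can produce mysterious results.

```python
import ctypes
import struct

# FreeBASIC integer type names paired with Python struct codes; the "<" asks
# struct for the standard sizes (1, 2 and 4 bytes) regardless of platform.
TYPES = [("BYTE", "<b"), ("UBYTE", "<B"),
         ("SHORT", "<h"), ("USHORT", "<H"),
         ("LONG", "<i"), ("ULONG", "<I")]

for fb_name, code in TYPES:
    size = struct.calcsize(code)
    bits = 8 * size
    if code[1].isupper():                        # unsigned: 0 .. 2^bits - 1
        lowest, highest = 0, 2 ** bits - 1
    else:                                        # signed (two's complement)
        lowest, highest = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    print(f"{fb_name:7} {size} byte(s)  {lowest:,} .. {highest:,}")

# A fixed-size 2-byte unsigned integer cannot hold 65,536; ctypes silently
# wraps it around to 0 instead of raising an error.
print(ctypes.c_uint16(65535 + 1).value)   # 0
```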

Finally, because there are so many different variable types, there are a number of built-in functions which convert between them. For example, CSNG() converts whatever is inside the parentheses (like an integer of some type) into a floating-point SINGLE, while CINT() converts into an integer, rounding the decimal part up or down as appropriate. The CAST(datatype, expression) operator is the most general conversion function, for when there's no shorthand version; see the Help documentation. When different variable types are mixed in a calculation, like multiplying a decimal number by an integer, FreeBASIC is sophisticated enough to have the compiler automatically "promote" all the variables to the "highest" type (here a decimal number). Even so, I like to make it explicit by using the conversion functions. Combining different variable types in a calculation can be a hidden source of problems, so if code is doing seemingly mysterious things, this is something to look at.


    SUB BMP_Header(ByVal filenum AS INTEGER, ByVal ImgW AS USHORT,_
                ByVal ImgH AS USHORT, ByVal Pad AS UBYTE)
'
'  Writes out to an already opened file a BitMap Header for an ImgW x ImgH
'  8-bit paletted greyscale image. Pad is the number of padding bytes (0-3)
'  that will need to be written at the end of each line so that every line
'  is a whole multiple of 4 bytes long (the BMP row alignment rule).
'
    DIM AS USHORT I
'
'  14 byte Bitmap file header (BITMAPFILEHEADER):
    PUT #filenum,, "B"
    PUT #filenum,, "M"
    PUT #filenum,, MKL(CAST(ULONG, ImgH*(ImgW+Pad) + 794)) ' total file size
    PUT #filenum,, MKL(0) ' 4 bytes reserved
    PUT #filenum,, MKL(794) ' offset to start of bitmap image data
'                               = total length of header
'  12 byte DIB header / bitmap information header (BITMAPCOREHEADER):
    PUT #filenum,, MKL(12)
    PUT #filenum,, MKSHORT(ImgW)
    PUT #filenum,, MKSHORT(ImgH)
    PUT #filenum,, MKSHORT(1)
    PUT #filenum,, MKSHORT(8)
'
'  768 byte color table, 256 entries of 3 bytes each (stored B,G,R):
    FOR I=0 TO 255
        PUT #filenum,, CHR(I,I,I)
    NEXT I
'
EXIT SUB
END SUB

Well, the nice thing about subroutines is that once they're written you don't have to worry about their internal details any more. You just use them. But anyone who's interested will notice the use of the PUT # command. It outputs data to the file whose number (or a variable holding that number) follows the # sign. In this instance, two additional built-in conversion functions are used: MKL (MAKE LONG) and MKSHORT (MAKE SHORT). They take a number or variable and convert it into a 4-byte (MKL) or 2-byte (MKSHORT) string holding its binary representation. On an Intel/AMD PC the least-significant byte comes first, which is what the BMP format specification requires.
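
To see exactly which bytes MKL and MKSHORT produce on an Intel/AMD PC, you can build the same little-endian byte strings with the Python struct module. This is a sketch for checking the layout, not a FreeBASIC API. The names le_u32 and le_u16 are hypothetical. They use unsigned formats because the BMP sizes and dimensions are never negative; for values up to 32,767 the bytes are the same as MKSHORT's.

```python
import struct

def le_u32(n: int) -> bytes:
    """4 bytes, least-significant first: what MKL writes for 0 <= n < 2^31."""
    return struct.pack("<I", n)

def le_u16(n: int) -> bytes:
    """2 bytes, least-significant first: what MKSHORT writes for 0 <= n < 2^15."""
    return struct.pack("<H", n)

print(le_u32(794).hex(" "))   # 1a 03 00 00   (794 = 0x031A)
print(le_u16(600).hex(" "))   # 58 02         (600 = 0x0258)
```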

I've referred to the BMP file being created here as a greyscale image, but technically it's a paletted, or color-tabled (as it's called here), file. The color table is what's written out last in the header subroutine. It consists of 256 triplets of color values (one byte each) that the pixel values correspond to. One could get creative here and make various pixel values correspond to all sorts of colors; note that each triplet is stored in blue, green, red order. In this case, though, the FOR ... NEXT loop simply writes out R=G=B=pixel value greys from 0 to 255, producing the equivalent of a greyscale color table. If you were to fool around with this, make sure you write out exactly 3x256=768 bytes. The CHR (character) function turns each number into a one-byte character, so that each value is written to the file as a single byte, again because the format requires this. Once the NEXT I statement causes I to exceed 255, the subroutine reaches the EXIT SUB statement. Program execution then returns to the main program statement following the call of the subroutine.
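
As a cross-check of the 794-byte layout, the whole header can be rebuilt in Python with struct. This sketch mirrors what the BMP_Header subroutine writes: a 14-byte file header, a 12-byte core header, and a 768-byte color table. The function name bmp_grey_header is hypothetical, and the sketch is meant for checking offsets rather than replacing the FreeBASIC code.

```python
import struct

HEADER_SIZE = 14 + 12 + 768   # file header + core header + color table = 794

def bmp_grey_header(width: int, height: int, pad: int) -> bytes:
    """Return the 794 header bytes BMP_Header writes (on a little-endian PC)
    for an 8-bit greyscale image of width x height with `pad` bytes per row."""
    file_size = height * (width + pad) + HEADER_SIZE
    # 14-byte BITMAPFILEHEADER: "BM", total file size, 4 reserved zero bytes,
    # offset of the pixel data (= everything before it)
    file_header = b"BM" + struct.pack("<III", file_size, 0, HEADER_SIZE)
    # 12-byte BITMAPCOREHEADER: its own size, width, height, planes, bits/pixel
    core_header = struct.pack("<IHHHH", 12, width, height, 1, 8)
    # 256 three-byte entries (stored B,G,R; identical for a grey ramp)
    color_table = b"".join(bytes((i, i, i)) for i in range(256))
    return file_header + core_header + color_table


hdr = bmp_grey_header(600, 400, 0)
print(len(hdr), hdr[:2], struct.unpack_from("<I", hdr, 2)[0])   # 794 b'BM' 240794
```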

Without further ado, here's the program code showing how to use the subroutine, generate, and output a BMP graphic:

'
'   BMP-DEMO.bas - Demonstrates the use of the BMP header subroutine
'                   BMP_Header by making a simple BMP graphic with
'                   the size W x H, both of which have to be evenly
'                   divisible by four, and in the ratio W/H = 3/2.
'
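'                   The factor 1.80277 in the statement that sets
'                   IMG(I,J) controls the image contrast, here at a
'                   maximum (just under SQR(13)/2 = 1.802776, at which
'                   the points farthest from the diagonal would come
'                   out exactly black); it can only be lowered, which
'                   will not affect the whites but make the blacks
'                   less black.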
'
'
    DIM AS USHORT W = 600, H = 400, W1 = W - 1, H1 = H - 1
    DIM AS UBYTE IMG(W1,H1) ' the image array, starts w/(0,0)
    DIM AS USHORT I, J, W2 = W/2, H2 = H/2, W21 = W2-1, H21 = H2 -1
    DIM AS SINGLE X, Y, PX, PY, DX, DY, D
    DIM AS STRING DS
'
    DECLARE SUB BMP_Header(ByVal filenum AS INTEGER, ByVal ImgW AS USHORT,_
        ByVal ImgH AS USHORT, ByVal Pad AS UBYTE) '  BMP file header writer
'   note: the underscore at end of 1st line is a "continuation" symbol
'           for code that is too long to fit on a single line.
'
'  start of program:
    CLS ' clears the screen
    COLOR 14 ' sets text color to Yellow
    PRINT "  BMP writing demo program, by Chris Wetherill"
    PRINT "  --------------------------------------------"
    COLOR 15 ' makes all text White
    PRINT ' just a blank line
    INPUT " >> Hit <Enter> to start: ", DS
    OPEN "Test-graphic.bmp" FOR OUTPUT AS #2
    BMP_Header(2, W, H, 0) ' the call to the header subroutine
'
'  make image graphic:
    FOR J=0 TO H21 ' the top half of the graphic
        Y = CSNG(J)
        FOR I=0 TO W21 ' left half
            X = CSNG(I)
            PX = (9*X + 6*Y)/13 ' PX,PY is the point on the diagonal y = 2x/3
            PY = 2 * PX / 3 '      where the perpendicular from (I,J) meets it
            DX = PX - X ' delta x
            DY = PY - Y ' delta y
            D = SQR(DX*DX + DY*DY) ' distance from (I,J) to (PX,PY)
            IMG(I,J) = CUBYTE( 255 * (1 - 1.80277*D/W21) )
        NEXT I
        FOR I=W2 TO W1 ' right half is a mirror image of the left
            IMG(I,J) = IMG(W1-I,J)
        NEXT I
    NEXT J
'
    FOR J=H2 TO H1 ' the bottom half of the graphic is a mirror image of top
        FOR I=0 TO W1
            IMG(I,J) = IMG(I,H1-J)
        NEXT I
    NEXT J
'
'  write out the image array:
    J = H
    DO
        J -= 1 ' decrements J by 1 (i.e., J = J - 1)
        FOR I=0 TO W1 ' write out one line of the image
            PUT #2,, CHR(IMG(I,J))
        NEXT I
    LOOP UNTIL J=0
    CLOSE #2 ' don't forget to properly close a file that has been opened!
'
    INPUT " >> Hit <Enter> to end/exit: ", DS
    STOP
    END
'
' - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
'
'    BMP_Header subroutine goes in place of this line.

The first thing to notice in the main program is that it has to be made explicitly aware of the subroutine by using the DECLARE statement. I usually do that after defining all the variables. The other thing you'll see is that variables can be initialized with specific values, or even arithmetic expressions, when they're defined. Otherwise numeric variables default to a value of zero (and strings to empty).

I've used code like this to generate many of the graphs you see here and around the site. After setting every pixel in the IMG(,) array to white (255), it's like a blank sheet of paper. It's a simple matter to draw x and y axes 50 or 75 pixels in from the edge. Then, just like with graph paper, you need to give some thought to the range of x values and the range of y values that apply to your particular problem. These give you horizontal and vertical scaling factors, along with their corresponding offsets from zero. Remember that while regular graph paper usually has y increasing upwards, in computer graphics y increases downwards. A good check that you've got everything correct is to then put tick marks on the axes. (This will likely use the STEP parameter of a FOR ... NEXT loop, which defaults to 1 when omitted, so that NEXT I (or whatever) increases I by something like 50 or 100 instead of only 1.) After that, the (x,y) points calculated by the code can be converted into (I,J) array index values. Setting that pixel to zero then plots a black point. If you want something bigger, like a "+" sign, simply set the four adjacent pixels black as well.
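
The scaling-and-offset step described above can be sketched like this, in Python for brevity. Everything here is an illustrative assumption. The names make_mapper and plot_plus are hypothetical, and the image is stored row by row as img[j][i] rather than FreeBASIC's IMG(I,J). The "+" sign is also assumed to lie at least one pixel inside the image.

```python
def make_mapper(x_min, x_max, y_min, y_max, width, height, margin):
    """Return a function that converts data (x, y) into pixel indices (i, j).

    The plot area is the image less `margin` pixels on every side. Pixel j
    grows downward, so y is flipped: y_min lands on the bottom of the area.
    """
    x_scale = (width - 1 - 2 * margin) / (x_max - x_min)    # pixels per x unit
    y_scale = (height - 1 - 2 * margin) / (y_max - y_min)   # pixels per y unit

    def to_pixel(x, y):
        i = margin + round((x - x_min) * x_scale)
        j = (height - 1 - margin) - round((y - y_min) * y_scale)
        return i, j

    return to_pixel


def plot_plus(img, i, j):
    """Set pixel (i, j) and its four neighbours to black (0)."""
    for di, dj in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):
        img[j + dj][i + di] = 0


W, H = 600, 400
img = [[255] * W for _ in range(H)]        # white "sheet of paper", img[j][i]
to_pixel = make_mapper(0.0, 10.0, 0.0, 1.0, W, H, margin=50)
print(to_pixel(0.0, 0.0), to_pixel(10.0, 1.0))   # (50, 349) (549, 50)
plot_plus(img, *to_pixel(10.0, 1.0))
```

Computing the scale factors once and returning a small closure keeps the per-point conversion to two multiplications, which matters little here but mirrors working out the graph-paper scaling before plotting anything.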

I do use a spreadsheet program (OpenOffice), but there are some calculations which are too difficult, complicated, or impossible to do that way. Plus, the graphs are pretty crude. Hence the need to be able to write custom code to do the calculations and make the graphs. But instead of trying to read spreadsheet files directly into my programs, I take the column(s), say, of intermediate values, highlight them, and then copy and paste them into Notepad, saving them as a text file. This is easily read into a FreeBASIC program, using the OPEN FOR INPUT command to start.

The only thing I haven't tackled yet in my code is the labelling of my graphs. In the directory where the FreeBASIC program installs you'll find lots of example programs. In the "examples\graphics\FreeType" directory there's a program, char.bas, which is about fonts and rendering text into a bitmap, but I haven't spent the time yet to integrate it into my programs. So I load my resulting BMP files into an image editing program and do the labelling there. This is necessary in any event, since the BMPs are uncompressed and thus take up a lot of file space. Saving them back out as compressed TIFFs or PNGs reduces their size by a lot, as there's usually much blank (white) space.


Ok, on to TIFFs... At the highest conceptual level a TIFF has just three parts: an 8-byte "header" identifying it as a TIFF, a variable length Image File Directory (IFD) where the relevant meta information about the image is found, and the image data itself.

The 8-byte header is composed of three fields, the first two being 2 bytes long and the last 4 bytes.

The first 2-byte field is either "II" or "MM" (without the quotes) and specifies the byte order employed by the file. "II" stands for the Intel byte order convention, which is least-significant byte first (LSB), sometimes called little-endian. "MM" stands for Motorola byte order, which is most-significant byte first (MSB), or big-endian. The way we usually write numbers in the west, left to right, is most-significant digit first, i.e., MM.

Obviously, a full-service TIFF reading program needs to be able to handle either byte order. A program that only reads and writes its own files on a given machine, however, really only needs to know which byte order that machine uses. Since there are only two possibilities, if you don't know, the simplest way to find out is just to guess, which will be right 50% of the time; if things don't work, switch to the other byte order. Intel and AMD (x86) processors, including the AMD CPU in my computer, use II.
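
Rather than guessing, a program can also ask. In Python (used here only as an illustration; the helper name native_tiff_order is hypothetical), the interpreter reports the machine's byte order directly. The same answer can also be found by experiment: store the number 1 in native order and see whether its low byte comes first.

```python
import struct
import sys

def native_tiff_order() -> str:
    """TIFF byte-order mark matching the machine this code is running on."""
    return "II" if sys.byteorder == "little" else "MM"

# The same answer "by experiment": pack the number 1 in native byte order
# ("=") and check whether the low byte comes out first.
low_byte_first = struct.pack("=H", 1)[0] == 1
print(native_tiff_order(), low_byte_first)   # on an Intel/AMD PC: II True
```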

The second 2-byte field is just the number 42, which is the ASCII character code for an asterisk. This serves as a check on the byte order specified in the first field. In II format the two bytes will be an asterisk followed by a zero (null) byte, whereas in MM format they'll be a zero byte followed by the asterisk.

The third field in the header is an unsigned 4-byte (long) integer which is a "pointer" (or offset within the file) to where the (first) IFD can be found. Almost everything in a TIFF file works like this, with pointers to where things are in the file. If the IFD immediately follows the header this will just be the number 8.
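
Putting the three fields together, the 8-byte header can be written and checked with a short Python sketch. The names tiff_header and read_tiff_header are hypothetical. The returned "<" or ">" is Python struct notation for little- and big-endian, so the rest of a reader could unpack numbers in the file's own byte order.

```python
import struct

def tiff_header(byte_order: str, ifd_offset: int = 8) -> bytes:
    """8-byte TIFF header: byte-order mark, the number 42, first-IFD offset."""
    if byte_order not in ("II", "MM"):
        raise ValueError("byte_order must be 'II' or 'MM'")
    prefix = "<" if byte_order == "II" else ">"
    return byte_order.encode("ascii") + struct.pack(prefix + "HI", 42, ifd_offset)

def read_tiff_header(data: bytes):
    """Check an 8-byte TIFF header; return (struct prefix, first IFD offset)."""
    marks = {b"II": "<", b"MM": ">"}
    prefix = marks.get(data[:2])
    if prefix is None:
        raise ValueError("not a TIFF: byte-order mark is not II or MM")
    magic, ifd_offset = struct.unpack_from(prefix + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a TIFF: second field is not 42")
    return prefix, ifd_offset


print(tiff_header("II"))   # b'II*\x00\x08\x00\x00\x00'
print(tiff_header("MM"))   # b'MM\x00*\x00\x00\x00\x08'
```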

Why wouldn't the IFD always just follow the header? Because of compression. When a picture is compressed, you don't know how many bytes it'll turn into until you've done the compression. But this number is needed for one of the fields in the IFD (StripByteCounts), so the reading program knows how many bytes to read and then decompress. So, when making compressed image files, it makes most sense to do the compression first and write the results out to the output file, counting bytes as one goes. After completion one can then write out the IFD, which will thus come after the image data it describes in the file. One then has to go back and put the proper number in the third field of the header, pointing to where the IFD is.
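
The write-the-data-first, patch-the-header-later approach can be sketched in Python with an in-memory file. This is an illustration under assumptions. The function name is hypothetical, and the image data and a ready-made IFD are simply passed in as byte strings. A zero byte is inserted when needed, because the TIFF 6.0 specification requires an IFD to begin on a word (even-offset) boundary.

```python
import io
import struct

def tiff_data_then_ifd(image_data: bytes, ifd: bytes) -> bytes:
    """Lay out a little-endian (II) TIFF as header, image data, then IFD.

    `ifd` is assumed to be an already built directory; its position isn't
    known until the image data has been written, so the header's pointer is
    written as 0 first and patched afterwards.
    """
    f = io.BytesIO()
    f.write(b"II" + struct.pack("<HI", 42, 0))   # IFD offset not known yet
    f.write(image_data)                          # e.g. compressed strips
    if f.tell() % 2:                             # IFD must start on a word boundary
        f.write(b"\x00")
    ifd_offset = f.tell()
    f.write(ifd)
    f.seek(4)                                    # back to the header's third field
    f.write(struct.pack("<I", ifd_offset))
    return f.getvalue()


out = tiff_data_then_ifd(b"\x01\x02\x03", b"IFD!")   # placeholder bytes
print(struct.unpack_from("<I", out, 4)[0])          # 12 (8 + 3 data + 1 pad)
```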

Fortunately this little complication doesn't matter to me, because my code only reads and writes uncompressed data. So the IFD follows the header and thus has an offset of 8.

The IFD itself has a simple structure. It consists, first, of a 2-byte unsigned integer field telling how many directory entries there are which follow. This will usually be a number of at least a dozen and maybe more like fifteen or sixteen, but it can be somewhat larger if many optional directory entries are used. Each directory entry itself is a 12-byte structure broken up into four parts: first, two 2-byte fields, followed by two 4-byte fields. More on these in a second.
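
In the TIFF 6.0 specification's terms, the four parts of a directory entry are the tag number, the field type, the count of values, and the value itself. If the value doesn't fit in 4 bytes, the last part is instead an offset to where the value is stored. Below is a minimal Python sketch of packing one entry in II byte order. The function name ifd_entry is hypothetical, and it only handles a single SHORT or LONG value, which covers tags such as ImageWidth (tag 256).

```python
import struct

SHORT, LONG = 3, 4   # TIFF field type codes for 2- and 4-byte unsigned ints

def ifd_entry(tag: int, field_type: int, count: int, value: int) -> bytes:
    """Pack one 12-byte IFD entry (little-endian, II files) whose single
    value fits in the 4-byte value field.

    Layout: tag (2 bytes), field type (2), count (4), value/offset (4).
    A value shorter than 4 bytes is left-justified in the last field.
    """
    if count != 1:
        raise NotImplementedError("this sketch only handles a single value")
    if field_type == SHORT:
        value_field = struct.pack("<HH", value, 0)   # 2-byte value, 2 zero bytes
    elif field_type == LONG:
        value_field = struct.pack("<I", value)
    else:
        raise NotImplementedError("this sketch only handles SHORT or LONG")
    return struct.pack("<HHI", tag, field_type, count) + value_field


entry = ifd_entry(256, SHORT, 1, 600)    # ImageWidth = 600 pixels
print(entry.hex(" "))   # 00 01 03 00 01 00 00 00 58 02 00 00
```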

The last part of an IFD is a 4-byte unsigned integer pointing to the next IFD, if there is one. You may have noticed the word "first" in parentheses a few paragraphs back... A TIFF file can contain multiple pictures. (Apart from the 4 GB ceiling imposed by 4-byte offsets, the only real limit is how big a file you want to deal with.) These are sometimes called multi-page TIFFs. The IFDs form what's called a linked list: the first one points to the second one, which points to the third one, and so on. The last one terminates the list by having its next-IFD pointer set to zero. So if your file only has a single image in it, this final field in the IFD will just be zero.
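
Following the linked list takes only a short loop. At each IFD's offset, read the entry count, skip over the 12-byte entries, and read the next-IFD pointer, stopping when it is zero. The Python sketch below assumes the whole file is already in memory and that its byte order is known (the "<" or ">" struct prefix for II or MM). The function name ifd_offsets is hypothetical, and the tiny two-image file it is tried on is made up for the example.

```python
import struct

def ifd_offsets(data: bytes, prefix: str = "<") -> list:
    """Return the file offset of every IFD in a TIFF held in memory.

    `prefix` is "<" for II (little-endian) files or ">" for MM files.
    """
    offsets = []
    (offset,) = struct.unpack_from(prefix + "I", data, 4)   # header's pointer
    while offset != 0:                       # a zero pointer ends the list
        if offset in offsets:                # guard against a looping, corrupt file
            raise ValueError("IFD chain loops back on itself")
        offsets.append(offset)
        (n_entries,) = struct.unpack_from(prefix + "H", data, offset)
        next_at = offset + 2 + 12 * n_entries    # skip count and entries
        (offset,) = struct.unpack_from(prefix + "I", data, next_at)
    return offsets


# A made-up two-image file: header, then two IFDs with one entry each.
entry = struct.pack("<HHIHH", 256, 3, 1, 600, 0)             # ImageWidth = 600
header = b"II" + struct.pack("<HI", 42, 8)                    # first IFD at 8
ifd1 = struct.pack("<H", 1) + entry + struct.pack("<I", 26)   # 18 bytes, next at 26
ifd2 = struct.pack("<H", 1) + entry + struct.pack("<I", 0)    # last IFD
two_ifd_file = header + ifd1 + ifd2
print(ifd_offsets(two_ifd_file))   # [8, 26]
```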
This page in progress...

©2022, Chris Wetherill. All rights reserved. Display of words or photos here does NOT constitute or imply permission to store, copy, republish, or redistribute my work in any manner for any purpose without expressed prior permission. -- except for the computer code, which is "open source" and carries the usual restrictions, namely that you can't use it for commercial purposes.