2007-02-08

Graphics on the Web 3 - Graphic File Type Primer

First off, file types, in the Microsoft world, are the three-letter designations that go after the period in file names. Examples are filename.gif, filename.jpg, or filename.txt. The computer uses those designations to figure out which program should open the file. Thus, when you double-click on a .txt file, Notepad (a text editor) opens the file. If you double-click on a .jpg file, then a graphics program, or whatever program is “registered” to open .jpg files, will open the file for you.

There are many, many different file types for graphics programs but some of the more common ones are BMP, GIF, JPG, TIF, and WMF. These five, at least, provide good examples for the next discussion.

BMP files are “Windows Bitmap Files.” These files are the native Windows format for graphic files. They can be in any of the colour systems mentioned before. There is nothing overly special about them.

GIF files are “Graphics Interchange Format” files. When people first started working with pictures, the file sizes were too large. Someone figured out a way to compress an image by recording repeated colours as counts rather than over and over. For example, suppose you had a line of pixels that was black, blue, blue, blue, blue, blue, blue, blue, blue, blue, red. You could store it that way, as BMP files do, or you could say black, blue (x9), red. This is a simplified example of how GIF files compress file sizes. Depending on how simple the image is (that is, how many repeated colours it has), a GIF file can be significantly smaller than a BMP file, often less than half the size, with no loss of information. This last part is important: GIF compression is “lossless” in that no information is thrown away. You always get out what you put in; it just takes less room to store. GIF files are usually in 16 or 256 colour mode. Note that there is also a system called “animated GIF.” This is just a series of images, stored in a single GIF file, that play in an animated sequence. It is used extensively on the web.
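The counting idea above can be sketched in a few lines of Python. This is only an illustration of the simplified “black, blue (x9), red” scheme (usually called run-length encoding), not the actual GIF algorithm, which uses a more elaborate method called LZW. The function names are made up for this example.

```python
from itertools import groupby

def rle_encode(pixels):
    """Collapse each run of identical pixels into a (colour, count) pair."""
    return [(colour, len(list(run))) for colour, run in groupby(pixels)]

def rle_decode(pairs):
    """Expand (colour, count) pairs back into the original row of pixels."""
    return [colour for colour, count in pairs for _ in range(count)]

# The row of pixels from the example: 1 black, 9 blue, 1 red.
row = ["black"] + ["blue"] * 9 + ["red"]
encoded = rle_encode(row)

# Lossless: decoding gives back exactly what was put in.
restored = rle_decode(encoded)
```

Here eleven stored pixels shrink to three pairs, and decoding returns the identical row, which is what “lossless” means.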

JPG files are “Joint Photographic Experts Group” files, also known as JPEG. JPG, like its cousins in video (MPG) or audio (MP3), is a “lossy” compression system. In other words, information is lost when an image is compressed to this standard. The amount of information lost depends on what the image is of (widely varying images don’t compress well) and how “aggressive” you choose the compression to be. Typically, JPG compressed files will be one twentieth the size of the same image in BMP, a significant saving in file size. To casual observation, the compressed image will not look any different from the original; it takes a trained eye to spot “compression artifacts” from lost information, so the saving is usually worth the lost information. However, JPG images are almost always 24 bit colour (there is a greyscale JPG standard but many applications don’t support it). That means that if you compress a 256 colour (8 bit) image with JPG, it has to be converted to 24 bit when you open it up again. In the end, the size can actually grow to three times what it was originally! You need to use JPG compression with caution and full knowledge of what’s going on to get the most from it.
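The “three times” figure is simple arithmetic on bits per pixel. Here is a rough sketch, assuming an 800 x 600 image (a size picked only for illustration) and counting raw pixel data only, with no headers or palette. The one-twentieth estimate is just the typical ratio quoted above, not a guarantee.

```python
def raw_size_bytes(width, height, bits_per_pixel):
    """Size of the uncompressed pixel data, ignoring headers and palettes."""
    return width * height * bits_per_pixel // 8

WIDTH, HEIGHT = 800, 600  # assumed image size, for illustration only

size_8bit = raw_size_bytes(WIDTH, HEIGHT, 8)    # 256 colour image
size_24bit = raw_size_bytes(WIDTH, HEIGHT, 24)  # same image opened as 24 bit

growth = size_24bit // size_8bit   # how much bigger the 24 bit version is
jpg_estimate = size_24bit // 20    # the "one twentieth" rule of thumb
```

The 8 bit image needs 480,000 bytes of pixel data and the 24 bit version needs 1,440,000, which is where the factor of three comes from.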

TIF files are “Tagged Image File Format” files. This is a file standard that isn't really standard. Most programs support the TIF format but they usually do so in their own special way. For example, some mapping files in TIF format include “georeferencing” tags to tell the opening program exactly what location the map is supposed to represent. In other words, the image is “tagged” with special information. Different applications may tag the image in different ways. One unique feature of TIF files is that they can be “paged” from a disk if they are very large. Because of this, it is possible to work on TIF files that are much larger than files in the other formats. There is a compressed version of TIF files, actually several versions, but not all programs support them.

WMF files are “Windows Meta Files.” I’m ending with this type because it’s weird. Before you can understand what’s going on with these files, you have to know the difference between raster and vector graphics. Raster graphics is what I've been describing all along. The pixels across and down the screen form a raster. Lines and other shapes are made by turning pixels on or off to create the image desired. All but the most specialised monitors and printers are raster devices. In other words, no matter what you start with, the displayed image is raster. A raster graphics file is simply a pixel by pixel recording of the image. Vector, on the other hand, is a completely different ball game. In a vector graphics file, rather than storing information for each pixel, information is stored for each line or shape displayed. Thus, a line would be stored as “Line, starting at coordinates 27, 67, ending at coordinates 234, 553, thickness of 3, colour green.” Each line or shape of the entire image is stored in this fashion. Most computer-aided design (drafting) or mapping programs are vector based.
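To make the difference concrete, here is a rough sketch of how the same green line might be held in memory each way. The `Line` record and the 800 x 600 canvas are assumptions made up for this example; real vector formats such as WMF have their own record layouts.

```python
from dataclasses import dataclass

@dataclass
class Line:
    """One vector shape: a handful of numbers describes the whole line."""
    x0: int
    y0: int
    x1: int
    y1: int
    thickness: int
    colour: str

# Vector: the image is a list of shapes.
vector_image = [Line(27, 67, 234, 553, 3, "green")]

# Raster: the image is a value for every pixel, row by row.
WIDTH, HEIGHT = 800, 600  # assumed canvas size
raster_image = [["white"] * WIDTH for _ in range(HEIGHT)]

vector_values = len(vector_image) * 6  # six stored values per line
raster_values = WIDTH * HEIGHT         # one stored value per pixel
```

The vector version stores six values for the line, while the raster version stores a value for every pixel on the canvas whether the line touches it or not.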

Windows Meta Files are vector graphic files. In other words, these files are collections of shapes rather than arrays of pixels. So, why do they bother with this file type? Well, try enlarging a raster graphics file and you’ll find out why. If you shrink a raster file, say by cutting it in half (from 800 x 600 to 400 x 300), all the program has to do is throw away 3 out of every 4 pixels (if you halve the width and height, you get 1/4 of the picture size). It’s very simple. But if you took that same file and doubled its size, both horizontally and vertically (from 800 x 600 to 1600 x 1200), then the graphics program has to quadruple every pixel into chunky squares. When you do this, smooth lines become jagged staircases. For this reason, raster images are said to be “not scalable.” You can’t just keep zooming in without the image turning into large chunky blobs. However, vector files are infinitely scalable. If you zoom in on a vector file, the program just changes the co-ordinates of the start and stop points, and maybe changes the line thickness. Smooth lines are still smooth. You could zoom in 500x on an image and the lines will still be smooth.
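The scaling behaviour described above can be sketched as follows. This assumes the simplest possible approach (repeating or dropping pixels); real graphics programs often use smarter resampling, and the function names are invented for this example.

```python
def double_raster(grid):
    """Double width and height by turning each pixel into a 2x2 block."""
    out = []
    for row in grid:
        wide = [pixel for pixel in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))  # repeat the widened row for the extra height
    return out

def halve_raster(grid):
    """Halve width and height by keeping 1 pixel out of every 4."""
    return [row[::2] for row in grid[::2]]

def scale_line(line, factor):
    """Scale a vector line (x0, y0, x1, y1, thickness) by multiplying its numbers."""
    return tuple(value * factor for value in line)

# A tiny 2x2 raster holding a diagonal line (1 = on, 0 = off).
diagonal = [[1, 0],
            [0, 1]]
big = double_raster(diagonal)   # the diagonal becomes a two-step staircase
small = halve_raster(big)       # shrinking just throws pixels away

# The vector line from earlier, zoomed 2x: still one smooth line.
zoomed = scale_line((27, 67, 234, 553, 3), 2)
```

Doubling the raster produces blocks of four identical pixels, which is the “staircase” effect, while the vector line only gets new numbers.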

If vector graphics files are so scalable, many people wonder why we don’t use them for all graphics and just do away with raster entirely. Well, the problem is that photographic images just don’t reduce down to lines and shapes. When there is too much variation, vector graphic systems can’t hold the image; they just don’t work. Where they do work particularly well is in simple graphic images, and that is exactly where they are used in Windows. Most Windows clipart images are stored in WMF format. Not only are the file sizes small, the images are scalable to whatever size you require within the program you are using them in.


What about the Web:

Well, hard drives and memory may have become cheap in the last while but there is now a new reason to keep file sizes small. The larger the file size, the longer it takes to transfer from a web server to a web browser, especially on a slow link. This is a powerful incentive for web-designers to shrink their images down to the smallest possible size. To do this, they use GIF and JPG files. For simple graphics, reduced to 256 or 16 colour mode, GIF files are the best choice for the reason previously mentioned. For everything in 24 bit colour, JPG is the way to go.

Part 1 - Binary Numbers

Part 2 - Colour Systems
