2009-10-13

Phishing and SPAM, A Crowd-Source Solution

You know… a long time ago, we told people to “just ignore” spam. Answering it, we said, would only get them to send more. If we ignored it, it would eventually go away. History has proven us wrong, very wrong indeed. We didn’t factor in that even a 0.01% idiocy rate is profitable when millions of people are targeted. Now, we’re telling them to just ignore phishing, like it will go away. It’s time to admit we were wrong; it’s time to start fighting back.

Instead of telling people to ignore phishing, we should tell them to respond with a lie. Every piece of SPAM or phishing attempt should be answered with bogus information. That way, the economies of scale change in our favour. Instead of them sending out 10,000,000 emails and getting 100 replies from stupid people, they get 100,000 replies where only 100 are real. Then they have to go through each one, sorting out the good from the bad. Not only do we get some satisfaction from wasting a little bit of the phisher’s time, we also protect the 100 really stupid idiots that responded with the truth.

Send out a batch of emails telling people they have won the lottery, and your email server gets buried under false replies. Send out a link for selling Viagra, and you get millions of orders with fake credit card numbers and a shipping address to the Vatican. Phish for passwords and get more back than you could ever process, mostly garbage. The Nigerian businessman should always get buried under requests for more information, false bank account numbers, and phone numbers for some telemarketing companies. Reply to everything; make a game of it. Let the phishers waste time sorting out the mess they get back. What have we got to lose? They already have our email addresses. The results can’t be any worse than they are now.

It's a simple crowd-source solution to SPAM and phishing. Why should we try to figure out technical solutions to the problem of responding idiots? Tell everyone to respond with garbage; bury the phishers with data and leave them the task of creating technical tools to fix their problem. All we have to do is tell everyone to lie. That's pretty easy to do. It's fun too.

2009-07-15

A Teaspoon of Sewage

There is an old adage: If you take a barrel of sewage and add a teaspoon of wine, you get a barrel of sewage; if you take a barrel of wine and add a teaspoon of sewage, you get a barrel of sewage.

Digital Rights Management (DRM) is crap. A good part of my living is made setting up computers for end-users, making sure everything "just works" for them. I make a good living so I shouldn't complain too much about DRM; it is, after all, job security for me. But, I've grown to loathe it with a passion. The majority of my time is spent dealing with arcane DRM systems like FlexLM, web activation, system IDs, and the like. It's all a load of crap.

I know that I could easily go off to the web, download cracked versions of nearly all of this DRMed software that I'm fighting with, and get it up and running for free. What's more, it would be easier to do than actually staying legal and fighting with the DRM. What kind of morons make it harder to legally use their products than it is to steal them? The people that are paying for their software are the people that want to stay legal, that have to stay legal for various reasons.

Yes, license management is a good idea for institutions because, honestly, sometimes we lose track of just how many licenses we're using. Even better, make them concurrent with a license server that stops too many people using them at the same time. But, why not just have a license server that asks "how many licenses do you have?" Why go through some arcane song and dance to get "activated" licenses when stealing said licenses would be easier? The people that actually pay are going to put in the right number of licenses when asked. The people that pay are the honest ones that want to stay legal. Why punish your paying customers by treating them like thieves? Why make them do more work to pay you and use your product than the thieves have to do to use your product for free? It makes no sense.

DRM is crap. If you add it to your software application, your application is crap. It makes no difference how good your code is otherwise; if you pollute it with DRM then you will accomplish nothing but piss-off your legal customer base. The thieves don't care; DRM-stripped versions of your application will be out soon enough. People strip out DRM because it's fun; the stronger the DRM, the bigger the challenge. Only your legal customer base has to deal with DRM; it gets in the way of using what they paid for. If you release DRM-restricted applications, then you are releasing crap and people like me will write nasty things about you. What's more, we'll moan and complain to everyone that asks us to install your crap and suggest that they find something better. We'll also support and promote any free and open-source product that remotely competes with the crap you're releasing, just so we don't have to deal with DRM.

Yes, I've just spent the better part of my day dealing with one stupid crappy application that won't work with our new license server. Writing this is my way of venting; it beats yelling at the poor woman on the other end of the phone. She can't do anything about it; I can't do anything about it. Everyone is pissed-off because some idiotic managers demanded that their company's software be protected from theft - an impossible goal. It doesn't make it harder to steal; it just makes it harder to use. Why do you do this to your paying customers? Why?

2009-03-06

Sharing is Human

Sorry, I've moved this post to the Keliso blog: HERE
Google search indexing will catch up eventually.

Silly Laws

Sorry, I've moved this post to the Keliso blog: HERE
Google search indexing will catch up eventually.

2009-01-09

N810 - I Am Free

I've recently purchased a Nokia N810 Internet Tablet. I spent a lot of time reviewing the reviews, checking the specs, and comparing it to other devices, and it came out on top. The deciding factor was a little application that can be downloaded and installed. It's called "I-am-free" and is maintained by Owen Williams. This application displays a picture of a shiny gem, nothing more. Now, I've not actually installed it, and have no intention of doing so. The fact that it exists is enough.

Its existence is part joke but mostly a statement of beliefs. You see, for a few days, there existed an application on the iPhone store called "I Am Rich." It was put up by Armin Heinrich and it displays a picture of a shiny gem, nothing more. The difference between this application and the I-am-free application is, of course, price. The "I Am Rich" app cost $999, the maximum the iPhone store allows. If you think that no one would be insane enough to spend a grand on something that does nothing, well, eight people say you're wrong. I suppose it would be more than eight if Apple hadn't pulled it off the iPhone store within a few days.

Thus, we have the real difference between the iPhone and the N810. The iPhone is a locked-down proprietary system where nearly everything you do with it will cost money. The N810 runs a version of Linux and nearly everything you can do with it is free. Sure, the N810 includes a GPS navigation application that wants you to pay for a subscription to get the advanced features, but there are several other completely free mapping apps that you can install. The N810 software repositories, the Linux way of distributing software, contain hundreds of other programs available, all for free.

Linux is a Free and Open Source Software (FOSS) operating system. That means that it's free, as in free beer, and free, as in free speech; both forms of freedom are important. Free, as in beer, means that the software is free of Digital Rights Management (DRM) and all the other stupid tricks companies put in to stop people from using their products without permission. DRM systems, these days, are so complicated that they are often the primary difficulty encountered while installing and using a piece of software. Free, as in beer, also means that you can use the software for free, which is good, very good.

FOSS also means that if the software doesn't work the way you want, you are free to change it. This second freedom, free as in free speech, means that the source-code for the software is available to anyone that wants it. That means that you can modify it to meet your requirements. Or, if you're not a programmer, you can pay someone else to modify it. This kind of freedom may not be critical to your average Joe playing with an N810, but if you were a company using an application for business, then the ability to customise the code can be very useful. More importantly, it allows communities of people to collect around applications or particular hardware platforms, like the N810, and improve them. These communities often drive the development of free, as in beer, applications. Of the two FOSS freedoms, free as in free speech is the most important over the long term.

I know, because there is a vibrant community of people supporting the N810, that my new purchase will still be useful long after iPhone users have to send their toy in to Apple to get a replacement battery installed. Yes, it's so easy to change the N810's battery that I'm thinking of carrying spares while travelling. I know the N810 software repositories will exist long after Apple has yanked the last iPhone app from its store. Yes, anyone can put up an N810 repository if they want, and several have already; I could put up my own repository and compile my own applications if I really wanted to, and I might at some point. And, when the day comes that technology standards have long-since left both the N810 and the iPhone behind, I know I will find a niche use for the N810 while the iPhone will be landfill. I can already think of several, from a car OBDII reader (car computer interface) to a digital photo frame. Being based on FOSS, the possibilities are only limited by the imagination.

2008-08-01

Data Recovery with the Ubuntu Linux Live CD

I'm an old-school computer tech, started out PIPing around in CP/M but basically grew up with DOS. Yes, I'm a Microsoft OS expert, I do it for a living. However, on my own time, I've become a convert to Linux. So, when the drive failed on my home computer, I took it as an excellent opportunity to learn data recovery, Linux style. What follows are my notes; please be aware that I'm just learning. I don't actually know much of anything about Linux and there are probably better ways to do what I'm doing. This is the official legal caveat: follow these suggestions at your own risk! I take no responsibility, because, well, I'm totally irresponsible.

So, my drive failed. One day it was fine, the next day the OS was grinding along, barely responding to commands. So, being a Windows kind of guy, my first reaction was to reboot. The system didn't come back. The Linux boot loader just reported IO errors on the kernel and stopped. Sigh, it's a good thing I've learned my lessons over the years and had a fairly recent backup. You do have a recent backup, right?

Having a backup, and not being particularly concerned, I thought I'd take the time to learn some Linux data recovery techniques. Besides, it was a good opportunity to install a new, bigger drive and upgrade to the latest version of Ubuntu. Anyway, after fussing and farting around, I managed to get nearly all of my data back from the failing HD. Now, if I were faced with it again, here is the order that I would do things. It's not the order that I did do things, but, hey, I did learn a thing or two in the process.

  • Get an Ubuntu LiveCD to boot the failed system from. I tried a couple of other LiveCDs but my system is an older Mac Mini Power PC (PPC) so there aren't a lot of choices in the LiveCD arena. The couple that I did try lacked decent tools or even the ability to install more, so I went back to Ubuntu, which is my distribution of choice and the only one I have any real knowledge of. So, there may be better tools out there, but this article is using Ubuntu.
  • Get a USB drive that has more free space than the entire partition size you are trying to recover. Not the used space, the entire space.
  • Boot from the LiveCD.
  • Enable all the software sources.
  • Install DD_Rescue. Open a terminal and type "sudo apt-get install ddrescue" to do it.

  • Identify the failed drive partition by typing: "sudo fdisk -l". It should be /dev/hda1 or something like that. Make sure you know the right drive or this whole process is going to be kind of pointless.

  • Connect your working USB drive and wait for it to auto-mount. Ubuntu is nice that way. Note the path to it, which should be something like /media/disk ... whatever its volume label is, anyway.

  • Use DD_Rescue to copy your entire failed drive to an image file on the USB. Type: "dd_rescue /dev/hda1 /media/disk/recover_hda1.img" This is just Command -space- Source -space- Destination, nothing complicated. Of course, you need to replace these parameters with the ones you identified previously, and you can pick whatever filename you want. If it's a large drive, this is going to take quite some time, not the name picking, the file copy. Well, some people seem to spend a lot of time picking names, but that's your problem if you do.

  • Okay, so now you have a backup of the readable part of the failed data. From here, you have several choices. If your problem is just scrambled format information, then you can try to recover the partition on the failed drive. If the drive is failing at a hardware level, then this approach is rather pointless. You could instead replace the failed drive or you could use a second USB drive. Either way, you're going to have to use DD_Rescue to copy the image file back to the drive, just reverse the command parameters - file to dev.

  • Okay, you now have a drive with the data you want recovered and an image file backup of said drive, so it doesn't matter if you do anything wrong with said data drive in the process of recovery, right? You can always repeat the process of using DD_Rescue to copy the file information back to the drive.

  • You have several recovery options: partition recovery, file recovery, and carving. In partition recovery, you can use a utility called TestDisk (which you'll have to install just like DD_Rescue) to scan the drive for partition information. There's lots of information out there on how to use it so I'm not going to put it here. It's not the easiest program to use but, considering the technical nature of what it does, well, it could be a whole lot worse. In file recovery, you want to use a tool like fsck, which is kind of like the old DOS chkdsk. There are lots of command-line options; just type: "fsck --help" or maybe "man fsck" for a list. In file carving, you use a tool like PhotoRec to scan through the drive looking for recognizable file patterns. Use this as a last resort because it will not recover your drive folder structure or filenames. You just get a list of files that you have to go through, one by one, to see what they are. And, yes, it recovers a whole lot more than just photos. So far, I've not had to resort to file carving.

  • At this point, you should have a drive or folder with your recovered data, or at least all the data you're going to get recovered, in it. The next process is to get your system back up and running with the new data. I recommend copying this data to a USB drive and then just re-installing your system. After that, it's a fairly simple matter to selectively locate and copy back your old data to the new system.

So, here's a summary of useful tools:
  • Ubuntu Live CD (to do anything on a system with no functional operating system)
  • DD_Rescue (to copy data, drive to drive, drive to file, or file to drive)
  • TestDisk (to recover lost partitions, among other things)
  • Fdisk -l (to list drive partitions)
  • fsck (to check the file structure of a partition)
  • PhotoRec (to carve files - note there are other file carvers as well. This one comes with TestDisk)
  • chown (to take ownership of your recovered files - you may need to do this)
  • chmod (to allow a non-root user to access the recovered files - so you can use nautilus to move them around)
Note that both DD_Rescue and TestDisk can be installed in an Ubuntu LiveCD environment using sudo apt-get.

2007-02-08

Graphics on the Web 1 - Binary Number System Primer

Have you ever wondered where those strange computer numbers come from: numbers like 8, 16, 256, 1024? Have you ever wondered why these same numbers keep coming up over and over again? Well, the answer is really quite simple.

Computers are binary machines. This means that, unlike humans, they work in a “base-2” number system. Humans generally do math in base-10, or decimal. Decimal means that each digit can represent 1 of 10 possible values: 0-9. Note that there is no single digit for “ten”; there are 10 possible digits, but their values run from 0 up to 9. If we want to represent the value “ten” in decimal, we have to use two digits in sequence: 10, or a one and a zero. Computers work the same way except they only have 2 possible values: 0-1. Just like decimal, because zero counts as one of the two possible values, the biggest number we can represent with one binary digit is “one.” If we want to represent the value “two” then we have to use two digits in sequence: 10, or a one and a zero. Binary seems strange to humans only because we’re used to decimal. The two systems work exactly the same, except that one has two possibilities, and the other ten.
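You can check the decimal/binary parallel for yourself with a few lines of Python (the to_binary helper is my own name, not anything standard):

```python
# Show that binary "10" is the value two, just as decimal "10" is ten.
def to_binary(value):
    """Return the binary digits of a non-negative integer as a string."""
    return bin(value)[2:]  # bin(2) gives '0b10'; strip the '0b' prefix

assert to_binary(1) == '1'   # one binary digit still suffices for "one"
assert to_binary(2) == '10'  # "two" needs two digits: a one and a zero
assert int('10', 2) == 2     # reading '10' as base-2 gives two back
```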

A “bit” is a binary digit. Just like decimal, where a two digit number can represent 00-99 for 100 possible combinations, or three digit numbers can be 000-999 for 1000 combinations, bits can be strung together to form larger numbers. The following table outlines what strings of bits are capable of:

Number of Bits    Number of Possibilities    Largest Number
      1                       2                       1
      2                       4                       3
      3                       8                       7
      4                      16                      15
      5                      32                      31
      6                      64                      63
      7                     128                     127
      8                     256                     255
      9                     512                     511
     10                    1024                    1023

Note that 4, 8, and 10 bit combinations are the most common. The first thing you will notice, if you have done anything with computers, is that the numbers above seem very familiar. The number 256, while it seems arbitrary in decimal, is simply the maximum number of combinations that 8 bits can hold, and 255 is the largest number you can get with 8 bits. The arbitrariness of 256, 512, or 1024 is only an artifact of converting binary numbers to decimal numbers. Rest assured, converting decimal 100, 1000, or 10000 to binary would also produce seemingly arbitrary binary numbers.
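The table above is nothing but successive doublings, so a few lines of Python reproduce it:

```python
# For n bits there are 2**n possible combinations, and since zero is
# one of them, the largest representable number is 2**n - 1.
for bits in range(1, 11):
    combinations = 2 ** bits
    largest = combinations - 1
    print(bits, combinations, largest)

assert 2 ** 8 == 256        # 8 bits give 256 combinations...
assert 2 ** 8 - 1 == 255    # ...and 255 is the biggest 8-bit number
assert 2 ** 10 - 1 == 1023  # 10 bits top out at 1023
```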

In fact, binary does not convert well to decimal; we always wind up with these arbitrary numbers that are not intuitively obvious. When working with computer hardware or machine-level programming, this becomes such a problem that people dispense with decimal altogether and work in something called “hex.” Hex, or hexadecimal, is base-16 and follows the same rules as binary or decimal. One hex digit can represent 16 possible combinations: 0-F. People sometimes stumble over the ‘F’ part but it’s actually quite simple; it comes from exactly the same place ‘7’ comes from: somebody just said so, it’s an arbitrary choice. Hex requires 16 symbols to represent its possible combinations. Way back when, someone said “let’s use the decimal digits 0-9 and then add the letters A, B, C, D, E, and F to make up the extra 6 possibilities.”

To see how this works, another table is in order.
Four bits of binary can be strung together like this:

Binary    Hex    Decimal
 0000      0        0
 0001      1        1
 0010      2        2
 0011      3        3
 0100      4        4
 0101      5        5
 0110      6        6
 0111      7        7
 1000      8        8
 1001      9        9
 1010      A       10
 1011      B       11
 1100      C       12
 1101      D       13
 1110      E       14
 1111      F       15

So, now we have binary, decimal, and hexadecimal. To understand why programmers bother with hex, let’s look at some more binary numbers.

Binary                  Hex     Decimal
0011                       3          3
0011 0011                 33         51
1111 1111                 FF        255
1111 0000 1111 0000     F0F0      61680


Now, things start to clear up. Writing out long strings of binary ones and zeros can get tedious, but converting to decimal is not intuitive, so we use hex. Converting between hex and binary is quick and easy as it only involves remembering those 16 combinations in the table above. This is why programmers use hex and this is why, when your computer bombs, you see crash-dump screens full of hexadecimal numbers that seem like gibberish. It’s not gibberish, it’s just programmer-speak for “here’s what went wrong.”
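Python’s built-in conversions make the binary/hex relationship concrete; each hex digit maps to exactly four bits:

```python
# F0F0 in hex is 1111 0000 1111 0000 in binary and 61680 in decimal.
value = 0xF0F0
assert value == 61680
assert bin(value) == '0b1111000011110000'

# Each single hex digit expands to exactly four binary digits.
assert format(0xF, '04b') == '1111'
assert format(0x3, '04b') == '0011'
```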

There are other times you will see hex numbers as well, and when you do it’s because the programmer didn't bother converting, or chose not to convert, the numbers before displaying them. Many high-end graphics programs will display colour values in hex for this reason. So, when you see letters in the middle of numbers, you’re probably looking at hex.

One other thing to note: when you see a number like 10, you automatically think “ten” but this could also be binary “two” or hex “sixteen” depending on the context. Because of this, programmers will often follow binary numbers with “b” and hex numbers with “h” to avoid confusion. Then there’s “o” for octal but you don’t need to know about that.

Another place these numbers come up is with kilobytes, megabytes, and gigabytes. First off, a byte is 8 bits, and that’s just because somebody said so. Just like the reasoning behind hex, memory and hard drives are organised in bytes rather than bits out of convenience for system engineers. Now, in the metric system kilo is a prefix for 1000, a kilometre is 1000 metres, and mega is a prefix for 1,000,000, as in megawatts or 1 million watts. In computing, because engineers are a strange bunch, the metric system is perverted to deal with those arbitrary decimal conversions of binary numbers. Kilo is short for 1024, the decimal conversion of 10 bits. Similarly, mega is 1024*1024, giga is 1024*1024*1024, and on it goes.

Why? Because it’s easier to say 128Mbytes than it is to say 134,217,728 bytes. Remember, it’s a binary system and it doesn’t convert well to decimal. However, by using 1024 as the multiplier, a “half-conversion” can happen that sort of makes sense to regular humans. 128MBytes makes more sense than 8000000h to the average person. For the most part, the system makes for good shorthand as the extra 24’s don’t make too much difference. However, some manufacturers will cheat by saying their hard drive is 40Gbytes, which is true as it is over 40,000,000,000 bytes, but when it shows up on your system as only 37Gbytes you know one person’s “giga” is different from another’s. One is 1000*1000*1000 in true metric fashion, the other is 1024*1024*1024 in the computer confabulated system.
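Both the 128Mbytes example and the disappearing-gigabytes trick check out in a couple of lines:

```python
# 128 "computer" megabytes, using the 1024 multiplier:
assert 128 * 1024 * 1024 == 134217728
assert hex(134217728) == '0x8000000'  # a tidy round number in hex

# A "40 Gbyte" drive sold with metric giga (1000**3) shows up as
# roughly 37 Gbytes once the OS divides by binary giga (1024**3).
metric_40gb = 40 * 1000 ** 3
as_binary_gb = metric_40gb / 1024 ** 3
assert 37 < as_binary_gb < 38
```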

As you can see, the computer numbers 4, 8, 16, 32, 64, 128, 256, 512, and 1024 are around for a reason; they are just decimal conversions of binary numbers. There really is nothing mysterious about them; they are just the values that come up when you double two, and then double it again, and double it again… It’s the same reason we get 10, 100, 1000, 10000 etc. in decimal. One’s base-2 so we double it, the other is base-10 so we multiply by 10 each time. If you don’t like the arbitrariness of these numbers, you could work entirely in hex and computers would make perfect sense. No one else will understand you, but computers will make sense.

Part 2 - Colour Systems

Part 3 - File Types

Graphics on the Web 2 - Colour System Primer

Now that you have a basis for understanding where these strange computer numbers come from, we can apply that knowledge to colour systems.

Back in the old days, monitors were black and white only. Okay, they were black and green, or maybe amber; white didn’t come along for a while. But hey, if you want to get picky, the really old systems were lines of blue text on white paper. But, in any event, the systems were “monochrome” or one colour. Monochrome can either be on or off, white or black; it’s a binary system that has only two possible states. This means that it takes 1 bit of storage for each dot on the screen. A screen dot is called a “pixel.”

It looks something like this (for printing clarity, on = ‘#’ and off = ‘.’):

The Screen    The Data
 . # # .      0 1 1 0
 # . . #      1 0 0 1
 # . . #      1 0 0 1
 . # # .      0 1 1 0
 . # # .      0 1 1 0


In words, for a screen that was 4 pixels across and 5 pixels down, it would take 20 bits of data to store the image.
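The 20-bit figure is just width × height × bits-per-pixel; here is the arithmetic, plus one row of the example image packed into bits (the pack_row helper is my own, purely illustrative):

```python
# 1 bit per pixel: a 4x5 monochrome screen needs 20 bits of storage.
width, height, bits_per_pixel = 4, 5, 1
assert width * height * bits_per_pixel == 20

# Pack one row of pixels (0 = off, 1 = on) into a single integer.
def pack_row(pixels):
    value = 0
    for p in pixels:
        value = (value << 1) | p  # shift left, append the new bit
    return value

assert pack_row([0, 1, 1, 0]) == 0b0110
```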

Now, this is pretty boring, so someone came up with the idea of having variable intensity levels for each pixel. Each dot on the screen could be black (off), dim (33% on), medium (66% on), or white (full on). In other words, each pixel had 4 possible states and this requires 2 bits per pixel to store. For the same number of pixels, our bit count now stands at 40. It looks like this:

The Screen    The Data
 . # # .      00 11 11 00
 + . . +      10 00 00 10
 - . . -      01 00 00 01
 . # # .      00 11 11 00
 . # # .      00 11 11 00

(Screen key: . = black 00, - = dim 01, + = medium 10, # = white 11)

Then someone thought, wouldn't colour be nice, and 16 colour mode was born. In 16 colour mode, each pixel can be, you guessed it, one of 16 possible colours. Of course, 16 possibilities require 4 bits per pixel to store. It looks like this: (note that the colours won’t print on a black and white printer, they may appear as shades of grey. Also, I've randomly picked values for each colour.) Note that our bit count is now running at 80.


The Data (one 4-bit value per pixel; the on-screen colours were picked at random):

 0000 1010 1010 0000
 1001 0000 0000 1011
 0101 0000 0000 0111
 0000 1100 1100 0000
 0000 1111 1111 0000

But still, 16 measly colours does not a perfect world paint. How many colours do you think you would need to create realistic scenes? Well, getting realistic colours requires a lot more: millions more. Thus, 24 bit colour was born. This colour system uses 8 bits for each of the 3 primary colours (red, green, and blue). Remember that 8 bits gives you 256 possible combinations, and that’s 256 possible shades of red, 256 possible shades of green, and 256 possible shades of blue. By mixing the primary colours together, we get 256 * 256 * 256, or 16777216 possibilities. For realistic colour, our bit count for the same 20 pixel display now runs at 480, six times the information of 16 colour mode. It looks like this:

The Data (one pixel per entry, stored as R/G/B, four pixels per row):

Row 1:  00000000/00000000/00000000  00000000/00000000/11111111  00000000/00000000/11111111  00000000/00000000/00000000
Row 2:  00000000/11111111/00000000  00000000/00000000/00000000  00000000/00000000/00000000  11111111/00000000/00000000
Row 3:  11111001/00111011/11100110  00000000/00000000/00000000  00000000/00000000/00000000  10001101/00000110/00111110
Row 4:  00000000/00000000/00000000  00000000/11000110/11011100  00000000/11000110/11011100  00000000/00000000/00000000
Row 5:  00000000/00000000/00000000  11111111/11111111/11111111  11111111/11111111/11111111  00000000/00000000/00000000

Now, 480 bits is still pretty insignificant but you have to realise we’re talking about a 4x5 pixel array. Nowadays, most people run their monitors at 800 x 600 pixels. A full 24 bit colour image on this display requires 800*600*24, or 11520000 bits. That’s over 11 million bits of information. Now, recall that computing people use bytes instead of bits when talking about storage space, so 11520000 / 8 = 1440000 bytes (or 1440000 / 1024 / 1024 = 1.37Mbytes in computerised metric). One screen image takes nearly one and a half million bytes of storage space.
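That back-of-the-envelope calculation can be redone in a couple of lines:

```python
# 24-bit colour at 800x600: bits, bytes, and "computerised metric" Mbytes.
bits = 800 * 600 * 24
assert bits == 11520000
data_bytes = bits // 8          # 8 bits to the byte
assert data_bytes == 1440000
mbytes = data_bytes / 1024 / 1024
assert round(mbytes, 2) == 1.37
```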

Back when storage space, in hard drives and RAM, was very expensive, someone came up with a way of compressing this information down. The idea was that, while 16 colours is very limiting, most of the time you don’t really need 16 million colours. Why not have something in between? Thus, 256 colour mode was created. 256 colour mode is actually still 24 bit colour but limits the possible range of colours to 256 by using what’s called a “palette.” At the beginning of the file for each image, or within a video card, there is an area called the palette. It is a list of 256 possible colours and their corresponding 24 bit red/green/blue (RGB) values. A palette looks something like this:

Palette Entry    RGB Values (R/G/B)
      0          11111001/00111011/11100110
      1          10001101/00000110/00111110
      2          00000000/00000000/11111111
      3          00000000/11111111/00000000
      4          11111111/00000000/00000000
      5          11111111/11111111/11111111
      6          00000000/00000000/00000000
     ...
     255         00000000/11000110/11011100

Then, the actual pixel info just references the palette rather than storing the full 24 bits for each colour. It looks like this:

The Data (one 8-bit palette index per pixel, decimal value in brackets):

 00000110 (6)    00000010 (2)    00000010 (2)      00000110 (6)
 00000011 (3)    00000110 (6)    00000110 (6)      00000100 (4)
 00000000 (0)    00000110 (6)    00000110 (6)      00000001 (1)
 00000110 (6)    11111111 (255)  11111111 (255)    00000110 (6)
 00000110 (6)    00000101 (5)    00000101 (5)      00000110 (6)

So, we have a bit count of 160, not including the overhead for the palette, or roughly one third the size of the same pixel area in 24 bit colour. In general, 256 colour mode works very well for graphic images but, even with 256 possible colours, does not work well for photo-realistic content.
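The palette mechanism is simple indirection, and a minimal sketch shows it (a few entries borrowed from the sample palette above; the variable names are mine):

```python
# A palette maps a 1-byte index to a full 24-bit (R, G, B) triple.
palette = {
    2: (0, 0, 255),      # blue
    5: (255, 255, 255),  # white
    6: (0, 0, 0),        # black
}

# Pixel data stores only the 8-bit indices...
pixels = [6, 2, 2, 6]

# ...and the display resolves each index through the palette.
resolved = [palette[p] for p in pixels]
assert resolved[0] == (0, 0, 0)
assert resolved[1] == (0, 0, 255)

# 8 bits per pixel instead of 24: one third the size, plus the
# fixed overhead of storing the palette itself.
assert 8 / 24 == 1 / 3
```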

The very same system is also used for a newer form of 16 colour mode. In this system, rather than using the stock 16 colours, a palette containing 16 possible 24 bit colour values is used. It provides a little more flexibility than the old 16 colour system for very few extra bits. When using this system, the original 16 colours are sometimes referred to as “Windows colours.”

There is one more system worth mentioning: it’s “greyscale.” Greyscale is kind of like the old 2 bit variable intensity black and white, but with 8 bits per pixel. Thus, each pixel can be one of 256 possible shades of grey. I won’t bother with the table as it’s pretty similar to the ones already shown.


Now, with storage space still at a premium, engineers figured out other ways to compress the pixel information down. Thus, we will get into file types.

Part 1 - Binary Numbers

Part 3 - File Types

Graphics on the Web 3 - Graphic File Type Primer

First off, file types, in the Microsoft world, are the three letter designations that go after the period on file names. Examples are filename.gif, filename.jpg, or filename.txt. The computer uses those designations to figure out which program should open the file. Thus, when you double-click on a .txt file, notepad (a text editor) opens up the file. If you click on a .jpg file, then a graphics program, or whatever program is “registered” to open .jpg files, will open the file for you.

There are many, many different file types for graphics programs but some of the more common ones are BMP, GIF, JPG, TIF, and WMF. These five, at least, provide good examples for the next discussion.

BMP files are “Windows Bitmap Files.” These files are the native Windows format for graphic files. They can be in any of the colour systems mentioned before. There is nothing overly special about them.

GIF files are “Graphics Interchange Format” files. When people first started working with pictures, the file sizes were too large. Someone figured out a way to compress an image by recording repeated colours as counts rather than over and over. For example, if you had a line of pixels that was black, blue, blue, blue, blue, blue, blue, blue, blue, blue, red, then you could store it that way, as BMP files do, or you could say black, blue (x9), red. This is a simplified example of how GIF files compress file sizes. Depending on how simple the image is (if there are a lot of repeated colours) a GIF file can be significantly smaller than a BMP file, often less than one half the size, with no loss of information. This last part is important: GIF compression is “lossless” in that no information is thrown away; you always get out what you put in, it just takes less room to store. GIF files are usually in 16 or 256 colour mode. Note that there is also a system called “animated GIF.” This system is just a series of images, stored in a single GIF file, that play in an animated sequence; it is used extensively on the web.
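GIF’s real compression scheme (LZW) is more involved, but the run-counting idea from the example is easy to sketch with plain run-length encoding; this illustrates the principle, not the actual GIF algorithm:

```python
# Run-length encode a row of pixels: repeated colours become (colour, count).
def rle(pixels):
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1] = (p, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((p, 1))              # start a new run
    return runs

row = ['black'] + ['blue'] * 9 + ['red']
assert rle(row) == [('black', 1), ('blue', 9), ('red', 1)]
# 11 pixels stored as 3 runs: the longer the runs, the better it compresses.
```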

JPG files are “Joint Photographic Experts Group” files, also known as JPEG. JPG, like its cousins in video (MPG) and audio (MP3), is a “lossy” compression system. In other words, information is lost when an image is compressed to this standard. The amount of information lost depends on the content of the image (widely varying images don’t compress well) and on how “aggressive” you choose the compression to be. Typically, JPG-compressed files will be one twentieth the size of the same image in BMP, a significant savings in file size. To casual observation, the resulting compressed image will not look any different from the original; it takes a trained eye to spot “compression artifacts” from lost information, so the file savings are usually worth the lost information. However, JPG images are almost always 24 bit colour (there is a greyscale JPG standard, but many applications don’t support it). That means, if you compress a 256 colour (8 bit) image with JPG, it has to convert to 24 bit when you open it up again. In the end, the size can actually grow to three times what it was originally! You need to use JPG compression with caution, and full knowledge of what’s going on, to get the most from it.
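The arithmetic behind those size claims works out like this (illustrative numbers for an 800 × 600 image; real JPG ratios vary with content and compression settings):

```python
width, height = 800, 600

bmp_24bit = width * height * 3     # 3 bytes per pixel at 24 bit colour
jpg_estimate = bmp_24bit // 20     # roughly one twentieth, per the text
image_8bit = width * height * 1    # 1 byte per pixel at 256 colours

print(f"24-bit uncompressed: {bmp_24bit:,} bytes")   # 1,440,000
print(f"JPG at ~1/20:        {jpg_estimate:,} bytes")  # 72,000
print(f"8-bit uncompressed:  {image_8bit:,} bytes")  # 480,000

# Re-saving the 8-bit image as JPG forces it to 24 bit, so when it is
# opened again it occupies 1,440,000 bytes: three times the 8-bit size.
```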

TIF files are “Tagged Image File Format” files. This is a file standard that isn't really standard. Most programs support the TIF format, but they usually do so in their own special way. For example, some mapping files in TIF format include “georeferencing” tags to tell the opening program exactly what place the map is supposed to represent. In other words, the image is “tagged” with special information, and different applications may tag the image in different ways. One unique feature of TIF files is that they can be “paged” from disk if they are very large. Because of this, it is possible to work on TIF files that are much larger than the other formats allow. There are compressed versions of the TIF format, several in fact, but not all programs support them.

WMF files are “Windows Meta Files.” I’m ending with this type because it’s weird. Before you can understand what’s going on with these files, you have to know the difference between raster and vector graphics. Raster graphics are what I've been describing all along. The pixels across and down the screen form a raster. Lines and other shapes are made by turning pixels on or off to create the desired image. All but the most specialised monitors and printers are raster devices. In other words, no matter what you start with, the displayed image is raster. A raster graphics file is simply a pixel-by-pixel recording of the image. Vector, on the other hand, is a completely different ball game. In a vector graphics file, rather than storing information for each pixel, information is stored for each line or shape displayed. Thus, a line would be stored as “Line, starting at coordinates 27, 67, ending at coordinates 234, 553, thickness of 3, colour green.” Each line or shape of the entire image is stored in this fashion. Most computer-aided design (drafting) and mapping programs are vector based.
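Here is a rough Python sketch of what a single vector record might look like. This is a hypothetical structure for illustration only, not the actual WMF binary layout:

```python
from dataclasses import dataclass

@dataclass
class Line:
    """One shape record in a vector file: endpoints, thickness, colour."""
    x1: int
    y1: int
    x2: int
    y2: int
    thickness: int
    colour: str

# The example from the text: a green line from (27, 67) to (234, 553).
shape = Line(x1=27, y1=67, x2=234, y2=553, thickness=3, colour="green")
print(shape)
```

A whole vector image is just a list of records like this one, drawn in order; no pixels are stored at all.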

Windows Meta Files are vector graphics files. In other words, these files are collections of shapes rather than arrays of pixels. So, why bother with this file type? Well, try enlarging a raster graphics file and you’ll find out why. If you shrink a raster file, say by cutting it in half (from 800 x 600 to 400 x 300), all the program has to do is throw away 3 out of every 4 pixels (if you halve the width and the height, you get 1/4 of the picture size); it’s very simple. But if you took that same file and doubled its size, both horizontally and vertically (from 800 x 600 to 1600 x 1200), then the graphics program would have to quadruple every pixel into chunky squares. When you do this, smooth lines become jagged staircases. For this reason, raster images are said to be "not scalable." You can’t just keep zooming in without the image turning into large chunky blobs. Vector files, however, are infinitely scalable. If you zoom in on a vector file, it just changes the co-ordinates of the start and stop points, and maybe the line thickness. Smooth lines are still smooth. You could zoom in 500x on an image and the lines would still be smooth.
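In Python terms, the two scaling operations might look something like this. It is a simplified sketch: real programs interpolate when resizing rasters rather than just dropping pixels, but it shows why one operation is trivial and the other is not:

```python
def scale_line(line, factor):
    """Scaling a vector shape is just multiplying its coordinates."""
    x1, y1, x2, y2 = line
    return (x1 * factor, y1 * factor, x2 * factor, y2 * factor)

def halve_raster(pixels):
    """Halving a raster image keeps 1 pixel out of every 4."""
    return [row[::2] for row in pixels[::2]]

# The green line from before, doubled: endpoints move, nothing degrades.
print(scale_line((27, 67, 234, 553), 2))   # (54, 134, 468, 1106)

# A tiny 4 x 4 raster, halved to 2 x 2: three quarters of it is discarded.
raster = [[ 1,  2,  3,  4],
          [ 5,  6,  7,  8],
          [ 9, 10, 11, 12],
          [13, 14, 15, 16]]
print(halve_raster(raster))                # [[1, 3], [9, 11]]
```

Enlarging the raster would mean inventing pixels that were never stored, which is where the chunky squares come from; enlarging the line just means bigger numbers.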

If vector graphics files are so scalable, many people wonder why we don’t use them for all graphics and do away with raster entirely. The problem is that photographic images just don’t reduce down to lines and shapes; when there is too much variation, vector graphics systems simply can’t hold the image. Where they do work particularly well is in simple graphic images, and that is exactly where Windows uses them. Most Windows clipart images are stored in WMF format. Not only are the file sizes small, they are scalable to whatever size you require within the program you are using them in.


What about the Web?

Well, hard drives and memory may have become cheap in the last while, but there is now a new reason to keep file sizes small. The larger the file, the longer it takes to transfer from a web server to a web browser, especially over a slow link. This is a powerful incentive for web designers to shrink their images down to the smallest possible size. To do this, they use GIF and JPG files. For simple graphics, reduced to 256 or 16 colour mode, GIF files are the best choice, for the reasons previously mentioned. For everything in 24 bit colour, JPG is the way to go.
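For a rough sense of the difference, here is the transfer-time arithmetic for a 56 kbit/s dial-up link. The numbers are illustrative and ignore protocol overhead; the image sizes reuse the 800 × 600 example from earlier:

```python
# 56 kilobits per second is 7,000 bytes per second.
link_bytes_per_sec = 56_000 // 8

for label, size_bytes in [("24-bit uncompressed", 1_440_000),
                          ("JPG at ~1/20",        72_000)]:
    seconds = size_bytes / link_bytes_per_sec
    print(f"{label}: {seconds:.0f} seconds to download")
# 24-bit uncompressed: 206 seconds to download
# JPG at ~1/20: 10 seconds to download
```

Over three minutes versus about ten seconds for one picture: that is why nobody puts raw bitmaps on a web page.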

Part 1 - Binary Numbers

Part 2 - Colour Systems