Data Storage

In this section we will look at how data is stored in different file types, what these file types mean, and how data can be compressed across different formats.

WHAT IS A FILE FORMAT?

A file format is the way the data inside a file is organised. For example, on your computer you will have many different files which contain many different types of data. One might be an image, whereas another might be a text document. Each file will be in a different format depending on what is stored in it. A file format can usually be identified by the 'extension' at the end of the file name. This is a few letters that tell us what is contained in the file. For example, the average computer user might have the following files on their PC:

These of course aren't the only file types, but they are examples of some of the most popular ones.

A computer uses the file extension to recognise what type of data a file contains and to ensure the correct program opens it. Without file types there would be no way of knowing which program should open which file, and the user would have to select this manually. It is possible for the user to override which program opens which file, but this can be problematic as a program won't be able to open a file which contains the wrong type of data.
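As a rough illustration of the idea, the Python sketch below looks up a program from a file's extension. The table of programs is made up for this example; a real operating system keeps a much larger table which the user can change.

```python
import os.path

# Hypothetical table linking extensions to programs (illustration only).
DEFAULT_PROGRAMS = {
    ".txt": "text editor",
    ".jpg": "image viewer",
    ".mp3": "music player",
}


def program_for(filename):
    """Return the program linked to the file's extension, if there is one."""
    _, extension = os.path.splitext(filename)
    # Extensions are compared in lower case so that .JPG and .jpg match.
    return DEFAULT_PROGRAMS.get(extension.lower(), "ask the user")


print(program_for("holiday.JPG"))  # image viewer
print(program_for("notes.xyz"))    # ask the user
```

When the extension is not in the table, the sketch falls back to asking the user, which is the manual selection described above.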

We will now look at a few different file types in more detail.

MIDI

MIDI stands for Musical Instrument Digital Interface and is a standard which allows electronic musical instruments to communicate with computers. Developed in the 1980s, it was a way of recording instruments into a computer without having to use microphones. The data comes from up to 16 channels of instruments and is turned into a MIDI file, which can then be played back. A MIDI file contains information about the instrument used, the length of the note, the pitch and the volume. One advantage of the format is that it allows people with no physical instrument skills to build complex music using computers. The files are also very small. An entire song can be as small as a few kilobytes, making it extremely portable. MIDI files were traditionally used in video games because of this small file size. When games were limited to only a few megabytes on a floppy disk, the music had to take up as little space as possible, so MIDI was perfect for this. They were also used in early ring-tones on mobile phones, before increases in data storage allowed real music to be used.
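To see why a file of note instructions is so much smaller than a recording, consider the rough Python sketch below. The Note class and the figure of 8 bytes per note are assumptions made for this illustration and are not the real MIDI byte layout. The recording figures are those of CD-quality sound.

```python
from dataclasses import dataclass


@dataclass
class Note:
    channel: int   # which of the 16 channels the instrument is on (0-15)
    pitch: int     # which note to play
    volume: int    # how loud to play it
    length: float  # how long the note lasts, in seconds


# Assumption for illustration only: each stored note costs about 8 bytes.
BYTES_PER_NOTE = 8


def note_file_size(notes):
    """Approximate size in bytes of a file that stores only note instructions."""
    return len(notes) * BYTES_PER_NOTE


def recording_size(seconds, samples_per_second=44_100, bytes_per_sample=2, channels=2):
    """Size in bytes of an uncompressed CD-quality recording."""
    return seconds * samples_per_second * bytes_per_sample * channels


song = [Note(channel=0, pitch=60, volume=100, length=0.5)] * 1_000
print(note_file_size(song))  # 8000 bytes, a few kilobytes
print(recording_size(180))   # 31752000 bytes, about 32 megabytes
```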

MP3

An MP3 is a digital audio file format. It is extremely popular due to the popularity of portable music devices such as iPods which used the format. Unlike MIDI, MP3s can be made up of live musical recordings of analogue instruments which are played into microphones and recorded using a computer. MP3s became popular because they allowed large audio files to be compressed into smaller files which could then be used on portable devices such as mobile phones without taking up a lot of storage space. MP3s are compressed using lossy compression, which means that they lose some quality from their original formats. However, they are very small when compared to the data on a CD. An MP3 encoded at a bit rate of 128 kilobits per second is roughly 1/11th the size of the equivalent CD song.
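The 1/11th figure can be checked with a little arithmetic. The Python sketch below assumes standard CD audio (44,100 samples per second, 16 bits per sample, two channels); the function name is made up for this example.

```python
# Standard CD audio figures.
SAMPLES_PER_SECOND = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

cd_bits_per_second = SAMPLES_PER_SECOND * BITS_PER_SAMPLE * CHANNELS  # 1,411,200
mp3_bits_per_second = 128_000

ratio = cd_bits_per_second / mp3_bits_per_second
print(round(ratio, 1))  # 11.0


def size_in_megabytes(bits_per_second, seconds):
    """Convert a bit rate and a duration into a file size in megabytes."""
    return bits_per_second * seconds / 8 / 1_000_000


# A three-minute song:
print(size_in_megabytes(cd_bits_per_second, 180))   # about 31.75
print(size_in_megabytes(mp3_bits_per_second, 180))  # about 2.88
```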

JPEG

Created by the Joint Photographic Experts Group, JPEG is a compressed image format which is frequently used on the internet. It is a popular format because of its small file size when compared to raw image files. Images compressed into the JPEG format lose some of their original quality, and therefore the format is known as a lossy format, as the data lost in compression cannot be recovered. JPEG is used widely in photography, in particular on smart phones with cameras, as the image quality is high when compared to other formats such as GIFs. Although quality is lost, the complex compression algorithm means that the reduction in quality is not highly noticeable.

COMPRESSING A FILE

Compressing a simple document may seem like a pointless task. After all, the text file is hardly going to fill up your hard drive. However, when you’re working with hundreds, or even thousands, of individual documents, being able to compress them for sending by email, copying to a USB stick or uploading to the internet becomes handy.

Programs like WinZip and WinRAR are designed to do just that. They take individual files and compress them into a single compressed file that can be uncompressed using the same program. They do this by running the files through several different algorithms. The algorithms look for things which can be taken out to save space and replaced later. For example, run length encoding (RLE) replaces each run of a repeated letter or number with a single copy and a count of how many times it was repeated. Therefore

EEGGGHHSJJJI

becomes...

2E3G2HS3JI
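The example above can be sketched in Python as follows. This is a minimal version which assumes the text being compressed contains no digits of its own; the function names are made up for this illustration.

```python
from itertools import groupby


def rle_encode(text):
    """Replace each run of a repeated character with a count and the character."""
    pieces = []
    for char, run in groupby(text):
        count = len(list(run))
        # A character that appears once is written without a count.
        pieces.append(char if count == 1 else f"{count}{char}")
    return "".join(pieces)


def rle_decode(encoded):
    """Rebuild the original text from the counts and characters."""
    result = []
    count = ""
    for char in encoded:
        if char.isdigit():
            count += char  # counts may have more than one digit
        else:
            result.append(char * int(count or "1"))
            count = ""
    return "".join(result)


print(rle_encode("EEGGGHHSJJJI"))  # 2E3G2HS3JI
print(rle_decode("2E3G2HS3JI"))    # EEGGGHHSJJJI
```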

While we didn’t exactly save much space there, consider how much we might save if we had 10,000 letters. Dictionary compression is also a handy tool. The software will assign each word a number and store the number in place of the word, then use the dictionary later to rebuild the file.
Remember though that you cannot decompress a file unless you have software that understands the format it was compressed in, which may cause problems when sending files to other people.
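Dictionary compression can be sketched in Python as below. This is a simplified illustration which splits the text on single spaces; real compression programs build their dictionaries from repeated sequences of bytes rather than whole words.

```python
def dictionary_compress(text):
    """Give each distinct word a number and store the numbers in place of the words."""
    dictionary = []  # a word's position in this list is its number
    lookup = {}
    numbers = []
    for word in text.split(" "):
        if word not in lookup:
            lookup[word] = len(dictionary)
            dictionary.append(word)
        numbers.append(lookup[word])
    return dictionary, numbers


def dictionary_decompress(dictionary, numbers):
    """Use the dictionary to turn the numbers back into words."""
    return " ".join(dictionary[n] for n in numbers)


text = "the cat sat on the mat and the dog sat on the cat"
dictionary, numbers = dictionary_compress(text)
print(dictionary)  # ['the', 'cat', 'sat', 'on', 'mat', 'and', 'dog']
print(numbers)     # [0, 1, 2, 3, 0, 4, 5, 0, 6, 2, 3, 0, 1]
print(dictionary_decompress(dictionary, numbers) == text)  # True
```

The more often words repeat, the more space is saved, because each repeat costs only a small number.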

Compressing a document in this way is known as LOSSLESS compression. This means that when the file is decompressed, everything can be rebuilt exactly as it was before. Think of it as the difference between carefully taking down your Lego house and putting the blocks safely back in the box instead of just smashing it up with a hammer.
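The "built back exactly" property can be checked in a few lines of Python. In this sketch Python's built-in zlib module stands in for programs like WinZip; that choice is an assumption made for the example.

```python
import zlib

original = b"compress me, " * 1_000

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original))         # 13000
print(len(compressed) < len(original))  # True, the repeated text shrinks a lot
print(restored == original)  # True, nothing was lost
```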

COMPRESSING AN IMAGE

Image compression works differently to just compressing a document. There are a couple of ways you can do it.

A digital image is made up of lots of pixels. A pixel is just a small square which is made up of a single colour. Put thousands, or millions, of these into an image, and you won’t be able to tell they’re there. The number of these pixels in each image is called the RESOLUTION. Each pixel takes up memory on a drive. Therefore, the higher the resolution of an image, the larger the file. To make an image file smaller, it is possible to reduce the number of pixels that are used to make it. Consider the two images below.

[Image: the picture before its resolution is reduced]

[Image: the picture after its resolution is reduced]

The top picture takes up more space as it has more pixels in it. The bottom one is smaller as it uses fewer. However, as you can tell, it also loses a lot of visual quality. The main problem with this is that it is a LOSSY compression method. This means that no matter what you do, you can never get the bottom image back to the quality of the top one. The detail has been removed and discarded and so cannot be replaced.
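Reducing the resolution can be sketched in Python as below. The image here is just a small grid of numbers standing in for pixel colours, and the function names are made up for this illustration.

```python
def halve_resolution(image):
    """Keep every second pixel in every second row."""
    return [row[::2] for row in image[::2]]


def enlarge(image):
    """Stretch an image back up by repeating each pixel in a 2x2 block."""
    enlarged = []
    for row in image:
        wide_row = [pixel for pixel in row for _ in range(2)]
        enlarged.append(wide_row)
        enlarged.append(list(wide_row))
    return enlarged


original = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

small = halve_resolution(original)
print(small)                       # [[1, 3], [9, 11]]
print(enlarge(small) == original)  # False, the dropped pixels are gone
```

The smaller image stores 4 pixels instead of 16, but stretching it back to full size does not restore the pixels that were thrown away.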

Another method of compression is to reduce the number of colours in an image. The more colours an image can contain, the more memory is needed to store each pixel. Therefore, use fewer colours and you use less memory. Once again however, this reduces the quality, as you can see below. It’s also another LOSSY method. The colours are gone, and can’t be brought back.

[Image: the picture after its number of colours is reduced]
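As a rough sketch of colour reduction, the Python below takes brightness values from 0 to 255 and snaps each one to a smaller set of levels. The function names and the choice of equal-sized bands are assumptions made for this example.

```python
def reduce_colours(pixels, levels):
    """Snap each 0-255 value to the bottom of one of `levels` equal bands."""
    band = 256 // levels
    return [(value // band) * band for value in pixels]


def bits_needed(levels):
    """Number of bits needed to store one pixel with this many possible levels."""
    return (levels - 1).bit_length()


pixels = [0, 10, 100, 130, 200, 255]
print(reduce_colours(pixels, 4))  # [0, 0, 64, 128, 192, 192]
print(bits_needed(256))           # 8 bits per pixel
print(bits_needed(4))             # 2 bits per pixel
```

Different original values, such as 0 and 10, end up as the same level, so there is no way to tell them apart afterwards.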

It is possible to perform some lossless compression on images. One method is to store all of the data of the first pixel in the image, and then in each subsequent pixel, just store how different it is from the one to its left.
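This method can be sketched in Python for a single row of pixels. The function names are made up for this illustration.

```python
def delta_encode(row):
    """Store the first pixel in full, then only the change from the pixel to the left."""
    if not row:
        return []
    deltas = [row[0]]
    for left, current in zip(row, row[1:]):
        deltas.append(current - left)
    return deltas


def delta_decode(deltas):
    """Rebuild the row by adding each change to the pixel before it."""
    row = []
    for delta in deltas:
        row.append(delta if not row else row[-1] + delta)
    return row


row = [200, 201, 201, 203, 202, 202]
encoded = delta_encode(row)
print(encoded)                       # [200, 1, 0, 2, -1, 0]
print(delta_decode(encoded) == row)  # True, nothing was lost
```

Neighbouring pixels are often similar, so the differences are small numbers which can be stored in less space than the full values.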

COMPRESSING AUDIO

Audio can be compressed in a similar way to images and video. It is possible to remove elements of the audio to reduce the amount of space used to store it. One popular method is to remove the frequencies (the number of vibrations per second which make up sound) that humans find difficult to hear. For example, any really low sounds or really high sounds that are barely noticeable to humans can be removed to save space. Although this is technically a lossy method, it may not be noticeable to some people.
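The idea of removing frequencies can be sketched in Python as below. This is a crude illustration only: it splits a short block of samples into its frequencies, deletes everything above a chosen cutoff and rebuilds the sound. Real formats such as MP3 use a far more sophisticated model of human hearing, and the function names and cutoff here are assumptions made for this example.

```python
import cmath
import math


def dft(samples):
    """Split a block of samples into its frequency components."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Rebuild the samples from the frequency components."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def remove_high_frequencies(samples, cutoff):
    """Delete every frequency above `cutoff` cycles per block."""
    n = len(samples)
    spectrum = dft(samples)
    for k in range(n):
        frequency = min(k, n - k)  # the top half of the list mirrors the bottom half
        if frequency > cutoff:
            spectrum[k] = 0
    return inverse_dft(spectrum)


N = 64
low = [math.sin(2 * math.pi * 3 * t / N) for t in range(N)]          # easy to hear
high = [0.1 * math.sin(2 * math.pi * 25 * t / N) for t in range(N)]  # faint and high
mixed = [a + b for a, b in zip(low, high)]

filtered = remove_high_frequencies(mixed, cutoff=10)
largest_error = max(abs(f - l) for f, l in zip(filtered, low))
print(largest_error < 1e-9)  # True, only the low sound is left
```

Once the high sound has been deleted there is nothing left to store for it, which is where the saving comes from, and also why it cannot be brought back.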

COMPRESSING VIDEO

Because a video is effectively a series of images played at speed, it is possible to compress one using exactly the same techniques as above. However, there are also a few other ways specific to video.

Firstly, you can reduce the frame rate of a video. Many videos run at approximately 29 frames per second. That means that for every second in a video file, the computer will have to store 29 separate pictures. Reducing this frame rate will mean fewer pictures. For example, bringing the frame rate down to 14 frames per second would mean that each second of video used fewer than half the frames it did before, saving a lot of memory in the process. Reducing the frame rate excessively though can lead to a video looking ‘jumpy’ and unpleasant to watch.
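The saving can be worked out with simple arithmetic. The Python sketch below counts the pictures stored for a one-minute clip at each frame rate; the function name is made up for this example.

```python
def frames_stored(frames_per_second, seconds):
    """Number of separate pictures needed for a clip of this length."""
    return frames_per_second * seconds


before = frames_stored(29, 60)
after = frames_stored(14, 60)

print(before)                   # 1740
print(after)                    # 840
print(round(after / before, 2)) # 0.48, so fewer than half the frames
```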

A final method is removing repeated elements of a frame. For example, imagine you have a movie where one scene takes place in an office. If the camera doesn’t move, then a lot of the office background is going to be the same in every frame, with only the characters moving. Telling the computer that parts of the next frame are the same as in the previous one will mean that it doesn’t have to store the information for the full frame every time. The good thing about this is that it is LOSSLESS.
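This can be sketched in Python as below. Each frame is a short list of numbers standing in for pixels, and only the pixels which differ from the previous frame are stored. The function names are made up for this illustration, and real video formats work on blocks of pixels rather than single ones.

```python
def frame_changes(previous, current):
    """List (position, new_value) for each pixel that differs from the previous frame."""
    return [
        (position, new)
        for position, (old, new) in enumerate(zip(previous, current))
        if old != new
    ]


def apply_changes(previous, changes):
    """Rebuild the next frame from the previous frame and the stored changes."""
    frame = list(previous)
    for position, value in changes:
        frame[position] = value
    return frame


# 7 is the office background, 1 is a character moving one pixel to the right.
frame1 = [7, 7, 7, 7, 1, 7, 7, 7]
frame2 = [7, 7, 7, 7, 7, 1, 7, 7]

changes = frame_changes(frame1, frame2)
print(changes)                                   # [(4, 7), (5, 1)]
print(apply_changes(frame1, changes) == frame2)  # True, the frame is rebuilt exactly
```

Only two changes are stored instead of eight pixels, and the second frame is still rebuilt exactly.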

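You may have downloaded a movie before that has required a specific CODEC to run. The codec is the piece of software which tells the computer how to compress the video and how to uncompress it again for playback, whether the compression is lossy or lossless. Codec is short for COder-DECoder, and is also often explained as COmpressor-DECompressor.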


1) Give two methods of compressing an image

2) Explain the difference between lossy and lossless compression