Data and Representation

Statistics is a branch of mathematics that deals with the collection, presentation, analysis and interpretation of numerical data, and with drawing inferences and conclusions from them.

Data: Facts or figures, numerical or otherwise, collected with a definite purpose.

Collection of Data

In any field of investigation, the first step is to collect the data. Data is said to be primary if the investigator himself is responsible for the collection of data.

It is not always possible for an investigator to collect data personally, owing to lack of time and resources. In that case, the investigator may use data collected by another governmental or private agency, usually in the form of published reports. Such data are called secondary data. The same data may be primary for the individual or agency that collected them and secondary for anyone else who uses them.

Types of Data

  1. Primary Data: Data which an investigator collects for the first time for his own purpose.
  2. Secondary Data: Data which the investigator obtains from some other source, agency or office for his own purpose.

Presentation of Data

When the collection of data is over, the next step for the investigator is to find ways to condense and organise the data in order to study their salient features. Such an arrangement of data is called presentation of data.

1. Raw or Ungrouped Data: Data presented in their original form, without any rearrangement, grouping or condensation.

Suppose there are 20 students in a class. The marks obtained by the students in a mathematics test (out of 100) are as follows:

45, 56, 61, 56, 31, 33, 70, 61, 76, 56, 36, 59, 64, 56, 88, 28, 56, 70, 64, 74

The data in this form is called raw data. Each entry such as 45, 56 is called a value or observation.

2. Arrayed Data

Presentation of data in ascending or descending order of magnitude.

Arranging the marks above in ascending order gives:

28, 31, 33, 36, 45, 56, 56, 56, 56, 56, 59, 61, 61, 64, 64, 70, 70, 74, 76, 88
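
As a small illustration, the arrayed form can be produced by sorting the raw marks. This is only a sketch; the variable names are chosen here for illustration and are not part of the original notes.

```python
# Marks of the 20 students, copied from the raw data above.
marks = [45, 56, 61, 56, 31, 33, 70, 61, 76, 56,
         36, 59, 64, 56, 88, 28, 56, 70, 64, 74]

# Arrayed data: the same observations in order of magnitude.
ascending = sorted(marks)
descending = sorted(marks, reverse=True)

print(ascending)
print(descending)
```

Sorting does not add or remove observations, so the array still contains 20 values.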

3. Tabular Data (Ungrouped Frequency Table)

Presenting data as an array is time consuming when the number of observations is large. To make the data more informative, we can present them in tabular form.

Table: Marks of 20 students

Marks    Number of Students
28       1
31       1
33       1
36       1
45       1
56       5
59       1
61       2
64       2
70       2
74       1
76       1
88       1
Total    20

From the table, you can easily see that 1 student secured 28 marks, 5 students secured 56 marks, 2 students secured 70 marks, and so on. The numbers 1, 5, 1 and 2 are called the frequencies of the observations 28, 56, 59 and 64 respectively.
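
The ungrouped frequency table can be checked with a short counting sketch. It uses `collections.Counter` from the Python standard library; the variable names are illustrative assumptions.

```python
from collections import Counter

# Marks of the 20 students, copied from the raw data above.
marks = [45, 56, 61, 56, 31, 33, 70, 61, 76, 56,
         36, 59, 64, 56, 88, 28, 56, 70, 64, 74]

# Frequency of an observation = number of times it occurs in the data.
frequency = Counter(marks)

print("Marks    Number of Students")
for value in sorted(frequency):
    print(f"{value:<8} {frequency[value]}")
print(f"{'Total':<8} {sum(frequency.values())}")
```

The total of the frequencies equals the number of observations, which is a quick check that no mark was missed or counted twice.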

Grouped Frequency Distribution Table

When the number of observations is large, you classify the data into classes (also called groups or class intervals).

Step 1: Determine the range of the raw data (Difference between maximum and minimum observations). In the above example, range is 88 - 28 = 60.

Step 2: Decide the number of classes or groups into which the raw data are to be grouped. There is no hard and fast rule for determining the number of classes, but generally there should be no fewer than 5 and no more than 15.

Step 3: Divide the range by the desired number of classes to determine the approximate size (or width) of a class interval. In the above example, suppose you decide to have 9 classes. Then the size of each class is 60/9 ≈ 6.67, which is rounded up to 7.

Step 4: Next, set up the class limits using the size of the interval determined in Step 3. The classes should not overlap, there should be no gaps between them, and they should all be of the same size.

Step 5: Take each observation from the data, one at a time, and put a tally mark (|) against the class to which it belongs. For the sake of convenience, we record the tally marks in bunches of five, the fifth one crossing the other four diagonally.

Step 6: By counting tally marks in each class, you get the frequency of that class. The total of all frequencies should be equal to the total number of observations in the data.

Step 7: The frequency table should be given a proper title so as to convey exactly what the table is about.
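
The seven steps above can be sketched for the marks example as follows. Two choices in this sketch are assumptions rather than part of the original notes: the first class starts at the minimum mark, and a mark equal to an upper limit is counted in the next class (lower limit included, upper limit excluded). The variable names are illustrative.

```python
# Marks of the 20 students, copied from the raw data above.
marks = [45, 56, 61, 56, 31, 33, 70, 61, 76, 56,
         36, 59, 64, 56, 88, 28, 56, 70, 64, 74]

# Step 1: range = maximum - minimum observation.
data_range = max(marks) - min(marks)             # 88 - 28 = 60

# Step 2: number of classes, chosen as in the text.
num_classes = 9

# Step 3: smallest whole width with which 9 classes cover every mark;
# for this data it equals 60/9 rounded up.
class_size = data_range // num_classes + 1       # 6 + 1 = 7

# Step 4: lower limits of equal-sized classes with no gaps or overlaps
# (assumption: the first class starts at the minimum mark).
lower_limits = [min(marks) + i * class_size for i in range(num_classes)]

# Steps 5 and 6: tally each observation into its class and count.
frequencies = [0] * num_classes
for mark in marks:
    index = (mark - min(marks)) // class_size
    frequencies[index] += 1

# Step 7: print the table with a title.
print("Marks of 20 students (grouped)")
for lower, freq in zip(lower_limits, frequencies):
    print(f"{lower}-{lower + class_size}  {'|' * freq:<8}  {freq}")
print("Total ", sum(frequencies))
```

A different starting point or boundary convention would give different classes and frequencies, so these choices should be stated alongside any table built this way.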

Terms

Grouped Data: Data rearranged or condensed into classes or groups.

Range of Data: Difference between the highest and lowest values (or observations) in the data.

Frequency: The number of times an observation occurs in data.

Class Interval: Each group into which the observations or values of the data are condensed.

Class limits: Values by which each class interval is bounded. The value on the left is called the lower limit and the value on the right is called the upper limit.

Class size: Difference between the upper limit and the lower limit.

Class mark of a class interval: Mid value of a class interval = (lower limit + upper limit)/2
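
As a brief worked example of the last two definitions, the sketch below computes the class size and class mark of one class interval. The interval 28-35 and the function names are illustrative assumptions.

```python
def class_size(lower_limit, upper_limit):
    """Class size: difference between the upper and lower limits."""
    return upper_limit - lower_limit


def class_mark(lower_limit, upper_limit):
    """Class mark: mid value of the class interval."""
    return (lower_limit + upper_limit) / 2


# Illustrative class interval 28-35.
print(class_size(28, 35))   # 35 - 28 = 7
print(class_mark(28, 35))   # (28 + 35) / 2 = 31.5
```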

Cumulative Frequency of a class: Total of frequencies of a particular class and of all classes prior to that class.

Cumulative Frequency Table

If you insert a column showing the cumulative frequency of each class, you get the cumulative frequency distribution, or simply the cumulative frequency table, of the data.
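
A cumulative frequency column can be built as a running total of the class frequencies. The sketch below assumes one possible grouping of the 20 marks (classes of size 7 starting at 28, with the lower limit included and the upper limit excluded); the grouping and the variable names are assumptions made for illustration.

```python
from itertools import accumulate

# Marks of the 20 students, copied from the raw data above.
marks = [45, 56, 61, 56, 31, 33, 70, 61, 76, 56,
         36, 59, 64, 56, 88, 28, 56, 70, 64, 74]

# Assumed grouping: classes 28-35, 35-42, ..., 84-91.
class_size = 7
lower_limits = [28, 35, 42, 49, 56, 63, 70, 77, 84]

# Frequency of each class (lower limit included, upper limit excluded).
frequencies = [sum(lower <= m < lower + class_size for m in marks)
               for lower in lower_limits]

# Cumulative frequency: frequency of the class plus all earlier classes.
cumulative = list(accumulate(frequencies))

print("Class   Frequency   Cumulative Frequency")
for lower, f, cf in zip(lower_limits, frequencies, cumulative):
    print(f"{lower}-{lower + class_size}   {f:<9}   {cf}")
```

The cumulative frequency of the last class should equal the total number of observations.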

Graphical Representation of Data

1. Bar Graph or Charts

A pictorial representation of data in which bars (rectangles) of uniform width are drawn with equal spacing between them on the horizontal axis, and the values of the variable (frequencies) are shown on the vertical axis.

The width of the rectangle has no special meaning except to make it pictorially more attractive.

2. Histogram

A pictorial representation like a vertical bar graph, but with no space between the bars. It is used for a continuous grouped frequency distribution.

  1. The classes of the grouped data are taken along the horizontal axis.
  2. The respective class frequencies are taken along the vertical axis, using a suitable scale on each axis.
  3. For each class a rectangle is constructed with base as the width of the class and height determined from the class frequencies. The areas of rectangles are proportional to the frequencies of their respective classes.
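
The relation between the rectangles and the frequencies can be illustrated with a small text sketch. The classes and frequencies below are one possible grouping of the 20 marks (size 7, starting at 28, upper limit excluded); this grouping is an assumption, not a table from the original notes.

```python
# Assumed grouping of the 20 marks: (lower limit, upper limit, frequency).
classes = [(28, 35, 3), (35, 42, 1), (42, 49, 1), (49, 56, 0), (56, 63, 8),
           (63, 70, 2), (70, 77, 4), (77, 84, 0), (84, 91, 1)]

areas = []
for lower, upper, frequency in classes:
    width = upper - lower      # base of the rectangle = class width
    height = frequency         # equal widths, so height = frequency
    areas.append(width * height)
    print(f"{lower}-{upper}  {'#' * height:<8}  area = {width * height}")
```

Because every class here has the same width, each area is the same multiple of its frequency, which is what "proportional" means in this setting.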

3. Frequency Polygon

A graphical representation of grouped frequency distribution in which the values of the frequencies are marked against the class mark of the intervals and the points are joined by line segments.

A frequency polygon is obtained by first joining the midpoints of the tops of the adjacent rectangles in the histogram. The midpoint of the first rectangle is then joined to the midpoint of the class preceding the lowest class, and the last midpoint to the midpoint of the class succeeding the highest class.

A frequency polygon can also be drawn independently without drawing a histogram by using the class marks of the classes and respective frequencies of the classes.
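
The points of a frequency polygon can be listed without drawing a histogram, as a minimal sketch. It assumes one possible grouping of the 20 marks (size 7, starting at 28, upper limit excluded), and the two closing points are given frequency 0 because the imagined classes before the lowest and after the highest contain no observations. All names are illustrative.

```python
# Assumed grouping of the 20 marks.
class_size = 7
lower_limits = [28, 35, 42, 49, 56, 63, 70, 77, 84]
frequencies = [3, 1, 1, 0, 8, 2, 4, 0, 1]

# Class mark = (lower limit + upper limit) / 2 = lower limit + size / 2.
class_marks = [lower + class_size / 2 for lower in lower_limits]

# One point per class: (class mark, frequency).
points = list(zip(class_marks, frequencies))

# Close the polygon at the class marks of the empty classes just
# before the lowest class and just after the highest class.
first = (class_marks[0] - class_size, 0)
last = (class_marks[-1] + class_size, 0)
polygon = [first] + points + [last]

for x, y in polygon:
    print(x, y)
```

Joining consecutive points in `polygon` with line segments gives the frequency polygon for this grouping.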