Data handling
Statistics is concerned with the collection and the analysis of data.
Statistics can be used to study the size of families in Singapore, the life cycle of a butterfly, or the evolution of the crime rate in Chicago.
Data can be collected in several ways:
- Observing the occurrences of a given event
- Running an experiment several times and recording the results
- Conducting surveys by interviewing a group of people
Count the beats of your heart for a minute.
Roll a die several times to see if it is biased.
Ask the students in your class how many siblings they have.
After the data has been collected, it needs to be summarised.
You have collected the numbers of brothers and sisters your friends have.
$$\Tblue{0}$$ | $$\Tblue{2}$$ | $$\Tblue{1}$$ | $$\Tblue{3}$$ | $$\Tblue{1}$$ | $$\Tblue{5}$$ | $$\Tblue{6}$$ | $$\Tblue{1}$$ | $$\Tblue{1}$$ | $$\Tblue{1}$$ |
$$\Tblue{2}$$ | $$\Tblue{2}$$ | $$\Tblue{0}$$ | $$\Tblue{2}$$ | $$\Tblue{6}$$ | $$\Tblue{1}$$ | $$\Tblue{0}$$ | $$\Tblue{1}$$ | $$\Tblue{0}$$ | $$\Tblue{0}$$ |
$$\Tblue{4}$$ | $$\Tblue{3}$$ | $$\Tblue{0}$$ | $$\Tblue{0}$$ | $$\Tblue{0}$$ | $$\Tblue{1}$$ | $$\Tblue{0}$$ | $$\Tblue{0}$$ | $$\Tblue{2}$$ | $$\Tblue{1}$$ |
The number of times a value (such as the number $$\Tblue{3}$$ in the table) occurs is called the frequency of that value.
The frequency of $$\Tblue{3}$$ in the table is $$\Tred{2}$$, because the number $$\Tblue{3}$$ appears twice.
The frequency table shows how many times each value occurs.
Number of Siblings | $$\Tblue{0}$$ | $$\Tblue{1}$$ | $$\Tblue{2}$$ | $$\Tblue{3}$$ | $$\Tblue{4}$$ | $$\Tblue{5}$$ | $$\Tblue{6}$$ |
---|---|---|---|---|---|---|---|
Frequency | $$\Tred{10}$$ | $$\Tred{9}$$ | $$\Tred{5}$$ | $$\Tred{2}$$ | $$\Tred{1}$$ | $$\Tred{1}$$ | $$\Tred{2}$$ |
$$\Tred{10}$$ of your friends are only children and $$\Tred{9}$$ have exactly one sibling.
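Building a frequency table is just counting how often each value appears. The short Python sketch below does this with `collections.Counter` on the 30 values collected above; the list name `siblings` and the helper `frequency_table` are illustrative names, not part of the lesson.

```python
from collections import Counter

# Raw data: number of siblings of each of the 30 friends (copied from the table above)
siblings = [
    0, 2, 1, 3, 1, 5, 6, 1, 1, 1,
    2, 2, 0, 2, 6, 1, 0, 1, 0, 0,
    4, 3, 0, 0, 0, 1, 0, 0, 2, 1,
]

def frequency_table(data):
    """Return a dict mapping each value to its frequency, ordered by value."""
    counts = Counter(data)
    return {value: counts[value] for value in sorted(counts)}

table = frequency_table(siblings)
print(table)     # {0: 10, 1: 9, 2: 5, 3: 2, 4: 1, 5: 1, 6: 2}
print(table[3])  # 2, the frequency of the value 3
```

Adding up all the frequencies gives back the number of friends surveyed (30).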
The data can be grouped to give a more compact representation.
Number of Siblings | $$\Tblue{0}$$ | $$\Tblue{1}$$ | $$\Tblue{2-3}$$ | $$\Tblue{4+}$$ |
---|---|---|---|---|
Frequency | $$\Tred{10}$$ | $$\Tred{9}$$ | $$\Tred{7}$$ | $$\Tred{4}$$ |
The table shows that you have $$\Tred{4}$$ friends who have at least four siblings.
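Grouping amounts to adding up the frequencies of the values that fall in each group. A minimal sketch, assuming each group is described by a hypothetical `(label, smallest value, largest value)` triple, where `None` means the group has no upper limit (as in "4+"):

```python
# Frequency table for the number of siblings (values 0 to 6)
table = {0: 10, 1: 9, 2: 5, 3: 2, 4: 1, 5: 1, 6: 2}

# Groups as (label, smallest value, largest value); None means "no upper limit"
groups = [("0", 0, 0), ("1", 1, 1), ("2-3", 2, 3), ("4+", 4, None)]

def group_frequencies(table, groups):
    """Add up the frequencies of all values that fall inside each group."""
    result = {}
    for label, low, high in groups:
        result[label] = sum(
            freq for value, freq in table.items()
            if value >= low and (high is None or value <= high)
        )
    return result

print(group_frequencies(table, groups))  # {'0': 10, '1': 9, '2-3': 7, '4+': 4}
```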
A dot diagram displays data so that the different frequencies are easy to compare. For each occurrence of a value, you add a dot next to that value.
Sometimes the dots are replaced by small icons (or pictures); the resulting diagram is called a pictogram. Different images can stand for different quantities or types of data. For instance, you can use one image for a pair and another one for a single item.
It is easy to get the frequency table from the dot diagram and vice versa: the frequency of a value is the number of dots next to it.
This is the frequency table used for the diagrams above.
Number of Siblings | $$\Tblue{0}$$ | $$\Tblue{1}$$ | $$\Tblue{2}$$ | $$\Tblue{3}$$ | $$\Tblue{4}$$ | $$\Tblue{5}$$ | $$\Tblue{6}$$ |
---|---|---|---|---|---|---|---|
Frequency | $$\Tred{10}$$ | $$\Tred{9}$$ | $$\Tred{5}$$ | $$\Tred{2}$$ | $$\Tred{1}$$ | $$\Tred{1}$$ | $$\Tred{2}$$ |
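The conversions between this table and the diagrams can be sketched in a few lines of Python. This is a text-only illustration under our own choice of symbols: `o` is a dot, and in the pictogram `#` stands for a pair of friends and `+` for a single friend. The function names are hypothetical.

```python
table = {0: 10, 1: 9, 2: 5, 3: 2, 4: 1, 5: 1, 6: 2}

def dot_diagram(table, dot="o"):
    """One line per value, with one dot for each occurrence of that value."""
    return [f"{value} | {dot * freq}" for value, freq in table.items()]

def pictogram(table, pair="#", unit="+"):
    """Like a dot diagram, but one icon stands for a pair and another for a single item."""
    lines = []
    for value, freq in table.items():
        pairs, units = divmod(freq, 2)  # e.g. 9 -> 4 pairs and 1 single
        lines.append(f"{value} | {pair * pairs}{unit * units}")
    return lines

def table_from_dots(lines, dot="o"):
    """Go back from a dot diagram to the frequency table by counting dots."""
    result = {}
    for line in lines:
        value, dots = line.split(" | ")
        result[int(value)] = dots.count(dot)
    return result

print("\n".join(dot_diagram(table)))
print("\n".join(pictogram(table)))
```

Counting the dots on each line recovers the frequency table exactly, which is why the two representations carry the same information.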
- In a line chart, you plot the frequency for each value and you connect the points with a line.
- In a bar chart, you draw a bar for each value. Its height is proportional to the frequency.
For example, with a scale of 0.5 cm per friend, the bar corresponding to $$\Tblue{0}$$ siblings is $$\Tred{5}$$ cm long, the bar corresponding to $$\Tblue{1}$$ sibling is $$\Tred{4.5}$$ cm long, and so on.
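The bar lengths follow from multiplying each frequency by a fixed scale. The sketch below assumes a scale of 0.5 cm per friend, which is the scale that gives the 5 cm and 4.5 cm bars mentioned above; the constant `CM_PER_FRIEND` is our own name.

```python
table = {0: 10, 1: 9, 2: 5, 3: 2, 4: 1, 5: 1, 6: 2}
CM_PER_FRIEND = 0.5  # assumed scale: 10 friends -> 5 cm

def bar_lengths(table, scale):
    """Length of each bar (in cm) = frequency x scale."""
    return {value: freq * scale for value, freq in table.items()}

print(bar_lengths(table, CM_PER_FRIEND))
# {0: 5.0, 1: 4.5, 2: 2.5, 3: 1.0, 4: 0.5, 5: 0.5, 6: 1.0}
```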
You have collected the number of brothers and sisters of each of your friends and stored the results in a frequency table.
Number of Siblings | $$\Tblue{0}$$ | $$\Tblue{1}$$ | $$\Tblue{2}$$ | $$\Tblue{3}$$ | $$\Tblue{4}$$ | $$\Tblue{5}$$ | $$\Tblue{6}$$ |
---|---|---|---|---|---|---|---|
Frequency | $$\Tred{10}$$ | $$\Tred{9}$$ | $$\Tred{5}$$ | $$\Tred{2}$$ | $$\Tred{1}$$ | $$\Tred{1}$$ | $$\Tred{2}$$ |
Bar charts can be vertical or horizontal. To go from a vertical bar chart to a horizontal one, or vice versa, you just swap the axes.
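As a rough text-mode illustration (not how the charts in this lesson were drawn), the same frequency table can be printed as a horizontal bar chart and then, with the axes swapped, as a vertical one. Each `#` represents one friend; the function names are hypothetical.

```python
table = {0: 10, 1: 9, 2: 5, 3: 2, 4: 1, 5: 1, 6: 2}

def horizontal_bars(table):
    """Values on the vertical axis, bars growing to the right."""
    return [f"{value} {'#' * freq}" for value, freq in table.items()]

def vertical_bars(table):
    """Same chart with the axes swapped: bars now grow upwards."""
    tallest = max(table.values())
    rows = []
    for level in range(tallest, 0, -1):  # draw from the top row down
        rows.append(" ".join("#" if freq >= level else " " for freq in table.values()))
    rows.append(" ".join(str(value) for value in table))  # value labels under the bars
    return rows

print("\n".join(horizontal_bars(table)))
print()
print("\n".join(vertical_bars(table)))
```

In both versions the number of `#` symbols for a value equals its frequency; only the direction of the bars changes.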
A common way of representing a data distribution is to use a pie chart. A pie chart represents all the data as a circle (the pie) and splits it according to the proportion of each value. A pie chart does not show the actual frequencies; to recover them, you also need to know the total number of results.
Each slice of a pie chart is called a sector. The area of a sector, and therefore its angle, is proportional to the frequency of the data.
You have gathered the number of brothers and sisters of your friends in a frequency table.
Siblings | $$\Tblue{0}$$ | $$\Tblue{1}$$ | $$\Tblue{2}$$ | $$\Tblue{3}$$ | $$\Tblue{4}$$ | $$\Tblue{5}$$ | $$\Tblue{6}$$ |
---|---|---|---|---|---|---|---|
Frequency | $$\Tred{10}$$ | $$\Tred{9}$$ | $$\Tred{5}$$ | $$\Tred{2}$$ | $$\Tred{1}$$ | $$\Tred{1}$$ | $$\Tred{2}$$ |
The corresponding pie chart has seven sectors, one for each value.
Here is how to construct a pie chart from a frequency table.
Siblings | $$\Tblue{0}$$ | $$\Tblue{1}$$ | $$\Tblue{2}$$ | $$\Tblue{3}$$ | $$\Tblue{4}$$ | $$\Tblue{5}$$ | $$\Tblue{6}$$ |
---|---|---|---|---|---|---|---|
Frequency | $$\Tred{10}$$ | $$\Tred{9}$$ | $$\Tred{5}$$ | $$\Tred{2}$$ | $$\Tred{1}$$ | $$\Tred{1}$$ | $$\Tred{2}$$ |
- Compute the total frequency by summing all the frequencies.
The total frequency for the table is $$\Tlightred{30}$$.
- Determine the angle-to-frequency ratio with the formula $$$ \text{Ratio}= \frac{\Tlightgreen{360}}{\Tlightred{\text{Total Frequency}}} $$$
The ratio for the table is $$\displaystyle\frac{\Tlightgreen{360}}{\Tlightred{30}} = 12$$.
- Multiply each frequency by the ratio to convert it into an angle (in degrees). $$$\Tgreen{\text{Angle}} = \Tred{\text{Frequency}} \times \text{Ratio}$$$
Siblings | $$\Tblue{0}$$ | $$\Tblue{1}$$ | $$\Tblue{2}$$ | $$\Tblue{3}$$ | $$\Tblue{4}$$ | $$\Tblue{5}$$ | $$\Tblue{6}$$ | Total |
---|---|---|---|---|---|---|---|---|
Freq. | $$\Tred{10}$$ | $$\Tred{9}$$ | $$\Tred{5}$$ | $$\Tred{2}$$ | $$\Tred{1}$$ | $$\Tred{1}$$ | $$\Tred{2}$$ | $$\Tlightred{30}$$ |
Angle (°) | $$\Tgreen{120}$$ | $$\Tgreen{108}$$ | $$\Tgreen{60}$$ | $$\Tgreen{24}$$ | $$\Tgreen{12}$$ | $$\Tgreen{12}$$ | $$\Tgreen{24}$$ | $$\Tlightgreen{360}$$ |
- Draw a circle and divide it into sectors with the computed angles, using a protractor.
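The steps above can be sketched in Python as follows. The helper `sector_boundaries` is our own addition for the drawing step: it lists where each sector starts and ends, measured in degrees from a fixed starting line, which is one convenient way to use a protractor.

```python
table = {0: 10, 1: 9, 2: 5, 3: 2, 4: 1, 5: 1, 6: 2}

def pie_angles(table):
    """Convert frequencies into sector angles (in degrees)."""
    total = sum(table.values())   # step 1: total frequency (30 here)
    ratio = 360 / total           # step 2: degrees per unit of frequency (12 here)
    return {value: freq * ratio for value, freq in table.items()}  # step 3

def sector_boundaries(angles):
    """Start and end angle of each sector, placing the sectors one after another."""
    start = 0.0
    bounds = {}
    for value, angle in angles.items():
        bounds[value] = (start, start + angle)
        start += angle
    return bounds

angles = pie_angles(table)
print(angles)  # {0: 120.0, 1: 108.0, 2: 60.0, 3: 24.0, 4: 12.0, 5: 12.0, 6: 24.0}
print(sector_boundaries(angles))
```

The last sector ends at exactly 360°, which is a quick check that no frequency was forgotten.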
A histogram is a common way to represent the frequency of data. It is similar to a bar chart.
In a histogram, the area of each bar is proportional to the frequency; in a bar chart, it is the height of each bar.
Histograms are generally used for continuous data (i.e. data that can take any value, not only integer values). In this case, each frequency is given for the data within an interval.
For the following frequency table, the histogram and the bar chart are the same, because every interval has the same width (one hour). The table gives the time (in hours) that families spend watching TV each day. $$\Tblue{0}$$ means strictly less than one hour, $$\Tblue{1}$$ means at least one hour but less than two hours, and so on.
TV time (hour) | $$\Tblue{0}$$ | $$\Tblue{1}$$ | $$\Tblue{2}$$ | $$\Tblue{3}$$ | $$\Tblue{4}$$ | $$\Tblue{5}$$ | $$\Tblue{6}$$ |
---|---|---|---|---|---|---|---|
Frequency | $$\Tred{10}$$ | $$\Tred{9}$$ | $$\Tred{5}$$ | $$\Tred{2}$$ | $$\Tred{1}$$ | $$\Tred{1}$$ | $$\Tred{2}$$ |
Bar charts and histograms differ only when the data are grouped into intervals of different widths.
TV time (hour) | $$\Tblue{0}$$ | $$\Tblue{1}$$ | $$\Tblue{2-3}$$ | $$\Tblue{4-6}$$ |
---|---|---|---|---|
Frequency | $$\Tred{10}$$ | $$\Tred{9}$$ | $$\Tred{7}$$ | $$\Tred{4}$$ |
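In a histogram, each bar's height is chosen so that its area (height × width) equals the frequency, i.e. height = frequency ÷ width. The sketch below assumes, following the convention used for the ungrouped table, that the group "2-3" covers 2 up to (but not including) 4 hours, and "4-6" covers 4 up to 7 hours. `Fraction` is used so the heights are exact.

```python
from fractions import Fraction

# Grouped TV-time data: (label, start hour, end hour, frequency); each group covers [start, end)
groups = [("0", 0, 1, 10), ("1", 1, 2, 9), ("2-3", 2, 4, 7), ("4-6", 4, 7, 4)]

def histogram_heights(groups):
    """Height = frequency / width, so that area = height x width = frequency."""
    return {label: Fraction(freq, end - start) for label, start, end, freq in groups}

for label, height in histogram_heights(groups).items():
    print(label, height)
# 0 10
# 1 9
# 2-3 7/2
# 4-6 4/3
```

A bar chart of the same table would simply use the frequencies 10, 9, 7 and 4 as heights, so the wide groups would look larger than they should.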
Another representation of data is the stem-and-leaf diagram. It gives information similar to a histogram, but it shows the raw data values rather than only their frequencies.
Suppose you have collected the weights of all your friends (in kg).
$$\Tblue{50}$$ | $$\Tblue{55}$$ | $$\Tblue{52}$$ | $$\Tblue{52}$$ | $$\Tblue{42}$$ | $$\Tblue{35}$$ | $$\Tblue{40}$$ | $$\Tblue{65}$$ | $$\Tblue{47}$$ | $$\Tblue{61}$$ |
$$\Tblue{45}$$ | $$\Tblue{48}$$ | $$\Tblue{52}$$ | $$\Tblue{39}$$ | $$\Tblue{51}$$ | $$\Tblue{55}$$ | $$\Tblue{41}$$ | $$\Tblue{63}$$ | $$\Tblue{68}$$ | $$\Tblue{38}$$ |
To draw a stem-and-leaf diagram, you sort all the entries and group them into categories. Each category is a stem, and the remaining digit of each entry is its leaf.
We take the tens as stems and the units as leaves. The data corresponding to the stem $$\Tblue{3}$$ are $$\Tblue{35}$$, $$\Tblue{38}$$, $$\Tblue{39}$$. The leaves are therefore $$\Tgreen{5}$$, $$\Tgreen{8}$$, $$\Tgreen{9}$$.
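The construction can be sketched in Python by splitting each sorted weight into its tens digit (stem) and units digit (leaf) with `divmod`. This sketch assumes two-digit data, and stems with no leaves are simply not printed.

```python
from collections import defaultdict

weights = [50, 55, 52, 52, 42, 35, 40, 65, 47, 61,
           45, 48, 52, 39, 51, 55, 41, 63, 68, 38]

def stem_and_leaf(data):
    """Tens digit as stem, units digit as leaf; leaves in increasing order."""
    diagram = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)  # e.g. 35 -> stem 3, leaf 5
        diagram[stem].append(leaf)
    return dict(diagram)

for stem, leaves in stem_and_leaf(weights).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
# 3 | 5 8 9
# 4 | 0 1 2 5 7 8
# 5 | 0 1 2 2 2 5 5
# 6 | 1 3 5 8
```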
Stem-and-leaf diagrams are still used in timetables, with the hours as stems and the minutes as leaves.