# Data handling

## Collecting data

Statistics is concerned with the collection and the analysis of data.

Statistics can be used to study the size of families in Singapore, the life cycle of a butterfly, or the evolution of the crime rate in Chicago.

Data can be collected by several means.

• Observing the occurrences of a given event
• Count the beats of your heart for a minute.

• Running an experiment several times and recording the results
• Throw a die several times to see if it is biased.

• Conducting surveys by interviewing a group of people
• Ask the students in your class how many siblings they have.

Statistical data can summarise a lot of information.

## Frequency table

After the data has been collected, it needs to be summarised.

You have collected the numbers of brothers and sisters your friends have.

 $\Tblue{0}$ $\Tblue{2}$ $\Tblue{1}$ $\Tblue{3}$ $\Tblue{1}$ $\Tblue{5}$ $\Tblue{6}$ $\Tblue{1}$ $\Tblue{1}$ $\Tblue{1}$ $\Tblue{2}$ $\Tblue{2}$ $\Tblue{0}$ $\Tblue{2}$ $\Tblue{6}$ $\Tblue{1}$ $\Tblue{0}$ $\Tblue{1}$ $\Tblue{0}$ $\Tblue{0}$ $\Tblue{4}$ $\Tblue{3}$ $\Tblue{0}$ $\Tblue{0}$ $\Tblue{0}$ $\Tblue{1}$ $\Tblue{0}$ $\Tblue{0}$ $\Tblue{2}$ $\Tblue{1}$

The number of times a value (such as number $\Tblue{3}$ in the table) occurs is the frequency of the value.

The frequency of $\Tblue{3}$ in the table is $\Tred{2}$, as number $\Tblue{3}$ appears twice.

The frequency table shows how many times each value occurs.

| Number of Siblings | $\Tblue{0}$ | $\Tblue{1}$ | $\Tblue{2}$ | $\Tblue{3}$ | $\Tblue{4}$ | $\Tblue{5}$ | $\Tblue{6}$ |
|---|---|---|---|---|---|---|---|
| Frequency | $\Tred{10}$ | $\Tred{9}$ | $\Tred{5}$ | $\Tred{2}$ | $\Tred{1}$ | $\Tred{1}$ | $\Tred{2}$ |

$\Tred{10}$ of your friends are only children and $\Tred{9}$ have exactly one sibling.
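As a minimal sketch, the frequency table above can be rebuilt from the raw data by counting how often each value occurs. The variable names (`siblings`, `frequency_table`) are chosen for illustration only.

```python
from collections import Counter

# Sibling counts collected from the 30 friends (the raw data listed above).
siblings = [0, 2, 1, 3, 1, 5, 6, 1, 1, 1,
            2, 2, 0, 2, 6, 1, 0, 1, 0, 0,
            4, 3, 0, 0, 0, 1, 0, 0, 2, 1]

# Counter maps each value to the number of times it occurs (its frequency).
frequency = Counter(siblings)

# Sorting by value gives the entries of the frequency table in order.
frequency_table = sorted(frequency.items())

for value, count in frequency_table:
    print(value, count)
```

The frequencies add up to 30, the number of friends surveyed, which is a quick way to check that no entry was missed.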

The data can be grouped to give a more compact representation.

| Number of Siblings | $\Tblue{0}$ | $\Tblue{1}$ | $\Tblue{2-3}$ | $\Tblue{4+}$ |
|---|---|---|---|---|
| Frequency | $\Tred{10}$ | $\Tred{9}$ | $\Tred{7}$ | $\Tred{4}$ |

The table shows that you have $\Tred{4}$ friends who have at least four siblings.
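The grouping can be sketched in a few lines, assuming the same groups as in the table (0, 1, 2-3 and 4+). The helper `group_label` is a hypothetical name introduced for this example.

```python
# Ungrouped frequency table: number of siblings -> frequency.
frequency = {0: 10, 1: 9, 2: 5, 3: 2, 4: 1, 5: 1, 6: 2}

def group_label(value):
    """Return the group a sibling count falls into (hypothetical helper)."""
    if value <= 1:
        return str(value)   # 0 and 1 keep their own group
    if value <= 3:
        return "2-3"
    return "4+"

# Add up the frequencies of all values that share a group.
grouped = {}
for value, count in frequency.items():
    label = group_label(value)
    grouped[label] = grouped.get(label, 0) + count

print(grouped)
```

Grouping loses detail (the table no longer says how many friends have exactly five siblings), but the total frequency stays the same.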

## Dot diagram and pictogram

A dot diagram is a way of displaying data so that it is easy to compare the different frequencies. Each time an entry occurs, you add a dot next to that entry.
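As a rough illustration, a dot diagram can be printed as text from the sibling frequency table. The function name `dot_diagram` is hypothetical, and the letter `o` stands in for a dot.

```python
# Frequency table for the number of siblings (from the previous section).
frequency = {0: 10, 1: 9, 2: 5, 3: 2, 4: 1, 5: 1, 6: 2}

def dot_diagram(table, dot="o"):
    """Return one text row per value, with one dot per occurrence."""
    rows = []
    for value in sorted(table):
        rows.append(f"{value} | {dot * table[value]}")
    return rows

diagram = dot_diagram(frequency)
print("\n".join(diagram))
```

The longest row belongs to the most frequent value, so the shape of the data is visible at a glance.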

Sometimes the dots are replaced by small icons (or pictures). The resulting diagram is called a pictogram. We can use different images for different numbers or types of data. For instance, you can have one image for a pair and another one for a unit.

A dot diagram and a pictogram (a red character represents two friends, a green character represents one).
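A pictogram with two kinds of icons can be sketched the same way. Here the letter `R` stands in for the red character (two friends) and `g` for the green character (one friend); both symbols and the helper `pictogram_row` are assumptions made for this text-only example.

```python
# Frequency table for the number of siblings.
frequency = {0: 10, 1: 9, 2: 5, 3: 2, 4: 1, 5: 1, 6: 2}

PAIR = "R"  # stands in for the red character: two friends
UNIT = "g"  # stands in for the green character: one friend

def pictogram_row(count):
    """Use as many pair icons as possible, then one unit icon if a friend is left over."""
    pairs, units = divmod(count, 2)
    return PAIR * pairs + UNIT * units

pictogram = {value: pictogram_row(count) for value, count in frequency.items()}

for value, icons in pictogram.items():
    print(value, "|", icons)
```

With this convention, a frequency of 9 is drawn as four pair icons followed by one unit icon.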

It is easy to get the frequency table from the dot diagram and vice versa. The frequency of an entry is the number of dots next to it.

This is the frequency table used for the diagrams above.

| Number of Siblings | $\Tblue{0}$ | $\Tblue{1}$ | $\Tblue{2}$ | $\Tblue{3}$ | $\Tblue{4}$ | $\Tblue{5}$ | $\Tblue{6}$ |
|---|---|---|---|---|---|---|---|
| Frequency | $\Tred{10}$ | $\Tred{9}$ | $\Tred{5}$ | $\Tred{2}$ | $\Tred{1}$ | $\Tred{1}$ | $\Tred{2}$ |

## Line charts and bar charts

• In a line chart, you plot the frequency for each value and you connect the points with a line.
• In a bar chart, you draw a bar for each value. The height is determined by the frequency.

For the sibling data, with a scale of 0.5 cm per friend, the bar corresponding to $\Tblue{0}$ siblings will be $\Tred{5}$ cm long; the bar corresponding to $\Tblue{1}$ sibling will be $\Tred{4.5}$ cm, etc.
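The two lengths quoted above match a scale of 0.5 cm per friend applied to the sibling frequency table (10 friends give 5 cm, 9 friends give 4.5 cm). Assuming that scale, the remaining bar lengths can be sketched as follows; `CM_PER_FRIEND` is a name introduced for this example.

```python
# Frequency table for the number of siblings.
frequency = {0: 10, 1: 9, 2: 5, 3: 2, 4: 1, 5: 1, 6: 2}

# Assumed scale: each friend adds 0.5 cm to the bar.
CM_PER_FRIEND = 0.5

# The length of each bar is its frequency multiplied by the scale.
bar_lengths_cm = {value: count * CM_PER_FRIEND for value, count in frequency.items()}

for value, length in bar_lengths_cm.items():
    print(f"{value} siblings: {length} cm")
```

Any other scale works as long as the same one is used for every bar.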

You have collected the numbers of brothers and sisters your friends have and stored them in a frequency table.

| Number of Siblings | $\Tblue{0}$ | $\Tblue{1}$ | $\Tblue{2}$ | $\Tblue{3}$ | $\Tblue{4}$ | $\Tblue{5}$ | $\Tblue{6}$ |
|---|---|---|---|---|---|---|---|
| Frequency | $\Tred{10}$ | $\Tred{9}$ | $\Tred{5}$ | $\Tred{2}$ | $\Tred{1}$ | $\Tred{1}$ | $\Tred{2}$ |

A line chart and a bar chart.
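For the line chart, the points to plot and the segments joining them can be listed directly from the frequency table. This is only a sketch of the data behind the chart, not a drawing routine; `points` and `segments` are illustrative names.

```python
# Frequency table for the number of siblings.
frequency = {0: 10, 1: 9, 2: 5, 3: 2, 4: 1, 5: 1, 6: 2}

# One point per value: (number of siblings, frequency), ordered by value.
points = sorted(frequency.items())

# Each segment of the line joins two consecutive points.
segments = list(zip(points, points[1:]))

for start, end in segments:
    print(start, "->", end)
```

Seven points give six segments; the first one runs from (0, 10) to (1, 9).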

Bar charts can be vertical or horizontal. To go from vertical to horizontal or vice versa, you just need to swap the two axes.

A horizontal bar chart.

## Pie chart

A common way of representing a data distribution is to use a pie chart. A pie chart represents all the data as a circle (pie) and splits it according to the proportion of each value. A pie chart shows proportions, not actual frequencies: to recover the frequencies, we also need the total number of results as separate information.

Each slice of a pie chart is called a sector. The area of a sector is proportional to the frequency of the data.
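To see why a pie chart alone does not give the actual frequencies, compare the proportions of two tables. The second table (`doubled`) is a made-up survey in which every frequency is twice as large; it is not from the text. The helper name `proportions` is also hypothetical.

```python
from fractions import Fraction

def proportions(table):
    """Return the share of the whole taken by each value (hypothetical helper)."""
    total = sum(table.values())
    return {value: Fraction(count, total) for value, count in table.items()}

# Sibling frequencies for the 30 friends.
friends = {0: 10, 1: 9, 2: 5, 3: 2, 4: 1, 5: 1, 6: 2}

# Made-up survey of 60 people with every frequency doubled.
doubled = {value: 2 * count for value, count in friends.items()}

# Both tables have identical proportions, hence identical pie charts.
print(proportions(friends) == proportions(doubled))
print(proportions(friends)[0])
```

Both surveys produce the same sectors, so the total number of results has to be stated separately.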

You have gathered the number of brothers and sisters of your friends in a frequency table.

| Siblings | $\Tblue{0}$ | $\Tblue{1}$ | $\Tblue{2}$ | $\Tblue{3}$ | $\Tblue{4}$ | $\Tblue{5}$ | $\Tblue{6}$ |
|---|---|---|---|---|---|---|---|
| Frequency | $\Tred{10}$ | $\Tred{9}$ | $\Tred{5}$ | $\Tred{2}$ | $\Tred{1}$ | $\Tred{1}$ | $\Tred{2}$ |

The corresponding pie chart has seven sectors, one for each value in the table.

Pie chart: number of siblings of my friends.

## How to draw a pie chart

Here is how to construct a pie chart from a frequency table.

| Siblings | $\Tblue{0}$ | $\Tblue{1}$ | $\Tblue{2}$ | $\Tblue{3}$ | $\Tblue{4}$ | $\Tblue{5}$ | $\Tblue{6}$ |
|---|---|---|---|---|---|---|---|
| Frequency | $\Tred{10}$ | $\Tred{9}$ | $\Tred{5}$ | $\Tred{2}$ | $\Tred{1}$ | $\Tred{1}$ | $\Tred{2}$ |

• Compute the total frequency by summing all the frequencies.
• It is $\Tlightred{30}$ for the table.

• Determine the angle-to-frequency ratio with the formula $$\text{Ratio}= \frac{\Tlightgreen{360}}{\Tlightred{\text{Total Frequency}}}$$

The ratio for the table is $\displaystyle\frac{\Tlightgreen{360}}{\Tlightred{30}} = 12$.

• Multiply each frequency by the ratio to convert it into an angle. $$\Tgreen{\text{Angle}} = \Tred{\text{Frequency}} \times \text{Ratio}$$

| Data | $\Tblue{0}$ | $\Tblue{1}$ | $\Tblue{2}$ | $\Tblue{3}$ | $\Tblue{4}$ | $\Tblue{5}$ | $\Tblue{6}$ | Total |
|---|---|---|---|---|---|---|---|---|
| Freq. | $\Tred{10}$ | $\Tred{9}$ | $\Tred{5}$ | $\Tred{2}$ | $\Tred{1}$ | $\Tred{1}$ | $\Tred{2}$ | $\Tlightred{30}$ |
| Angle (degrees) | $\Tgreen{120}$ | $\Tgreen{108}$ | $\Tgreen{60}$ | $\Tgreen{24}$ | $\Tgreen{12}$ | $\Tgreen{12}$ | $\Tgreen{24}$ | $\Tlightgreen{360}$ |

• Divide a circle into the computed angles with a protractor.

The angle of each sector of the pie chart is proportional to the frequency of the corresponding category.
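The steps above can be sketched as a short computation. The variable names are illustrative; the arithmetic follows the two formulas given in the steps.

```python
# Frequency table for the number of siblings.
frequency = {0: 10, 1: 9, 2: 5, 3: 2, 4: 1, 5: 1, 6: 2}

# Step 1: total frequency.
total = sum(frequency.values())

# Step 2: angle-to-frequency ratio, in degrees per unit of frequency.
ratio = 360 / total

# Step 3: angle of each sector = frequency x ratio.
angles = {value: count * ratio for value, count in frequency.items()}

for value, angle in angles.items():
    print(f"{value} siblings: {angle} degrees")

# The sector angles must fill the whole circle.
print(sum(angles.values()))
```

Checking that the angles add up to 360 degrees is a simple way to catch an arithmetic mistake before drawing.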

## Histogram

A histogram is a common way to represent the frequency of data. It is similar to a bar chart.

In a histogram, the area of each bar is proportional to the frequency. In a bar chart, it is the height of the bar that is proportional to the frequency.

The histogram is generally used for continuous data (i.e. data that can take any value, not only integer values). In this case, the frequencies are given for data within an interval.
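As an illustration of frequencies given per interval, the sketch below sorts a few continuous measurements into one-hour intervals. The values in `tv_hours` are made up for this example, and the choice of one-hour intervals labelled by their lower bound is an assumption.

```python
import math
from collections import Counter

# Hypothetical daily TV times in hours (illustrative values, not from the text).
tv_hours = [0.5, 1.25, 0.75, 2.0, 1.5, 3.9, 0.0, 1.0]

# Assumed intervals: [0, 1), [1, 2), [2, 3), ... each labelled by its lower bound.
interval_of = [math.floor(t) for t in tv_hours]

# Frequency of each interval.
frequency = Counter(interval_of)

for lower in sorted(frequency):
    print(f"{lower} to {lower + 1} hours: {frequency[lower]}")
```

A value such as 1.0 falls in the interval starting at 1, since each interval includes its lower bound but not its upper bound.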

For the following frequency table, the histogram and the bar chart are the same. The table records the time (in hours) that families spend watching TV each day. $\Tblue{0}$ means strictly less than one hour. $\Tblue{1}$ means at least one hour but less than two hours.

| TV time (hours) | $\Tblue{0}$ | $\Tblue{1}$ | $\Tblue{2}$ | $\Tblue{3}$ | $\Tblue{4}$ | $\Tblue{5}$ | $\Tblue{6}$ |
|---|---|---|---|---|---|---|---|
| Frequency | $\Tred{10}$ | $\Tred{9}$ | $\Tred{5}$ | $\Tred{2}$ | $\Tred{1}$ | $\Tred{1}$ | $\Tred{2}$ |

Bar chart (left) and histogram (right) are the same for uniform data.

Bar charts and histograms differ only when the data are grouped into intervals of different widths.

| TV time (hours) | $\Tblue{0}$ | $\Tblue{1}$ | $\Tblue{2-3}$ | $\Tblue{4-6}$ |
|---|---|---|---|---|
| Frequency | $\Tred{10}$ | $\Tred{9}$ | $\Tred{7}$ | $\Tred{4}$ |

Bar chart (left) and histogram (right) for grouped data.
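Because the area of a histogram bar represents the frequency, the height of each bar can be taken as the frequency divided by the width of its interval. The sketch below assumes that the group 2-3 covers the times from 2 up to (but not including) 4 hours and that 4-6 covers 4 up to 7 hours, extending the convention stated for 0 and 1.

```python
from fractions import Fraction

# (label, lower bound, upper bound, frequency); the bounds are assumptions.
groups = [
    ("0", 0, 1, 10),
    ("1", 1, 2, 9),
    ("2-3", 2, 4, 7),
    ("4-6", 4, 7, 4),
]

# Histogram: height = frequency / width, so that area = frequency.
histogram_heights = {}
for label, low, high, freq in groups:
    width = high - low
    histogram_heights[label] = Fraction(freq, width)

# Bar chart: height = frequency, whatever the width of the group.
bar_chart_heights = {label: freq for label, _, _, freq in groups}

for label in histogram_heights:
    print(label, float(histogram_heights[label]), bar_chart_heights[label])
```

For the groups of width one the two heights agree. For the wider groups the histogram bar is lower than the bar chart bar, because its frequency is spread over several hours.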

## Stem-and-leaf diagram

Another representation of data is the stem-and-leaf diagram. It gives similar information to a histogram, but it focuses on the raw data rather than on their frequency.

Assume you have collected the weight of all your friends (in kg).

 $\Tblue{50}$ $\Tblue{55}$ $\Tblue{52}$ $\Tblue{52}$ $\Tblue{42}$ $\Tblue{35}$ $\Tblue{40}$ $\Tblue{65}$ $\Tblue{47}$ $\Tblue{61}$ $\Tblue{45}$ $\Tblue{48}$ $\Tblue{52}$ $\Tblue{39}$ $\Tblue{51}$ $\Tblue{55}$ $\Tblue{41}$ $\Tblue{63}$ $\Tblue{68}$ $\Tblue{38}$

To get a stem-and-leaf diagram, you order all the entries and put them in categories. Each category is a stem. The data are the leaves.

We take the tens as stems and the units as leaves. The data corresponding to the stem $\Tblue{3}$ are $\Tblue{35}$, $\Tblue{38}$, $\Tblue{39}$. The leaves are therefore $\Tgreen{5}$, $\Tgreen{8}$, $\Tgreen{9}$.
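A minimal sketch of this construction, using the weights listed above: each weight is split into its tens digit (the stem) and its units digit (the leaf).

```python
from collections import defaultdict

# Weights of the 20 friends, in kg (the raw data above).
weights = [50, 55, 52, 52, 42, 35, 40, 65, 47, 61,
           45, 48, 52, 39, 51, 55, 41, 63, 68, 38]

# Sort first so that the leaves of each stem come out in increasing order.
stems = defaultdict(list)
for weight in sorted(weights):
    stem, leaf = divmod(weight, 10)  # tens digit, units digit
    stems[stem].append(leaf)

for stem in sorted(stems):
    print(stem, "|", " ".join(str(leaf) for leaf in stems[stem]))
```

Repeated values keep one leaf each, so the three friends who weigh 52 kg appear as three leaves 2 on the stem 5. The length of each row therefore plays the same role as a bar in a histogram.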

A stem-and-leaf diagram: the weight of my friends.

Stem-and-leaf diagrams are still used for timetables.

A train timetable in a stem-and-leaf format.
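A timetable in this format can be sketched by taking the hours as stems and the minutes as leaves. The departure times below are invented for the example and do not come from a real timetable.

```python
# Hypothetical departure times (HH:MM), made up for this illustration.
departures = ["08:20", "07:05", "09:15", "08:05", "07:35", "08:50"]

# Hours are the stems, minutes are the leaves.
timetable = {}
for departure in sorted(departures):
    hour, minute = departure.split(":")
    timetable.setdefault(hour, []).append(minute)

for hour, minutes in timetable.items():
    print(hour, "|", " ".join(minutes))
```

Each row lists every departure within one hour, which is why this layout is compact for frequent services.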