
Introduction to repeat question analysis

This document introduces repeat questions in Market Research data.

The questionnaire

For this introduction the questionnaire is very short to make it easy to check the tables produced.  The questionnaire relates to shopping habits and is administered to shoppers at the end of the day.  Shoppers are asked about their first two purchases and then about the first two drinks they consumed.

_____________________________________________________________________

NAME. Name                                                            |

-------------------------------------------------------------         |

______________________________________________________________________________

Q1A1. First purchase - Type of outlet visited                  |      |

                                                     Department |    1 |

                                               Specialist store |    2 |

______________________________________________________________________________

Q1B1. First purchase - Type of goods purchased                 |      |

                                                        Clothes |    1 |

                                                 Other products |    2 |

______________________________________________________________________________

Q1A2. Second purchase - Type of outlet visited                 |      |

                                                     Department |    1 |

                                               Specialist store |    2 |

______________________________________________________________________________

Q1B2. Second purchase - Type of goods purchased                |      |

                                                        Clothes |    1 |

                                                 Other products |    2 |

______________________________________________________________________________

SEX. Sex                                                       |      |

                                                           Male |    1 |

                                                         Female |    2 |

______________________________________________________________________________

Q2A1. First drink - Type of drink consumed                     |      |

                                                     Tea/coffee |    1 |

                                                    Other drink |    2 |

______________________________________________________________________________

Q2B1. First drink - Number of drinks consumed                         |

                                                           |__|        |

______________________________________________________________________________

Q2A2. Second drink - Type of drink consumed                    |      |

                                                     Tea/coffee |    1 |

                                                    Other drink |    2 |

______________________________________________________________________________

Q2B2. Second drink - Number of drinks consumed                        |

                                                           |__|        |

 

This questionnaire contains two sets of repeats – Q1 and Q2.  The gender question has been put between Q1 and Q2 to separate them – to make it clear that they are separate questions and not related to each other.

If there was no second purchase or no second drink these questions are left blank.

The data records

In order to keep the data simple and to facilitate the checking of tables against the original questionnaires, we are only going to have 4 data records, 2 men and 2 women.  We are assuming that these 4 people are representative of all shoppers.

Serial number 1

NAME          Mary

Q1A1     1    Department

Q1B1     1    Clothes

Q1A2     1    Department

Q1B2     2    Other products

SEX      2    Female

Q2A1     1    Tea/coffee

Q2B1          1

 

Serial number 2

NAME          Jill

Q1A1     2    Specialist store

Q1B1     1    Clothes

Q1A2     1    Department

Q1B2     2    Other products

SEX      2    Female

Q2A1     1    Tea/coffee

Q2B1          1

Q2A2     2    Other drink

Q2B2          2

 

Serial number 3

NAME          John

Q1A1     2    Specialist store

Q1B1     2    Other products

SEX      1    Male

Q2A1     2    Other drink

Q2B1          5

 

Serial number 4

NAME          Billy

Q1A1     2    Specialist store

Q1B1     1    Clothes

Q1A2     2    Specialist store

Q1B2     1    Clothes

SEX      1    Male

Q2A1     1    Tea/coffee

Q2B1          2

 

Mary is a busy mother who visited department stores to buy some clothes and some other products and had a quick cup of tea.

Jill bought designer clothes from a specialist store and some electrical goods from a department store.  She also had time for a morning coffee and later 2 other drinks.

John is a student and he bought a book from a specialist store and spent the rest of the day in the pub.

Billy visited specialist stores and bought two lots of clothes.  He stopped twice for a coffee.

From these 4 records we can see that there were –

3 purchases from department stores and 4 from specialist stores.

4 purchases of clothes and 3 purchases of other products.

4 tea/coffee and 7 other drinks consumed.
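The purchase tallies can be checked mechanically.  The following is a minimal Python sketch and not part of any tabulation package; the RECORDS layout and all names in it are assumptions made for illustration.  Each shopper's purchases are held as a list of (outlet, goods) pairs, so a blank second purchase is simply left out.

```python
from collections import Counter

# Hypothetical layout: (name, sex, [(outlet, goods) for each purchase made]).
# A blank second repeat is represented by leaving the pair out.
RECORDS = [
    ("Mary", "Female", [("Department", "Clothes"), ("Department", "Other products")]),
    ("Jill", "Female", [("Specialist store", "Clothes"), ("Department", "Other products")]),
    ("John", "Male", [("Specialist store", "Other products")]),
    ("Billy", "Male", [("Specialist store", "Clothes"), ("Specialist store", "Clothes")]),
]

# Count every purchase once, regardless of who made it.
outlets = Counter(outlet for _, _, purchases in RECORDS for outlet, _ in purchases)
goods = Counter(item for _, _, purchases in RECORDS for _, item in purchases)

print(dict(outlets))  # {'Department': 3, 'Specialist store': 4}
print(dict(goods))    # {'Clothes': 4, 'Other products': 3}
```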

Analysis of Q1A

We are now going to look at the analysis of Q1A Type of outlet visited.  There are two repeats of this question: Q1A1 and Q1A2.

There are two ways to analyse this question – we can base tables on individuals or on purchases.

Here is a table based on individuals with gender as the breakdown (banner):

Table 1

Q1A combined

Base: All respondents

                   Total   Male Female

Total                 4      2      2

Department            2      -      2

                      50%     -%   100% 

Specialist store      3      2      1

                      75%   100%    50%

 

This table has four respondents, two of each sex.  In our data men only bought from specialist stores.  Both women bought from department stores and one also bought from a specialist store.

The rows (Q1A combined) are made by creating a multi-coded variable (called VQ1A) and using [Block insert] to [Or together] Q1A1 and Q1A2.
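The effect of [Or together] can be sketched outside the tabulation software.  The Python below is an illustration only: the names Q1A, SEX and vq1a and the data layout are assumptions, and a set union stands in for the multi-coded variable.

```python
# Q1A1 and Q1A2 per respondent; None marks a blank second repeat.
Q1A = {
    "Mary": ("Department", "Department"),
    "Jill": ("Specialist store", "Department"),
    "John": ("Specialist store", None),
    "Billy": ("Specialist store", "Specialist store"),
}
SEX = {"Mary": "Female", "Jill": "Female", "John": "Male", "Billy": "Male"}

# "Or together": a respondent holds a code if either repeat holds it.
vq1a = {name: {code for code in repeats if code is not None}
        for name, repeats in Q1A.items()}

def count(code, sex=None):
    """Respondents holding `code`, optionally restricted to one sex."""
    return sum(1 for name, codes in vq1a.items()
               if code in codes and (sex is None or SEX[name] == sex))

for code in ("Department", "Specialist store"):
    print(code, count(code), count(code, "Male"), count(code, "Female"))
# Department 2 0 2
# Specialist store 3 2 1
```

Because the combined variable is a set, Mary is counted once under Department even though both of her purchases were made there.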

Here is the same table based on purchases:

Table 2

Type of outlet visited

Base: All purchases

                   Total   Male Female

Total                 7      3      4

Department            3      -      3

                      43%     -%    75%

Specialist store      4      3      1

                      57%   100%    25%

 

This table has the same rows and columns as table 1 but is now based on the 7 purchases.  In our data all three purchases by men were in specialist stores.  Women made 3 purchases in department stores and 1 purchase in a specialist store.

This table is actually two tables: the first has Q1A1 as the rows and the second has Q1A2 as the rows.  The second table is overlaid onto (added to) the first.  The first table also has the row title text changed, and filters have been applied to get the correct base text.

Because some people may not have a second repeat, the tables also need to be filtered on the question not being blank.  Without these filters the base would be 8.
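The overlay and its filter can be sketched as follows.  This is a hedged Python illustration, not the tabulation software's own mechanism; the ROWS layout and the overlay function are assumptions.  It adds the Q1A2 table on top of the Q1A1 table and shows how the base changes when blank repeats are not filtered out.

```python
from collections import Counter

# (sex, Q1A1, Q1A2); None marks a blank second repeat.
ROWS = [
    ("Female", "Department", "Department"),            # Mary
    ("Female", "Specialist store", "Department"),      # Jill
    ("Male", "Specialist store", None),                # John
    ("Male", "Specialist store", "Specialist store"),  # Billy
]

def overlay(filter_blanks=True):
    """Add the Q1A2 table on top of the Q1A1 table, cell by cell."""
    cells, base = Counter(), 0
    for repeat in (1, 2):
        for row in ROWS:
            sex, answer = row[0], row[repeat]
            if filter_blanks and answer is None:
                continue  # filter: this repeat was not answered
            base += 1
            if answer is not None:
                cells[(answer, "Total")] += 1
                cells[(answer, sex)] += 1
    return base, cells

base, cells = overlay()
print(base, cells[("Department", "Total")], cells[("Specialist store", "Total")])  # 7 3 4
print(overlay(filter_blanks=False)[0])  # 8
```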

So which is the correct table – table 1 or table 2?

The answer is that they are both correct.  It all depends whether you are more interested in people or purchases.

Analysis of Q1A and Q1B

We are now going to produce a table of Q1A by Q1B.  These are two related repeats – the answer to the Q1B repeat is related to the answer to the relevant Q1A repeat.

We can now produce four tables:

Table 3

Q1A combined

by Q1B combined

Base: All respondents

                        Total  Clothes    Other

                                        products

Total                     4         3        3        

Department                2         2        2

                          50%       67%      67%

Specialist store          3         2        2

                          75%       67%      67%

 

Table 4

Type of outlet visited

by Type of goods purchased

Base: All purchases

                         Total  Clothes     Other

                                         products

Total                       7        4         3

Department                  3        1         2

                            43%      25%       67%

Specialist store            4        3         1

                            57%      75%       33%

 

Table 5

Type of outlet visited

by Q1B combined

Base: All purchases

                          Total  Clothes     Other

                                          products

Total                        7        6         5

Department                   3        3         3

                             43%      50%       60%

Specialist store             4        3         2

                             57%      50%       40%

 

Table 6

Q1A combined

by Type of goods purchased

Base: All purchases

                          Total  Clothes     Other

                                          products

Total                        7        4         3

Department                   4        2         2

                             57%      50%       67%

Specialist store             5        3         2

                             71%      75%       67%

 

These four tables (3, 4, 5 and 6) have the same rows and columns as each other.  The first is based on people and the other 3 are based on purchases.  The figures are different in all four tables but which one is correct?

Again, they are all correct, although some are more correct than others.

Table 3 is a valid table and shows shopper profiles.  This table is people based, and for shoppers who bought clothes and other products it shows where the shopper purchased things (not necessarily the clothes or other products).  It must be clearly understood that this table using combined (“Or together”) variables does not show where clothes and other products were purchased.

The table which would normally be produced would be table 4.  This is based on purchases and does show where clothes and other products were purchased.  This table is Q1A1 by Q1B1 with an overlay of Q1A2 by Q1B2.  Where two related repeated questions are tabulated it is normally correct to overlay each repeat on top of each other and base the table on the number of actual repeats.

Table 5 is valid but would normally be considered incorrect.  It is overlaid but uses the combined variable as the columns and shows the number of purchases made in the store types by those shoppers who purchased clothes and other products (but not necessarily in the store types shown).

Table 6 is valid but should probably be considered incorrect.  It is overlaid but uses the combined variable as the rows and shows the number of purchases of clothes and other products and what types of store the shopper bought goods in (but not necessarily clothes or other products).

To clarify the meanings of these tables we will look at the first cell in the body of the table – Clothes column and Department row.

Table 3.  This cell shows that of the 3 shoppers who bought clothes (Mary, Jill, Billy), 2 of them bought something in a department store (Mary, Jill).  Note that Jill did not buy any clothes in the department store – she bought Other products.

Table 4.  This cell shows that of the four clothes purchases only 1 was purchased in a department store.

Table 5.  This cell shows that of the 6 purchases made by shoppers who bought clothes (Mary 2, Jill 2, Billy 2), three of those purchases were in department stores.  Note that only 1 of these 3 purchases was actually clothes (Mary), the other two were Other products (Mary, Jill).

Table 6.  This cell shows that of the 4 clothes purchases (Mary 1, Jill 1, Billy 2), 2 of these were made by shoppers who bought something in a department store (Mary 1, Jill 1).  Note that Jill purchased her clothes in a specialist store, not the department store.
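The four readings of this one cell can be reproduced with a short sketch.  The Python below is an illustration under assumptions: the PURCHASES layout and the cell function are hypothetical, and "combined" is modelled as a test on the shopper while an overlaid variable is a test on the individual purchase.

```python
# Each shopper's purchases as (outlet, goods) pairs; blank repeats omitted.
PURCHASES = {
    "Mary": [("Department", "Clothes"), ("Department", "Other products")],
    "Jill": [("Specialist store", "Clothes"), ("Department", "Other products")],
    "John": [("Specialist store", "Other products")],
    "Billy": [("Specialist store", "Clothes"), ("Specialist store", "Clothes")],
}

def cell(outlet, goods, rows_combined, cols_combined):
    """Count one cell; 'combined' tests the shopper, otherwise the purchase."""
    n = 0
    for purchases in PURCHASES.values():
        any_outlet = any(o == outlet for o, _ in purchases)
        any_goods = any(g == goods for _, g in purchases)
        if rows_combined and cols_combined:
            n += any_outlet and any_goods      # one count per shopper
            continue
        for o, g in purchases:                 # one count per purchase
            row_ok = any_outlet if rows_combined else o == outlet
            col_ok = any_goods if cols_combined else g == goods
            n += row_ok and col_ok
    return n

args = ("Department", "Clothes")
print(cell(*args, True, True))    # table 3: 2
print(cell(*args, False, False))  # table 4: 1
print(cell(*args, False, True))   # table 5: 3
print(cell(*args, True, False))   # table 6: 2
```

Only the second call, where both rows and columns are tested on the purchase itself, answers the question of where clothes were actually bought.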

Summary so far

If a repeated question is by itself we can choose whether to:

  • Make a summary of the repeats and produce one table from it, as in table 1 above.
  • Overlay the repeats on top of each other, as in table 2 above.

 

For repeated questions where more than one related repeated question is involved, it is nearly always correct to use overlay tables.  The base then becomes not the number of questionnaires, but the number of repeats.

Always remember to filter tables on valid repeats only and to change the base description to the correct base.

When producing overlay tables on repeated questions the golden rule is to ensure that for each table the rows, columns, filters and weights are correct for the repeat being done.  For example, for the table of the second repeat, any definitions of variables and filters used on the table must be correct for the second repeat.

Analysis of Q2

We are now going to look at the analysis of Q2.

We can combine Q2A1 and Q2A2 as before, by creating a multi-coded variable with [Block insert] [Or together]:

Table 7

Q2A combined

Base: All respondents

                Total   Male Female

Total              4      2      2

Tea/coffee         3      1      2

                   75%    50%   100%

Other drink        2      1      1

                   50%    50%    50%

 

Table 7 uses a combined variable.

Alternatively we can overlay the tables:

Table 8

Q2A1. First drink - Type of drink consumed

Base: All drinking occasions

                Total   Male Female

Total              5      2      3

Tea/coffee         3      1      2

                   60%    50%    67%

Other drink        2      1      1

                   40%    50%    33%

 

Table 8 is an overlay.  Note the base description on table 8.

We can combine Q2B by adding together the drinks for one shopper.  To do this we use an integer variable and use an arithmetic definition to add drinks together $Q2B1+$Q2B2:

Table 9

Q2B combined

Base: All respondents

                Total   Male Female

Total              4      2      2

 

5 (5.0)            1      1      -

                   25%    50%     -%

3 (3.0)            1      -      1

                   25%     -%    50%

2 (2.0)            1      1      -

                   25%    50%     -%

1 (1.0)            1      -      1

                   25%     -%    50%

Mean score       2.8    3.5    2.0

 

Table 9 is a “list all rows” from the combined variable and we can see that all four people drank a different number of drinks in total.

Another way to present this table is to use the combined number of drinks as a quantity weight on the table:

Table 10

Base: All respondents

                Total  Male Female

Total             11     7      4

 

Table 10 has no rows but has been quantity weighted to show that the men drank 7 drinks whilst the ladies drank 4, making 11 drinks in all.
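Tables 9 and 10 both derive from the same per-respondent total.  The following Python sketch illustrates this; the names are assumptions, and treating a blank Q2B2 as zero when adding is also an assumption about how the arithmetic definition behaves.

```python
from collections import Counter

# Q2B1 and Q2B2 per respondent; None marks a blank second repeat.
Q2B = {"Mary": (1, None), "Jill": (1, 2), "John": (5, None), "Billy": (2, None)}
SEX = {"Mary": "Female", "Jill": "Female", "John": "Male", "Billy": "Male"}

# Combined variable: Q2B1 + Q2B2, with a blank counted as zero (an assumption).
q2b_total = {name: sum(n for n in repeats if n is not None)
             for name, repeats in Q2B.items()}

# Table 9: how many respondents gave each total, plus the mean score.
distribution = Counter(q2b_total.values())
mean_score = sum(q2b_total.values()) / len(q2b_total)

# Table 10: each respondent counts as their number of drinks, not as 1.
weighted = Counter()
for name, drinks in q2b_total.items():
    weighted["Total"] += drinks
    weighted[SEX[name]] += drinks

print(sorted(distribution.items(), reverse=True))  # [(5, 1), (3, 1), (2, 1), (1, 1)]
print(mean_score)                                  # 2.75
print(weighted["Total"], weighted["Male"], weighted["Female"])  # 11 7 4
```

The unrounded mean of 2.75 is shown as 2.8 in table 9.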

If we want to know which of these 11 drinks were tea/coffee or other drinks we need to tabulate Q2A by Q2B.  We can run a table using Q2A as the rows quantity weighted by Q2B:

Table 11

Q2A combined

Base: All respondents

                Total  Male Female

Total             11     7      4

Tea/coffee         6     2      4

                   55%   29%   100%

Other drink        8     5      3

                   73%   71%    75%

 

That was easy, but inspection of the total column shows that of the 11 drinks 6 were tea/coffee and 8 were other drinks, which adds up to 14.  This table is valid but confusing: it shows that 6 drinks were consumed by shoppers who drank tea/coffee (Mary, Jill, Billy) and 8 by shoppers who drank other drinks (Jill, John).  Note that Jill is in both rows because she drank both types of drink, and she has been counted as 3 drinks in both rows because she drank 3 drinks in total.

The problem with table 11 is that we are combining related repeats Q2A and Q2B in the table and should have used overlays.  Here is the overlaid equivalent using Q2A1 weighted by Q2B1 overlaid with Q2A2 weighted by Q2B2:

Table 12

Q2A1. First drink - Type of drink consumed

Base: All drinking occasions

                   Total  Male Female

Total                11     7      4

Tea/coffee            4     2      2

                      36%   29%    50%

Other drink           7     5      2

                      64%   71%    50%

 

This table looks much better and shows the drinks as expected.
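The difference between tables 11 and 12 comes down to which weight each row receives.  The Python sketch below is an illustration only, with an assumed DRINKS layout; it computes the total column of both tables from the same data.

```python
from collections import Counter

# Drinking occasions per shopper as (type, number of drinks); blanks omitted.
DRINKS = {
    "Mary": [("Tea/coffee", 1)],
    "Jill": [("Tea/coffee", 1), ("Other drink", 2)],
    "John": [("Other drink", 5)],
    "Billy": [("Tea/coffee", 2)],
}

# Table 11: combined Q2A rows weighted by combined Q2B.
# Every type a shopper drank receives that shopper's whole total.
table11 = Counter()
for occasions in DRINKS.values():
    total = sum(n for _, n in occasions)
    for kind in {kind for kind, _ in occasions}:
        table11[kind] += total

# Table 12: overlay, so each occasion's type receives only its own count.
table12 = Counter()
for occasions in DRINKS.values():
    for kind, n in occasions:
        table12[kind] += n

print(table11["Tea/coffee"], table11["Other drink"])  # 6 8
print(table12["Tea/coffee"], table12["Other drink"])  # 4 7
```

In the overlay the rows add up to the base of 11; in the combined version Jill's 3 drinks are counted in both rows.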

It would be possible to construct other tables with the same rows and columns using combined variables on overlay tables as we did earlier – these would be valid but not useful.

A final warning

It should be clear by now that it is very easy to produce valid, but misleading, tables from repeated questions and careful thought is needed when producing them.  Problems usually arise because tables are produced without proper thought as to what the base should be and when it is safe to use combined variables.

One easy mistake in our simple survey above would be to try to produce tables combining drinks with purchases without due care.  For example, if we wanted to produce table 12 with purchase types (Clothes, Other) as the break instead of Gender then we would have to use the combined Q1B as the columns.  We must not use Q1B1 on the first overlay and Q1B2 on the second overlay because Q1 and Q2 are not related.
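As a hedged illustration of this point, the Python sketch below overlays the drink repeats but uses the same combined purchase-type break on every overlay.  The SHOPPERS layout and names are assumptions, and the resulting figures are derived from the four records above rather than taken from a table in this document.

```python
from collections import Counter

# (goods types bought = combined Q1B, drinking occasions) per shopper.
SHOPPERS = {
    "Mary": ({"Clothes", "Other products"}, [("Tea/coffee", 1)]),
    "Jill": ({"Clothes", "Other products"}, [("Tea/coffee", 1), ("Other drink", 2)]),
    "John": ({"Other products"}, [("Other drink", 5)]),
    "Billy": ({"Clothes"}, [("Tea/coffee", 2)]),
}

cells = Counter()
for goods_bought, occasions in SHOPPERS.values():
    for kind, n in occasions:          # overlay over the drink repeats
        for goods in goods_bought:     # same combined break on every overlay
            cells[(kind, goods)] += n  # quantity weighted by number of drinks

print(cells[("Tea/coffee", "Clothes")], cells[("Other drink", "Clothes")])  # 4 2
print(cells[("Tea/coffee", "Other products")], cells[("Other drink", "Other products")])  # 2 7
```

The columns here describe the shopper (someone who bought clothes at some point), not a purchase tied to a particular drink.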

It is very important to state clearly on the tables what the base is.  The easiest way to achieve this is to define a filter with the correct text and apply it to the first overlay table.  It may be possible to attach the correct base text to a filter applied to the first repeat; in this case it will automatically be used on overlaid tables.

The end users of tables do not always understand why the base is not all questionnaires on overlaid tables and ask for the base to be “put right”.  Sometimes it may be necessary to suppress the total column in such tables (using format option NPTC) to avoid confusion.  It would never be correct to suppress the total row if the table contains vertical percentages (as in all the tables above) because quality standards state that we must show the base for percentages.

Sometimes the Total base is wrong because the overlaid tables for each repeat have not been filtered correctly – each overlay table must only include valid repeats where there is data for the repeat in question.

If a client cannot be persuaded to accept tables with repeats as the base then it may be necessary to do lots of accumulations separately to use on tables.   For example in our survey it would be possible (but lengthy) to accumulate the tea/coffee drinks into one variable for each respondent and in a separate variable the number of other drinks.  These variables can then be used on other unrelated questions to get tables based on particular types of drink.  If there were 100 different types of drink this could be very lengthy indeed.
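For our survey the accumulation might look like the following Python sketch.  It is an illustration only; the accumulate and by_sex helpers and the data layout are assumptions, not features of the tabulation software.

```python
# Drinking occasions per shopper as (type, number of drinks); blanks omitted.
DRINKS = {
    "Mary": [("Tea/coffee", 1)],
    "Jill": [("Tea/coffee", 1), ("Other drink", 2)],
    "John": [("Other drink", 5)],
    "Billy": [("Tea/coffee", 2)],
}
SEX = {"Mary": "Female", "Jill": "Female", "John": "Male", "Billy": "Male"}

def accumulate(kind):
    """One per-respondent variable: total drinks of the given type."""
    return {name: sum(n for k, n in occasions if k == kind)
            for name, occasions in DRINKS.items()}

tea_coffee = accumulate("Tea/coffee")
other_drink = accumulate("Other drink")

def by_sex(variable, sex):
    """Use the accumulated variable as a quantity weight within one sex."""
    return sum(n for name, n in variable.items() if SEX[name] == sex)

print(tea_coffee)   # {'Mary': 1, 'Jill': 1, 'John': 0, 'Billy': 2}
print(other_drink)  # {'Mary': 0, 'Jill': 2, 'John': 5, 'Billy': 0}
print(by_sex(tea_coffee, "Male"), by_sex(tea_coffee, "Female"))  # 2 2
```

One such variable is needed per drink type, which is why this approach becomes lengthy as the code list grows.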

Summary

If a set of questions appears 4 times on the questionnaire then every table for these questions will consist of four tables (1 table followed by 3 overlaid tables).  This can be stated as our second golden rule:

The number of overlays (including the first) is always the same as the number of repeats in the project.

Confusion sometimes occurs because each respondent only uses a limited number of the repeats.  For example in a product test there may be four repeated sets of questions but individual respondents will always answer only two of them.  Using the second golden rule above it is usually right to use 4 overlays for all tables even though 2 of them will not be used (will be empty) for each respondent.

Another common source of confusion is where respondent details and repeat details are mixed together in a breakdown.  In this case a new breakdown variable is needed for each repeat.  The respondent detail columns are the same in all the breakdown variables; the columns from the repeated information will vary depending on the repeat being processed.
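Using our shopping data, a mixed breakdown of Sex (respondent detail) and Type of goods purchased (repeat detail) could be sketched as below.  This Python is an illustration under assumptions: the ROWS layout is hypothetical, and the breakdown tuple stands in for a per-repeat breakdown variable.

```python
from collections import Counter

# (sex, [(Q1A1, Q1B1), (Q1A2, Q1B2)]); None marks a blank repeat.
ROWS = [
    ("Female", [("Department", "Clothes"), ("Department", "Other products")]),        # Mary
    ("Female", [("Specialist store", "Clothes"), ("Department", "Other products")]),  # Jill
    ("Male", [("Specialist store", "Other products"), None]),                         # John
    ("Male", [("Specialist store", "Clothes"), ("Specialist store", "Clothes")]),     # Billy
]

cells = Counter()
for repeat in range(2):                  # one overlay per repeat
    for sex, repeats in ROWS:
        if repeats[repeat] is None:
            continue                     # filter on valid repeats only
        outlet, goods = repeats[repeat]
        breakdown = (sex, goods)         # breakdown variable for this repeat
        for column in breakdown:
            cells[(outlet, column)] += 1

for outlet in ("Department", "Specialist store"):
    print(outlet, [cells[(outlet, c)]
                   for c in ("Male", "Female", "Clothes", "Other products")])
# Department [0, 3, 1, 2]
# Specialist store [3, 1, 3, 1]
```

The Sex columns agree with table 2 and the goods columns with table 4, because the sex part of the breakdown is identical on both overlays while the goods part is taken from the repeat being processed.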

Finally, to repeat earlier advice:

When producing overlay tables on repeated questions the golden rule is to ensure that the rows, columns, filters and weights are correct for the repeat being done.