DOC: Independence test on closed questions in AskiaAnalyse
|Summary||This article describes how the independence test on closed questions is calculated in AskiaAnalyse.|
|Written for||Data processor|
|Keywords||independence; test; closed; analyse; askiaanalyse|
Documentation note : merge with http://analysishelp.askia.com/significance-2_analyse
Suppose that we want to compare the responses to the same question from two different surveys (or from two sub-populations). To do so, we run a goodness-of-fit test using the Khi2 (chi-square) test.
Principle of the Khi2 test on a closed question
We start from the hypothesis that there is no significant difference between the observed and theoretical frequencies. We then calculate the value of the Khi2: for each response item, the squared difference between the observed count and the theoretical count is divided by the theoretical count, and these terms are summed over all response items.
We then look up this value in the Khi2 tables, using the number of degrees of freedom (the number of response items minus 1), to obtain the result.
$$\mathrm{Khi2} = \sum_i \frac{(E_{oi} - E_{ti})^2}{E_{ti}}$$
. . . where the $E_{oi}$ are the observed counts and the $E_{ti}$ are the theoretical counts.
|Sex|Male|Female|
|Observed counts|169|161|
|Theoretical counts|165|165|
$$\mathrm{Khi2} = \frac{(169 - 165)^2}{165} + \frac{(161 - 165)^2}{165} = \frac{32}{165} \approx 0.1939$$
Reading the Khi2 tables with 1 degree of freedom, we obtain a probability of about 66% of making a mistake when rejecting the hypothesis. This risk is far too high to reject it, so we accept the hypothesis that the sample is representative of the population.
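As a cross-check of the arithmetic, here is a minimal Python sketch of the calculation on the counts from the table above. The function names are illustrative and are not part of AskiaAnalyse; the `math.erfc` shortcut for the probability is valid only for 1 degree of freedom.

```python
import math

def khi2_goodness_of_fit(observed, theoretical):
    """Sum of (observed - theoretical)^2 / theoretical over all response items."""
    return sum((o - t) ** 2 / t for o, t in zip(observed, theoretical))

def p_value_one_degree(khi2):
    """Upper-tail probability of the Khi2 distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(khi2 / 2))

observed = [169, 161]      # Male, Female
theoretical = [165, 165]

khi2 = khi2_goodness_of_fit(observed, theoretical)
p = p_value_one_degree(khi2)
print(f"Khi2 = {khi2:.4f}")                        # Khi2 = 0.1939
print(f"Risk of error when rejecting = {p:.0%}")
```

The printed risk is the probability read from the Khi2 table: the larger it is, the less reason there is to reject the hypothesis.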
Khi2 independence test
After running a cross-tab count, we may ask whether the two questions are dependent, that is, whether knowing the response to one question gives information about the response to the other.
For example, is there a significant difference between men and women in their preference for a kind of packaging? 62.5% of men preferred "packaging B" compared to 83.33% of women. This gap may seem significant, but what if the results had been 80% and 82%, or if we had surveyed only 10 people?
To establish whether or not there is a significant dependence between the two closed questions, we use the Khi2 test.
Before defining dependence, we first define independence.
Two questions are independent when having information about one of them yields no information about the other. This implies that every line profile is the same as the profile of the single counts. The same goes for the column profiles.
Three tests are available:
- Comparing cells (Khi2): classic test.
- Comparing columns: complementary columns (Column TOTAL - cell to compare).
- Comparing lines: complementary lines (Line TOTAL - cell to compare).
In the "Tools" menu, choose the desired method of calculation.
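To obtain the counts to independence, we calculate $\mathrm{Cntind}_{ij}$ for each cell, where:
$F_{i.}$: frequency of line $i$ (the line total divided by $N$)
$F_{.j}$: frequency of column $j$ (the column total divided by $N$)
$C_{ij}$: observed count in the cell at line $i$, column $j$
$N$: size of the population
We then have:
$$\mathrm{Cntind}_{ij} = F_{i.} \times F_{.j} \times N$$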
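A minimal Python sketch of this formula, assuming the cross-tab is held as a list of lines. The counts are hypothetical (the article gives only percentages); they were chosen so that 62.5% of men and 83.33% of women prefer packaging B. The function name is illustrative and is not part of AskiaAnalyse.

```python
def independence_counts(observed):
    """Return Cntind[i][j] = F_i. x F_.j x N for a table of observed counts."""
    n = sum(sum(row) for row in observed)
    line_freq = [sum(row) / n for row in observed]        # F_i.
    col_freq = [sum(col) / n for col in zip(*observed)]   # F_.j
    return [[fi * fj * n for fj in col_freq] for fi in line_freq]

# Hypothetical counts, for illustration only.
#            Male  Female
observed = [[15,    5],    # packaging A
            [25,   25]]    # packaging B

cntind = independence_counts(observed)
for row in cntind:
    print([round(c, 2) for c in row])
```

Each line and each column of the result keeps the same total as in the observed table; only the distribution inside the table changes.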
The Khi2 is calculated by summing, over all cells, the squared difference between the observed count and the independence count, divided by the independence count. If the observed table is identical to the independence table, the value is 0.
$$\mathrm{Khi2} = \sum_{i,j} \frac{(C_{ij} - \mathrm{Cntind}_{ij})^2}{\mathrm{Cntind}_{ij}}$$
Reading this value in the Khi2 table at the appropriate number of degrees of freedom gives the percentage risk of error when rejecting the independence hypothesis.
The number of degrees of freedom (NDL) is calculated as follows:
NDL = (number of response items in column - 1) × (number of response items in line - 1)
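Where no printed table is at hand, the lookup can be reproduced numerically. The sketch below is one possible way to do it, using the standard closed-form series for the upper tail of the Khi2 distribution with an integer number of degrees of freedom. The function names are illustrative, and nothing here is taken from AskiaAnalyse itself.

```python
import math

def degrees_of_freedom(n_lines, n_columns):
    """NDL = (column items - 1) x (line items - 1)."""
    return (n_columns - 1) * (n_lines - 1)

def khi2_p_value(khi2, ndl):
    """Risk of error when rejecting independence (upper-tail probability)."""
    if ndl < 1:
        raise ValueError("ndl must be a positive integer")
    if khi2 <= 0:
        return 1.0
    half = khi2 / 2
    if ndl % 2 == 0:
        # Even NDL: exp(-x/2) * sum of (x/2)^k / k! for k < NDL/2
        term, total = 1.0, 1.0
        for k in range(1, ndl // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd NDL: normal tail plus a finite correction series
    term, series = 1.0, 0.0
    for k in range(1, (ndl - 1) // 2 + 1):
        series += term
        term *= khi2 / (2 * k + 1)
    tail = math.erfc(math.sqrt(half))
    return tail + math.sqrt(2 * khi2 / math.pi) * math.exp(-half) * series

ndl = degrees_of_freedom(n_lines=2, n_columns=2)
print(ndl, round(khi2_p_value(3.84, ndl), 3))
```

A 2 x 2 table has 1 degree of freedom, and a Khi2 close to 3.84 corresponds to a risk of about 5%, which matches the usual printed tables.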
The contributions to the Khi2 are the individual terms of this sum: for each cell, the squared difference between the observed count and the independence count, divided by the independence count.
The "-" or "+" signs indicate whether the observed count is below or above the count to independence.
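The contributions and their signs can be sketched as follows. The counts are hypothetical and serve only as an illustration, the function name is not part of AskiaAnalyse, and the code assumes that no line or column of the table is empty.

```python
def signed_contributions(observed):
    """Return, for each cell, (sign, contribution to the Khi2)."""
    n = sum(sum(row) for row in observed)
    line_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    result = []
    for i, row in enumerate(observed):
        out_row = []
        for j, count in enumerate(row):
            # Same value as F_i. x F_.j x N
            cntind = line_totals[i] * col_totals[j] / n
            contribution = (count - cntind) ** 2 / cntind
            sign = "+" if count > cntind else "-" if count < cntind else ""
            out_row.append((sign, contribution))
        result.append(out_row)
    return result

# Hypothetical counts, for illustration only.
#            Male  Female
observed = [[15,    5],    # packaging A
            [25,   25]]    # packaging B

cells = signed_contributions(observed)
khi2 = sum(c for row in cells for _, c in row)
for row in cells:
    print([f"{sign}{c:.3f}" for sign, c in row])
print(f"Khi2 = {khi2:.4f}")
```

The Khi2 is simply the sum of the contributions, so the largest contributions point to the cells that depart most from independence.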
NB: The Khi2 test is recommended when you have a sample of at least 20 people and all your theoretical counts to independence are greater than 5.
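These two conditions can be checked before running the test. The following is a minimal sketch with an illustrative function name and hypothetical tables; it is not how AskiaAnalyse itself performs the check.

```python
def khi2_is_recommended(observed, min_sample=20, min_count=5):
    """True if the sample has at least min_sample people and every
    count to independence is strictly greater than min_count."""
    n = sum(sum(row) for row in observed)
    if n < min_sample:
        return False
    line_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    return all(r * c / n > min_count for r in line_totals for c in col_totals)

# Hypothetical tables, for illustration only.
print(khi2_is_recommended([[15, 5], [25, 25]]))   # large enough sample
print(khi2_is_recommended([[3, 2], [4, 1]]))      # only 10 people
print(khi2_is_recommended([[18, 2], [19, 1]]))    # one column is too sparse
```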
Reading and interpreting the tables:
When you make a cross-tab count, you can display the significance. The number of signs indicates the significance level:
- "+++" or "---": indicates a significance of 99%.
- "++" or "--": indicates a significance of 95%.
- "+" or "-": indicates a significance of 90%.
"+": indicates that the observed count is above the count to independence.
"-": indicates that the observed count is below the count to independence.