|Summary||This article describes how the independence test on closed questions is calculated in askiaanalyse.|
|Written for||Data processor|
|Keywords||independence; test; closed; analyse; askiaanalyse|
Suppose that we want to compare the responses to the same question from two different surveys (or from two sub-populations). To do so, we run a goodness-of-fit test using the Khi2 (chi-square) test.
Principle of the Khi2 test on a closed question
We start from the hypothesis that there is no significant difference between the observed and theoretical counts (frequencies). We then calculate the value of the Khi2: for each response item, we take the square of the difference between the observed count and the theoretical count, divided by the theoretical count, and we sum these terms over all response items.
We then look this value up in the Khi2 table for the relevant number of degrees of freedom (the number of response items minus 1).
$$\mathrm{Khi2} = \sum_i \frac{(Eo_i - Et_i)^2}{Et_i}$$
where the Eoi are the observed counts and the Eti are the theoretical counts.
|Sex|Male|Female|
|Observed counts|169|161|
|Theoretical counts|165|165|
$$\mathrm{Khi2} = \frac{(169 - 165)^2}{165} + \frac{(161 - 165)^2}{165} = \frac{32}{165} \approx 0.194$$
Reading the Khi2 table with 1 degree of freedom, we obtain a probability of about 66% of making a mistake when rejecting the hypothesis. We can therefore accept the hypothesis that the sample is representative of the population.
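The calculation above can be reproduced with a short script. This is only a sketch in Python using the standard library: the function names `khi2_statistic` and `khi2_p_value_1df` are illustrative and are not part of askiaanalyse, and the probability is computed from the chi-square distribution with 1 degree of freedom instead of being read from a printed table.

```python
import math

def khi2_statistic(observed, theoretical):
    """Sum of (observed - theoretical)^2 / theoretical over all response items."""
    return sum((o - t) ** 2 / t for o, t in zip(observed, theoretical))

def khi2_p_value_1df(khi2):
    """Upper-tail probability of the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(khi2 / 2))

observed = [169, 161]      # Male, Female (observed counts from the table)
theoretical = [165, 165]   # Male, Female (theoretical counts from the table)

khi2 = khi2_statistic(observed, theoretical)
ndl = len(observed) - 1    # number of response items minus 1
p_value = khi2_p_value_1df(khi2)

print(f"Khi2 = {khi2:.4f}, degrees of freedom = {ndl}, p-value = {p_value:.3f}")
```

The helper `khi2_p_value_1df` is only valid for two response items (1 degree of freedom); other cases need the general chi-square distribution.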
Khi2 independence test
After running a cross-tab count, we can ask whether the two questions are dependent, that is, whether knowing the response to one question gives information about the other.
For example, we can ask whether the preference for a kind of packaging differs significantly between men and women. 62.5% of men preferred "packaging B" compared to 83.33% of women. This gap may seem significant, but what would we conclude if the results had been 80% and 82%, or if we had surveyed only 10 people?
To establish whether or not there is a significant dependence between the two closed questions, we therefore use the Khi2 test.
Before defining dependence, we first define independence.
Two questions are independent when having information on one question does not yield any information on the other. This implies that each row (line) profile is equal to the profile of the single counts. The same goes for the column profiles.
Three tests are available:
- Comparing cells (Khi2): classic test
- Comparing columns: complementary columns (column TOTAL - cell to compare)
- Comparing lines (rows): complementary lines (line TOTAL - cell to compare)
In the "Tools" menu, choose the desired method of calculation.
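To obtain the counts under independence (the counts we would expect if the two questions were independent), we calculate Cntindij for each cell, using:
Fi. : frequency of line (row) i, that is, the line total divided by N
F.j : frequency of column j, that is, the column total divided by N
Cij : observed count in the cell at line i and column j
N : size of the population
We then have:
$$\mathrm{Cntind}_{ij} = F_{i.} \times F_{.j} \times N$$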
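As an illustration of this formula, here is a minimal Python sketch. The counts in `observed` are hypothetical: they were chosen only so that the percentages for "packaging B" match those quoted earlier, and the function name `independence_counts` is an illustrative choice, not an askiaanalyse function.

```python
def independence_counts(table):
    """Return Fi. * F.j * N for every cell of a cross-tab given as a list of rows."""
    n = sum(sum(row) for row in table)                  # N: size of the population
    row_freq = [sum(row) / n for row in table]          # Fi.
    col_freq = [sum(col) / n for col in zip(*table)]    # F.j
    return [[fi * fj * n for fj in col_freq] for fi in row_freq]

# Hypothetical counts. Rows: Men, Women. Columns: Packaging A, Packaging B.
# 25 / 40 = 62.5% of men and 25 / 30 = 83.33% of women prefer packaging B.
observed = [[15, 25],
            [5, 25]]

expected = independence_counts(observed)
for label, row in zip(["Men", "Women"], expected):
    print(label, [round(count, 2) for count in row])
```

Note that the counts under independence keep the same row and column totals as the observed table; only the distribution inside the table changes.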
The Khi2 is calculated by summing, over all cells, the squared difference between the observed count and the independence count, divided by the independence count. If the observed table and the independence table are the same, the value is 0.
$$\mathrm{Khi2} = \sum_{i,j} \frac{(C_{ij} - \mathrm{Cntind}_{ij})^2}{\mathrm{Cntind}_{ij}}$$
Looking this value up in the Khi2 table at the corresponding number of degrees of freedom gives the error percentage, that is, the probability of being wrong when rejecting the independence hypothesis.
The number of degrees of freedom (NDL) is calculated in the following way:
$$\mathrm{NDL} = (\text{number of response items in column} - 1) \times (\text{number of response items in line} - 1)$$
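The whole test can be sketched in Python as follows. This is an illustration under stated assumptions, not the askiaanalyse implementation: the table is hypothetical, the function names are invented for the example, and the probability is computed with the closed-form expression of the chi-square upper tail for a whole number of degrees of freedom instead of being read from a printed table.

```python
import math

def independence_counts(table):
    """Counts under independence: Fi. * F.j * N for every cell."""
    n = sum(sum(row) for row in table)
    row_freq = [sum(row) / n for row in table]
    col_freq = [sum(col) / n for col in zip(*table)]
    return [[fi * fj * n for fj in col_freq] for fi in row_freq]

def khi2_independence(table):
    """Return (Khi2, NDL) for a cross-tab given as a list of rows."""
    expected = independence_counts(table)
    khi2 = sum((c - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for c, e in zip(obs_row, exp_row))
    ndl = (len(table) - 1) * (len(table[0]) - 1)
    return khi2, ndl

def khi2_p_value(khi2, ndl):
    """Upper-tail probability of the chi-square distribution (integer ndl >= 1)."""
    half = khi2 / 2
    if ndl % 2 == 0:
        terms = ndl // 2
        return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(terms))
    m = (ndl - 1) // 2
    tail = math.erfc(math.sqrt(half))
    tail += math.exp(-half) * sum(half ** (k - 0.5) / math.gamma(k + 0.5)
                                  for k in range(1, m + 1))
    return tail

# Hypothetical counts. Rows: Men, Women. Columns: Packaging A, Packaging B.
observed = [[15, 25],
            [5, 25]]

khi2, ndl = khi2_independence(observed)
p_value = khi2_p_value(khi2, ndl)
print(f"Khi2 = {khi2:.4f}, NDL = {ndl}, error percentage = {100 * p_value:.1f}%")
```

The error percentage printed here is the risk taken when rejecting the independence hypothesis: the smaller it is, the stronger the evidence of a dependence between the two questions.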
The contribution of each cell to the Khi2 is the squared difference between its observed count and its independence count, divided by the independence count.
The "-" or "+" signs indicate whether the observed count is below or above the count under independence.
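A minimal Python sketch of these signed contributions, using a hypothetical table and an illustrative function name (`signed_contributions` is not an askiaanalyse function):

```python
def signed_contributions(table):
    """Return, for each cell, a (sign, contribution to the Khi2) pair."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    result = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed_count in enumerate(row):
            # Count under independence: equivalent to Fi. * F.j * N
            independence_count = row_totals[i] * col_totals[j] / n
            if observed_count > independence_count:
                sign = "+"
            elif observed_count < independence_count:
                sign = "-"
            else:
                sign = ""
            contribution = (observed_count - independence_count) ** 2 / independence_count
            out_row.append((sign, contribution))
        result.append(out_row)
    return result

# Hypothetical counts. Rows: Men, Women. Columns: Packaging A, Packaging B.
observed = [[15, 25],
            [5, 25]]

contributions = signed_contributions(observed)
for label, row in zip(["Men", "Women"], contributions):
    print(label, [f"{sign}{value:.3f}" for sign, value in row])
```

Summing the contributions of all cells gives back the Khi2 of the table, so the largest contributions point to the cells that deviate most from independence.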
NB: The Khi2 test is recommended when you have a sample of at least 20 people and all your theoretical counts under independence are greater than 5.
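This recommendation can be checked before running the test. The sketch below is an illustration in Python: the function name `khi2_is_recommended` and the two example tables are hypothetical, and the thresholds are simply those stated in the note.

```python
def khi2_is_recommended(table, min_sample=20, min_count=5):
    """True if the sample has at least min_sample people and every
    count under independence is strictly greater than min_count."""
    n = sum(sum(row) for row in table)
    if n < min_sample:
        return False
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return all(r * c / n > min_count for r in row_totals for c in col_totals)

large_table = [[15, 25], [5, 25]]   # hypothetical: 70 people
small_table = [[3, 2], [1, 4]]      # hypothetical: only 10 people

print(khi2_is_recommended(large_table))
print(khi2_is_recommended(small_table))
```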
Reading and interpreting the tables:
When you make a cross-tab count, you can display the significance. The number of signs indicates the level of significance:
- "+++" or "---": significance of 99%
- "++" or "--": significance of 95%
- "+" or "-": significance of 90%
A "+" sign indicates that the cell is significantly above the count under independence.
A "-" sign indicates that the cell is significantly below the count under independence.
For more details, see the article "Test Values in cross-tab counts (closed questions)".
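The reading rule for the signs can be summarised in a small Python sketch. How askiaanalyse computes the significance of each cell is not described in this article, so the function below takes it as an input; the function name `significance_marker` and the choice of inclusive thresholds are assumptions made for the illustration.

```python
def significance_marker(significance, above):
    """Map a significance level (between 0 and 1) and a direction to a marker.

    significance: for example 0.97 for a significance of 97%
    above: True if the cell is above the count under independence
    """
    symbol = "+" if above else "-"
    if significance >= 0.99:
        return symbol * 3
    if significance >= 0.95:
        return symbol * 2
    if significance >= 0.90:
        return symbol
    return ""   # not significant at 90%: no marker

print(significance_marker(0.995, above=True))
print(significance_marker(0.96, above=False))
print(significance_marker(0.92, above=True))
```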