DOC: Random Data Generation (RDG)
Summary | This article describes how to use the random data generation features added to AskiaTools up to version 5.3.3.5. From AskiaTools 5.3.3.5 onwards, random data generation can take into account any quota-related routings in your QES file. |
Applies to | AskiaTools |
Written for | Data Processors, Analysts, Scripters |
Keywords | RDG, dummy data, QuotaToDo, MaxQuotaToDo, AvailableQuota, IsQuotaFullFor, routing, quota, testing, split, database, connection, AskiField, parameter, script, Number of attempts per question, version used, unsuccessful attempts, save interview, screen out, random seed |
Documentation notes: merge with http://analysishelp.askia.com/generate_random_interviews and keep a link to this article.
Features up to 5.3.3.4
In AskiaTools, you can access the RDG interface in the Tools menu along the top. In versions up to 5.3.3.4 you’ll be presented with the below:
The Number of interviews is fairly self-explanatory.
Let’s look at Number of attempts per question.
We have the numeric question ‘num’, which during RDG will be given values between 1 and 100.
However, there is also this blocking message routing which will stop progression through the survey if responses to ‘num’ are not 96 or higher.
So the RDG algorithm will try 10 times to get past the blocking condition and then stop. In the QES file RDG_1 NOT SAVE after uas (31).qes I had not ticked save after unsuccessful attempts, so only 31 interviews were generated (the other 69 were still unsuccessful after 10 attempts at getting past ‘num’). In the other file, RDG_1 SAVE after uas (100).qes, the option was ticked: RDG still makes the attempts, but interviews that fail are allowed through with their last attempt, so all 100 are saved.
Both these files are attached if you'd like to take a closer look.
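To get a feel for why so few interviews get past ‘num’, here is a minimal Python sketch of the attempts logic described above. It is not AskiaTools code: it assumes values are drawn uniformly between 1 and 100 and that an interview passes as soon as one attempt is 96 or higher. The function names and the seed are hypothetical.

```python
import random

def passes_blocking_question(rng, attempts=10, low=1, high=100, threshold=96):
    """Return True if any attempt draws a value at or above the threshold."""
    for _ in range(attempts):
        if rng.randint(low, high) >= threshold:
            return True
    return False

def expected_pass_rate(attempts=10, low=1, high=100, threshold=96):
    """Chance of passing within the allowed attempts, assuming uniform draws."""
    p_single = (high - threshold + 1) / (high - low + 1)  # 5 values out of 100
    return 1 - (1 - p_single) ** attempts

def count_saved(n_interviews, save_after_unsuccessful, seed=1):
    """Count interviews kept, with or without saving after failed attempts."""
    rng = random.Random(seed)
    saved = 0
    for _ in range(n_interviews):
        if passes_blocking_question(rng) or save_after_unsuccessful:
            saved += 1
    return saved

print(round(expected_pass_rate(), 3))                   # 0.401
print(count_saved(100, save_after_unsuccessful=True))   # 100
print(count_saved(100, save_after_unsuccessful=False))  # varies with the seed
```

Under these assumptions roughly 40 in 100 interviews would get through on average, so a single run that saves 31 is within the range you might see.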
Versions – The version option can be really useful if your questionnaire has many opportunities for the respondent to screen out before the main body of the survey. Without it, you may find that thousands of RDG interviews result in only a handful of respondents entering the main body of the questionnaire.
In AskiaDesign you can create versions and set your routing to run only on specific versions.
Routing applied to 'Release' version only:
Then in the RDG interface you can run on either the Test or the Release version to see the difference – there are far fewer respondents in the main body of the questionnaire when you run on ‘Release’ version because this is the version which has the screener routings applied.
Save interview if screened out – this option determines whether screened-out interviews are kept in your file or removed from it.
E.g. routing that uses ‘Go without saving’ . . .
Seed – this is a number you should note down in case you want to repeat exactly the same RDG run later. Using the same seed makes the algorithm output the same distribution of data across all codes and variables. This is useful for testing the effect of a questionnaire change whilst keeping everything else the same.
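The idea behind the seed can be illustrated with a short Python sketch. This only shows the general principle of seeded random generation, not how AskiaTools implements it; the function name and the seed value 12345 are made up for the example.

```python
import random

def generate_answers(seed, n_interviews=5, n_codes=3):
    """Draw one closed-question answer per interview from a seeded generator."""
    rng = random.Random(seed)
    return [rng.randint(1, n_codes) for _ in range(n_interviews)]

first_run = generate_answers(seed=12345)
second_run = generate_answers(seed=12345)

# The same seed reproduces exactly the same sequence of answers.
print(first_run == second_run)  # True
```

A different seed will normally give a different sequence, which is why noting the seed down is the only way to reproduce a particular run.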
Features from 5.3.3.5 onwards (RDG using quota keywords)
Information on these quota routing keywords can be found in the AskiaDesign Assistant.
To demonstrate the new functionality I have created a small example file, LEAST_FILL_RDG, and I will run random data through it twice using this version of AskiaTools:
LEAST_FILL_RDG_0.qes – will have random data generated without the use of the new feature.
. . . and
LEAST_FILL_RDG_1.qes – will have random data generated with the use of the new feature.
We will review the data at the end to see the difference in behaviour.
There needs to be a task set up for this on Supervisor with quotas:
The QES file is set up as follows (QES files attached):
The variable q1 has three codes. The example routing is such that if the respondent selects one of the first two codes at q1, they go on to answer Chapter A. If they select code three, they go on to answer Chapter B. This is driven by the select variable highlighted above.
Without the quota routing being taken into account we would expect a natural fall out of roughly 67% answering Chapter A and 33% answering Chapter B.
However, we can force the quotas to be balanced in terms of their final number by adding the routing:
and
QuotaToDo("quota:"+1)>0
This means that when quota code A naturally reaches the target of 50 before quota B, then QuotaToDo("quota:"+1) = 0 and the whole condition switches from being true to false.
Therefore, from that point on, the select variable has a value of 2 set into it rather than 1 until the quotas are both reached at 50 / 50.
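The effect of the quota condition on the select variable can be sketched in Python as follows. This is a simplified model rather than Askia routing script: `quota_a_to_do` is a stand-in for QuotaToDo("quota:"+1), assumed here to return the number of interviews still needed in quota A, and the answers to q1 are supplied as a fixed list so that the result is repeatable.

```python
def route_interviews(q1_answers, target_a=50, use_quota_routing=True):
    """Count how many interviews end up in Chapter A and Chapter B."""
    achieved = {"A": 0, "B": 0}
    for q1 in q1_answers:
        # Stand-in for QuotaToDo("quota:"+1): interviews still needed in quota A.
        quota_a_to_do = max(target_a - achieved["A"], 0)
        wants_a = q1 in (1, 2)  # codes 1 and 2 lead to Chapter A
        if use_quota_routing:
            goes_to_a = wants_a and quota_a_to_do > 0
        else:
            goes_to_a = wants_a
        achieved["A" if goes_to_a else "B"] += 1
    return achieved

# 100 answers spread evenly across the three codes of q1.
answers = [1, 2, 3] * 33 + [1]

print(route_interviews(answers, use_quota_routing=False))  # {'A': 67, 'B': 33}
print(route_interviews(answers, use_quota_routing=True))   # {'A': 50, 'B': 50}
```

Without the quota condition the split follows the natural fallout of q1; with it, Chapter A stops at 50 and every later interview is sent to Chapter B.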
You’ll notice that you need to pass a string for the parameter “Connection to AskiaField database”.
Provider=sqloledb; Data Source=exampleServerAddress; Initial Catalog=exampleDataBase;Trusted_Connection=no;User Id=exampleUserId;Password=examplePassword
In my case this is the Surveys database connection string from Supervisor with "Surveys" changed to "Cca".
You may need to consult with your IT department or server admin on this point.
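If you are adapting an existing connection string, a small helper like the one below can swap the database name for you. It is a sketch only: it assumes the database name is held in the Initial Catalog entry and that no value contains a semicolon. The function name and the server name in the example are hypothetical.

```python
def replace_catalog(connection_string, new_catalog):
    """Return the connection string with its Initial Catalog value replaced."""
    parts = []
    for item in connection_string.split(";"):
        if not item.strip():
            continue  # skip empty pieces, e.g. after a trailing semicolon
        key, _, value = item.partition("=")
        key, value = key.strip(), value.strip()
        if key.lower() == "initial catalog":
            value = new_catalog
        parts.append(f"{key}={value}")
    return ";".join(parts)

surveys = "Provider=sqloledb; Data Source=exampleServerAddress; Initial Catalog=Surveys;Trusted_Connection=no"
print(replace_catalog(surveys, "Cca"))
# Provider=sqloledb;Data Source=exampleServerAddress;Initial Catalog=Cca;Trusted_Connection=no
```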
Now that 100 interviews have been run in each QES file, let’s review the data:
Another point to note is that this functionality works with either target counts or target percentages in your quota definition.
You can also run the RDG into a new or legacy format field task and reload the task to access the achieved information in Monitor or Quota availability.