RegExp in the Command Line for Question Selections
Summary |
When you import SAV (SPSS) files, for example, the loops and multiples are flat. We use existing functionality in AskiaTools to create multiples and loops. If there are many to create, we can write a macro and/or batch file to save the selections, but the initial, one-off process of working out and writing the selections to convert can take a long time. This article talks through how you can use RegExp in the command line to describe the variable naming pattern and have AskiaTools detect and create the loops and multiples. You can also use the RegExp functionality to easily create a question selection to include in or exclude from your SAV, delimited and multimedia exports from the command line. |
Applies to | AskiaTools |
Written for | Data Processor |
Keywords | command line; automate; task; tools; askiatools; commandline; create loop; createloop; create multiple; createmultiple; loops; multiple; SPSS; triple; triple-s; import; flat; structure; transform; convert; regular expression; regex; regexp |
You will need AskiaTools 5.4.8.0 (13/12/2017 onward) to make use of the functionalities to:
- Automatically detect and create loops and multiples.
- Automatically detect and include/exclude from exports.
This article gives examples of both of the functionalities listed above but focuses on the first (detecting and creating loops and multiples) for the full explanation.
The scenario
You imported your SAV or Triple-S file to create a QES - some manual work is required to create loops and multiples from the flattened file structure.
You can do this manually in AskiaTools using the options shown below. You can also record a macro before you start which saves the transformations you did and can be run later to repeat them.
The macro is simply an XML file so you can open this in a text editor and follow the pattern to create the transform routine for your whole QES file after only recording a few steps initially:
You can also use the command line to instruct the creation of loops and multiples:
So there are some options to save and quickly re-run your transformation routines.
The problem
You've imported the file for a very long survey and, while examining its structure, you realise there are 450 multiple response questions and 70 loops to create. An added complication is that not all the questions which require transforming are in consecutive order, e.g. in the below example all questions containing "_NNb_" need to be merged and all questions containing "_NNc_" need to be merged.
It's one of the more extreme examples but I have seen one or two cases like this.
The reaction
Start to record a macro and stop it after creating a few loops and multiples. Open the XML file in a text editor and try to use a combination of the full question list and Excel to create the lines of code for the macro as in screenshot A.
Perhaps take the command line approach to create semi-colon separated lists of questions to be transformed, again using the full question list and Excel/Word to chop up and reformat into the required arrangement.
These processes are fiddly and prone to error, especially when the questions to be merged are not in consecutive order. You also need to be fairly au fait with Excel formulas and have a few hours free to work out, write and check the routine. Of course, if a colleague needs to troubleshoot your work or pick up a similar task next time, they have to sift through sheets of Excel formulas and separate steps, which makes the routine difficult to understand.
The solution
Once you have worked out the naming patterns for the groups of questions to be transformed into multis and loops, you can write these patterns as a regular expression (RegExp). Examples 1 & 2 are simpler and can also be found on our Help Centre page for the command line in AskiaTools. Examples 3 & 4 are for slightly more involved scenarios.
Here are the rules worth remembering for these examples:
- . means match any character which is not part of a newline sequence. E.g. ab. matches abc and abz and ab_.
- * matches zero or more repetitions of the immediately preceding character or pattern, e.g. a*b matches b, ab and aaab.
- + matches one or more repetitions of the immediately preceding character or pattern, i.e. it must appear at least once (unlike * directly above), e.g. a+b matches ab and aaab but not b.
- () has many uses, but here the parentheses capture a sub-pattern, and the captured text is used to name the new child question (loop) or multiple question once the transformation is done.
- [] square brackets enclose a list and/or range of characters e.g. [a-zA-Z0-9_] means any single character that is alphanumeric or underscore.
- .* the dot-star pattern (or wildcard) is one of the most permissive because it matches zero or more occurrences of any character (except newline). E.g. abc.*123 matches abcAnything123 as well as abc123.
E.g. [0-9]* means any number of numeric digits (including none) and [a-z]* means any number of lowercase letters (including none).
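The rules above can be tried out before running anything against a QES file. The following is a minimal sketch using Python's re module rather than AskiaTools itself, so it assumes the two engines agree on this basic syntax and that a pattern has to match the whole name (hence re.fullmatch). The checks list is purely illustrative.

```python
import re

# Each entry: (pattern, candidate text, whether the whole text should match).
checks = [
    (r"ab.", "abc", True),                  # . matches any single character
    (r"ab.", "ab_", True),
    (r"a*b", "b", True),                    # * allows zero occurrences of "a"
    (r"a*b", "aaab", True),
    (r"a+b", "b", False),                   # + needs at least one "a"
    (r"a+b", "aaab", True),
    (r"[a-zA-Z0-9_]", "_", True),           # one alphanumeric or underscore
    (r"[0-9]*", "2017", True),              # any number of digits
    (r"abc.*123", "abcAnything123", True),  # .* matches any run of characters
    (r"abc.*123", "abc123", True),          # ... including an empty run
]

def matches(pattern, text):
    """Return True when the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

results = [matches(pattern, text) for pattern, text, _ in checks]
for (pattern, text, _), got in zip(checks, results):
    print(f"{pattern!r:18} {text!r:18} -> {got}")

# Parentheses capture the part of the name that identifies the group.
captured = re.fullmatch(r"(.+)_[0-9]+", "RB93_1").group(1)
print(captured)  # RB93
```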
There is a handy quick reference guide here for commonly used RegExp syntax.
Ex1.- Click here to download example
To create multiples from questions where the alphanumeric string before the '_' (underscore) marker is the same and the text after it is a number denoting the response order in the multiple:
Tools.exe "C:\my directory\QesImportedFromSAV" /createmultiple /noyes /regex:"(.+)_[0-9]+"
E.g. this will group:
- Questions RB93_1 through to RB93_20 into a multicode 'QRB93' with 20 codes.
- Questions SA1_1 through to SA1_100 into a multicode 'QSA1' with 100 codes.
- . . . and continue to create more multiple questions with any groups of questions that match the pattern specified.
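To preview which variables a pattern like this would bring together, the grouping can be imitated outside AskiaTools. This is a rough sketch in Python, not a description of how AskiaTools works internally: group_by_captures is a hypothetical helper, the variable list is made up to mirror the names above, and whole-name matching is an assumption.

```python
import re
from collections import defaultdict

def group_by_captures(pattern, names):
    """Group the names that fully match the pattern, keyed by captured text."""
    compiled = re.compile(pattern)
    groups = defaultdict(list)
    for name in names:
        match = compiled.fullmatch(name)
        if match:
            groups[match.groups()].append(name)
    return dict(groups)

# Hypothetical flat variable list: two sets of multiples plus a single question.
names = (
    [f"RB93_{i}" for i in range(1, 21)]
    + [f"SA1_{i}" for i in range(1, 101)]
    + ["AGE"]
)

groups = group_by_captures(r"(.+)_[0-9]+", names)
for key, members in groups.items():
    print(key[0], len(members))
# RB93 20
# SA1 100
```

AGE has no underscore followed by a number, so it does not match and is left out of every group.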
Ex2.- Click here to download example (same as above)
To create loops from questions where variable names end with "SLICE". Immediately before this there is a number to denote iteration order in the loop. Immediately before this there is some alphanumeric string for the variable names.
Tools.exe "C:\my directory\QesImportedFromSAV" /createloop /regex:"(.+)[0-9]+SLICE"
E.g. this will group:
- Questions D15B1SLICE through to D15B10SLICE into a 10 iteration loop, 'Loop_D15B1SLICE' with child question, 'QD15B'.
- Questions SA-1SLICE through to SA-50SLICE into a 50 iteration loop, 'Loop_SA-1SLICE' with child question, 'QSA-'.
- ... and continue to create more looped questions with any groups of questions that match the pattern specified.
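One thing worth checking when a number follows directly after the captured part, with no separator such as '_', is how greedy the capture is. The sketch below uses Python's re module and may not reflect how AskiaTools resolves the same pattern (the result described above is a single 10 iteration loop). In Python, a greedy (.+) leaves only the last digit for [0-9]+, so two-digit iterations get a different captured name. The non-greedy (.+?) variant is shown only for comparison in Python, and whether AskiaTools accepts that syntax is an assumption to test.

```python
import re

# Hypothetical variable names mirroring the first bullet above.
names = [f"D15B{i}SLICE" for i in range(1, 11)]

greedy = re.compile(r"(.+)[0-9]+SLICE")   # pattern as used in the command line
lazy = re.compile(r"(.+?)[0-9]+SLICE")    # non-greedy capture, for comparison

greedy_keys = sorted({greedy.fullmatch(n).group(1) for n in names})
lazy_keys = sorted({lazy.fullmatch(n).group(1) for n in names})

print(greedy_keys)  # ['D15B', 'D15B1']  (D15B10SLICE is captured as D15B1)
print(lazy_keys)    # ['D15B']
```

If a test run in AskiaTools produces more groups than expected, this is one place to look.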
Ex3.- Click here to download example
To create loops from questions where:
- Alphanumeric string before first '_' underscore marker is the same.
- Text after is a number for the brand and either the letter b or c to denote market type (which loop).
- Then the second '_' underscore marker follows this.
- Finally, the text after is a number to denote the product types for a brand.
Tools.exe "C:\my directory\QesImportedFromSAV" /createloop /regex:"q([0-9]*)_[0-9]*([a-z]*)_[0-9]+"
This creates two loops:
Q2b's Loop:
Q2c's Loop:
Ex4.- Click here to download example
You could easily amend this to create a loop for each market's product type. There are six product types (1, 2, 3, 4, 5, 91) and two markets (b, c) so we would end up with 12 loops (with 9 iterations, one for each brand). The command line to create loops like this is as follows:
Tools.exe "C:\my directory\QesImportedFromSAV" /createloop /regex:"q([0-9]*)_[0-9]*([a-z]*)_([0-9]+)"
Note here the only difference is we have put some parentheses around the last section ([0-9]+). These parentheses tell the command line to also use this part of the name as a differentiator when considering variable names to merge into loops.
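The effect of the extra parentheses can be checked by counting the distinct combinations of captured text, since every captured part acts as a differentiator. This is a minimal sketch in Python under stated assumptions: the variable names (q2_01b_01 and so on) are hypothetical, built from the nine brands, two markets and six product types described above, and count_groups is an illustrative helper rather than anything provided by AskiaTools.

```python
import re
from collections import defaultdict
from itertools import product

brands = range(1, 10)               # 9 brands
markets = ["b", "c"]                # 2 markets
product_types = [1, 2, 3, 4, 5, 91] # 6 product types

# Hypothetical flat names such as q2_01b_01, q2_09c_91.
names = [
    f"q2_{brand:02d}{market}_{ptype:02d}"
    for brand, market, ptype in product(brands, markets, product_types)
]

def count_groups(pattern, names):
    """Group fully matching names by the tuple of captured sub-patterns."""
    groups = defaultdict(list)
    for name in names:
        match = re.fullmatch(pattern, name)
        if match:
            groups[match.groups()].append(name)
    return groups

ex3 = count_groups(r"q([0-9]*)_[0-9]*([a-z]*)_[0-9]+", names)
ex4 = count_groups(r"q([0-9]*)_[0-9]*([a-z]*)_([0-9]+)", names)

print(len(ex3), sorted(len(v) for v in ex3.values()))  # 2 [54, 54]
print(len(ex4), set(len(v) for v in ex4.values()))     # 12 {9}
```

With two captures the names fall into one group per market; with the third capture they fall into one group per market and product type, each holding the nine brands.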
Ex5.- Click here to download example
Finally, the other mentioned use of this RegExp functionality is when you are exporting a set of questions from the command line. You can use:
- /regex - to give a pattern for the questions to include in your export.
- /excluderegex - to give a pattern for the questions to exclude from your export.
If we are exporting to SAV from the previous example QES file but we want to exclude all questions which end with _05 or _91 then we can write this in two ways:
Tools.exe "Ex5.qes" /exportsav /datafile:"Ex5b.sav" /regex:"q([0-9]*)_[0-9]*([a-z]*)_[0-4]+"
Tools.exe "Ex5.qes" /exportsav /datafile:"Ex5a.sav" /excluderegex:"q([0-9]*)_[0-9]*([a-z]*)_(05|91)"
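A quick way to confirm that the include and exclude patterns select the same questions is to apply both to the list of variable names. The sketch below does this in Python with hypothetical names (q2_01b_01 and so on) and assumes a pattern must match the whole name; if an engine accepted partial matches, [0-4]+ would also match the leading 0 of _05, so this is worth verifying on a small export first.

```python
import re
from itertools import product

# Hypothetical names: 9 brands, markets b/c, product types 01-05 and 91.
names = [
    f"q2_{brand:02d}{market}_{ptype:02d}"
    for brand, market, ptype in product(range(1, 10), "bc", [1, 2, 3, 4, 5, 91])
]

include_pattern = re.compile(r"q([0-9]*)_[0-9]*([a-z]*)_[0-4]+")
exclude_pattern = re.compile(r"q([0-9]*)_[0-9]*([a-z]*)_(05|91)")

# Approach 1: keep only the names the include pattern matches.
kept_by_include = [n for n in names if include_pattern.fullmatch(n)]
# Approach 2: keep everything the exclude pattern does not match.
kept_by_exclude = [n for n in names if not exclude_pattern.fullmatch(n)]

print(len(kept_by_include), kept_by_include == kept_by_exclude)  # 72 True
```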