To prevent sensitive data from being leaked by users with access to SQL Server, we have implemented Data Encryption in 5.4.8.
To be as efficient as possible and completely independent of existing SQL encryption systems, which could require specific SQL versions, we have created our own encryption system.
- Encryption algorithm: 256-bit AES (symmetric key), in CBC block cipher mode with PKCS #5 padding.
- The encryption is randomized, so encrypting the same value twice will produce different outputs. This randomization is done using a random ‘initialization vector’ for each encryption. This 16-byte initialization vector is prepended to the encrypted value.
- Different keys will be used for anonymized data and non-anonymized data.
- Each survey will use its own encryption keys.
- The encryption keys will be generated on CCA, and stored in the CCA database with the survey. These encryption keys will themselves be encrypted using a 1024-bit RSA key stored on the CCA machine. This way, access to the SQL Server alone is not enough to read the encrypted data.
- Responses marked as ‘anonymized’ will always be encrypted.
- Text will be converted to UTF-8 before being encrypted. This way the format of the data upon decryption is always known.
- Responses with no data (open data = empty or closed data = -1) will not be encrypted. This makes it possible to find unanswered questions without decrypting all the data.
- The RSA key used to encrypt the other encryption keys on CCA is CRITICAL. If this key is lost, the encrypted data can never be accessed again. This key is stored on disk, at a location configurable in the CCA settings.
- When the data is exported to a QES, it will be decrypted.
- When adding a question to quota, its data will be decrypted (if that’s not the case already).
- When removing a question from quota, the data will be encrypted if necessary.
- When adding a question to quota while its data is encrypted, a warning will be shown indicating that the question will be decrypted.
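The storage rules above (UTF-8 conversion, padding, the 16-byte initialization vector prefix, and skipping empty responses) can be illustrated with a minimal Python sketch. This is not the product's code: all function names are hypothetical, and the AES-CBC step itself is deliberately left out because it would require a third-party library. Only the data layout around the cipher is shown.

```python
import os
from typing import Tuple

IV_SIZE = 16     # bytes: the initialization vector prefixed to each value
BLOCK_SIZE = 16  # bytes: AES block size


def pad(data: bytes) -> bytes:
    """Append n bytes of value n so the length is a multiple of BLOCK_SIZE."""
    n = BLOCK_SIZE - (len(data) % BLOCK_SIZE)
    return data + bytes([n]) * n


def unpad(data: bytes) -> bytes:
    """Remove the padding added by pad(), rejecting malformed input."""
    if not data:
        raise ValueError("empty input")
    n = data[-1]
    if not 1 <= n <= BLOCK_SIZE or data[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return data[:-n]


def text_to_padded_bytes(text: str) -> bytes:
    """Text is converted to UTF-8 first, then padded for the block cipher."""
    return pad(text.encode("utf-8"))


def new_iv() -> bytes:
    """A fresh random IV for every encryption makes the output randomized."""
    return os.urandom(IV_SIZE)


def pack(iv: bytes, ciphertext: bytes) -> bytes:
    """Stored value = 16-byte IV followed by the ciphertext."""
    if len(iv) != IV_SIZE:
        raise ValueError("IV must be 16 bytes")
    return iv + ciphertext


def unpack(stored: bytes) -> Tuple[bytes, bytes]:
    """Split a stored value back into (iv, ciphertext)."""
    return stored[:IV_SIZE], stored[IV_SIZE:]


def open_needs_encryption(open_data: str) -> bool:
    """Empty open data is stored as-is, so unanswered questions stay findable."""
    return open_data != ""


def closed_needs_encryption(closed_data: int) -> bool:
    """Closed data of -1 means 'no answer' and is not encrypted."""
    return closed_data != -1
```

Because the initialization vector travels with each value, decryption needs only the stored bytes and the survey key, and two identical answers do not produce identical stored values.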
The survey data will be modified as follows:
- A field will be added to indicate anonymized data:
  - Field name: AnonymisationType
  - Field type: int
  This makes it possible to support multiple types of anonymization in the future.
- A field will be added to contain encrypted data (instead of ClosedData/OpenData/NumericData):
  - Field name: EncryptedData
  - Field type: varbinary(max)
When upgrading, the AnonymisationType value will be set to 1 for all anonymized questions, and their data will be encrypted.
To allow external access to the data (e.g. by AskiaTools or AskiaAnalyse), it will be possible to retrieve the encryption keys depending on the user’s restrictions:
- If a user does NOT have the (new) restriction ‘allow direct access to survey data’, accessing these properties using the API won’t be possible.
- If a user can access anonymized data, they can retrieve both encryption keys.
- If a user can access the data, but not the anonymized data, they can retrieve only one encryption key (the one for non-anonymized data).
- If a user can’t access any data, they can’t retrieve any encryption keys.
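The rules above amount to a small decision table, sketched here in Python. The `UserRights` type, its field names, and the key labels are hypothetical and chosen for illustration only. The sketch also assumes that the single key returned to a user without access to anonymized data is the one for non-anonymized data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UserRights:
    # Hypothetical model of the restrictions described in the text.
    direct_data_access: bool      # 'allow direct access to survey data'
    can_access_data: bool         # may read non-anonymized data
    can_access_anonymized: bool   # may read anonymized data


def retrievable_keys(user: UserRights) -> List[str]:
    """Return the labels of the encryption keys this user may retrieve."""
    if not user.direct_data_access:
        # Without the restriction, nothing is available through the API.
        return []
    if user.can_access_anonymized:
        return ["data", "anonymized"]
    if user.can_access_data:
        # Assumption: the single key is the non-anonymized one.
        return ["data"]
    return []
```

For example, `retrievable_keys(UserRights(True, True, False))` yields only the key for non-anonymized data.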
ResultCode GetSurveyDatabaseSettings(string token, long surveyid, string rsapublickey, ref SurveyDatabaseSettings settings)
- The encryption keys will only be returned if an RSA public key is passed.
- The response also contains the connection string