For the feature I was trying to implement in December, I needed to evaluate a batch of answers the subject had given earlier in the survey. Luckily, Qualtrics has an API that allows for response export! While the documentation has an example of a response export workflow, I found their per-format export pages more informative. Here's the CSV export documentation page. Still, I ran into some issues that merit documenting.

Requesting a single response? You can't

Since one of the embedded fields that Qualtrics creates is ResponseID, can't we just pass that and let our external service use it to grab the current participant's responses? Sadly, no. Qualtrics doesn't allow you to query at the level of a response, only at the level of a survey. (There is an optional lastResponseId parameter in the export query, but it only gets you the responses recorded after the one you specify, not that response itself. This could be useful if we were building a dataset incrementally, but in my case, I needed the data almost immediately.)

Instead, I assign the subject a unique ID early in the survey. This can be either pre-assigned or generated in the survey, perhaps with the random number generator web service I mention in part 1. I pass this ID to my web service, which uses it to pick out the right response.

But we can't select on any response-level variable, so to limit our queries we'll have to do some guessing. If we're sure there are no race conditions -- i.e. only one person ever takes the survey at a time -- we can use limit = 1 to get only the last response. Alternatively, if you know that the external service will be called immediately after the participant fills out the survey, you can set startDate to a few hours before the current time. (NB: the parameter value must be in ISO-8601 format.)
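To make the startDate idea concrete, here is a minimal Python sketch that computes a UTC timestamp a given number of hours in the past. The post's own code is Ruby. The function name and the injectable now argument are illustrative assumptions. Check the export documentation to confirm whether Qualtrics wants a trailing Z or an explicit +00:00 offset. The Z form used here matches what Ruby's Time#iso8601 produces for UTC times.

```python
from datetime import datetime, timedelta, timezone

def start_date_param(hours_back, now=None):
    """Return an ISO-8601 UTC timestamp `hours_back` hours before `now`.

    `now` should be timezone-aware; it defaults to the current UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    start = (now - timedelta(hours=hours_back)).astimezone(timezone.utc)
    # Format with a literal "Z" suffix, e.g. 2017-06-01T09:00:00Z
    return start.strftime("%Y-%m-%dT%H:%M:%SZ")

# Hypothetical usage: a window starting three hours before a fixed time
print(start_date_param(3, datetime(2017, 6, 1, 12, 0, tzinfo=timezone.utc)))
# → 2017-06-01T09:00:00Z
```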

The Nitty Gritty

Now, let's look at an example of the inquiry logic. In the abstract, there are three steps: get the response, unzip it, and load it into an appropriate data structure.

# Excerpt from a Sinatra helper function
response_zip = getResponseFromQualtrics()
response_string = unzip(response_zip)
csv_table = rawToTable(response_string)

Step 1: Get the data

Getting the data is a two-step process. First, I request a CSV file from Qualtrics and wait until it's ready. Second, I download it.

Instead of implementing the handshake myself, I took advantage of the qualtrics_api Ruby gem made by Yurui Zhang. (There's also sunkev's qualtrics gem, which I haven't tried.)

def getResponseFromQualtrics
  start_time = getStartTime(settings.prior_hours)

  QualtricsAPI.configure do |config|
    config.api_token = settings.token
  end

  survey = QualtricsAPI.surveys[settings.survey]
  export_service = survey.export_responses({start_date: start_time})
  export = export_service.start

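  # Poll every five seconds until Qualtrics reports the export file is ready.
  # Note that there is no upper bound on how long this loop waits.
  while not export.completed?
    sleep(5)
    export.status # fetch the export's current status from the API
  end

  require 'open-uri'
  # URI.open works on Ruby 2.5+; Ruby 3.0 dropped URL support from Kernel#open
  return URI.open(export.file_url, "X-API-TOKEN" => settings.token).read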
end

def getStartTime(hours_offset)
  require 'time'
  start_time = Time.now.utc - (60 * 60 * hours_offset)
  return start_time.iso8601
end
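The polling loop above waits for as long as the export takes. That matters given the request timeouts described below. A bounded version might look like the following Python sketch.

Here check_status is a hypothetical zero-argument callable. It stands in for refreshing and inspecting the export (export.status and export.completed? in the gem). The sleep function is injectable so the loop can be exercised without actually waiting.

```python
import time

def wait_until_complete(check_status, timeout=25.0, interval=5.0, sleep=time.sleep):
    """Poll check_status() until it returns True; give up after `timeout` seconds.

    Elapsed time is counted in polling intervals rather than wall-clock time,
    which keeps the sketch simple and testable. Returns the seconds waited.
    """
    waited = 0.0
    while not check_status():
        if waited >= timeout:
            raise TimeoutError(f"export not ready after {waited:.0f} s")
        sleep(interval)
        waited += interval
    return waited
```

Raising instead of looping forever lets the endpoint return a "please try again" page before the hosting platform's own request timeout fires. The 25-second default is an arbitrary choice meant to stay under a 30-second limit.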

(These are Sinatra helpers. settings is a Sinatra-wide global that reads in secrets specified in the environment, along with various other configuration.) The dotenv gem is excellent for secret storage in development; as for production, here's how to set secrets on Heroku.

Steps 2 & 3: Unzip and convert

unzip is just rubyzip; no magic there. There is a bit of a trick to getting a compressed stream to a CSV with headers, though. That's because some of the Ruby CSV methods can only deal with files, not streams.

def rawToTable(response_string)
  require 'csv'
  response_csv = CSV.new(response_string, headers: true)
  response_csv = response_csv.read
  response_csv.delete_if do |row|
    # Remove the row with descriptions & internal IDs
    /^R_/ !~ row['ResponseID'] 
  end
  return response_csv
end
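For comparison, here is the same unzip-and-parse trick in Python. The io.BytesIO and io.StringIO wrappers hold in-memory data, so file-oriented modules (zipfile, csv) can read it without touching disk. This sketch assumes the downloaded archive contains a single CSV export, and the function names are illustrative.

```python
import csv
import io
import zipfile

def unzip_single_csv(zip_bytes):
    """Return the text of the first .csv member of an in-memory zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError("archive contains no CSV file")
        # "utf-8-sig" also strips a byte-order mark, should the export include one
        return archive.read(csv_names[0]).decode("utf-8-sig")

def csv_text_to_rows(text):
    """Parse CSV text whose first line is a header into a list of dicts."""
    # newline="" lets the csv module handle line endings inside quoted fields
    return list(csv.DictReader(io.StringIO(text, newline="")))
```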

And done!

After this, I select the row that contains the subject ID I had passed in the Qualtrics redirect, pick a choice and evaluate it, and visualize it with an assist from the wonderful animate.css library at an endpoint created by Sinatra and deployed to Heroku. Unlike Qualtrics features, all are well-documented elsewhere.
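Continuing the Python sketch, the next step mirrors rawToTable: filter out the description rows, then pick the subject's row. The column name subject_id is a placeholder for whatever embedded-data field holds your subject ID. The ResponseID column name follows the Ruby code above.

```python
def response_rows(rows):
    """Drop non-response rows: real responses have a ResponseID starting with 'R_'."""
    return [row for row in rows if (row.get("ResponseID") or "").startswith("R_")]

def find_subject_row(rows, subject_id, column="subject_id"):
    """Return the first row whose `column` equals subject_id, or None if absent."""
    for row in rows:
        if row.get(column) == subject_id:
            return row
    return None
```

Returning None rather than raising leaves the caller to decide what a missing subject means. For example, the response may not have reached the export yet.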

Approach 2: Avoid the API, pass the values

The API approach has a number of problems. For one, the Qualtrics API is a paid feature. Worse, API calls lag -- at least once, the call and processing took over 30 seconds and caused a request timeout. While I could re-write the interface so that the API call and processing are done by a background process that the front-end checks for periodically, it's a pain that might not be worth it.

The obvious alternative: instead of a subject identifier, pass the responses themselves in the URL, since the survey has them readily available as piped text. I write about this in part 1.

There are limits. Because Qualtrics uses GET for everything, you might have to keep your URI under 2,000 characters. Basically, don't try to transmit essay responses. (I was worried that Qualtrics itself might throw a fit if I told it to store a 56k-character URI, since a piped-text code is often longer than the response it stands for. I shouldn't have worried. Qualtrics managed even a 100k-character URI without a hiccup -- and that's way past the 2,000 characters that your browser and your server can handle. In other words, Qualtrics isn't going to be your constraint.)
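To check in advance whether typical answers fit under that limit, you could use a quick Python sketch like the one below. It URL-encodes a dict of sample answers and measures the result. The base URL, the parameter names and the 2,000-character limit are assumptions to adjust.

```python
from urllib.parse import urlencode

MAX_URL_LENGTH = 2000  # conservative browser/server limit; adjust as needed

def build_query_url(base_url, values, limit=MAX_URL_LENGTH):
    """Append URL-encoded `values` to base_url; refuse results longer than `limit`."""
    url = f"{base_url}?{urlencode(values)}"
    if len(url) > limit:
        raise ValueError(f"URL is {len(url)} characters long; limit is {limit}")
    return url
```

Measuring the encoded form matters because percent-encoding inflates spaces and non-ASCII text. For example, "é" becomes %C3%A9.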

As usual, the trade-off for speed is maintainability. You refer to many piped text variables instead of just one or two, so you will likely have to develop a pipeline to generate the URI. You might have named your questions for clearer data manipulation, but for the purposes of piped text, you'll have to replace them with the internal question IDs (QID#). And while you can maintain the order of values in one place, you have to explicitly plan for that.
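One way to keep that pipeline manageable is to generate the redirect URL from a single ordered list of (parameter name, question ID) pairs, as in this Python sketch. The QIDs and parameter names are hypothetical. ChoiceTextEntryValue is the piped-text selector for text-entry answers. Other question types use different selectors, so check the piped-text menu in the survey editor.

```python
def piped_text_url(base_url, fields, selector="ChoiceTextEntryValue"):
    """Build a redirect URL template with one piped-text code per field.

    fields: ordered (parameter name, QID) pairs; their order is the order of
    the parameters in the generated URL.
    """
    params = [f"{name}=${{q://{qid}/{selector}}}" for name, qid in fields]
    return base_url + "?" + "&".join(params)

# Hypothetical questions: age is QID3, mood is QID7
print(piped_text_url("https://example.org/eval", [("age", "QID3"), ("mood", "QID7")]))
# → https://example.org/eval?age=${q://QID3/ChoiceTextEntryValue}&mood=${q://QID7/ChoiceTextEntryValue}
```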

Bonus Approach: No API is best API

Finally, I should note that custom web services and APIs are an extra overhead. For simpler problems, there are at least two steps to attempt first.

1. Abusing Survey Flow

Basic Survey Flow building blocks are quite powerful, making many problems tractable with stock Qualtrics. To pick randomly from a bag of option sets, you can use Randomization to pick exactly one of n embedded data blocks underneath it. Branches, of course, offer basic if conditionals (although not else -- you'll have to take care to make their triggering conditions mutually exclusive).

2. JavaScript

You can do some things with Qualtrics JavaScript. (For instance, if you can get arbitrary piped text from JavaScript, that could make things easier.) You will need to weigh how much crucial logic you want to embed in JavaScript -- if you don't control the survey-taking environment, you cannot guarantee that the client has JS enabled, and you might have to take extra steps to either degrade functionality gracefully or detect its absence.

Other approaches?

It is very possible that other approaches exist; I didn't need them for my purposes. In one of my next articles, I hope to talk about what they might be.

Adventures with Qualtrics, part 1: Custom Web Services and Piped Text

2017-06-01 (updated 2017-06-04), Šimon Podhajský

To create a feature in a pilot study I was running in December, I took a dive into the Qualtrics API and custom web service building. In the process, I discovered a couple of workarounds and little-documented properties of both. The key to integrating them: piped text.

Piped Text: The Qualtrics Variable

With piped text, you can insert any embedded data and any answer your subject gave into (almost) any Qualtrics context.

If this doesn't excite you, it should.

Let me rephrase. Piped text references the content of variables you can set. It can do this in conditional validation, display logic and survey flow. (You can't make it into a GOTO, but that might be a good thing.) The documentation undersells this; this Qualtrics blog article does it a little more justice.

For my purposes, the most important insight goes unmentioned: you can use piped text to pass data to an external web service. That way, you can use data from an in-progress session as input for arbitrarily complex logic implemented in a programming language of your choice.

The approach

How does this work? First, you identify the shortcode for an answer or embedded field. Then, you insert it into the URL, like so:

http://your.service.URL/${e://Field/Identifier}/${q://QID1783/ChoiceTextEntryValue}

This will substitute the value of the embedded data field Identifier and the answer to question QID1783 in time for the redirect.
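On the receiving end, the service has to split those values back out of the path. Here is a minimal Python sketch. It assumes the values arrive as path segments in a fixed order, as in the URL above, and that any reserved characters in them were percent-encoded.

```python
from urllib.parse import unquote, urlsplit

def path_values(url):
    """Return the percent-decoded path segments of `url`, in order.

    Empty segments are kept (an unanswered question yields "//"), so each
    value stays at its expected position.
    """
    path = urlsplit(url).path
    if path.startswith("/"):
        path = path[1:]
    return [unquote(segment) for segment in path.split("/")]
```

Keeping empty segments matters: a naive filter that drops them would shift every later value by one position. Named query-string parameters avoid this positional fragility altogether.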

Qualtrics can call an external service in two ways.

End-of-survey redirect. Qualtrics simply passes the torch to your service, which wraps up the session for your participant.
Web Service step in Survey Flow. Your service will pass results back to Qualtrics, and they'll be available as embedded data for the following Qualtrics questions in that session. (With the "Fire and Forget" setting, this can be asynchronous.)

In the Web Service case, the external service then passes the results back to Qualtrics.

What's the pass-back format?

"Pass results back to Qualtrics" glides over a big issue: Qualtrics documentation does not provide a list of valid return formats. The documentation and the only Stack Overflow answer I could find both mention RSS as the only example of an acceptable format. The random number generator everyone uses for MTurk compensation, however, has a much simpler output: random=7. That's encouraging, but what if you want to pass multiple values back? The docs don't say.

I decided to test this out on a dummy web service I wrote in Sinatra. It turns out that Qualtrics will take data in JSON, XML, or a URI query string. (That's ?a=b&c=d -- I owe this insight to Andrew Long at the Behavioral Lab.) You can try this out for yourself -- just put down https://salty-meadow-86558.herokuapp.com/ as your Web Service in Qualtrics.
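As an illustration of those three formats, here is a Python sketch that renders one set of values as JSON, XML and a query string. The root element name "results" and the one-element-per-key XML layout are assumptions. The test above only shows that Qualtrics accepted these formats from the Sinatra service, not exactly which structure it requires.

```python
import json
from urllib.parse import urlencode
from xml.etree.ElementTree import Element, SubElement, tostring

def pass_back_bodies(values):
    """Render the same key/value pairs as JSON, XML and a query string."""
    as_json = json.dumps(values)
    root = Element("results")  # root element name is an arbitrary choice
    for key, value in values.items():
        SubElement(root, key).text = str(value)
    as_xml = tostring(root, encoding="unicode")
    # Same shape as the random=7 example, e.g. random=7&choice=b
    as_query = urlencode(values)
    return {"json": as_json, "xml": as_xml, "query": as_query}
```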

Pulling the API in

My project required passing more data to the custom web service than piped text could conveniently carry, which meant that I needed to tangle with the API. For that, see part two.

Executing nested rules with dragonfly

2017-05-10 (updated 2017-06-18), Šimon Podhajský

Rule nesting makes context-free grammars very powerful. It allows for brevity while preserving complexity — and dragonfly, the unofficial Python extension to Dragon Professional Individual, seems to promise that functionality with RuleRef, which "allows a rule to include (i.e. reference) another rule".

But using RuleRef is less obvious than it would appear. How do you actually refer to the rules? How do you execute the actions that are associated with the referenced rules? And how do you ensure that dragonfly does not complain about rule duplication if you do this multiple times?

I will proceed step-by-step, but if you want to jump ahead to the solution, you can read it on GitHub.

If you're unfamiliar with Dragonfly, do read this introduction to basic Dragonfly concepts in the Caster documentation.

Step 1: Include the rule with `RuleRef`

Let's start with a toy grammar. In this grammar, we will have two rules that are not exported: that is to say, you can't invoke them directly. We'll call them simply RuleA and RuleB. (I will refer to them as "subrules" from here on out.)

from dragonfly import IntegerRef, MappingRule, RuleRef, Text

# Rules proper
class RuleA(MappingRule):
    exported = False
    mapping = {
        "add <n>": Text('RuleA %(n)s'),
    }
    extras = [
        IntegerRef("n", 1, 10),
    ]

class RuleB(MappingRule):
    exported = False
    mapping = {
        "bun <n>": Text("RuleB %(n)s"),
    }
    extras = [
        IntegerRef("n", 1, 10),
    ]

We'll call the top-level rule RuleMain and include Rules A and B in the extras.

class RuleMain(MappingRule):
    name = "rule_main"
    exported = True
    mapping = {
        "boo <rule_b> and <rule_a>": Text("Rule matched: B and A!"),
        "fair <rule_a> and <rule_b>": Text("Rule matched: A and B!"),
    }
    extras = [
        RuleRef(rule = RuleA(), name = "rule_a"),
        RuleRef(rule = RuleB(), name = "rule_b")
    ]

The name argument of RuleRef takes care of the correspondence between the spec and the subrule. To get recognized, you do actually have to match the subrule's spec by saying e.g. "boo bun three add five".

This only carries out the Text("Rule matched: ...") action defined in RuleMain, though. To actually execute the subrules' actions, we'll need to add the Function action.

Step 2: Use (and mass-produce) `Function`

Dragonfly's Function allows arbitrary code execution. However, you can only pass in a function reference; Function then calls it with the right extras (seemingly) automagically. The Caster documentation gives a useful but incomplete example:

from dragonfly import Choice, Function, MappingRule

def my_fn(my_key):
  '''some custom logic here'''

class MyRule(MappingRule):
  mapping = {
    "press <my_key>":     Function(my_fn),
  }
  extras = [
    Choice("my_key", {
      "arch": "a",
      "brav": "b",
      "char": "c"
    })
  ]

When you say "press arch", my_fn gets called with the value of the my_key extra. But what if the mapping contained a reference to another rule in another extra? Would that also be passed to my_fn? It turns out that Function actually passes keyword arguments. If you name the argument to my_fn the same as the name of your extra, then my_fn will be called with the value of that extra. You're not limited to one extra, either: for example, if we added an extra called towel to MyRule.extras, then def my_fn(towel, my_key) would receive both.
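The keyword-argument matching can be imitated in plain Python with inspect. The following is a simplified stand-in written for illustration, not dragonfly's actual source. It passes a function only the extras whose names match its parameters. If the function accepts **kwargs, it passes everything, including the underscore-prefixed defaults.

```python
import inspect

def call_with_matching_extras(fn, extras):
    """Call fn with the subset of `extras` whose keys match its parameter names.

    If fn declares **kwargs, it receives every extra instead.
    """
    params = inspect.signature(fn).parameters.values()
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params):
        return fn(**extras)
    names = {p.name for p in params}
    return fn(**{key: value for key, value in extras.items() if key in names})

# Extras as a recognition might produce them (values are made up)
extras = {"my_key": "a", "towel": "blue", "_rule": None}
print(call_with_matching_extras(lambda towel, my_key: (towel, my_key), extras))
# → ('blue', 'a')
```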

(If you define my_fn with **kwargs, it will receive all extras in a dict, including the default _node, _rule, and _grammar. This does lose the order in which the subrules were invoked, so you can't just pass a general function that invokes all rules unless you're happy with them being invoked alphabetically / in an arbitrary order. That was my first approach:

def execute_rule(**kwargs): # NOTE: don't use
    defaultKeys = ['_grammar', '_rule', '_node']
    for propName, possibleAction in kwargs.iteritems():
        if propName in defaultKeys:
            continue
        if isinstance(possibleAction, ActionBase):
            possibleAction.execute()

In this case, Rule A will be executed before Rule B, no matter the optionality or the order of utterance, just because of the kwargs key order. I played around with exploring the default extras, but I haven't managed to figure out how to extract the order from the actual utterance to reorder the subrules automagically; that might require a deeper dive into Dragonfly than I'm ready for.)

You could write executeRuleA(rule_a) to run rule_a.execute(), then add Function(executeRuleA) to be executed alongside Text when the rule is matched. Unless you want to do different things for different rules, though, it is easiest to define a factory for functions that simply execute whatever extras you specify:

from dragonfly import Function, ActionBase

def _executeRecursive(executable):
    if isinstance(executable, ActionBase):
        executable.execute()
    elif hasattr(executable, '__iter__'):
        for item in executable:
            _executeRecursive(item)
    else:
        print "Neither executable nor a list: ", executable

def execute_rule(*rule_names):
    def _exec_function(**kwargs):
        for name in rule_names:
            executable = kwargs.get(name)
            _executeRecursive(executable)

    return Function(_exec_function)

This way, if you want to execute rule B before rule A, you can add execute_rule('rule_b', 'rule_a') to the action. Equivalently, you could use execute_rule('rule_b') + execute_rule('rule_a'). (Since the factory returns a Function, its output can be combined with other dragonfly Action elements using +.)
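Dragonfly itself needs a speech engine to run, so here is a self-contained Python sketch of the same ordering logic. A hypothetical FakeAction class stands in for dragonfly actions. The sketch makes one deliberate change from the code above: it recurses only into lists and tuples rather than anything with __iter__. Under Python 3, strings are iterable, and a one-character string yields itself, so the original check would recurse without end. An unspoken optional subrule (None) is skipped silently.

```python
class FakeAction:
    """Hypothetical stand-in for a dragonfly action: records when it runs."""

    def __init__(self, label, log):
        self.label = label
        self.log = log

    def execute(self):
        self.log.append(self.label)

def _execute_recursive(executable):
    if isinstance(executable, FakeAction):
        executable.execute()
    elif isinstance(executable, (list, tuple)):
        for item in executable:
            _execute_recursive(item)
    # Anything else, including None for an unspoken subrule, is ignored

def make_executor(*rule_names):
    """Return a callable that executes the named extras in the given order."""
    def _exec(**extras):
        for name in rule_names:
            _execute_recursive(extras.get(name))
    return _exec
```

Here make_executor returns a plain function. In dragonfly, you would wrap it in Function as execute_rule does.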

Step 3: Reusing subrule references in other rules

Let's say you want to reuse your subrules in another rule, like so:

# Note: This doesn't execute the sub-actions at all
class CompoundMain(CompoundRule):
    spec = "did (<rule_a1> and <rule_b1> | <rule_b1> and <rule_a1>)"
    exported = True
    extras = [
        RuleRef(rule = RuleB(), name = "rule_b1"),
        RuleRef(rule = RuleA(), name = "rule_a1"),
    ]

If you add this to your grammar, though, dragonfly will fail to load it with the following error:

GrammarError: Two rules with the same name 'RuleA' not allowed.

How did this happen? We even renamed the extras! It turns out that each subrule instantiated in RuleRef is registered as a separate rule. By default, each instance will assign name = SubRule.__name__. Consequently, you'll have to instantiate the subrules with unique names each time you re-use them. Fun fact: those names don't have to bear any relation to anything else.

    extras = [
        RuleRef(rule = RuleB(name = "Sweeney Todd"), name = "rule_b1"),
        RuleRef(rule = RuleA(name = "Les Miserables"), name = "rule_a1"),
    ]
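If inventing musical titles gets old, a small helper can hand out unique names instead. This is a hypothetical convenience, not part of dragonfly. It assumes only what the snippet above already shows: the rule constructor accepts a name argument.

```python
import itertools

_rule_counter = itertools.count(1)

def unique_rule_name(base):
    """Return a name such as 'RuleA_1' that this process has not handed out before."""
    return f"{base}_{next(_rule_counter)}"

# Hypothetical use inside extras:
#   RuleRef(rule=RuleA(name=unique_rule_name("RuleA")), name="rule_a1")
```

Deriving the name from the class keeps grammar errors readable, since the name still says which subrule it belongs to.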

There are many like it, but this one is mine

I'm sure this is not the only way to do it: one could override the _process_recognition method of RuleMain, or perhaps Caster, aenea, or dragonfluid implement equivalent nesting functionality in ways that I have overlooked. I would be very excited to learn about other approaches!

For now, I'm looking forward to applying this in my vim-grammar for dragonfly project. I'm hoping to write about the reasons why vim is excellent for voice programming later.

Introducing git to scientists who code

2017-02-15, Šimon Podhajský

Many scientists treat coding -- tasks, analysis, you name it -- as a necessary evil to get through on the way to the science. You might know the result from your own scientific practice: subtle changes to the code strewn across many folders, days spent getting into the mind of the postdoc author five years gone, and a general unwillingness to touch the code unless it's time to re-use it.

Before joining a research lab, I was lucky to have spent several years as a back-end programmer with the Yale Student Developers. (Best work-study job ever.) Consequently, working without proper version control and thorough documentation of each step now just feels icky.

(This isn't my personal quirk, by the way. "Use version control" is the fifth recommendation of both Aruliah et al.'s Best Practices for Scientific Computing (2012) and Wilson et al.'s Good Enough Practices for Scientific Computing (2016). So, at the very least, I can say that I share a squick with a number of published researchers.)

Since joining Yale Decision Neuroscience Lab, I've been inducing colleagues to use git. The presentation I gave yesterday is a very high-level overview of what git is good for.

(It doesn't hold a candle to Alice Bartlett's excellent Git for humans, which I heartily recommend, but it does use our lab's problems as illustrations.)

Where do we go from here?

Giving a tech presentation is only the first step in tech adoption. I lack a detailed roll-out plan, but here's what I'm doing now:

Since jumping straight to the command line might be too scary, I've been recommending GitKraken and GitHub Desktop.
Any code I touch lands in a remote repository rather than the lab file-share. If someone wants to use it, that's where they'll get it. If I'm asked to work with anyone's code, it will need to be on that remote. This is on the theory that necessity is the best incentive.
And, of course, I've made it clear that anyone who struggles with anything git-related can contact me at any time and I'll do my best to help.

We'll see how it goes.
