Investigating Slow Data Collection

If your data collection is slower than expected, you might be able to speed things up by investigating the issues below.

Issues that Slow Data Collection

  1. Setting Worker qualifications that are too restrictive
    The worker qualifications you select can greatly influence the speed of your data collection. For example, if you are gathering data from several hundred workers, you do not want to select the “Masters” qualification. Relatively few MTurk workers hold the Masters qualification, and not all workers who hold it will necessarily take your study. There is currently no evidence that Masters workers provide better-quality data than other workers.
    Worker qualifications can also stall data collection when researchers choose an invalid combination of worker reputation qualifications. For example, if a researcher accidentally set the number of previous HITs approved to 0-100 instead of 100+ and the Approval Rating to 0-90 instead of 90+, no workers would be eligible for the study. This is because workers with fewer than 100 HITs completed have a 100% approval rating by default, so workers in the 0-100 HIT range cannot also fall in the 0-90 approval range.
  2. Choosing too many demographic criteria
    Data collection can be slow when researchers add too many CloudResearch demographic criteria to their panel during study setup. Each demographic variable you add to your panel shrinks the pool of available workers, because eligible workers must have answered ALL of the demographic questions you select. For this reason, select only the most important criteria for your study.
    As an example of when additional demographic criteria can be problematic, consider a researcher who wants to sample native-born US citizens. If the researcher uses both the demographic criterion that asks people whether they were born in the US (Yes or No) and the criterion that asks whether they are US citizens (Yes or No), the two variables are likely to overlap heavily. Adding both restricts the number of workers available for the study while doing little to narrow the sample further. In this instance, it is often better to use one demographic question as a panel criterion and to ask about the other one in the study.
  3. Selecting MicroBatch over HyperBatch
    The difference between MicroBatch and HyperBatch in terms of data collection speed is often minimal. For longer studies, however, choosing MicroBatch over HyperBatch may slow data collection. Because of some technical details of how HITs restart, MicroBatch may be slower for studies that last longer than 30 minutes. If you want to speed up your data collection, try choosing HyperBatch, which gives the fastest possible data collection by making all HITs available at once.
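
To make the invalid-qualification example from issue 1 concrete, here is a minimal Python sketch. It is a toy model, not how MTurk or CloudResearch actually evaluates qualifications: the `Worker` class, the `is_eligible` function, and the sample pool are all hypothetical, and a single `hits` count stands in for both "HITs approved" and "HITs completed". The only rule taken from the text above is that workers with fewer than 100 HITs show a 100% approval rating by default.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    hits: int            # hypothetical: one count used for HITs approved/completed
    raw_rating: float    # hypothetical: true approval percentage, 0-100

    @property
    def approval_rating(self) -> float:
        # Rule from the text: under 100 HITs, the rating defaults to 100%.
        if self.hits < 100:
            return 100.0
        return self.raw_rating


def is_eligible(worker, hits_range, rating_range):
    """Return True if the worker falls inside both inclusive ranges."""
    min_hits, max_hits = hits_range
    min_rating, max_rating = rating_range
    return (min_hits <= worker.hits <= max_hits
            and min_rating <= worker.approval_rating <= max_rating)


# A made-up pool for illustration; the boundary case of exactly 100 HITs
# is deliberately left out of this toy example.
pool = [
    Worker(hits=20, raw_rating=70.0),
    Worker(hits=99, raw_rating=100.0),
    Worker(hits=500, raw_rating=98.0),
    Worker(hits=1200, raw_rating=92.5),
    Worker(hits=5000, raw_rating=85.0),
]

# Mistaken settings: HITs approved 0-100 and Approval Rating 0-90.
mistaken = [w for w in pool if is_eligible(w, (0, 100), (0, 90))]

# Intended settings: HITs approved 100+ and Approval Rating 90+.
intended = [w for w in pool if is_eligible(w, (100, float("inf")), (90, 100))]

print(len(mistaken), len(intended))
```

In this sketch the mistaken settings match nobody in the pool: the workers with few HITs are excluded by their default 100% rating, and the experienced workers are excluded by the HIT range. The intended settings match the experienced workers whose rating is 90% or higher.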