Data Management

Data Collection


September 21, 2023

A list of resources produced by Bārbala Ostrovska in July 2023 is available at: What do we know about trial data collection?


Top 5 Data Collection Tips


You can get a collection of useful data collection resources by clicking here.


Trial teams should explicitly consider how long it will take to collect the data for each outcome and decide whether that time is worth it, given the importance of the outcome to the trial.


Trialists should ensure that the data they collect are limited to those essential to support the health and treatment decisions of the people the trial is designed to inform. Additional data may be considered wasteful in the context of limited public funding for clinical research.


Trialists must consult patients and healthcare professionals to identify the outcomes they will need to inform their future decisions about the usefulness of the intervention being tested.


To tackle the challenges of research with socially disadvantaged groups and increase their representation in health and medical research, researchers and research institutions need to acknowledge extended timeframes, plan for higher resourcing costs and operate through community partnerships.

More About Data Collection

The cost and work of data collection

Clinical trials incur additional costs when they collect non-essential data. According to the DataCat project, which categorized the types of data collected in clinical trials, primary outcome data accounted for only 11.2% of all collected data, secondary outcome data for 42.5%, and non-outcome data, such as identifiers and demographic information, for approximately 36.5% (mean proportions). This indicates that a substantial portion of the data collected may not relate directly to the trial's primary objective, leading to unnecessary expense.
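
One way to read the DataCat proportions is to express each category as a multiple of the primary outcome share. The sketch below only re-uses the three mean proportions quoted above; it assumes nothing about what the remaining share contains, and a ratio of mean proportions is not the same as the mean of per-trial ratios.

```python
# Mean proportions (%) of collected data by category, as quoted for DataCat.
DATACAT_MEAN_PROPORTIONS = {
    "primary outcome": 11.2,
    "secondary outcome": 42.5,
    "non-outcome": 36.5,
}


def items_per_primary_item(proportions):
    """Express each category's share as a multiple of the primary-outcome share."""
    primary = proportions["primary outcome"]
    return {category: share / primary for category, share in proportions.items()}


def unlisted_share(proportions):
    """Percentage not covered by the categories listed (content not stated here)."""
    return 100.0 - sum(proportions.values())


if __name__ == "__main__":
    for category, ratio in items_per_primary_item(DATACAT_MEAN_PROPORTIONS).items():
        print(f"{category}: {ratio:.2f}")
    print(f"outside these three categories: {unlisted_share(DATACAT_MEAN_PROPORTIONS):.1f}%")
```

On these figures, roughly 3.8 units of secondary outcome data and 3.3 units of non-outcome data are collected for every unit of primary outcome data.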

Consistent with these findings, another study evaluated the time spent on data collection in 120 trials. It reported a median of 56.1 hours for primary outcome data collection, while the median for secondary outcome data collection was more than three times higher, at 190.7 hours. This difference reinforces the concern that a disproportionate amount of time and resources is allocated to secondary data collection.
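
The "more than three times" comparison is a ratio of two medians (190.7 / 56.1 ≈ 3.4). The minimal sketch below computes that ratio and, using small hypothetical per-trial figures (illustrative only, not from the study), shows why it should not be read as the typical per-trial ratio.

```python
from statistics import median

# Medians (hours) reported for the 120-trial study.
PRIMARY_MEDIAN_HOURS = 56.1
SECONDARY_MEDIAN_HOURS = 190.7


def ratio_of_medians(primary_hours, secondary_hours):
    """Secondary-to-primary ratio computed from two summary medians."""
    return secondary_hours / primary_hours


def median_of_ratios(primary_by_trial, secondary_by_trial):
    """Median of the per-trial secondary-to-primary ratios."""
    return median(s / p for p, s in zip(primary_by_trial, secondary_by_trial))


print(round(ratio_of_medians(PRIMARY_MEDIAN_HOURS, SECONDARY_MEDIAN_HOURS), 2))  # 3.4

# Hypothetical per-trial hours (illustrative only, NOT study data).
primary = [10, 50, 100]
secondary = [20, 200, 150]
print(ratio_of_medians(median(primary), median(secondary)))  # 3.0
print(median_of_ratios(primary, secondary))  # 2.0
```

With per-trial data, the median of per-trial ratios is usually the more direct answer to "how much more time does a typical trial spend on secondary outcomes?".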

To optimize cost efficiency, it is crucial to collect only essential data and avoid irrelevant information that could inflate expenses. The COMET Initiative may be a helpful resource: it brings together people interested in the development and application of agreed, standardised sets of outcomes (core outcome sets), representing the minimum that should be measured and reported in all clinical trials of a specific condition.
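
A trial team could combine a core outcome set with per-outcome time estimates to see which planned outcomes fall outside the core set and how many hours they would cost. This is a minimal sketch under stated assumptions: the core outcome set, the outcome names and the minutes-per-assessment figures are all hypothetical, and the function is not a COMET tool.

```python
# Hypothetical core outcome set (illustrative, not taken from COMET).
CORE_OUTCOME_SET = {"pain", "physical function", "adverse events"}

# Hypothetical plan: outcome -> estimated minutes per participant per assessment.
PLANNED_OUTCOMES = {
    "pain": 5,
    "physical function": 15,
    "adverse events": 10,
    "sleep diary": 20,
    "health-care costs": 25,
}


def review_outcomes(planned, core_set, participants, assessments):
    """Split planned outcomes into core / non-core with estimated total hours."""
    report = {
        # Core outcomes that the plan does not measure at all.
        "missing_core": sorted(core_set - set(planned)),
        "core": {},
        "non_core": {},
    }
    for outcome, minutes in planned.items():
        hours = minutes * participants * assessments / 60
        group = "core" if outcome in core_set else "non_core"
        report[group][outcome] = hours
    return report


report = review_outcomes(PLANNED_OUTCOMES, CORE_OUTCOME_SET, participants=200, assessments=4)
print(report["missing_core"])  # []
print(round(sum(report["non_core"].values()), 1))  # 600.0
```

The non-core total is the figure a team would weigh against the importance of those outcomes when deciding whether to keep them.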

One effective way to reduce costs is to move from traditional paper-based data collection to internet-based electronic data collection. A study comparing the costs of the two methods reported average savings of around 54% with electronic data collection. The largest cost reductions came from lower monitoring and data management expenses.
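
A percentage saving like this is the cost difference divided by the paper-based cost. The sketch below applies that formula to hypothetical cost components (illustrative only, not the study's data) chosen so that the total saving comes out at 54%, with monitoring and data management showing the larger reductions.

```python
def percent_saving(paper_cost, electronic_cost):
    """Relative saving of electronic over paper data collection, in %."""
    if paper_cost <= 0:
        raise ValueError("paper_cost must be positive")
    return (paper_cost - electronic_cost) / paper_cost * 100


# Hypothetical per-trial cost components (illustrative only, NOT study data).
paper = {"monitoring": 40_000, "data management": 30_000, "other": 30_000}
electronic = {"monitoring": 14_000, "data management": 10_000, "other": 22_000}

total = percent_saving(sum(paper.values()), sum(electronic.values()))
print(f"total saving: {total:.0f}%")
for item in paper:
    print(f"{item}: {percent_saving(paper[item], electronic[item]):.0f}%")
```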

Moreover, a separate study implemented both approaches, electronic and paper case report forms, across 27 trials. The results indicated that electronic case report forms reduced the mean cost per patient by about 67% (95% CI = 24%, 142%). This finding further supports the view that electronic data collection is a practical and cost-efficient approach.

Focusing on collecting essential data and adopting electronic data collection methods can significantly contribute to cost savings in clinical trials. These measures not only reduce expenses but also enhance trial efficiency and data accuracy, ultimately leading to more successful and economical clinical research endeavours.


Increasing response rates

Research comparing response rates for data collection methods has shown that paper-based data collection tends to yield higher response rates compared to web-based collection.

  • A systematic review by Blumenberg and Barros (2018) assessed 19 studies and found that the response rate of web-based data collection was 9 percentage points lower (95% CI = -19.0, -6.8) than that of alternative methods. However, because of substantial heterogeneity among the studies, the researchers advised against interpreting this finding as conclusive meta-analytical evidence.
  • A cross-sectional comparative study by Ebert et al. (2018) assigned participants to two groups – paper or digital questionnaires. The response rate for web-based questionnaires was found to be 66% lower (95% CI = 7.40, 11.92) than for paper-based ones.
  • A meta-analysis by Shih and Fan (2008) examined 35 study results that compared the response rates of e-mail versus mail surveys. E-mail surveys, on average, had a response rate approximately 20% lower than traditional mail surveys.
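
These results are reported partly in percentage points and partly in percent, and the two are not interchangeable: a gap in percentage points is an absolute difference between rates, while a percentage is relative to the reference rate. The sketch below uses hypothetical response rates (paper 60%, web 51%, illustrative only) to show that the same 9-point gap is a 15% relative reduction.

```python
def difference_in_points(rate_new, rate_ref):
    """Absolute difference between two response rates, in percentage points."""
    return (rate_new - rate_ref) * 100


def relative_difference_pct(rate_new, rate_ref):
    """Difference expressed as a percentage of the reference rate."""
    return (rate_new - rate_ref) / rate_ref * 100


# Hypothetical response rates (illustrative only): paper 60%, web 51%.
paper_rate, web_rate = 0.60, 0.51
print(round(difference_in_points(web_rate, paper_rate), 1))     # -9.0
print(round(relative_difference_pct(web_rate, paper_rate), 1))  # -15.0
```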

However, there are promising alternatives within electronic data collection. For instance, a study exploring the use of SMS (text messaging) to collect data reported a response rate of up to 97.9%. The researchers particularly recommend text messaging for studies requiring frequent data collection and real-time assessment. Another study that used SMS to collect weekly symptom reports achieved a 100% completion rate, counting both responses submitted on time and those that were slightly delayed.
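
Whether late replies count changes the completion rate, so it helps to compute it both ways. This minimal sketch assumes a hypothetical log of expected weekly SMS reports, each marked "on_time", "late" or "missing"; the record structure and data are illustrative, not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class WeeklyReport:
    participant_id: str
    week: int
    status: str  # "on_time", "late" or "missing"


def completion_rate(reports, count_late=True):
    """Share of expected reports received; late replies optionally count."""
    if not reports:
        raise ValueError("no expected reports")
    accepted = {"on_time", "late"} if count_late else {"on_time"}
    received = sum(1 for r in reports if r.status in accepted)
    return received / len(reports)


# Hypothetical log of expected weekly SMS reports (illustrative only).
log = [
    WeeklyReport("P01", 1, "on_time"),
    WeeklyReport("P01", 2, "late"),
    WeeklyReport("P02", 1, "on_time"),
    WeeklyReport("P02", 2, "on_time"),
]
print(completion_rate(log))                    # 1.0
print(completion_rate(log, count_late=False))  # 0.75
```

Reporting both figures makes clear how much a "100% completion" result depends on accepting delayed responses.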

Paper-based questionnaires generally outperform web-based ones in terms of response rates. However, promising alternatives like SMS data collection have shown impressive results, especially for studies requiring frequent data collection and real-time assessment. Researchers are encouraged to consider the unique characteristics of their research projects to optimize data collection methods and enhance participant engagement and data quality.

The Trial Forge Retention pathway page gives a list of retention interventions from the Cochrane retention systematic review, many of which are directly applicable to enhancing data collection efforts.



Inclusivity is a vital aspect of clinical trials and data collection, and it requires the active involvement of patients and healthcare professionals in trial design. A study comparing trialists' choice of primary outcome with what patients and health professionals want found that, across the 44 trials sampled, the trial's primary outcome was ranked as the most important outcome by patients and health professionals only 28% of the time. Because the primary outcome is the most important piece of data collected in a trial, it is imperative to seek input from patients and healthcare professionals to ensure that the outcome measured is relevant and useful.

An additional study re-examined the same sample of trials to check whether any of them had involved patients and the public (Patient and Public Involvement, PPI) in selecting the outcomes collected, and found that none had. Emphasising PPI in outcome selection can help ensure that clinical trials address the needs of patients.

Furthermore, it is important to consider how data collection and measurement methods might inadvertently create barriers for individuals from socioeconomically disadvantaged groups. A literature review on this subject yielded a list of data collection improvements (see image below). In summary, the review underscored the significance of acknowledging extended timeframes, allocating higher resourcing costs, and fostering community partnerships to enhance the representation of socially disadvantaged groups in clinical trials.

To promote inclusivity, there are also valuable guidelines available for posing questions on sensitive topics:
