Accelerating Clinical Research and Drug Delivery with a Clinical Data Repository Search

UCB has piloted the CDR Search System, which can potentially make data filtering and retrieval faster and more in-depth.

A clinical data repository (CDR) is a collection of anonymized patient data gathered from various clinical data management systems and sources. For clinical trials, a CDR Search can be an initial phase for a broader CDR approach to virtually pool data across various clinical trials and phases and help generate insights for the purpose of trend identification and further analysis. It can provide information related to safety, and help in responding to questions posed by regulatory authorities. It can also guide the decision making of pharma drug developers.

CDRs can contain large amounts of data that come in various formats. According to Oliver Thoennessen, Sr. Manager IT Drug Development at the biopharma company, UCB, manually going through all of the data is extremely time-consuming. “Researchers collect much clinical data over the course of time, and there are different ways to collect data accumulated from clinical trials,” he explains.

UCB currently stores all clinical data on a file server. According to UCB Director of Clinical Repositories, Todd Culverwell, “The file server itself contains clinical data going back maybe 15 years or more.” The company stores files according to different products and their related studies. “We have over one million folders on the file server and more than 10 million documents. It gets really difficult and time-consuming to go through all the different trial folders. There can be more than 100 trials per product, so you can imagine that it’s really cumbersome. Researching manually through this vast amount of data across studies is basically impossible,” says Thoennessen.

A few researchers, however, are experts in certain areas of clinical data at their firms. When they need to know more about a specific aspect in that area, they know what to look for and where to look. “If you are really experienced in a certain trial, then you are, for sure, more knowledgeable about where you can find the data,” says Thoennessen. However, it becomes a problem when these experts are not available or leave the firm. “It’s much more difficult for other people to find the relevant data in a fast manner, especially when it comes to searches across trials,” he explains.

The CDR Search System

Thoennessen stresses the need to retrieve relevant data more quickly and easily. In response to that need, UCB has piloted the CDR Search System, which can potentially make data filtering and retrieval faster and more in-depth. “We’re here to help our users get the right information faster, respond to requests, and generate insights,” says Culverwell.

The pilot CDR Search currently indexes two different products or drugs. According to Culverwell, “Inside of those two product folders, we have 130 trials. The full implementation will probably cover 50 product folders, and I don’t even know how many trial folders we have for those. It’s probably a couple of hundred in there, so that would be millions of documents.”

Despite such a large amount of data, the CDR Search System has shown that it can retrieve relevant information quickly. Questions that might otherwise take several hours or days to answer, Thoennessen says, can be answered in an hour or even seconds using the search system, depending on the activity that needs to be carried out. “It’s especially interesting to have insight about the full scope of a product, which can get additional trials over the course of time. The CDR Search System will enable the cross-study search through all the existing information that we have,” he adds.

Data standardization for efficient retrieval

Although CDR Search can index any file format, UCB’s normal process is to standardize the clinical data using the SAS® programming language, which can manage and extract information from various sources and perform statistical analysis. Experienced programmers use SAS® to convert raw data sets into the CDISC Study Data Tabulation Model (SDTM) format and into Analysis Data Model (ADaM) data sets. “Files could be SAS® data sets, SAS® transport files, SAS® programs, logs, lists, XML files, or maybe some images depending on the kind of trial,” says Culverwell. SDTM and ADaM data sets provide a standard structure for tabulating study data and for generating listings and figures.

Culverwell notes that the search interface is fairly straightforward to use. The CDR Search System has sophisticated built-in facets. A user looking for SAS® data sets first decides whether to look at SDTM data or ADaM data. “And then by selecting SDTM, the facet would change, and then you’d actually be seeing all the different data domains that are related to SDTM,” explains Culverwell.

The facets around that data change based on what the user is looking for. “If you’re actually looking at SAS® data sets, it’s going to be applicable to the SAS® data set features that you’re looking for. If it’s a Word document you’re looking for, it’s going to ask you what criteria you want to look for in that particular Word document,” he explains further.
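The facet behavior Culverwell describes can be pictured with a minimal sketch. The article does not say how the CDR Search System implements facets, so everything here is an assumption for illustration: the facet names, the two lookup tables, and the `available_facets` function are all hypothetical.

```python
# Hypothetical facet catalogue: which filters to offer for each file type.
# None of these names come from UCB's system.
FACETS_BY_FILE_TYPE = {
    "sas_dataset": ["data_standard"],
    "word_document": ["title", "author", "modified_date"],
}

# Extra facets that appear only after a data standard has been chosen.
FACETS_BY_STANDARD = {
    "SDTM": ["domain"],    # for example AE, DM, LB
    "ADaM": ["dataset"],   # for example ADSL, ADAE
}


def available_facets(file_type, data_standard=None):
    """Return the facets to display for the user's current selection."""
    facets = list(FACETS_BY_FILE_TYPE.get(file_type, []))
    # Standard-specific facets only make sense for SAS data sets.
    if file_type == "sas_dataset" and data_standard is not None:
        facets += FACETS_BY_STANDARD.get(data_standard, [])
    return facets


print(available_facets("sas_dataset"))
print(available_facets("sas_dataset", "SDTM"))
print(available_facets("word_document"))
```

Selecting SDTM adds a domain facet in this sketch, while a Word document search offers a different set of criteria altogether.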

Ingesting clinical SAS® data sets makes the CDR Search far more sophisticated than the typical search engine, which searches for information and gives back hits. “If you’re looking out on the market, enterprise search systems have connectors to many data, like Microsoft Excel files, Word files, or PDF files, but not SAS® data sets,” notes Thoennessen, “and the clinical development departments of Pharma companies usually have a large number of them.”

The CDR Search provides a more in-depth filter. “The benefit is that it can drill down to SAS® data sets,” Thoennessen says, highlighting this as the innovative feature of the CDR Search. According to Culverwell, “That is the key of this CDR Search Project. We are able to crack open the SAS® data sets and look inside at a row level, instead of actually treating it as a single file.” For Thoennessen, relevant information becomes more easily accessible than ever before: “You can execute all your clinical data searches together with finding Word documents and other documents all through the same facility and the same indexing.”
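The difference between treating a data set as a single file and indexing it at row level can be sketched as follows. This is an illustration under stated assumptions, not UCB's implementation: the document layout, the function names, and the file path are hypothetical, and the rows are made-up values held in plain dictionaries rather than read from a real SAS® file.

```python
def file_level_document(path, rows):
    """One search document per file: the contents stay opaque."""
    return {"path": path, "kind": "file", "row_count": len(rows)}


def row_level_documents(path, rows):
    """One search document per row, so each record becomes searchable."""
    for row_number, row in enumerate(rows):
        yield {
            "path": path,              # where the row came from
            "kind": "row",
            "row_number": row_number,
            "fields": dict(row),       # column name -> value
        }


# Made-up rows standing in for an adverse event data set.
ae_rows = [
    {"USUBJID": "S1-001", "AETERM": "headache"},
    {"USUBJID": "S1-002", "AETERM": "nausea"},
]

path = "productA/study1/ae.xpt"  # hypothetical location
print(file_level_document(path, ae_rows))
docs = list(row_level_documents(path, ae_rows))
print(docs[1]["fields"]["AETERM"])
```

A file-level index can only report that the file exists, whereas the row-level documents let a query match individual values inside it.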

Culverwell provides an example of the CDR Search in action, using the scenario of a researcher wishing to know how many times a particular adverse event occurred for a product. “We can say we’re looking at SAS® data sets at a row level. We want to look at adverse events and look for this particular term, whether a reported (raw) or medical dictionary-coded term. Then we can actually see a list of all the patients who have that term. The system will return a list of unique patient IDs on that as well. Really quickly, we can get a list of the patients who had that adverse event,” he explains. The results could show that only 10 patients reported such occurrences. However, if the search returns a much higher number, such as 3,000 occurrences, researchers may have identified a trend and have every reason to analyze it further. “It could quickly point us to the right direction and data trends,” adds Culverwell.
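A minimal sketch of that kind of query is shown below. It assumes the rows follow the SDTM adverse events (AE) domain, where `USUBJID` is the unique subject identifier, `AETERM` the reported term, and `AEDECOD` the dictionary-derived term. The function name, the exact-match rule, and the sample rows are assumptions for illustration and say nothing about how the CDR Search System matches terms.

```python
def patients_with_event(rows, term):
    """Return sorted unique subject IDs whose reported or coded term matches."""
    needle = term.casefold()
    matches = set()
    for row in rows:
        reported = row.get("AETERM", "").casefold()   # raw term as collected
        coded = row.get("AEDECOD", "").casefold()     # dictionary-coded term
        if needle in (reported, coded):
            matches.add(row["USUBJID"])
    return sorted(matches)


# Made-up rows pooled from two hypothetical studies of one product.
rows = [
    {"STUDYID": "STUDY1", "USUBJID": "STUDY1-001", "AETERM": "bad headache", "AEDECOD": "Headache"},
    {"STUDYID": "STUDY1", "USUBJID": "STUDY1-001", "AETERM": "headache", "AEDECOD": "Headache"},
    {"STUDYID": "STUDY2", "USUBJID": "STUDY2-007", "AETERM": "Headache", "AEDECOD": "Headache"},
    {"STUDYID": "STUDY2", "USUBJID": "STUDY2-009", "AETERM": "nausea", "AEDECOD": "Nausea"},
]

patients = patients_with_event(rows, "headache")
print(len(patients), patients)
```

Collecting IDs in a set means a patient with several matching rows is counted once, which is what distinguishes a list of patients from a count of occurrences.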

Clinical trial design and patient outcomes

According to Thoennessen, it’s still too early to tell whether the CDR Search can influence how clinical trials are designed. The system does, however, facilitate the identification of trends, which can be further analyzed by qualified individuals. “If they actually find something in there, then they could take that information back to the protocol development phase and influence the patient populations towards those that we need to focus on,” he surmises. He adds that identifying trends can provide better insights for researchers, triggering more clinical activities.

As an example, identified trends can provide insights for subsequent clinical trials and their related protocols. “If we did see a particular trend of adverse events, we would go back to the statisticians and the physicians and see what they suggest to potentially influence the next protocol. If there were that many side-effects, this would be something that UCB’s Patient Safety unit would want to be aware of as well,” says Culverwell.

“We believe that UCB will be able to find the correct data and folder location of the data quicker, and thus respond to requests quicker - internal requests and regulatory requests,” Culverwell adds. Without the assistance of the CDR Search, it could take up to one month to find the correct information to incorporate into a new analysis data pool, and additional weeks to pool and analyze that information. With the search system, however, the time needed to locate the data can be cut down to a few hours or less. “We would hopefully get the response back to the requestor faster where our patients can also benefit from it,” says Culverwell, explaining how a more efficient search system can contribute to positive patient outcomes.

Patient confidentiality

Accessing patient information might raise concerns regarding patient data confidentiality. However, data from clinical trials available to the trial sponsors do not contain identifiable information. “The data that is available in the current repository is protected by specific file and folder permissions, and these will still be respected by the search system,” says Thoennessen. Even though the data repository can only be accessed by authorized UCB members, the company applies strict permission and access controls to that data. “UCB actually has a data transparency initiative that utilizes a multi-sponsor environment where anonymized patient-level data and study level data that can be provided to researchers for approved research request,” adds Culverwell.
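One way a search layer could respect existing folder permissions is to filter hits against an access list before returning them. The sketch below is an assumption for illustration only: the article does not describe UCB's permission model, so the group names, the folder paths, the `visible_hits` function, and the rule that the nearest folder with an explicit entry decides access are all hypothetical.

```python
def visible_hits(hits, user_groups, folder_acl):
    """Keep only the hits whose folder the user is allowed to read."""

    def allowed(path):
        # Walk from the most specific folder up to the top level and
        # use the first folder that has an explicit access list.
        parts = path.split("/")[:-1]
        while parts:
            folder = "/".join(parts)
            if folder in folder_acl:
                return bool(folder_acl[folder] & user_groups)
            parts.pop()
        return False  # no entry found: deny by default

    return [hit for hit in hits if allowed(hit["path"])]


# Hypothetical access lists: folder -> groups allowed to read it.
folder_acl = {
    "productA": {"team_a"},
    "productA/study2": {"safety"},
}
hits = [
    {"path": "productA/study1/ae.xpt"},
    {"path": "productA/study2/ae.xpt"},
    {"path": "productB/study9/dm.xpt"},
]

print(visible_hits(hits, {"team_a"}, folder_acl))
```

Filtering at query time means a user never sees a hit for a file that the file server itself would refuse to open for them.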

Data’s big opportunity

Data management and analysis capabilities have grown dramatically with advancing technology. “We are seeing that managing the data in an electronic way with improved technologies has a drastic impact on the turnaround time and speed of clinical trials,” observes Thoennessen.

Beyond the improvements in speed and the reduction in transcription errors, Culverwell points out that the potential to detect trends can become greater with the right search and analytic tools. “As we move more and more into wireless devices, the amount of data from clinical trials is going to get larger. Pharma should focus on data collection,” he suggests, highlighting the use of smartphones in gathering clinical data and potentially detecting, predicting and preventing adverse events.

UCB’s pilot search system is only the first step. According to Thoennessen, their goal is to create a repository that can be used for deep analysis and data mining across clinical data standards. UCB is also looking to combine other data sources to be able to generate new insights and break out of the data silos. As for Culverwell, by expanding data sources and reaching a point of sharing with other repositories, he foresees developing a single holistic search environment that accelerates retrieval and analysis of relevant cross-study data from various sources. He believes this has the potential to contribute to bringing more appropriate drugs to the right patient populations.
