Real World Data and Health Outcomes

Nov 29, 2012 - Nov 30, 2012, Boston

Use new Data Sources, Models and Technology to measure Real World Health Outcomes

Bringing "Structure" to "Unstructured" Health Data

It is not practical for anyone – patients or pharma companies – to try to read every post someone has written about a particular condition or drug, so how can we bring structure to this data so we might extract constructive information?



Patients and caregivers are sharing oceans of information on the Web, turning to each other for support and information about treatments. They share everything from the decision factors in selecting a particular medication, to concerns they have, to the reasons they switched to a different drug. Patients share their thoughts and feelings online about their conditions and medications, at such a personal level that they may go even deeper than they would with their immediate family or physicians.

While people are sharing primarily for the audience of other patients like themselves, the information shared is of high value to pharmaceutical companies, since it can help them to understand their patients' motivations and concerns.

However, it is not practical for anyone, whether patient or pharma company, to read every post someone has written about a particular condition or drug. It makes more sense to aggregate the posts so that trends can be identified.

When trying to make sense of this online information collectively, one rapidly encounters two key problems:

  1. The information shared by patients on blogs and forums is completely "unstructured"
  2. Patients share their thoughts and feelings in their own words, not necessarily medical language, with all the infinite permutations that each individual brings to his or her own expressions

What does this mean?

"Unstructured" is a term which simply differentiates this type of content from "structured" content, in which text or numbers are neatly slotted into defined fields. Unstructured content is great for the people writing it, but makes it hard for an audience to make sense of it.

The second problem relates to the language patients use to share their experiences online. Patients are not medical professionals, and often do not use medical terms when they write.

The language used by patients writing about their conditions and medications may vary depending on the culture or origin of the person, and even variances as basic as typographical errors and abbreviations need to be taken into account.

To structure, analyze and aggregate the drug-related information that has been identified, it is crucial to translate it into commonly recognized medical terms.
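
As a rough illustration of this translation step, the Python sketch below maps a patient's wording to a standard term and tolerates simple typos. The lookup table, the function name `normalize_term` and the similarity cutoff are all assumptions made for the example; a real system would draw on a far larger vocabulary.

```python
import difflib

# Hypothetical lookup table: patient wording -> standard term (illustrative only).
PATIENT_TO_MEDICAL = {
    "multiple sclerosis": "Multiple Sclerosis",
    "benadryl": "Benadryl",
    "zyrtec": "Zyrtec",
    "weight gain": "Weight gain",
    "put on weight": "Weight gain",
    "gained weight": "Weight gain",
}

def normalize_term(raw, cutoff=0.8):
    """Return the standard term for a patient phrase, or None if nothing matches."""
    # Lowercase and collapse whitespace so trivial variants compare equal.
    key = " ".join(raw.lower().split())
    if key in PATIENT_TO_MEDICAL:
        return PATIENT_TO_MEDICAL[key]
    # Fall back to fuzzy matching to absorb simple typographical errors.
    close = difflib.get_close_matches(key, list(PATIENT_TO_MEDICAL), n=1, cutoff=cutoff)
    return PATIENT_TO_MEDICAL[close[0]] if close else None
```

With this table, a misspelling such as "Zyrtek" resolves to "Zyrtec", and "put on weight" resolves to the same standard term as "weight gain". Anything that resembles no entry returns `None` instead of a guess.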

Try an easy experiment: in the search engine of your choice, enter a search term like "MS". Results might range from Microsoft, to Ms. Magazine, to Morgan Stanley, with some entries about Multiple Sclerosis mixed in. Of course, if you know you are looking for information on Multiple Sclerosis, it is not too demanding to type out the full name of the condition, and you will then get only results related to the MS you are looking for. But a huge chunk of content will be missing from your search results: the patient-written posts that never use the phrase "Multiple Sclerosis" and instead write "MS" when discussing their experiences with the disease.

Think about it – when you are writing for yourself and your patient community, in the intimacy of your own home, are you really going to write out everything in full text? Especially terms that you know are shared and understood by the people you are writing for?

Context is also important; in a pregnancy forum, MS means something different altogether: Morning Sickness.

So as you can see, issues of ambiguity and context are key to understanding unstructured text.
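
One simple way to picture context-based disambiguation is to score each possible meaning of "MS" by the cue words that appear around it. The sketch below does only that; the cue-word lists and the function name `disambiguate_ms` are invented for illustration, and real tools rely on much richer context than word overlap.

```python
import re

# Hypothetical sense inventory: each meaning of "MS" with a few cue words.
MS_SENSES = {
    "Multiple Sclerosis": {"relapse", "lesions", "neurologist", "copaxone", "avonex", "mri"},
    "Morning Sickness": {"pregnant", "pregnancy", "trimester", "nausea", "baby"},
    "Microsoft": {"windows", "software", "office", "excel"},
}

def disambiguate_ms(text):
    """Pick the sense whose cue words overlap most with the text.

    Returns None when no cue word appears at all.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {sense: len(words & cues) for sense, cues in MS_SENSES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

A post mentioning "trimester" and "pregnancy" would be read as Morning Sickness, while one mentioning a "relapse" and a "neurologist" would be read as Multiple Sclerosis. A bare "MS" with no surrounding cues is left unresolved.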

Even more complex are the online discussions where patients consider switching from one drug to another. Here are some examples of switching discussions from online forums:

“I started with Avonex developed an allergy to interferon and moved to Copaxone.”

“Klonopin made me feel worse, I take Xanax for anxiety now.”

“Started out taking zyrtec, but have just switched over to regular benadryl every day.”

There are technological methods to draw meaning from unstructured content, the primary one being Natural Language Processing (NLP). NLP uses advanced algorithms to extract relevant concepts by searching for various anchors in the text, such as drug names (for example, Effexor), symptoms (for example, IBM spasms), side effects (for example, weight gain), drug usage experience (for example, have been taking), symptom experience (for example, have experienced) and so on.
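
To make the idea of anchors concrete, here is a minimal Python sketch that scans a post for phrases from small lexicons and reports what it finds. It is dictionary lookup only, not a full NLP pipeline, and the `ANCHORS` table and `find_anchors` function are hypothetical names built from the examples above.

```python
import re

# Hypothetical anchor lexicons, one short list per concept category.
ANCHORS = {
    "drug": ["effexor"],
    "side_effect": ["weight gain"],
    "drug_usage": ["have been taking"],
    "symptom_experience": ["have experienced"],
}

def find_anchors(text):
    """Return (category, matched text, start offset) tuples in order of appearance."""
    hits = []
    for category, phrases in ANCHORS.items():
        for phrase in phrases:
            # Word boundaries stop a phrase from matching inside a longer word.
            pattern = r"\b" + re.escape(phrase) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                hits.append((category, match.group(0), match.start()))
    return sorted(hits, key=lambda hit: hit[2])
```

Run on "I have been taking Effexor and have experienced weight gain.", this yields a drug-usage anchor, a drug name, a symptom-experience anchor and a side effect, in that order. Deciding how those anchors relate to one another is a separate step.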

With NLP, it is extremely important to resolve ambiguity. For example, the term 'Sonata' can refer to a car, a musical piece or a drug. The term 'hyper' can have several possible medical meanings: for example hyperactive behavior or hyperthyroidism. Understanding the context of the text is required to make an accurate determination about the word or phrase usage.

The events described in the analyzed text can be either single-concept events or multiple-concept events.

A single-concept event involves only one event, such as the personal experience of drug usage. Multiple-concept events are more complex and contain a relationship between the events, for example, a patient’s personal usage of a drug and his or her experience of illness or side effects. The relationship between events may involve a temporal switch, meaning that Drug A was used before Drug B, where both Drugs A and B treat the same condition, or it may involve a side-effect relationship, where a drug caused a specific side effect.
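
A temporal switch can be sketched as a pattern: a known drug, then a switching phrase, then another known drug. The Python example below assumes a tiny hand-made drug list and three cue phrases, all hypothetical, and it does not check that both drugs treat the same condition.

```python
import re

# Hypothetical lexicon of drug names and switching phrases (illustrative only).
KNOWN_DRUGS = ["avonex", "copaxone", "klonopin", "xanax", "zyrtec", "benadryl"]
SWITCH_CUES = r"\b(?:moved to|switched over to|switched to)\b"
DRUG_PATTERN = r"\b(" + "|".join(map(re.escape, KNOWN_DRUGS)) + r")\b"

def extract_switch(sentence):
    """Return (from_drug, to_drug) if a switch cue separates two known drugs, else None."""
    lowered = sentence.lower()
    cue = re.search(SWITCH_CUES, lowered)
    if cue is None:
        return None
    before = re.findall(DRUG_PATTERN, lowered[:cue.start()])
    after = re.findall(DRUG_PATTERN, lowered[cue.end():])
    if before and after:
        # Closest drug before the cue is the old one; first after it is the new one.
        return (before[-1], after[0])
    return None
```

Applied to the forum examples quoted earlier, this finds the Avonex-to-Copaxone and zyrtec-to-benadryl switches. It misses the Klonopin and Xanax post, which describes a switch without any explicit switching phrase, so cue phrases alone are clearly not enough.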

In addition to being able to extract relevant terms from an unstructured post using NLP, an extensive knowledge base of health terminology is needed in order to interpret the content. Medical dictionaries such as UMLS, MedDRA, RxNorm and First DataBank can form the basis for a patient language vocabulary, but anyone who reads health-related posts will see quickly that these will fall short when it comes to colloquial speech or regional vernacular.

What is needed is an ontology that combines the medical terms with patients' own language; this, together with the NLP technology, makes it possible to aggregate multiple posts that use completely different language into coherent messages about medical conditions, drugs and side effects.

Due to the complexity of healthcare conversations, traditional search engines fall short when it comes to analyzing and aggregating this unstructured text. What is needed are specialized tools and services, built on deep knowledge of the healthcare field, that can resolve ambiguity and interpret patients' own expressions.


