Mar 07 2013

Online Illness Early Detection

Any intervention that interacts with a large system is bound to have unintended consequences. This concept is often brought up in the context of government – laws are passed to have a certain desired effect but have unintended secondary effects, often in opposition to what was desired.

Technology can also have unforeseen results. It is conventional wisdom, for example, that the invention of the cotton gin, intended to make cotton processing more efficient, also had the consequence of greatly increasing the demand for slaves, leading ultimately to the Civil War.

Arguably one of the biggest technological creations of our generation is the internet. It has transformed the way we access information. Predictions as to its utility were all over the place, some fairly accurate in certain respects, others way off.

I don’t recall any predictions, however (and I have avidly consumed speculation about the coming internet for as long as I can remember), that the internet could be used as a method of tracking human illness. The relevant applications were not designed specifically to do this; it is simply an unintended consequence of their use.

Google appears to have been (at least according to them) the first to realize that the search terms users enter into their search engine are a massive source of information. These terms don’t just reveal what is trending in popularity, but also events that are happening in real time. Such events include disease outbreaks.

You have probably heard of Google-Fu (referring to one’s skills in using the search engine), but have you heard of Google-Flu? This site uses Google search terms to track the spread of flu around the country. Comparison to data from the Centers for Disease Control and Prevention (CDC) shows the information not only to be highly accurate, but to anticipate CDC data. In fact, the CDC has officially partnered with Google to help track the flu.
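To make "anticipate CDC data" concrete, here is a minimal Python sketch of one way such a comparison could be checked: shift the search series forward by a few weeks and see which shift lines up best with the official counts. This is not how Google or the CDC actually do it. The function names (`pearson`, `best_lead`) and the weekly numbers are invented purely for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def best_lead(search, official, max_lead):
    """Return (lead, r): the number of weeks by which the search series
    best anticipates the official series, and the correlation at that lead.
    Assumes both series have the same length and cover the same weeks."""
    best = None
    for lead in range(max_lead + 1):
        # Pair search volume in week t with official counts in week t + lead.
        xs = search[:len(search) - lead]
        ys = official[lead:]
        r = pearson(xs, ys)
        if best is None or r > best[1]:
            best = (lead, r)
    return best

# Made-up weekly numbers: the official counts echo the searches two weeks later.
search_volume = [1, 2, 5, 9, 5, 2, 1, 1]
official_cases = [1, 1, 1, 2, 5, 9, 5, 2]

lead, r = best_lead(search_volume, official_cases, max_lead=3)
print(lead, round(r, 3))
```

With real data the correlation would be far messier, and a high value at some lead would only suggest, not prove, that searches run ahead of the official reports.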

Another medical application has recently been documented in the Journal of the American Medical Informatics Association. Doctors used data from a voluntary search tracking application to detect side effects of prescription drugs. In this case they looked for users searching for paroxetine and pravastatin in combination, a pairing now known to cause hyperglycemia as a side effect. They found that, before this side effect was known, users were searching for this drug combination and hyperglycemia.

This application of search data will be more complex, as there is bound to be a great deal of noise in the data, and there are numerous drugs and drug combinations with potential side effects. But data mining applications are already fairly sophisticated, and it’s plausible (as this study shows, although the researchers knew what they were looking for) that a signal can rise above the noise.

This signal might indicate that a drug is associated with a possible side effect that has not been previously detected in clinical trials. For example, if the data mining starts to show a spike in searches for “Drug X” and “itchy palms”, this could be an early warning that Drug X causes itchy palms as a side effect.

Such data mining is a way to generate hypotheses, which can then be confirmed by doing controlled studies or looking at fresh data specifically for that correlation.
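The “Drug X” and “itchy palms” idea can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions, not the method used in the study: each user is represented as a set of search terms, and the function `cooccurrence_ratio` (a hypothetical name) asks how much more often the symptom turns up among people who searched for the drug than among those who did not. The sample data are invented.

```python
def cooccurrence_ratio(users, drug, symptom):
    """How many times more often `symptom` is searched by users who also
    searched for `drug`, compared with users who did not.
    `users` is a list of sets of search terms, one set per user.
    Returns None when the ratio cannot be computed."""
    with_drug = [terms for terms in users if drug in terms]
    without_drug = [terms for terms in users if drug not in terms]
    if not with_drug or not without_drug:
        return None
    rate_with = sum(symptom in terms for terms in with_drug) / len(with_drug)
    rate_without = sum(symptom in terms for terms in without_drug) / len(without_drug)
    if rate_without == 0:
        # The symptom never appears without the drug: no finite ratio.
        return float("inf") if rate_with > 0 else None
    return rate_with / rate_without

# Invented example: seven users and the terms each one searched for.
users = [
    {"drug x", "itchy palms"},
    {"drug x", "itchy palms"},
    {"drug x"},
    {"itchy palms"},
    {"weather"},
    {"news"},
    {"recipes"},
]

print(cooccurrence_ratio(users, "drug x", "itchy palms"))
```

A ratio well above 1 is the kind of signal that might be worth a closer look. It says nothing about causation, which is why the result is a hypothesis to be tested rather than a conclusion.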

Search terms are not the only source of online data. Twitter and other social media outlets have recently been examined as another source. Tracking tweets may just warn us about the next epidemic.


We now have at our disposal vast amounts of data and tools that can sift through that data looking for useful information. We are beginning to look at these data and tools with wide eyes, asking how they can be put to good use.

So far, as might be expected, we are picking the low hanging fruit. But there are likely applications other than tracking the popularity of actors and singing stars, or epidemics and drug side effects. Such data can be a way of tracking all sorts of economic, criminal, political, and other forms of social activity.

The internet was not designed as a tool for tracking flu epidemics. Neither were search engines or social media. Tracking epidemics is just one of the possible consequences of a highly powerful, widely used, and extremely versatile information technology. There are likely many more to come.
