Google uses search data to estimate US flu activity

Nov 13, 2008 (CIDRAP News) – Millions of people who use the Internet to look for information about influenza provide the fuel for a system launched by Google this week to estimate flu activity day by day in the United States.

Google.org, the philanthropic arm of the Internet search company, unveiled the system, called Google Flu Trends, after determining that the volume of searches for information about flu matches up very well with reports of influenza-like illness (ILI) gathered by the Centers for Disease Control and Prevention (CDC). But the search data reveal flu activity up to 2 weeks earlier than official reports do, according to the company.

"By making our flu estimates available each day, Google Flu Trends may provide an early-warning system for outbreaks of influenza," Google.org said in an online statement.

The Google team found that certain search queries are very common during flu season, so it compared the frequency of these queries with the CDC's surveillance data on ILI.

"We found that there's a very close relationship between the frequency of these search queries and the number of people who are experiencing flu-like illness each week," the statement said. "As a result, if we tally each day's flu-related search queries, we can estimate how many people have a flu-like illness."

Google Flu Trends offers a Web page featuring a chart that compares this year's flu activity with that of past years, plus a US map that lists the level of flu activity for each state. Another page has a chart that shows how CDC surveillance data for last season closely matched estimates from the Google tool but lagged the Google estimates by about 2 weeks.

CDC will use system
"Last year they [Google] validated the model prospectively," said Lyn Finelli, PhD, head of surveillance for the CDC's Influenza Division. "They looked at their signal and ours on a week-to-week basis, and the two tracked very, very closely. There was a very close correlation both regionally and nationally last year."

The CDC plans to monitor Google's data closely, using the publicly accessible site, Finelli told CIDRAP News. "We'll look at the data either daily or several times a week and see if it provides any signals for us to look into, we'll talk to Google about those signals, and then contact state and local public health departments and give them the heads up . . . and see if they need any help for control and prevention," she said.

While the CDC uses electronic systems to monitor various kinds of disease data, the Google tool marks the agency's first use of a disease surveillance system based on Internet queries by the public, Finelli said.

The development of Google Flu Trends is described in a draft scientific paper posted on the site. The Google statement said a later version of the report has been accepted "in principle" for publication in Nature.

"About 90 million American adults are believed to search online for information about specific diseases or medical problems each year, making Web search queries a uniquely valuable source of information about disease trends," says the draft paper, authored by a team from Google and the CDC, with Google.org Executive Director Dr. Larry Brilliant as senior author.

50 million search terms tested
The team used Google's system to identify about 50 million of the most common search queries in the United States, according to the paper. Then, an automated process was used to compare the frequency of these searches with flu-like illness data gathered by the CDC from its network of sentinel physicians, who report what percentage of their patients each week seek treatment for flu symptoms. This was done for each search term in each of the CDC's nine surveillance regions, with the aim of finding the terms whose usage most closely matched the CDC data.
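The screening step described above can be illustrated with a minimal Python sketch. It assumes that each query is scored by the Pearson correlation between its weekly search frequency and the CDC's weekly ILI percentage for one region; the article does not specify the exact scoring method. The query names and all numbers below are made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no signal
    return cov / sqrt(var_x * var_y)

def rank_queries(query_fractions, ili_percent):
    """Return (query, correlation) pairs sorted from best to worst match."""
    scores = [(query, pearson(series, ili_percent))
              for query, series in query_fractions.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Made-up weekly values for one region, for illustration only.
ili_percent = [1.0, 1.5, 2.5, 4.0, 3.0, 1.5]
query_fractions = {
    "hypothetical flu query": [0.10, 0.16, 0.24, 0.41, 0.29, 0.14],
    "hypothetical unrelated query": [0.30, 0.28, 0.31, 0.29, 0.30, 0.31],
    "hypothetical summer query": [0.40, 0.35, 0.20, 0.10, 0.18, 0.33],
}

ranked = rank_queries(query_fractions, ili_percent)
for query, score in ranked:
    print(f"{query}: {score:.2f}")
```

In this toy example the flu-like query rises and falls with the ILI series and ranks first, while the query that peaks out of season ranks last. Repeating the ranking for every region and keeping the top scorers would yield a short list of candidate terms.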

This process produced a list of 53 high-scoring search terms that appeared to be related to flu, the report says. Armed with these, Google developed nine predictive models, one for each region. The models matched up well with the CDC's ILI percentages, with a mean correlation of 0.90.
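As a rough illustration of the one-model-per-region idea, the Python sketch below fits a straight line from a combined flu-query fraction to the ILI percentage in each region, then averages the resulting correlations. The straight-line form is an assumption, since the article does not describe the form of Google's models, and the region names and numbers are invented.

```python
from math import sqrt
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~ intercept + slope * xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def pearson(xs, ys):
    """Pearson correlation coefficient between two sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up weekly data: combined flu-query fraction and ILI percentage.
regions = {
    "Region A": {"query": [0.10, 0.16, 0.24, 0.41, 0.29, 0.14],
                 "ili": [1.0, 1.5, 2.5, 4.0, 3.0, 1.5]},
    "Region B": {"query": [0.08, 0.12, 0.22, 0.30, 0.35, 0.20],
                 "ili": [0.8, 1.4, 2.0, 3.4, 3.1, 2.2]},
}

models = {}
correlations = {}
for name, data in regions.items():
    intercept, slope = fit_line(data["query"], data["ili"])
    models[name] = (intercept, slope)
    # In-sample check only: predictions are compared with the data used to fit.
    predicted = [intercept + slope * q for q in data["query"]]
    correlations[name] = pearson(predicted, data["ili"])

mean_correlation = mean(list(correlations.values()))
print(f"mean correlation across regions: {mean_correlation:.2f}")
```

The correlations here are computed on the same weeks used for fitting, so they overstate how well such a model would do on new data. A fairer check would hold out later weeks and compare predictions against them.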

The report says the search terms describe symptoms, medications, and other diseases that people might associate with flu, but Google is not disclosing the terms its system uses. As Finelli explained, "People could change their behavior based on publication of those terms," which could make the estimates less accurate.

Because the CDC does not publish weekly ILI percentages for each state, Google Flu Trends cannot be fitted directly to state-level data, according to the draft report. Instead, the company applies its regional models to the number of queries in each state to generate state-level estimates. When this method was tested for Utah by comparing the Google estimate with the state-reported level of ILI, the two were highly correlated (correlation of 0.85).
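One way to picture the state-level step is to reuse a region's fitted model and feed it the state's own query fraction. The Python sketch below does this under the assumption of a simple linear regional model; the function name, the coefficients, and the query counts are all hypothetical, and the article does not say how Google actually performs this step.

```python
def estimate_state_ili(regional_model, state_flu_queries, state_total_queries):
    """Apply a region's (intercept, slope) model to one state's query counts.

    Returns an estimated ILI percentage, clamped to the range 0 to 100.
    """
    if state_total_queries <= 0:
        raise ValueError("state_total_queries must be positive")
    intercept, slope = regional_model
    # The state's flu-related share of queries, in percent.
    query_percent = 100.0 * state_flu_queries / state_total_queries
    estimate = intercept + slope * query_percent
    return min(max(estimate, 0.0), 100.0)

# Hypothetical regional coefficients and state counts, for illustration only.
regional_model = (0.5, 2.0)
state_estimate = estimate_state_ili(regional_model,
                                    state_flu_queries=100,
                                    state_total_queries=10_000)
print(f"estimated state ILI: {state_estimate:.1f}%")
```

Because the state estimate inherits the regional coefficients, it can only be checked where a state publishes its own ILI figures, as in the Utah comparison above.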

The tool does not give estimates for areas smaller than states at this point. But it "may be capable of providing ILI estimates for large cities and metropolitan areas with high internet penetration, providing even more local influenza surveillance," the draft report says. "We hope to explore this topic as well."

Although the report says the general location from which a query came can often be identified through the associated Internet protocol address, Google says its tool protects the privacy of Internet users. "Flu Trends can never be used to identify individual users because we rely on anonymized, aggregated counts of how often certain search queries occur each week," the company statement said.

International expansion contemplated
The draft report suggests the possibility of expanding Flu Trends to other countries: "We hope to extend this system to enhance global influenza surveillance, especially in areas which currently lack the necessary resources, including laboratory diagnostic capacity." But it adds that it would not yet be possible to apply this approach in "large parts of the developing world."

The report also says that when a pandemic strain of flu arises, Flu Trends might permit early detection of a surge in flu-like illnesses, enabling public health officials to respond faster. However, it adds that "panic and concern" in that situation might trigger a surge of flu-related queries from healthy people, leading to exaggerated estimates of the level of ILI.

Acknowledging that Flu Trends has some limitations, the report states, "Our system remains susceptible to false alerts caused by a sudden increase in ILI-related queries. An unusual event, such as a drug recall for some popular cold or flu remedy, could cause such a false alert." Also, media coverage of the system "may change the health-seeking behavior of Google search users. It is difficult to predict the extent to which this might occur."

The unveiling of Flu Trends comes on the heels of a study in which researchers found that flu-related searches on the Yahoo! search engine were a good predictor of the incidence of culture-confirmed flu cases. The report, published online Oct 27 by Clinical Infectious Diseases, says a model based on frequency of searches predicted increases in flu cases 1 to 3 weeks before they occurred. Similar models predicted increases in deaths attributable to pneumonia and flu up to 5 weeks in advance.

Google's tool joins a variety of other initiatives that use the Internet and computer technology to spot and track disease outbreaks, some of them by mining nontraditional information sources. Examples include ProMED, the Web and e-mail service of the International Society for Infectious Diseases; Boston-based HealthMap, which searches the Internet for disease reports, sorts them, and links them to a world map; and the volunteer effort WhoIsSick.org, which collects and maps reports of illness from site users.

See also:

Google Flu Trends main page
http://www.google.org/flutrends/

Google Flu Trends page with explanation and surveillance chart
http://www.google.org/about/flutrends/how.html

Clinical Infectious Diseases report on the use of Yahoo! searches to predict flu activity (abstract)
http://www.journals.uchicago.edu/doi/abs/10.1086/593098

Jul 21 CIDRAP News story "More efforts look outside the box for outbreak signals"

May 16, 2007, CIDRAP News story "Syndromic surveillance: faulty alarm system or useful tool?"
