Mar 02 2015

Google Wants to Rank Websites for Trustworthiness

I like this idea, but it is certainly bold and needs some careful thought. Google wants to rank websites according to how trustworthy their factual statements are.

Google undoubtedly is a cornerstone of the internet, which itself is now a cornerstone of our civilization. We are rapidly evolving toward a worldwide network of shared human knowledge and communication. The internet is now the dominant medium of human ideas.

Google is not just a search engine – it is the dominant portal to this information. This makes Google rank a vital statistic for any website. In fact, there is an entire industry, search engine optimization (SEO), dedicated to improving one’s Google ranking.

Google’s big innovation, and the one that launched them to the top of the heap, was to rank websites according to the number and quality of incoming links. This turned out to be a useful proxy for value, rewarding users with a helpful ranking of the websites they are searching for. Importantly, it is not easy to game the system. You can’t boost your Google rank simply by repeating search terms in the page’s code. In fact, I have a couple of friends at Google, and they tell me that Google is constantly tweaking their algorithm specifically to make SEO ineffective. SEO is an attempt to game Google’s ranking algorithm, and Google doesn’t want that. They want the truly most valuable and appropriate sites to float to the top.

The current problem with Google’s ranking algorithm is that it is essentially a popularity contest. While this works for many types of information, it is problematic for subjects such as health advice. From what I understand (Google doesn’t exactly advertise the details of their search algorithm), incoming links from high-value sites, such as academic sites or sites that themselves rank highly, count more than links coming in from small or personal websites. This helps, but is not enough. There is so much misinformation on the web that it can overwhelm reliable information, regardless of how the popularity rankings are tweaked.
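To make the link-based idea concrete, here is a minimal PageRank-style sketch – the published algorithm behind Google’s original ranking – using a made-up link graph. This is only a toy illustration of the general principle; Google’s actual ranking uses many additional signals that are not public.

```python
# Toy PageRank-style scoring: a site's rank depends on the number of
# incoming links and on the scores of the sites those links come from.
# The link graph below is entirely made up for illustration.

DAMPING = 0.85     # standard damping factor from the PageRank paper
ITERATIONS = 50    # plenty for this tiny graph to converge

links = {
    "university.edu":   ["blog.example.com", "news.example.com"],
    "news.example.com": ["blog.example.com"],
    "blog.example.com": ["news.example.com", "university.edu"],
    "spam.example.net": ["blog.example.com"],  # links out, but nobody links back
}

sites = list(links)
score = {site: 1.0 / len(sites) for site in sites}

for _ in range(ITERATIONS):
    new_score = {site: (1 - DAMPING) / len(sites) for site in sites}
    for site, outgoing in links.items():
        if not outgoing:
            continue
        share = DAMPING * score[site] / len(outgoing)
        for target in outgoing:
            # A link passes along a share of the linker's own score,
            # so links from well-ranked sites are worth more.
            new_score[target] += share
    score = new_score

for site, s in sorted(score.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{site}: {s:.3f}")
```

The detail that matters for the discussion above is that a link is not a simple vote: it passes on a share of the linking site’s own score, which is why links from high-ranking sites count for more than links from obscure ones.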

As an aside, but one that demonstrates how Google’s algorithm affects our lives, I find it annoying that Google adjusts the ranking of websites according to my personal history. This is meant to tailor information to me, and may be useful when I am searching for an item I want to buy. However, when I am searching a topic, sometimes I want to know what is most popular. I don’t want to see all my own articles on a topic. I know you can turn this feature off, but it is the default, and many people don’t realize how it is affecting their searches.

I do like the fact that Google is not resting on their laurels and that they want to continue to push the envelope. Their new idea is very interesting – actually look at the factual statements on a website and determine whether they are accurate by comparing them to a database of knowledge. According to New Scientist:

The software works by tapping into the Knowledge Vault, the vast store of facts that Google has pulled off the internet. Facts the web unanimously agrees on are considered a reasonable proxy for truth. Web pages that contain contradictory information are bumped down the rankings.

One quibble – if the web “unanimously” agrees on a fact, how can any website contradict it? Obviously they are referring to a strong consensus. I can immediately see the protest to this approach: it will punish minority opinions and reinforce the current dogma.

My answer to this anticipated criticism is this: “Too bad!” More specifically, in areas of opinion, ideology, value judgments, and taste, I think ranking by popularity remains appropriate. However, in the realm of scientific knowledge, you can essentially use the consensus of scientific opinion to trump popular beliefs. Some scientific “facts” are objectively superior to others, and there is no reason why this should not be reflected in Google ranking.

So I think this approach is conceptually valid, within the realm of objective facts.

The devil is going to be in the details of how it is implemented. The new search algorithm is being built around the Knowledge Vault, which is a database of crowdsourced facts. Google now uses bots to crawl the web and, by consensus, determine specific facts that it then adds to the database. It considers a fact highly reliable if it has a >90% chance of being true.
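As a purely hypothetical sketch of that idea – the real Knowledge Vault pipeline uses machine-read (subject, attribute, value) triples and probabilistic models that are far more sophisticated, and the data and function names below are invented for illustration – you could imagine keeping only the facts that an overwhelming majority of sources agree on, and then scoring a page by how its claims compare against that store:

```python
from collections import defaultdict

# Hypothetical extracted claims, as (subject, attribute, value) triples
# gathered from many web pages. The data here is invented for illustration.
extractions = (
    [("Declaration of Independence", "year_signed", "1776")] * 19
    + [("Declaration of Independence", "year_signed", "1767")]  # a lone outlier
)

RELIABILITY_THRESHOLD = 0.9  # the ">90% chance of being true" figure cited above

def build_fact_store(extractions):
    """Keep only the value that an overwhelming majority of sources agree on."""
    counts = defaultdict(lambda: defaultdict(int))
    for subject, attribute, value in extractions:
        counts[(subject, attribute)][value] += 1

    store = {}
    for key, value_counts in counts.items():
        total = sum(value_counts.values())
        best_value, best_count = max(value_counts.items(), key=lambda kv: kv[1])
        # Crude stand-in for "probability of being true": share of agreeing sources.
        if best_count / total > RELIABILITY_THRESHOLD:
            store[key] = best_value
    return store

def trust_score(page_claims, store):
    """Fraction of a page's checkable claims that agree with the fact store.
    Pages whose claims contradict the store would be bumped down the rankings;
    claims the store knows nothing about are simply ignored."""
    checkable = [(s, a, v) for s, a, v in page_claims if (s, a) in store]
    if not checkable:
        return None  # nothing to check, so no trust signal either way
    agreeing = sum(1 for s, a, v in checkable if store[(s, a)] == v)
    return agreeing / len(checkable)

store = build_fact_store(extractions)
accurate_page = [("Declaration of Independence", "year_signed", "1776")]
mistaken_page = [("Declaration of Independence", "year_signed", "1767")]
print(trust_score(accurate_page, store))  # 1.0 -> no penalty
print(trust_score(mistaken_page, store))  # 0.0 -> bumped down
```

The majority-share proxy used here is the crudest possible stand-in for the probabilistic source-reliability models a real system would need, but it shows where the judgment calls live: in what counts as a source and where the threshold sits.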

This type of automated approach probably works really well when dealing with straightforward facts, such as the date the Declaration of Independence was signed. However, more nuanced scientific facts will be more challenging – what caused the Younger Dryas extinction, for example? Culturally controversial topics, such as whether humans are causing global warming, will probably be even more difficult.

This is why Google is also contemplating partnering with websites, such as Snopes, that contain large databases of specific questions with vetted and reliable answers.

In fact, it’s possible that Google’s new approach may spawn an additional industry, one dedicated to building databases of established facts, perhaps with different themes, specifically to feed Google’s new search algorithm.

I do acknowledge that while this approach is valid and will likely be useful, it does concentrate a great deal of power in the hands of very few people. Whoever builds the databases will have a level of power tantamount to determining what is truth.

This, of course, is an inherent issue with the internet itself – any gatekeeper that rises to a significant degree of dominance will have incredible power over the flow of information.

The solution to this conundrum, it seems to me, is the same as the solution to quality control in science itself – the process needs transparency to ensure fairness and high standards. Google, for example, when it decides to activate this feature, could make it absolutely clear that the feature is turned on, make it easy for users to find out how it works and which databases it is using to fact-check websites, and let users turn it off if they desire. Perhaps users could even choose which databases to use.

This, of course, might lead to a war of the knowledge databases, similar to the Wikipedia wars. Wikipedia is another vast common store of human knowledge, and so everyone wants to control it. There are also many Wikipedia alternatives, such as Conservapedia (a particularly ridiculous example, in my opinion).

The bottom line is that there is tremendous power to be had in controlling the flow of information across the internet. Further, people generally will want public information to reflect their own personal biases and narratives. What will best serve the public interest, however, is if there are transparent and objective quality control measures in place. This is essentially an extension of the same issues we find in science and academia – the need for objective standards and quality control butting up against personal ideology and self-interest.

I will definitely watch how Google’s new initiative plays out. I like it – but the devil will indeed be in the details.

 
