Saturday, February 5, 2011

Big twist: Google Has Been Using Bing Data to Improve Its Search Results!



I was shocked as I followed yesterday's news that Bing was copying Google's results. Was Bing so sleazy that it would scrape Google's search results and make them its own? What an awful thing to do! Shame on you, Bing! And I went back to work. As the story grew, I thought I should look beyond the headline to see what was really going on. The more I read about it throughout the day, the more shocked I became, to the point where I was wondering: what's wrong with Google?

OK, I’ve got it. Google just turned 13. It’s the teenage years! You have to prove yourself to the world at the expense of smearing others. That’s how the high school world works.

Anyway, in case you don’t know the true story: Bing is using click-stream data from its Toolbar to improve its search results. It’s very simple. If you install the Bing Toolbar, it sends data about which pages you visit and which links you click back to Bing, anonymously, so Bing can find new websites, understand click patterns and figure out which websites are best to return in its search results. Search engines have been doing that for nearly a decade.
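To make "understand patterns of clicks" a bit more concrete, here is a minimal Python sketch of how click-stream records could be rolled up into a per-query ranking. The record format (a search query paired with the URL the user clicked) and the `rank_by_clicks` helper are hypothetical illustrations, not how Bing, Google or anyone else actually does it; real engines weigh many more signals than raw click counts.

```python
from collections import Counter, defaultdict

def rank_by_clicks(click_log):
    """Aggregate (query, clicked_url) pairs into a click-ranked URL list per query.

    click_log: iterable of (query, clicked_url) tuples (a hypothetical format).
    Returns {normalized_query: [url, ...]} ordered by descending click count.
    """
    counts = defaultdict(Counter)
    for query, url in click_log:
        # Normalize the query so "Foo Bar" and "foo bar" are counted together.
        counts[query.strip().lower()][url] += 1
    return {q: [url for url, _ in c.most_common()] for q, c in counts.items()}

# Hypothetical, anonymized toolbar records for illustration only.
log = [
    ("marcelo calbucci", "http://blog.calbucci.com/"),
    ("Marcelo Calbucci", "http://blog.calbucci.com/"),
    ("marcelo calbucci", "http://example.com/profile"),
]
print(rank_by_clicks(log))
```

The more users click a given result for a given query, the higher it floats, which is the basic intuition behind using click-stream data at all.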

The problem is that Google thinks Bing should not have used click-stream data when the user was on Google’s website. Huh? There are hundreds of millions of websites out there, and Bing can collect data from each and every one except Google’s? That makes no sense to me. It also makes no sense to point out that Bing is actually parsing Google’s URLs to figure out which search term was used, and then call that unethical. This is what search engines do. They try to understand context.
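Pulling the search term out of a results-page URL takes nothing more than standard URL parsing. A minimal Python sketch, assuming a small hand-written table of engine hosts and the query-string parameter each one uses (`q` for Google and Bing, `p` for Yahoo); the table and the `search_term` helper are illustrative assumptions, not any search engine's actual code:

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical table: search-engine host -> query-string parameter holding the term.
SEARCH_PARAMS = {
    "google.com": "q",
    "bing.com": "q",
    "search.yahoo.com": "p",
}

def search_term(url):
    """Return (engine_host, term) if url looks like a search results page, else None."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    for engine, param in SEARCH_PARAMS.items():
        # Match the host itself or any subdomain of it (www.bing.com, etc.).
        if host == engine or host.endswith("." + engine):
            # parse_qs decodes %XX escapes and turns '+' into a space.
            values = parse_qs(parts.query).get(param)
            if values:
                return engine, values[0]
    return None

print(search_term("http://www.bing.com/search?q=Marcelo+Calbucci&form=QBLH"))
print(search_term("http://blog.calbucci.com/"))
```

Any page the toolbar reports can be run through a function like this; whether the host happens to be Google's or Bing's makes no difference to the code.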

Now, I have some shocking allegations of my own to make: GOOGLE HAS BEEN (SECRETLY) USING BING CLICK DATA TO IMPROVE ITS SEARCH RESULTS!!!
 
Here’s the proof:
 
GET http://toolbarqueries.google.com/search?client=navclient-auto&iqrn=m0pC&
orig=0wT2b&ie=UTF-8&oe=UTF-8&querytime=3F&features=Rank:&
q=info:http%3a%2f%2fwww.bing.com%2fsearch%3fq%3dMarcelo%2bCalbucci%26form%3dQBLH%26qs%3dn%26sk%3d%26sc%3d3-16&
googleip=O;74.125.95.104;172&ch=77545483278 HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; GoogleToolbar 6.2.1910.1554; Windows 6.1;
MSIE 8.0.7600.16385)
Host: toolbarqueries.google.com
Connection: Keep-Alive
Pragma: no-cache
Cookie: ...

Above is an HTTP request sent behind the scenes by the Google Toolbar running in Internet Explorer to Google’s server. It clearly shows that Google is sniffing the fact that I used Bing and that I searched for my name. But it gets worse...
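The telling part of that request is the `q=info:` parameter, which carries the full, percent-encoded address of the page I was looking at. A short Python sketch that unwraps it, assuming the request URL above re-joined onto a single line; `reported_page` is a hypothetical helper name, and nothing here describes how Google processes the data on its side:

```python
from urllib.parse import urlsplit, parse_qs

# The toolbar request URL from above, re-joined onto one line.
TOOLBAR_URL = (
    "http://toolbarqueries.google.com/search?client=navclient-auto&iqrn=m0pC"
    "&orig=0wT2b&ie=UTF-8&oe=UTF-8&querytime=3F&features=Rank:"
    "&q=info:http%3a%2f%2fwww.bing.com%2fsearch%3fq%3dMarcelo%2bCalbucci"
    "%26form%3dQBLH%26qs%3dn%26sk%3d%26sc%3d3-16"
    "&googleip=O;74.125.95.104;172&ch=77545483278"
)

def reported_page(toolbar_url):
    """Return the page URL the toolbar reported in its q=info:<url> parameter."""
    # parse_qs undoes one layer of percent-encoding on the q value.
    q = parse_qs(urlsplit(toolbar_url).query)["q"][0]
    prefix = "info:"
    return q[len(prefix):] if q.startswith(prefix) else None

page = reported_page(TOOLBAR_URL)
print(page)
# The reported page is itself a Bing results URL, so the search term is one parse away.
print(parse_qs(urlsplit(page).query)["q"][0])
```

Two calls to the standard library are enough to go from the toolbar's ping to "this user searched Bing for Marcelo Calbucci".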

GET http://toolbarqueries.google.com/search?client=navclient-auto&
iqrn=L2AC&orig=0gJdq&ie=UTF-8&oe=UTF-8&querytime=bM&features=Rank:&
q=info:http%3a%2f%2fblog.calbucci.com%2f&googleip=O;74.125.95.104;109&
ch=76329322186 HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; GoogleToolbar 6.2.1910.1554; Windows 6.1;
MSIE 8.0.7600.16385)
Host: toolbarqueries.google.com
Connection: Keep-Alive
Pragma: no-cache
Cookie: ...

Well, well, well. Look at that. Google is also sniffing which search result I clicked on Bing!
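Put the two toolbar reports side by side and the inference is straightforward: a Bing results page followed by another page looks a lot like a click on one of the results. A minimal Python sketch of that pairing, assuming the reported URLs have already been decoded from the `q=info:` parameter and arrive in visit order, and treating the next non-search page as the clicked result (an assumption, since the toolbar only reports which page loaded next); `pair_searches_with_clicks` is a hypothetical name:

```python
from urllib.parse import urlsplit, parse_qs

def is_bing_results_page(parts):
    """True if the split URL is a Bing /search page (host bing.com or a subdomain)."""
    host = parts.hostname or ""
    return (host == "bing.com" or host.endswith(".bing.com")) and parts.path == "/search"

def pair_searches_with_clicks(visits):
    """Pair each Bing search term with the next non-search page visited.

    visits: decoded page URLs in visit order (hypothetical input).
    Returns a list of (search_term, next_page_url) tuples.
    """
    pairs = []
    pending_term = None
    for url in visits:
        parts = urlsplit(url)
        if is_bing_results_page(parts):
            # Remember the latest search term until a non-search page shows up.
            pending_term = parse_qs(parts.query).get("q", [None])[0]
        elif pending_term is not None:
            pairs.append((pending_term, url))
            pending_term = None
    return pairs

# The two pages reported by the toolbar requests above, in order.
visits = [
    "http://www.bing.com/search?q=Marcelo+Calbucci&form=QBLH&qs=n&sk=&sc=3-16",
    "http://blog.calbucci.com/",
]
print(pair_searches_with_clicks(visits))
```

A (query, clicked page) pair is exactly the kind of click-stream signal Bing was accused of collecting.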

Now, they are sending all this information to their servers, and only God (and their engineers) knows what they are doing with that data. Since I have no clue, I have to assume the worst (they are selling the data to the Taliban!).

BUT IT GETS MUCH WORSE THAN THAT!

While looking at what Google was sending to its servers, I also found this:

GET http://www.google-analytics.com/__utm.gif?utmwv=4.8.8&utmn=...20&
utmhn=blog.calbucci.com&utmcs=utf-8&utmsr=1920x1200&utmsc=32-bit&utmul=en-us&
utmje=1&utmfl=10.1%20r102&utmdt=Marcelo%20Calbucci's%20Blog&utmhid=1696981564&
utmr=http%3A%2F%2Fwww.bing.com%2Fsearch%3Fq%3DMarcelo%2BCalbucci%26
form%3DQBLH%26qs%3Dn%26sk%3D%26sc%3D3-16&utmp=%2F&utmac=UA-604252-2&
utmcc=__utma%.54%3B%2B__utmz%.54.20.utmcsr%3Dbing%7Cutmccn%3D(organic)%7C
utmcmd%3Dorganic%7Cutmctr%3DMarcelo%2520Calbucci%3B&utmu=D HTTP/1.1
Accept: */*
Referer: http://blog.calbucci.com/
Accept-Language: en-US
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64;
Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729;
.NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.2; .NET4.0C; .NET4.0E)
Accept-Encoding: gzip, deflate
Host: www.google-analytics.com
Connection: Keep-Alive

Oh, snap! Now we are talking. Google is double-dipping. Not only are they using the Toolbar, but that same information is also being sent to Google Analytics, because the destination website had Google Analytics on it. Do they use Google Analytics data to improve their search results? Possibly. They clearly know that it was an “organic” search, which query term was used and which search engine it came from.
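The Analytics request carries that information in two places: `utmr` holds the referring Bing URL, and the `__utmz` cookie copied into `utmcc` records the source (`utmcsr`), campaign (`utmccn`), medium (`utmcmd`) and term (`utmctr`). A Python sketch that decodes both, assuming a trimmed copy of the query string above (only `utmhn`, `utmdt`, `utmr` and `utmcc` kept, with the elided `__utma%.54` fragments left exactly as captured); `decode_beacon` is a hypothetical helper:

```python
from urllib.parse import parse_qs, unquote, urlsplit

# Trimmed query string of the __utm.gif request above (other parameters dropped).
BEACON_QUERY = (
    "utmhn=blog.calbucci.com"
    "&utmdt=Marcelo%20Calbucci's%20Blog"
    "&utmr=http%3A%2F%2Fwww.bing.com%2Fsearch%3Fq%3DMarcelo%2BCalbucci%26"
    "form%3DQBLH%26qs%3Dn%26sk%3D%26sc%3D3-16"
    "&utmcc=__utma%.54%3B%2B__utmz%.54.20.utmcsr%3Dbing%7Cutmccn%3D(organic)%7C"
    "utmcmd%3Dorganic%7Cutmctr%3DMarcelo%2520Calbucci%3B"
)

def decode_beacon(query):
    """Pull the host, page title, referrer, search term and __utmz fields from a beacon."""
    params = {k: v[0] for k, v in parse_qs(query).items()}
    # utmr is the page that linked here: the Bing results page.
    referrer = params["utmr"]
    referrer_term = parse_qs(urlsplit(referrer).query)["q"][0]
    # After one round of decoding, utmcc reads "name...;+name...;" (one entry per cookie).
    campaign = {}
    for cookie in params["utmcc"].split(";"):
        if "__utmz" in cookie and "utmcsr=" in cookie:
            for field in cookie[cookie.index("utmcsr="):].split("|"):
                key, _, value = field.partition("=")
                # utmctr is percent-encoded a second time inside the cookie.
                campaign[key] = unquote(value)
    return {
        "host": params["utmhn"],
        "title": params["utmdt"],
        "referrer": referrer,
        "referrer_term": referrer_term,
        "campaign": campaign,
    }

info = decode_beacon(BEACON_QUERY)
print(info["referrer_term"])
print(info["campaign"])
```

Either field alone is enough to tell that a Bing search for my name brought the visitor to my blog.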


But what about the customers?

The most shocking part of this debate, to me, is that Google completely ignores the benefits to end users. Google has been very vocal about doing what’s right for users and pushing, even forcing, others to do the same (Hey Google, did you forget how often you push Facebook to share its data with you to create a better experience for end users?). Bing using Google’s click-stream data (and vice versa) is actually good and beneficial to end users. Period.

Google, I love (lots of) your products, and I'll continue to use and support them despite this childish, sorry, teenager-ish behavior. Back to work now.

