Google Engineers Prove Bing Policy

Stop me if you've heard this one, but how many Google engineers does it take to prove something stated in Bing's privacy policy?

If you answered 20, you're absolutely correct!

Unfortunately for Google, what they proved is NOT their accusation that Bing is copying Google's search results.

The Sting

For the three of you that don't already know what I'm talking about, allow me to fill you in.

Google manually ranked sites for 100 different gibberish search terms that had no results in Bing or Google. Twenty engineers then installed the Bing toolbar and, using Internet Explorer, searched Google for those gibberish terms, clicking on the page Google had previously ranked by hand at the top of its own results.

In 9 of the 100 tests, Google claims they started seeing the site they'd manually ranked show up ranked #1 over at Bing.

This, Google says, is smoking-gun proof that Bing is "copying" Google's search engine rankings. Google then served the hit piece up on a silver platter to Danny Sullivan of Search Engine Land, and it just happened to "go to print" the morning of Bing's highly publicized search event.

Google's Matt Cutts and Bing's Harry Shum then duked it out on the Future of Search panel, with Google again claiming Bing was copying its results and Bing denying it while, for good measure, throwing in an allegation that Google makes billions helping to monetize spam.

Since the internet barely slows down long enough to read headlines, that's where most of the coverage of this ongoing battle between the search engines stops.

Unfortunately, that means the jury of public perception won't notice the plethora of problems with Google's test, accusations, and conclusions.

The Problems

When reading the kind of bombastic accusation Google leveled today, it's easy to look past the details. And when Google is able to provide an example or two that seems to support their accusations, the evidence seems "pretty convincing" to quote Dave Winer.

Dave Winer (@davewiner): "@fxshaw The evidence is pretty convincing"

However, in 91 of the 100 queries, Bing's results did NOT mirror the Google fabrications. Of course, you don't hear about those 91 results from Google in its official blog post on the issue, or from Matt Cutts on the panel. Instead, they prop up the 7 or 9 (Danny Sullivan's article wasn't clear on the precise number) as proof that this is happening across a large number of Bing's search results.

Remember, these 100 queries were complete gibberish and had few, if any, results before Google performed the test. Yet Google saw a success rate of only 9%! If click data from Google were being used heavily within Bing's ranking algorithm, one would expect a MUCH higher success rate in Google's test.

Going from a 9% success rate on search queries manufactured to confirm Google's suspicions to the assumption that this affects enough queries to create a "statistical pattern... too striking to ignore" requires a leap of faith or a preconceived confirmation bias.

Bing Admits Using User Data

Another fact Google seems to gloss over is that they instructed their engineers to click on the Google listings and provide Bing click data.

This is important because it's absolutely no secret that Bing uses click data from its users to influence its rankings. It's disclosed in the agreement for the Bing toolbar, and Bing made it quite clear in response to Danny Sullivan's article that this kind of data was being used.

Frank X. Shaw (@fxshaw): "3. Harry Shum very clear http://bit.ly/hYvCIM on 1k plus signals used in ranking algorithm...includes clickstream data."

Despite apparently being beyond Matt Cutts's comprehension (see below), Bing's stance is fairly simple: it uses click data collected from users as one of what it claims are over a thousand factors influencing its rankings.

Matt Cutts (@mattcutts): "So far Bing's response seems to be "We don't copy Google's results. Of course we do." http://goo.gl/8VoDJ vs. http://goo.gl/yW4Ia"

By choosing gibberish search terms that produced no natural results, Google effectively eliminated all other factors that could have influenced Bing's rankings. The terms were found on none of the pages in the index and certainly aren't used in any links pointing to any sites. So when Google engineers suddenly provided a data point (the click data collected via the Bing toolbar) for Bing to latch onto, it's no surprise that even a minor factor was able to influence 9% of the rankings.
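To show why no scraping of Google's results page is needed for this to work, here is a minimal, hypothetical sketch in Python of how a toolbar's clickstream alone could tie a gibberish query to a page. It assumes only that the toolbar reports the sequence of URLs a user visits and that a search page carries its query text in a URL parameter such as `q`. The parameter list and function names are illustrative inventions, not Bing's actual code.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Hypothetical: URL parameters a search page might use for its query text.
QUERY_PARAMS = ("q", "query")

def query_in(url):
    """Return the search text embedded in a URL, or None if there isn't any."""
    params = parse_qs(urlparse(url).query)
    for name in QUERY_PARAMS:
        if name in params:
            return params[name][0]
    return None

def query_click_pairs(clickstream):
    """Pair each query seen in a URL with the page the user visited next.

    A follow-up page that is itself a search (a refined query) is skipped,
    since it says nothing about which result the user chose.
    """
    pairs = []
    for current, following in zip(clickstream, clickstream[1:]):
        query = query_in(current)
        if query is not None and query_in(following) is None:
            pairs.append((query, following))
    return pairs

def count_associations(sessions):
    """Tally (query, page) pairs across many users' clickstreams."""
    counts = Counter()
    for clickstream in sessions:
        counts.update(query_click_pairs(clickstream))
    return counts

# One engineer's session: a Google search for a gibberish term, then a click.
session = ["https://www.google.com/search?q=hiybbprqag", "http://teamonetickets.com/"]
print(count_associations([session] * 20))
```

Note that nothing in the sketch treats Google specially. Any site that puts the search text in its URL would produce the same kind of pair, and twenty engineers repeating the same search and click yield twenty identical pairs for a query that no other page matches.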

There are a couple of points that are very important to establish at this point. First, ranking the same site #1 for any given query doesn't mean Bing is "copying" Google's results.

Second, the data being collected does NOT belong to Google. Yes, the data is being collected when the user is visiting Google, but the users opted to provide their usage data to Bing. Bing didn't somehow hack into Google's data, and Bing didn't scrape Google's search result page. Bing used data provided by their users to influence their rankings, which is a practice they've readily admitted is in use.

Bing SHOULD Use User Data

To take it one step further, given the complete lack of any other ranking factors for these terms, Bing owes it to its users to use any and all available data to provide the best experience possible. Would it be better for users if Bing refused to serve a result that click data suggests is relevant simply because the data was gathered on a Google property?

Of course not.

Proof NOT in the Pudding?

Given Bing's surprisingly straightforward privacy policy, which clearly states Bing will use the data collected to "improve its sites and services," Google should hardly have been surprised that click data is being used in Bing's algorithm.

Had Google manually altered its rankings without providing Bing corresponding click data (or perhaps even clicked on the site ranked 10th instead of first), and had Bing's rankings still mimicked Google's, Google's accusations would have been supported.

Unfortunately for Google, they DID provide user click data. As a result, the only thing their "sting" proved was that click data influenced Bing's rankings in 9 of the 100 search queries Google manufactured, NOT that Google's rankings were copied in any way, shape, or form.

Want More?

If you enjoyed this article you might want to read our follow up piece, Bing Sting Snares Google. Enjoy!

image source: lollyman

Comments

  1. says

    Yup.

    And none of the bloggers chatting about this “big battle” dare wonder aloud how SearchEngineLand was an apparent tool of Google for the propaganda? Content, Danny’s editorial slant in the article, the timing to land on Bing’s search day event?

  2. Ben Cook says

    John, my tweet stream yesterday was filled with those types of questions.

    I’m working on another post which will address a bit of that, it simply didn’t fit into this post well.

    I agree though, it’s most definitely troubling. I don’t know that I would have turned the story down if I were him, but I definitely feel MUCH deeper questioning of Google about this was required. Of course, maybe that’s why someone like me or you would never get a story like this served up to us on a silver platter.

  3. says

    Thank you Ben for exposing this ridiculous PR scam. This feels like Google’s worried about Facebook and Microsoft and keeps pointing at them, every time a scandal rocks their own nest. I think this is petty and disingenuous. How about they just “de-rank” eHow and Mahalo and be done with it.

  4. says

    Ha. This is exactly right and aligns with my thoughts in a recent post I just published. Google set up a test that had a desired result and took the steps necessary to achieve that result. Notice the domains that rank for the tests too. They are all strong and trusted domains which means along with the clickstream signals the URLs in question are also passing additional domain level signals which help them rank. This is nothing but an attempt at positive PR from the Google borg.

  5. says

    There’s certainly a whiff of opportunism and hypocrisy about Google’s outrage. Ignoring the fact that the greater portion of their UI changes over the past two years have been “influenced” by Bing, they’ve been tracking user behaviour for years using their toolbar and Chrome. To what extent have they used click data and search refinements from other search engines to modify their results?

  6. says

    I was reading about this and I realized, wow, maybe Google is beginning to feel insecure about its current position in the market.

    Maybe the first sign of a “Corporate Mid-life Crisis”.

  7. Ben Cook says

    Mark, the point about Google using fairly strong domains for these terms without any results is a good one. It wouldn’t take much of an indicator for Bing to rank them #1 for these terms.

    James, Google claims they don’t use the toolbar data for ranking purposes, although they’ve also admitted they do just that in the past. That’s one of just many hypocritical aspects of this nonsense. Check back for more on that in a new post tomorrow.

  8. says

    Ben, not entirely sure I get your point. Click data is one thing, click data specifically collected from a competing search engine (which is the only way they would know to actually associate those phrases with those clicks) and then used to associate that phrase as a ranking factor in their own search engine (as opposed to say, simply adding 1 more click to that site for stuff it already ranks for) is something else altogether. Now, is it as big a deal as Google is saying? Who knows. Could they have done deeper and better testing? Absolutely. However, the results do show that MS is indeed collecting what their users are doing specifically on Google and using that to alter their rankings.

    I think that “copying” is a poor term for what is actually going on, and I am guessing that testing as you suggest, i.e., doing the same thing with only the #10 result in each case, would better illustrate this. However, I think that MS is trying to play it off as nothing, which it isn’t. They are targeting behavior on Google to improve their search results, and they want us to believe that it is the same as any other generic click.

    Also, the 9 out of 100 doesn’t surprise me at all. I see ranking discrepancies and delays in indexing from one Google datacenter to the next all the time, I would actually expect MS to behave similarly in that regard.

  9. says

    Thank you for explaining how much more there is to the story. It didn’t seem right and you’ve shown why.

    I hope people read this and see how Google was intentionally skewing results, withholding information, and being extremely poor sports on Bing’s big day.

  10. Ben Cook says

    Michael, it certainly seems that Bing is specifically collecting query data alongside the click data and using it to influence their rankings, yes. But I would argue that a) it’s their data to collect if they want to and b) they actually SHOULD do this to provide a good experience for their users.

    But to your larger point that MS wants this to seem like it’s no big deal, I would argue it really isn’t that big of a deal. Google’s previously admitted to using toolbar/Chrome data in their rankings. While not specifically targeting Bing (they claim) I don’t see it being all that different in nature.

    Google also uses social signals like Twitter or Facebook, which are conceivably competitors of Google. Why is that acceptable, but Bing’s actions aren’t?

    And finally the 9 out of 100 speaks to the fact that even on the hand picked manufactured tests the results aren’t all that similar, so for Google to suggest Bing is “copying” Google rankings (which is a pretty heavy accusation) on a broad scale is ludicrous and not at all supported by this sting.

  11. Thomas says

    Isn’t the issue not that Bing is tracking data at all, but where it tracks data? If it were taking the link clicked from the list returned by a search at bing.com, then using that as a method to sort its own ranking, I can’t see an issue. However, Internet Explorer instead recognises that the user is on Google and intentionally notes what the user searches for and which link they click. There is a very big distinction between the two, the first being clever tracking of their own site and the second being intentional copying of the Google results.

  12. Ben Cook says

    Thomas, first of all, thanks for commenting. Secondly, the data being collected is still Bing’s to use as they see fit. Also, it’s not copying. It’s used as one factor, which Google isolated in this test. If Bing were copying Google’s results, no click data would be needed.

    Also, do you take Google at their word that they don’t use the same kind of data in their rankings? And do you think Bing is relying heavily on this indicator given it only worked 9 out of 100 times?

  13. says

    Your argument on the 9/100 being a coincidence doesn’t hold up. Since the gibberish phrase wasn’t anywhere on the document, the only way it could show up would be as anchor text signals. Google could have fabricated links to make Bing look bad, but that would have backfired worse. 9/100 isn’t indicative of a strong correlation, but considering the test used gibberish, it does prove that Bing is ranking sites that wouldn’t have ranked for that keyword, because they did rank on Google. I agree on your other points.

  14. Ben Cook says

    AJ, I didn’t intend to imply the 9 out of 100 was a coincidence. If that’s how it came across, I probably need to clarify somewhere. I meant that it’s a very low success rate, which makes me think Bing isn’t using the click data very extensively as a factor.

    Google could have planted some links somewhere, but I suspect Bing would have outed that if they could have. I think it’s safe to agree Bing is using click data gathered from users while on Google. What people don’t seem to agree on is whether that’s “copying,” as Google alleges, or whether it’s a smart move by Bing.

  15. Barnett Trzcinski says

    The only reason this is a “big deal” is that the site in question that the clickthrough data is coming from is Google. What’s to stop me from doing this on Dogpile (if they returned the results in question), having the clickthrough sent to Bing and then having it show up in results because the Dogpile URL has the gibberish word in it and they clicked a link?

    Bing absolutely SHOULD use clickthrough data from any search engine, including Google. It would be how you get relevant search results from places like corporate support knowledgebases because people browsed that site and clicked on a particular page a lot when searching for a term there. It applies to anywhere on the ‘net.

    As another poster I read somewhere today said, all this means is that Google is concerned about Bing, and it clearly lends Bing a level of credence it didn’t have before…

  16. ExLoony says

    And what would happen if a group of people took Chrome browser, changed the search bar to use Bing, and then used the same kind of SEO hacking behavior to shift the ranking of some obscure page? Would you expect Google to notice? I would. And I think they would be quite justified and right to, if the user has opted in to Chrome collecting click data. I’d be pretty damned surprised if they don’t – after all, trend spotting is an equal opportunity job.

    The whole area of tracking clicks was pioneered by DoubleClick, now a division of Google responsible for about 30% of their revenue (they morphed into the organization running Adsense, and they still use all the same tricks and more to do “web tracking”). It is how the web works, kids, and why we like it so much.

  17. Ben Cook says

    ExLoony, while Google is claiming they don’t do that, I would not be at all surprised if they did.

    Google is the king of data collection & use. They continually use the data collected from users on other sites for their own purposes. To call Bing out for doing the same is nothing more than hypocrisy and a desperate attempt to control the conversation about search quality.

    Your comment is spot on, thanks for chiming in.

  18. bing friend says

    So, even if Bing copied (a so-called signal) 9 out of 100 times, it is still stealing from Google. As expected, Microsoft is quick to throw legal mumbo jumbo to deflect the blame. Now, if you are a smart engineer, your algorithm will notice that 999 signals are missing, and if the 1 signal from Google is telling you something, you either use it knowing you are copying Google’s results or ignore it because stealing from Google is unethical.

    Yes, it is legal, but would Microsoft have admitted any of this by themselves? Absolutely not. It would be called a thief, and it did not want that publicity.

    Here is something for us to ponder.
    When I do my term paper, I look at signals from Wikipedia, CNET, BBC, CNN and others. If I include my competitors in this list — it is plagiarism. Even if it is part of the input. If all but one of these signals go dark and I copy one signal (say Wikipedia) — it is still plagiarism.

    Bing had covered its rear in the 300-page end user license and is now hiding behind it. No smart engineer will be proud of this system Bing is describing. There is no talent left at Microsoft, in leadership or at lower levels. This was a sheer act of desperation. When Bing was running PR day after day about how it was catching up with Google on relevance, would it have admitted that there is a nonzero chance the results are literally copied from Google?

    Engineers at Microsoft — wake up!

  19. Leo Bushkin says

    One thing I have not seen adequately explained is how the bogus search terms entered by the users were correlated to the links that were clicked. Just passing clickstream data to Bing wouldn’t be sufficient to place a URL at the top of the result stream for a particular search term unless the search term itself was also captured and forwarded. This would mean that the Bing Toolbar has some built in logic to identify when a keyword or phrase submitted by the browser is being used as a search term – whether on Google or any other site. Am I mistaken in this? Is there some other mechanism at play that correlates the search terms to URLs in the ranking process?

  20. Shaun says

    Sure, Microsoft hasn’t done anything other than leverage information gleaned from an opt-in service as a metric in its search algorithm. At the same time, this kind of ‘cribbing’ from another search provider does make the service look pretty lousy from a P.R. standpoint; as the tarsorrhaphy case shows, Bing wouldn’t even return results in certain cases without looking at information returned from Google. Does this mean that I need to wait for a user with Bing Bar or IE and the reporting options turned on to use my misspellings (perhaps repeatedly) on Google before Bing will return a result?

    The world needs multiple search engines (if only to drive innovation and prevent complacency) and Google certainly needs strong competitors, but this ‘experiment’ shows Bing just isn’t doing (at least all of) its own work. I don’t blame Google for pointing this out publicly (although they perhaps could have been classier with the wording and timing). If someone used my multi-billion dollar search algorithm even indirectly without asking, I’d be sore too.

  21. Greg Eales says

    Do the same test, but this time click on, say, the 5th result repeatedly. Suddenly Bing ranks the clicked result #1, and therefore does 2 things: it doesn’t use Google’s ranking, and it provides better relevance. Perhaps Google should learn from Bing and implement this to improve their results.

  22. Joe says

    The people working on Bing are lazy. Simple as that. With Microsoft losing 2+ billion dollars in their online business, and they’re just copying Google, what are they spending their money on?!

  23. says

    “There are a couple of points that are very important to establish at this point. First, ranking the same site #1 for any given query doesn’t mean Bing is “copying” Google’s results.

    Second, the data being collected does NOT belong to Google. Yes, the data is being collected when the user is visiting Google, but the users opted to provide their usage data to Bing. Bing didn’t somehow hack into Google’s data, and Bing didn’t scrape Google’s search result page. Bing used data provided by their users to influence their rankings, which is a practice they’ve readily admitted is in use.”

    This is the part of the story that needs to get the media’s attention.
    Thank you for the analysis

  24. says

    Thanks for writing this and thus countering the abundance of pro-Google PR spin some bloggers are churning out in an effort to get chummy with Matt Cutts.

    Speaking of which, the chummy relationship between Google’s Matt Cutts and SEL’s Danny Sullivan (the latter even featured with some other SEL writers in Cutts’ webmaster videos) should make anyone reading SEL’s coverage of Google-related matters aware of their heavy pro-Google bias. They claim they criticise Google as often as they defend it, but even a cursory glance at their reporting belies this claim. On all the big issues SEL is Google’s staunchest defender, and the way they’re covering this Google-v-Bing malarkey is so blatantly pro-Google it’s almost sickening: the choice of phrasings, the highlighting of certain facts in favour of others, the way they seem to accept Google’s data at face value and try to undermine Bing’s claims, etc.

    If Google believes it has a legit complaint, they should take it to court. If not, they should shut up.

  25. says

    Ben, I think you’re right. As for your question, is it smart for Bing to be copying results? I say absolutely it is. It’s the perfect strategy for someone who is behind the leader by 40% marketshare. It shows a considerable lack of innovation (Microsoft’s wheelhouse), although its being such a minute factor doesn’t upset me terribly. If Bing wants to eventually become the market leader, it’s time to innovate. If they want to hang on to their 30% marketshare with all their might, copying Google might get them there.

  26. says

    Bing should respect ‘robots.txt’, even for clickstream data. That file exists explicitly to state “The URLs matching these patterns should not be distributed or crawled.” Yet Bing is doing just that.

  27. Ben Cook says

    Coyote, the robots.txt file has nothing to do with click stream data. Blocking a page from being indexed or crawled has nothing to do with the user’s behavior once they are on that page.

    Besides, if you’re blocking the page via robots even click data won’t make them rank your page so it doesn’t really matter if they collect the data.

  28. Ben Cook says

    Leo,
    The click stream data is enough to place a URL at the top of the results. Search queries are passed through URLs which are collected via the click stream data. I would imagine they parse the queries in the URLs and watch what URLs the user then clicks on.

  29. Siddhartha says

    I don’t think your arguments hold up at all. You are talking about these two points:

    1. Low success rate. Do you have any idea how much clickstream data Bing gets? It is unreasonable to assume that all of it will appear in Bing’s index within a few weeks. Given the nature of the experiment, even one hit is sufficient to prove what Google is trying to prove.

    2. The argument is not only about ranking. It’s sad that you didn’t understand that, like most other blogger pundits. In this experiment the keywords had nothing to do with the honeypot page pointed to. If Bing used only its own index, those pages shouldn’t even show up for the query. The fact that they show up in the results means that Bing essentially copied what Google returned in search results for those queries.

  30. Ben Cook says

    Siddhartha, I don’t mind disagreement, but your second point is completely incorrect.

    Bing can track the URLs you’re visiting, so when the Google engineers went to Google and performed the searches, they were passing the search queries to Bing. By then clicking on the first listing, they were indicating to Bing that the URL they were visiting related to the query they had just performed.

  31. co says

    If a page is protected by robots.txt, but the user’s browser says it gets to collect anything the user highlights on the page, does the browser’s company get to store the highlighted text of the page in the crawler database? This situation is analogous. Bing is discovering links via Google results pages, which are not supposed to be crawled, and ranking them according to Google clicks, which is smart (but lazy/defeatist/”cheating”).

  32. Ben Cook says

    Co, who said Bing is pulling in pages that aren’t supposed to be crawled? As far as I saw there was never any mention of the ranking pages being blocked by robots.txt files until mentioned in the comments here. Besides, Google and Bing are both supposed to respect the robots.txt files in the same way.

  33. Don Munsil says

    My first question in all of this is why Google didn’t do the obvious thing and actually test what information was being sent. They went to great lengths to suggest that the “suggested sites” feature is sending the info, when a simple look at the data stream sent by IE would show that it isn’t so. It’s all the Bing Toolbar, which in fact clearly discloses that they can record your clicks anonymously when you install it.

    This blogger (http://projectgus.com/2011/02/bing-google-finding-some-facts/) actually looked at the feed being sent to Microsoft by the Bing Toolbar, and unsurprisingly it’s doing exactly what it said it would do if you opt in to the data collection: it sends the sequence of URLs you visit to Microsoft. The search text for a Google search is part of the URL, so it goes to Microsoft along with the next thing you click on. They are not sending a scrape of the actual Google results, nor is there anything special about Google; they send the same information for every site. So they’re getting signals from all kinds of web searches, not just Google.

    I must say, I find Google’s complaint petty in the extreme. Collecting click data from users is a smart thing to do. If Google isn’t already doing the same thing from their own toolbar (which I’m sure is substantially more popular than Bing’s), then they’re pretty foolish, since I’m pretty sure their privacy policy allows for it.

  34. Ben Cook says

    Don, thanks for the link to that post! Also, I completely agree with you. It’s not only NOT copying, it’s also smart on Bing’s part.

  35. ThinkAboutIt says

    Another thing most people are missing: it doesn’t matter what the rank of the clicked link was. No matter where the clicked link was in terms of ranking/order (served up by Yahoo, Google, Bing, Ask, etc.), the fact that the User chose that specific link from the many offered means something, hence its use as a signal (amongst others).

  36. says

    I respectfully disagree. Let me rephrase what I said in point 2 in my earlier comment.

    Yes – Bing uses clickstream data. But how it is using the clickstream data is of relevance here.

    This is the data that Bing would get from the clickstream data (after parsing the click url and all).

    For a search on keyword hiybbprqag in Google.com, the user clicked on a link pointing to “http://teamonetickets.com”

    Now what does Bing do?

    Since the result is from Google, it seems Bing automatically associates the site http://teamonetickets.com with the keyword hiybbprqag without even verifying if the site contains that term, or, anything relevant to that term. MS execs may call this “signal” and all, but the result is clear:

    After a Google search, any time a user clicks a link, Bing associates that URL with the keyword (regardless of whether the keyword is relevant to the URL or not) and uses that data, in addition to much other data, in serving up its own search results.

    In layman’s terms:

    “Oh, Google is returning this site for this keyword, and people are clicking on it – even though we have no clue why this keyword should be linked to this site. But since Google is saying it is related, and people are clicking on it, it must be related. Let’s associate this keyword to this site, and since this is coming from Google, let’s give it a higher relevance.”

    Bing is essentially using Google’s meticulously created index data to improve its own results.

    Is it smart? Of course this is a smart thing to do.
    Is it stealing? In my opinion, yes. One can be a smart thief, but still a thief.

  37. Steven Fuqua says

    @Siddhartha

    Two things:

    It’s not as straightforward as “oh, this link is from Google, let’s steal the query!”. It’s more generalized and not targeting Google specifically.

    More importantly:
    “Oh, Google is returning this site for this keyword, and people are clicking on it – even though we have no clue why this keyword should be linked to this site. But since Google is saying it is related, and people are clicking on it, it must be related.”

    Bing is trusting the *user* to imply that it’s related. You have 20 engineers performing a search for a term (that has *never* been searched before), and all of them click the same link. The implication there, from an objective standpoint, is that something about that link jumped out to the searchers as “useful”. It doesn’t matter that it came from Google. The fact that multiple people searched for a rare term and were drawn to the same page is an indicator (yes, a “signal”) that there is an association between that page and that term.

    Harping on whether or not the page contains that search term is having tunnel vision and not looking at it from a conceptual standpoint.

  38. ThinkAboutIt says

    @Siddhartha
    To your point on ‘stealing’ Google’s meticulously created index: note that the first thing Bing, Yahoo, Ask, etc. have to build is an indexing infrastructure. That’s the basics, and you just can’t steal your way out of it. Bing could have been slower or faster about getting to the test page, but eventually it would have got there to index it. The ultimate magic sauce on top of this is relevance; that’s what makes it useful to the user for finding things quickly.
    All the innovation and dogfighting currently is focused on the latter. Hence the importance that major search players like Bing give, in their relevance and ranking algorithms, to learning from exactly what the user chose.
    Note that, unlike in the test, the real-world link the user clicks can be at the bottom of the page (or on other pages) and not ranked #1 at all, despite all the fancy advanced algorithms from the search provider. The fact that the user blessed “that one” is where the secret of his/her search intentions lies.

  39. Don Munsil says

    There’s also an implicit misunderstanding in some of the criticisms of Bing in that people point out that a particular web site doesn’t contain the search text. Well, both Google and Bing routinely put links to pages that don’t contain the search term, because they’ve linked the page and the term some other way, for example:

    - The term is contained in the text of one or more hyperlinks to the page.
    - The term is near a hyperlink to the page.
    - The term is used in forum postings that contain hyperlinks to the page.
    - The term is close in spelling to another term which is linked to the page.
    - The term is in the URL itself.

    In this particular case, the search term is contained in the URL the person gets when they run the search. So Microsoft is getting a URL that contains a plain-text search term, followed by another URL for a web page.

    My guess is that this same “honeypot” technique would work just as well for any other search that puts the plain text of the search terms in the URL. Which is a good thing. If people searching for “batteries” on Amazon usually click on one particular item, it seems like a smart thing to incorporate the popularity of that particular linkage into Bing’s search algorithms.
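Don’s point about the search term riding along in the URL can be made concrete. The sketch below is a hypothetical illustration, not Bing’s actual code: it assumes a clickstream of (previous URL, next URL) pairs, and it assumes the search text sits in a query-string parameter named `q`, `query`, or `k` (that parameter list is an assumption). It simply counts how often a search term in one URL is followed by a click to a given page, which is enough to link a made-up word to a page that never contains it.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Hypothetical: query-string parameter names assumed to carry the search text.
QUERY_PARAMS = ("q", "query", "k")

def extract_term(url):
    """Return the plain-text search term found in a URL's query string, or None."""
    params = parse_qs(urlparse(url).query)
    for name in QUERY_PARAMS:
        if name in params:
            return params[name][0].strip().lower()
    return None

def term_click_counts(clickstream):
    """Count (search term, clicked page) pairs from (from_url, to_url) transitions."""
    counts = Counter()
    for from_url, to_url in clickstream:
        term = extract_term(from_url)
        if term:  # skip transitions whose source URL carries no search text
            counts[(term, to_url)] += 1
    return counts

clicks = [
    ("https://search.example.com/search?q=Batteries", "https://shop.example.com/item/1"),
    ("https://search.example.com/search?q=batteries", "https://shop.example.com/item/1"),
    ("https://shop.example.com/item/1", "https://shop.example.com/item/2"),
]
print(term_click_counts(clicks))
```

Nothing in this sketch looks for Google specifically; any site that puts the search text in its URLs would feed the same counts.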

    What Google seems to be saying is that tracking a user’s sequence of clicks and using it as relevance feedback is OK for them to do in their own search engine, and OK for Bing to do in their own search engine, but not OK to do via a toolbar, because along the way Bing might unfairly pick up information that they aren’t entitled to. Which is silly. Assuming people are properly opted in, clearly there’s nothing wrong with asking if people will share their click sequence with you.

    Or maybe they’re saying that tracking clicks is OK, but whenever Microsoft sees a URL from Google, they should ignore it, because looking at it would be unfair? Honestly, I’m really having trouble understanding Google’s complaint, here.

  40. CGomez says

    I believe a larger conversation is: “Users have no idea what you are REALLY doing when they ‘read’ and agree to the EULA for any toolbar, so is it right to track their every click?”

    I’m not really going to answer that, because there are a lot of answers and opinions… opinions that I even hold that are in conflict. That’s what is going on here, right? Users install a toolbar, either explicitly (doubt it) or implicitly (there was a time when my wife installed iTunes and they tried to sneak Google Toolbar on there, which I think is evil). They don’t read EULAs. As we all should know, when a user sees a dialog box, the words on the box and the OK/Cancel buttons are translated in their minds to:

    “Blah blah blah blah blah blah. Blah blah blah blah. Blah blah blahhhhhh blah blah”
    Button: Keep Working/Install Program You Want
    Button: Stop Working/Don’t Get Program You Want.

    With that said, I will address the actual complaint. There is still no evidence Bing is directly copying Google. And that is precisely because we don’t know what control experiments were run, or what their results were.

    Without ever specifically setting out to copy any particular site, you could improve search rankings by simply watching how users who end up at a particular domain or page stay there for long periods of time (either by repeated clicks to pages within the domain or an indication of stopping to read for long periods).

    One such example by itself is pretty useless (perhaps the user walked away for 30 minutes). 400 million such examples might mean “Users who start at xyz.com and query for ‘foo’ end up at xyz.com/foo for a ‘long period of time’”.

    Possible conclusion: Maybe we should return ‘xyz.com/foo’ when people search for foo.

    Obviously oversimplified, but definitely a plausible means of capturing user data from Bing or Google or Ebay or anyone’s toolbar and improving results. Why copy Google specifically when you can just track everyone equally and get even better tuning?
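CGomez’s “watch where users stay” idea can be sketched as a simple aggregation. This is a hypothetical illustration under stated assumptions, not a description of what any toolbar actually does: each record is assumed to be a (query, URL, seconds on page) tuple, and the 60-second “long visit” threshold and the minimum of three observations per URL are made-up values. The minimum count is what discards the lone user who walked away for 30 minutes.

```python
from collections import defaultdict

LONG_VISIT_SECONDS = 60  # hypothetical threshold for a "long period of time"
MIN_VISITS = 3           # hypothetical: ignore URLs with too few observations

def best_result(visits, query):
    """Pick the URL users most often stay on after searching for `query`.

    `visits` is an iterable of (query, url, seconds_on_page) records.
    Returns None if no URL has enough observations.
    """
    dwell = defaultdict(list)
    for q, url, seconds in visits:
        if q == query:
            dwell[url].append(seconds)
    # Score each sufficiently observed URL by how many visits were "long".
    scored = {
        url: sum(s >= LONG_VISIT_SECONDS for s in times)
        for url, times in dwell.items()
        if len(times) >= MIN_VISITS
    }
    if not scored:
        return None
    return max(scored, key=scored.get)

visits = (
    [("foo", "xyz.com/foo", 120)] * 3
    + [("foo", "xyz.com/bar", 1800)]   # one user who walked away: filtered out
    + [("foo", "xyz.com/baz", 5)] * 4  # frequent but short visits: scores 0
)
print(best_result(visits, "foo"))
```

Note that the source of each visit never enters the calculation, which is the commenter’s point: a generic signal like this can favor the same pages a competitor ranks highly without ever singling that competitor out.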

    Do I think what Bing is doing is wrong? I don’t know. I’m not really defending MSFT here. I’m just saying we don’t know if they say “OMG that’s a click after a google page, save it save it save it!” or if a non-specific algorithm can have basically the same effect without specifically setting out to copy Google.

    I think Web Toolbars are wrong. I can give you that much for sure.

  41. Ben Cook says

    Klaus,
    The problem with your solution is that the data isn’t the webmaster’s to control. It’s the user’s data. Unless Bing starts indexing pages that are blocked by robots.txt because of the clickstream data, I don’t see an issue here. Bing is free to collect information from people that have opted to provide it to them. Google (much to their dismay I’m sure) has nothing to do with the matter. Besides, as I and many others have pointed out, Google does the same exact thing, just not (they claim) to Bing.

  42. Frank Zappo says

    As mentioned by another commenter, the problem is not with Bing’s toolbar monitoring which URLs a user clicks/visits – the user agreed to sending that data off when they installed it. The problem is that the toolbar must be ‘snooping’ into what they searched for on a competing search engine in order to link the search term with the URL they click. That is the only way they can link fictional words to pages which have no other relation.
    It may be legal, and it may be an effective way for a younger search engine to gain ground on a bigger one, but it isn’t ethical without their permission. It’s not unlike someone copying this blog post word for word, removing the name and putting their own on the bottom.

  43. dont wanna says

    I read the conversation here and couldn’t help but laugh at the very idea of “ethics” in computing being spoken about as if ethics actually existed anywhere at all in the computer realm. I have to truly laugh out loud at that. There isn’t an honest programmer anywhere on this earth. Once a programmer fully realizes that what he knows about programming computers and directing information discovery is something that hardly anybody on earth knows how to do well, even a little bit, it sinks in that all the “computer users” out there, jumping up and down on the internet like a bunch of little kids laughing and jumping up and down together on a bed, blissfully unaware of what it takes to make a bed and how it might feel if they fell off, and oblivious to all of that and so much more, haven’t any way of knowing how to keep their eyes on what a programmer does. Programmers giggling together in conversations like this, offering the bit of huff and itty bit of puff written here, is a hilarious piece of comedy. Nobody except programmers knows how to do what they do. And they all spend their time trying to think up new ways to make what they do even more confusing, squealing about “security” and “hackers” and “patent violations” to make sure they raise enough fuss to confuse us even more, so they can broadly and without any fear whatsoever do anything they want to do. And that’s all computer, computing, and information management folks do. They work to take all they can from anyone who doesn’t know better. And since that’s just about everyone, they just play with us and take us for all we’ve got, and we have no idea what they’re doing.
    In a world where a nonsense piece of fluff like Instagram, which does little more than fill holes and make things all shiny and ooo, is kicked around for a billion dollars, it’s clear that the people in charge of our information flow are laughing their asses off at those of us who can’t program and don’t know how to figure out what they’re doing behind our backs. We are only vaguely aware that our information-sharing structure is a maze of mirrors, with shiny bells and whistles to fool the users, hiding what our information river really is: a vast system of back doors, where pools of money flow in, shepherded by pillaging bots dressed up to look respectable in their fine EULAs and Terms and Conditions and what-nots.

    There are no rules in what Google and Bing are doing. And the indignation over what our current crop of Robber Barons is doing to us now won’t become fully apparent for another hundred years. Rivers of Trains or Rivers of Information, they’re the same when they’re run by such a tiny few, and without any oversight, since no one has any way of tracking or knowing what they do. Back then, nobody knew what the railroad barons were doing, because only those few could get high enough above it to know more than anyone else, and they kept that knowledge to themselves. Likewise today, nobody has the education to police anything computer people want to do. And what is stealing and what isn’t has clearly yet to be settled in the land of communications exchange. What kind of Superfund will be needed a hundred years from now to clean up the kinds of waste and debris the computer people spew at us now? Useless programs that clog up right when they shouldn’t, and nobody can quite tell they’re doing it on purpose even though they really are programmed that way. And the pollution out on Patent Disaster field is sinking in like vicious toxic waste that is poured more and more anywhere, without a care for the long-term health of our server farms or of what’s filling up our spectrum, until one day it will be a problem on the scale of Los Angeles: an invisible, vast dust bowl of crap filling the air and clogging it up for everyone in the future, with little concern for what’s really best. Cleaning up the programming messes in the future will be hard on the people then. Fighting the patent wars we’re driving them towards today, which we can hardly understand we’re doing now, will be a lot for them to deal with then.

    So, let’s all laugh when somebody says that ethics exist in the Land of Information, because we’re all just taking their word for it. The rest of us cash-on-demand deliverers have no clue, except to see that Instagram went for a billion dollars with little to offer its sale participants except the fun of waving all that money around and laughing that we’re all so stupid we thought it offered something really worthwhile to all its users, who do the little bit it lets one do but wish it were really helpful. But, eh, Apple taught the computer world well: just make it look pretty on the outside, even though its guts are really just filled with the cheapest physical parts that cost less than a buck or two to make and assemble and ship, and the software inside offers little more than distraction and still just lets people talk to each other in a couple of new fancy ways. That’s about all they get for that ridiculous monthly fee and preposterous purchase price. But, like this article suggests, somebody’s gotta pay Google and Bing when they wanna get paid. And they’re squealing about copying just a tiny bit of information that nobody cares about anyway. Except somebody’s gotta pay those patent and copyright fees. Those alone send costs up till a healthy billion dollars gets squeezed down to just a billion dollars when they put the squeeze on everybody. Because those two things are really the only things that can be demonstrated as “real” in court. No judge will give a crap if somebody somewhere copied something and sold it somehow in computer land in a split second, but a patent, ahhh, it has become the dagger to the heart of all who would rather not pollute the information land’s resources, while to the Robber Spreadsheeter Dandy, patents are as strong as their 99-year leases on their chunk of the light spectrum.
    With patents they control the words and the maths that channel their products, which they make ever more twisted and confusing, so that nobody knows what they’re doing when they send what they know and we don’t through the spectrum that they own through the air. That we all share. But we can’t see what they’re sending. With no way of knowing, will the trains ever run on time in the lands where algorithms grow and run free? Savage and wild it is there, though. Ethics! HA!
