I was reading through several blogs tonight and found this entry at web3.0log …
If we assume that 2000 people world wide spit out an average of 25,000 junk scraped pages each and every day then that is 1.5 billion pages each month or 15-20% of the size of the google index every month.
… suggesting that up to 20% of Google's pages are spam.
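The arithmetic in that quote does check out, for what it's worth. Here's a quick back-of-the-envelope sketch in Python; the 8-billion-page index size is my own assumption (roughly the figure Google publicized in late 2004), not something the post states.

```python
# Back-of-the-envelope check of the web3.0log estimate.
# The index size is an assumption; the other numbers come
# straight from the quoted post.

spammers = 2_000            # people scraping, per the quote
pages_per_day = 25_000      # junk pages each produces daily
days_per_month = 30

pages_per_month = spammers * pages_per_day * days_per_month
index_size = 8_000_000_000  # assumed Google index size (circa 2004-2005)

print(f"{pages_per_month:,} junk pages per month")         # 1,500,000,000
print(f"{pages_per_month / index_size:.0%} of the index")  # ~19%
```

So 1.5 billion pages a month lands at roughly 19% of an 8-billion-page index, which is where the 15-20% range comes from.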
Now, I posted a reply to that post, and what I said is what I feel: I don't think it's Google's fault. There are some sneaky people out there, many of them trying to take advantage of Google. To its credit, Google does catch a lot of them, blacklists them, and removes the false and fraudulent entries from its index.
1.5 billion pages, though? I'm not sure I can believe Google would let that much spam through. My comment actually focused on something I noticed a year back: many of the bulletin board entries I click on are duplicated on other bulletin board sites. That might not be what is strictly classified as 'scraping', but it is duplicated content.