I did the rounds to check on the state of various data updates. I'd estimate that the "0.5" (not algorithmic changes, but rather responses to various spam/porn complaints + processing reinclusion requests) should go out sometime this weekend or possibly Monday. There should be a binary push this week to improve a corner case of CJK-related search, and that new binary should have the hooks to turn on the third set of data. Regarding finishing up the second piece of data, there are still two data centers with older data. Those data centers will probably be switched over by Monday. By Monday, 2.5 of the 3.5 things will probably be on.
If you click on it, it's a Google cache of our page from one of their IPs, coming up in the search results! So now Google is hijacking us too ;)? Have no clue how to get rid of it. Last week under the same inurl: there was a URL only result that looked like this:
With a comma in the middle! Tried to remove it with the URL console, but it wouldn't take it because it said it was in the wrong format! E-mailed G support, and they claimed they couldn't find it, and wanted more info. E-mailed them back. It disappeared the next day, but today, it's back again.
Where is all this trash coming from?
I think that in some cases it is penalising the innocent sites that are being scraped.
I have no evidence for this. I'm basing my guess on:
1) people have commented that they're seeing serps with fewer scraper sites.
2) such a filter would be very hard to implement without taking out innocent sites too.
I'm wondering whether having adsense is a factor in this. It might be time to start thinking about how much content on a page is unique.
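For anyone who wants to put a rough number on "how much content on a page is unique", here is a minimal sketch. It is only one common way to compare texts (overlapping word shingles), not anything Google has confirmed using. The function names `shingles` and `unique_fraction` and the shingle size of 3 are assumptions made up for illustration.

```python
def shingles(text, size=3):
    """Return the set of overlapping word sequences of length `size`."""
    words = text.lower().split()
    if len(words) < size:
        # Very short text: treat the whole thing as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def unique_fraction(page_text, other_texts, size=3):
    """Fraction of the page's shingles that appear in none of the other texts."""
    own = shingles(page_text, size)
    if not own:
        return 0.0
    seen = set()
    for other in other_texts:
        seen |= shingles(other, size)
    return len(own - seen) / len(own)
```

A value near 1.0 means almost nothing on the page is repeated in the texts you compared it against (for example your own template or a scraper's copy), and a value near 0.0 means nearly all of it is repeated. Where any cut-off would sit is pure guesswork.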
Google has pretty much every page on my site, and each one ranks well for its intended phrase. However, the homepage does not exist ...
A search for my domain.com returns "Sorry, no information is available for the URL mydomain.com". Now ... why would the site index go MIA but every other page is fine and maintaining their rankings?
All pages had adsense on them.
Pages with more than about 400 words have come back to be fully indexed, although heavily penalized.
Pages with fewer than about 400 words are mostly totally gone; some are URL-only.
I had a lot of pages that were blowups of screen-shot thumbnails. These pages had a paragraph or two describing the screen. I believe not one of these remains indexed in any manner. These pages would be very similar to product pages.
My solution is to go back to how the site was 2 years ago where these small programming example pages were all on a single page with a table of contents (within page links). I have one page remaining like that and it is in the index.
I'm just going to delete the small pages since 95% of them don't exist in G and, while they rank well in Yahoo, no programmers use Yahoo for search.
Time to visit the wayback machine.
Any comments are welcome.
Looks like people have given up on this update.
Yup! It has most certainly been a strange one. I have a new site that initially appeared to dodge the sandbox. It got a bit of PR and I was getting a little bit of traffic from G. A week down the line and G is listing all pages from the site but I can't find them in any useful positions. I still have the PR but almost zero Google traffic. Funny thing is my Yahoo traffic has gone up quite dramatically and is driving far more relevant customers to my site and is converting far better than Google did. I think it's time to ban googlebot and remove my URL from Google ;oP
I think G could be a 'bad neighbourhood' heh heh
Anyway, for me, at the moment Yahoo rocks and Google sucks, so whatever they do, this update has no effect on me except in influencing my new opinion of Google.
On Wednesday night, I stumbled across a firm called Step Forth (sorry, no URLs, why?), but you can find them pretty easily - they're a .com.
They're an SEO firm, and they seem to have a pretty objective understanding of Google. Here's just one comment they are making:
The second major factor is that Google needs to discourage search-marketing consultants (SEO/SEM) from abusing the obvious exploits found in their core-method of sorting and ranking sites, PageRank. While parts of the PageRank formula have changed over the years, the base concept that a link equals a vote has remained the backbone of Google’s ranking algorithm since day one. The simple logic behind PageRank produced highly relevant search engine listings, which were fairly easy to manipulate. In order to prevent gross commercial manipulation, Google has had to add several weights and measures to the evaluation of incoming links, a process that is obviously easier said than done. It also has to tie as many of its features and services together in order to present the best search listings it possibly can.
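To make the "a link equals a vote" idea in that quote concrete, here is a minimal sketch of the simplified, textbook form of PageRank computed by repeated iteration. It is not Google's production algorithm, and it includes none of the extra weights and measures the quote mentions. The damping value of 0.85, the iteration count, and the toy link graph are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly over all pages.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outgoing link passes on an equal share of the "vote".
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over everyone.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: a, b and d all link to c; c links back to a.
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

In this toy graph, `c` ends up with the highest score because three pages link to it, and `a` comes second because its single incoming link is from the high-ranking `c`. That also shows why the scheme is easy to manipulate: adding more pages that link to a target raises its score.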
I may or may not have a problem with scrapers or hijacking, but if I do it's distributed over so many other sites that I have little chance of doing anything about it.