
Home / Forums Index / Google / Google SEO News and Discussion
Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

This 39 message thread spans 2 pages; this is page 2.
Some research on Google quality raters' behavior

 7:28 am on Sep 8, 2011 (gmt 0)

So, I decided to check which pages they looked at before banning (de-indexing) one of my sites. I was hoping they'd managed to find something really, really bad on my site; if I could find it by following their tracks and remove that really, really bad thing, my subsequent reconsideration requests would be more successful. One such request has already been rejected.

The tool used in the study: awk [the-art-of-web.com]. Got some great samples from that site.
Anyhow, after much awking, I worked out the code that seems to be grabbing the very logs I'm looking for: a visit from the Plex (by IP) that was an actual browser and not a bot. Also, I used superclown2's idea from here [webmasterworld.com] that the raters are coming in on Macs.

So I ran this on my August logs (both ban and recon request were last month):

awk -F'"' '($6 ~ /Macintosh/)' *.com | awk '($1 ~ /^(70\.90\.219|70\.89\.39|70\.32\.|64\.233\.|216\.239\.|209\.85\.|199\.87\.|173\.194\.|74\.125\.|72\.14\.|66\.249\.|66\.102\.)/)' > ~/google_visits_on_Mac_most_IPs.txt

(It should be a single one-line command. Run it in your ~/access-logs directory; the resulting selection of human visits from Google will end up in your home directory as ~/google_visits_on_Mac_most_IPs.txt.)
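For anyone who wants to sanity-check the pipeline before pointing it at real logs, here's a minimal dry run against a single made-up combined-format log line (the IP, URL, and user-agent below are invented for illustration):

```shell
# Dry run of the same two-stage filter against one fabricated log line.
# After splitting on '"', field 6 is the user-agent; field 1 of the
# unsplit line is the client IP.
printf '%s\n' '216.239.45.10 - - [08/Aug/2011:07:28:00 +0000] "GET /thread123.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8)"' \
  | awk -F'"' '($6 ~ /Macintosh/)' \
  | awk '($1 ~ /^216\.239\./)'
# prints the line back: a Mac browser hit from a Google subnet
```

If nothing comes out when you run the real command, check whether your log format puts the user-agent somewhere other than the sixth quote-delimited field.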

So, anyhow, I was able to see a visit about one day (roughly 25 hours) before each of the two events: the ban and the response to my recon request. Both were from the 216.239.x.x subnet, although earlier there had been hits from other Google networks, too.

I was rather disappointed to see that before banning the site, the rater visited a very drab and ordinary page. It was not the smoking gun I was looking for: no incriminating evidence of a hacker break-in or anything of the sort. Also disappointing is the fact that they visited one page only. I can't tell how long they stayed on the page, but can you make such a drastic decision about a 400,000+ page site by looking at just one of those pages?

Even more disappointing is the way they treated the reconsideration request. A person came in and, again, looked at only a single page. Only this time it was simply the homepage. My site is a forum, so the homepage contains pretty much nothing but a list of the most recent threads. At least the page they looked at before banning was representative of the layout (including the ads layout, which I hear they hate so much now). The only conclusion they could possibly have drawn by looking at the homepage while weighing my reconsideration request was that the site's still up. Apparently, that was enough to reject the request.

Anyway, hard data confirms it: your livelihood is in the hands of a typical overworked, uninterested American (by IP geo) corporate employee. No surprise here...

Anyone want to fill in here about what raters are looking at on your site(s)?



 3:52 pm on Sep 27, 2011 (gmt 0)

(I would so jump on that if I were you)


 4:50 pm on Sep 27, 2011 (gmt 0)

I want to see the site!
Review My Site
The rules of that forum require me to review two sites by others, and I don't feel qualified; I have enough confusion about my own. I can PM you the URL if you like, just to remove the suspense. Also, I've had the experience of submitting a site to the Google Help forum for review, and that turned out to be just a beating that got me nowhere in terms of constructive changes I could make. I learned that I have no Web design skills (as if I didn't know that) and that I have too many ads (fair), but as far as why the ban happened, that's still an open question.

Still, I started this thread to talk about the Google review process itself, at least those technical parts of it that can be quantified, such as non-bot visits from Google networks. However, this thread keeps detouring into reviews of my own sites. I don't quite know what to make of that, but it looks like many people think the review is a piece of cake: anyone can do it, even though the exact parameters are not known. In other words, the expectation is that if you land on a junk site that has no business being in Google's index, you'll know instantly that it's a junk site.

I guess yet another way of putting it is: there is no review per se. It's purely a visceral response to certain attributes of a limited number of pages (just one page is also a possibility), and that response has the potential to kill the entire site.

I'm not trying to argue with this point of view, even though I myself hang around a lot of 90s-era hobby sites and have no qualms about the looks as long as the information is there. But I am very interested in how this review process goes and what I can do in the future to prevent my other sites (some of which don't exist yet) from being banned by a rushed reviewer. So I'm trying to learn what those damning attributes are. I just thought that knowing what pages they look at could give me a clue.

Can I just reiterate that for the purpose of this thread there is no question whether the ban was justified in the first place? Just assume that my banned sites are trash. If you are still interested in the topic after that, I'll be ecstatic to see your comments here!


 5:05 pm on Sep 27, 2011 (gmt 0)

See if there are any hits on the page "this_page_should_not_exist.fake"; then you have the IP. It seems they sometimes check for a proper 404 response using this filename.

Not this exact URL, but Googlebot does probe every couple of days for
/noexist_googlesiteid_.htm, where "_googlesiteid_" is the same as in my WMT verification URL /google_googlesiteid_.html
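Pulling the requester IPs for that probe page out of the logs is a one-liner. Here's a sketch run against two fabricated combined-format lines (the IPs and paths are invented for illustration); swap the printf for your real access log:

```shell
# Print the client IP (field 1) of every request for the fake-404 probe
# URL. The two sample lines are fabricated; use 'cat access.log' instead
# of printf on real data.
printf '%s\n' \
  '66.249.65.1 - - [01/Sep/2011:10:00:00 +0000] "GET /this_page_should_not_exist.fake HTTP/1.1" 404 512 "-" "Mozilla/5.0"' \
  '10.0.0.1 - - [01/Sep/2011:10:00:05 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"' \
  | awk '/GET \/this_page_should_not_exist\.fake/ {print $1}'
# prints: 66.249.65.1
```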

Come to think of it: I'm looking at this 404 page on my site, and it meta-refresh redirects to the homepage with a 1-second delay. Dumb move? Could this be read as 301-redirecting any junk pages, spam and DMCA'd content included, to the site's homepage?


 5:18 pm on Sep 27, 2011 (gmt 0)

I'm looking at this 404 page on my site and it meta-refresh redirects with 1 second delay to the homepage.

I do believe both Google and YaBing! interpret that directive as a 301.


 5:27 pm on Sep 27, 2011 (gmt 0)

I do believe Google interprets that directive as a 301.
Thanks, Pageone! Well, what do you know: this 404.shtml has been on the server for at least the last four years. Did the critical mass of junk redirecting to the homepage finally reach the banning point? Fixed that; looks like another recon request is in order...

This looks like a technicality Google WMT could/would notify the webmaster about, no? I remember a wave of Soft-404 errors in WMT about a year ago that came and went quickly. It looks like the same thing to me... Anyhow, we'll see how this pans out in another month or so.
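For anyone wanting a quick self-check for this class of problem, you can grep static error pages for a meta refresh tag. The sample below first writes a hypothetical soft-404 body to /tmp so the command has something to match; point the grep at your real 404.shtml instead:

```shell
# Create a sample soft-404 body (hypothetical content), then count
# meta-refresh tags in it. A count above 0 means the "404" page quietly
# forwards visitors - and possibly crawlers - to another URL.
cat > /tmp/sample_404.shtml <<'EOF'
<html><head>
<meta http-equiv="refresh" content="1;url=http://www.example.com/">
</head><body>Page not found.</body></html>
EOF
grep -ci 'http-equiv="refresh"' /tmp/sample_404.shtml
# prints: 1
```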



 5:52 pm on Sep 27, 2011 (gmt 0)

You and I had this discussion over a year ago!

Google Displaying 'Soft 404' Errors in Webmaster Tools
Jun 6, 2010 - [WebmasterWorld.com...]

I wonder what the introduction of the META Refresh Element would do in this instance? Any ideas? For example, a page that returns a 404 Status and then has a 1 second refresh to the home page.


 7:11 pm on Sep 27, 2011 (gmt 0)

You and I had this discussion over a year ago!
Darn it, Pageone, I envy your memory! I do remember now that as a result of that exchange I changed the CMS to return better 404s (410s, in my case) for all plausible URLs, i.e. those that could have been content pages but whose record cannot be found in the DB (most likely deleted spam). What I forgot to check were implausible URLs: those that could not be formed into a proper DB lookup, so the CMS could not handle them. Those URLs received the system-default 404 page that, it turns out, had a meta refresh to the homepage in the body returned after the 404 header.

Soft 404 reports in WMT disappeared soon after those messages were posted, and I moved on to the next issue du jour.

Anyway, thanks for the tip! Looks like I've got some work to do now double-checking the 404/410 handling rather than catching those elusive Google raters :)


 7:52 pm on Sep 27, 2011 (gmt 0)

By the way, thank nettulf for the hint...

See if there are any hits on the page "this_page_should_not_exist.fake"; then you have the IP. It seems they sometimes check for a proper 404 response using this filename.


 8:13 pm on Sep 27, 2011 (gmt 0)

Yes indeed, a proper credit is in order: thank you nettulf for nudging me into what promises to be a much more fruitful direction!
