
Google SEO News and Discussion Forum

    
Webmaster Tools is Going Nuts - or maybe I am
webdude - msg:3745031 - 2:39 pm on Sep 15, 2008 (gmt 0)

Please stick with me... this is probably going to get long. I have been having some very odd problems in Webmaster Tools and would like to get some advice from some of you with more knowledge than me (probably most of you out there). I have made the examples as simple as I can to get my points across.

History:
Two and a half years ago, I decided to take individual forum files and group them under one filename. Back then, each forum was named separately... i.e...
this-forum.ext
that-forum.ext
another-forum.ext
etc.

Each of these files would pass arguments to pull up a list of posts... i.e...
this-forum.ext?topic=1
that-forum.ext?topic=2
another-forum.ext?topic=3
etc.

Very simple. I decided to combine all the posts under a single filename. I knew there would be fallout from this, but at the time I didn't care. The forum was relatively busy and I was changing it for easier administration. So now the new filename was just forum.ext, and I broke each forum out like this...
forum.ext?topic=1&ForumNum=1
forum.ext?topic=2&ForumNum=2
forum.ext?topic=3&ForumNum=3
Still, very simple (pay attention to case and order of arguments).

Okay, things went well after that. The old forum pages were dropped and now returned 404s, the new forum pages were crawled, and everything looked good. The forum dropped in the SERPs for about 4 months and then started to come back. I had set up titles and meta descriptions to avoid any duplicate content, like this...
title - <topic> <forumName>
description - <snip of first 100 characters of post> <forumName>
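
For illustration, a minimal sketch of how titles and descriptions along those lines might be put together (the function and field names here are hypothetical, not my actual code):

    # Hypothetical sketch: build a distinct title and meta description per topic page.
    def build_meta(topic_title, forum_name, first_post_text):
        title = f"{topic_title} {forum_name}"
        # The first 100 characters of the opening post plus the forum name
        # keep descriptions different from topic to topic.
        snippet = first_post_text[:100].strip()
        description = f"{snippet} {forum_name}"
        return title, description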

Everything went well, and at the end of 2007 there were 13,500 forum pages in the index, which was about 50% of the site's pages. All of this was done without a robots.txt file or deleting URLs through Google Webmaster Tools.

In April 2008, I started seeing a drop in the number of pages indexed. I really didn't know why. I read through other posts on WebmasterWorld about how the site: operator was flawed anyway, so I didn't pay it close attention. GWT still looked good, rankings were good, and everything looked okay to me.

In June 2008, I was shocked to see in GWT that the site had over 6500 pages with duplicate titles and meta descriptions. As far as I could tell, URLs were getting muxed somehow. I posted about the dilemma here on WebmasterWorld in the Googlebot Now Crawls via HTML Forms thread located here...
[webmasterworld.com...]

What has happened is that Google has muxed the URLs like this...
forum.ext?topic=1&ForumNum=1
forum.ext?ForumNum=1&topic=1
forum.ext?forumNum=1&topic=1
forum.ext?forumnum=1&topic=1
forum.ext?topic=1&forumnum=1
Forum.ext?forumnum=1&topic=1
Forum.ext?topic=1&forumnum=1

All of these URLs would point to the same page. Case was randomly changed and the order of the arguments was changed. Also, I started getting a bunch of pages that didn't exist, like so...
forum.ext?topic=1&ForumNum=18

There is no ForumNum=18

Needless to say, I was flabbergasted as to what was going on. I searched as much as I could to see if these URLs existed anywhere on the Web. I could not find them. I then thought this was a programming error, but I checked, double checked, triple checked and quadruple checked my code, and there is no way these URLs are being generated on my end. All log files on the server showed that only Googlebot was trying to load these pages.

So I started to take some preventive measures to see if I could fix the problem. All muxed URLs now return a 404. I added every combination of muxed URLs to the robots.txt file. I deleted as many of the muxed URLs as I could in GWT and life went on. GWT shows that duplicated titles are now down to 14 and duplicated descriptions down to 4. These are actually posts that have the same topic and very similar content, so I am willing to let those go.
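
To give an idea of what that check looks like, here is a minimal sketch, assuming the canonical form is topic first, then ForumNum, with exact case (the function name and everything around it are illustrative, not my actual code):

    # Hypothetical sketch: treat anything but the canonical query string as a 404.
    from urllib.parse import parse_qsl

    VALID_FORUMS = set(range(1, 12))  # ForumNum=1 through ForumNum=11 exist

    def is_canonical(query_string):
        pairs = parse_qsl(query_string, keep_blank_values=True)
        # Exact case and exact order: topic first, ForumNum second.
        # (The Forum.ext vs forum.ext filename case is a separate path check.)
        if len(pairs) != 2 or pairs[0][0] != "topic" or pairs[1][0] != "ForumNum":
            return False
        try:
            return int(pairs[1][1]) in VALID_FORUMS
        except ValueError:
            return False

    # is_canonical("topic=1&ForumNum=1")  -> True
    # is_canonical("ForumNum=1&topic=1")  -> False (reordered)
    # is_canonical("topic=1&forumnum=1")  -> False (wrong case)
    # is_canonical("topic=1&ForumNum=18") -> False (forum doesn't exist)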

Yesterday, I logged into GWT and started going through internal links and guess what! It shows old forum filenames in there... huh? Where did those come from? I clicked the links to see if I could find them, and they are not on my pages. So now I have...
this-forum.ext?topic=1
that-forum.ext?topic=2
another-forum.ext?topic=3
back in the internal links, though these pages have not existed for two years. Not only that, the pages these links are supposedly on are from a redesigned site with a single forum file AND are pages that did not exist back when the original forum was broken up into multiple filenames.

So now I have added these filenames to robots.txt, but I am at my wit's end trying to figure out what the heck is going on. It seems every time I try to fix something, it just gets worse. Either there is something broken on Google's end or someone is trying really hard to bring the site down. And I cannot find any reason these pages even exist on the Web.
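
For reference, a rough sketch of what those robots.txt additions amount to (written as a small Python snippet so the rules are easy to see; the paths are the placeholder names from above):

    # Hypothetical sketch: block the long-retired forum filenames in robots.txt.
    retired = ["/this-forum.ext", "/that-forum.ext", "/another-forum.ext"]

    rules = ["User-agent: *"] + [f"Disallow: {path}" for path in retired]
    with open("robots.txt", "w") as f:
        f.write("\n".join(rules) + "\n")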

Now the site is down to 8050 pages indexed (though I know the site: operator can be flawed). It seems Googlebot is only doing deep crawls about once a month (this has been going on for the past 2 months - it used to get deep crawled at least twice a week), and everything I look at in GWT seems to be bogus. The funny thing, though, is that the ranking of the site has never been better. #1s are still #1, and a few other terms actually went up a bit. Another thing: even though there are so few deep crawls, new forum topics are getting added to the index within 24 hours in some cases. I am totally baffled as to what is going on...

Is anyone else seeing this type of behavior? Am I going nuts? Today I actually found an archived version of the site (not even in the root of the server) and deleted it... that's how nuts I am getting. And then I started thinking about sites like the Wayback Machine... could G have picked up on something like that?

Anyway... thanks for reading...

 

tedster - msg:3745105 - 4:17 pm on Sep 15, 2008 (gmt 0)

What HTTP response code does your server give when you request one of those old-style URLs?

webdude - msg:3745116 - 4:32 pm on Sep 15, 2008 (gmt 0)

All requests for muxed pages return a 404... tested, tested some more, and then retested. The filenames that showed up in the internal links page of GWT have not existed for over 2 years, so they, of course, are returning a 404. That's the really crazy part.

I have gone on a hunt to find these URLs... nothing. I cannot find them anywhere on the Web. Tried Yahoo link:, directories, old stuff, new stuff... nothing - nada. Even did an exhaustive text search on the server... nothing.

The Wayback Machine, though... I can view my site back to 2001. But that is the only place I can find any reference to these old filenames, and when I click any of these old links, they return a 404.
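
If anyone wants to run the same kind of spot check, something along these lines works (example.com and the paths are placeholders, not my real URLs):

    # Hypothetical sketch: spot-check the HTTP status codes returned for suspect URLs.
    import urllib.error
    import urllib.request

    urls = [
        "http://www.example.com/this-forum.ext?topic=1",        # old-style filename
        "http://www.example.com/forum.ext?ForumNum=1&topic=1",  # muxed argument order
    ]

    for url in urls:
        try:
            with urllib.request.urlopen(url) as resp:
                print(resp.getcode(), url)
        except urllib.error.HTTPError as e:
            print(e.code, url)  # 404, 410, etc.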

g1smd - msg:3745142 - 4:59 pm on Sep 15, 2008 (gmt 0)

Google does recheck URLs previously returning a 404 status for a very long time, possibly forever, just in case the URL comes back into use.

As long as the URL still returns the 404 status, there is nothing much to worry about.

You might try returning a 410 and see if that affects anything. Check again early in 2009.

URLs in WMT are URLs that Google "knows" about, current or historical. That's a very different thing to a URL appearing in the SERPs.
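
If you do experiment with the 410, a bare-bones sketch of the idea using Python's standard library (the retired paths are placeholders; a real site would serve its normal pages for everything else):

    # Hypothetical sketch: answer long-retired forum URLs with 410 Gone.
    from wsgiref.simple_server import make_server

    RETIRED = ("/this-forum.ext", "/that-forum.ext", "/another-forum.ext")

    def app(environ, start_response):
        if environ.get("PATH_INFO") in RETIRED:
            start_response("410 Gone", [("Content-Type", "text/plain")])
            return [b"This page has been permanently removed."]
        # Everything else would be handled by the normal site code;
        # a plain 404 stands in for that here.
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found."]

    # make_server("", 8000, app).serve_forever()  # run locally to test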

webdude - msg:3745215 - 6:32 pm on Sep 15, 2008 (gmt 0)

URLs in WMT are URLs that Google "knows" about, current or historical. That's a very different thing to a URL appearing in the SERPs.

Would this be true of duplicated titles and descriptions? Remember that the duplication mess was for URLs that didn't exist. In fact, these URLs NEVER existed throughout the life of the site. To rearrange the arguments, or just to add values for arguments, and then start showing all this dupe content as a problem seems pretty bogus...
What has happened is that Google has muxed the URLs like this...
forum.ext?topic=1&ForumNum=1
forum.ext?ForumNum=1&topic=1
forum.ext?forumNum=1&topic=1
forum.ext?forumnum=1&topic=1
forum.ext?topic=1&forumnum=1
Forum.ext?forumnum=1&topic=1
Forum.ext?topic=1&forumnum=1

All of these URLs would point to the same page. Case was randomly changed and the order of the arguments was changed. Also, I started getting a bunch of pages that didn't exist, like so...
forum.ext?topic=1&ForumNum=18

There is no ForumNum=18


By the way...
ForumNum=1 through ForumNum=11 are legitimate. I have seen bogus values going up to ForumNum=22; none of those ever existed.

webdude - msg:3745768 - 5:06 pm on Sep 16, 2008 (gmt 0)

Here's a new one...

As of today, all the dupe content was gone in GWT EXCEPT for the following...

forum.ext
forum.ext?_

This was not there yesterday. It's not on my site. It's not on any links. Now it seems the main forum page is duplicated.

I give up...

g1smd - msg:3745786 - 5:42 pm on Sep 16, 2008 (gmt 0)

Make sure the duplicates return 301 or 404.

Once that's done, it doesn't matter what is listed in WMT.
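
For the 301 route, the idea is to rebuild the one canonical query string and redirect to it; a rough sketch (parameter names as in the examples earlier in the thread, the helper name is made up):

    # Hypothetical sketch: map any muxed variant back to the single canonical URL.
    from urllib.parse import parse_qsl

    def canonical_location(query_string):
        # Lower-case the parameter names for lookup, then rebuild the query
        # string in the one accepted form: topic first, ForumNum second.
        params = {k.lower(): v for k, v in parse_qsl(query_string)}
        if "topic" not in params or "forumnum" not in params:
            return None  # not redirectable; let it fall through to a 404
        return "/forum.ext?topic={}&ForumNum={}".format(params["topic"], params["forumnum"])

    # canonical_location("forumnum=1&topic=1") -> "/forum.ext?topic=1&ForumNum=1"
    # The handler would then send "301 Moved Permanently" with Location set to that value.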

webdude - msg:3745806 - 6:07 pm on Sep 16, 2008 (gmt 0)

That's exactly what I have been doing for the past several weeks. The problem is that every time I plug up a hole with a 404, a new one pops up.

g1smd - msg:3745845 - 6:53 pm on Sep 16, 2008 (gmt 0)

Oh yes. Every time you deny Google access to something, they'll dig out another URL from their database of stuff they have already found - one that you haven't blocked - and start listing that instead. Google's public index represents only a very small fraction of all the URLs that they actually know about. They keep backups of everything, as they want to be able to plug those holes in their public listings when various pages and sites disappear or simply deny access.

webdude - msg:3745847 - 6:58 pm on Sep 16, 2008 (gmt 0)

But what about URLs that have never existed? I'm stumped.

g1smd - msg:3745851 - 7:16 pm on Sep 16, 2008 (gmt 0)

Either something somewhere allowed those URLs to appear in some HTML source code, or Google was "creative" in looking for more content...

webdude - msg:3745852 - 7:20 pm on Sep 16, 2008 (gmt 0)

It's the word "creative" that's got me worried... it seems that G is being very creative.

Of all the pages that could "fill those holes" left by all the 404s I've got, you would figure that G would try to use legitimate URLs.
