Forum Moderators: Robert Charlton & goodroi


site: operator shows internal page as the first listing - what does this mean?


helpnow

5:08 pm on Oct 1, 2009 (gmt 0)

10+ Year Member



When I do a site:mysite.com search, I would expect the first listing to be my home page, mysite.com. But it's not. It gives me second-tier pages, e.g. mysite.com/contact.htm, etc.

I do not see this "issue" with any other sites I spot check - their home page is always the first listing for their site:theirsite.com searches.

So, should I assume there is a "problem" somehow with my site?

The reason I am focused on trying to see "what is wrong" is that my rankings have been slightly reduced 3 times in the past 8 weeks, most recently on Monday (they came back in 2 days twice, and once it took 8 days). By reduced, I mean organic traffic is down by 50% overall, and I can see instances where rankings have been muffled, like a former page 1 #1 ranking is now page 2 #1.

Anyway, I know something is amiss. (<grin> Please don't tell me rankings shuffle, I know that, the fact is, overall traffic is down by about 50%, so something has occurred on a macro level....)

My question really is: is there a hint to the question of "why have my rankings been damaged?" in the fact that my home page does not show up when I do a site:mysite.com search? In fact, I can't see my home page anywhere! Just 40,000+ other pages from my site.

I even used to have sitelinks for my site - now they're gone. Also, when I do a standard "mysite" search, I still get a double indented result, with 2 close-to-top-level pages from my site, neither of which are my homepage, and the sitelinks are gone. In webmaster tools, I still have the sitelinks listed...

tedster

12:43 am on Oct 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've seen the home page (domain root) disappear from the site: operator results for a number of reasons. A data glitch at Google is one possibility. A repeated spidering problem when googlebot asks for the page is another.

And I'm also sure other scenarios exist - have you checked other data centers, explicitly using their IP address? How about the results on Caffeine?

helpnow

1:12 am on Oct 2, 2009 (gmt 0)

10+ Year Member



Using 64.233.179.104 I get different results, but no home page.

Using www2.sandbox.google.com I get different results, but no home page.

As for spidering, I have downloaded all my errors from Webmaster tools, and my home page is not there.

However, I have 2 possible concerns, the only things I can come up with:

1. I did have a batch of URLs which I 301ed into my home page. Could that be causing an issue? I thought with a 301, the old URL would die. You see, when I search site:mysite.com, these old URLs show up, but now, even worse, the content is my home page - thousands of them - this confuses me because I thought the 301 would kill the URL. I worry that I have caused all sorts of duplicate content on my home page. On top of 301ing these URLs, I also blocked them in robots.txt - though now I wonder if this simply resulted in googlebot never seeing my 301 redirect, because it was blocked from requesting the URL at all - dunno. (I have today changed this strategy: I removed the 301 in my httpd.conf, I removed the robots.txt block, and instead I added a meta noindex tag to these pages. My hope/understanding is that the URLs will exist as normal (which I need for my site to function, but do not want Google indexing), but the noindex will pull them from the search results.)
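For reference, the kind of robots meta tag described above goes in the head of each page you want dropped (the markup is a minimal sketch; the page it sits on is hypothetical):

```html
<!-- Sketch: served normally to visitors, but asking engines not to
     index the page. This only works if the URL is NOT blocked in
     robots.txt, since a crawler must fetch the page to see the tag. -->
<head>
  <meta name="robots" content="noindex">
</head>
```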

2. The only other issue I had a while ago was a http://www.mysite.com:443 URL that somehow got indexed, and was actually pulling up an error page. I discovered it about a month ago, not sure how long it existed. I fixed that as much as I could, so now it pulls up the home page properly, but I have been unsuccessful in rewriting the 443 out of the URL. That troubles me, but I don't know if it is fatal. The point is, for a while, the 443 URL was actually the _first_ (and only home page) result in my site:mysite.com results! Now, it has disappeared from those results, though I know the URL could feasibly still exist.
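One way to collapse a stray port-qualified hostname like www.mysite.com:443 back to a single canonical URL is a redirect keyed on the Host header. This is a sketch only, assuming Apache with mod_rewrite; the hostname comes from the post above:

```apache
# Sketch: any request whose Host header is not exactly www.mysite.com
# (including "www.mysite.com:443") gets a 301 to the canonical hostname.
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.mysite\.com$ [NC]
RewriteRule ^/?(.*)$ http://www.mysite.com/$1 [R=301,L]
```

The `^/?` in the pattern keeps the rule working whether it lives in the server config or in an .htaccess file.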

[edited by: helpnow at 1:21 am (utc) on Oct. 2, 2009]

helpnow

1:15 am on Oct 2, 2009 (gmt 0)

10+ Year Member



P.S. thanks, tedster, for your help once again!

tedster

2:12 am on Oct 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Let us know how things work out. That 301 tangle could have been creating a problem, depending on a number of other trust-related factors - hope your home page comes back soon, and with sitelinks.

helpnow

2:32 am on Oct 2, 2009 (gmt 0)

10+ Year Member



So you think the 301 could have been the problem? I'm still confused: I thought a 301 would have killed the URL!

This is what google says about a meta noindex tag:

A noindex meta tag. When we see a noindex meta tag on a page, Google will completely drop the page from our search results, even if other pages link to it. If the content is currently in our index, we will remove it after the next time we crawl it.

Quite frankly, this is exactly what I understood a 301 would do too, no?

So, if the 301 caused my home page to disappear from the site: operator, then the root cause was that I had thousands of URLs all with the same content as my home page?

tedster

2:36 am on Oct 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You didn't say there was "a" 301 - but rather "a batch of URLs which I 301ed into my home page".

So here's my question - how many pages can realistically be taken offline and "replaced" by the content of one page? That's where I'm thinking there could be a trust issue.

helpnow

2:54 am on Oct 2, 2009 (gmt 0)

10+ Year Member



Well, quite frankly, the issue was an "Add to wish list" link that I had on all my product pages. The page that came up when you clicked "Add to wish list" was obviously very similar, just a different product name, with a different product id number in the URL for each. So, lots of URLs, all with basically the same content. I realized years ago that this was "duplicate content", so, <LOL>, I 301ed all of them into my home page if it was a spider asking for the page. Seemed to work fine. <shrugging> I thought the 301 would kill them. I guess it didn't - it seems that all of the URLs continued to exist, but now they took on my home page content as their content. But I've had that 301 in place for years. One of those things that I assumed had been properly addressed.

So, you're saying that this really could have been the cause of the site: operator issue? On top of this, I also had my rankings damaged this week, which is why I've been digging around.

I'm getting the feeling as this conversation progresses that this is all related: lots of URLs 301ed to the home page, home page disappears from the site: operator, rankings get damaged...

So, when the heck am I supposed to use a 301? Sounds like the meta noindex is the ultimate tool, certainly in this situation! If I am reading you right, I should NEVER have used the 301 for this, and it finally caught up to me! It sounds like I should just use the 301 to REWRITE bad URLs, but not to remove URLs from the index.

tedster

3:30 am on Oct 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sounds like you've got it. Use a robots.txt disallow or a robots meta noindex tag to get URLs out of Google. Use a 301 when the same or similar content is now available at a different URL.
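A 301 in that second sense might look like this in Apache config (a sketch, assuming mod_alias is available; the paths are hypothetical):

```apache
# Sketch: the page genuinely moved, so the old URL forwards its
# visitors (and its accumulated links) to the one new location.
Redirect 301 /old-widget-page.html /new-widget-page.html
```

The distinguishing feature is a one-to-one mapping from old URL to a real replacement page, not many URLs funneled into one.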

helpnow

1:09 pm on Oct 2, 2009 (gmt 0)

10+ Year Member



Actually, google says this on robots:

A robots.txt file restricts access to your site by search engine robots that crawl the web. (Note, however, that while Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web.)

So, from what I can see now, the only SURE way to get URLs out, is with noindex. noindex will get them out eventually, when google next crawls that page... to get the URLs out NOW, you can use google's remove tool [google.com...]

From what I can see, robots.txt is an undependable tool, especially if the URLs already exist. If they don't exist yet, and you are being proactive, well, robots.txt will get the job done until the URLs slip out. And in my experience, over the years, the URLs will slip out eventually somehow. So, better you use noindex and be 100% sure. Assuming the spider in question honors noindex. ; )

But as I analyze it all, I can't see what value robots.txt has anymore - better you use noindex to be sure. I'm clearing mine out and moving everything to noindex. Because from what I can see, if you put in a noindex, but still have it blocked in robots, the spider will never get to see the noindex, and you will be trapped in a grey zone.
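The trap described above can be made concrete with a small robots.txt fragment (the path is hypothetical):

```text
# Sketch: while this disallow is in place, compliant crawlers never
# request /wishlist/ URLs at all - so a noindex meta tag placed on
# those pages is never fetched and never seen.
User-agent: *
Disallow: /wishlist/
```

Removing the Disallow rule is what lets the crawler reach the pages again and act on the noindex.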

helpnow

1:10 pm on Oct 2, 2009 (gmt 0)

10+ Year Member



P.S. My rankings are back this morning. Hard to believe my fixes from just last night on this issue were the fix, but who knows! Hard to believe that google can assess good intentions that fast, and reward accordingly. ; ) My home page is still not present in site:mysite.com, though. I'll report back over the next few days if it does show up. I suppose there is a possibility that if my 301 situation was the problem, it may take a few days for those URLs to disappear fully, and for my home page to come back.

tedster

3:07 pm on Oct 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



From what I can see, robots is an undependable tool, especially if the URLs already exist.

A robots.txt disallow rule tells the spider not even to request the URL. But a noindex robots meta tag can only be seen if the page gets spidered in the first place.

For an already indexed URL, a robots.txt disallow may get a slow response - but you can always speed that along by doing a URL Removal Request.

helpnow

1:32 am on Oct 20, 2009 (gmt 0)

10+ Year Member



So... Let's say that I have some duplicate content. Let's say I have 100 pages which are duplicates. Let's say I have 301ed them all into a 404.html page, forcing a 404 code in the header.

Should I wait for the 301s to all get crawled, or can I just go ahead and delete all the URLs with the URL Removal Request? My concern is, will Google "remember" the URLs and the duplicate content they had? Is it important to wait for them to all get recrawled with the 301 into the 404.html, thereby removing them as duplicate content?

When I do a site:mysite.com, let's say about 50 of them still show the duplicate content, and about 50 of them are listed with no title or description, suggesting they have been crawled to the 404.html page.

Why are they still displaying in the site: operator? Will they ever disappear, or do I have to block them in robots.txt, or URL Removal Request them? I thought they would disappear once they hit the 404 code, no? Am I right in assuming that because some of them are now displaying with no titles/descs in site:mysite.com, the crawl has updated them to the 404 page?

Yes, I have this problem. I have almost 5000 pages which mistakenly took on the content of my home page. I have fixed the pages, they are slowly losing the home page content in site:mysite.com.

So, now, I've lost my nerve - I am unsure if I should just relax and wait for the crawl, or if there is something else I need to do to cement the fix / could do to speed things up. I am nervous about the URL Removal Request because I worry that I will kill the URLs before Google has a chance to fix my home page, and that it will never be able to fix the home page if I kill the URLs too soon.
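As an aside on the setup described above: the 301-into-404.html chain can be skipped entirely, since the dead URLs can answer with a gone status directly. A sketch, assuming Apache with mod_rewrite; the URL pattern is hypothetical:

```apache
# Sketch: answer removed wishlist URLs with "410 Gone" in one step,
# rather than 301-redirecting them to a page that then emits a 404.
RewriteEngine On
RewriteRule ^/?wishlist/ - [G,L]
```

A single, unambiguous 410 (or 404) on the old URL itself gives the crawler one response to act on, instead of a redirect it must follow before discovering the target is an error page.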