

Phantom Pages Indexed

Does a Duplicate Content Penalty Loom?

         

ken_b

6:33 pm on Feb 23, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Background
My site has about 1350 pages. A site:www.mysite.com search on Google shows 4450.

That's been happening for the last 6 months or so, but not consistently. Occasionally Google shows a number that is reasonably close to the actual count. (I'm assuming that depends on which data center the results are drawn from.)

Now, yesterday, prompted by a comment in another thread, I did a site:mysite.com search and it came up with 4515 pages.
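One way to test whether the extra pages are host duplicates (the same page listed under both mysite.com and www.mysite.com) is to take the listed URLs, merge the two host variants, and count what is left. Below is a minimal Python sketch. The sample URLs are hypothetical, and the sketch assumes that a leading "www." is the only source of duplication:

```python
from urllib.parse import urlsplit

def canonical_key(url: str) -> tuple[str, str]:
    """Return (host, path) with the host lower-cased and a leading 'www.' dropped.

    Assumption: www.mysite.com and mysite.com serve identical content.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # .hostname is already lower-cased
    if host.startswith("www."):
        host = host[len("www."):]
    return host, parts.path or "/"

def count_unique_pages(urls: list[str]) -> int:
    """Count distinct pages after merging www/non-www host variants."""
    return len({canonical_key(u) for u in urls})

# Hypothetical listing: three URLs, but only two distinct pages.
listed = [
    "http://www.mysite.com/page.htm",
    "http://mysite.com/page.htm",
    "http://www.mysite.com/other.htm",
]
print(len(listed), count_unique_pages(listed))  # 3 2
```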

I've also noticed a growing number of URL-only listings when I do site: searches. Again, the number is not consistent.

When Google returns the 4400+ page count, the number of URL-only listings is also much larger.

Initially the URL-only listings appeared only for bottom-level, low-traffic pages. Recently they have begun to include higher-traffic, higher-level pages.

A year or so ago I deleted about 1,100 pages and set the folders they had been in to noindex, nofollow. I needed to keep the folders because other content (images) is still stored in them.

I also deleted a couple of hundred other pages and replaced them with pages that redirected (via meta refresh, actually) to my home page. I've since deleted those replacement pages as well.

Questions

Does anyone have an idea of why Google is showing so many phantom pages?

Am I likely headed for, or currently suffering, a duplicate content penalty?

If so, how do I avoid or recover from it?

Possible solutions I'm considering

Creating a 301 redirect from mysite.com to www.mysite.com.

[I don't know much about 301 redirects, so I searched around WebmasterWorld and found this thread:

An Introduction to Redirecting URLs on an Apache Server [webmasterworld.com]

by DaveAtIFG. I'm hoping I can understand enough of it to work my way through setting up a 301 redirect, if that's the best course of action.]
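The redirect being considered comes down to one rule: any request for the bare host gets a permanent (301) redirect to the same path on the www host. On Apache this would normally live in .htaccess (which is what the linked thread covers). The Python sketch below only models that decision so the rule is easy to see. The preferred host and the function name are illustrative assumptions:

```python
from typing import Optional

CANONICAL_HOST = "www.mysite.com"  # assumed preferred host name

def canonical_redirect(host: str, path_and_query: str) -> Optional[tuple[int, str]]:
    """Return (status, Location) if the request should be redirected, else None.

    Any host other than the canonical one (e.g. the bare mysite.com) gets a
    301 to the same path on the canonical host, so only one copy is indexed.
    """
    if host.lower() == CANONICAL_HOST:
        return None  # already canonical: serve the page normally
    return 301, f"http://{CANONICAL_HOST}{path_and_query}"

print(canonical_redirect("mysite.com", "/folder/page.htm"))
print(canonical_redirect("www.mysite.com", "/folder/page.htm"))
```

Keeping the path (rather than sending everything to the home page) means each duplicate URL maps onto exactly one existing page.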

Because the issue seems to be intermittent, I don't know whether I actually need to do anything.

I'm hoping for some guidance, suggestions, or ideas on what to do if anything.

g1smd

12:36 am on Mar 20, 2005 (gmt 0)




>> relative = /page.html

>> absolute = http://www.domain.com/page.html

Err, not quite.

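/page.html, and anything else that begins with a /, is absolute in terms of its position within the site, but relative in terms of the domain: it is resolved against whatever domain the linking page was fetched from.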

../page.html or ../../ is a relative URL.

Anything that includes the full domain name is always fully absolute.
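Python's standard urllib.parse.urljoin resolves links against a base page the way browsers and crawlers do, which makes the three cases easy to compare. The base page and host names below are hypothetical:

```python
from urllib.parse import urljoin

base = "http://www.domain.com/folder/sub/page.html"  # the page containing the links

# Root-relative: fixed position in the site, but inherits the base page's host.
print(urljoin(base, "/page.html"))       # http://www.domain.com/page.html
# Relative: resolved against the base page's folder.
print(urljoin(base, "../page.html"))     # http://www.domain.com/folder/page.html
print(urljoin(base, "../../page.html"))  # http://www.domain.com/page.html
# Fully absolute: the base is ignored.
print(urljoin(base, "http://www.domain.com/page.html"))  # http://www.domain.com/page.html

# The same root-relative link, found on the non-www host, stays on the non-www host.
print(urljoin("http://domain.com/folder/sub/page.html", "/page.html"))  # http://domain.com/page.html
```

The last line shows why host duplicates can spread: once a crawler reaches a site via the other host name, every relative or root-relative link keeps it on that host.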

g1smd

12:45 am on Mar 20, 2005 (gmt 0)




Also part of this redirect issue: if you link to an index page inside a folder, do NOT include the filename in the link. End with the folder name and a trailing / at the very end.

If you link to a folder, always include the trailing / at the end. It is important.
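Both rules can be applied mechanically when generating or checking internal links. Below is a minimal Python sketch. The set of index file names and the "no dot in the last path segment means it is a folder" test are simplifying assumptions, not something the server guarantees:

```python
from urllib.parse import urlsplit, urlunsplit

INDEX_FILES = {"index.html", "index.htm", "index.php", "index.asp"}  # assumed names

def normalize_folder_link(url: str) -> str:
    """Drop a trailing index file name and make sure folder links end in '/'."""
    parts = urlsplit(url)
    path = parts.path or "/"
    head, _, last = path.rpartition("/")
    if last.lower() in INDEX_FILES:
        path = head + "/"   # /folder/index.html -> /folder/
    elif last and "." not in last:
        path = path + "/"   # /folder -> /folder/ (assumed to be a folder)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))

print(normalize_folder_link("http://www.domain.com/folder/index.html"))  # http://www.domain.com/folder/
print(normalize_folder_link("/folder"))                                  # /folder/
print(normalize_folder_link("/folder/page.html"))                        # /folder/page.html
```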

.

I just sorted out a mess where Xenu generated a massive site map for a site: much larger than expected, with every page duplicated and loads of entries titled "301 Moved". It turned out that although the site uses domain.com as the base in all its internal links, the server's host name is configured as www.domain.com, and many of the internal links did not include a trailing / on folder names. There was a valid .htaccess file redirecting requests for www.domain.com to domain.com, and it was correctly set up.

So, what happens when you link to domain.com/folder is that the server automatically redirects to www.domain.com/folder/ (remember, the host name is set to www.domain.com here), and then the 301 redirect in the .htaccess file takes over and sends the visitor on to domain.com/folder/ instead. Including the trailing / would have avoided this. Changing the server host name so that it does not include the www is also a good idea, but even then any request for domain.com/folder would still need an automatic redirect to domain.com/folder/. So, always include the / on the end to avoid any redirect happening at all.
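The chain described above can be modelled as a small simulation that counts the hops a single link costs. This is a sketch under the post's stated setup: the server's own host name is www.domain.com and it builds its automatic trailing-slash redirect from that name, and .htaccess then 301s www.domain.com to domain.com. The function names, and treating "no dot in the last segment" as meaning "folder", are assumptions for illustration:

```python
from typing import Optional
from urllib.parse import urlsplit

SERVER_NAME = "www.domain.com"  # host name the server is configured with (per the post)
PREFERRED_HOST = "domain.com"   # host the .htaccess 301 sends visitors to

def next_hop(url: str) -> Optional[str]:
    """Return where the server would redirect url, or None if it serves it directly."""
    parts = urlsplit(url)
    last_segment = parts.path.rsplit("/", 1)[-1]
    if last_segment and "." not in last_segment:
        # Folder requested without a trailing /: the server adds the slash,
        # building the new URL from its own configured host name.
        return f"http://{SERVER_NAME}{parts.path}/"
    if parts.hostname != PREFERRED_HOST:
        # The .htaccess rule: 301 from www.domain.com to domain.com, same path.
        return f"http://{PREFERRED_HOST}{parts.path}"
    return None

def redirect_chain(url: str) -> list[str]:
    """Follow next_hop() until a URL would be served without a redirect."""
    chain = [url]
    while (target := next_hop(chain[-1])) is not None:
        chain.append(target)
    return chain

print(redirect_chain("http://domain.com/folder"))   # two redirects before the page loads
print(redirect_chain("http://domain.com/folder/"))  # served directly
```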

ken_b

12:57 am on Mar 20, 2005 (gmt 0)




>> Also part of this redirect issue: if you link to an index page inside a folder, do NOT include the filename in the link. End with the folder name and a trailing / at the very end.

Why? This confuses me (of course, confusing me is easily done). Does a search engine get confused between ...

.com/index.htm and
.com/folder/index.htm

or ... what?

g1smd

1:00 am on Mar 20, 2005 (gmt 0)




If you ever change the technology of your site over to PHP or ASP, then all your links will be instantly broken, as they link to index.html, not index.php, and so on.

Link to www.domain.com/folder/ or to /folder/ to avoid that problem ever happening.

.

Make sure the folder name ends in a / to avoid the server having to redirect from www.domain.com/folder to www.domain.com/folder/, a redirect that may go via domain.com/folder/ if the server's host name is not the same as the one in, or implied by, your link.

ken_b

1:10 am on Mar 20, 2005 (gmt 0)




>> It's a bad link because when you change the technology of your site over to PHP or ASP then all your links will be instantly broken.

Thanks for the quick response. That makes it clearer for me.

[I've just been talking to folks about converting from a static site to a dynamic one. Part of the conversation has been that maybe I should convert only the old part(s) that would benefit most from being dynamic, and leave the rest static. So knowing some of the possible problems with conversion could certainly help.]

Atticus

12:35 am on Mar 23, 2005 (gmt 0)



One of my sites has about 1800 pages. Last week G thought it had over 9,000 pages (site:example.com). That exaggerated number had been dropping by a few hundred pages per day.

Today it dropped from over 6000 pages this morning to the correct count of 1800 pages this evening.

Hope the site comes back now that the phantoms have fled the index... keeping fingers crossed.

ken_b

6:45 pm on Apr 28, 2005 (gmt 0)




Would a search engine see

www.mysite.com/page.htm

and

www.mysite.com/page.htm#something

as the same page or different pages?

In other words, could using named anchors in a link cause problems?
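One relevant fact here: the part after # is a fragment identifier that the browser keeps to itself. It is not sent to the server in the HTTP request, so both URLs fetch the same document. A crawler that deduplicates URLs would typically strip it, as in this Python sketch using the standard urllib.parse.urldefrag. Whether any particular search engine does exactly this is an assumption:

```python
from urllib.parse import urldefrag

def strip_fragment(url: str) -> str:
    """Return url without its #fragment, i.e. the part actually requested from the server."""
    return urldefrag(url).url

a = "http://www.mysite.com/page.htm"
b = "http://www.mysite.com/page.htm#something"
print(strip_fragment(a) == strip_fragment(b))  # True
```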

The phantom pages problem is still going on for me, and I'm wondering if this could be part of the cause.

This 37-message thread spans 2 pages.