
"fantom-pages" bug in Google?

Google indexes nonexistent pages

   
8:56 am on Oct 16, 2006 (gmt 0)

5+ Year Member



As I posted in the [webmasterworld.com...] thread, Google indexes pages at URLs that probably exist only because some scammer has put links to them in blogs, on other sites, or somewhere else. Those URLs usually look like this:

http://www.example.com/?eat_my_beans
http://www.example.com/?eat_my_cat
http://www.example.org/?b_f_c
or
http://www.example.com/?www.example-2.com/scam.html
http://www.example.com/?www.example-2.com/whatever.html

The phantom page has the PR of the targeted page and is a phantom duplicate of that page, usually your index page. It is usually listed as part of your site in a site: command, as a supplemental result.
The question is: does Google penalize you for duplicate content because of this issue (i.e. two identical index pages with the same PR)? In my case it looks like it does.
Worse, there is no way to remove that URL using the URL removal tool, simply because it does not exist.
Question 3: how long will Google keep that phantom page? As far as we know, supplemental results exist almost forever.
Question 4: what if scammers start putting up those phantom links for every URL and directory of a competitor's site, or just for fun? IMHO Google will index them if the link comes from a page that has PR.
Question 5: is there a way that Google can prevent this?

[edited by: tedster at 3:34 am (utc) on Oct. 17, 2006]
[edit reason] use example.com [/edit]

3:18 am on Oct 17, 2006 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



There's a way that you can prevent it: Redirect all such requested URLs to the correct URL using a 301-Moved Permanently external redirect. This will cause the search engines to drop the incorrect URLs, and use the correct one instead, passing any PageRank/Link-pop to the correct URL.

How you do this depends on your server, but it is the same basic issue as the "index.html" issue discussed in the Duplicate Content - Get it right or perish [webmasterworld.com] thread: Each resource (page, image, etc.) on your site should be accessible using one and only one canonical --that is to say, "standard," "conventional," or "usual"-- URL.

For example, if the "/" page is never supposed to have a query string appended, you could correct this problem on Apache using mod_rewrite in your .htaccess file with:


# If non-blank query
RewriteCond %{QUERY_STRING} .
# Redirect to "/" after clearing query string
RewriteRule ^$ http://www.example.com/? [R=301,L]

If, due to your site design, the "/" page may have a query string appended, then you could add code to determine the validity of the query string; For example, to redirect invalid queries with no 'page' parameter name (which would redirect all of those queries above, because they are all missing the page= variable name):

# If non-blank query
RewriteCond %{QUERY_STRING} .
# and if query is missing "page=" (and is therefore invalid)
RewriteCond %{QUERY_STRING} !page=
# Redirect to "/" after clearing query string
RewriteRule ^$ http://www.example.com/? [R=301,L]

You could do the same thing on IIS servers using ISAPI Rewrite, or use a script to do it on any server that supports PERL or PHP scripting or similar.
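As a rough illustration of the script approach, here is a minimal Python sketch of the same decision the second rule set makes. The function name canonical_redirect and the required_param argument are made up for this example, and the canonical URL is taken from the examples above. Note one difference: the mod_rewrite pattern !page= is a substring test, while this sketch checks for a parameter named exactly "page", which is slightly stricter.

```python
from urllib.parse import parse_qs

# Assumed canonical home-page URL, taken from the examples in this thread.
CANONICAL_HOME = "http://www.example.com/"


def canonical_redirect(path, query_string, required_param="page"):
    """Return (status, location) if the request should be redirected, else None.

    Hypothetical helper: only the "/" page is checked, like ^$ in .htaccess.
    """
    if path != "/":
        return None  # other pages are outside the scope of this rule
    if not query_string:
        return None  # blank query: the URL is already canonical
    # keep_blank_values=True so that "?eat_my_cat" is still seen as a key
    params = parse_qs(query_string, keep_blank_values=True)
    if required_param in params:
        return None  # query carries the expected parameter: treat as valid
    # Invalid query: send the client to "/" with the query string cleared
    return (301, CANONICAL_HOME)


if __name__ == "__main__":
    for qs in ("eat_my_cat", "www.example-2.com/scam.html", "page=2", ""):
        print(repr(qs), "->", canonical_redirect("/", qs))
```

The script would call this before generating the page and, if it gets a result back, send that status and a Location header instead of the page content.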

Google cannot know with any certainty that such a URL is a 'problem' on your site -- Whether the "eat-my-cat" query is valid or not depends entirely on your site design. Only you can know that such a query is invalid. So why wait around forever for Google to fix this "problem-or-not-a-problem" when you can easily fix it yourself?

Jim

3:34 am on Oct 17, 2006 (gmt 0)

5+ Year Member



jdMorgan
Thanks for the advice. Still, the problem is why Google is indexing such phantom pages. What if someone decides to duplicate the planet (if you understand what I mean)? We don't know how many of these phantom pages exist. It was easy for Google to manually remove the 5 billion pages, but how can they remove phantoms that can only be discovered by individual webmasters?

3:48 am on Oct 17, 2006 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Well, that is the point: Only the individual webmaster can know if the example.com/?eat-my-cat URL is valid or not, so only the individual Webmaster can fix the problem if it is not valid.

If you were to try such a trick on any of my sites, it would not work because I use the code posted above -- and a lot more. Any attempt to link to an invalid URL on my sites will result in a 301-Moved Permanently redirect to the valid URL. So setting up a page (or many pages) with invalid links pointing to one of my sites in an attempt to cause me trouble just will not work.

If Google finds a link on a page anywhere on the Web, it will spider it. If the server returns a 200-OK response, then Google will list that URL in the Google search index. The same is true for any search engine. By returning a 301-Moved Permanently response instead of a 200-OK response, you tell the search engine, "That URL is incorrect, use this correct one instead."
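To make the 200-versus-301 difference concrete, here is a small Python sketch, assuming a toy WSGI application in place of a real site. The names app and fetch_status are invented for the example, and the rule is the simple "no query string allowed on /" case. It calls the application directly rather than over the network, so it only shows which status and Location header a crawler would be handed.

```python
from wsgiref.util import setup_testing_defaults

# Assumed canonical URL, matching the examples used in this thread.
CANONICAL_HOME = "http://www.example.com/"


def app(environ, start_response):
    """Toy WSGI app: "/" with any query string is redirected, else served."""
    if environ["PATH_INFO"] == "/" and environ.get("QUERY_STRING"):
        start_response("301 Moved Permanently", [("Location", CANONICAL_HOME)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"home page"]


def fetch_status(path, query=""):
    """Invoke the app as a server would; return (status_code, location)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    environ["QUERY_STRING"] = query
    seen = {}

    def start_response(status, headers):
        seen["status"] = int(status.split()[0])
        seen["location"] = dict(headers).get("Location")

    b"".join(app(environ, start_response))  # consume the response body
    return seen["status"], seen["location"]


if __name__ == "__main__":
    print(fetch_status("/"))                # the canonical URL
    print(fetch_status("/", "eat-my-cat"))  # a phantom URL
```

Without the check in app, both requests would come back as 200-OK with identical content, which is exactly the duplicate that gets listed.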

No more problem.

Jim

[edited by: jdMorgan at 3:49 am (utc) on Oct. 17, 2006]

3:55 am on Oct 17, 2006 (gmt 0)

5+ Year Member



Jim, your help was great, and many webmasters should be told about fixing this problem; that is actually the purpose of this thread.

9:50 pm on Oct 17, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks toothake for starting this thread.

And many thanks to jdMorgan for the valuable tips. In fact, I have already implemented this tip of yours ;-)

# If non-blank query
RewriteCond %{QUERY_STRING} .
# Redirect to "/" after clearing query string
RewriteRule ^$ http://www.example.com/? [R=301,L]

I don't like to see somebody out there doing this to my site

example.com/?eat-my-cat :-)

3:42 pm on Oct 18, 2006 (gmt 0)

10+ Year Member



What about a 410 status for these phantom pages?
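
For comparison only, here is a hedged Python sketch of the two options side by side. The thread does not settle which is better, and the function respond_to_phantom and its mode argument are invented for this illustration. The difference it shows: a 410-Gone response carries no Location header, so unlike the 301 it does not point the search engine at the correct URL.

```python
from http import HTTPStatus

# Assumed canonical URL, matching the examples used in this thread.
CANONICAL_HOME = "http://www.example.com/"


def respond_to_phantom(query_string, mode="redirect"):
    """Return (status, headers) for a request to "/" with this query string.

    Hypothetical helper: mode is "redirect" (301) or "gone" (410).
    """
    if not query_string:
        return HTTPStatus.OK, {}  # canonical URL: serve the page normally
    if mode == "gone":
        return HTTPStatus.GONE, {}  # 410: no pointer to the correct URL
    # 301: names the correct URL in the Location header
    return HTTPStatus.MOVED_PERMANENTLY, {"Location": CANONICAL_HOME}


if __name__ == "__main__":
    print(respond_to_phantom("eat-my-cat", mode="redirect"))
    print(respond_to_phantom("eat-my-cat", mode="gone"))
```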
 
