Google spidered my cgi-bin Perl script that displays a database-driven directory, AND it spidered a later-developed PHP version (not in the cgi-bin). The PHP version is vastly better in performance, in the way it presents URLs, and in the META tags for each page. I developed it myself.
Bottom line: after 3 months of incredible traffic on the PHP directory, all of a sudden all of my PHP-related pages disappeared from Google. I couldn't figure out why, since the old Perl script can no longer be reached from any link on my existing site, so I mistakenly assumed there was no duplicate content. However, after more research, I discovered that the original Perl directory pages are in fact still in Google. Sure enough, the number of Perl pages indexed (XX,XXX) exactly matched the number of PHP pages (XX,XXX) just before the PHP pages were pulled from the Google index.
So, now what? Do I request removal of the Perl pages, at the risk that the PHP pages may never be spidered again?
Can I simply redirect each Perl page to its corresponding PHP page? If so, how do I do it while avoiding the redirect pitfalls mentioned here and elsewhere?
Please help! This has really put me in a bind: I went from a rather hefty monthly paycheck (almost what I make at work) to essentially zero.
Thanks in advance!
Welcome to WebmasterWorld!
I'd recommend that you remove the Perl script from your server, or rename it and change its file permissions so that it can no longer be run.
Then redirect the Perl-driven URLs to the corresponding PHP-driven URLs using 301 Moved Permanently redirects.
The exact implementation details will depend on your server type, and the URL and directory structure of your site.
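For illustration only, here is a minimal sketch of that one-to-one mapping as a Python WSGI app. The thread doesn't give the real URL patterns, so everything here is hypothetical: the host `www.example.com`, an old URL shape like `/cgi-bin/directory.pl?cat=12`, and new pages like `/directory/12.php`. On an Apache server the same mapping would more likely be expressed as server rewrite rules; the point of the sketch is the logic: each old URL gets exactly one 301 that goes straight to its final PHP URL, with no chain of intermediate hops.

```python
from urllib.parse import parse_qs

# Hypothetical host and old script path; substitute your own.
SITE = "http://www.example.com"
OLD_SCRIPT = "/cgi-bin/directory.pl"


def php_url_for(path, query):
    """Map a hypothetical old Perl URL to its PHP URL, or None if it isn't one."""
    if path != OLD_SCRIPT:
        return None
    cat = parse_qs(query).get("cat", [""])[0]
    if not cat.isdigit():
        # No usable category id: send the request to the directory index.
        return SITE + "/directory/"
    return f"{SITE}/directory/{cat}.php"


def app(environ, start_response):
    """WSGI app: answer old Perl URLs with a single 301 to the PHP page."""
    target = php_url_for(environ.get("PATH_INFO", ""),
                         environ.get("QUERY_STRING", ""))
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found\n"]
    # One hop, straight to the final URL, with a permanent status code.
    start_response("301 Moved Permanently",
                   [("Location", target), ("Content-Type", "text/plain")])
    return [b"Moved permanently\n"]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` will serve it; request an old URL and check that the response is a 301 whose Location header is the PHP page, not another redirect.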
There are known problems with some search engines' handling of redirects, but we can only do what we can do.
Jim
Thanks for the advice and the questions.
Yes, I copied the format of Google's own robots.txt, with my custom changes, so the CGI script and the directory pages it generates should never have been spidered in the first place. In fact, that is why I went the PHP route: to make a dynamic site look like static pages so that it would be more search-engine friendly. It never occurred to me that Google would see my site as having two duplicate directories, since the CGI-generated directory pages were never meant to be spidered and the robots.txt file said so. As I said earlier, when the number of PHP-generated directory pages reached the number of CGI-generated directory pages, my newer PHP pages were dropped.
Looks like I'll do the 301 redirect... can't say for sure, though. I have XX,XXX pages from the CGI script spidered. It is scary, to say the least, to let them go and redirect them to my PHP version, when it is the PHP version that got dropped.
I might just work on modifying the CGI version to mimic the PHP version. That's a shame, as the CGI version is sloppy compared to the PHP version.
"Yes, I copied the robots.txt format of Google itself."
I'm not getting at you, just trying to pin this one down. I once ran my own robots.txt through the WebmasterWorld validator and found it wasn't valid! It only needed a slight tweak.
I'd also look at the robots.txt forum here. This thread shows how a missing slash let Google in:
[webmasterworld.com...]
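A quick local sanity check is to feed the file to a robots.txt parser and ask whether an old CGI URL is blocked. This sketch uses Python's standard-library `urllib.robotparser`, which is not the parser any search engine runs, so real crawlers may treat a malformed rule differently; treat the result as a hint, not proof. The rules and the URL are hypothetical, and this is not a reconstruction of the linked thread. It shows one way a missing slash can matter: a `Disallow` path without its leading `/` never matches a URL path, which always starts with `/`.

```python
from urllib.robotparser import RobotFileParser


def blocks(robots_txt, url, agent="Googlebot"):
    """True if robots_txt, as read by Python's parser, disallows url for agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# Hypothetical rules and URL, for illustration only.
WITH_SLASH = "User-agent: *\nDisallow: /cgi-bin/\n"
MISSING_SLASH = "User-agent: *\nDisallow: cgi-bin/\n"
OLD_PAGE = "http://www.example.com/cgi-bin/directory.pl?cat=12"

print(blocks(WITH_SLASH, OLD_PAGE))     # True: the rule matches, crawler kept out
print(blocks(MISSING_SLASH, OLD_PAGE))  # False: "cgi-bin/" never matches "/cgi-bin/..."
```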