Forum Moderators: open


force-type faux directories, robots.txt and Google

inadvertently dropped


saoi_jp

3:22 pm on Jan 28, 2003 (gmt 0)

10+ Year Member



I was using faux directories by forcing extensionless files to be interpreted as PHP, so that in:

www.widgets.com/goods/

"goods" is actually a script that checked /goods/x to see what x was, to show that content.

The site was listed in Google, no problems.

However, the site was trimmed back, so that

www.widgets.com/goods/sectionA/

was discontinued. Robots.txt was set up to disallow all spiders from /goods/sectionA/ (as opposed to /goods/sectionB/ etc.)

The reason for disallowing: Google knew about /goods/sectionA/, and the nature of the setup meant revisits would never generate a 404. /goods/waligro would still produce a page (albeit one saying we have no info on Waligro, whatever that may be). The thinking was: ban the spiders from /sectionA/ and it will be dropped.

However, apparently, we now see that "/goods/ANYTHING/" is no longer listed in google. (A large number of other pages are still there, and new sections were added. /goods/ is gone.)

So if Google finds a ban on /goods/sectionA/, does it then query the /goods/ directory to see which other files are there? My thinking now is that since /goods/ is not actually a directory, there would be no additional files, and the ban would appear to cover the entire /goods script. Is this plausible?

Also, how would one go about indicating to Google that /goods/sectionA is gone? Would sending a 404 header do it?
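(A sketch of what that could look like inside the goods script; this is not the actual script, and the section and variable names are illustrative:)

```php
<?php
// Sketch: send a real 404 for retired sections, so spiders drop those
// URLs instead of indexing a soft "we have no info" page served with 200.
$path  = isset($_SERVER['PATH_INFO']) ? trim($_SERVER['PATH_INFO'], '/') : '';
$parts = explode('/', $path);
$section = $parts[0];                  // e.g. "sectionA" from /goods/sectionA/...

$retired = array('sectionA');          // hypothetical list of removed sections
if (in_array($section, $retired)) {
    header('HTTP/1.0 404 Not Found');  // a real 404 status, not a 200 page
    echo 'No such section.';
    exit;
}
// ...otherwise look up $section and render its content as before...
```

A crawler that keeps receiving a 404 (or 410 Gone) for those URLs should eventually drop them, without needing a robots.txt ban at all.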

hakre

10:38 am on Jan 29, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



For a deeper discussion of this, it would be good if you posted the parts of your robots.txt that are connected to this issue.

And you'll find a lot of info about robots.txt and page-gone / 404 / 301 topics around the forum. Try the site search! ;)

saoi_jp

12:56 pm on Jan 29, 2003 (gmt 0)

10+ Year Member



hakre wrote:
For a deeper discussion of this, it would be good if you posted the parts of your robots.txt that are connected to this issue.

User-agent: *
Disallow: /page/section/

where "page" was actually the script page, not an actual directory, and "section" is the category that I wanted eliminated. A page url would look like /page/section/subsection/detail. A page url for categories not being eliminated is /page/anotherSection/etc. "page" is the same script. And what happened was, /page/ is still in google but everything else from /page/* is not there. (And other areas not mentioned in robots.txt such as /differentPage/sectionETC remain included in Google.)


And you'll find a lot of info about robots.txt and page-gone / 404 / 301 topics around the forum. Try the site search!

I haven't been able to find a related situation yet. Perhaps I don't know the right words for this. The closest I found was here [webmasterworld.com], but nothing that could explain what happened. On the bright side, it's a partial success (what I wanted eliminated from Google was eliminated), and the rest of it (the inadvertent eliminations) will probably be restored in a month or so, as I've removed that section from the robots.txt file and the site overall has pretty good PR and coverage in Google.