Forum Moderators: Robert Charlton & goodroi


Removing Pages with Session IDs

Googlebot and Safety

         

Vienix

9:12 am on Oct 2, 2006 (gmt 0)

10+ Year Member



So I have a lot of pages cached with a session ID appended, although I have adjusted the session manager not to append session IDs when Googlebot comes by.
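A minimal Python sketch of what such a session manager tweak might look like. It assumes the session ID travels as a sessionid query parameter on paths that carry no other query string. The names CRAWLER_TOKENS, is_crawler and build_link are hypothetical, and User-Agent strings can be spoofed, so this is only an illustration of the idea.

```python
from urllib.parse import urlencode

# Hypothetical list of crawler User-Agent substrings (lowercase).
# A real site would maintain its own, more complete list.
CRAWLER_TOKENS = ("googlebot", "bingbot", "slurp")


def is_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent string contains a known crawler token."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def build_link(path: str, session_id: str, user_agent: str) -> str:
    """Build an internal link, appending the session ID only for non-crawlers.

    Assumes 'path' has no query string of its own. Crawlers get the clean
    URL, so only one version of each page should end up indexed.
    """
    if is_crawler(user_agent):
        return path
    return f"{path}?{urlencode({'sessionid': session_id})}"
```

For example, build_link("/pictures/index.php", "abc123", "Googlebot/2.1") returns the bare path, while a normal browser User-Agent gets "/pictures/index.php?sessionid=abc123".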

I am thinking about using the robots.txt and the Google url removal tool to get rid of those pages.

Googlebot supports wildcards, so:

User-agent: Googlebot
Disallow: /pictures/index.php?sessionid*

That should work, right?

Or would Google also remove the plain index.php (without a session ID)?

I haven't got the faintest idea...

regards,

Bert

g1smd

12:33 pm on Oct 2, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A Disallow line removes every URL that starts with exactly what you state, so the URL must include at least all of what you state.

Disallow: /abc gets rid of /abc and /abcaaa and /abcdef but cannot touch /ab or /abz etc.
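The plain prefix rule described above can be sketched in a few lines of Python. This models only the original robots.txt semantics (no wildcards); the function name is_disallowed is hypothetical. It also answers the earlier question: a rule ending in index.php?sessionid does not touch the bare /pictures/index.php.

```python
def is_disallowed(path: str, disallow_rules: list[str]) -> bool:
    """Original robots.txt semantics: a Disallow value matches any URL path
    that starts with it. An empty Disallow value blocks nothing."""
    return any(rule and path.startswith(rule) for rule in disallow_rules)
```

With the rule "/abc", the paths /abc, /abcaaa and /abcdef are blocked, while /ab and /abz are not.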

Vienix

1:01 pm on Oct 2, 2006 (gmt 0)

10+ Year Member



I think that, in the case of Mr Googlebot, that only works with the wildcard...

Anyway, I have added:

Disallow: /pictures/index.php?sessionid*

And I submitted a removal request...

It takes 5 days...

If Google also removes index.php without the session ID, I will move it all to a new directory and start from scratch.

This site is a mess anyway after proxies etc :)

g1smd

1:13 pm on Oct 2, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi. Did you read my answer?

Vienix

2:29 am on Oct 3, 2006 (gmt 0)

10+ Year Member



OK, that worked...

The index.php URLs with session IDs have been dropped by Google...

with this line:

Disallow: /pictures/index.php?sessionid*

I then added this to the robots.txt:

Disallow: /pictures/*sessionid*

and tested it with the robots.txt analysis tool by entering a URL like this:

[mywidgets.com...]

The URL was correctly filtered out, with no error message...
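A rough Python sketch of the kind of check the robots.txt analysis tool performs for a Googlebot-style pattern: "*" matches any run of characters, a trailing "$" anchors the end, and everything else is literal and matched from the start of the path. This is a simplified model, not Google's actual implementation, and the test paths are made up.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a Googlebot-style robots.txt path pattern into a regex.

    '*' matches any sequence of characters; a trailing '$' anchors the
    end of the URL. All other characters are treated literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal characters such as '?' and '.', then turn '*' into '.*'.
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(body + ("$" if anchored else ""))


def matches(path: str, pattern: str) -> bool:
    # re.match anchors at the start of the path, giving prefix semantics.
    return pattern_to_regex(pattern).match(path) is not None
```

Here matches("/pictures/index.php?sessionid=abc123", "/pictures/*sessionid*") is True, while the bare "/pictures/index.php" does not match.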

Ha, that would make it easy for the removal tool, but unfortunately I got this message:

URLs cannot have wild cards in them (e.g. "*"). The following line contains a wild card:
DISALLOW /pictures/*sessionid*

So it seems the "ordinary" bot understands wildcards, but the removal tool doesn't...

ansible

5:02 am on Oct 3, 2006 (gmt 0)

10+ Year Member



For reference, please see this older thread on the same subject: [webmasterworld.com...]

g1smd

10:29 pm on Oct 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You never need the star at the END. A right-hand wildcard is ALWAYS implied: every rule means "URLs that start with at LEAST this".
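A small Python check of why the trailing star is redundant, using a simplified Googlebot-style matcher (it ignores "$" anchors). Because a rule only has to match the start of a path, "/pictures/*sessionid*" and "/pictures/*sessionid" block exactly the same URLs. The helper names and sample paths are hypothetical.

```python
import re


def matches(path: str, rule: str) -> bool:
    """Simplified Googlebot-style match: '*' is any run of characters, and
    the rule only has to match the START of the path."""
    regex = re.escape(rule).replace(r"\*", ".*")
    return re.match(regex, path) is not None


def strip_trailing_star(rule: str) -> str:
    """Drop trailing '*' characters, which add nothing under prefix matching."""
    return rule.rstrip("*")


sample_paths = [
    "/pictures/index.php?sessionid=abc123",
    "/pictures/index.php",
    "/pictures/view.php?id=7&sessionid=abc123",
    "/other/sessionid",
]
```

For every entry in sample_paths, matches(p, "/pictures/*sessionid*") gives the same answer as matches(p, strip_trailing_star("/pictures/*sessionid*")).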

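The wildcard rules should go in the User-agent: Googlebot section, and that section should ALSO repeat everything that is in the User-agent: * section, because Googlebot obeys only the most specific User-agent section that matches it and ignores the * section entirely.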
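A simplified Python sketch of how a crawler picks its User-agent section: it uses the single most specific matching section (approximated here by the longest token contained in the crawler's name) and falls back to * only when nothing else matches. The function select_group and the /cgi-bin/ and /admin/ paths are hypothetical; real parsers also handle details this ignores, such as merging duplicate sections.

```python
def select_group(crawler: str, groups: dict[str, list[str]]) -> list[str]:
    """Return the Disallow rules a crawler obeys.

    The crawler uses only the best-matching named section and ignores all
    others, including '*'. It falls back to '*' only if no named section
    matches its name.
    """
    name = crawler.lower()
    best = None
    for token in groups:
        if token != "*" and token.lower() in name:
            if best is None or len(token) > len(best):
                best = token
    if best is not None:
        return groups[best]
    return groups.get("*", [])


# Hypothetical robots.txt contents: User-agent token -> Disallow rules.
broken = {
    "*": ["/cgi-bin/", "/admin/"],
    "Googlebot": ["/pictures/*sessionid"],
}
fixed = {
    "*": ["/cgi-bin/", "/admin/"],
    "Googlebot": ["/cgi-bin/", "/admin/", "/pictures/*sessionid"],
}
```

With broken, Googlebot sees only the session ID rule and is free to crawl /admin/; with fixed, it obeys all three rules, while other bots still get the * section.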