A site has some URLs with session IDs indexed, and those need to be removed from the Google SERPs: they are Duplicate Content. The timescale is unimportant, so there's no need to get the removal tool out for this.
The site (hopefully) no longer gives session IDs out to anyone who is not logged in. So, no new URLs with sessions are being generated as far as anyone can tell.
Bots will indefinitely continue to request URLs that they already know about, including the rogue URLs that have session IDs in them.
Normally it would be simple to exclude these using robots.txt:
User-agent: Googlebot
Disallow: *s=
On this occasion, though, there is another parameter that ends in an "s" in many of the URLs that DO need to be indexed, so the bare *s= pattern would match (and block) those as well.
So, what is needed is:
Disallow: *&s=
This ensures that only the session URLs are excluded, because the ampersand must appear immediately before s= (see the sketch below).
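To illustrate, here is a minimal Python sketch of Googlebot-style wildcard matching. It is a simplified stand-in for Google's actual rule processing, and the sample URLs (with a hypothetical colors= parameter playing the legitimate parameter that ends in "s") are made up:

import re

def rule_matches(pattern, path):
    # Translate a Googlebot-style Disallow value into a regex:
    # '*' matches any run of characters; a trailing '$' anchors the end.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

print(rule_matches("*s=", "/shop?colors=red"))        # True  - wrongly blocked
print(rule_matches("*&s=", "/shop?colors=red"))       # False - left alone
print(rule_matches("*&s=", "/shop?page=2&s=8f3e2c"))  # True  - session URL blocked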
The Question...
The problem is this. In the Valid HTML 4.01 source code, the URLs have the ampersand encoded as &amp;amp; each time, as it should be.
So, which format does the robots.txt file actually need?
Disallow: *&amp;amp;s=
OR
Disallow: *&s=
and why?
You should use [b]&[/b], not &amp;amp;. The entity reference &amp;amp; is defined by the DTD for HTML 4.01 and is recognized by default in XML. But robots.txt is neither HTML nor XML; it is a plain text file, so the entity reference is undefined there. The entity reference is required in the HTML document because & is reserved as the indicator of an entity reference, but the URI itself contains &, not &amp;amp;. In a plain text file, & has no such special meaning.
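To see the distinction in action, here is a short Python sketch (the href value is a made-up example):

import html

# The link as written in valid HTML 4.01 source, with the ampersand escaped:
href_in_source = "/page?cat=5&amp;amp;s=8f3e2c"

# What a browser or crawler actually requests once the HTML parser decodes it:
print(html.unescape(href_in_source))  # /page?cat=5&s=8f3e2c

Since robots.txt matches against the decoded URI, Disallow: *&s= is the rule that works; Disallow: *&amp;amp;s= would never match anything.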