I know that using "User-agent: * Disallow: /" in the robots.txt file won't do the trick all the time and that some search engines just ignore it.

A robot that disregards the most basic of robots.txt directives is not a legitimate search engine, and you should feel free to physically block it by any means necessary.
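To see why the blanket disallow works for well-behaved crawlers, here is a minimal sketch using Python's standard urllib.robotparser. It evaluates the two-line rule (written one directive per line, as robots.txt expects) the way a compliant bot would before fetching anything. The URLs and bot names are placeholders. A rogue bot simply never runs this check, which is why blocking it at the server is the only real remedy.

```python
from urllib.robotparser import RobotFileParser

# The blanket-block rules from the quoted post, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if a robots.txt-compliant crawler may fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder bot names and URL; every path falls under "Disallow: /".
    for agent in ("Googlebot", "SomeOtherBot"):
        print(agent, allowed(agent, "https://example.com/any/page.html"))
```

Honoring the result is entirely voluntary on the crawler's side; nothing in robots.txt itself enforces it.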
rel="alternate" hreflang="x" attribute tags.
You want "informed" users to be able to see a duplicate of an established site but no one / nothing else?
How do I properly quote on here, so the quoted person gets notified?

We typically use the @ format to specify the person your response is intended for. Everyone who has participated in the thread gets a notification, and the @ person sees that the response was intended for them.
Even with a disallow, Google still finds such pages, so it is best to have the signals in place to avoid misunderstandings. I would definitely noindex the duplicates if you do not want to allow crawling. Without that noindex, Google falls back to its default (index, follow) and is left to figure the pages out on its own.
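To make that default concrete, here is a minimal sketch of how a crawler might resolve the robots signals on a fetched page: it reads both the X-Robots-Tag response header and any <meta name="robots"> tag, and anything not explicitly restricted ends up as index, follow. The function names are hypothetical, and user-agent-scoped header values (such as "googlebot: noindex") are deliberately not handled; treat it as an illustration, not a description of how Google actually implements this.

```python
from html.parser import HTMLParser


def _split_directives(value: str) -> set[str]:
    """Turn 'noindex, nofollow' into {'noindex', 'nofollow'}."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            self.directives |= _split_directives(attr.get("content", ""))


def effective_robots(headers: dict[str, str], html: str) -> tuple[str, str]:
    """Return the (index, follow) pair a crawler would apply to one page."""
    found: set[str] = set()
    # X-Robots-Tag response header (user-agent-scoped values are not
    # handled in this simplified sketch).
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            found |= _split_directives(value)
    # <meta name="robots"> tags in the HTML.
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    found |= parser.directives
    # Anything not explicitly restricted falls back to the default: index, follow.
    # "none" is shorthand for noindex, nofollow.
    index = "noindex" if found & {"noindex", "none"} else "index"
    follow = "nofollow" if found & {"nofollow", "none"} else "follow"
    return index, follow


if __name__ == "__main__":
    print(effective_robots({}, "<head><title>Duplicate</title></head>"))
    print(effective_robots({"X-Robots-Tag": "noindex"}, ""))
```

Note that a crawler can only see either signal if it is allowed to fetch the page in the first place.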
<meta name="robots" content="noindex">
in the <head> of each individual page. (There may be a way to do this globally in WP; not2easy will know.) Alternatively, put

Header set X-Robots-Tag "noindex"
in the .htaccess or <Directory> section that applies only to that site. The line can either stand by itself or go inside a <Files> or <FilesMatch> envelope if you want to constrain it to certain files or extensions; for example, on my sites I set it for all scripts.
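A quick way to sanity-check which files a <FilesMatch> envelope would catch is to test the same regular expression against file names before deploying it. The sketch below assumes a hypothetical "all scripts" pattern covering .php, .pl and .cgi files (adjust the extensions to your own site), and it only approximates Apache by matching the pattern against the basename of a URL path.

```python
import re
from pathlib import PurePosixPath

# Mirrors a hypothetical Apache config such as:
#   <FilesMatch "\.(php|pl|cgi)$">
#       Header set X-Robots-Tag "noindex"
#   </FilesMatch>
SCRIPT_PATTERN = re.compile(r"\.(php|pl|cgi)$")


def gets_noindex_header(url_path: str) -> bool:
    """Approximate <FilesMatch> by testing the pattern against the file's basename."""
    return SCRIPT_PATTERN.search(PurePosixPath(url_path).name) is not None


if __name__ == "__main__":
    for path in ("/cgi-bin/search.pl", "/blog/post.php", "/index.html"):
        print(path, gets_noindex_header(path))
```

Apache resolves the actual file on disk rather than the URL, so rewrites or aliases can make the real behavior differ from this check.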