Forum Moderators: Robert Charlton & goodroi

How can I hide my websites from all search engines and crawlers

         

prodmoistwood

12:54 pm on Jun 21, 2023 (gmt 0)



I need some tips on how to hide my website from search engines. I want it to be only accessible by typing the URL directly. I know that using "User-agent: * Disallow: /" in the robots.txt file won't do the trick all the time and that some search engines just ignore it.

Do you have any suggestions on how I can make it work?

By the way, the reason behind this is that I want to create a duplicate of my website on another domain without messing up its SEO. I can't protect the duplicate with a password because I still want people to be able to access it through the URL.

Thanks for your help!

not2easy

1:53 pm on Jun 21, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



You could look into blocking all UAs; that info can be found in the Apache forum: [webmasterworld.com...]

RedBar

3:39 pm on Jun 21, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



prodmoistwood wrote:
    because I still want people to be able to access it through the URL.

You want "informed" users to be able to see a duplicate of an established site but no one / nothing else?

lucy24

4:32 pm on Jun 21, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



prodmoistwood wrote:
    I know that using "User-agent: * Disallow: /" in the robots.txt file won't do the trick all the time and that some search engines just ignore it.

A robot that disregards the most basic of robots.txt directives is not a legitimate search engine, and you should feel free to physically block it by any means necessary.

Depending on where the domain is registered, search engines may or may not learn of its existence automatically. If it's a generic TLD (dot com and similar), robots will come calling within days, even if it's a test site with no links anywhere. I believe some country-code registries don't publish new registrations, so if you want to keep a low profile, you might register the new site under some other country's TLD. Even this, of course, won't help if some human posts a link in a venue that search engines do have access to--which probably includes Gmail.
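
As a rough illustration of "physically blocking" a robot that ignores robots.txt, an Apache 2.4 .htaccess sketch might look like the following. The user-agent strings BadBot and EvilScraper are hypothetical placeholders, not names from this thread, and the sketch assumes mod_setenvif and mod_authz_core are enabled on the server.

```apache
# Flag any request whose User-Agent contains a known offender.
# "BadBot" and "EvilScraper" are hypothetical placeholder names.
SetEnvIfNoCase User-Agent "BadBot|EvilScraper" block_ua

# Apache 2.4 syntax: allow everyone except flagged requests,
# which receive a 403 Forbidden response.
<RequireAll>
    Require all granted
    Require not env block_ua
</RequireAll>
```

User-agent strings are trivial to spoof, so this only stops robots that identify themselves honestly.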

Kendo

5:29 am on Jun 22, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You can restrict site access to your own IP address if you have a fixed IP.
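
A minimal sketch of that restriction in an Apache 2.4 .htaccess file, assuming mod_authz_core is available. The address 203.0.113.10 is a documentation placeholder and would be replaced with your own fixed IP.

```apache
# Apache 2.4: only the listed address may view the site;
# every other visitor (human or bot) receives 403 Forbidden.
# 203.0.113.10 is a placeholder; substitute your own fixed IP.
Require ip 203.0.113.10
```

Because this turns away every other visitor as well, it suits a private test copy and not a page that prospects need to reach.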

To avoid being indexed without making modifications, just be sure not to use any web browser based on Chromium.

tangor

7:37 am on Jun 22, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Falls into "have cake and eat it, too" category. Been trying to figure that out for years!

tangor

7:56 am on Jun 22, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Actually, just set up a local dev machine/environment and work from there, never facing the net.

topr8

11:40 am on Jun 22, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



you might be asking the wrong question? perhaps you could say what you are actually trying to achieve - having a duplicate domain may not be the best solution.

...

however, given you do want a duplicate domain, is there a possibility that using a canonical tag on every page of the duplicate, pointing to the original domain, might achieve what you want?

alternatively ... perhaps using a password which you just write to the page, so that the user can just use the password to enter the duplicate site.

not2easy

12:31 pm on Jun 22, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



If the purpose is migrating to a new domain, Google explains how to do that step by step: [developers.google.com...]

If the point is to have two duplicate domains with only one being crawled and indexed, then use canonical tags (rel="canonical" link elements) on the duplicate pointing to the indexed version, and noindex the duplicate. You then don't need to block search engines.

If the point is to have two duplicate domains indexed, that doesn't happen unless one of them is a different language version or serving a different location and using rel="alternate" hreflang="x" link attributes.

prodmoistwood

3:44 pm on Jun 22, 2023 (gmt 0)



@RedBar
You want "informed" users to be able to see a duplicate of an established site but no one / nothing else?


That's exactly what I want, yes.

How do I properly quote on here, so the quoted person gets notified?

prodmoistwood

3:55 pm on Jun 22, 2023 (gmt 0)



Hey guys, I really appreciate your help. I need the website to be accessible to anyone who has the link. Let me explain why I'm looking for this setup. My previous profile on Facebook Business Manager went down, and unfortunately, it had my website linked to it. Now, I have a new profile and BM, but I don't want to use the same domain to avoid suspicion and protect my new profile and BM. However, I still need to direct people to my website because it's a crucial part of my sales funnel. So, my plan is to create a duplicate website on a new domain exclusively for Facebook. I don't want search engines to crawl and index the duplicate site, as it would mess up the SEO of my original website. I initially thought it would be an easy task, but based on your comments, it seems like it might be impossible to achieve...

not2easy

5:02 pm on Jun 22, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



prodmoistwood wrote:
    How do I properly quote on here, so the quoted person gets notified?

We typically use your @ format to specify the person your response is intended for. Everyone who has participated in the thread gets a notification, and the @ person sees that the response was intended for them.

You can just use a password-protected entry page to solve the problem. That would allow humans who access the page to use the password you post; you might want to use an image showing the password. The search engine bots will abide by your robots.txt and humans can enter with the password. The unwanted type of bots won't get through, with typical security. Since you're sharing the link, people will have what they need. If you are concerned about search bots following the link, you could noindex all pages on the new duplicate site. No effect on your existing site.

prodmoistwood

6:08 pm on Jun 22, 2023 (gmt 0)



@not2easy

Thanks.

I can't password protect because it needs to function like a normal landing page for prospecting. Do I need to separately noindex all pages if I already added "User-agent: * Disallow: /" in the robots.txt file? Is there a way to make sure that the duplicate website, which is not password protected, won't affect my original site's SEO at all? Everyone is telling me that no matter what you do, at the end of the day Google does what Google wants. Also, someone commented here "Falls into "have cake and eat it, too" category. Been trying to figure that out for years!", which makes it seem like it's impossible. To me it seemed like a very doable, non-complex task, but I guess I was wrong, unfortunately.

not2easy

6:50 pm on Jun 22, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



If it is strictly for a landing page, you don't need to try to index it, and Google won't index two identical pages on different sites. Your "User-agent: * Disallow: /" would help keep Google from crawling, but as others have mentioned, that is not guaranteed if they come across the link somewhere such as email or on other sites. If you are using a canonical tag, that would deal with duplicate content and help Google understand that you aren't trying to index both pages.

With recent Google changes it can be hard to say for certain how they would deal with it but if your existing site is indexed and performing well you don't want to try to index the duplicate site. If you do not specify noindex, they might think you're trying to index both. I would add noindex as 'insurance'.

prodmoistwood

9:29 pm on Jun 22, 2023 (gmt 0)



@not2easy

Yes, my real website is performing well and I don't want to index the duplicate website at all. It will be strictly used for my Facebook. As of right now, I only have the robots.txt with the disallow command. Do you suggest using a canonical tag as well? And maybe even adding a noindex tag to each individual page of the duplicate?

not2easy

9:49 pm on Jun 22, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Yes, a canonical tag explains why this page is the same as that other page, so you can hopefully get a pass using the same content in two domains. Even with a disallow, Google finds such pages and it is best to have the signals in place to avoid misunderstandings. I would definitely noindex the duplicates if you do not want to allow crawling. Not having that noindex leaves Google's default (index,follow) for them to try to figure out.

phranque

10:17 pm on Jun 22, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



not2easy wrote:
    Even with a disallow, Google finds such pages and it is best to have the signals in place to avoid misunderstandings. I would definitely noindex the duplicates if you do not want to allow crawling. Not having that noindex leaves Google's default (index,follow) for them to try to figure out.

if googlebot is disallowed from crawling a noindexed url, it will not see the associated noindex signal.
if G cannot crawl a url it has discovered, it will likely use the context of discovery to make determinations about the use of that url in the index or the link graph.

prodmoistwood

1:52 pm on Jun 23, 2023 (gmt 0)



@phranque

So do you think what I'm trying to achieve is possible or not?

phranque

8:46 pm on Jun 23, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



i'm saying it's possible as long as you provide the noindex signal and then don't exclude googlebot from crawling that URL.

prodmoistwood

11:23 am on Jun 25, 2023 (gmt 0)



@phranque

I hope it's not too much to ask, but can you please tell me exactly what steps I need to take in order to achieve this? It's a WordPress website. I'm not too advanced with the technical stuff so I want to make sure I don't mess this up. Thanks.

lucy24

6:37 pm on Jun 25, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There are two ways to noindex all pages on a site. One way is to include the line
<meta name="robots" content="noindex">
in the <head> of each individual page. (There may be a way to do this globally in WP; not2easy will know.)

The other way is to put the line
Header set X-Robots-Tag "noindex"
in the .htaccess file or <Directory> section that applies only to that site (this requires mod_headers). The line could either stand by itself, or be inside a <Files> or <FilesMatch> envelope if you want to constrain it to certain files or extensions--for example, on my sites I set it for all scripts.
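
Put together, the header approach described above might be sketched like this in the duplicate site's .htaccess. The extension list in the commented-out variant is purely illustrative and not something stated in this thread.

```apache
# Requires mod_headers. Sends "X-Robots-Tag: noindex" with every
# response served from this directory, including non-HTML files
# such as PDFs and images that cannot carry a <meta> tag.
<IfModule mod_headers.c>
    Header set X-Robots-Tag "noindex"
</IfModule>

# Variant: limit the header to certain extensions instead.
# The extensions listed here are illustrative assumptions.
# <FilesMatch "\.(pdf|docx?)$">
#     Header set X-Robots-Tag "noindex"
# </FilesMatch>
```

As phranque notes earlier in the thread, a crawler only sees this header if robots.txt does not forbid it from fetching the URL in the first place.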

not2easy

6:59 pm on Jun 25, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



In WP, a user typically installs the common Yoast plugin to deal with that; it is one simple setting. We only just found out this is a WP site, so that had not come up before.

Nutterum

11:00 am on Jun 26, 2023 (gmt 0)

10+ Year Member Top Contributors Of The Month



@prodmoistwood - the "safest" way is to use the server settings and just block all UAs, then go manually and add the nowadays massive IP ranges of known entities trying to crawl or look at the site. I suspect that another way of doing it is to have it under username and pass and just block everything that way, with interested parties that need to view the copy basically logging in to view it. This second solution is a bit uglier but does work 100% of the time, as nothing can get leaked on the web unless the users using it decide to leak it.

phranque

6:01 pm on Jun 26, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Nutterum wrote:
    I suspect that another way of doing it is to have it under username and pass and just block everything that way, with interested parties that need to view the copy basically log in to view it.

prodmoistwood wrote:
    I can't password protect because it needs to function like a normal landing page for prospecting.

Nutterum

11:00 am on Jun 27, 2023 (gmt 0)

10+ Year Member Top Contributors Of The Month



In that case you can use split.io and serve a non-sensitive version to the bots and the version you want to users who have the link. You can very easily set up the URL with some parameter to execute the trigger. Bots see a blank page, users see the full deal. Easy and harmless.

blend27

1:57 pm on Jul 4, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



One cannot hide content from bots unless it is password-protected content.

Today's supersized exercise:

    Create a random subdomain for your domain, create a default document
    Visit it once (optional).
    Wait 24 hours.
    Check the subdomain's web logs to see how many non-human visitors you get trying to scrape it.

As soon as one creates a DNS entry (A record), it is all up for grabs. There are bots that scan DNS just to get there and get its contents first, and it happens almost in real time.

Try it. 2 minute supersized exercise.

Nutterum

2:19 pm on Aug 8, 2023 (gmt 0)

10+ Year Member Top Contributors Of The Month



@blend27 - that is very true. Hence solutions like split.io or similar can provide what OP needs. It's one thing to have your page crawled, it's another to be readily visible on the internet.

Peter_S

1:28 pm on Aug 9, 2023 (gmt 0)

5+ Year Member Top Contributors Of The Month



I didn't read all the messages above, but from the question, I would say: just don't have DNS entries, and use the local hosts file to map the domain name to the IP.