Search Engine Spider and User Agent Identification Forum

What about your Parked domains. And bots?
Angonasec

posted 9:16 am on Feb 21, 2014 (gmt 0)

I've just moved our domains to a new host, with the consequent character-building, bug-trawling all-nighters.

Now that the dust is settling, I have the previously parked domains on my hosting account (sharing the main IP), so I have more control over how each parked page is constructed.

Fellow log-watchers, with a healthy attitude to bot-blocking:

Should I let the bot-tide in, or put the pristine domains behind a password?

What is +your+ preferred set-up for previously unused, parked domains, especially in regard to bots?

 

Angonasec

posted 1:25 pm on Mar 8, 2014 (gmt 0)

Interesting drift...

Let's continue with browsers...

It's something I've not thought about previously.

We've moved our main site to a new dedicated IP.
That IP is now shared with several parked domains, which I've hidden behind a password today.

When I use the numeric IP in a browser to access our dedicated IP, I get the main site homepage loading just as promptly as if I had used the domain name.

Would you Ladies and Gents recommend using mod_rewrite to insist that numeric-IP access attempts are re-routed to the domain name? (I'm thinking of preventing duplicate listings in SEs.)

Or just leave it be?

Comments please.

lucy24

posted 1:40 pm on Mar 8, 2014 (gmt 0)

Generic response: When the same content can be accessed in more than one way, redirect to force everyone into the same path.
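
In .htaccess terms, that generic response might look something like this (a sketch only; www.example.com is just a stand-in for the real canonical hostname):

RewriteEngine On
# Any request arriving under the "wrong" Host header -- the bare
# numeric IP included -- is 301-redirected to the canonical name.
RewriteCond %{HTTP_HOST} !^(www\.)?example\.com$ [NC]
RewriteRule ^ http://www.example.com%{REQUEST_URI} [R=301,L]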

dstiles

posted 8:11 pm on Mar 8, 2014 (gmt 0)

Lucy - on my IIS server I get Error 400 if I use wget or a link-checker, but that error is defined by the IIS web server. As I recall, it was something I turned on during setup.

I know some web sites can be addressed by IP either instead of or in addition to a domain name. I've done that myself long ago, I think on a linux server. As with IIS, I would expect it to be an option.

Angonasec - I would reject any attempt to access a site by IP - at least, in normal situations. If the IP gets into an SE you may well lose the index when the IP changes - that's obviously what domain names prevent. IP addressing is also open to abuse and cannot cope with several sites sharing a single IP, which is the usual hosting situation.

Angonasec

posted 12:48 am on Mar 9, 2014 (gmt 0)

Thank you.

The "generic response": If applied in my case (8 "hosts" sharing my dedicated IP) would wreak havoc when I am ready to launch one of the developed domains with separate content.

dstiles: Useful comments. Reject with a simple 403 or something more sophisticated?

dstiles

posted 8:40 pm on Mar 9, 2014 (gmt 0)

Sorry, I made a mistake in my last posting. The 400 is actually generated by my own home-grown (IIS/ASP) security system. I wondered how it managed to get into my security logs! :(

=====
400 Bad Request
The request could not be understood by the server due to malformed syntax. The client SHOULD NOT repeat the request without modifications.
=====

403 could also be used, I think, BUT with 403 a reason should be returned; I don't think one is necessary for 400 and I think that code is more suitable in this situation since multi-host servers obviously cannot understand the request.
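
On Apache the same idea can be sketched in .htaccess (illustrative only, not anyone's production rule set): if the Host header is a bare IP rather than a hostname, stop and return the status outright.

RewriteEngine On
# Host header looks like an IPv4 address, optionally with a port.
RewriteCond %{HTTP_HOST} ^\d{1,3}(\.\d{1,3}){3}(:\d+)?$
# Non-3xx codes in the R flag drop the substitution and end
# rewriting, so this answers with a plain 400. Use [F] for a 403.
RewriteRule ^ - [R=400,L]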

Angonasec

posted 9:16 am on Mar 10, 2014 (gmt 0)

Yes, I too would prefer a 400 for dealing with numeric-IP intruders, but my custom 403 page is an empty file. The response header does still give a reason for the rejected request, though, and I've yet to find a way to stop that "leakage".

My parked domains are all password protected now, and all numeric IP urls are redirected to the main site homepage.

So I may now have the set-up secure.
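
For anyone following along, the password protection itself is usually plain HTTP Basic auth in each parked domain's .htaccess; a minimal sketch (the realm name and htpasswd path below are placeholders):

AuthType Basic
AuthName "Parked domain"
# File created beforehand with the htpasswd utility.
AuthUserFile /home/account/.htpasswd
Require valid-user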

dstiles

posted 7:53 pm on Mar 10, 2014 (gmt 0)

On IIS (Windows) the error messages all have a default text, changeable globally or per site through the IIS manager. I assume there is a similar method on linux and/or apache. Numeric IPs should not show the home page of anything, just the error code and rejection message (if any).
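
On Apache the rough equivalent is the ErrorDocument directive, settable per site or per directory in .htaccess (the path and message text below are only examples):

# Serve a local page for 403s...
ErrorDocument 403 /errors/403.html
# ...or a short literal string for 400s (a quoted argument is
# treated as message text rather than a file path).
ErrorDocument 400 "Bad request."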

Angonasec

posted 10:05 pm on Mar 10, 2014 (gmt 0)

"Numeric IPs should not show the home page of anything, just the error code and rejection message (if any)."

I asked here how to disallow access on Apache by a site's numeric IP:
[webmasterworld.com...]

The resulting mod_rewrite code does that, and rewrites the numeric address shown in the browser to the desired domain name.

Are you saying that transformation is "risky" too?

dstiles

posted 7:27 pm on Mar 11, 2014 (gmt 0)

NEVER allow an IP-only access to reach anything other than (eg) a 400 "page" with zero content. At least, that's my approach.

If you return something reasonable (eg a home page) you are responding as the bot wants, exposing at least some kind of site information. IP hits can only be generated deliberately and almost always by someone with malevolent intentions. Very few people can determine what IP is allocated to a web site and most of them have technical knowledge of some kind (eg to interpret DNS results or back-trace a web page).

lucy24

posted 9:38 pm on Mar 11, 2014 (gmt 0)

"Are you saying that transformation is 'risky' too?"

I think he's only saying that the request itself shouldn't be allowed to go anywhere. A redirect is fine because that's a whole fresh request.

Angonasec

posted 2:23 pm on Mar 12, 2014 (gmt 0)

"A redirect is fine because that's a whole fresh request."

I wonder, Lucy... dstiles has a point: the person, or bot, fishing with the IP would surely perceive the resulting redirect destination, and thereby glean some data.

But having had the WebmasterWorld mod_rewrite A-Team's assistance, I'm loath to alter the resplendent .htaccess code they produced for me.

keyplyr

posted 5:34 pm on Mar 12, 2014 (gmt 0)

I would have just left the unused domains in parked status and avoided all this hassle (and all that code in your .htaccess), but then I like to keep things simple.

Angonasec

posted 5:44 am on Mar 13, 2014 (gmt 0)

KeyP: Yes, that is what I did, as you'll see if you scroll through the Apache thread, but I still have the numeric dedicated-IP redirect code in situ for the live primary domain.

Opinions welcomed :)

Angonasec

posted 1:15 pm on Mar 13, 2014 (gmt 0)

Thanks to Lucy's able assistance, we now have various ways of either blocking or redirecting access by a site's numeric IP on Apache, whichever approach the user adopts.

Still canvassing informed opinions on whether it's safer to block, or to redirect to a proper URL...
