
Forum Moderators: phranque


cPanel login pages indexed by Google

How did Google find them?

     
11:21 am on May 24, 2014 (gmt 0)

WebmasterWorld Senior Member penders is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



Has anyone experienced cPanel login pages being indexed by Google? (Possibly a link-only result in the SERPs since they are generally blocked by robots.txt.)

If so, how do you think Google found these files? I'm struggling to believe that anything but a direct link to these files resulted in Google finding them.

This is primarily in relation to shared servers, where the cPanel URL is often cpanel.example.com and always on the same port, e.g. example.com:2082. These are easily found by a bit of trial and error, but does Google use "trial and error"?

Is there any way that Google could have found these pages, other than by stumbling across a direct link?
6:50 pm on Jun 10, 2014 (gmt 0)

5+ Year Member



The robots.txt file is not generally accessible at the root url for those ports.

In any case, the robots.txt file ONLY specifies what is not to be crawled, not what must not appear in the index. So if the pages are disallowed in robots.txt (assuming a robots.txt file is available there), the URLs alone can still appear in the index.

What is needed is for a robots noindex meta tag to be placed on those pages, instead of getting them disallowed in robots.txt.
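For illustration only, here is a rough Python sketch of a login page that carries both forms of noindex: the robots meta tag in the HTML and the X-Robots-Tag response header. The handler name, the realm string and the page body are made up for the example, this is not how cPanel itself is implemented, and the 401 status is simply the one reported for these pages elsewhere in the thread. Either signal only helps if the crawler is allowed to fetch the page, so it would replace the robots.txt Disallow rather than sit alongside it.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical login page; the meta tag is the in-page form of noindex.
LOGIN_PAGE = b"""<!DOCTYPE html>
<html>
<head>
<title>Login</title>
<meta name="robots" content="noindex">
</head>
<body>Login required.</body>
</html>
"""


def login_response():
    """Return (status, headers, body) for the hypothetical login page."""
    headers = [
        ("WWW-Authenticate", 'Basic realm="panel"'),
        # Header form of noindex; it also works for non-HTML responses.
        ("X-Robots-Tag", "noindex"),
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(LOGIN_PAGE))),
    ]
    return 401, headers, LOGIN_PAGE


class LoginHandler(BaseHTTPRequestHandler):
    """Serves the same response for every GET request."""

    def do_GET(self):
        status, headers, body = login_response()
        self.send_response(status)
        for name, value in headers:
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, something like `HTTPServer(("127.0.0.1", 8082), LoginHandler).serve_forever()` would serve the page on an arbitrary test port.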
8:28 pm on Jun 10, 2014 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



the method of URL discovery is irrelevant.
the only thing that really matters is your response to the request.
9:14 pm on Jun 10, 2014 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



does Google use "trial and error"

In some areas, sure. For example: If you've got a page with URL ending in a slash, like
example.com/directory/
then search engines will occasionally ask for both
example.com/directory/index.html
and
example.com/directory
(This is one of several search-engine behaviors that I never noticed until I moved sites and therefore paid unusually close attention to requests.)
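To make the pattern concrete, here is a small Python sketch that lists the alternate forms of a directory URL described above. The function name and the default index filename are assumptions for the example; nothing here claims to reproduce what any search engine actually does internally.

```python
from urllib.parse import urlsplit, urlunsplit


def directory_url_variants(url, index_names=("index.html",)):
    """List alternate forms of a directory URL that ends in a slash.

    Returns an empty list if the URL path does not end in "/".
    """
    parts = urlsplit(url)
    if not parts.path.endswith("/"):
        return []

    # Explicit index documents, e.g. /directory/index.html
    variants = [
        urlunsplit(parts._replace(path=parts.path + name))
        for name in index_names
    ]

    # The same path without the trailing slash, e.g. /directory
    stripped = parts.path.rstrip("/")
    if stripped:  # skip the bare root, which has no slashless form
        variants.append(urlunsplit(parts._replace(path=stripped)))
    return variants


for variant in directory_url_variants("http://example.com/directory/"):
    print(variant)
```

Comparing a list like this against your access logs is one way to spot the requests that nobody ever linked to.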

And you know all those robots that come by asking for the top 87 permutations of "wp-admin" on the off chance that they might get in? It doesn't seem likely that a Ukrainian robot would know something that Google doesn't.

Has anyone experienced cPanel login pages being indexed by Google? (Possibly a link-only result in the SERPs since they are generally blocked by robots.txt.)

Was this a hypothetical question, or have you been seeing it yourself? In order for something to appear in the index, there has to be some concrete reason for the search engine to believe the page exists: either because they've seen it, or because it's listed in your sitemap, or because someone has linked to it. I can't think of a fourth possibility.
9:31 pm on Jun 10, 2014 (gmt 0)

WebmasterWorld Senior Member penders is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



The robots.txt file is not generally accessible at the root url for those ports.


There is a robots.txt served (with a valid 200 response) from this location that contains a single Disallow: / directive. It's obviously not the same robots.txt file used by the main site, but it is a robots.txt file.

What is needed is for a robots noindex meta tag to be placed on those pages, instead of getting them disallowed in robots.txt.


Exactly. I can only think that cPanel's decision to block with robots.txt is to save bandwidth by preventing unnecessary crawling.

However, it is my understanding that Google will only index these pages (or rather, allow them to appear in the SERPs, link-only style) if it has found other pages linking to them. To the best of my knowledge there are no such links, which is really the point of my question: how did Google find these pages?

I have seen far too many cPanel login pages indexed in this way to be just a one-off. So, they are being found somehow.

Unfortunately, on a shared server we don't have access to this area to do anything about it.

the only thing that really matters is your response to the request.


The pages return a 401 Access Denied.
9:53 pm on Jun 10, 2014 (gmt 0)

WebmasterWorld Senior Member penders is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



It doesn't seem likely that a Ukrainian robot would know something that Google doesn't.


Maybe the Ukrainian bot is exposing the URLs for Google to find?!

Was this a hypothetical question, or have you been seeing it yourself?


I've been seeing this quite a lot. And it was discussed in the cPanel forums [forums.cpanel.net] some years ago, with a stated fix sometime later - which doesn't seem to have happened? (The "fix" was to remove robots.txt and instead include a noindex robots meta tag or X-Robots-Tag HTTP response header.)

Admittedly, if there are many pages indexed on the site then these pages can be hard to find (they are, after all, link-only results). You might need to "repeat the search with the omitted results included". However, this hit me in the face recently when I purposefully deindexed a site. Once the main site pages had dropped from the index there was still a page of results in the SERPs for the cPanel subdomain, 2082 port address and associated URLs - which I have no control over!?
12:54 am on Jun 11, 2014 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Has anyone experienced cPanel login pages being indexed by Google? (Possibly a link-only result in the SERPs since they are generally blocked by robots.txt.)

The pages return a 401 Access Denied.


if the crawler respects robots.txt, it won't see the 401 response and the url could get indexed without being crawled.
once the crawler gets the 401 response, the url will be dropped from the index.
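a rough python sketch of that logic, using the standard library's urllib.robotparser. the function name and the returned strings are made up for the example, and it only models a crawler that obeys robots.txt and behaves as described above; it is not a statement of how any real search engine is implemented.

```python
from urllib.robotparser import RobotFileParser


def crawler_view(robots_lines, user_agent, url, status_if_fetched):
    """Describe what a robots.txt-respecting crawler learns about url.

    robots_lines is the robots.txt content as a list of lines, and
    status_if_fetched is the status the server would return if asked.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)

    if not parser.can_fetch(user_agent, url):
        # The request is never made, so the status stays unknown.
        return "blocked: status never seen, URL-only listing possible"
    if status_if_fetched == 401:
        return "fetched: 401 seen, URL dropped"
    return "fetched: status %d seen" % status_if_fetched


url = "http://example.com:2082/"

# The robots.txt reported in this thread: a single Disallow: / directive.
print(crawler_view(["User-agent: *", "Disallow: /"], "Googlebot", url, 401))

# With no rules at all, the crawler may fetch the page and see the 401.
print(crawler_view([], "Googlebot", url, 401))
```

with the disallow in place the 401 is never observed, which fits the link-only results reported in this thread.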
4:22 pm on Jun 11, 2014 (gmt 0)

5+ Year Member



something really getting out of my experience. :)
 
