Andylew - 9:19 am on Jun 27, 2011 (gmt 0)
A follow up:
The site was 404'd and a reconsideration request put in. The hypothesis was that if Google uses pre-reconsideration-request data, it would reply with 'site still violates' after the usual 5 days; however, no message was received. After seven days with no message, the site was put live again. Five days later the message came in, at exactly the same time of night and after exactly the same period of time as before (this time with the site live).
This suggests that if a site is 404'd (or possibly fixed), either the timeout before Google considers it clean and replies is longer than 5 days, or, when the site is 404'd, Google assumes foul play or website problems.
The most useful thing this indicates is that 'something' happens when a reconsideration request is put in, which seems to point to Google not using pre-reconsideration-request data as a factor.
On this reconsideration request, a simple 'please reconsider this site' was entered into the text box. On previous attempts the text has varied from a single paragraph to full A4 pages. The length of the text (or lack of it) seems to have no effect on either the time periods involved or the outcome, so the only conclusion I can draw is that it is read after the reconsideration process to help develop future algos.
All this is pointing more and more to the process being automated.
Because 'something' seems to happen when a reconsideration request is put in, there should be a trace of it in the Apache logs. This site is large enough, and the crawl rate high enough, that it should be possible to find a specific page that is crawled significantly more than others between the times when the reconsideration requests are put in.
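A minimal sketch of how that "crawled significantly more" page could be spotted, assuming each log line has already been reduced to a `date|ip|request` row (the pipe-split format the post describes further down) and that the timestamps use Apache's default `[27/Jun/2011:09:19:00 +0000]` style. The function names `split_by_window` and `rank_spikes` are hypothetical; the scoring compares each page's share of all crawling inside the window with its share outside it, which is just one simple way to measure a spike.

```python
from collections import Counter
from datetime import datetime

# Apache's default timestamp format, e.g. "27/Jun/2011:09:19:00 +0000"
APACHE_TIME = "%d/%b/%Y:%H:%M:%S %z"

def split_by_window(rows, start, end):
    """Count requests inside [start, end) and outside it.

    rows are 'date|ip|request' strings; start and end are timezone-aware datetimes.
    """
    inside, outside = Counter(), Counter()
    for row in rows:
        date, _ip, request = row.split("|", 2)
        ts = datetime.strptime(date, APACHE_TIME)
        target = inside if start <= ts < end else outside
        target[request] += 1
    return inside, outside

def rank_spikes(inside, outside):
    """Rank requests by their share of crawl inside the window vs outside it."""
    total_in = sum(inside.values())
    total_out = sum(outside.values())
    scores = {}
    for request, hits in inside.items():
        share_in = hits / total_in
        # Add-one smoothing so a page never crawled outside the window
        # still gets a finite score instead of a division by zero.
        share_out = (outside[request] + 1) / (total_out + 1)
        scores[request] = share_in / share_out
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Running `rank_spikes` once per reconsideration window and looking for a request that tops the list every time would be one way to test the idea.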
Further to this, because I believe the process is automated, would Google be daft enough to use the same IP each time?
And with the site 404'd, would the 'normal' spiders have gone away, leaving only a reconsideration spider?
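Both questions come down to which client IPs show up during each reconsideration period. A minimal sketch, again assuming `date|ip|request` rows and a hand-entered list of `(start, end)` windows; `ips_per_window` and `recurring_ips` are hypothetical names. An IP that appears in every window would be a candidate for a repeat visitor, and a window whose set is very small while the site is 404'd would bear on the second question.

```python
from datetime import datetime

# Apache's default timestamp format, e.g. "27/Jun/2011:09:19:00 +0000"
APACHE_TIME = "%d/%b/%Y:%H:%M:%S %z"

def ips_per_window(rows, windows):
    """Return one set of client IPs per (start, end) window.

    rows are 'date|ip|request' strings; windows is a list of (start, end)
    pairs of timezone-aware datetimes, with end exclusive.
    """
    seen = [set() for _ in windows]
    for row in rows:
        date, ip, _request = row.split("|", 2)
        ts = datetime.strptime(date, APACHE_TIME)
        for i, (start, end) in enumerate(windows):
            if start <= ts < end:
                seen[i].add(ip)
    return seen

def recurring_ips(rows, windows):
    """IPs that turn up in every window, e.g. every reconsideration period."""
    per_window = ips_per_window(rows, windows)
    return set.intersection(*per_window) if per_window else set()
```

An IP on its own does not prove the visitor is Google; a reverse DNS lookup on the address is the usual way to check that.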
After several reconsideration requests, I think I will now have enough data points to answer the above questions.
To develop the ideas around 404ing a site further, I have taken another site (part of the same group but in a different country, which received its penalty on the same day) and put in a reconsideration request while that site is completely 404'd. I will leave this site 404'd until I receive a reply.
To do this I will be racking up a new server, taking the Apache logs and splitting each line, by replacing different characters with pipes, into date, IP and request.
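Rather than replacing characters one at a time, a single regular expression can pull the three fields out of each line. This is a sketch that assumes the logs are in Apache's common or combined format (IP, two identity fields, bracketed timestamp, quoted request); `to_pipe_line` and `convert` are hypothetical names. Any literal pipe inside a request is percent-encoded so it cannot break the later split.

```python
import re

# Matches the start of a common/combined format line, e.g.
# 66.249.66.1 - - [27/Jun/2011:09:19:00 +0000] "GET /page.html HTTP/1.1" 200 512 "-" "Googlebot/2.1"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] "(?P<request>[^"]*)"'
)

def to_pipe_line(line):
    """Return 'date|ip|request' for one log line, or None if it does not match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    # A pipe inside the request would add an extra field, so encode it as %7C.
    request = m.group("request").replace("|", "%7C")
    return f"{m.group('date')}|{m.group('ip')}|{request}"

def convert(lines):
    """Yield pipe-delimited rows for every line that parses, skipping the rest."""
    for line in lines:
        row = to_pipe_line(line)
        if row is not None:
            yield row
```

Writing the output of `convert` to a file gives one row per hit, ready for a bulk import.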
Importing the result into MySQL.
Taking each line, then comparing it to every other line in the database and doing a count.
There are several hundred thousand lines to do this on, so it is going to take several days to complete. Does anyone have any bright ideas on speeding up this process?
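One possible speed-up: comparing every line with every other line grows with the square of the number of lines, whereas counting with a hash map needs only one pass. A minimal sketch, assuming the same `date|ip|request` rows; `count_requests` is a hypothetical name.

```python
from collections import Counter

def count_requests(rows):
    """Count hits per request from 'date|ip|request' rows in a single pass.

    A hash map (Counter) makes this O(n) in the number of rows, instead of
    the O(n^2) cost of comparing every row against every other row.
    """
    counts = Counter()
    for row in rows:
        # maxsplit=2: the date and IP contain no pipes, so everything
        # after the second pipe is the request
        _date, _ip, request = row.split("|", 2)
        counts[request] += 1
    return counts
```

`count_requests(rows).most_common(20)` would then list the 20 most-requested URLs. The same idea works inside MySQL: assuming a table named `hits` with a `request` column (hypothetical names), `SELECT request, COUNT(*) FROM hits GROUP BY request ORDER BY COUNT(*) DESC` lets the database do the counting instead of joining the table against itself.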