Andylew - 9:35 am on Jul 1, 2011 (gmt 0)
Thanks for the reply, Pierre. As always with Google, there are a lot of words but very little content.
To answer my earlier post and follow up with new findings:
Once more my hypothesis is incorrect. Something does happen when a reconsideration request is put in, but it is not the result of new crawl activity after the request. I have downloaded the raw logs. I said earlier this was a few hundred thousand lines; it actually turned out to be over 50 million lines spanning 4 reconsideration requests. I can say categorically that there is no significant change in crawling: no new IPs, no page looked at more than normal during that time, and no other factor that could suggest different activity or a specific area, date, IP request etc. being looked at after a reconsideration request has been put in.
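For anyone wanting to run the same kind of before/after check, here is a rough sketch in Python. It is not the exact method I used; the function names and the 7-day default window are assumptions chosen for illustration, and it expects you to have already pulled the Googlebot hit timestamps out of your logs.

```python
from collections import Counter
from datetime import date, datetime, timedelta

def daily_hits(timestamps):
    """Count hits per calendar day from an iterable of datetime objects."""
    return Counter(ts.date() for ts in timestamps)

def mean_daily_hits(per_day, start, days):
    """Average hits per day over `days` days from `start`; days with no hits count as 0."""
    total = sum(per_day.get(start + timedelta(days=offset), 0) for offset in range(days))
    return total / days

def compare_windows(timestamps, request_date, days=7):
    """Return (mean hits/day before, mean hits/day after) a reconsideration request date."""
    per_day = daily_hits(timestamps)
    before = mean_daily_hits(per_day, request_date - timedelta(days=days), days)
    after = mean_daily_hits(per_day, request_date, days)
    return before, after

# Made-up timestamps purely to show the call; real input comes from your own logs.
hits = [
    datetime(2011, 6, 29, 10, 0), datetime(2011, 6, 29, 11, 0),
    datetime(2011, 6, 30, 10, 0), datetime(2011, 6, 30, 11, 0),
    datetime(2011, 7, 1, 10, 0), datetime(2011, 7, 1, 11, 0),
    datetime(2011, 7, 2, 10, 0), datetime(2011, 7, 2, 11, 0),
]
before, after = compare_windows(hits, date(2011, 7, 1), days=2)
print(before, after)
```

If the two averages are close for every request date, as they were in my logs, there is no sign of extra crawling triggered by the request.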
So what can be concluded from that? No new or specific data (collected after a reconsideration request) is being used to determine whether a site is now fit to have a penalty lifted, and reconsideration requests are unlikely to be read before reconsideration takes place. So there must be continuous monitoring of the site and a probability score for resolution. Google has already confirmed penalties are time limited. I wonder whether, through what I now believe 100% is an automated system in this situation, standard Googlebot data is used to determine a score which is then constantly updated. Once the score reaches a certain level, a 'penalty resolved' level, the site is determined to be 'fixed' and goes into a timeout phase (30 days, 60 days, whatever), after which, if there are no further problems, the penalty is removed (or it goes to a human evaluator for a final check). Once a site is in this timeout phase, a reconsideration request would just speed up the return to standard listings (or human evaluation) by putting it at the front of the queue.
The 404 test could be the exception to the rule: if Googlebot detects a significant change in the site (like it disappears!), it would make sense that a complete re-evaluation of the site takes place before it is re-scored. Such a change could be a good indicator of new ownership etc. Tinkering around the edges, like many do, is unlikely to trigger this complete re-evaluation.
With no clues as to which area of the site to concentrate on, and no change in Google activity after a reconsideration request is entered, the only current area to focus on is this notion that a 'significant change' influences the reconsideration request in some way.
With one site now 404'd and awaiting a response, I'm going to do a complete redesign of the other site: new graphics, CSS, layout etc. The content and URL structure will stay the same. I wonder whether this will trigger the 'significant change' response, characterised by an increased wait for a response to the reconsideration request.
For all those coming across this thread in future and wanting to evaluate their raw logs, the following provides a starting point.
Download your raw logs, then separate out the different elements by replacing characters with pipes or similar, then import into MySQL.
Create a new column (adder in my case) with a default value of 1.
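As a rough sketch of the separation step, something like the Python below could turn each raw log line into a pipe-delimited row ready for import. It assumes the Apache/NGINX "combined" log format and assumes that 'source' means the user agent string; both are guesses to adjust for your own setup. The function name, the column order and the sample line are made up for illustration.

```python
import re

# Assumption: logs are in the "combined" format; change the pattern for other formats.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<source>[^"]*)"'
)

def to_pipe_row(line):
    """Return 'ip|date|source|request|adder' for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = [match.group(name) for name in ("ip", "date", "source", "request")]
    # Escape literal pipes so they cannot be mistaken for delimiters on import.
    fields = [field.replace("|", "%7C") for field in fields]
    fields.append("1")  # the 'adder' column, always 1
    return "|".join(fields)

# Invented sample line, only to show the shape of the input.
sample = ('66.249.66.1 - - [01/Jul/2011:09:35:00 +0000] "GET /page.html HTTP/1.1" '
          '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')
print(to_pipe_row(sample))
```

Writing the adder value into the file is one option; setting it as a column default in MySQL, as described above, works just as well.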
Analyse the data using queries similar to:
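-- `raw_logs` is a placeholder; use the name of the table you imported the logs into
CREATE TABLE haystack_needle AS
SELECT `ip`, `date`, `source`, `request`, SUM( `adder` ) AS `adder`
FROM `raw_logs`
GROUP BY `request`
ORDER BY `adder` DESC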
This will give you a new table with the adder column equal to the number of times each value of the GROUP BY element appears. In the above example that is 'request'; this could be changed to 'ip' or something else. This can then be used to drill down further.
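If you would rather try the same grouping outside MySQL first, a minimal Python sketch of the idea follows. It assumes pipe-delimited rows in the column order ip, date, source, request, adder; that order, the function name and the sample rows are my own illustration, not something fixed by the steps above.

```python
from collections import Counter

COLUMNS = ("ip", "date", "source", "request", "adder")  # assumed column order

def count_by(rows, column):
    """Sum the adder value per distinct value of `column`, highest total first.

    This mirrors GROUP BY <column> with SUM(adder) ordered descending.
    """
    index = COLUMNS.index(column)
    counts = Counter()
    for row in rows:
        fields = row.split("|")
        counts[fields[index]] += int(fields[-1])
    return counts.most_common()

# Invented rows, only to show the call.
rows = [
    "66.249.66.1|01/Jul/2011:09:35:00 +0000|Googlebot/2.1|GET /a.html HTTP/1.1|1",
    "66.249.66.1|01/Jul/2011:09:36:00 +0000|Googlebot/2.1|GET /b.html HTTP/1.1|1",
    "192.0.2.7|01/Jul/2011:09:37:00 +0000|Mozilla/5.0|GET /a.html HTTP/1.1|1",
]
print(count_by(rows, "request"))
print(count_by(rows, "ip"))
```

Swapping the column name is the equivalent of changing the GROUP BY element. For tens of millions of lines the database is still the better place to do this.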