

HTTP header field / value delimiter

The RFC says it should be colon+space, but ...

         

Yidaki

6:21 pm on Feb 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



My self-made robot had big problems crawling one particular site today. After a lot of digging around, I found out that the HTTP header field/value pairs in the server response were not separated by colon+space but only by a colon.

Example:
...
Connection:Keep-Alive
Content-type:text/html
...

instead of
...
Connection: Keep-Alive
Content-type: text/html
...

Since I had programmed my robot to check the content type by looking for "content-type: text/html" (with a space) and to ignore everything else, the site couldn't be crawled.

I modified the code and now it works. But I wonder: is it normal for a search engine robot to also accept, read, and index sites that return incorrect, non-RFC-conformant HTTP headers? Any insights?
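A spacing-tolerant content-type check could look roughly like the following minimal Python sketch. It assumes the robot already has the raw response header block as a string; `header_value` and `is_html` are hypothetical names, not the original robot's code. Instead of matching the literal string "content-type: text/html", it splits each line on the first colon, compares the field name case-insensitively, strips optional spaces/tabs from the value, and ignores parameters such as charset.

```python
def header_value(raw_headers, name):
    """Return the value of the first header called `name`, or None.

    Each line is split on the first colon only, so both "Name:value" and
    "Name: value" are accepted. Field names are compared case-insensitively.
    """
    wanted = name.lower()
    for line in raw_headers.splitlines():
        field, sep, value = line.partition(":")
        if sep and field.lower() == wanted:
            # Drop any amount of leading/trailing spaces or tabs (including none).
            return value.strip(" \t")
    return None


def is_html(raw_headers):
    """True if the Content-Type media type is text/html, ignoring parameters."""
    value = header_value(raw_headers, "Content-Type")
    if value is None:
        return False
    # "text/html; charset=iso-8859-1" -> "text/html"
    media_type = value.split(";", 1)[0].strip().lower()
    return media_type == "text/html"


if __name__ == "__main__":
    print(is_html("Connection:Keep-Alive\r\nContent-type:text/html\r\n"))  # True
    print(is_html("Content-type: text/html; charset=iso-8859-1\r\n"))      # True
    print(is_html("Content-Type: image/gif\r\n"))                          # False
```

Comparing the parsed media type rather than a fixed substring also avoids misses when a server sends different capitalization or appends a charset parameter.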

<edited>column -> colon, thanks Andreas ;)</edited>

[edited by: Yidaki at 6:39 pm (utc) on Feb. 6, 2003]

andreasfriedrich

6:33 pm on Feb 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hehe, you're probably referring to a colon, not a column. That had me baffled for a second.

Each header field consists of a name followed by a colon (":") and the field value. [...] The field value MAY be preceded by any amount of LWS, though a single SP is preferred.

[faqs.org...] - 4.2 Message Headers
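The quoted rule can be illustrated with a minimal sketch of a lenient header-block parser in Python (`parse_headers` is a hypothetical name). It splits each line on the first colon and strips any run of spaces or tabs before the value, including none, so `Content-type:text/html` and `Content-type: text/html` produce the same pair. RFC 2616 section 4.2 also lets a field continue onto the next line when that line starts with a space or tab; the sketch joins such lines with a single space. It assumes the status line has already been removed, and accepting bare LF line endings is a leniency beyond what the spec requires.

```python
def parse_headers(raw):
    """Parse an HTTP/1.x header block (status line already removed)
    into a list of (name, value) pairs."""
    headers = []
    # Tolerate bare LF line endings as well as CRLF (a lenient assumption).
    for line in raw.replace("\r\n", "\n").split("\n"):
        if not line:
            break  # an empty line ends the header block
        if line[0] in " \t" and headers:
            # Continuation (folded) line: append to the previous value,
            # replacing the folding whitespace with a single space.
            name, value = headers[-1]
            headers[-1] = (name, (value + " " + line.strip(" \t")).strip())
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a header line; skipped in this lenient sketch
        # Any amount of SP/HT (including none) may precede the value.
        headers.append((name, value.strip(" \t")))
    return headers


if __name__ == "__main__":
    block = "Connection:Keep-Alive\r\nContent-type: text/html\r\n\r\n"
    print(parse_headers(block))
    # [('Connection', 'Keep-Alive'), ('Content-type', 'text/html')]
```

Names are returned as sent; a caller would typically lower-case them before lookup, since field names are case-insensitive per the same section.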

Andreas

Yidaki

6:43 pm on Feb 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks Andreas! Looks like it's OK to use just a colon instead of colon+space ...

... tsk, tsk, I thought I had read all the available RFC specs ... :)

Again, thanks for pointing me to the right spec (and spelling)!

andreasfriedrich

6:57 pm on Feb 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You're welcome, Yidaki.

I have links to the most commonly used RFCs and other docs included on my WebmasterWorld page with the custom code feature. That way they are just a mouse click away and answering questions is a breeze.

Andreas