I'm trying to read a website and extract some information I need, but sometimes parts of the text come back garbled. For example, "<span>" becomes "<sp 2000 an>", or "information" becomes "info 2000 rmation".
These extra "2000" strings make my script fail.
Please help me.
PS: I have tried converting the source to UTF-8, but that makes no difference, since the page is already encoded in UTF-8.
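A likely cause is HTTP/1.1 chunked transfer encoding, not the text encoding. The script isn't shown, so this assumes it reads the raw HTTP response, for example over a plain socket. With "Transfer-Encoding: chunked", the server splits the body into chunks, and each chunk is preceded by its size in hexadecimal on its own line. "2000" is hex for 8192 bytes, so the marker appears wherever a chunk boundary happens to fall, even in the middle of a word or tag. The sketch below decodes such a body. The function name `dechunk` is hypothetical and not part of any library.

```python
def dechunk(body: bytes) -> bytes:
    """Decode an HTTP/1.1 body sent with 'Transfer-Encoding: chunked'.

    Each chunk has the form <size in hex>[;extensions]\r\n<data>\r\n,
    and the body ends with a zero-size chunk (optionally followed by trailers).
    """
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The chunk size is hexadecimal; ignore any ";extension" after it.
        size = int(body[pos:line_end].split(b";", 1)[0].strip(), 16)
        pos = line_end + 2
        if size == 0:
            break  # last chunk; trailers, if any, are ignored
        out += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(out)


# A tag split across two chunks, as in the question.
raw = b"3\r\n<sp\r\n4\r\nan>x\r\n0\r\n\r\n"
print(dechunk(raw))  # b'<span>x'
```

The function expects only the body, meaning the bytes after the blank line that ends the response headers. An HTTP library such as Python's `urllib.request` decodes chunked responses automatically, which is usually the simpler fix. Another option is to send the request as HTTP/1.0, because chunked encoding does not exist in that protocol version.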