One little expression felled Cloudflare

The power - and hazard - that is RegEx

         

iamlost

7:10 pm on Jul 12, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A little bitty RegEx let me down, spoiled my act as a CDN
I had it made up to solve a concern, but a little bitty RegEx let me down.
---with apologies to Burl Ives


Here's the culprit that overloaded the CPUs of every Cloudflare HTTP/S server core and 502ed their global network...

(?:(?:\"|'|\]|\}|\\|\d|(?:nan|infinity|true|false|null|undefined|symbol|math)|\`|\-|\+)+[)]*;?((?:\s|-|~|!|{}|\|\||\+)*.*(?:.*=.*)))

For timeline and explanation: Details of the Cloudflare outage on July 2, 2019 [blog.cloudflare.com] by John Graham-Cumming, CTO, Cloudflare.
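
The expensive part of that expression is its tail, .*(?:.*=.*): two greedy .* in a row give a backtracking engine many ways to split the same text before it can rule out a match. Below is a rough Python sketch of that effect. It is a simplified model, not Cloudflare's actual engine: count_attempts is a made-up helper that counts how often a naive backtracker, anchored at the start of the string, would test for "=" while trying .*.*= on text without newlines.

```python
def count_attempts(text: str) -> int:
    """Count the '=' tests a naive backtracker makes for .*.*= at position 0."""
    n = len(text)
    attempts = 0
    # The first .* is greedy: it grabs everything, then gives back one char at a time.
    for first in range(n, -1, -1):
        # The second .* grabs whatever is left, then gives back as well.
        for second in range(n - first, -1, -1):
            attempts += 1
            pos = first + second
            if pos < n and text[pos] == "=":
                return attempts  # found a way to match
    return attempts  # every split was tried and none worked


for size in (10, 20, 40):
    print(size, count_attempts("x" * size))
# 10 66
# 20 231
# 40 861
```

Doubling the input roughly quadruples the work, and an unanchored search repeats all of it from every starting position.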

lucy24

9:14 pm on Jul 12, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



.*=.* eh. Yeah, that'll do it. (Don't have time to read & enjoy the details now, but will happily come back to it.)

Not long ago I came to grief trying to do a multi-file replace in TextWrangler with the intention of changing assorted OCR garbage to a string along the lines of <format>blahblah &c.</format> (meaning literal “et cetera” in this elderly text). The more I tried to fix it, the worse it got.

Finally in despair I consulted the manual and learned that in TextWrangler, the & character has special meaning in a substitution string. Sigh.
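
The same kind of trap exists outside TextWrangler. Python's re.sub gives & no special meaning, but it does interpret backslash sequences such as \g<0> in the replacement string. A small sketch of one way around it: pass a function as the replacement, so the text goes in exactly as typed. The helper name sub_literal and the "OCR garbage" pattern are invented for illustration.

```python
import re


def sub_literal(pattern: str, replacement: str, text: str) -> str:
    """Replace every match with `replacement` exactly as written."""
    # A callable replacement skips template processing, so backslashes
    # and group references inside `replacement` are not interpreted.
    return re.sub(pattern, lambda _match: replacement, text)


garbage = r"[#@%]{2,}"  # hypothetical pattern for OCR debris
sample = "blah #@% blah"

# As a template, \g<0> stands for the whole match.
print(re.sub(garbage, r"<\g<0>>", sample))       # blah <#@%> blah
# As literal text, the same characters are inserted unchanged.
print(sub_literal(garbage, r"<\g<0>>", sample))  # blah <\g<0>> blah
```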

phranque

10:24 pm on Jul 12, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



reminds me of many discussions in the apache forum about the usage of the ambiguous, greedy and promiscuous .* [webmasterworld.com]

lucy24

1:06 am on Jul 13, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oh, criminy, phranque, of all possible threads did you have to link to the most annoying one? ;)

tangor

1:52 am on Jul 13, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hence my constant testing of anything I do with RegEx. What I "know" would fit in a gnat's thimble!

iamlost

3:28 am on Jul 13, 2019 (gmt 0)

What I "know" would fit in a gnat's thimble!

You're singing my song! :)
In all my years of programming and database work the one tool I've never truly got my mind around is RegEx. So when I do need to use it it takes sooo much longer just to think I have it sort of maybe close to correct and then I spend sooo long testing and tweaking while mumbling: it's worth it, yes it's worth it my precious...
Note: for future reference, each expression (in staging server code, not production) carries detailed documentation of what each little bit, as well as the whole thing, actually does (or is supposed to do)... it has saved me so much agony and time over the years.
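
One way to keep that kind of bit-by-bit documentation right next to the expression is Python's re.VERBOSE flag, which ignores whitespace inside the pattern and allows # comments. A small sketch; the pattern (for timestamps in the style used on this page) and its group names are invented for illustration.

```python
import re

TIMESTAMP = re.compile(
    r"""
    (?P<hour>\d{1,2})          # hour: one or two digits
    :
    (?P<minute>\d{2})          # minutes: always two digits
    \s+
    (?P<ampm>am|pm)            # 12-hour marker
    \s+ on \s+                 # the literal word "on" between spaces
    (?P<month>[A-Z][a-z]{2})   # three-letter month, e.g. Jul
    \s+
    (?P<day>\d{1,2})           # day of month
    , \s+
    (?P<year>\d{4})            # four-digit year
    """,
    re.VERBOSE,
)

match = TIMESTAMP.search("7:10 pm on Jul 12, 2019 (gmt 0)")
print(match.groupdict())
```

The comments travel with the pattern, so the explanation cannot drift away from the expression it describes.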

I am totally in awe of people (bows to lucy24) who say offhand things such as "enjoy the details" and "happily come back to it" about RegEx.

tangor

3:38 am on Jul 13, 2019 (gmt 0)

lucy24, wilderness, many many others ... and thanks for the help over the years!

not2easy

4:37 am on Jul 13, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The problem I have with RegEx is a basic one - I love it for mass editing of files in either TextWrangler or BBEdit but I understand that there are various flavors (engines) for RegEx and that makes me hesitant to play with it freely - I know I have limitations. There are Perl-compatible, PHP, Python and JavaScript engines (and others). My understanding of the differences between RegEx engines would fit in a gnat's thimble (thank you tangor) and I need to double check or triple check the Apache site before I'll edit htaccess entries. That's a pitfall of picking up bits and pieces of information here and there over the years. I do not even know what engine I'm working with when using RegEx to mass edit my files but it is comfortable, reliable and lightning fast.

lucy24

4:51 am on Jul 13, 2019 (gmt 0)

For many years after I first met Regular Expressions--this would be around 2004, 2005, in the context of text editing only--I was deathly afraid of them. I think it took me about a year to work up the courage to use RegEx for replace rather than just find. And I maintain that this is a healthy and appropriate fear; you can seriously injure yourself with a carelessly applied Regular Expression.

tangor

5:24 am on Jul 13, 2019 (gmt 0)

you can seriously injure yourself with a carelessly applied Regular Expression.


As many instant 500s have proved to me over the years! (I ALWAYS keep the last good .htaccess handy!)

Whew!

Never used it for text editing ... have other tools I know much better that do the same thing.

graeme_p

11:04 am on Jul 13, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I cannot remember who said this:

some people when confronted with a problem think, "I will use a regular expression",
now they have two problems

not2easy

12:05 pm on Jul 13, 2019 (gmt 0)

you can seriously injure yourself with a carelessly applied Regular Expression.
Been there, done that. But closing the file without saving it gives me a redo. Since then I have made a habit of manually running find/replace on at least several matches (how many depends on the outlier parameters) before unleashing the "Find All". It certainly can be another case of be careful what you wish for.
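
For anyone doing this in a script rather than an editor, the same "look before you replace" habit can be sketched in Python: list each line that would change, with its before and after text, and only then run the real replace. The function name preview_replace is invented for illustration.

```python
import re


def preview_replace(pattern: str, replacement: str, text: str):
    """Return (line_number, before, after) for every line the replace would change."""
    regex = re.compile(pattern)
    changes = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        new_line = regex.sub(replacement, line)
        if new_line != line:
            changes.append((line_no, line, new_line))
    return changes


sample = "ftrong\nafter\nftreet"
for line_no, before, after in preview_replace(r"\bftr", "str", sample):
    print(f"{line_no}: {before!r} -> {after!r}")
# 1: 'ftrong' -> 'strong'
# 3: 'ftreet' -> 'street'
```

Nothing is modified here; the list is only for checking false positives before committing to the change.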

lucy24

4:30 pm on Jul 13, 2019 (gmt 0)

But closing the file without saving it gives me a redo.
Infinite Undo is your very, very, very good friend. Remember when it didn’t exist? :) Now, if only I could get the text editor to understand “please undo the third-to-last global replace while retaining the results of the ones after that”.

In SubEthaEdit you can do supervised global replaces--go through the whole document, but check each one as we go along, so I can skip the false positives--but I’ve yet to figure out how (or whether it’s even possible) to do it in TextWrangler.

not2easy

5:06 pm on Jul 13, 2019 (gmt 0)

There's no reason you can't do them one at a time in TextWrangler with "Find Next" or for a large document choose "Find All" to have a list of every line where "Search" occurs. I always try changes manually that way before I give it free run to replace with GREP. Another way would be to keep 2 copies (before/after) and then compare both files under Search > Compare to highlight each before/after change.

Depending on the project you could always remove all lines using Text > Process Lines Containing > for either a new file or clipboard copy to edit. Doesn't work for everything, no good for things that need to be in some specific order unless you're going back with Search > Compare to replace each instance.

lucy24

7:07 pm on Jul 13, 2019 (gmt 0)

"Find Next" or for a large document choose "Find All"
I never use TW for individual documents; my default text editor is an older version of SubEthaEdit. (Newer versions changed the HTML Preview format to a style I don't like.) After much poring over TW menus, I did find a “Replace & Find Next”

:: detour to System Preferences to set a keyboard shortcut to match the one in SEE ::

but it’s only available once you’re inside an individual document in the Find All list; it doesn’t move along to the next document. Is there a Highlight Changes / Find Next Change option? I can't find it.

not2easy

4:39 am on Jul 14, 2019 (gmt 0)

The "Find All" function is under Multi-File Search. TW's Find All pops open a results page showing each example by line number and listed by file name. You can either edit them in that list/window (manually, for limited edits) or rerun the command with Replace All if all lines show the example you wanted to edit. You'd need to select the folders you want to edit and you can filter for filename extension. It would depend on the type of editing whether that would be useful or not.

ETA- Rereading I see that you might have been talking about running the replace after Find All in that list window. I only use Find All to get a list. Sometimes - such as when only certain listed files should be edited - I move the list results that I want edited to a temp folder and then run the GREP.

lucy24

7:14 am on Jul 14, 2019 (gmt 0)

TW's Find All
There just doesn't seem to be any middle ground between unsupervised global replace, and working on one file at a time--whether that one file is individually open in a window of its own, or selected from the list in the Find All window doesn't really matter. And if there's a way to refresh the Find All window (show how many matches are left after I've done whatever I wanted to do manually), I've yet to find it :(

File type has never been a problem; I say Text Files Only and that's what it gives me. I don't know what extensions TW actually looks at besides .txt and .log; I just know it doesn't do anything undesirable. (It is VERY alarming when I inadvertently open an image file in SubEthaEdit, or a text file in GraphicConverter, both of which are technically possible!)

Most of the time I only use the multi-file search to find things in saved logs, where there's the option of including compressed files or not (older logs vs. only the recent ones). But right now I happen to be working on a 24-volume publication where, if I find something to replace in one file, it is very likely to need replacing in the other 23 as well. For example, FineReader doesn't approve of æ, so that's a global unsupervised ae >> æ. And it doesn't recognize long s, so that was eleven volumes of things like \bftr >> str because even though its Latin is quite good, there are obviously limits.
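
Those two replacements can be sketched in Python as an ordered list of rules, with re.subn reporting how many hits each rule made so an unexpectedly large count stands out. This is only an illustration: the names RULES and apply_rules are invented, the sample sentence is made up, and a blanket ae >> æ is the rule for that particular publication, not a safe default for other text.

```python
import re

# (pattern, replacement) pairs, applied in order.
RULES = [
    (r"ae", "æ"),        # unsupervised ae >> æ, as described above
    (r"\bftr", "str"),   # long s misread as f at the start of a word
]


def apply_rules(text: str, rules=RULES):
    """Apply each rule in turn; return the new text and a hit count per rule."""
    counts = []
    for pattern, replacement in rules:
        text, hits = re.subn(pattern, replacement, text)
        counts.append(hits)
    return text, counts


fixed, counts = apply_rules("Caesar ftruck the ftreet after")
print(fixed)   # Cæsar struck the street after
print(counts)  # [1, 2]
```

The \b keeps the second rule away from words like "after", where "ft" sits in the middle of the word.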

But we digress.