Hacky hacky hack hack
Diary
By hulver (Fri Aug 29, 2008 at 05:29:25 AM EST)
Perl Regexp help required within.


Yesterday I found a bug in the Scoop HTML parsing code. It's a strange bug, and it could possibly be exploited to post some naughty HTML, so I'm interested in fixing it.

Plus it's a pain in the bum.

If you've got an HTML tag like this:

<img alt="" src="/images/a.jpg" />

The HTML validation will fail. The parser falls over on attributes with empty values.

        while ($rest =~ /\s*(?:(\S+?)\s*=\s*(?:"+([^"]+)"+|'+([^']+)'+|([^'"\s]+)\S*)|(\S+)(?!=))\s*/g) {
                my $k = $1 || $5; # because of the way parenthesis are used in the
                my $v = $2 || $3 || $4; # regexp, these can be in a couple different
                $args{lc $k} = $v;  # places. it might be fixable, but it's no big deal
        }
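
To see what the loop actually makes of that tag, here is a minimal stand-alone sketch. It assumes $rest holds the attribute text that follows the tag name (the real Scoop caller may pass something slightly different), and the printing loop at the end is added purely for illustration.

```perl
use strict;
use warnings;

# Assumption: $rest is everything after the tag name in
# <img alt="" src="/images/a.jpg" />
my $rest = 'alt="" src="/images/a.jpg" /';
my %args;

# The loop exactly as it appears in the parser.
while ($rest =~ /\s*(?:(\S+?)\s*=\s*(?:"+([^"]+)"+|'+([^']+)'+|([^'"\s]+)\S*)|(\S+)(?!=))\s*/g) {
    my $k = $1 || $5;
    my $v = $2 || $3 || $4;
    $args{lc $k} = $v;
}

# Illustration only: dump what ended up in %args.
for my $k (sort keys %args) {
    my $v = defined $args{$k} ? "'$args{$k}'" : 'undef';
    print "$k => $v\n";
}
```

Traced by hand, the first match swallows alt="" src=" in one go, so alt ends up with the value ' src=' and no src key is ever stored. The rest of the tag is then picked up as bare words.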

Now somewhere in that horrible regexp, there is a + that needs changing to a *, but I don't fancy trying to fix it on the live server and I won't have time to test it on a test site until sometime over the weekend.

Can anybody spot the problem straight away?

Hacky hacky hack hack | 5 comments (5 topical, 0 hidden) | Trackback
How about... by Vulch (4.00 / 1) #1 Fri Aug 29, 2008 at 05:53:08 AM EST
There's a [^"]+ and a [^']+ which I have my suspicions about. I'd expect both of them to use * instead of +.

I'm not sure about the "+ and '+ before them either. Why would you have more than one opening inverted thingy?

Mind you, it is still pre-first tea at the moment...




Ah yes by hulver (4.00 / 1) #2 Fri Aug 29, 2008 at 05:55:17 AM EST
That looks like it might be the culprit. Thanks.

--
smart, pretty, sane. pick two - georgeha

hmm by herbert (4.00 / 1) #3 Fri Aug 29, 2008 at 06:07:26 AM EST

/\s*(?:(\S+?)\s*=\s*(?:"+([^"]+)"+|'+([^']+)'+|([^'"\s]+)\S*)|(\S+)(?!=))\s*/
                        3     1  3  3     2  3

I'm moderately confident that 1 and 2 should become * but you probably want to test it.

But I'm confused about the 3s existing at all, because they mean that e.g. foo="""""bar""" gets accepted and treated as foo="bar".
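
For anyone who wants to try the suggestions offline, here is a rough sketch that runs two variants of the regexp over the problem tag. The parse_attrs helper, the variable names and the assumption that $rest is the text after the tag name are all made up for this test; none of it is Scoop code. The helper also picks captures with defined rather than ||, because || would turn a matched empty string (or "0") back into undef.

```perl
use strict;
use warnings;

# Hypothetical helper: apply one regexp variant to an attribute string.
sub parse_attrs {
    my ($re, $rest) = @_;
    my %args;
    while ($rest =~ /$re/g) {
        # Use defined() so an empty value '' is kept instead of discarded.
        my $k = defined $1 ? $1 : $5;
        my $v = defined $2 ? $2 : defined $3 ? $3 : $4;
        $args{lc $k} = $v;
    }
    return \%args;
}

# Variant A: only the 1 and 2 positions changed from + to *.
my $star_only = qr/\s*(?:(\S+?)\s*=\s*(?:"+([^"]*)"+|'+([^']*)'+|([^'"\s]+)\S*)|(\S+)(?!=))\s*/;

# Variant B: 1 and 2 changed to *, and the 3s dropped (exactly one quote each side).
my $single = qr/\s*(?:(\S+?)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^'"\s]+)\S*)|(\S+)(?!=))\s*/;

# Assumption: the text after the tag name in <img alt="" src="/images/a.jpg" />
my $rest = 'alt="" src="/images/a.jpg" /';

for my $case (['star only', $star_only], ['single quotes', $single]) {
    my $args = parse_attrs($case->[1], $rest);
    print "$case->[0]:\n";
    for my $k (sort keys %$args) {
        my $v = $args->{$k};
        printf "  %s => %s\n", $k, defined $v ? "'$v'" : 'undef';
    }
}
```

Working through it by hand, variant A still goes wrong on this tag: the greedy "+ eats both quotes of alt="", the value group then runs on through ' src=', and the closing "+ lands on the quote that opens the src value. Variant B gives alt an empty value and src the value /images/a.jpg, with the trailing / left over as a bare key. So the 3s look like part of this bug rather than a separate oddity, but it still wants checking against the real validation code.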





dammit too late by herbert (4.00 / 1) #4 Fri Aug 29, 2008 at 06:08:22 AM EST
I agree with Vulch.


Better presented by Vulch (4.00 / 1) #5 Fri Aug 29, 2008 at 06:32:51 AM EST
...though.

