I'm learning regex now, and I'm quite stumped. I'd ask the instructor, but he said yesterday that he's no good with them either. I've made a test page to get the hang of it, but its output isn't what it should be, according to how I read the code.
Code:
$string = "Best niet [b] bevet Moddervet Niet vet";
Code:
Best niet [b] bevet Moddervet Niet vet |
I've worked a bit with regular expressions, but not in PHP.
Can you first explain what you'd like the program to do? |
I would like the program to tell me how many times it encounters [*b*] and how many times it encounters [/*b*] (without the *s) in a certain string. It should then add as many [/*b*]s as needed if there are more [*b*]s than [/*b*]s.
In the end, it should turn all the [*b*][/*b*] tags into HTML's <b> tags, but it should close as many as are opened, or it'd mess up my input. Eventually I'd like to do that with hyperlinks too, which is what makes it so complicated. |
This is PHP? :blink:
|
Yep, regular expressions within PHP. They're one of the biggest brain-wreckers out there, but sometimes a necessity. For more basic information, you can always look here. The site explains what they are, but it won't help me solve my problem. preg_match_all() can, though. I'm going to try that one out now.
|
Uhm, you misunderstood what is placed in $resultaat.
$resultaat[0] holds a copy of the total matched string, $resultaat[1] holds the match for the first () sequence, and $resultaat[2] holds the match for the second () sequence. So if you match for [*b*], you will find [*b*] in 0 (as it's the full match) and again in 1 (as it's the first set of () that is matched). As you don't match for a second pair of (), you won't find any more matches. The count idea you are trying to implement will not work this way. |
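To make the layout of the match array concrete, here is a minimal sketch using preg_match(). The thread's code used the ereg family, which is no longer available in PHP 7 and later, so the PCRE equivalent is swapped in here; the input string and patterns are assumptions chosen for illustration.

```php
<?php
// Hypothetical input; $resultaat follows the variable name used in the thread.
$string = "Foo [b]bar[/b]";

// One pair of parentheses, so exactly one capture group.
preg_match('/(\[b\])/i', $string, $resultaat);

// $resultaat[0] is the full match, $resultaat[1] the first () group.
// The group wraps the whole pattern here, so both entries hold "[b]".
print_r($resultaat);

// With three groups, indexes 1 to 3 are filled in as well.
preg_match('/(\[b\])(.*?)(\[\/b\])/i', $string, $parts);
// $parts[0] = "[b]bar[/b]", $parts[1] = "[b]", $parts[2] = "bar", $parts[3] = "[/b]"
print_r($parts);
```

The array only ever describes one match, which is why its length says nothing about how often the pattern occurs in the string.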
var_dump() gives wrong results too.
Code:
$string = "kakakapipikaka";
Code:
Test: array(2) { [0]=> string(2) "ka" [1]=> string(2) "ka" }
A different solution works now, though. $number = preg_match_all($reg_ex, $string, $resultaat) returns 5 :) |
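For reference, a minimal sketch of counting with preg_match_all(), which returns the number of full-pattern matches. The pattern '/ka/' is an assumption (the thread does not show $reg_ex), and count_tags() is a hypothetical helper, not something from the thread.

```php
<?php
$string = "kakakapipikaka";

// preg_match_all() returns how many times the full pattern matched.
$number = preg_match_all('/ka/', $string, $resultaat);
echo $number, "\n"; // 5

// Hypothetical helper: count opening and closing BBCode tags,
// case-insensitively, in a piece of user input.
function count_tags(string $input, string $tag): array
{
    $quoted = preg_quote($tag, '/');
    $open  = preg_match_all('/\[' . $quoted . '\]/i', $input);
    $close = preg_match_all('/\[\/' . $quoted . '\]/i', $input);
    return ['open' => $open, 'close' => $close];
}

print_r(count_tags("Foo [b]bar [B]baz[/b]", "b")); // open: 2, close: 1
```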
Well, the example you posted using var_dump() gives the correct output for ereg(i). You misunderstand what eregi() matches: it only matches the string "ka" once, and as that is the total match as well, you see it in both 0 and 1. |
So eregi() stops the moment it finds a match. That's pretty useful, but not for what I'm trying to do :D Thanks for the clarification :cheers:
|
Well, it can match more often if you specify that in the regex string, but it's not suitable for counting. preg_match_all() seems a better choice for that. |
Quick side question... is it possible to write a function in PHP with an optional argument? A function that'd appear in the manual as function_name(int one, int two, [int three])
|
What are you trying to do?
Code:
Replace [b] with <b>? |
Or str_replace(), but it wouldn't make sure each [*b*] has a [/*b*], which is another thing that needs to be checked and fixed.
|
uhm
Code:
function blah($arg1, $arg2 = 10) {
    // $arg2 falls back to 10 when the caller leaves it out
}
Both blah(10) and blah(10, 20) are valid calls. |
Then in the regular expression, you have to check for this:
'(\[b\])?(\[\/b\])' or something like that. So you're looking for a regular expression that matches both the opening and the closing tag on one line, and then you use a replace function to fix it. I'm not sure if ? will work for all characters in between; data might be able to tell you that (or just check the reference on the page you posted above). When you are done, check for tags that have no matching opening/closing tag. |
Quote:
Code:
/\[b\](.*?)\[\/b\]/ig
If you're trying to write a forgiving BBCode parser and aren't just learning regexp then you're going about things all wrong. Regular expressions are for matching patterns, not for constructing push-down automata. Keep in mind that there's nesting and nesting requirements which regular expressions just can't handle well (it's possible with things like look-ahead and look-behind matching, but it's not pretty, it's not fast, and it's not reliable). |
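As a rough sketch of how the pattern above could be used from PHP: preg_replace() already replaces every occurrence, and PHP's PCRE functions do not accept a "g" modifier, so only "i" is kept here. The input strings are assumptions for illustration.

```php
<?php
// Hypothetical input with two well-formed pairs in mixed case.
$input = "Foo [b]bar[/b] and [B]baz[/B]";

// (.*?) is non-greedy, so each [b] pairs with the nearest following [/b].
$html = preg_replace('/\[b\](.*?)\[\/b\]/i', '<b>$1</b>', $input);
echo $html, "\n"; // Foo <b>bar</b> and <b>baz</b>

// An unclosed tag does not match the pattern and is left untouched.
$unclosed = preg_replace('/\[b\](.*?)\[\/b\]/i', '<b>$1</b>', "Foo [b]bar");
echo $unclosed, "\n"; // Foo [b]bar
```

This only handles input that is already balanced, which is the limitation the post goes on to describe.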
How do you suggest I go 'bout it then?
|
Quote (plix):
Regular expressions are for matching patterns, not for constructing push-down automata.
Use a push-down automaton (a finite state machine doesn't include a stack, which is necessary to do open- and close-tag matching). It's a bit harder to implement if you haven't written one before, but it's not only easier to maintain, it's much more flexible. |
Hmmm... I've never heard of a push-down automaton. I'll see if I can find a tutorial 'bout it :ok:
|
Me neither. There's an excellent tutorial on finite state automata on this website, though. And check out the Wikipedia links on pushdown automata. Might get you further along the way. It seems a bit over the top for a simple pattern matching operation, though, or am I being naive here?
|
It's a bit more than pattern matching. It's case-insensitive pattern replacing that also fills in the missing parts of a pattern. Let me explain it more concretely, with the example of a guestbook.
In the guestbook, users can use BBCode (the []-tags you can use on forums as well). It's quickly done with str_replace(), which replaces all BBCode tags with their corresponding HTML in a given string (the user's input, in this case). Now take this case:
Code:
User_one: Foo [b]bar
What the guestbook should do instead is check whether there are fewer closing tags than opening tags for each tag, then add the missing closing tags at the end of the input, so that it'll look like this (in a case of bad user input):
Code:
User_one: Foo [b]bar [b]Foo[b]bar[/b][/b][/b]
If that works, it'll be good, but it still won't catch all cases of input. It'd miss this, for example:
Code:
User_one: Foo[B]bar [b]Foo[b]bar[/b][/b]
That's what the problems and purposes are. It's more than just pattern matching, due to the possible adding and the definite replacing. Regular expressions seem to produce horrible code for this. That or I didn't code it well :angel: |
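The count-and-append idea described in this post might look roughly like the sketch below. close_and_convert() is a hypothetical helper, and the tag is assumed to map to an HTML element of the same name; str_ireplace() is used so that [B] is treated like [b].

```php
<?php
// Hypothetical helper: append missing [/tag] closers, then convert to HTML.
function close_and_convert(string $input, string $tag = 'b'): string
{
    $quoted = preg_quote($tag, '/');
    $open  = preg_match_all('/\[' . $quoted . '\]/i', $input);
    $close = preg_match_all('/\[\/' . $quoted . '\]/i', $input);

    // Add one closer for every opener that has none.
    if ($open > $close) {
        $input .= str_repeat('[/' . $tag . ']', $open - $close);
    }

    // str_ireplace() is the case-insensitive variant of str_replace().
    return str_ireplace(
        ['[' . $tag . ']', '[/' . $tag . ']'],
        ['<' . $tag . '>', '</' . $tag . '>'],
        $input
    );
}

echo close_and_convert("Foo [b]bar"), "\n"; // Foo <b>bar</b>
echo close_and_convert("Foo[B]bar [b]Foo[b]bar[/b][/b]"), "\n";
// Foo<b>bar <b>Foo<b>bar</b></b></b>
```

A count alone cannot tell whether the closers come in a sensible order (for example a stray [/b] before any [b]), and it knows nothing about how different tags nest inside each other.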
Quote:
About a year ago I wrote an extremely forgiving, correcting (X)HTML parser which validates against an arbitrary DTD (for custom tag support) and supports callback filters for custom rendering of elements. It's *way* more complex than what you need to do basic BBCode parsing and sanitization, but it's based on the exact same idea. However, since my implementation was correcting and supported filters, it required the development of a full parse tree, which you shouldn't need. Unless you want to do complex transformations, you can probably get away with doing things in-place.
Note: FSMs and PDAs are not the same thing, only similar. The stack is absolutely crucial for an HTML or BBCode parser, which is why an FSM is not appropriate. |
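A very small stack-based converter in the spirit of this advice could be sketched as follows. It is not the parser described in the post: bbcode_to_html() and its $allowed list are hypothetical, tags are assumed to map to HTML elements of the same name, and attributes such as hyperlink targets are not handled.

```php
<?php
// Hypothetical sketch of a stack-based BBCode converter.
// Tags not listed in $allowed are left as literal text.
function bbcode_to_html(string $input, array $allowed = ['b', 'i', 'u']): string
{
    // Split into text and tag tokens, keeping the tags themselves.
    $tokens = preg_split('/(\[\/?[a-z]+\])/i', $input, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $stack = []; // names of currently open tags, innermost last
    $out = '';

    foreach ($tokens as $token) {
        if (preg_match('/^\[(\/?)([a-z]+)\]$/i', $token, $m)
            && in_array(strtolower($m[2]), $allowed, true)) {
            $name = strtolower($m[2]);
            if ($m[1] === '') {
                // Opening tag: emit it and push it on the stack.
                $stack[] = $name;
                $out .= "<$name>";
            } elseif (in_array($name, $stack, true)) {
                // Closing tag: pop and close inner tags until it matches.
                do {
                    $top = array_pop($stack);
                    $out .= "</$top>";
                } while ($top !== $name);
            }
            // A closer with no matching opener is dropped.
        } else {
            // Plain text (or an unknown tag): escape it for HTML output.
            $out .= htmlspecialchars($token);
        }
    }

    // Close whatever is still open, innermost first.
    while ($stack) {
        $out .= '</' . array_pop($stack) . '>';
    }
    return $out;
}

echo bbcode_to_html("Foo [b]bar"), "\n";    // Foo <b>bar</b>
echo bbcode_to_html("[b]x[i]y[/b]z"), "\n"; // <b>x<i>y</i></b>z
```

The stack is what lets the closer for [b] also close the [i] that was opened inside it, which a plain count per tag cannot do.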
Quote:
Another major problem with using regexps for this is that running them across multiple lines can be a real pain (it's possible, but it complicates things). |
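One concrete instance of the multi-line issue, as a small sketch: in PCRE, "." does not match a newline unless the "s" modifier is given. The input string is an assumption for illustration.

```php
<?php
// Hypothetical input where the bold text spans a line break.
$input = "[b]first line\nsecond line[/b]";

// Without "s", "." stops at the newline, so the pair is never matched.
$plain = preg_replace('/\[b\](.*?)\[\/b\]/i', '<b>$1</b>', $input);

// With "s" (dot-all), "." also matches newlines and the pair is found.
$dotall = preg_replace('/\[b\](.*?)\[\/b\]/is', '<b>$1</b>', $input);

var_dump($plain === $input); // bool(true): nothing was replaced
echo $dotall, "\n";
```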