Given a string such as:
<code>str1 = '(subject_id = "A" OR (status_id = "Open" AND (status_id = "C" OR level_id = "D")))' </code>
or
<code>str2 = '(subject_id = "A" OR subject_id = "Food" OR (subject_id = "C" OR (status_id = "Open" AND (status_id = "C" OR (level_id = "D" AND subject_id = "(Cat)")))))' </code>
I need a regular expression that matches the innermost pair of brackets and their contents, ignoring any brackets that appear inside quotation marks. That is:
<code>str1 => (status_id = "C" OR level_id = "D")
str2 => (level_id = "D" AND subject_id = "(Cat)")</code>
How should such a complex regular expression be written?
If it cannot be done with a regular expression, how can it be done in JS?
Update: for str1, I found a regular expression that works:
<code>\([^()]+\) </code>
But for str2 I still have no solution. I look forward to your answers!
For str2, I found this one:
<code>\([^()]*\"[^"]*\"[^()]*\)</code>
After reading the requirements, I didn't consider a regular expression at all; it seemed too complicated. Let's use the traditional approach instead.
You can borrow the idea of operator precedence, that is, use a stack to find the contents of the innermost brackets.
Key points:
Match the innermost brackets.
Brackets inside quotation marks do not count as brackets.
The algorithm, based on this idea:
It computes the startIndex and endIndex of the target substring and then extracts it with the substring() method.
Whenever a "(" is encountered, its index is pushed onto the stack. When the first ")" is encountered, the top index is popped off the stack; the substring between that index and the index of the ")" is the target string.
When a '"' is encountered, matching of "(" and ")" is suspended and does not resume until the next '"' is found.
This is an algorithm I came up with by brainstorming. If it has any shortcomings, please feel free to add to it.
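The steps above could be sketched in JavaScript roughly as follows. This is only one reading of the description, not the answerer's own code: the function name `innermostGroup` is made up for illustration, and it assumes quotation marks are always balanced and never escaped.

```javascript
// Hypothetical helper: returns the first innermost "( ... )" group,
// ignoring brackets that sit inside double-quoted text.
// Returns null when no such group exists.
function innermostGroup(str) {
  const stack = [];     // indices of "(" that are still open
  let inQuotes = false; // true while scanning between two double quotes

  for (let i = 0; i < str.length; i++) {
    const ch = str[i];
    if (ch === '"') {
      inQuotes = !inQuotes;             // suspend or resume bracket matching
    } else if (!inQuotes && ch === '(') {
      stack.push(i);                    // remember where this group starts
    } else if (!inQuotes && ch === ')') {
      if (stack.length === 0) continue; // stray ")": ignore it
      const startIndex = stack.pop();
      return str.substring(startIndex, i + 1);
    }
  }
  return null;
}

const str1 = '(subject_id = "A" OR (status_id = "Open" AND (status_id = "C" OR level_id = "D")))';
const str2 = '(subject_id = "A" OR subject_id = "Food" OR (subject_id = "C" OR (status_id = "Open" AND (status_id = "C" OR (level_id = "D" AND subject_id = "(Cat)")))))';

console.log(innermostGroup(str1)); // (status_id = "C" OR level_id = "D")
console.log(innermostGroup(str2)); // (level_id = "D" AND subject_id = "(Cat)")
```

The first unquoted ")" always closes the most recently opened "(", so the group it closes cannot contain another unquoted group. If the string has several innermost groups, this sketch returns only the first one.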
Try it this way: /(([^()]*?"[^"()]*([^"()]+)[^()]*?"[^() ]*)+)|([^()]+)/
Analyze the requirements > find a solution for each requirement > combine the solutions = problem solved.
1. We need to match the form ( a )
2. There are two possibilities for the characters contained in a, denoted a1 and a2
2.1 a1 contains one or more strings of the form b " c " b,
2.1.1 where b is a string that does not include ", ( or )
2.1.2 where c is a string that does not include "
2.2 a2 does not contain ( or )
2.2 => a2 = [^()]*
2.1.1 => b = [^()"]*
2.1.2 => c = [^"]*
2.1 => a1 = (b"c"b)+ = (b"c")+b = ([^()"]*"[^"]*")+[^()"]*
1 => (a) = (a1)|(a2) = (([^()"]*"[^"]*")+[^()"]*)|([^()]*)
<code>/\(([^\(\)\"]*\"[^\"]*\")+[^\(\)\"]*\)|\([^\(\)]*\)/</code>
<code class="javascript">var reg = /\(([^\(\)\"]*\"[^\"]*\")+[^\(\)\"]*\)|\([^\(\)]*\)/;

'(the (quick "brown" fox "jumps over, (the) lazy" dog ))'.match(reg)[0];
// "(quick "brown" fox "jumps over, (the) lazy" dog )"

'(the ("(quick)" brown fox "jumps (over, the)" lazy) dog )'.match(reg)[0];
// "("(quick)" brown fox "jumps (over, the)" lazy)"

'(the (quick brown fox (jumps "over", ((the) "lazy"))) dog )'.match(reg)[0];
// "(the)"</code>
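To make the derivation easier to follow, the same pattern can be assembled from its named pieces and run against the two strings from the question. This is only an illustrative sketch: the variable names mirror the letters used in the derivation, and a non-capturing group `(?:...)` is used in place of the capturing group, which does not change what is matched.

```javascript
// Building blocks, named as in the derivation.
const b = '[^()"]*';                  // text with no quotes and no brackets
const c = '[^"]*';                    // text between a pair of quotes
const a1 = `(?:${b}"${c}")+${b}`;     // one or more quoted pieces, then b
const a2 = '[^()]*';                  // no brackets at all

// ( a ) = ( a1 ) | ( a2 ), with the outer brackets escaped.
const innermost = new RegExp(`\\(${a1}\\)|\\(${a2}\\)`);

const str1 = '(subject_id = "A" OR (status_id = "Open" AND (status_id = "C" OR level_id = "D")))';
const str2 = '(subject_id = "A" OR subject_id = "Food" OR (subject_id = "C" OR (status_id = "Open" AND (status_id = "C" OR (level_id = "D" AND subject_id = "(Cat)")))))';

console.log(str1.match(innermost)[0]); // (status_id = "C" OR level_id = "D")
console.log(str2.match(innermost)[0]); // (level_id = "D" AND subject_id = "(Cat)")
```

Because b cannot contain a quotation mark, quotes are always consumed in pairs starting from the opening bracket, which is what lets the brackets inside "(Cat)" be skipped.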
Then adapt it like this:
<code>substr = str.match(/\([^()]+\)/g)[0] </code>
This gets the innermost bracket group and its contents. Next, check whether the character immediately before the match is " and whether the character immediately after it is ":
<code>index = str.indexOf(substr)
length = substr.length
str.substr(index + length, 1) // character right after the match
str.substr(index - 1, 1)      // character right before the match</code>
If the quotation marks are not there, substr is the required answer. If they are, first replace substr in str with a placeholder, then match again, and finally put substr back:
<code>str.replace(substr, "&&&")
str.replace(substr, "&&&").match(/\([^()]+\)/g)[0]
str.replace(substr, "&&&").match(/\([^()]+\)/g)[0].replace("&&&", substr)</code>
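Put together, the placeholder idea might look like the sketch below. It goes slightly beyond the snippets above by looping, so that more than one quoted group can be hidden. The function name `innermostSkippingQuoted` and the numbered `&&&n&&&` placeholders are assumptions made for illustration, and the input is assumed not to contain such placeholder text already.

```javascript
// Hypothetical helper: first innermost group that is not wrapped in quotes.
function innermostSkippingQuoted(str) {
  const simple = /\([^()]+\)/;
  const hidden = []; // quoted groups that have been swapped out so far
  let work = str;

  for (;;) {
    const m = work.match(simple);
    if (m === null) return null;

    const end = m.index + m[0].length;
    const quoted = work[m.index - 1] === '"' && work[end] === '"';

    if (!quoted) {
      // Put the hidden groups back, newest first, so that a group
      // hidden inside another hidden group is restored as well.
      let result = m[0];
      for (let i = hidden.length - 1; i >= 0; i--) {
        result = result.split('&&&' + i + '&&&').join(hidden[i]);
      }
      return result;
    }

    // The group sits directly inside quotes: hide it and search again.
    hidden.push(m[0]);
    work = work.slice(0, m.index) + '&&&' + (hidden.length - 1) + '&&&' + work.slice(end);
  }
}

const str2 = '(subject_id = "A" OR subject_id = "Food" OR (subject_id = "C" OR (status_id = "Open" AND (status_id = "C" OR (level_id = "D" AND subject_id = "(Cat)")))))';
console.log(innermostSkippingQuoted(str2)); // (level_id = "D" AND subject_id = "(Cat)")
```

Note that this only recognizes a quoted group when the quotation marks touch the brackets directly, as in "(Cat)"; a value such as "a (b) c" would not be detected by this check.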
The difficulty of this problem is that the "" pairs need to be counted recursively, for example:
<code>(level_id = "D AND subject_id = "(Cat)"")</code>
Here (Cat) meets the requirements.
<code>\([^()]*?\"((?:[^\"\"]|\"(?1)\")*+)\"[^()]*?\)|\([^()]*?\) </code>
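Cherish life, stay away from regex. This regex meets your requirements. It works in PHP (PHP supports recursion), but it cannot be used in Java or Python.
One suggested approach: find the index of ( and process the string by slicing it.
I can't post a regex from my phone, sigh.
In the original poster's [^()], matching only continues while the character is not ( or ).
Remove the condition that excludes (, and change the greedy + to *?, and it will work.
Code: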
console.log('(subject_id = “A” OR (status_id = “Open” AND (status_id = “C” OR level_id = “D”)))'.match(/(1*)/))
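Hope this helps.
Matching this with a regex alone gets rather complicated. I suggest replacing the interfering strings "( and )" with something else, such as "[ and ]", then applying a simple regex, and swapping them back afterwards.
A regex implemented in Python looks like this: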
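The replace-then-restore suggestion could be sketched in JavaScript as below. This is a minimal illustration, not the answerer's code: the name `innermostBySwapping` is hypothetical, and it assumes the input never contains "[ or ]" of its own and that brackets inside quotes always touch the quotation marks, as in "(Cat)".

```javascript
// Hypothetical helper: temporarily turn "( ... )" into "[ ... ]",
// run the simple innermost-bracket regex, then swap the brackets back.
function innermostBySwapping(str) {
  // Step 1: neutralize brackets that sit directly inside quotes.
  const swapped = str.replace(/"\(/g, '"[').replace(/\)"/g, ']"');

  // Step 2: the simple regex from the question now works.
  const m = swapped.match(/\([^()]+\)/);
  if (m === null) return null;

  // Step 3: restore the original brackets in the matched text.
  return m[0].replace(/"\[/g, '"(').replace(/\]"/g, ')"');
}

const str1 = '(subject_id = "A" OR (status_id = "Open" AND (status_id = "C" OR level_id = "D")))';
const str2 = '(subject_id = "A" OR subject_id = "Food" OR (subject_id = "C" OR (status_id = "Open" AND (status_id = "C" OR (level_id = "D" AND subject_id = "(Cat)")))))';

console.log(innermostBySwapping(str1)); // (status_id = "C" OR level_id = "D")
console.log(innermostBySwapping(str2)); // (level_id = "D" AND subject_id = "(Cat)")
```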
<code>import re

str1 = '(subject_id = "A" OR (status_id = "Open" AND (status_id = "C" OR level_id = "D")))'
str2 = '(subject_id = "A" OR subject_id = "Food" OR (subject_id = "C" OR (status_id = "Open" AND (status_id = "C" OR (level_id = "D" AND subject_id = "(Cat)")))))'

# re.X (verbose mode) ignores whitespace inside the pattern
pat = re.compile(r"""(?<=[^"])
                     \([^()]+?
                     ("\(.+?\)")*
                     \)
                     (?=[^"])""", re.X)

print(pat.search(str1).group(0))
print(pat.search(str2).group(0))</code>
The output is:
<code>(status_id = "C" OR level_id = "D")
(level_id = "D" AND subject_id = "(Cat)")</code>