javascript - Regular expression to match the content of the innermost bracket

WBOY
Release: 2016-08-04 09:19:46

Now there is a string:

<code>str1 = '(subject_id = "A" OR (status_id = "Open" AND (status_id = "C" OR level_id = "D")))'
</code>

or

<code>str2 = '(subject_id = "A" OR subject_id = "Food" OR (subject_id = "C" OR (status_id = "Open" AND (status_id = "C" OR (level_id = "D" AND subject_id = "(Cat)")))))'
</code>

I need a regular expression that matches the innermost pair of brackets in the string together with their contents (brackets inside quotation marks must not be treated as brackets), that is:

<code>str1 => (status_id = "C" OR level_id = "D")

str2 => (level_id = "D" AND subject_id = "(Cat)")
</code>

So how should this rather complex regular expression be written?

If it cannot be done with a regular expression, how can it be done in JS?


Update: for str1, I found a regular expression that produces the right match:

<code>\([^()]+\)
</code>

But I still have no solution for str2. I look forward to your answers!

Reply content:

For str2, I found this

<code>\([^()]*\"[^"]*\"[^()]*\)</code>
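As a quick sanity check, the pattern above can be run against both sample strings from the question. This is only a sketch: the variable names (`pattern`, `expected1`, `expected2`) are made up for illustration, and the expected values are the ones the question asks for.

```javascript
// Pattern from the answer above: it requires one quoted section between the brackets.
const pattern = /\([^()]*"[^"]*"[^()]*\)/;

const str1 = '(subject_id = "A" OR (status_id = "Open" AND (status_id = "C" OR level_id = "D")))';
const str2 = '(subject_id = "A" OR subject_id = "Food" OR (subject_id = "C" OR (status_id = "Open" AND (status_id = "C" OR (level_id = "D" AND subject_id = "(Cat)")))))';

// Results the question asks for.
const expected1 = '(status_id = "C" OR level_id = "D")';
const expected2 = '(level_id = "D" AND subject_id = "(Cat)")';

const got1 = str1.match(pattern);
const got2 = str2.match(pattern);

// Compare each result with the expected innermost group.
console.log('str2 ok:', got2 !== null && got2[0] === expected2);
console.log('str1 ok:', got1 !== null && got1[0] === expected1);
```

Note that `[^"]*` places no restriction on brackets, so the quoted part can pair a closing quote with the next opening quote and run across a `(`. The pattern returns the requested group for str2, but its result on str1 is worth checking before relying on it.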

After reading the requirements, I did not consider a regular expression at all; it seemed too complicated. Let's use the traditional approach instead.
You can borrow the idea of operator precedence, that is, use a stack data structure to extract the contents of the innermost brackets.
Key points:

  1. Match the innermost bracket

  2. Brackets inside quotation marks do not count for matching

Design the algorithm around these points:
The algorithm computes the startIndex and endIndex of the target substring and then uses the substring() method to extract it.

  • When a "(" is encountered, its index is pushed onto the stack. When the first ")" is encountered, the top index is popped off the stack; the substring between those two indices is the target string.

  • When a '"' is encountered, stop looking for "(" and ")" and resume only after the next '"' has been found.

This is an algorithm I came up with by brainstorming. If it has any shortcomings, please point them out.
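The steps above can be sketched in JavaScript roughly as follows. This is a minimal sketch, not the answerer's own code: the function name `innermostBracket` is hypothetical, and it assumes quotation marks are always plain `"` characters that come in pairs, with no escaping.

```javascript
// Returns the first bracket pair that closes outside quotation marks,
// which is an innermost pair, or null if there is none.
function innermostBracket(str) {
  const stack = [];      // indices of "(" seen outside quotes and not yet closed
  let inQuotes = false;  // true while scanning between a pair of '"'

  for (let i = 0; i < str.length; i++) {
    const ch = str[i];
    if (ch === '"') {
      inQuotes = !inQuotes; // entering or leaving a quoted section
      continue;
    }
    if (inQuotes) continue; // brackets inside quotes are ignored
    if (ch === '(') {
      stack.push(i);
    } else if (ch === ')') {
      if (stack.length === 0) return null; // unbalanced input
      const startIndex = stack.pop();
      return str.substring(startIndex, i + 1); // endIndex is exclusive
    }
  }
  return null;
}

const str1 = '(subject_id = "A" OR (status_id = "Open" AND (status_id = "C" OR level_id = "D")))';
const str2 = '(subject_id = "A" OR subject_id = "Food" OR (subject_id = "C" OR (status_id = "Open" AND (status_id = "C" OR (level_id = "D" AND subject_id = "(Cat)")))))';

console.log(innermostBracket(str1)); // (status_id = "C" OR level_id = "D")
console.log(innermostBracket(str2)); // (level_id = "D" AND subject_id = "(Cat)")
```

Because the scan stops at the first ")" found outside quotes, the popped "(" is always the most recently opened one, so the returned pair cannot contain another unquoted pair.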

// Try it this way:
/(([^()]*?"[^"()]*([^"()]+)[^()]*?"[^() ]*)+)|([^()]+)/


Update:

Analyze the requirements > find a solution for each requirement > combine the solutions = problem solved

Requirements analysis:

  1. The match must have the form ( a )

  2. There are two possibilities for the characters contained in a, denoted a1 and a2

    1. a1 contains one or more strings of the form b " c " b,

      1. where b is a string that does not include ", ( or )

      2. where c is a string that does not include "

    2. a2 does not contain ( or )

Working backwards:

2.2 => a2 = [^()]*
2.1.1 => b = [^()"]*
2.1.2 => c = [^"]*
2.1 => a1 = (b"c"b)+ = (b"c")+b = ([^()"]*"[^"]*")+[^()"]*
1 => (a) = (a1)|(a2) = (([^()"]*"[^"]*")+[^()"]*)|([^()]*)

Regular expression:

<code>/\(([^\(\)\"]*\"[^\"]*\")+[^\(\)\"]*\)|\([^\(\)]*\)/</code>

Verification:

<code class="javascript">var reg = /\(([^\(\)\"]*\"[^\"]*\")+[^\(\)\"]*\)|\([^\(\)]*\)/;

'(the (quick "brown" fox "jumps over, (the) lazy" dog ))'
    .match(reg)[0]
//"(quick "brown" fox "jumps over, (the) lazy" dog )"

'(the ("(quick)" brown fox "jumps (over, the)" lazy) dog )'
    .match(reg)[0];
//"("(quick)" brown fox "jumps (over, the)" lazy)"

'(the (quick brown fox (jumps "over", ((the) "lazy"))) dog )'
    .match(reg)[0];
//"(the)"</code>
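For completeness, here is the same pattern applied to the two strings from the question. This is a sketch: the escapes that are redundant inside a character class or before `"` have been dropped, which should not change what the pattern matches.

```javascript
// Same pattern as above, written without the redundant backslashes.
const reg = /\(([^()"]*"[^"]*")+[^()"]*\)|\([^()]*\)/;

const str1 = '(subject_id = "A" OR (status_id = "Open" AND (status_id = "C" OR level_id = "D")))';
const str2 = '(subject_id = "A" OR subject_id = "Food" OR (subject_id = "C" OR (status_id = "Open" AND (status_id = "C" OR (level_id = "D" AND subject_id = "(Cat)")))))';

// match() without the g flag returns the leftmost match; [0] is the full text.
const m1 = str1.match(reg)[0];
const m2 = str2.match(reg)[0];

console.log(m1); // (status_id = "C" OR level_id = "D")
console.log(m2); // (level_id = "D" AND subject_id = "(Cat)")
```

The first alternative pairs up quotation marks from the opening bracket onward, so a `(` or `)` inside a quoted section is consumed by `[^"]*` instead of ending the match.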

Then change it like this:

<code>substr=str.match(/\([^()]+\)/g)[0]
</code>

This gets the innermost bracket and its contents. Then check whether the character immediately after it is " and whether the character immediately before it is ":

<code>index=str.indexOf(str.match(/\([^()]+\)/g)[0])
length=str.match(/\([^()]+\)/g)[0].length
str.substr(index+length,1)
str.substr(index-1,1)
</code>

If neither character is a quotation mark, substr is the answer. If both are, first replace substr in str with a placeholder, match again, and finally put substr back in place of the placeholder:

<code>str.replace(substr,"&&&")
str.replace(substr,"&&&").match(/\([^()]+\)/g)[0]
str.replace(substr,"&&&").match(/\([^()]+\)/g)[0].replace("&&&",substr)
</code>
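Put together, the fragments above might look like the following function. This is a sketch under stated assumptions: the name `innermostViaPlaceholder` is hypothetical, the placeholder `&&&` is assumed not to occur in the input, and only one quoted bracket group is handled, as in the answer.

```javascript
// Finds the innermost bracket group; a group directly wrapped in quotes
// is masked with a placeholder and the search is repeated once.
function innermostViaPlaceholder(str) {
  const simple = /\([^()]+\)/; // innermost bracket, ignoring quotes
  const PLACEHOLDER = '&&&';   // assumed not to appear in str

  const first = str.match(simple);
  if (first === null) return null;

  const substr = first[0];
  const index = first.index;
  const before = str.charAt(index - 1);            // '' when index is 0
  const after = str.charAt(index + substr.length);

  // Not wrapped in quotation marks: this is the answer.
  if (before !== '"' || after !== '"') return substr;

  // Wrapped in quotes: mask it, match again, then restore the original text.
  // Function replacers avoid special handling of "$" in replacement strings.
  const masked = str.replace(substr, () => PLACEHOLDER);
  const second = masked.match(simple);
  if (second === null) return null;
  return second[0].replace(PLACEHOLDER, () => substr);
}

const str1 = '(subject_id = "A" OR (status_id = "Open" AND (status_id = "C" OR level_id = "D")))';
const str2 = '(subject_id = "A" OR subject_id = "Food" OR (subject_id = "C" OR (status_id = "Open" AND (status_id = "C" OR (level_id = "D" AND subject_id = "(Cat)")))))';

console.log(innermostViaPlaceholder(str1)); // (status_id = "C" OR level_id = "D")
console.log(innermostViaPlaceholder(str2)); // (level_id = "D" AND subject_id = "(Cat)")
```

For input with several quoted bracket groups, the masking step would need to be repeated in a loop with distinct placeholders; that extension is not shown here.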

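The hard part of this problem is that the quotation marks need to be counted recursively, for example:

<code>(level_id = "D AND subject_id = "(Cat)"")</code>

Here (Cat) meets the requirement.

<code>\([^()]*?\"((?:[^\"\"]|\"(?1)\")*+)\"[^()]*?\)|\([^()]*?\)
</code>

Cherish life and stay away from regular expressions. This pattern meets your requirements; it works in PHP (PHP supports recursion) but cannot be used in Java or Python.

Another suggestion: find the index of ( and process the string by slicing it.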

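I can't type the regex properly on my phone, sigh.
In the OP's [^()], matching continues as long as the character is not ( or ).
Drop the condition that excludes ( and change the greedy + to *?, and that is all.

console.log('(subject_id = “A” OR (status_id = “Open” AND (status_id = “C” OR level_id = “D”)))'.match(/(1*)/))
Hope this helps.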

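Doing this with a regular expression alone gets complicated. I suggest first replacing the interfering sequences "( and )" with something else, such as "[ and ]", then applying a simple regular expression, and finally swapping them back.

A regex implementation in Python looks like this: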

<code>import re

str1 = '(subject_id = "A" OR (status_id = "Open" AND (status_id = "C" OR level_id = "D")))'
str2 = '(subject_id = "A" OR subject_id = "Food" OR (subject_id = "C" OR (status_id = "Open" AND (status_id = "C" OR (level_id = "D" AND subject_id = "(Cat)")))))'

pat = re.compile(r"""(?<=[^"])
        \([^()]+?
        ("\(.+?\)")*
        \)
        (?=[^"])
        """, re.X)

print(pat.search(str1).group(0))
print(pat.search(str2).group(0))</code>

Output:

<code>(status_id = "C" OR level_id = "D")
(level_id = "D" AND subject_id = "(Cat)")
</code>
source:php.cn