Given a string:
<code>str1 = '(subject_id = "A" OR (status_id = "Open" AND (status_id = "C" OR level_id = "D")))' </code>
or
<code>str2 = '(subject_id = "A" OR subject_id = "Food" OR (subject_id = "C" OR (status_id = "Open" AND (status_id = "C" OR (level_id = "D" AND subject_id = "(Cat)")))))' </code>
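I need a regex that matches the innermost pair of parentheses and its content (parentheses inside quotes must not count), i.e.:
<code>str1 => (status_id = "C" OR level_id = "D")
str2 => (level_id = "D" AND subject_id = "(Cat)") </code>
So how should such a complicated regex be written?
If a regex cannot do it, how can it be done in JS?
Update: for str1, I found that this regex produces the right match:
<code>\([^()]+\) </code>
But for str2 I still have no solution. Looking forward to your answers!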
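To make the problem concrete, here is a minimal sketch that runs the simple pattern from the question against both inputs. The variable names are illustrative only; the strings and the pattern are the ones quoted above.

```javascript
const str1 = '(subject_id = "A" OR (status_id = "Open" AND (status_id = "C" OR level_id = "D")))';
const str2 = '(subject_id = "A" OR subject_id = "Food" OR (subject_id = "C" OR (status_id = "Open" AND (status_id = "C" OR (level_id = "D" AND subject_id = "(Cat)")))))';

// The pattern from the question: a "(" followed by non-parenthesis characters and a ")".
const naive = /\([^()]+\)/;

const naive1 = str1.match(naive)[0];
const naive2 = str2.match(naive)[0];

console.log(naive1); // (status_id = "C" OR level_id = "D")
console.log(naive2); // (Cat)   <- the quoted parentheses are picked up instead
```

The pattern has no notion of quotes, so on str2 the first parenthesis pair without nested parentheses is the quoted `(Cat)`.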
For str2, I found this one:
<code>\([^()]*\"[^"]*\"[^()]*\)</code>
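Looking at the requirement, I would not even try a regex; it seems too complicated. Let's go with the traditional approach.
Borrow the idea behind operator precedence: use a stack to get at the content of the inner parentheses.
Key points:
Match the innermost parentheses.
Content inside quotes does not count toward matching.
Designing the algorithm along those lines:
The algorithm computes the startIndex and endIndex of the target substring, then calls substring() to extract it.
When a "(" is encountered, push its index onto the stack; when the first ")" is encountered, pop. The substring between those two indices is the target.
When a "\"" is encountered, stop matching parentheses until the next "\"" is found, then resume.
This algorithm is off the top of my head; corrections are welcome.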
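The steps above can be sketched in JS roughly as follows. This is only one possible reading of the description: the function name `findInnermostGroup` is made up for illustration, and the sketch assumes quotes are always balanced and never escaped.

```javascript
// Returns the first parenthesised group that contains no other unquoted
// parentheses, or null if there is none. Assumes balanced, unescaped quotes.
function findInnermostGroup(str) {
  const stack = [];       // indices of "(" seen outside quotes and not yet closed
  let inQuotes = false;   // true while scanning between a pair of double quotes

  for (let i = 0; i < str.length; i++) {
    const ch = str[i];
    if (ch === '"') {
      inQuotes = !inQuotes;          // entering or leaving a quoted string
    } else if (!inQuotes && ch === '(') {
      stack.push(i);                 // remember where this group starts
    } else if (!inQuotes && ch === ')' && stack.length > 0) {
      const startIndex = stack.pop();
      return str.substring(startIndex, i + 1); // first ")" closes an innermost group
    }
  }
  return null;
}

const scan1 = findInnermostGroup('(subject_id = "A" OR (status_id = "Open" AND (status_id = "C" OR level_id = "D")))');
const scan2 = findInnermostGroup('(subject_id = "A" OR subject_id = "Food" OR (subject_id = "C" OR (status_id = "Open" AND (status_id = "C" OR (level_id = "D" AND subject_id = "(Cat)")))))');

console.log(scan1); // (status_id = "C" OR level_id = "D")
console.log(scan2); // (level_id = "D" AND subject_id = "(Cat)")
```

Because the scan stops at the first unquoted ")", only the most recent "(" index is ever used; the stack is kept here to stay close to the description.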
Try this: /\(([^\(\)]*?"[^\"\(\)]*([^\"\(\)]+\)[^\(\)]*?\"[^\(\)]*)+)|([^\(\)]+\)/
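Analyze the requirement > find a solution for each point > combine the solutions = problem solved
1 We need to match the form ( a )
2 The characters in a fall into two cases, written a1 and a2
2.1 a1 contains one or more strings of the form b " c " b
2.1.1 where b is a string containing no " , ( or )
2.1.2 where c is a string containing no "
2.2 a2 contains no ( or )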
2.2 => a2 = [^\(\)]*
2.1.1 => b = [^\(\)\"]*
2.1.2 => c = [^\"]*
2.1 => a1 = (b\"c\"b)+
= (b\"c\")+b
= ([^\(\)\"]*\"[^\"]*\")+[^\(\)\"]*
1 => \(a\) = \(a1\)|\(a2\)
= \(([^\(\)\"]*\"[^\"]*\")+[^\(\)\"]*\)|\([^\(\)]*\)
<code>/\(([^\(\)\"]*\"[^\"]*\")+[^\(\)\"]*\)|\([^\(\)]*\)/</code>
<code class="javascript">var reg = /\(([^\(\)\"]*\"[^\"]*\")+[^\(\)\"]*\)|\([^\(\)]*\)/;

'(the (quick "brown" fox "jumps over, (the) lazy" dog ))'.match(reg)[0];
// "(quick "brown" fox "jumps over, (the) lazy" dog )"

'(the ("(quick)" brown fox "jumps (over, the)" lazy) dog )'.match(reg)[0];
// "("(quick)" brown fox "jumps (over, the)" lazy)"

'(the (quick brown fox (jumps "over", ((the) "lazy"))) dog )'.match(reg)[0];
// "(the)"</code>
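The derivation can also be checked by assembling the pattern from its named pieces. The sketch below is an illustration rather than the answer's own code: it uses `String.raw` and a non-capturing group `(?:...)` in place of the capturing one, and the variable names are invented for the example.

```javascript
// b: a run with no quotes or parentheses; c: the body of one quoted string.
const b = String.raw`[^()"]*`;
const c = String.raw`[^"]*`;
// a1: one or more quoted strings separated by b; a2: no parentheses at all.
const a1 = `(?:${b}"${c}")+${b}`;
const a2 = String.raw`[^()]*`;
// 1 => \(a1\)|\(a2\)
const innermostRe = new RegExp(String.raw`\(${a1}\)|\(${a2}\)`);

const s1 = '(subject_id = "A" OR (status_id = "Open" AND (status_id = "C" OR level_id = "D")))';
const s2 = '(subject_id = "A" OR subject_id = "Food" OR (subject_id = "C" OR (status_id = "Open" AND (status_id = "C" OR (level_id = "D" AND subject_id = "(Cat)")))))';

console.log(s1.match(innermostRe)[0]); // (status_id = "C" OR level_id = "D")
console.log(s2.match(innermostRe)[0]); // (level_id = "D" AND subject_id = "(Cat)")
```

One limitation worth knowing: the pattern does not remember whether a match attempt starts inside a quoted string. For an input such as `(a = "(x)" AND (b = "c"))`, where a quoted group comes before the real innermost one, it returns `(x)`.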
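Then change it like this:
<code>substr=str.match(/\([^()]+\)/g)[0] </code>
This gets the innermost parentheses and their content. Next, check whether the character right before that value is " and the character right after it is ":
<code>index = str.indexOf(str.match(/\([^()]+\)/g)[0]);
length = str.match(/\([^()]+\)/g)[0].length;
str.substr(index + length, 1);  // character right after the match
str.substr(index - 1, 1);       // character right before the match
</code>
If they are not quotes, the value is the answer. If they are, first replace substr in str with a placeholder, match again, and finally swap the placeholder back:
<code>str.replace(substr, "&&&");
str.replace(substr, "&&&").match(/\([^()]+\)/g)[0];
str.replace(substr, "&&&").match(/\([^()]+\)/g)[0].replace("&&&", substr);
</code>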
The hard part of this problem is that "" has to be counted recursively, for example
<code>(level_id = "D AND subject_id = "(Cat)"")</code>
where (Cat) meets the requirement.
<code>\([^()]*?\"((?:[^\"\"]|\"(?1)\")*+)\"[^()]*?\)|\([^()]*?\) </code>
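Cherish life, stay away from regex. This regex meets your requirement. It works in PHP (PHP supports recursion), but not in Java or Python, whose standard regex engines have no recursion.
One suggested approach: find the index of ( and handle the string by slicing it.
I can't type the regex out on my phone, ugh.
The OP's [^()] keeps going as long as it does not hit ( or ).
Drop the condition that excludes (, and change the greedy + to *?.
Code: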
console.log('(subject_id = “A” OR (status_id = “Open” AND (status_id = “C” OR level_id = “D”)))'.match(/(1*)/))
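Hope this helps.
Matching with a regex alone gets complicated. I suggest replacing the interfering strings "( and )" with something else, e.g. "[ and ]", then applying a simple regex, and swapping them back afterwards.
A regex implementation in Python: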
<code>import re
str1 = '(subject_id = "A" OR (status_id = "Open" AND (status_id = "C" OR level_id = "D")))'
str2 = '(subject_id = "A" OR subject_id = "Food" OR (subject_id = "C" OR (status_id = "Open" AND (status_id = "C" OR (level_id = "D" AND subject_id = "(Cat)")))))'
pat = re.compile(r"""(?</code>
Output:
<code>(status_id = "C" OR level_id = "D")
(level_id = "D" AND subject_id = "(Cat)") </code>
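The replace-then-restore suggestion in this answer could look roughly like the JS sketch below. It is a variation under stated assumptions, not the answer's own code: `innermostByMasking` is a hypothetical name, quotes are assumed balanced and unescaped, and instead of swapping the brackets back it slices the original string by position, which works because the masking step keeps the string length unchanged.

```javascript
// Mask parentheses inside "..." with brackets, match the simple pattern,
// then cut the same span out of the ORIGINAL string.
function innermostByMasking(str) {
  const masked = str.replace(/"[^"]*"/g, (quoted) =>
    quoted.replace(/\(/g, '[').replace(/\)/g, ']')  // same length as the input
  );
  const m = masked.match(/\([^()]*\)/);
  if (m === null) return null;
  return str.slice(m.index, m.index + m[0].length);
}

const masked1 = innermostByMasking('(subject_id = "A" OR (status_id = "Open" AND (status_id = "C" OR level_id = "D")))');
const masked2 = innermostByMasking('(subject_id = "A" OR subject_id = "Food" OR (subject_id = "C" OR (status_id = "Open" AND (status_id = "C" OR (level_id = "D" AND subject_id = "(Cat)")))))');

console.log(masked1); // (status_id = "C" OR level_id = "D")
console.log(masked2); // (level_id = "D" AND subject_id = "(Cat)")
```

Slicing by index avoids turning brackets that were already in the input into parentheses during the restore step.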