Build a regular expression that adds double quotes to unquoted JSON values
P粉231112437
2023-08-17 19:06:54
<p>I have a lot of malformed JSON strings, like this: </p>
<pre class="brush:php;toolbar:false;">{
"id":23424938,
"name":aN,
"ref":aN,
"jul":aN,
"cat":{},
"src":[],
"Code":"SA",
"type":d,
"spec":[i,j],
"child":a
}</pre>
<p>I have been trying, without success, to build a regex that wraps the unquoted JSON values in double quotes.</p>
<p>I ended up using <code>/":([^"d{[] ?[^,}]?)/</code>, which fixed everything except the values inside arrays. For example, <code>[i,j]</code> is not converted to <code>["i","j"]</code>.</p>
<p>Can you help me with the values in brackets? </p>
<p>https://regex101.com/r/CGskmy/1</p>
This task is somewhat difficult because of ambiguity. For example, does { "x": [y] } become { "x": "[y]" } or { "x": ["y"] }? I will assume that unquoted strings do not contain JSON control characters such as '[', ']', '{', '}', '"', ':', or ','.
I think you can accomplish this with named capture groups, a PCRE feature that is available in PHP. The replacement itself takes a little programming: a plain preg_replace call is not enough, because not every match should be replaced.
This is the method I came up with. First, I match quoted strings and ignore them. Second, I match numbers and ignore them. Finally, I match unquoted strings and store them in a capture group named "unquoted". Note that PCRE tries the alternatives in the order in which they are written, so an unquoted string is matched only when neither a quoted string nor a number matches at that position. This is the key to the approach.
Once all unquoted strings have been matched, I just need to build the output string with the replacements. This is done by iterating over the matches and copying the string fragments into the output.
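The approach described above can be sketched roughly as follows. This is a minimal illustration, not the answerer's original code: the function name quoteUnquoted and the exact pattern are assumptions, and the number alternative only covers plain integers and decimals.

```php
<?php
// Hypothetical helper (name and pattern are assumptions, not from the answer).
function quoteUnquoted(string $input): string
{
    // Alternatives are tried left to right at every position:
    //   1. a double-quoted string with backslash escapes (matched, then ignored)
    //   2. a simple integer or decimal number (matched, then ignored)
    //   3. any run of characters that are neither whitespace nor JSON control
    //      characters, captured in the named group "unquoted"
    $pattern = '/"(?:[^"\\\\]|\\\\.)*"|-?\d+(?:\.\d+)?|(?<unquoted>[^\s\[\]{}":,]+)/';

    preg_match_all($pattern, $input, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

    $output = '';
    $pos = 0; // byte offset of the first character not yet copied
    foreach ($matches as $match) {
        // The group is absent (or has offset -1) when alternative 1 or 2 matched.
        if (!isset($match['unquoted']) || $match['unquoted'][1] < 0) {
            continue;
        }
        [$text, $offset] = $match['unquoted'];
        // Copy everything before the token unchanged, then the quoted token.
        $output .= substr($input, $pos, $offset - $pos) . '"' . $text . '"';
        $pos = $offset + strlen($text);
    }

    // Copy whatever follows the last replaced token.
    return $output . substr($input, $pos);
}

echo quoteUnquoted('{"type":d,"spec":[i,j]}'), "\n";
// prints {"type":"d","spec":["i","j"]}
```

Because the sketch treats every bare token as a string, literals such as true or null would also be quoted, and a number in exponent form such as 1e5 would be split; both would need extra alternatives before the result is passed to json_decode.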
I'm not dealing with the full JSON number syntax, nor with JSON literals such as true, false, or null. Hopefully this answer is a starting point that you can tweak to suit your needs.
InSync provides a nice regular expression that does not use named capture groups but instead instructs PCRE to skip the unwanted matches.
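For illustration, the skip-based idea might look something like the sketch below. This is not InSync's actual expression, which is not reproduced here; the pattern and the function name quoteUnquotedSkip are assumptions. It uses the PCRE backtracking verbs (*SKIP)(*FAIL) to discard quoted strings, simple numbers, and the literals true, false, and null, so that a plain preg_replace call only ever sees the unquoted tokens.

```php
<?php
// Hypothetical helper (name and pattern are assumptions for illustration only).
function quoteUnquotedSkip(string $input): ?string
{
    // Characters that end a bare token: whitespace and JSON control characters.
    $delim = '\s\[\]{}":,';
    // A double-quoted string with backslash escapes.
    $string = '"(?:[^"\\\\]|\\\\.)*"';
    // A simple number or a JSON literal, only when it is a complete token.
    $literal = '(?:-?\d+(?:\.\d+)?|true|false|null)(?![^' . $delim . '])';

    // Whatever the first branch matches is consumed and then rejected by
    // (*SKIP)(*FAIL); only the second branch can produce a real match.
    $pattern = '/(?:' . $string . '|' . $literal . ')(*SKIP)(*FAIL)|[^' . $delim . ']+/';

    // $0 is the whole match, i.e. the bare token. Returns null on a PCRE error.
    return preg_replace($pattern, '"$0"', $input);
}

echo quoteUnquotedSkip('{"a":x,"b":[i,j],"c":true,"d":-1.5}'), "\n";
// prints {"a":"x","b":["i","j"],"c":true,"d":-1.5}
```

The trade-off is that all the logic lives in the pattern: the replacement becomes a one-liner, but the pattern is harder to read and still only covers the simple number syntax assumed here.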