A few tips on parsing XML and JSON content

Overview

When a system integrates with several external systems and there is no unified standard, the responses returned by their interfaces are often heterogeneous: one may return XML, another JSON. Beyond the format, the content structure differs as well. Taking XML as an example, interface 1 returns:

<root>
    <bizKey>16112638767472747178067</bizKey>
    <returnMsg>OK</returnMsg>
    <returnCode>200</returnCode>
    ...
</root>

Interface 2 returns:

<root>
    <bid>16112638767472747178068</bid>
    <note>成功</note>
    <returnStatus>1</returnStatus>
    ...
</root>

Handling each content format separately throughout our system is clearly unreasonable. In the responses above we only care about three pieces of information: the business ID, the status value, and the description. Can we abstract these three pieces of information?
Once we have them, we can carry out the business logic.

Parsing XML and JSON

Following this abstraction, we need to extract three pieces of information from the XML or JSON content. We use XPath and JSONPath for that. For example, to extract the key information from interface 1, we can define three XPath expressions:

{
    bid: "/root/bizKey",
    code: "/root/returnCode",
    description: "/root/returnMsg"
}

bid, code, and description are the field names defined by our system.
Parsing JSON content works the same way, except that JSONPath expressions are defined instead.
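
To illustrate how the same three field names can be mapped onto both XML responses shown earlier, here is a minimal sketch using only the JDK's javax.xml.xpath API. The class name PathConfigSketch, the two constants, and the extract helper are hypothetical names chosen for this example; the XPath expressions for interface 2 are inferred from its sample response.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class PathConfigSketch {
    // Field name in our system -> XPath into interface 1's response.
    static final Map<String, String> INTERFACE_1 = Map.of(
            "bid", "/root/bizKey",
            "code", "/root/returnCode",
            "description", "/root/returnMsg");

    // Same field names, different XPath expressions for interface 2
    // (inferred from the sample response, so treat them as an assumption).
    static final Map<String, String> INTERFACE_2 = Map.of(
            "bid", "/root/bid",
            "code", "/root/returnStatus",
            "description", "/root/note");

    // Evaluates every configured expression against the XML text.
    // An expression that matches nothing yields an empty string.
    static Map<String, String> extract(String xml, Map<String, String> paths) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : paths.entrySet()) {
            // A fresh InputSource per expression: a Reader can only be consumed once.
            InputSource source = new InputSource(new StringReader(xml));
            try {
                result.put(e.getKey(), xpath.evaluate(e.getValue(), source));
            } catch (XPathExpressionException ex) {
                throw new IllegalArgumentException("cannot evaluate " + e.getValue(), ex);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The note text of the sample response, written with Unicode escapes.
        String xml2 = "<root><bid>16112638767472747178068</bid>"
                + "<note>\u6210\u529f</note><returnStatus>1</returnStatus></root>";
        System.out.println(extract(xml2, INTERFACE_2));
    }
}
```

Re-parsing the text for every expression keeps the sketch short; parsing once into a Document and evaluating against that avoids the repeated work.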

Process data content in two steps

Suppose we have extracted bid, code, and description from the original XML and JSON data.
From interface 1 we get:

{
    bid: '16112638767472747178067',
    code: '200',
    description: 'OK'
}

From interface 2 we get:

{
    bid: '16112638767472747178068',
    code: '1',
    description: '成功'
}

Suppose the interface 1 documentation says that status value 200 means the request succeeded, while the interface 2 documentation says that status value 1 means the request succeeded. Although both indicate success, we cannot store them as-is in our business tables (the raw responses should of course still be saved in a separate record table, at least to make troubleshooting easier).
Assume our business table is designed like this:

Field name	Type	Description
bid	string	Business ID
code	int	Status value: 0=initial, 1=requesting, 2=success, 3=failure
description	string	Description
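
One way to keep these four status values readable in code is an enum. The following is only a sketch; the name BizStatus and its methods are hypothetical, and only the numeric values come from the table.

```java
import java.util.Arrays;

enum BizStatus {
    INITIAL(0), REQUESTING(1), SUCCESS(2), FAILURE(3);

    private final int code;

    BizStatus(int code) {
        this.code = code;
    }

    // The integer stored in the code column.
    int code() {
        return code;
    }

    // Maps a stored integer back to the enum constant, rejecting unknown values.
    static BizStatus fromCode(int code) {
        return Arrays.stream(values())
                .filter(s -> s.code == code)
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("unknown status code: " + code));
    }
}
```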

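Therefore, we must also define rules that convert the status value 200 returned by interface 1 into our system's 2, and the status value 1 returned by interface 2 into our system's 2.
To summarize, XML and JSON content is parsed in two steps:

Extract the key information using XPath or JSONPath expressions
Convert the status value according to rules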
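
The two steps can be sketched end to end with the JDK alone. Note the substitution: the rule here is a plain Java lambda standing in for the expression-based rules the article uses, so this is an illustration of the flow rather than the article's implementation. TwoStepSketch, extract, and convert are hypothetical names.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class TwoStepSketch {
    static final int SUCCESS = 2;
    static final int FAILURE = 3;

    // Step 1: extract the fields named in `paths` from the raw XML.
    static Map<String, Object> extract(String xml, Map<String, String> paths) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, Object> fields = new HashMap<>();
        for (Map.Entry<String, String> e : paths.entrySet()) {
            InputSource source = new InputSource(new StringReader(xml));
            try {
                fields.put(e.getKey(), xpath.evaluate(e.getValue(), source));
            } catch (XPathExpressionException ex) {
                throw new IllegalArgumentException("cannot evaluate " + e.getValue(), ex);
            }
        }
        return fields;
    }

    // Step 2: replace the external status value with our system's value.
    static Map<String, Object> convert(Map<String, Object> fields,
                                       Function<String, Integer> rule) {
        Map<String, Object> converted = new HashMap<>(fields);
        converted.put("code", rule.apply((String) fields.get("code")));
        return converted;
    }

    public static void main(String[] args) {
        String xml = "<root><bizKey>16112638767472747178067</bizKey>"
                + "<returnMsg>OK</returnMsg><returnCode>200</returnCode></root>";
        Map<String, String> paths = Map.of(
                "bid", "/root/bizKey",
                "code", "/root/returnCode",
                "description", "/root/returnMsg");
        // Interface 1: 200 means success; treating every other value as
        // failure is an assumption made for this sketch.
        Function<String, Integer> rule = code -> "200".equals(code) ? SUCCESS : FAILURE;
        System.out.println(convert(extract(xml, paths), rule));
    }
}
```

For interface 2 only the path map and the rule would change (for instance `code -> "1".equals(code) ? SUCCESS : FAILURE`, again assuming all other values mean failure); the two methods stay the same.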

Step 1: parse the data to extract the key information

Taking XML as an example:

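import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class XmlParseUtils {
    private final DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    private final XPathFactory xpathFactory = XPathFactory.newInstance();

    /**
     * Extracts the configured fields from an XML document.
     *
     * @param param XML content
     * @param paths field name -> XPath expression
     * @return field name -> text content of the matched node
     * @throws Exception if the XML cannot be parsed or an expression matches no node
     */
    public Map<String,Object> parse(String param, Map<String,String> paths) throws Exception {
        InputSource inputSource = new InputSource(new StringReader(param));
        // Parse once, then evaluate every expression against the same document.
        Document document = dbFactory.newDocumentBuilder().parse(inputSource);
        Map<String,Object> map = new HashMap<>();
        XPath xpath = xpathFactory.newXPath();
        for (Map.Entry<String,String> entry : paths.entrySet()) {
            Node node = (Node) xpath.evaluate(entry.getValue(), document, XPathConstants.NODE);
            if (node == null) {
                throw new Exception("node not found, xpath is " + entry.getValue());
            }
            map.put(entry.getKey(), node.getTextContent());
        }
        return map;
    }

}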

The return type of parse could also be Map<String,String>; for now we use Map<String,Object>.

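Step 2: convert the status value according to rules

This step is a little more involved, but let's set the implementation aside for now; whatever you can think of, someone has probably already implemented. First we define the rules from the interface documentation and write them as rule expressions (or in some other form).
Expressions again. Suppose interface 1's status values are simple: only 200 means success and everything else is failure. Then we can define the rule like this: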

code.equals("200") ? 2 : 3

or

<#if code == "200">
2
<#else>
3
</#if>

or alternatively

function handle(arg) {
    if(arg == 200) {
        return 2;
    }
    return 3;
}
handle(${code})

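The above defines three different kinds of status conversion rules from the same documentation, and each needs its own implementation. They are explained one by one below.

Ternary expression

code.equals("200") ? 2 : 3 is a ternary expression, which we evaluate with the JEXL engine. Using the result of step 1 as the context, we can do this: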

    public Object evaluateByJexl(String expression, Map<String,Object> context) {
        JexlEngine jexl = new JexlBuilder().create();
        JexlExpression e = jexl.createExpression(expression);
        JexlContext jc = new MapContext(context);
        return e.evaluate(jc);
    }

FreeMarker template

<#if code == "200">
2
<#else>
3
</#if>

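We can process this template like this:

    /**
     * Renders a FreeMarker template held in a string.
     *
     * @param param FreeMarker template source
     * @param context data model exposed to the template
     * @return the rendered output; it may include line breaks from the template, so trim it before converting it to a number
     * @throws Exception if the template cannot be parsed or rendered
     */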
    public String render(String param, Map<String,Object> context) throws Exception {
        Configuration cfg = new Configuration();
        StringTemplateLoader stringLoader = new StringTemplateLoader();
        stringLoader.putTemplate("myTemplate",param);
        cfg.setTemplateLoader(stringLoader);
        Template template = cfg.getTemplate("myTemplate","utf-8");
        StringWriter writer = new StringWriter();
        template.process(context, writer);
        return writer.toString();
    }

If the FreeMarker template is complex, compiling it into a Template may be costly, so consider caching the Template.
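
The caching idea can be sketched independently of FreeMarker. In the sketch below, CompiledCache is a hypothetical helper keyed by the template source; the compiler function is a stand-in for whatever turns that source into a compiled object (with FreeMarker it would produce a Template, and its checked exceptions would need wrapping). It is an illustration under those assumptions, not a drop-in replacement for the render method above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class CompiledCache<T> {
    private final Map<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, T> compiler;

    CompiledCache(Function<String, T> compiler) {
        this.compiler = compiler;
    }

    // Compiles the source on first use and returns the cached result afterwards.
    T get(String source) {
        return cache.computeIfAbsent(source, compiler);
    }

    public static void main(String[] args) {
        AtomicInteger compilations = new AtomicInteger();
        // Stand-in "compiler": counts invocations and returns a dummy value.
        CompiledCache<String> cache = new CompiledCache<>(src -> {
            compilations.incrementAndGet();
            return "compiled:" + src;
        });
        cache.get("<#if code == \"200\">2<#else>3</#if>");
        cache.get("<#if code == \"200\">2<#else>3</#if>");
        System.out.println("compilations = " + compilations.get());
    }
}
```

FreeMarker's Configuration also keeps its own template cache, so reusing one Configuration instance across calls, rather than creating a new one per render, may already remove much of the cost.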

JavaScript snippet

function handle(arg) {
    if(arg == 200) {
        return 2;
    }
    return 3;
}
handle(${code})

This JavaScript snippet contains ${code}, so it must first be rendered with FreeMarker to produce the actual argument of the handle call; then:

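    public Object evaluate(String expression) throws Exception {
        ScriptEngineManager manager = new ScriptEngineManager();
        // Returns null when no JavaScript engine is available (Nashorn was removed from the JDK in Java 15).
        ScriptEngine engine = manager.getEngineByName("javascript");
        if (engine == null) {
            throw new IllegalStateException("no JavaScript script engine found");
        }
        return engine.eval(expression);
    }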

ScriptEngineManager's performance is probably not great; after all, it is a full language engine.

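Comparison of the rule implementations

Type	Implementation	Advantages	Disadvantages
Ternary expression	JEXL	Easy	Simplistic
FreeMarker template	FreeMarker	--	--
JavaScript snippet	FreeMarker + ScriptEngine	Intuitive	Complex process, performance concerns

FreeMarker looks like a good choice.
At this point the two-step technique is complete, built entirely on existing libraries.

We may run into this challenge: when converting a status value, we need to know the current status of some business record in our own system.
The FreeMarker template might then look like this: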

<#assign lastCode = GetLastCode(code)>
<#if lastCode == "2">
2
<#elseif code == "200">
2
<#else>
3
</#if>

Here we can use FreeMarker's ability to call custom Java functions or utility classes from a template.
