How can I parse a CSV string with embedded commas in quoted fields using regular expressions in JavaScript?

Mary-Kate Olsen
Release: 2024-12-04 16:45:12

Regex-based CSV String Parsing

Problem Statement:

Parse a CSV string with commas embedded within quoted values, while ignoring commas outside quotes.

Solution Overview:

A CSV string that may contain quoted values with escaped characters cannot simply be split on commas. Instead, the string has to be walked from left to right, one value at a time. Two regular expressions are used:

CSV Validation Regex:

^\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*(?:,\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*)*$

This regex ensures that the input string follows the defined CSV format, where:

  • Values can be single-quoted, double-quoted, or unquoted.
  • Quoted values may contain backslash-escaped characters, such as \' or \".
  • Commas are used as separators.
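
To make the three value forms easier to see, the sketch below assembles the same validation pattern from named pieces and tests it against a few inputs. The piece names (single, double, bare), the use of String.raw, and the sample strings are assumptions made for illustration; the article itself only gives the pattern as one literal.

```javascript
// Each piece is one alternative of the validation pattern.
// String.raw keeps every backslash exactly as written.
const single = String.raw`'[^'\\]*(?:\\[\S\s][^'\\]*)*'`; // 'single quoted'
const double = String.raw`"[^"\\]*(?:\\[\S\s][^"\\]*)*"`; // "double quoted"
const bare = String.raw`[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*`;  // unquoted, may be empty
const value = `(?:${single}|${double}|${bare})`;

// One value, then zero or more ", value" repetitions, anchored at both ends.
const reValid = new RegExp(String.raw`^\s*${value}\s*(?:,\s*${value}\s*)*$`);

// Hypothetical sample inputs.
const samples = [
  "'string, duppi, du', 23, lala", // quoted field with embedded commas
  '"a \\"quoted\\" word", b',      // backslash-escaped double quotes
  "one two, three",                // unquoted value with inner whitespace
  "'unterminated, 23",             // opening quote is never closed
  'a"b, c',                        // stray quote inside an unquoted value
];

const results = samples.map((s) => reValid.test(s));
console.log(results); // [ true, true, true, false, false ]
```

The first three strings satisfy the rules listed above; the last two are rejected because a quote character appears where no alternative can consume it.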

Value Parsing Regex:

(?!\s*$)\s*(?:'([^'\\]*(?:\\[\S\s][^'\\]*)*)'|"([^"\\]*(?:\\[\S\s][^"\\]*)*)"|([^,'"\s\\]*(?:\s+[^,'"\s\\]+)*))\s*(?:,|$)

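This regex extracts one value at a time from the CSV string, following the same rules as the validation regex. Capture group 1 holds the contents of a single-quoted value, group 2 those of a double-quoted value, and group 3 an unquoted value. The regex only captures the raw text; the backslashes in front of escaped quotes are removed afterwards by the parsing code.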
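
As a rough illustration of how the capture groups behave, the following sketch runs the value pattern over a sample string with matchAll and records which group matched. The input string and the names sq, dq, bare, and fields are hypothetical, and the sketch assumes the input is already known to be well formed.

```javascript
// Value pattern with the global flag so matchAll can step through the string.
const reValue = /(?!\s*$)\s*(?:'([^'\\]*(?:\\[\S\s][^'\\]*)*)'|"([^"\\]*(?:\\[\S\s][^"\\]*)*)"|([^,'"\s\\]*(?:\s+[^,'"\s\\]+)*))\s*(?:,|$)/g;

// Hypothetical input: 'it\'s, ok', "x, y", plain text
const input = `'it\\'s, ok', "x, y", plain text`;

const fields = [];
for (const [, sq, dq, bare] of input.matchAll(reValue)) {
  // Exactly one of the three groups is defined for each match.
  if (sq !== undefined) fields.push({ kind: 'single', raw: sq });
  else if (dq !== undefined) fields.push({ kind: 'double', raw: dq });
  else fields.push({ kind: 'bare', raw: bare });
}

console.log(fields);
// [ { kind: 'single', raw: "it\\'s, ok" },
//   { kind: 'double', raw: 'x, y' },
//   { kind: 'bare', raw: 'plain text' } ]
```

Note that the first raw value still contains the backslash before the apostrophe, which is why a separate unescaping step is needed.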

JavaScript Implementation:

function CSVtoArray(text) {
    const re_valid = /^\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*(?:,\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*)*$/;
    const re_value = /(?!\s*$)\s*(?:'([^'\\]*(?:\\[\S\s][^'\\]*)*)'|"([^"\\]*(?:\\[\S\s][^"\\]*)*)"|([^,'"\s\\]*(?:\s+[^,'"\s\\]+)*))\s*(?:,|$)/g;
    // Return null if the input string is not a well-formed CSV string.
    if (!re_valid.test(text)) return null;
    const a = [];                     // Initialize array to receive values.
    text.replace(re_value, // "Walk" the string using replace with callback.
        function(m0, m1, m2, m3) {
            // Remove backslash from \' in single quoted values.
            if      (m1 !== undefined) a.push(m1.replace(/\\'/g, "'"));
            // Remove backslash from \" in double quoted values.
            else if (m2 !== undefined) a.push(m2.replace(/\\"/g, '"'));
            else if (m3 !== undefined) a.push(m3);
            return ''; // The replacement result itself is discarded.
        });
    // Handle special case of empty last value.
    if (/,\s*$/.test(text)) a.push('');
    return a;
}

Example Usage:

const csvString = "'string, duppi, du', 23, lala";
const result = CSVtoArray(csvString);
console.log(result); // ["string, duppi, du", "23", "lala"]
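
A few edge cases may be worth checking as well: an empty last value, an empty middle value, escaped quotes, and malformed input. The sketch below is a condensed variant of the function above that uses matchAll instead of replace; the name parseCsvLine and the sample inputs are hypothetical and not part of the original solution.

```javascript
// Same two patterns as above, with the backslashes written out.
const RE_VALID = /^\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*(?:,\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*)*$/;
const RE_VALUE = /(?!\s*$)\s*(?:'([^'\\]*(?:\\[\S\s][^'\\]*)*)'|"([^"\\]*(?:\\[\S\s][^"\\]*)*)"|([^,'"\s\\]*(?:\s+[^,'"\s\\]+)*))\s*(?:,|$)/g;

// Hypothetical helper: same logic as CSVtoArray, written with matchAll.
function parseCsvLine(text) {
  if (!RE_VALID.test(text)) return null; // malformed input
  const values = [];
  for (const [, sq, dq, bare] of text.matchAll(RE_VALUE)) {
    if (sq !== undefined) values.push(sq.replace(/\\'/g, "'"));      // unescape \'
    else if (dq !== undefined) values.push(dq.replace(/\\"/g, '"')); // unescape \"
    else if (bare !== undefined) values.push(bare);
  }
  if (/,\s*$/.test(text)) values.push(''); // empty last value
  return values;
}

console.log(parseCsvLine("a, b,"));                 // [ 'a', 'b', '' ]
console.log(parseCsvLine("a,, b"));                 // [ 'a', '', 'b' ]
console.log(parseCsvLine('"He said \\"hi\\"", 1')); // [ 'He said "hi"', '1' ]
console.log(parseCsvLine("'broken, 1"));            // null
```

Returning null for malformed input, rather than a partial array, lets the caller distinguish a rejected string from one that legitimately contains empty values.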


source:php.cn