Reading each line of a CSV file in C/C++ and splitting on commas: how do I avoid splitting incorrectly when a cell's data itself contains a comma?
PHPz
PHPz 2017-04-17 11:43:17

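I ran into the following problem while using MFC to read a CSV file line by line and split each line on commas:

A CSV file uses commas as the column separator. After reading a line from the file, I also split it on commas, so that the line is broken up into the same columns as in the original CSV file. But when a cell's data itself contains a comma (for example, a cell containing "China,North Korea"), and that comma is also an ASCII (half-width) comma, my splitting goes wrong: because of that extra comma, a line that occupies 10 columns in the CSV is now split into 11 pieces. How can I solve this?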


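Update, 2015/01/21 10:00

The problem is solved. I will post the solution and code when I have time today. Thank you all for your kind help.
I can only accept one answer, so I will pick kepler84's. Many thanks to Chobits as well for the method provided!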


All replies (3)
大家讲道理

First, in this case every column (or at least the column that contains a comma) must be wrapped in quotation marks to express "there is a comma here, but this is still a single column". Otherwise the CSV is malformed.

Once quotation marks are involved, a plain split on the string no longer works. A simple approach is to scan the string character by character while tracking whether the current character is inside quotation marks: inside quotes a comma is treated as ordinary data, and outside quotes a comma ends the current field.

Of course, the easier route is to use an existing CSV library; there are plenty of them on GitHub.
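The scan described above can be sketched in standard C++ roughly as follows. This is only a minimal illustration, not code from the thread: the name `splitCsvLine` is made up, it uses `std::string` rather than MFC's `CString`, it assumes a whole record fits on one line, and it assumes the common convention that a doubled quote (`""`) inside a quoted field stands for one literal quote.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split one CSV line into fields, treating commas
// that appear inside double quotes as data rather than separators.
// Assumes the record does not span multiple lines.
std::vector<std::string> splitCsvLine(const std::string& line) {
    std::vector<std::string> fields;
    std::string current;
    bool inQuotes = false;  // true while the scan is inside a quoted field

    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (inQuotes) {
            if (c == '"') {
                if (i + 1 < line.size() && line[i + 1] == '"') {
                    current += '"';    // "" inside quotes is one literal quote
                    ++i;
                } else {
                    inQuotes = false;  // closing quote
                }
            } else {
                current += c;          // commas land here and are kept as data
            }
        } else if (c == '"') {
            inQuotes = true;           // opening quote, not part of the field
        } else if (c == ',') {
            fields.push_back(current); // separator outside quotes ends a field
            current.clear();
        } else {
            current += c;
        }
    }
    fields.push_back(current);         // last field has no trailing comma
    return fields;
}
```

With this sketch, a line such as `1,"China,North Korea",3` yields three fields, and the surrounding quotes are dropped from the second one. Whether to strip the quotes or keep them is a design choice; stripping them is simply what this version does.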

刘奇

http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader

PHPzhong

CSV format specification

According to the CSV format specification, a field that contains commas should be wrapped in double quotes. That makes it easy to handle: extract the substring enclosed in double quotes, replace the commas inside it with some other character, and then rejoin it with the parts of the original string to its left and right. This is only guaranteed to work for CSV files that follow the standard format; success is not guaranteed for non-standard files.

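#include <vector>
// CStringW comes from the MFC/ATL headers of the project.

// split() is not shown in this answer. Its element type was lost in the
// source; std::vector<CStringW> is assumed here.
std::vector<CStringW> split(const CStringW& text, wchar_t delimiter);

void _analyse_line(CStringW line) {
    // Keep a copy of the original string
    CStringW copy_line = line;
    // Index of the first double quote in the string
    int fquote_index = line.Find(L'"');
    // Without any double quote the string can be split on commas directly.
    // Otherwise the commas inside the quoted part are masked first.
    // Limitation: everything from the first to the last double quote is
    // treated as one quoted span, so this only suits lines with a single
    // quoted field.
    if (fquote_index != -1) {
        int lquote_index = 0, _index;
        // Locate the last double quote (lquote_index ends up one past it)
        while ((_index = line.Find(L'"')) != -1) {
            line = line.Right(line.GetLength() - _index - 1);
            lquote_index += _index + 1;
        }
        // Length of the quoted span, including both quotes
        int len = lquote_index - fquote_index;
        // Text to the left of the first double quote
        CStringW left = copy_line.Left(fquote_index);
        line = copy_line.Right(copy_line.GetLength() - fquote_index);
        // Text to the right of the last double quote
        CStringW right = line.Right(line.GetLength() - len);
        // The quoted span itself
        line = line.Left(len);
        // Mask the commas inside the quoted span
        line.Replace(L',', L'\t');
        // Reassemble the line
        line = left + line + right;
    }

    std::vector<CStringW> arr = split(line, L',');
    // Restore the commas that were replaced with '\t'
    // (the source is cut off after the loop header; the body below is
    // reconstructed from the comment above)
    for (size_t i = 0; i < arr.size(); i++) {
        arr[i].Replace(L'\t', L',');
    }
}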

I won't post the split method; there are many ways to implement it.
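For readers who want a starting point, one possible shape for such a helper is sketched below. It is an assumption rather than the answerer's actual code: it works on `std::wstring` instead of `CStringW` so that it compiles without MFC, and it keeps empty fields so that the column count of a line is preserved.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the omitted split(): cut `text` at every
// occurrence of `delimiter`, keeping empty pieces between adjacent
// delimiters so that column positions are not shifted.
std::vector<std::wstring> split(const std::wstring& text, wchar_t delimiter) {
    std::vector<std::wstring> parts;
    std::wstring::size_type start = 0;
    while (true) {
        const std::wstring::size_type pos = text.find(delimiter, start);
        if (pos == std::wstring::npos) {
            parts.push_back(text.substr(start));  // remainder after the last delimiter
            break;
        }
        parts.push_back(text.substr(start, pos - start));
        start = pos + 1;                          // continue after the delimiter
    }
    return parts;
}
```

Called as `split(L"a,b,,c", L',')`, this returns four pieces, the third of which is empty. Adapting it to `CStringW` would mainly mean swapping `find` and `substr` for `Find` and `Mid`.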
