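I ran into the following problem when using MFC to read a CSV file line by line and split each line on commas:
A CSV file uses commas as column separators, so after reading a line I split it on commas to recover the original columns. However, when a cell's own data contains a comma (for example, a cell holding "China,North Korea"), and that comma is an ASCII (half-width) comma, my split goes wrong: a row that occupies 10 columns in the CSV file is now split into 11 fields. How can I solve this?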
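Update, 2015/01/21 10:00
The problem is solved. I will post the solution and code when I have time today. Thank you all for your help.
I can only accept one answer, so I will go with kepler84's. Many thanks to Chobits as well for the method provided!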
First of all, in this case the column that contains a comma (or simply every column) must be enclosed in quotation marks to express that, despite the comma, it is a single column. Otherwise the CSV file is malformed.
Once fields are quoted, you cannot process the string with a simple split. A straightforward method is to scan the string character by character while tracking whether the current character is inside quotation marks. A comma inside quotes is treated as data; a comma outside quotes ends the current field.
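The scan described above might be sketched as follows. This is only an illustration under a few assumptions: it uses `std::string` rather than MFC's `CString`, the function name `splitCsvLine` is made up for this example, a doubled quote (`""`) inside a quoted field is taken to mean one literal quote, and a field is assumed never to span more than one line.

```cpp
#include <string>
#include <vector>

// Split one CSV line into fields, honoring double-quoted fields.
// Hypothetical helper: assumes no embedded line breaks and that ""
// inside a quoted field stands for a single literal quote character.
std::vector<std::string> splitCsvLine(const std::string& line, char delim = ',') {
    std::vector<std::string> fields;
    std::string current;
    bool inQuotes = false;  // true while the scan is inside a quoted field

    for (std::size_t i = 0; i < line.size(); ++i) {
        char c = line[i];
        if (inQuotes) {
            if (c == '"') {
                if (i + 1 < line.size() && line[i + 1] == '"') {
                    current += '"';  // escaped quote: keep one, skip the other
                    ++i;
                } else {
                    inQuotes = false;  // closing quote
                }
            } else {
                current += c;  // commas in here are ordinary data
            }
        } else if (c == '"') {
            inQuotes = true;  // opening quote
        } else if (c == delim) {
            fields.push_back(current);  // delimiter outside quotes ends the field
            current.clear();
        } else {
            current += c;
        }
    }
    fields.push_back(current);  // the last field has no trailing delimiter
    return fields;
}
```

With this sketch, a line such as `1,"China,North Korea",3` yields three fields, and the second one keeps its comma. In an MFC project the same loop should carry over to `CString` by indexing characters with `GetAt` and `GetLength`.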
Of course, the easier way is to use an existing CSV library. There are plenty of them on GitHub.
http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader
CSV format specification
According to the CSV format specification, a column whose data contains commas should be wrapped in double quotes. This is easy to handle: extract the string enclosed in double quotes, replace the commas inside it with some other character, and then concatenate the result with the parts of the original string to the left and right of the quoted section. After that, the line can be split on commas as usual. This is only guaranteed to work for CSV files that follow the standard format; success is not guaranteed for files that do not.
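A minimal sketch of this replace-then-split idea is shown below. Several details are assumptions rather than part of the answer above: the function name `splitViaPlaceholder` is hypothetical, the placeholder is the ASCII unit separator (`0x1F`) on the assumption that it never occurs in the data, and the final step that turns placeholders back into commas is an addition so that each field ends up with its original text. Escaped quotes (`""`) inside a field are not preserved by this version.

```cpp
#include <string>
#include <vector>

// Mask commas inside double quotes with a placeholder, drop the quotes,
// split on the remaining commas, then restore the masked commas.
// Hypothetical helper: assumes the placeholder never appears in the data.
std::vector<std::string> splitViaPlaceholder(const std::string& line,
                                             char placeholder = '\x1f') {
    std::string masked;
    bool inQuotes = false;
    for (char c : line) {
        if (c == '"') {
            inQuotes = !inQuotes;  // toggle state; the quote itself is dropped
            continue;
        }
        masked += (inQuotes && c == ',') ? placeholder : c;
    }

    std::vector<std::string> fields;
    std::size_t start = 0;
    while (true) {
        std::size_t pos = masked.find(',', start);
        std::string field = (pos == std::string::npos)
                                ? masked.substr(start)
                                : masked.substr(start, pos - start);
        for (char& ch : field) {
            if (ch == placeholder) ch = ',';  // put the original comma back
        }
        fields.push_back(field);
        if (pos == std::string::npos) break;
        start = pos + 1;
    }
    return fields;
}
```

Compared with a character-by-character state scan, this keeps the splitting step a plain comma split, at the cost of depending on a placeholder that must not collide with real data.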