Why data files are better off using a single character as the delimiter
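Source: http://blog.csdn.net/chaijunkun/article/details/17279565. Please credit the source when reposting. Because I revise and improve my posts from time to time, I strongly recommend reading this article at its original location.
It has been half a year since my last technical blog post, and with the year coming to an end I feel I should write something to sum up what I have learned and share it. Recently I have been working on a data synchronization project: after receiving the export files that the data center distributes on a schedule, I parse them line by line according to the fixed meaning of each field, analyze the result further, and import it into my own database. That is the whole requirement. Let's look at an example: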
2013-09-29^_^21635265^_^测试标题^_^10^_^20^_^15
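Suppose the example above is one line of the text data. In this example the column delimiter is ^_^ (note that it consists of multiple characters), and the fields are, in order: publish date^_^article ID^_^article title^_^comment count^_^click count^_^like count. The sample title 测试标题 simply means "test title".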
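As a minimal sketch of the line-by-line parsing described above, the following Java class splits one line on the ^_^ delimiter and converts the numeric fields. The class name ArticleLineParser, the ArticleRow holder and its field names are hypothetical and chosen only for illustration; the post does not show the project's actual code.

```java
import java.util.regex.Pattern;

class ArticleLineParser {
    // The multi-character column delimiter from the example line.
    static final String DELIMITER = "^_^";

    // Hypothetical holder for the six fields; the names are illustrative only.
    static final class ArticleRow {
        final String publishDate;
        final long articleId;
        final String title;
        final int commentCount;
        final int clickCount;
        final int likeCount;

        ArticleRow(String publishDate, long articleId, String title,
                   int commentCount, int clickCount, int likeCount) {
            this.publishDate = publishDate;
            this.articleId = articleId;
            this.title = title;
            this.commentCount = commentCount;
            this.clickCount = clickCount;
            this.likeCount = likeCount;
        }
    }

    static ArticleRow parse(String line) {
        // String.split takes a regex and '^' is a regex metacharacter,
        // so the delimiter is quoted. The -1 limit keeps trailing empty fields.
        String[] f = line.split(Pattern.quote(DELIMITER), -1);
        if (f.length != 6) {
            throw new IllegalArgumentException("expected 6 fields, got " + f.length);
        }
        return new ArticleRow(f[0], Long.parseLong(f[1]), f[2],
                Integer.parseInt(f[3]), Integer.parseInt(f[4]), Integer.parseInt(f[5]));
    }

    public static void main(String[] args) {
        ArticleRow row = parse("2013-09-29^_^21635265^_^测试标题^_^10^_^20^_^15");
        System.out.println(row.title + ": " + row.commentCount + " comments");
    }
}
```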
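The cases that had been anticipated were titles containing a complete delimiter, such as: 测试标题^_^, 测试^_^标题, ^_^测试标题
What had not been anticipated was a case like this: 测试标题^_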
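In other words, the title ends with the first part of a delimiter, and that tail combines with the first character of the real delimiter that follows it to form something that looks like a valid delimiter, for example: 2013-09-29^_^21635265^_^测试标题^_^_^10^_^20^_^15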
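So when the line was split into fields, the comment-count field came out as "_^10", which cannot be converted to an Integer, hence the error. The delimiter used in this project had been chosen long ago by other colleagues. Only when this problem occurred did it become clear that it should be changed to a single character: a one-character delimiter cannot be half-matched, so this kind of ambiguity no longer arises.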
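The failure can be reproduced with a short sketch. The class name PartialDelimiterDemo is hypothetical, and the use of String.split with Pattern.quote is an assumption about how such a line might be split, not the project's actual code. The split still yields six fields, so a check on the field count alone does not catch the problem.

```java
import java.util.regex.Pattern;

class PartialDelimiterDemo {
    static String[] split(String line) {
        // Matches are found left to right, so the "^_" at the end of the title
        // plus the first '^' of the real delimiter is consumed as one delimiter.
        return line.split(Pattern.quote("^_^"), -1);
    }

    public static void main(String[] args) {
        // The title "测试标题^_" ends with the first two characters of the delimiter.
        String line = "2013-09-29^_^21635265^_^测试标题^_^_^10^_^20^_^15";
        String[] f = split(line);

        System.out.println(f.length); // still 6 fields
        System.out.println(f[2]);     // the title, with its trailing "^_" lost
        System.out.println(f[3]);     // "_^10" instead of "10"

        try {
            Integer.parseInt(f[3]);
        } catch (NumberFormatException e) {
            System.out.println("cannot convert comment count: " + e.getMessage());
        }
    }
}
```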
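Later, when I used Excel to import some other data for analysis, I found that it had taken this problem into account long ago: when you specify a custom delimiter, it only accepts a single character.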
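A minimal sketch of the single-character approach follows. The tab character is an assumption made purely for illustration, since the post does not say which character was eventually chosen, and the class name SingleCharDelimiterDemo and its join helper are hypothetical. With one character as the delimiter, a title ending in ^_ passes through untouched. A field that contains the delimiter character itself would still break the format, so the sketch rejects such fields when writing.

```java
import java.util.regex.Pattern;

class SingleCharDelimiterDemo {
    // Assumption: tab is used only as an example of a single-character delimiter.
    static final char DELIMITER = '\t';

    static String join(String... fields) {
        for (String field : fields) {
            // A single character cannot be half-matched, but a field that
            // contains the character itself must still be rejected or escaped.
            if (field.indexOf(DELIMITER) >= 0) {
                throw new IllegalArgumentException("field contains the delimiter: " + field);
            }
        }
        return String.join(String.valueOf(DELIMITER), fields);
    }

    static String[] split(String line) {
        return line.split(Pattern.quote(String.valueOf(DELIMITER)), -1);
    }

    public static void main(String[] args) {
        // The same problematic title as before, ending in "^_".
        String line = join("2013-09-29", "21635265", "测试标题^_", "10", "20", "15");
        String[] f = split(line);
        System.out.println(f[2]);                   // title is preserved in full
        System.out.println(Integer.parseInt(f[3])); // comment count parses cleanly
    }
}
```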
