今天下午和同事发生因为判断数据是否在数据库中已存在产生分歧.简易代码如下:
//观点一,先取出数据库中已存在ID存入内存,再进行判断
//数据太多情况,内存爆炸
DateTable dt = ExcuteSQL('SELECT ID FROM TABLE');
//List为数据集,已经去重
foreach(var item in List){
//观点二,在for循环中根据ID查询是否已存在
//for循环执行数据库?没这么干过
DateTable dt = ExcuteSQL(string.format('SELECT ID FROM TABLE WHERE ID = {0}',item.ID));
//判断是否在dt中存在
...
bool ISContain
...
//不存在
if(!ISContain){
//加入事物SQL列表
SQLList.Add('INSERT INTO ........');
}
}
//执行事物
....
In fact, the first thing you need to estimate is based on the actual data volume. If the ID is of Long type and the data volume is 100W, it will only be 8byte * 100W = 7.629 megabytes.
The second one is absolutely necessary, query each ID in a loop.
Recommended method 2, but this query will be faster:
Note that different databases have restrictions on parameters,
?
It is best not to exceed 1,000. If you need to determine 10,000 IDs, use 10 such queries.It should be better to use the method mentioned by @auntyellow and query in batches. When there are many IDs, consider whether you can check them in other batch methods? For example, take all
id
for a period of time at one time, and slightly improve method 1 to achieve this.It feels like both Method 1 and Method 2 are extreme, so a good solution should be a combination of both.
https://www.zhihu.com/questio...
It is recommended to use the method of first taking the id out of the database and then comparing it in memory, which can reduce database access.
Can it be segmented into memory? This can reduce memory usage