objective-c - How to speed up reading 50 million values from a file and storing them in a vector using C++?

Question

I need to read 50 million doubles from a txt file and store them in a vector. I initially thought that the file I/O might be too slow, so I used file memory mapping to read the whole file into memory as blocks, and then push_back the values into the vector one by one, but directly from the file...

漂亮男人 · Answer

It makes no sense to measure in debug mode. When I run your code in release mode, it takes only about 14 seconds.

To solve a problem, first find the problem. I modified the code like this to find out where the time is spent:

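    // Fragment: needs <windows.h> (GetTickCount), <cstdlib> (atof), <iostream>.
    // ss_sim, fVecSim and vec_similarity come from the code in the question.
    std::cout << "Start" << std::endl;
    auto n1 = ::GetTickCount();
    DWORD n2 = 0;  // total time extracting tokens from the stream
    DWORD n3 = 0;  // total time in atof
    DWORD n4 = 0;  // total time in push_back

    for (;;)
    {
        auto n = ::GetTickCount();
        const bool ok = static_cast<bool>(ss_sim >> fVecSim);
        n2 += (::GetTickCount() - n);
        if (!ok)
            break;  // stop on a failed read instead of reusing the last token

        n = ::GetTickCount();
        auto v = atof(fVecSim.c_str());
        n3 += (::GetTickCount() - n);

        n = ::GetTickCount();
        vec_similarity.push_back(v);
        n4 += (::GetTickCount() - n);
    }
    n1 = ::GetTickCount() - n1;

    std::cout << "ss_sim >> fVecSim: " << n2 << "ms" << std::endl;
    std::cout << "atof: " << n3 << "ms" << std::endl;
    std::cout << "push_back: " << n4 << "ms" << std::endl;
    std::cout << "Total: " << n1 << "ms" << std::endl;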

So the bottleneck is the statement "ss_sim >> fVecSim"; atof is fast enough.

So my conclusion is: the ultimate optimization is to start with the storage format and store your data as binary instead of text. This avoids the overhead of string I/O and conversion functions, and lets you load the data in a matter of seconds.
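As a rough illustration of the binary-storage idea, here is a minimal sketch. The helper names write_doubles and read_doubles are hypothetical, and the file layout (raw doubles, no header, same machine and compiler for writing and reading) is an assumption, not something stated in the question.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: dump the vector as raw doubles, with no header.
// Assumes the file is read back on a machine with the same double layout.
bool write_doubles(const std::string& path, const std::vector<double>& values) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(values.data()),
              static_cast<std::streamsize>(values.size() * sizeof(double)));
    return static_cast<bool>(out);
}

// Hypothetical helper: size the vector from the file length, then fill it
// with a single read() call. Returns an empty vector on any error.
std::vector<double> read_doubles(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return {};
    const std::streamoff bytes = in.tellg();  // opened at end, so this is the size
    if (bytes <= 0) return {};
    std::vector<double> values(static_cast<std::size_t>(bytes) / sizeof(double));
    in.seekg(0, std::ios::beg);
    in.read(reinterpret_cast<char*>(values.data()),
            static_cast<std::streamsize>(values.size() * sizeof(double)));
    if (!in) return {};
    return values;
}
```

The text file would be converted once with write_doubles; every later run then calls read_doubles and skips string parsing entirely.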

phpcn_u1582 · Answer

The most efficient approach at present is to use streams. Your code shows that you read the whole file into the buffer at once, which is not the best way. I recommend reading a fixed-size buffer each time, for example buffer[1024] (1 KB) or some other size. After each read, the file pointer advances, and you keep reading until you reach EOF.
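One way the fixed-size buffer suggestion could look is sketched below. The function name read_in_chunks and the 1024-byte default are illustrative, and the sketch assumes the file holds well-formed, whitespace-separated numbers. The point it tries to show is that a number may be cut in half at a chunk boundary, so the unparsed tail has to be carried into the next chunk.

```cpp
#include <cstddef>
#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: read `path` in fixed-size chunks and parse
// whitespace-separated doubles. Assumes every token is a valid number.
std::vector<double> read_in_chunks(const std::string& path,
                                   std::size_t chunk_size = 1024) {
    std::vector<double> values;
    std::ifstream in(path, std::ios::binary);
    if (!in || chunk_size == 0) return values;

    const std::string spaces = " \t\r\n";
    std::vector<char> buffer(chunk_size);
    std::string pending;  // unparsed tail carried over from earlier chunks

    for (;;) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const std::streamsize got = in.gcount();
        if (got <= 0) break;
        pending.append(buffer.data(), static_cast<std::size_t>(got));

        // Tokens that end before the last whitespace are complete.
        const std::size_t cut = pending.find_last_of(spaces);
        if (cut == std::string::npos) continue;  // still inside one token

        std::size_t pos = pending.find_first_not_of(spaces);
        while (pos != std::string::npos && pos < cut) {
            // strtod stops at the whitespace that ends the token.
            values.push_back(std::strtod(pending.c_str() + pos, nullptr));
            const std::size_t end = pending.find_first_of(spaces, pos);
            pos = pending.find_first_not_of(spaces, end);
        }
        pending.erase(0, cut + 1);  // keep only the possibly partial tail
    }

    // The file may end without trailing whitespace: parse the last token.
    if (!pending.empty())
        values.push_back(std::strtod(pending.c_str(), nullptr));
    return values;
}
```

Whether small chunks actually beat one large read depends on the platform; this only shows how the boundary handling might be done.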

天蓬老师 · Answer

1. If there is no dependency between the data items, you can try reading the file in blocks with multiple threads;
2. In addition, a vector's memory is contiguous. If later traversal does not need random access, using a list may be noticeably more efficient.
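The multi-threaded block idea from point 1 might be sketched as follows, assuming the whole file is already in memory as a std::string (for example copied from the mapped view) and contains well-formed, whitespace-separated numbers. The names parse_range and parse_parallel are hypothetical. Each block boundary is moved forward to whitespace so that no number is split between two threads.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Parse every number that starts inside [begin, end). Both ends sit on
// whitespace or on the end of the string, so strtod never crosses `end`.
static void parse_range(const char* begin, const char* end,
                        std::vector<double>& out) {
    const char* p = begin;
    while (p < end) {
        if (std::isspace(static_cast<unsigned char>(*p))) { ++p; continue; }
        char* next = nullptr;
        const double v = std::strtod(p, &next);
        if (next == p) { ++p; continue; }  // not a number: skip one character
        out.push_back(v);
        p = next;
    }
}

// Hypothetical helper: split `text` into `workers` blocks and parse them
// concurrently, then concatenate the results in their original order.
std::vector<double> parse_parallel(const std::string& text, unsigned workers) {
    if (workers == 0) workers = 1;
    const std::size_t size = text.size();

    // Tentative equal split; each boundary is pushed forward to whitespace.
    std::vector<std::size_t> bounds(workers + 1, size);
    bounds[0] = 0;
    for (unsigned i = 1; i < workers; ++i) {
        std::size_t b = std::max(bounds[i - 1], size / workers * i);
        while (b < size && !std::isspace(static_cast<unsigned char>(text[b]))) ++b;
        bounds[i] = b;
    }

    std::vector<std::vector<double>> parts(workers);
    std::vector<std::thread> threads;
    const char* base = text.c_str();  // null-terminated, as strtod requires
    for (unsigned i = 0; i < workers; ++i)
        threads.emplace_back(parse_range, base + bounds[i], base + bounds[i + 1],
                             std::ref(parts[i]));
    for (auto& t : threads) t.join();

    std::size_t total = 0;
    for (const auto& part : parts) total += part.size();
    std::vector<double> values;
    values.reserve(total);  // one allocation for the final contiguous vector
    for (const auto& part : parts)
        values.insert(values.end(), part.begin(), part.end());
    return values;
}
```

Reserving the final size once avoids repeated reallocation while the vector is filled, so the result can stay in a contiguous vector.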

天蓬老师 · Answer

You could try switching to C-style scanf.
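A minimal sketch of what the scanf-style approach could look like, using std::fscanf on a file opened with std::fopen. The function name read_with_fscanf is hypothetical, and whether this is faster than stream extraction would need to be measured on the actual data.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: read whitespace-separated doubles with fscanf.
// Stops at end of file or at the first token that is not a number.
std::vector<double> read_with_fscanf(const std::string& path) {
    std::vector<double> values;
    std::FILE* file = std::fopen(path.c_str(), "r");
    if (!file) return values;

    double v = 0.0;
    while (std::fscanf(file, "%lf", &v) == 1)  // "%lf" reads one double
        values.push_back(v);

    std::fclose(file);
    return values;
}
```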

Wow, why is my answer being treated like this? To the user who reported me: what is wrong with this answer?