Skipping Rows During CSV Import with Pandas
When importing CSV data with Pandas, you often need to skip rows that should not be part of your analysis. The skiprows argument of read_csv handles this, but it accepts two kinds of values with different meanings, which can be confusing.
The documentation describes skiprows as follows:
skiprows : list-like or integer
    Row numbers to skip (0-indexed) or number of rows to skip (int) at the start of the file.
The question arises: How does Pandas know whether to skip the first row or the row with index 1 when skiprows=1 is specified?
To unravel this, let's perform an experiment using a sample CSV file with three rows:
1, 2
3, 4
5, 6
Skipping the Row with Index 1
If you want to skip the row with index 1, pass skiprows as a list:
<code class="python">import pandas as pd
from io import StringIO

# Three-row sample; the rows have indices 0, 1, and 2
s = """1, 2
3, 4
5, 6"""

# A list is read as row indices: skip only the row with index 1
df = pd.read_csv(StringIO(s), skiprows=[1], header=None)
print(df)</code>
Output:
   0  1
0  1  2
1  5  6
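The list form extends to several rows at once. The following is a minimal sketch, assuming a four-row sample (the name csv_text and its contents are made up for illustration and are not part of the experiment above), in which rows 0 and 2 are skipped by index:

```python
import pandas as pd
from io import StringIO

# Hypothetical four-row sample used only for this illustration
csv_text = "1,2\n3,4\n5,6\n7,8"

# Each list entry is a 0-indexed row number to drop: here rows 0 and 2
df_list = pd.read_csv(StringIO(csv_text), skiprows=[0, 2], header=None)
print(df_list)
```

Only the rows with index 1 and 3 should remain in the resulting DataFrame.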
Skipping a Number of Rows
To skip a specific number of rows (in this case, 1), pass skiprows as an integer:
<code class="python"># An integer is read as a count: skip the first row
df = pd.read_csv(StringIO(s), skiprows=1, header=None)
print(df)</code>
Output:
   0  1
0  3  4
1  5  6
Hence, skiprows behaves differently depending on the type of value you provide. A list is treated as the 0-indexed row numbers to skip, while an integer is treated as the number of rows to skip from the beginning of the file. So skiprows=1 skips the first row, whereas skiprows=[1] skips the row with index 1, which is the second row.
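One way to check this distinction is to compare the results directly. The sketch below reuses the same three rows (written without spaces after the commas) and a small helper named read, which is a hypothetical convenience function introduced only for this comparison:

```python
import pandas as pd
from io import StringIO

s = "1,2\n3,4\n5,6"

def read(skip):
    """Hypothetical helper: parse the sample with the given skiprows value."""
    return pd.read_csv(StringIO(s), skiprows=skip, header=None)

by_count = read(1)     # integer: skip one row from the start of the file
by_index0 = read([0])  # list: skip the row with index 0
by_index1 = read([1])  # list: skip the row with index 1

# The integer 1 should match the list [0], not the list [1]
print(by_count.equals(by_index0))
print(by_count.equals(by_index1))
```

If the behavior is as described, the first comparison prints True and the second prints False, which suggests that an integer n acts like the list of indices 0 through n - 1.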