Python bot: extract a long column from an Excel sheet and create a dataframe to catalog its numbers in another file

WBOY
Release: 2024-02-10 18:00:06

Question content

I need to create a Python bot that extracts column C from sheet 1 of Excel file 1 and catalogs it in file 2. It should calculate the sum of the numbers from 0.00 to 0.99, from 1.00 to 1.99, and so on up to 12. All numbers above 12 go into the last row. Then I need to calculate the sum of all the numbers.

I tried writing some code, but it didn't write anything to the Excel file.


Correct answer


You can try the following approach:

  1. Read the Excel data file (Excel file 1) and select only the required column ("column C").
  2. Create an array of bin edges for the ranges 0.00 - 0.99, 1.00 - 1.99, 2.00 - 2.99, 3.00 - 3.99 (up to 12) and use it to build a new data frame (df_write) by grouping the values into those ranges. Get the count for each range.
  3. Count the values that fall above the last range (the "12+" row) and add the count to df_write as a new row.
  4. Sum all values in the data frame and add the sum to df_write as a new row.
  5. Write the data frame to Excel. The example uses xlsxwriter as the engine, which means the workbook (catalog file) is created or overwritten every time the code runs.
  6. Additional data and formatting can be included in the sheet. For example, change the text in a cell and add a formula that totals the counts of all grouped ranges, which should equal the number of rows read from the Excel data file (datafile).

import pandas as pd

datafile = "Excel File 1.xlsx"
catalogfile = "Excel File 2.xlsx"
column = "column C"  # header text of the column to read

### Read specific column (column) from Excel Sheet
df_read = pd.read_excel(datafile, index_col=None, na_values=["NA"], usecols=[column])
# print(df_read)

### Create the dataframe of values within specified ranges to write to Excel
### Bin edges -0.01, 0.99, 1.99, ... 11.99 give the ranges 0.00 - 0.99, 1.00 - 1.99, ... 11.00 - 11.99
bins = [i - 0.01 for i in range(0, 13)]
### observed=False keeps ranges that contain no values, so there are always 12 range rows
df_write = df_read.groupby(pd.cut(df_read[column], bins), observed=False).count()
### Convert the interval labels to plain text so extra rows can be added to the index
df_write.index = df_write.index.astype(str)

### Count values above the last range and add as row to the dataframe
### (compare against the last bin edge so a value of exactly 12.00 is not dropped)
df_write.loc["12+"] = df_read[df_read[column] > bins[-1]].count()

### Sum all values in the column and add as row to the dataframe
df_write.loc["Column Total"] = df_read.sum()

### Rename Index Header
df_write.index.name = "Range Totals"
### Rename Column Header
df_write.columns = ["Values Count"]

### Write dataframe to Excel
### Request xlsxwriter explicitly (pandas may otherwise pick openpyxl, which lacks add_format)
### A new workbook is created each run (any existing workbook is overwritten)
with pd.ExcelWriter(catalogfile, engine="xlsxwriter") as writer:
    df_write.to_excel(writer, sheet_name="Sheet1", index=True)

    ### Xlsxwriter formatting
    workbook = writer.book
    cell_format = workbook.add_format()
    cell_format.set_bold(True)

    ws = writer.sheets["Sheet1"]
    ### Add a formula below the data that totals the counts of all ranges
    ### (should equal the total number of data rows read from Excel)
    last_count_row = len(df_write)  # Excel row of the "12+" row; the header is row 1
    ws.write_row(len(df_write) + 1, 0, ["Total Rows", f"=SUM(B2:B{last_count_row})"], cell_format)

    ws.autofit()  # available in recent XlsxWriter releases

Example of what the Excel worksheet looks like for a column containing 100 rows of data (excluding the header) read from a data file.
The Range Totals column is the index column of the data frame. The range text is generated by the data frame from the bin edges, so it shows those edges, but the rows effectively cover the ranges 0.00 - 0.99, 1.00 - 1.99, 2.00 - 2.99, 3.00 - 3.99, etc.
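
To see how the generated range text relates to the values it covers, here is a minimal sketch. It assumes a handful of made-up sample values in place of the real column and reuses the same bin edges as the code above; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical sample values standing in for "column C"
values = pd.Series([0.00, 0.50, 1.00, 1.50, 11.25, 12.00, 15.25], name="column C")

# Same edges as the answer: -0.01, 0.99, 1.99, ... 11.99
edges = [i - 0.01 for i in range(0, 13)]
binned = pd.cut(values, bins=edges)

# The first category is an interval that is open on the left and closed on the right
first = binned.cat.categories[0]
print(first)         # the text pandas generates for the first range
print(0.0 in first)  # True: 0.00 belongs to the first range
print(1.0 in first)  # False: 1.00 belongs to the next range

# Count per range (empty ranges included), plus everything above the last edge
counts = binned.value_counts(sort=False)
above = int((values > edges[-1]).sum())

# Every value lands in exactly one range or in the "above" bucket
print(int(counts.sum()) + above == len(values))
```

Because the lower edge of each interval is excluded and the upper edge is included, an interval whose text starts at 0.99 begins, for two-decimal data, at 1.00.
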
If needed, you can drop the index column when writing to Excel and either use xlsxwriter to write custom text into that column or use a template with existing headers (in that case ExcelWriter needs additional arguments, with openpyxl as the engine, to write to an existing workbook).
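
A rough sketch of that alternative follows. It is not part of the original answer: `build_catalog` and `write_to_existing` are hypothetical helper names, the "Range" header is an assumption, and the second function assumes the target workbook already exists and that openpyxl is installed.

```python
import pandas as pd


def build_catalog(values, upper=12):
    """Return a two-column frame: a readable range label and its count."""
    edges = [i - 0.01 for i in range(0, upper + 1)]
    # Custom text such as "0.00 - 0.99" replaces the generated interval text
    labels = [f"{i:.2f} - {i + 0.99:.2f}" for i in range(0, upper)]
    counts = pd.cut(values, bins=edges, labels=labels).value_counts(sort=False)
    # Ordinary columns only, so no index column has to be written to Excel
    return pd.DataFrame({"Range": labels, "Values Count": counts.to_numpy()})


def write_to_existing(catalog, path, sheet_name="Sheet1"):
    """Write the catalog into a workbook that already exists at `path`."""
    # mode="a" is only supported with the openpyxl engine;
    # if_sheet_exists="replace" overwrites a sheet with the same name
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="replace") as writer:
        catalog.to_excel(writer, sheet_name=sheet_name, index=False)


# Hypothetical sample values standing in for "column C"
catalog = build_catalog(pd.Series([0.0, 0.5, 1.0, 11.25, 12.0]))
print(catalog.head(3))
```

Calling `write_to_existing(catalog, "Excel File 2.xlsx")` would then replace Sheet1 in the existing catalog file while leaving the other sheets in place. Note that xlsxwriter formatting calls such as `add_format` are not available with this engine.
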

source:stackoverflow.com