Appending XLSX Files In Folder With Python

Soldato
Joined
7 Feb 2004
Posts
8,176
Location
North East
Hi all.

I can find numerous examples scripts online, however, I can't get anything to work :(

I have a path with lots of small xlsx files (say \\server\folder\ )

I am hoping to combine these into one xlsx worksheet, so that if the final row of file a is row 20, then row 21 will be the first row of file b.

I don't care about formatting header rows or anything like that.

I don't seem to be able to use an xlrd library, but seem to able to run openpyxl

Code:
import numpy as np
import pandas as pd
import openpyxl
import os
import glob

all_names = []

for x in os.listdir(r'\\server\inputfolder'):
    if x.endswith(".xlsx"):
        all_names.append(x[:-5])
        print(all_names)

all_data = pd.DataFrame()
for f in glob.glob("*.xlsx"):
    df = pd.read_excel(f)
    all_data = all_data.append(df,ignore_index=True)

appended_df = pd.concat((all_data), keys=(all_names))
appended_df.to_excel(r"\\server\outputfolder")


I found the above somewhere online and it seems to run, but it gives the error
first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"

Can anyone assist or point me in the right direction? My python experience solely consists of finding and adapting other people's scripts :p

Thanks
 
not sure how your script was meant to work. I can't make pandas cooperate with openpyxl
here is my version, basic iteration through files, rows and cells (assuming every file has one sheet)
Code:
from openpyxl import Workbook, load_workbook
from pathlib import Path

input_dir = Path('\\server\inputfolder')
output_file = Path('\\server\outputfolder\outputfile.xlsx')

out_workbook = Workbook()
ws = out_workbook.active
row_idx = 0

for f in input_dir.glob("*.xlsx"):
    wb = load_workbook(filename=str(f))
    sh = wb.active
    for row in sh.iter_rows():
        row_idx+=1
        for cell in row:
            new_cell = ws.cell(row=row_idx, column=cell.col_idx, value=cell.value)

            # optional to grab other formatting information
            if cell.has_style:
                new_cell.font = copy(cell.font)
                new_cell.border = copy(cell.border)
                new_cell.fill = copy(cell.fill)
                new_cell.number_format = copy(cell.number_format)
                new_cell.protection = copy(cell.protection)
                new_cell.alignment = copy(cell.alignment)

out_workbook.save(filename=output_file)
 
Last edited:
Thanks @alec

I tried the script you provided and got an error

WindowsPath' object has no attribute 'write'

https://stackoverflow.com/questions...rror-windowspath-object-has-no-attribute-seek

It was suggested to change the "\" for "/" in the file paths, which I did. This then gave the error

python name 'copy' is not defined

So I added "import copy" as line 3.

Now I'm stuck on the error

'module' object is not callable

I'm not sure if the above "fixes" are correct. Any further assistance or advice would be appreciated :)

As it stands;

Code:
from openpyxl import Workbook, load_workbook
from pathlib import Path
import copy

input_dir = Path('//server/folder')
output_file = Path('//server/outputfolder/combined.xlsx')

out_workbook = Workbook()
ws = out_workbook.active
row_idx = 0

for f in input_dir.glob("*.xlsx"):
    wb = load_workbook(filename=str(f))
    sh = wb.active
    for row in sh.iter_rows():
        row_idx+=1
        for cell in row:
            new_cell = ws.cell(row=row_idx, column=cell.col_idx, value=cell.value)

            # optional to grab other formatting information
            if cell.has_style:
                new_cell.font = copy(cell.font)
                new_cell.border = copy(cell.border)
                new_cell.fill = copy(cell.fill)
                new_cell.number_format = copy(cell.number_format)
                new_cell.protection = copy(cell.protection)
                new_cell.alignment = copy(cell.alignment)

out_workbook.save(filename=output_file)
 
Back
Top Bottom