Python - Append CSVs

Soldato
OP
Joined
7 Feb 2004
Posts
8,110
Location
North East
Hmmm. I couldn't see anything obvious, but as I'm not particularly proficient, thought there might be something hidden in there.

I've been using notepad to view, as I'm already aware excel does it's own thing and formats as it sees fit.

Could the below have anything to do with it?

Code:
import pandas as pd

data = pd.read_csv('SOURCE\Template4 Master.csv')

data_category_range = data['IDENTIFIER'].unique()
data_category_range = data_category_range.tolist()

for i,value in enumerate(data_category_range):
    data[data['IDENTIFIER'] == value].to_csv(r'SOURCE\Template4 '+str(value)+r'.csv',index = False, na_rep = 'N/A')
 
Man of Honour
Joined
19 Oct 2002
Posts
29,516
Location
Surrey
That snippet of code was not from post 12 so I'm a little confused at which stage the decimal places are behaving unexpectedly.

I'm not a pandas expert (never used them to be honest) so it's possible the issue could be there. I'll have a bit of a think when I have time.
 
Soldato
OP
Joined
7 Feb 2004
Posts
8,110
Location
North East
Sorry. I had another look and it was earlier in the process that the issue occurs, with the pandas script I nabbed from elsewhere.

This script takes a couple of CSVs and splits them out by the identifier reference.

I then rejoin these using the identifier field, so I have a combined file per reference.

I already had this part working before I started the OP, but hadn't realised the decimal issue.

All your scripts worked fine! This issue seems to be all my doing :(
 
Man of Honour
Joined
19 Oct 2002
Posts
29,516
Location
Surrey
I've copied your pandas script and replicated the same issue myself. The reason it is doing this is because pandas is treating the data as numeric and then adjusting the decimal precision. The solution is to get it to treat all of the columns as strings instead. Then it shouldn't manipulate the data. After a bit of searching it seems you can specify the datatype of one or more columns when you read the master csv (using ',dtype=str'). I've tried this myself and it seems to fix it for me.

New script below. Does this also fix it for you?

Code:
import pandas

import pandas as pd

data = pd.read_csv('SOURCE\Template4 Master.csv',dtype=str)

data_category_range = data['IDENTIFIER'].unique()
data_category_range = data_category_range.tolist()

for i,value in enumerate(data_category_range):
    data[data['IDENTIFIER'] == value].to_csv(r'SOURCE\Template4 '+str(value)+r'.csv',index = False, na_rep = 'N/A')
 
Soldato
OP
Joined
7 Feb 2004
Posts
8,110
Location
North East
Back again! Thought i might as well post here, but I can move to Windows Software if needs be. I'm hoping some peeps here are as well versed with VBA as Python!

I have pulled together a macro which does everything I want, except it seems to somehow mess dates up of it's own accord. I've stripped back to the bare bones of a macro below, which doesn't seem to have anything in re values/dates etc, and can't understand what is going wrong. Google doesn't seem to help.

Based on code found here - https://stackoverflow.com/a/52619462

I basically want it to open the CSV, format the file (stripped those lines out and still have the same issue, so it's this part that's the concern) and save as XLSX.

Code:
Sub CSVtoXLSB2()
Dim wb As Workbook
Dim CSVPath As String
Dim XLSXPath As String
Dim sProcessFile As String

CSVPath = "C:\CSV\"
XLSXPath = "C:\XLSX\"
sProcessFile = Dir(CSVPath & "*.csv")
Do Until sProcessFile = ""   ' Loop until no file found.
    Set wb = Application.Workbooks.Open(CSVPath & sProcessFile)
    wb.SaveAs XLSXPath & Split(wb.Name, ".")(0) & ".xlsx", FileFormat _
        :=51
    wb.Close
    sProcessFile = Dir()   ' Get next entry.
Loop
Set wb = Nothing
End Sub


Opening the CSV in excel manually gives

Code:
      A                                                  B
1    Monthly Transactions
2
3    Period From                          01/04/2020
4    Period To                              29/04/2020

But for whatever reason, the the above macro gives me a file showing

Code:
      A                                                  B
1    Monthly Transactions
2
3    Period From                          04/01/2020
4    Period To                              29/04/2020

:confused:

It appears the date values have changed, as formatting etc has no bearing.

The same occurs further down in the data, but then fixes itself after a few instances sometimes. eg (different examples).

If it means anything, those which are messed up are displayed as ######## until the column width is adjusted, but the dates below display correctly.

Code:
Transaction Date
04/06/2020
04/06/2020
04/06/2020
04/06/2020
04/07/2020
04/07/2020
04/08/2020
04/10/2020
04/10/2020
04/12/2020
13/04/2020
13/04/2020
13/04/2020
15/04/2020
15/04/2020
15/04/2020
17/04/2020
17/04/2020
20/04/2020
21/04/2020
22/04/2020
22/04/2020
23/04/2020
23/04/2020
26/04/2020
28/04/2020
28/04/2020
29/04/2020
29/04/2020

Code:
Transaction Date
04/03/2020
04/06/2020
04/08/2020
04/08/2020

Anything obvious at play here?

EDIT - Just noticed, are the dates that are being changed all those 12 or fewer days - ie the number of months. I presume these are being read as american dates, but the other left as text, as it's not compliant with the american date format.

Why though? This doesn't happen opening manually! My windows region is set to UK.

No further forward on the cause of this, but the "american dates" are being read as number values and the "UK dates" as text values.

EDIT 2 - Changed the export from the source system to dd-mmm-yyyy, then added a cell format change to the excel macro which has resolved the issue, but still keen to understand why this occurs if anyone knows :)
 
Last edited:
Man of Honour
Joined
19 Oct 2002
Posts
29,516
Location
Surrey
I'm afraid I don't have excel at home. I do have it at work but I won't have time there to look into it and learn VBA. It is very likely the cause of the issue is the way that VBA or excel treats dates and defaults to a specific format. I would suggest looking into how excel and VBA handles date processing and whether you can add some code to specify a format instead of using the default.

But you now have several steps all processing data in different ways and probably using several different languages. If this is one end to end workflow it feels quite disjointed and prone to unexpected behaviour in one of the steps. It might be worth taking a step back and looking at whether there is a better overall approach to achieving what you are trying to do?
 
Last edited:
Soldato
OP
Joined
7 Feb 2004
Posts
8,110
Location
North East
Thank all for the help and feedback. I'm sure there is a better way of doing things and I've made this more drawn out than it needs to be.

I'm not a programmer.... But the problem is, no one else involved, including the IT bods, is a programmer and the actual programmers don't have the resource to work on this, so it's a case of try to sort it out, or it doesn't get done!

I'm at the point where it works and have resolved the issues in a fashion, so I think I'll leave it be. I've learned a few things along the way and it gets the end result I'm after, so it's not been a complete disaster!
 
Man of Honour
Joined
19 Oct 2002
Posts
29,516
Location
Surrey
Yes, well done for persevering. I personally think learning to program is valuable even for people who don'#t use it as their main job. It can make your job much easier in some situations.
 
Back
Top Bottom