1. This site uses cookies. By continuing to use this site, you are agreeing to our use of cookies. Learn More.

Python - Append CSVs

Discussion in 'HTML, Graphics & Programming' started by jellybeard999, Apr 29, 2020.

  1. jellybeard999

    Sgarrista

    Joined: Feb 7, 2004

    Posts: 7,681

    Location: NE

    Hmmm. I couldn't see anything obvious, but as I'm not particularly proficient, thought there might be something hidden in there.

    I've been using notepad to view, as I'm already aware excel does it's own thing and formats as it sees fit.

    Could the below have anything to do with it?

    Code:
    import pandas as pd
    
    data = pd.read_csv('SOURCE\Template4 Master.csv')
    
    data_category_range = data['IDENTIFIER'].unique()
    data_category_range = data_category_range.tolist()
    
    for i,value in enumerate(data_category_range):
        data[data['IDENTIFIER'] == value].to_csv(r'SOURCE\Template4 '+str(value)+r'.csv',index = False, na_rep = 'N/A')
     
  2. Hades

    Capodecina

    Joined: Oct 19, 2002

    Posts: 23,925

    Location: Surrey and London

    That snippet of code was not from post 12 so I'm a little confused at which stage the decimal places are behaving unexpectedly.

    I'm not a pandas expert (never used them to be honest) so it's possible the issue could be there. I'll have a bit of a think when I have time.
     
  3. jellybeard999

    Sgarrista

    Joined: Feb 7, 2004

    Posts: 7,681

    Location: NE

    Sorry. I had another look and it was earlier in the process that the issue occurs, with the pandas script I nabbed from elsewhere.

    This script takes a couple of CSVs and splits them out by the identifier reference.

    I then rejoin these using the identifier field, so I have a combined file per reference.

    I already had this part working before I started the OP, but hadn't realised the decimal issue.

    All your scripts worked fine! This issue seems to be all my doing :(
     
  4. Hades

    Capodecina

    Joined: Oct 19, 2002

    Posts: 23,925

    Location: Surrey and London

    I've copied your pandas script and replicated the same issue myself. The reason it is doing this is because pandas is treating the data as numeric and then adjusting the decimal precision. The solution is to get it to treat all of the columns as strings instead. Then it shouldn't manipulate the data. After a bit of searching it seems you can specify the datatype of one or more columns when you read the master csv (using ',dtype=str'). I've tried this myself and it seems to fix it for me.

    New script below. Does this also fix it for you?

    Code:
    import pandas
    
    import pandas as pd
    
    data = pd.read_csv('SOURCE\Template4 Master.csv',dtype=str)
    
    data_category_range = data['IDENTIFIER'].unique()
    data_category_range = data_category_range.tolist()
    
    for i,value in enumerate(data_category_range):
        data[data['IDENTIFIER'] == value].to_csv(r'SOURCE\Template4 '+str(value)+r'.csv',index = False, na_rep = 'N/A')
    
     
  5. ingrowzelqui

    Perma Banned

    Joined: May 11, 2020

    Posts: 1

    Location: New South Wales Hornsby

    Thanks for the code, guys. I will try it on my own.