Bash to search a csv for numbers and copy to a new file?

Commissario
Joined
16 Oct 2002
Posts
342,633
Location
In the radio shack
Hi folks,

After a couple of hours of messing around with this, I'm struggling.

I have two files, one is a text file consisting of a a bunch of numbers in this format

123556
213452
529521
134525

The second is a csv, the rows are quite long but it's a standard csv, nothing special. That csv starts with a six digit number.

I need to search the csv for any rows that start with a number from the first file and then move that entire row out to a new file. Moving it might be tricky, it may be easier just to copy the row into a new file.

The column of numbers is about 1500 numbers long and the csv is about 45,000 so it's quite a big job.

With how well *nix works with text files, I thought this would be quite straightforward but I'm struggling.

Can anyone help please?

Thanks.
 
Hi folks,

After a couple of hours of messing around with this, I'm struggling.

I have two files, one is a text file consisting of a a bunch of numbers in this format

123556
213452
529521
134525

The second is a csv, the rows are quite long but it's a standard csv, nothing special. That csv starts with a six digit number.

I need to search the csv for any rows that start with a number from the first file and then move that entire row out to a new file. Moving it might be tricky, it may be easier just to copy the row into a new file.

The column of numbers is about 1500 numbers long and the csv is about 45,000 so it's quite a big job.

With how well *nix works with text files, I thought this would be quite straightforward but I'm struggling.

Can anyone help please?

Thanks.
Honestly for stuff like this chatgpt is great if you struggle,
Code:
#!/bin/bash

# Read the numbers from the text file
numbers=($(<numbers.txt))

# Create a new file for the matching rows
output_file="output.csv"
touch "$output_file"

# Iterate over each number
for number in "${numbers[@]}"
do
  # Search for rows that start with the number in the CSV file and append them to the output file
  grep "^$number" input.csv >> "$output_file"
done

Thats the output it gave me when i pasted your content in with the explaination as below
Make sure to replace numbers.txt with the path to your text file containing the numbers and input.csv with the path to your CSV file. The script will create a new file called output.csv that contains the matching rows from the CSV file.

To run the script, save it to a file (e.g., script.sh), make it executable using chmod +x script.sh, and then execute it with ./script.sh in the terminal.



Then if it doesnt work just feedback in the issues etc
Hope this helps.
 
Part of the couple of hours I spent on this was trying to get chatgpt to produce working code! It either gave me zero output to the file whatsoever or it sat there for twenty minutes chuntering away and then gave me zero output.
 
To illustrate what I was getting at:

File ids.txt:
02 04 06 08 11

File data.txt:
01,one 02,two 03,three 04,four 05,five 06,six 07,seven 08,eight 09,nine 10,ten 11,eleven 12,twelve

command to show the ones that match:
[peter@centosmaster feek]$ join ids.txt data.txt -t "," 02,two 04,four 06,six 08,eight 11,eleven

command to show the ones that don't match:
[peter@centosmaster feek]$ join ids.txt data.txt -t "," -v 2 01,one 03,three 05,five 07,seven 09,nine 10,ten 12,twelve

This relies on two things:
* you must have the contents sorted
* The id's need to be the same length i.e. the 6 digits you said. If they're different lengths you'll need to zero pad them as I did in the above.
 
Last edited:
I was bored so thought to see what chatgpt came up with seemed to work so.
just change file names and the csv field name in the Powershell script.


Code:
1.txt - has list of 6 digit numbers.
123556
213452
529521

2.csv - first field named number column
numbercolumn,field1
829521,greg
934525,rgreg
123556,regre
913452,thtrh
829521,htr
374525,yefef

Powershell script named ps.ps1

Code:
# Path to file1.txt and file2.csv
$file1Path = "C:\Users\test\Desktop\1.txt"
$file2Path = "C:\Users\test\Desktop\2.csv"

# Path to the output file
$outputPath = "C:\Users\test\Desktop\output.txt"

# Read the contents of file1.txt
$file1Content = Get-Content $file1Path

# Read the contents of file2.csv
$file2Content = Import-Csv $file2Path | Select-Object -ExpandProperty 'numbercolumn'

# Find matching numbers
$matches = Compare-Object $file1Content $file2Content -IncludeEqual -ExcludeDifferent |
           Where-Object { $_.SideIndicator -eq '==' } |
           Select-Object -ExpandProperty 'InputObject'

# Write matching numbers to the output file
$matches | Out-File -FilePath $outputPath

# Display a message indicating the completion
Write-Host "Matching numbers have been saved to $outputPath."
 
Last edited:
sqlite3 script that'll work on any OS:
Code:
.import 2.csv t2 --csv
create table t1(id text);
.separator ~
.mode line
.import 1.txt t1
.mode csv
.output output.csv
select t2.* from t2 inner join t1 on t1.id = t2.numbercolumn;
 
I assume you've long since solved this, but I've never done any bash before and I thought this would be a fun little problem, so here you go:

Bash:
#!/bin/bash

# Read numbers into an array (change numbers.txt to the real filename, obvs.)
mapfile -t matchArray < numbers.txt

# Convert the array into a dictionary so we can do fast lookup
declare -A matchNumbers
for n in "${matchArray[@]}"; do
    matchNumbers[$n]="$n";
done

> output.csv

# Go line by line through the csv file (change 'input.csv' to the real filename)
while read -r line; do
    # Get the first comma separated value, assuming it's not quoted
    number=${line%%,*}
    if [ "${matchNumbers[$number]}" ]; then
        echo $line >> output.csv
    fi
done < input.csv

I didn't know whether your csv file was quoted or not, so I just assumed it wasn't, you'll need to clean it up a little to remove any quotes if it is.

After all I've heard about how great ChatGPT is, I'm reassured at how useless the code posted above is.
 
? Maybe not the best way but it did the job on my test from how I understood the problem.

It has two bugs in it, one serious, one not: the serious one is if you have a number in the numbers.txt file, it will include any line which starts with that number. So if you put '34' in the numbers.txt, it will include lines which start with "341" for example. To be fair, this might work if all the numbers are six digit as per @Feek 's example. Less seriously, it appends its output to the output file if it already exists rather than creating a new file.

It also two other problems: (1) it's massively slow - quite possibly too slow for the 45k csv and 1.5k numbers file that Feek is talking about, and (2) it re-orders the output so that every line in the output file is ordered according to the numbers in the numbers file. This could be okay, depending on use case.

edit: I did a little timing exercise, with just 15 numbers and 100 entries in the .csv file, the ChatGPT version takes 15s to run, my version returns instantly with that and can run the full 1500/45k task in under 5s. I wasn't waiting around for the full version with GPT.
 
Last edited:
Back
Top Bottom