I have just modified a script I use for finding duplicates. It is a PowerShell script, so you will need to install PowerShell if you aren't on Windows 7. You can grab it from Microsoft here.
Once it is installed, or if you have Windows 7, launch a PowerShell console as an Administrator [Start > Programs > Accessories > Windows PowerShell > Windows PowerShell].
Type in the following then press enter:
Code:
Set-ExecutionPolicy RemoteSigned
Type Y and press enter at the prompt.
The above tells PowerShell to run locally written scripts without a signature, while scripts downloaded from the internet must be signed by a trusted publisher. You need only do it once.
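If you would rather not change the policy for the whole machine, a per-user setting is one alternative. This is only a sketch and assumes PowerShell 2.0 or later, where the -Scope and -List parameters are available; it is not something the script itself needs.

```powershell
# Show the policy currently set at each scope (assumes PowerShell 2.0+).
Get-ExecutionPolicy -List

# Apply RemoteSigned to the current user only, leaving the machine-wide policy alone.
Set-ExecutionPolicy RemoteSigned -Scope CurrentUser
```

A per-user policy does not normally need an elevated console.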
Ok, now you need to create a new file (I will assume the name get-duplicates.ps1) and paste the code at the bottom of this post into it. Save the file somewhere sensible.
Back in a PowerShell console type the following to run the script:
Code:
C:\path\to\script\get-duplicates.ps1 C:\whatever
You will need to enter the full path to the get-duplicates.ps1 file [or you can drag and drop the script file into an open PowerShell window] and change C:\whatever to the directory you wish to search for duplicates.
The script will only look for music files. Edit the $file_types line near the top of the script to add more extensions as necessary. The script finds duplicates in two steps. First it collects all files with exactly the same byte count, which is a good indication that two files are the same, but not foolproof. It then opens only those files and MD5 hashes their contents. Files whose MD5s match are, for all practical purposes, duplicates; an accidental MD5 collision between two different files is possible in theory but vanishingly unlikely.
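To see the size-then-hash idea on its own, here is a minimal sketch that runs against a throwaway folder. The folder, the file names and the Get-Md5Hex helper are made up for the demo and are not part of the script below; it assumes PowerShell 2.0 or later for try/finally.

```powershell
# Hypothetical helper: returns the MD5 of a file as a hex string.
function Get-Md5Hex([System.IO.FileInfo] $File) {
    $md5 = [System.Security.Cryptography.MD5]::Create()
    $stream = $File.OpenRead()
    try {
        return [System.BitConverter]::ToString($md5.ComputeHash($stream))
    }
    finally {
        # Always release the file handle, even if hashing fails.
        $stream.Close()
    }
}

# Scratch folder with two identical files and one different file of the same size.
$demo = Join-Path ([System.IO.Path]::GetTempPath()) ("dupe-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $demo | Out-Null
Set-Content -Path (Join-Path $demo 'a.mp3') -Value 'same bytes'
Set-Content -Path (Join-Path $demo 'b.mp3') -Value 'same bytes'
Set-Content -Path (Join-Path $demo 'c.mp3') -Value 'diff bytes'

# Step 1: keep only groups of files that share a byte count.
$sameSizeGroups = Get-ChildItem $demo |
    Where-Object { -not $_.PSIsContainer } |
    Group-Object Length |
    Where-Object { $_.Count -gt 1 }

# Step 2: within each size group, hash the contents and keep matching hashes.
$duplicateSets = foreach ($sizeGroup in $sameSizeGroups) {
    $sizeGroup.Group |
        Group-Object { Get-Md5Hex $_ } |
        Where-Object { $_.Count -gt 1 }
}

# Print each set of duplicates.
foreach ($set in $duplicateSets) {
    $set.Group | ForEach-Object { Write-Output $_.FullName }
}

Remove-Item -Recurse -Force $demo
```

All three demo files pass the size check, but only a.mp3 and b.mp3 share a hash, so c.mp3 is dropped at the second step. Hashing only the size-matched files is what keeps the approach reasonably quick on large folders.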
Duplicates are written to the console and also to a file called dupes.txt, which will be created in the same folder as the script. Be warned: if you intend to use this on your entire HDD, it can take a long time.
Have fun!
Code:
param ([string] $Path = (Get-Location))

# File extensions to search; add more patterns here as needed.
$file_types = @("*.mp3", "*.ogg", "*.wav", "*.flac", "*.mp4", "*.wma")

function Get-MD5([System.IO.FileInfo] $file = $(throw 'Usage: Get-MD5 [System.IO.FileInfo]'))
{
    # This Get-MD5 function sourced from:
    # http://blogs.msdn.com/powershell/archive/2006/04/25/583225.aspx
    $stream = $null;
    $cryptoServiceProvider = [System.Security.Cryptography.MD5CryptoServiceProvider];
    $hashAlgorithm = new-object $cryptoServiceProvider
    $stream = $file.OpenRead();
    $hashByteArray = $hashAlgorithm.ComputeHash($stream);
    $stream.Close();

    ## We have to be sure that we close the file stream if any exceptions are thrown.
    trap
    {
        if ($stream -ne $null) { $stream.Close(); }
        break;
    }

    return [string]$hashByteArray;
}

function Get-Duplicates([string]$Path)
{
    # Group non-empty files by size and keep only sizes shared by several files.
    $fileGroups = Get-ChildItem $Path -Recurse -Include $file_types `
        | Where-Object { $_.Length -gt 0 } `
        | Group-Object Length `
        | Where-Object { $_.Count -gt 1 };

    foreach ($fileGroup in $fileGroups)
    {
        # Hash only the files that share a size with at least one other file.
        foreach ($file in $fileGroup.Group)
        {
            Add-Member NoteProperty ContentHash (Get-MD5 $file) -InputObject $file;
        }

        $fileGroup.Group `
            | Group-Object ContentHash `
            | Where-Object { $_.Count -gt 1 };
    }
}

$dupes = Get-Duplicates $Path

# Write dupes.txt next to the script, starting with a timestamp.
$outfile = (Split-Path -Parent $MyInvocation.MyCommand.Definition) + "\dupes.txt"
Set-Content $outfile (Get-Date -f "dd-MM-yyyy HH:mm")

foreach($dupe in $dupes)
{
    foreach($file in $dupe.Group)
    {
        Write-Host $file.FullName
        # Append to $outfile, not .\dupes.txt, so the list lands beside the script.
        Add-Content $outfile $file.FullName
    }
    Write-Host
    Add-Content $outfile "`n"
}