Professional Code (not for Hacks): Bash Foo: Finding Unique (and Duplicate) Files

Where is that darned file I changed?

I sometimes run into the problem that I changed one copy of a file somewhere, in some directory, but no longer know which copy it was. As a programmer, I often copy or embed versions of the same (or a very similar) utility or file in different projects.

The problem comes later, when I forget in which project I made a patch or fix to that file. I imagine this happens to those of us who program for a living a bit more often than to the average computer user. This exact thing happened to me this morning (again) concerning a fix in a rather difficult custom ThreadPool class I've developed that has made its way into at least 4 or 5 different projects (some of which have multiple branches). The issue was that I made that fix (an optimization really) months ago but couldn't bring it over to my other projects at the time. Today I couldn't even remember which project I made the fix for, much less which branch of that project!

Bash to the Rescue: findUnique.sh

So, I put together a little Bash Foo this morning that searches the current directory tree for every file with a given name, compares their contents, and produces a little report of which copies are identical and which are unique. It takes a little while to run when searching a large directory tree (especially if you are still working on a hard disk instead of a solid-state drive), but it is certainly faster than doing it manually.

Running it is very simple. Here is an example:

Archimedes:~ $ findUnique.sh ThreadPool.hpp
The following files are equivalent:
./ProjectA/version1/include/util/concurrent/ThreadPool.hpp
./ProjectA/version2/include/util/concurrent/ThreadPool.hpp
./ProjectA/version3/include/util/concurrent/ThreadPool.hpp
./ProjectB/include/util/concurrent/ThreadPool.hpp
Totally Unique:
./ProjectC/include/util/concurrent/ThreadPool.hpp
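
Once the report singles out a file, a unified diff against one of the equivalent copies shows what actually changed. The following is only a sketch: showChange is a hypothetical helper name, not part of findUnique.sh, and the two arguments are whatever paths the report printed.

```shell
#!/bin/bash
# showChange REFERENCE UNIQUE
# Hypothetical helper (not part of findUnique.sh): prints a unified diff
# between a reference copy and the file the report flagged as unique.
showChange () {
    local reference=$1 unique=$2
    local status

    # diff exits 0 when the files are identical, 1 when they differ,
    # and a value greater than 1 when something went wrong
    diff -u "$reference" "$unique"
    status=$?

    if [ "$status" -gt 1 ] ; then
        echo "diff failed for: $reference $unique" >&2
    fi
    return "$status"
}
```

For the report above, this would be called as: showChange ./ProjectB/include/util/concurrent/ThreadPool.hpp ./ProjectC/include/util/concurrent/ThreadPool.hpp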

Back to work

Now that I've spent an hour building this awesome tool and quickly identified the suspect file, I can get back to work merging that ThreadPool change everywhere. Maybe you can use this tool too. If so, enjoy!

Here's the code:

#!/bin/bash
############################################################################
# findUnique.sh - Bash Foo to search out unique and duplicate files.
#
# Run with: findUnique.sh <filename>
#
# Author: Ryan Fogarty
# Last Edited: 2016.06.11
# Copyright: Ryan Fogarty (FogRising) 2016
############################################################################

# Thanks StackOverflow for this little convenient tidbit...
containsElement () {
    local e
    for e in "${@:2}"; do [[ "$e" == "$1" ]] && return 0; done
    return 1
}

if [ $# -ne 1 ] ; then
    echo "Run with: $0 <filename>"
    exit 1
fi

bold=$(tput bold)
normal=$(tput sgr0)

uniqueFile=$1

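# Collect matching regular files one path per line, without word
# splitting or glob expansion of the results
uniqueFiles=()
while IFS= read -r f ; do
    uniqueFiles+=("$f")
done < <(find . -type f -name "${uniqueFile}")

# Nothing to compare if no file with that name exists below here
if [ ${#uniqueFiles[@]} -eq 0 ] ; then
    echo "No file named ${uniqueFile} found under $(pwd)" >&2
    exit 1
fi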

#echo "num uniqueFiles to test : ${#uniqueFiles[@]}"

declare -a reportedFiles

# Number of files - 1 : used to determine if a file is totally unique
numOthers=${#uniqueFiles[@]}
let numOthers-=1

for uf in "${uniqueFiles[@]}" ; do

    # If we've already reported as equivalent to
    # something else, skip this one
    if containsElement "$uf" "${reportedFiles[@]}" ; then
        continue
    fi

    # uniqueFlag counts how many files differ from this one
    uniqueFlag=0
    declare -a equivalentFiles
    equivalentFiles=()
    for ouf in "${uniqueFiles[@]}" ; do
        # diff exits 0 if identical, 1 if different, >1 on error;
        # treat errors as "different"
        diff --brief "${uf}" "${ouf}" &> /dev/null
        retVal=$?
        if [ $retVal -gt 1 ] ; then
            retVal=1
        fi
        let uniqueFlag+=retVal
        if [ $retVal -eq 0 ] ; then
            equivalentFiles+=("${ouf}")
        fi
    done

    if [ $uniqueFlag -eq $numOthers ] ; then
        # Differs from every other file
        echo "${bold}Totally Unique:${normal}"
        echo "${uf}"
        reportedFiles+=("${uf}")
    elif [ $uniqueFlag -eq 0 ] ; then
        # Differs from none, so every file matches: report once and stop
        echo "${bold}All the following files are exactly alike:${normal}"
        for eqf in "${uniqueFiles[@]}" ; do
            echo "${eqf}"
        done
        exit 0
    else
        # Matches some files (including itself) but not all
        echo "${bold}The following files are equivalent:${normal}"
        for eqf in "${equivalentFiles[@]}" ; do
            echo "${eqf}"
            reportedFiles+=("${eqf}")
        done
    fi

done
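
The script compares every file against every other file, so the number of diff runs grows with the square of the number of copies. If that ever becomes too slow, one possible alternative is to compute a checksum per file once and group the files by it. The following is only a sketch under some assumptions: groupByChecksum is a hypothetical helper name, paths are assumed to contain no tabs or newlines, and find is assumed to support -print0 (GNU and BSD find both do). Equal cksum values make identical content very likely but do not prove it, so a final diff or cmp on each group is still a good idea.

```shell
#!/bin/bash
# groupByChecksum NAME [ROOT]
# Hypothetical helper (not part of findUnique.sh): prints the paths of all
# files called NAME under ROOT (default: current directory), with a blank
# line between groups whose checksum and size differ.
groupByChecksum () {
    local name=$1 root=${2:-.}
    local f crc size

    find "$root" -type f -name "$name" -print0 |
        while IFS= read -r -d '' f ; do
            # "cksum < file" prints "<crc> <size>" with no file name
            read -r crc size < <(cksum < "$f")
            printf '%s-%s\t%s\n' "$crc" "$size" "$f"
        done |
        # Byte-wise sort puts lines with the same key next to each other
        LC_ALL=C sort |
        # Print only the path, starting a new group whenever the key changes
        awk -F'\t' '$1 != prev { if (NR > 1) print ""; prev = $1 } { print $2 }'
}
```

Running groupByChecksum ThreadPool.hpp from the top of the source tree would list the copies in groups; a group containing a single path is a candidate for the "Totally Unique" file.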

Saturday, June 11, 2016
