Saturday, June 11, 2016

Bash Foo: Finding Unique (and Duplicate) Files

Where is that darned file I changed?

I sometimes change one copy of a file somewhere in some directory and later can't tell which copy I changed. As a programmer, I often copy or embed versions of the same or very similar utility or file in different projects.

The problem comes later, when I forget which project holds the patch or fix I made to that file. I imagine this happens to those of us who program for a living a bit more often than to the average computer user. This exact thing happened to me this morning (again) concerning a fix in a rather difficult custom ThreadPool class I've developed that has made its way into at least 4 or 5 different projects (some of which have multiple branches). The issue was that I made that fix (an optimization really) months ago but couldn't bring it over to my other projects at the time. Today I couldn't even remember which project I made the fix for, much less which branch of that project!

Bash to the Rescue: findUnique.sh

So, I put together a little Bash Foo this morning that searches out unique files and produces a little report. It compares every matching file against every other one, so it takes a little while to run on a large directory tree (especially if you are still working on a hard disk instead of a solid-state drive), but it certainly is faster than trying to do it manually.

Running it is very simple. Here is an example:

Archimedes:~ $ findUnique.sh ThreadPool.hpp
The following files are equivalent:
./ProjectA/version1/include/util/concurrent/ThreadPool.hpp
./ProjectA/version2/include/util/concurrent/ThreadPool.hpp
./ProjectA/version3/include/util/concurrent/ThreadPool.hpp
./ProjectB/include/util/concurrent/ThreadPool.hpp
Totally Unique:
./ProjectC/include/util/concurrent/ThreadPool.hpp

Back to work

Now that I've spent an hour building this awesome tool and quickly identified the suspect file, I can go back to work merging that ThreadPool change everywhere. Maybe you can use this tool too. If so, enjoy!

Here's the code:


#!/bin/bash
############################################################################
# findUnique.sh - Bash Foo to search out unique and duplicate files.
#
# Run with: findUnique.sh <filename>
#
# Author: Ryan Fogarty
# Last Edited: 2016.06.11
# Copyright: Ryan Fogarty (FogRising) 2016
############################################################################

# Thanks StackOverflow for this little convenient tidbit...
containsElement () {
  local e
  for e in "${@:2}"; do [[ "$e" == "$1" ]] && return 0; done
  return 1
}

if [ $# -ne 1 ] ; then
   echo "Run with: $0 <filename>"
   exit 1
fi

bold=$(tput bold)
normal=$(tput sgr0)

uniqueFile=$1

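# Read matches NUL-delimited so paths containing spaces, globs or
# newlines arrive intact, without having to juggle IFS.
uniqueFiles=()
while IFS= read -r -d '' foundFile ; do
   uniqueFiles+=("${foundFile}")
done < <(find . -name "${uniqueFile}" -print0)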

#echo "num uniqueFiles to test : ${#uniqueFiles[@]}"

declare -a reportedFiles

# Number of files - 1 : used to determine if a file is totally unique
numOthers=${#uniqueFiles[@]}
let numOthers-=1


for uf in "${uniqueFiles[@]}" ; do

   # If we've already reported as equivalent to
   # something else, skip this one
   if containsElement "$uf" "${reportedFiles[@]}" ; then
      continue
   fi

   uniqueFlag=0
   declare -a equivalentFiles
   equivalentFiles=()
   for ouf in "${uniqueFiles[@]}" ; do
      diff --brief "${uf}" "${ouf}" &> /dev/null
      retVal=$?
      if [ $retVal -gt 1 ] ; then
        retVal=1
      fi
      let uniqueFlag+=retVal
      if [ $retVal -eq 0 ] ; then
         equivalentFiles+=("${ouf}")
      fi
   done
   if [ $uniqueFlag -eq $numOthers ] ; then
      echo "${bold}Totally Unique:${normal}"
      echo "${uf}"
      reportedFiles+=("${uf}")
   elif [ $uniqueFlag -eq 0 ] ; then
      echo "${bold}All the following files are exactly alike:${normal}"
      for eqf in "${uniqueFiles[@]}" ; do
         echo "${eqf}"
      done
      exit 0
   else
      echo "${bold}The following files are equivalent:${normal}"
      for eqf in "${equivalentFiles[@]}" ; do
         echo "${eqf}"
         reportedFiles+=("${eqf}")
      done
   fi

done
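
If the pairwise diff gets too slow on a big tree, one alternative is to checksum each file once and group the paths by checksum. Here's a minimal sketch of that idea, not a replacement for the script above. It assumes the POSIX `cksum` utility is available; `groupByChecksum` and its optional directory argument are made-up names for illustration. A matching CRC and size is a strong hint rather than proof, so confirm a group with `cmp` before trusting it, and note that paths containing newlines will confuse it.

```shell
#!/bin/bash
# groupByChecksum NAME [DIR] - hypothetical helper, not part of findUnique.sh.
# Prints the paths of all files called NAME under DIR (default: .), with
# files sharing the same checksum and size listed together and a blank
# line between groups.
groupByChecksum () {
   local name=$1 dir=${2:-.}
   # cksum prints "<crc> <bytes> <path>" for each file it is given
   find "$dir" -type f -name "$name" -exec cksum {} + |
      LC_ALL=C sort |
      awk '{
         key = $1 " " $2                              # crc plus byte count
         path = $0
         sub(/^[0-9]+[ \t]+[0-9]+[ \t]+/, "", path)   # keep only the path
         if (NR > 1 && key != prev) print ""          # blank line starts a new group
         print path
         prev = key
      }'
}
```

Running `groupByChecksum ThreadPool.hpp` from the top of the tree lists each group of matching paths. A path that sits in a group by itself has no duplicate.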
