
Fim - File Integrity Manager

Version 1.2.3


Fim

The purpose of Fim is to manage a set of files so that you can quickly see which files you are working on and be sure that no file has been tampered with.
When you do a lot of things at the same time, everything takes longer, and Fim is here to help you figure out what you were doing before.

Fim allows you to quickly get the status of all kinds of files: photos, videos or a complete disk tree. You manage them in a way that resembles Git.

In a daily workflow, you can see which files have been added, modified, removed, moved, renamed or duplicated. When you are working with files and are also very busy, it is difficult to remember which files you were working on previously. A VCS can help with this. Unfortunately, when you are working with photos, videos or other kinds of big binary files, you cannot manage them with a tool like Git.
Fim is meant to solve that.

Each time you have a stable set of files you can commit them to store their properties into a State and be able to compare it with your set of files later.

The big difference with Git is that Fim does not keep track of the different contents of the files that are managed.
This is why we can call it an UnVersioned Control System (UVCS).

Fim is able to manage hundreds of thousands of files with a total size of several terabytes. This kind of file tree could not be managed by Git.

Using Fim you can also easily detect duplicates and remove them.

1. Why do you need Fim

You can check the integrity of your files using filesystem capabilities. For example, btrfs comes with the scrub command, which reads all data from the disk and verifies checksums.

Fim has a different use case. It allows you to see files you are working on that you have modified, moved or deleted.
With btrfs all those files would have appeared as OK.

Fim also allows you to check the integrity of files on filesystems that do not maintain file checksums.
For example, you can detect hardware corruption on a DVD. You simply burn the complete Fim repository onto the DVD: all the files and the .fim content.
You can then verify the integrity of this DVD by going to its top-level directory and typing fim st or fim dcor. More details in Hardware corruption detection.

The Fim States could also be corrupted. To detect this, the State content is hashed, and each time Fim loads a State it checks its integrity by recalculating the State hash. More details in State integrity.
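
The State check can be pictured with the following Python sketch. It is only an illustration: the JSON layout, the stateHash field name and the save_state / load_state functions are assumptions made for this example, not Fim's actual State format.

```python
import hashlib
import json


def state_hash(entries):
    """Hash a canonical serialization of the State entries (hypothetical format)."""
    payload = json.dumps(entries, sort_keys=True).encode("utf-8")
    return hashlib.sha512(payload).hexdigest()


def save_state(entries):
    """Store the entries together with the hash of their content."""
    return {"entries": entries, "stateHash": state_hash(entries)}


def load_state(state):
    """Recompute the hash on load and refuse a State that no longer matches."""
    if state_hash(state["entries"]) != state["stateHash"]:
        raise ValueError("State is corrupted: hash mismatch")
    return state["entries"]
```

Any change to the stored entries after save_state() makes load_state() raise, which is the behavior described above.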

2. How does it work

Fim creates an index of the files that are managed. It contains for each file:

  • File name & length

  • File attributes (dates, permissions)

  • Hash of 3 small blocks

  • Hash of 3 medium blocks

  • Hash of the full file content

The 3 blocks that are hashed are one at the beginning, one in the middle and one at the end.
The block that is hashed at the beginning is not the first one. Most files contain headers, so to produce a more accurate hash we skip the first block and hash the second one. When the file size is less than twice the block size, we hash the first block.

The small block size is 4 KB and the medium block size is 1 MB.
The hash algorithm that is used is SHA-512.
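
As an illustration, here is a minimal Python sketch of how the three blocks could be selected and hashed. Only the block sizes, the skipped first block, the short-file rule and SHA-512 come from the description above. The exact offsets of the middle and end blocks, and feeding the three blocks into a single digest, are assumptions.

```python
import hashlib
import os

SMALL_BLOCK = 4 * 1024       # 4 KB
MEDIUM_BLOCK = 1024 * 1024   # 1 MB


def block_offsets(file_size, block_size):
    """Return the start offsets of the blocks to hash (assumed layout)."""
    if file_size < 2 * block_size:
        return [0]  # short file: hash the first block
    begin = block_size                      # skip the first block (headers)
    middle = (file_size - block_size) // 2  # assumption: centered block
    end = file_size - block_size            # assumption: last full block
    return sorted({begin, middle, end})


def hash_blocks(path, block_size):
    """Hash the selected blocks of one file with SHA-512."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for offset in block_offsets(os.path.getsize(path), block_size):
            handle.seek(offset)
            digest.update(handle.read(block_size))
    return digest.hexdigest()
```

With this layout, a file smaller than 8 KB gets the same small-block hash as a full hash of its first 4 KB, which is why very small files gain nothing from the faster modes.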

More details regarding file permissions can be found in File permissions management.

The index is called a State and acts like the Central Directory does for a Zip file. All the data generated by Fim is stored in the .fim directory located at the root of the Fim repository.

Fim keeps every State that has been committed.
However, you cannot use them to retrieve the content of a file that you may have lost.

Note: Fim does not replace backup software.

3. Fim workflow

First you need to initialize the Fim repository using the init command. This will record the first State of your file tree.

$ fim init

After modifying the file tree, you can use the status command to show the differences between the recorded State and the current file tree. You can do a full status check that compares the hash of all the files. It can be slow, as all the file contents are read and hashed.

$ fim st

You can compare more quickly using the fast mode, which hashes only 3 medium blocks of each file. With this option you can miss some modified files.

$ fim st -f

You can compare even more quickly using the super-fast mode, which hashes only 3 small blocks of each file. With this option you can also miss some modified files.

$ fim st -s

Otherwise, you can ask Fim not to hash file content by using the -n option. It compares only the file names, file lengths, modification dates and permissions. You will not be able to detect files that have been renamed or duplicated.

$ fim st -n
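
To see why renamed or duplicated files cannot be detected without hashes, consider this small Python sketch of an attribute-only comparison. The attribute names and both functions are hypothetical and only model the behavior described above.

```python
import os
import stat


def read_attributes(path):
    """Collect the attributes that are compared when content is not hashed."""
    info = os.stat(path)
    return {
        "length": info.st_size,
        "modified": int(info.st_mtime),
        "permissions": stat.S_IMODE(info.st_mode),
    }


def compare_without_hash(recorded, current):
    """Both arguments map a file name to its attributes."""
    added = sorted(set(current) - set(recorded))
    deleted = sorted(set(recorded) - set(current))
    modified = sorted(
        name for name in set(recorded) & set(current)
        if recorded[name] != current[name]
    )
    return added, modified, deleted
```

A file moved from file01 to dir01/file01 shows up here as one deleted name and one added name. Without a content hash there is nothing reliable to link the two.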

Each time you want to record the State of the current file tree, use the commit command.
It is a time-consuming operation that computes the hash of every file's content.

$ fim ci -m "My commit comment"

To commit faster you can use the super-fast commit. With this option you can miss some modified files. More details in Super-fast commit.

$ fim ci -s -m "My commit comment"

You can display the history of the modifications using the log command.

$ fim log

4. Most Common use cases

Fim can be used for different kinds of use cases.

4.1. Managing a workspace

  • Manage directories filled with binary files, for example pictures, music or movies

  • Know the status of a workspace in which we work episodically

  • Track changes over time

Personally, I use Fim to manage my photos and videos. When I have new photos, I put them in the right place in my pictures folder and then run fim ci from the sub-directory containing the new photos to record a new State, as I would do with Git. Because I do this from a sub-directory, the commit is fast: Fim hashes only the files that I added and creates a new State by merging those new hashes with the previous State.

More details on using Fim from a sub-directory can be found in Run Fim commands from a sub-directory.

The fim status command lets me know whenever I want (even super quickly) whether something changed in my pictures folder.
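
The merge done by a commit from a sub-directory can be pictured with this Python sketch. A State is modeled here as a plain dictionary from file name to hash, which is an assumption for the example and not Fim's real data structure.

```python
def merge_sub_directory_commit(previous, rehashed, sub_dir):
    """Build a new State from the previous one and a freshly hashed sub-directory.

    previous: file name -> hash for the whole repository (last State)
    rehashed: file name -> hash for the files found under sub_dir
    """
    prefix = sub_dir.rstrip("/") + "/"
    # Keep every entry outside the sub-directory untouched
    merged = {
        name: digest
        for name, digest in previous.items()
        if not name.startswith(prefix)
    }
    # Entries under the sub-directory come only from the new scan
    merged.update(rehashed)
    return merged
```

With ten files in the previous State and dir01/file01 committed from dir01, the merged State holds 11 entries, like State #2 in the Fim log of the simple example.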

4.2. Duplicates detection and removal

Fim detects duplicate files and distinguishes two cases:

  • Duplicates inside a Fim repository:
    Fim can detect and remove them

  • Duplicates that are outside:
    Useful to clean up desynchronized old backups

More details in Dealing with duplicates.
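
In principle, finding duplicates inside a repository comes down to grouping files by their full content hash. The Python sketch below shows the idea. The input shape and the function name are assumptions, and empty files are skipped because the changelog states that they are not seen as duplicates.

```python
from collections import defaultdict


def find_duplicate_sets(files):
    """files maps a file name to a (length, full_hash) pair."""
    by_hash = defaultdict(list)
    for name, (length, digest) in files.items():
        if length > 0:  # empty files are not treated as duplicates
            by_hash[digest].append(name)
    # A duplicate set is any group of two or more files sharing a hash
    return sorted(sorted(names) for names in by_hash.values() if len(names) > 1)
```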

4.3. Backup integrity

For offline long-term backups, Fim can be used to compute and store a hash of all the files in order to ensure the backup integrity.
More details in Why do you need Fim and Hardware corruption detection.
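
The rule used to flag a probable hardware corruption (content changed, but creation time and last modified time unchanged) can be sketched in Python as follows. The dictionary keys and the function name are hypothetical.

```python
def likely_corrupted(recorded, current):
    """Both arguments map a file name to a dict with 'hash', 'created', 'modified'."""
    suspects = []
    for name, old in recorded.items():
        new = current.get(name)
        if new is None:
            continue  # a missing file is a deletion, not a corruption
        same_dates = (
            new["created"] == old["created"]
            and new["modified"] == old["modified"]
        )
        # Content changed although nobody appears to have written the file
        if new["hash"] != old["hash"] and same_dates:
            suspects.append(name)
    return sorted(suspects)
```

A normal edit changes the modification date, so it is not reported by this rule.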

5. Fim usage

$ fim --help

usage: fim <command> [-c <arg>] [-d <arg>] [-e] [-f] [-h] [-i <arg>] [-l] [-M <arg>] [-m <arg>] [-n] [-o <arg>] [-p]
       [-q] [-s] [-t <arg>] [-v] [-y]

File Integrity Checker

Available commands:
     init                       Initialize a Fim repository and create the first State
     ci / commit                Commit the current directory State
     st / status                Display the difference between the current directory State with the previous one.
                                You can get a quick result by using the -f or -s or -n options
     diff                       Deprecated command that is an alias on the 'status' command for backward compatibility
     rfa / reset-file-attrs     Reset the files attributes like they were stored in the last committed State
     dcor / detect-corruption   Find changes most likely caused by a hardware corruption or a filesystem bug.
                                Change in content, but not in creation time and last modified time
     fdup / find-duplicates     Find local duplicate files in the Fim repository
     rdup / remove-duplicates   Remove duplicates found by the 'fdup' command.
                                If you specify the '-M' option it removes duplicates based on a master repository
     log                        Display the history of the States with the same output as the 'status' command
     dign / display-ignored     Display the files or directories that are ignored into the last State
     rbk / rollback             Rollback the last commit. It will remove the last State
     pst / purge-states         Purge previous States
     help                       Prints the Fim help
     version                    Prints the Fim version

Available options:
     -c,-- <arg>                        Deprecated option used to set the init or commit comment. Use '-m' instead
     -d,--directory <arg>               Run Fim into the specified directory
     -e,--errors                        Display execution error details
     -f,--fast-mode                     Use fast mode. Hash only 3 medium blocks.
                                        One at the beginning, one in the middle and one at the end
     -h,--help                          Prints the Fim help
     -i,--ignore <arg>                  Ignore some difference during State comparison. You can ignore:
                                        - attrs: File attributes
                                        - dates: Modification and creation dates
                                        - renamed: Renamed files
                                        - all: All of the above
                                        You can specify multiple kind of difference to ignore separated by a comma.
                                        For example: -i attrs,dates,renamed
     -l,--use-last-state                Use the last committed State.
                                        Both for the 'find-duplicates' and 'remove-duplicates' commands
     -M,--master-fim-repository <arg>   Fim repository directory that you want to use as remote master.
                                        Only for the 'remove-duplicates' command
     -m,--comment <arg>                 Comment to set during init and commit
     -n,--do-not-hash                   Do not hash file content. Uses only file names and modification dates
     -o,--output-max-lines <arg>        Change the maximum number lines displayed for the same kind of modification.
                                        Default value is 200 lines
     -p,--purge-states                  Purge previous States if the commit succeed
     -q,--quiet                         Do not display details
     -s,--super-fast-mode               Use super-fast mode. Hash only 3 small blocks.
                                        One at the beginning, one in the middle and one at the end
     -t,--thread-count <arg>            Number of thread used to hash file contents in parallel.
                                        By default, this number is dynamic and depends on the disk throughput
     -v,--version                       Prints the Fim version
     -y,--always-yes                    Always yes to every questions

6. How can you use Fim

6.1. Download a Fim release

You can download a prebuilt release of Fim from the Download Latest release link.
Then you have to:

  • Untar / Unzip the Fim package

  • Add the created directory into your PATH to be able to use the fim command

6.2. Fim changelog

Known limitations
  • The '-M' option, which specifies the master Fim repository, does not work when using the Fim Docker image

6.2.2. Version 1.2.3

(Released 2017-06-06)

GitHub Sources    -    List Full Changelog    -    Package Download

General
Bug fix
  • Fix issue #9: Exception in thread "main" java.lang.IllegalStateException

    • Fix the State comparison algorithm

    • When the size is over 1GB, don’t round down to the nearest GB boundary

    • Use the International System of Units (SI) to compute file size (1000 instead of 1024)
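
The size formatting rules from this release can be illustrated with a short Python sketch. The output format (one decimal, these unit labels) is an assumption. Only the use of powers of 1000 and the absence of rounding down to a whole GB come from the notes above.

```python
def format_size(num_bytes):
    """Format a size with SI units (powers of 1000, not 1024)."""
    units = ["bytes", "KB", "MB", "GB", "TB"]
    value = float(num_bytes)
    index = 0
    while value >= 1000 and index < len(units) - 1:
        value /= 1000
        index += 1
    if index == 0:
        return f"{num_bytes} bytes"
    # Keep one decimal so that 1.5 GB is not shown as 1 GB
    return f"{value:.1f} {units[index]}"
```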

6.2.3. Version 1.2.2

(Released 2016-10-24)

GitHub Sources    -    List Full Changelog    -    Package Download

General
  • Dynamically allocate hash threads depending on the disk throughput.
    This automatically uses more threads with an SSD and fewer with a classical HDD

  • Add English slides

  • On Linux and Mac OS X, display the hash progress depending on terminal width

  • The '-c' option is deprecated in favor of the '-m' one to provide a comment. The rdup command now uses the '-M' option

  • When purging states, the file name of the last State is not modified and the State number is kept

Bug fix
  • Super fast-mode is able to detect correctly growing files

  • Add explicit confirmation when removing duplicates

  • On Windows, correctly manage comments with spaces

6.2.4. Version 1.2.1

(Released 2016-10-10)

GitHub Sources    -    List Full Changelog    -    Package Download

General
  • 'diff' command deprecated in favor of the 'status' command

  • Fix issue #6 - Add commitDetails into the State in order to display more details regarding each commit while running 'fim log'. The 'log' command now displays the same output as the 'status' command. This works completely with States generated by this version of Fim

  • Thanks to @nch3v, clarified the 'find-duplicates' command output and sort duplicate sets by wasted space to display the biggest first

  • Fim is now able to remove duplicates that are in the repository. See more in Remove duplicates

  • Add French slides

Bug fix
  • Update memory maximum sizes to more accurate values

  • In fim-docker, don’t use the realpath command that is not installed by default

6.2.5. Version 1.2.0

(Released 2016-05-23)

GitHub Sources    -    List Full Changelog    -    Package Download

Global improvement of the performance
  • Decrease the State size in memory and on disk by using Ascii85 instead of hexadecimal to store hashes. Each hash string length is now 80 instead of 128.

  • Add Super-fast commit support. See more in Super-fast commit

  • Allow managing a very large number of files (1 098 138 files is working for me)

  • Optimisation of the State comparison to be able to quickly compare two States that contain 1 000 000 files
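
The 80 versus 128 figures follow from the encodings: a SHA-512 digest is 64 bytes, hexadecimal uses 2 characters per byte, and Ascii85 uses 5 characters per group of 4 bytes. Here is a quick Python check using the standard base64.a85encode function. Fim's own encoder may differ in details such as the alphabet.

```python
import base64
import hashlib

# Any input works; this one is the content of a file from the simple example
digest = hashlib.sha512(b"New File 01\n").digest()

hex_form = digest.hex()
# Note: Python's a85encode shortens an all-zero 4-byte group to a single 'z',
# so in that very rare case the encoded string would be shorter than 80
ascii85_form = base64.a85encode(digest).decode("ascii")

print(len(digest), len(hex_form), len(ascii85_form))
```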

General
  • Fim is now distributed as Docker image on Docker Hub. See more in Run Fim using Docker

  • Add the --directory option to be able to specify where to run Fim commands

  • Every commit on Fim is now tested on Mac OS X thanks to the Travis Mac builder

  • Run static code analysis on Sonar and Coverity

  • Add the --purge-states option that purges previous States if the commit succeeds

Bug fix
  • Check access rights for the .fim directory before executing every command

  • Ignore the milliseconds when comparing modification dates because some JDKs don’t retrieve them and set them to 0

  • By default truncate output to 200 lines of the same kind. --output-max-lines option added to modify this

  • Empty files are not seen as duplicates

Migrating from 1.1.0 to 1.2.0

The State format has changed and is not compatible. You need to rehash the complete repository.
To migrate type:

$ fim ci -y -c "Migrate to Fim 1.2.0"

6.2.6. Version 1.1.0

(Released 2015-11-01)

GitHub Sources    -    List Full Changelog    -    Package Download

  • Bug Fix

  • Complete rewrite of the Hash algorithm in order to hash one block at the beginning, one in the middle and one at the end. Details in How does it work

  • Fix issue #2 - Add the 'fim dcor' command that finds changes most likely caused by a hardware corruption or a filesystem bug. Change in content, but not in creation time and last modified time. Details in Hardware corruption detection

  • Fix issue #3 - If available, store the SELinux label of each file. Details in File permissions management

  • Fix issue #4 - Fix fim shell script for Mac OS X

  • Add the ability to ignore files or directories using a .fimignore file. Details in Ignoring files or directories

  • Add automatic build and testing of Fim using Travis CI for Linux and Appveyor CI for Windows

  • Add Unit test coverage using Coveralls

  • Moved the documentation to AsciiDoc using the asciidoctor-maven-plugin

6.2.7. Version 1.0.2

(Released 2015-09-04)

GitHub Sources    -    List Full Changelog    -    Package Download

  • Fix issue #1: Hash the second 4 KB / 1 MB block to ensure that the headers don’t increase the collision probability when doing a rapid check.

  • Clarified the documentation

6.2.8. Version 1.0.1

(Released 2015-08-26)

GitHub Sources    -    List Full Changelog    -    Package Download

  • Bug Fix

  • Update the State format in order to improve Fim commands. It allows for example to display more details in the log command

  • Add the ability to run Fim from a sub-directory

  • Add Global hash mode to be able to change the default Fim behavior

6.2.9. Version 1.0.0

(Released 2015-07-29)

GitHub Sources    -    List Full Changelog    -    Package Download

  • First release of Fim

  • Set up the foundations that allow adding more and more features around the States

  • Provides mainly init, commit and diff commands

6.3. Run Fim using Docker

If you don’t have Java and you don’t want to install it (or you don’t have the required version of Java), you can run Fim using a Docker image.
The whole environment required to run Fim is inside the Docker image. Just pull the image and run Fim.

Important

The Docker image runs only on Linux

6.3.1. Using the latest published Docker image

Fim releases are published as Docker images on Docker Hub.
You can use them like this.

Retrieve the fim-docker script:
$ curl https://raw.githubusercontent.com/evrignaud/fim/master/fim-docker -L -o fim-docker && chmod a+rx fim-docker
Run it:

This script takes the same arguments as the fim one.
If you don’t have the Fim Docker image locally it will pull the image first.

$ ./fim-docker -h

If you want to use the latest version of the Fim Docker image you can pull it manually:

$ docker pull evrignaud/fim

6.3.2. Creating your own Docker Image

After having built Fim (see below), type the following command:

$ ./build-docker-image

Then you can use the provided fim-docker script.

6.4. Build Fim

You can build Fim yourself to try the master version.

Important

Fim comes with versioned prebuilt releases. I recommend using them, as they are ready to be used.
If you clone Fim’s master, you get a SNAPSHOT version, and there is no guarantee that this particular build of Fim will work properly.

$ mvn clean install
  • You can run the jar file generated in the target directory using one of the two script files located in the root directory.

    • fim for Linux or Mac OS X

    • fim.bat for Windows

The build also generates two distribution files in the target/dist directory.

$ ls -a1 target/dist/
fim-1.1.0-SNAPSHOT-distribution.tar.gz
fim-1.1.0-SNAPSHOT-distribution.zip

6.4.1. Step by step procedure

Here are some tips on how you can build Fim easily.

Getting Maven

If you don’t have Maven, you can either download it from the Apache website or use a package manager.

For Linux
$ sudo apt-get install maven
For Mac OS X

First install Homebrew if you don’t have it.

$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Then install Maven.

$ brew install maven

Then you need to add the maven/bin directory into your PATH.

Clone Fim
$ cd
$ mkdir projects
$ cd projects
$ git clone https://github.com/evrignaud/fim.git
$ cd fim
Build Fim
$ mvn clean install
Now Fim is ready
$ ./fim

 

7. Simple example

Here is a step by step example of Fim usage. For the purpose of this example we use small files.

You can try it yourself by using the samples/simple-example.sh script.

7.1. Step by step

7.1.1. Create a set of files

~$ mkdir simple-example

~$ cd simple-example/

# Creates 10 files
simple-example$ for i in 01 02 03 04 05 06 07 08 09 10 ; do echo "New File $i" > file${i} ; done

simple-example$ ls -la
total 48
drwxrwxr-x 2 evrignaud evrignaud 4096 mai    9 21:58 .
drwx------ 3 evrignaud evrignaud 4096 mai    9 21:58 ..
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file01
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file02
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file03
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file04
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file05
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file06
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file07
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file08
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file09
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file10

7.1.2. Initialize the Fim repository

simple-example$ fim init -m 'First State'
2016/05/09 21:58:36 - Info  - Scanning recursively local files, using 'full' mode and 2 threads
(Hash progress legend for files grouped 10 by 10: # > 1 GB, @ > 200 MB, O > 100 MB, 8 > 50 MB, o > 20 MB, . otherwise)
.
2016/05/09 21:58:36 - Info  - Scanned 10 files (120 bytes), hashed 120 bytes (avg 120 bytes/s), during 00:00:00

Added:            file01
Added:            file02
Added:            file03
Added:            file04
Added:            file05
Added:            file06
Added:            file07
Added:            file08
Added:            file09
Added:            file10

10 added
Repository initialized

7.1.3. A new .fim directory is created

simple-example$ ls -la
total 52
drwxrwxr-x 3 evrignaud evrignaud 4096 mai    9 21:58 .
drwx------ 3 evrignaud evrignaud 4096 mai    9 21:58 ..
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file01
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file02
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file03
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file04
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file05
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file06
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file07
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file08
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file09
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file10
drwxrwxr-x 3 evrignaud evrignaud 4096 mai    9 21:58 .fim

7.1.4. Do some modifications

simple-example$ mkdir dir01

# Move file01 to dir01
simple-example$ mv file01 dir01

# Change the file02 modification date
simple-example$ touch file02

# Duplicate file03 twice
simple-example$ cp file03 file03.dup1
simple-example$ cp file03 file03.dup2

# Add content to file04
simple-example$ echo foo >> file04

# Copy file05
simple-example$ cp file05 file11

# And add content to it
simple-example$ echo bar >> file05

# Remove file06
simple-example$ rm file06

# Duplicate file07 once
simple-example$ cp file07 file07.dup1

# Create the new file12
simple-example$ echo "New File 12" > file12

Here is the content of the directories after the modifications.

simple-example$ ls -la
total 68
drwxrwxr-x 4 evrignaud evrignaud 4096 mai    9 21:58 .
drwx------ 3 evrignaud evrignaud 4096 mai    9 21:58 ..
drwxrwxr-x 2 evrignaud evrignaud 4096 mai    9 21:58 dir01
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file02
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file03
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file03.dup1
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file03.dup2
-rw-rw-r-- 1 evrignaud evrignaud   16 mai    9 21:58 file04
-rw-rw-r-- 1 evrignaud evrignaud   16 mai    9 21:58 file05
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file07
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file07.dup1
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file08
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file09
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file10
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file11
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file12
drwxrwxr-x 3 evrignaud evrignaud 4096 mai    9 21:58 .fim

simple-example$ ls -la dir01/
total 12
drwxrwxr-x 2 evrignaud evrignaud 4096 mai    9 21:58 .
drwxrwxr-x 4 evrignaud evrignaud 4096 mai    9 21:58 ..
-rw-rw-r-- 1 evrignaud evrignaud   12 mai    9 21:58 file01

7.1.5. Fim detects the modifications

simple-example$ fim st
2016/05/09 21:58:36 - Info  - Scanning recursively local files, using 'full' mode and 2 threads
(Hash progress legend for files grouped 10 by 10: # > 1 GB, @ > 200 MB, O > 100 MB, 8 > 50 MB, o > 20 MB, . otherwise)
.
2016/05/09 21:58:36 - Info  - Scanned 14 files (176 bytes), hashed 176 bytes (avg 176 bytes/s), during 00:00:00

Comparing with the last committed state from 2016/05/09 21:58:36
Comment: First State

Added:            file12
Copied:           file11 	(was file05)
Duplicated:       file03.dup1 = file03
Duplicated:       file03.dup2 = file03
Duplicated:       file07.dup1 = file07
Date modified:    file02 	last modified: 2016/05/09 21:58:33 -> 2016/05/09 21:58:36
Content modified: file04 	last modified: 2016/05/09 21:58:33 -> 2016/05/09 21:58:36
Content modified: file05 	last modified: 2016/05/09 21:58:33 -> 2016/05/09 21:58:36
Renamed:          file01 -> dir01/file01
Deleted:          file06

1 added, 1 copied, 3 duplicated, 1 date modified, 2 content modified, 1 renamed, 1 deleted
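
The categories in this output can be reproduced with the following Python sketch. The rules are inferred from the example above and the function is hypothetical, so Fim's real comparison algorithm may well differ, in particular when several files share the same content.

```python
def compare_states(previous, current):
    """Both arguments map a file name to a (content_hash, modified_date) pair."""
    result = []
    names_by_hash = {}
    for name, (digest, _) in previous.items():
        names_by_hash.setdefault(digest, []).append(name)

    renamed_from = set()
    for name in sorted(current):
        digest, modified = current[name]
        if name in previous:
            old_digest, old_modified = previous[name]
            if digest != old_digest:
                result.append(("content modified", name))
            elif modified != old_modified:
                result.append(("date modified", name))
            continue
        origins = names_by_hash.get(digest, [])
        if not origins:
            result.append(("added", name))
        elif origins[0] not in current:
            # Same content, and the old name is gone
            result.append(("renamed", name))
            renamed_from.add(origins[0])
        elif current[origins[0]][0] == digest:
            # The original is still there with the same content
            result.append(("duplicated", name))
        else:
            # The original still exists but its content changed afterwards
            result.append(("copied", name))

    for name in sorted(previous):
        if name not in current and name not in renamed_from:
            result.append(("deleted", name))
    return result
```

Applied to the files of this example, the sketch yields the same counts as the summary line: 1 added, 1 copied, 3 duplicated, 1 date modified, 2 content modified, 1 renamed and 1 deleted.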

7.1.6. Search for duplicate files

simple-example$ fim fdup
2016/05/09 21:58:37 - Info  - Searching for duplicate files

2016/05/09 21:58:37 - Info  - Scanning recursively local files, using 'full' mode and 2 threads
(Hash progress legend for files grouped 10 by 10: # > 1 GB, @ > 200 MB, O > 100 MB, 8 > 50 MB, o > 20 MB, . otherwise)
.
2016/05/09 21:58:37 - Info  - Scanned 14 files (176 bytes), hashed 176 bytes (avg 176 bytes/s), during 00:00:00

- Duplicate set #1: duplicated 2 times, 12 bytes each, 24 bytes of wasted space
      file03
      file03.dup1
      file03.dup2

- Duplicate set #2: duplicated 1 time, 12 bytes each, 12 bytes of wasted space
      file07
      file07.dup1

3 duplicate files spread into 2 duplicate sets, 36 bytes of total wasted space
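
The figures in this report are simple arithmetic: in each set, every file beyond the first one counts as a duplicate and wastes one file length. The Python sketch below recomputes them. The input shape and the function name are assumptions for the example.

```python
def summarize_duplicates(duplicate_sets):
    """duplicate_sets is a list of (file_length, [file names]) pairs."""
    rows = []
    for length, names in duplicate_sets:
        copies = len(names) - 1              # the first file is the original
        rows.append((names, copies, copies * length))
    # Biggest waste first, as in the find-duplicates output
    rows.sort(key=lambda row: row[2], reverse=True)
    total_duplicates = sum(row[1] for row in rows)
    total_wasted = sum(row[2] for row in rows)
    return rows, total_duplicates, total_wasted
```

For the two sets above this gives 2 and 1 duplicates, 24 and 12 wasted bytes, and totals of 3 duplicate files and 36 bytes.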

7.1.7. From the dir01 sub-directory

You can run Fim on a subset of the repository.
More details on using Fim from a sub-directory can be found in Run Fim commands from a sub-directory.

simple-example$ cd dir01

Inside this directory only one file is seen as added.

simple-example/dir01$ fim st
2016/05/09 21:58:37 - Info  - Scanning recursively local files, using 'full' mode and 2 threads
(Hash progress legend for files grouped 10 by 10: # > 1 GB, @ > 200 MB, O > 100 MB, 8 > 50 MB, o > 20 MB, . otherwise)
2016/05/09 21:58:37 - Info  - Scanned 1 file (12 bytes), hashed 12 bytes (avg 12 bytes/s), during 00:00:00

Comparing with the last committed state from 2016/05/09 21:58:36
Comment: First State

Added:            dir01/file01

1 added

No duplicate file is found, as we are looking only inside dir01.

simple-example/dir01$ fim fdup
2016/05/09 21:58:37 - Info  - Searching for duplicate files

2016/05/09 21:58:37 - Info  - Scanning recursively local files, using 'full' mode and 2 threads
(Hash progress legend for files grouped 10 by 10: # > 1 GB, @ > 200 MB, O > 100 MB, 8 > 50 MB, o > 20 MB, . otherwise)
2016/05/09 21:58:38 - Info  - Scanned 1 file (12 bytes), hashed 12 bytes (avg 12 bytes/s), during 00:00:00

No duplicate file found

Commit only the local modifications done inside this directory.

simple-example/dir01$ fim ci -m 'Modifications from dir01' -y
2016/05/09 21:58:38 - Info  - Scanning recursively local files, using 'full' mode and 2 threads
(Hash progress legend for files grouped 10 by 10: # > 1 GB, @ > 200 MB, O > 100 MB, 8 > 50 MB, o > 20 MB, . otherwise)
2016/05/09 21:58:38 - Info  - Scanned 1 file (12 bytes), hashed 12 bytes (avg 12 bytes/s), during 00:00:00

Comparing with the last committed state from 2016/05/09 21:58:36
Comment: First State

Added:            dir01/file01

1 added

There are no more local modifications.

simple-example/dir01$ fim st
2016/05/09 21:58:38 - Info  - Scanning recursively local files, using 'full' mode and 2 threads
(Hash progress legend for files grouped 10 by 10: # > 1 GB, @ > 200 MB, O > 100 MB, 8 > 50 MB, o > 20 MB, . otherwise)
2016/05/09 21:58:38 - Info  - Scanned 1 file (12 bytes), hashed 12 bytes (avg 12 bytes/s), during 00:00:00

Comparing with the last committed state from 2016/05/09 21:58:38
Comment: Modifications from dir01

Nothing modified

Return to the parent directory.

simple-example/dir01$ cd ..

7.1.8. Commit the modifications

simple-example$ fim ci -m 'All modifications' -y
2016/05/09 21:58:39 - Info  - Scanning recursively local files, using 'full' mode and 2 threads
(Hash progress legend for files grouped 10 by 10: # > 1 GB, @ > 200 MB, O > 100 MB, 8 > 50 MB, o > 20 MB, . otherwise)
.
2016/05/09 21:58:39 - Info  - Scanned 14 files (176 bytes), hashed 176 bytes (avg 176 bytes/s), during 00:00:00

Comparing with the last committed state from 2016/05/09 21:58:38
Comment: Modifications from dir01

Added:            file12
Copied:           file11 	(was file05)
Duplicated:       file03.dup1 = file03
Duplicated:       file03.dup2 = file03
Duplicated:       file07.dup1 = file07
Date modified:    file02 	last modified: 2016/05/09 21:58:33 -> 2016/05/09 21:58:36
Content modified: file04 	last modified: 2016/05/09 21:58:33 -> 2016/05/09 21:58:36
Content modified: file05 	last modified: 2016/05/09 21:58:33 -> 2016/05/09 21:58:36
Deleted:          file01
Deleted:          file06

1 added, 1 copied, 3 duplicated, 1 date modified, 2 content modified, 2 deleted

7.1.9. Nothing is modified now

simple-example$ fim st
2016/05/09 21:58:39 - Info  - Scanning recursively local files, using 'full' mode and 2 threads
(Hash progress legend for files grouped 10 by 10: # > 1 GB, @ > 200 MB, O > 100 MB, 8 > 50 MB, o > 20 MB, . otherwise)
.
2016/05/09 21:58:39 - Info  - Scanned 14 files (176 bytes), hashed 176 bytes (avg 176 bytes/s), during 00:00:00

Comparing with the last committed state from 2016/05/09 21:58:39
Comment: All modifications

Nothing modified

7.2. The Fim log

simple-example$ fim log
- State #1: 2016/05/09 21:58:36 (10 files - 120 bytes)
	Comment: First State

Added:            file01
Added:            file02
Added:            file03
Added:            file04
Added:            file05
Added:            file06
Added:            file07
Added:            file08
Added:            file09
Added:            file10

10 added

- State #2: 2016/05/09 21:58:38 (11 files - 132 bytes)
	Comment: Modifications from dir01

Added:            dir01/file01
Added:            file01
Added:            file02
Added:            file03
Added:            file04
Added:            file05
Added:            file06
Added:            file07
Added:            file08
Added:            file09
Added:            file10

11 added

- State #3: 2016/05/09 21:58:39 (14 files - 176 bytes)
	Comment: All modifications

Added:            file12
Copied:           file11 	(was file05)
Duplicated:       file03.dup1 = file03
Duplicated:       file03.dup2 = file03
Duplicated:       file07.dup1 = file07
Date modified:    file02 	last modified: 2016/05/09 21:58:33 -> 2016/05/09 21:58:36
Content modified: file04 	last modified: 2016/05/09 21:58:33 -> 2016/05/09 21:58:36
Content modified: file05 	last modified: 2016/05/09 21:58:33 -> 2016/05/09 21:58:36
Deleted:          file01
Deleted:          file06

1 added, 1 copied, 3 duplicated, 1 date modified, 2 content modified, 2 deleted

7.3. Rollback all the commits

Rollback the last commit.

simple-example$ fim rbk -y
You are going to rollback the last commit. State #3 will be removed

- State #3: 2016/05/09 21:58:39 (14 files - 176 bytes)
	Comment: All modifications

Added:            file12
Copied:           file11 	(was file05)
Duplicated:       file03.dup1 = file03
Duplicated:       file03.dup2 = file03
Duplicated:       file07.dup1 = file07
Date modified:    file02 	last modified: 2016/05/09 21:58:33 -> 2016/05/09 21:58:36
Content modified: file04 	last modified: 2016/05/09 21:58:33 -> 2016/05/09 21:58:36
Content modified: file05 	last modified: 2016/05/09 21:58:33 -> 2016/05/09 21:58:36
Deleted:          file01
Deleted:          file06

1 added, 1 copied, 3 duplicated, 1 date modified, 2 content modified, 2 deleted

Rollback again.

simple-example$ fim rbk -y
You are going to rollback the last commit. State #2 will be removed

- State #2: 2016/05/09 21:58:38 (11 files - 132 bytes)
	Comment: Modifications from dir01

Added:            dir01/file01
Added:            file01
Added:            file02
Added:            file03
Added:            file04
Added:            file05
Added:            file06
Added:            file07
Added:            file08
Added:            file09
Added:            file10

11 added

Nothing more to rollback.

simple-example$ fim rbk -y
2016/05/09 21:58:40 - Info  - No commit to rollback

7.4. Commit using super-fast mode

simple-example$ fim ci -s -m 'Commit modifications very quickly using super-fast commit' -y
2016/05/09 21:58:40 - Info  - Scanning recursively local files, using 'super-fast' mode and 2 threads
(Hash progress legend for files grouped 10 by 10: # > 1 GB, @ > 200 MB, O > 100 MB, 8 > 50 MB, o > 20 MB, . otherwise)
.
2016/05/09 21:58:41 - Info  - Scanned 14 files (176 bytes), hashed 176 bytes (avg 176 bytes/s), during 00:00:00

Comparing with the last committed state from 2016/05/09 21:58:36
Comment: First State

Added:            file12
Copied:           file11 	(was file05)
Duplicated:       file03.dup1 = file03
Duplicated:       file03.dup2 = file03
Duplicated:       file07.dup1 = file07
Date modified:    file02 	last modified: 2016/05/09 21:58:33 -> 2016/05/09 21:58:36
Content modified: file04 	last modified: 2016/05/09 21:58:33 -> 2016/05/09 21:58:36
Content modified: file05 	last modified: 2016/05/09 21:58:33 -> 2016/05/09 21:58:36
Renamed:          file01 -> dir01/file01
Deleted:          file06

1 added, 1 copied, 3 duplicated, 1 date modified, 2 content modified, 1 renamed, 1 deleted

2016/05/09 21:58:41 - Info  - Retrieving the missing hash for all the modified files, using 'full' mode and 2 threads
2016/05/09 21:58:41 - Info  - Scanned 8 files (104 bytes), hashed 104 bytes (avg 104 bytes/s), during 00:00:00

In this case the files are too small for a super-fast commit to be any faster.
With huge files it makes a big difference.

7.5. Again, nothing is modified now

simple-example$ fim st
2016/05/09 21:58:41 - Info  - Scanning recursively local files, using 'full' mode and 2 threads
(Hash progress legend for files grouped 10 by 10: # > 1 GB, @ > 200 MB, O > 100 MB, 8 > 50 MB, o > 20 MB, . otherwise)
.
2016/05/09 21:58:41 - Info  - Scanned 14 files (176 bytes), hashed 176 bytes (avg 176 bytes/s), during 00:00:00

Comparing with the last committed state from 2016/05/09 21:58:40
Comment: Commit modifications very quickly using super-fast commit

Nothing modified

7.6. Display the Fim log

simple-example$ fim log
- State #1: 2016/05/09 21:58:36 (10 files - 120 bytes)
	Comment: First State

Added:            file01
Added:            file02
Added:            file03
Added:            file04
Added:            file05
Added:            file06
Added:            file07
Added:            file08
Added:            file09
Added:            file10

10 added

- State #2: 2016/05/09 21:58:40 (14 files - 176 bytes)
	Comment: Commit modifications very quickly using super-fast commit

Added:            file12
Copied:           file11 	(was file05)
Duplicated:       file03.dup1 = file03
Duplicated:       file03.dup2 = file03
Duplicated:       file07.dup1 = file07
Date modified:    file02 	last modified: 2016/05/09 21:58:33 -> 2016/05/09 21:58:36
Content modified: file04 	last modified: 2016/05/09 21:58:33 -> 2016/05/09 21:58:36
Content modified: file05 	last modified: 2016/05/09 21:58:33 -> 2016/05/09 21:58:36
Renamed:          file01 -> dir01/file01
Deleted:          file06

1 added, 1 copied, 3 duplicated, 1 date modified, 2 content modified, 1 renamed, 1 deleted

7.7. State file content

Here is an extract of the content of State 2. To simplify reading, hashes are shortened and only one file entry is kept.

simple-example$ zmore .fim/states/state_2.json.gz
{
  "stateHash": "WKE\\VP<9...`$SnPo",
  "modelVersion": "4",
  "timestamp": 1462827520971,
  "comment": "Commit modifications very quickly using super-fast commit",
  "fileCount": 14,
  "filesContentLength": 176,
  "hashMode": "hashAll",
  "modificationCounts": {
    "added": 1,
    "copied": 1,
    "duplicated": 3,
    "dateModified": 1,
    "contentModified": 2,
    "attributesModified": 0,
    "renamed": 1,
    "deleted": 1
  },
  "ignoredFiles": [
    ".fim/"
  ],
  "fileStates": [
    {
      "fileName": "dir01/file01",
      "fileLength": 12,
      "fileTime": {
        "creationTime": 1462823913000,
        "lastModified": 1462823913000
      },
      "modification": "renamed",
      "fileHash": {
        "smallBlockHash": "qH\\4/L...@7&m!=",
        "mediumBlockHash": "qH\\4/L...@7&m!=",
        "fullHash": "qH\\4/L...@7&m!="
      },
      "fileAttributes": {
        "PosixFilePermissions": "rw-rw-r--"
      }
    },

    ...
    # Other file entries have been removed
    ...

  ]
}
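
Since a State file is gzip-compressed JSON, it can also be inspected from a script. The following is a minimal Python sketch, not part of Fim. The function names load_state and summarize are hypothetical, the UTF-8 encoding is an assumption, and the field names are the ones visible in the extract above.

```python
import gzip
import json


def load_state(path):
    """Load a gzip-compressed JSON State file and return it as a dict.

    Assumption: the file is plain JSON encoded in UTF-8, as the extract suggests.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def summarize(state):
    """Build a one-line summary from the fields shown in the extract."""
    counts = state["modificationCounts"]
    # Keep only the modification kinds that actually occurred.
    changed = ", ".join(f"{count} {kind}" for kind, count in counts.items() if count)
    return f'{state["fileCount"]} files - {state["filesContentLength"]} bytes: {changed}'
```

For example, summarize(load_state(".fim/states/state_2.json.gz")) would return a summary line for the State shown above.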

 

8. Real life example

8.1. Initialize the big Fim repository

Here is the output of the initialization of a big Fim repository that contains 297 GB of photos and videos.
In my case it takes almost 3 hours to hash all the file contents.

photos-videos$ fim init -y
No comment provided. You are going to initialize your repository using the default comment.
2015/10/22 19:23:05 - Info  - Scanning recursively local files, using 'full' mode and 2 threads
(Hash progress legend for files grouped 10 by 10: # > 1 GB, @ > 200 MB, O > 100 MB, 8 > 50 MB, o > 20 MB, . otherwise)
2015/10/22 19:23:06 - Info  - SELinux is enabled on this system
.................................................................o.o...o......8........O.........oo.
...O....................#.................Oooooooooooo.@.o.o.oo..oo@................o.......88oOoooo
... # Hash progress legend shortened
.................................o.............@ooooo@oooooooooooooooooo.ooooooooooooooooooooooooooo
..............................................
2015/10/22 22:05:50 - Info  - Scanned 41467 files (297 GB), hashed 297 GB (avg 31 MB/s), during 02:42:44

... # File list is skipped as it is too long

41467 added
Repository initialized

8.2. Check the status very quickly

You can check the status very quickly with the super-fast mode (-s option), which hashes only 3 small blocks of 4 KB per file.
In my case it takes 17 minutes :=).

photos-videos$ fim st -s
2015/10/22 23:15:52 - Info  - Scanning recursively local files, using 'super-fast' mode and 2 threads
(Hash progress legend for files grouped 10 by 10: # > 1 GB, @ > 200 MB, O > 100 MB, 8 > 50 MB, o > 20 MB, . otherwise)
2015/10/22 23:15:52 - Info  - SELinux is enabled on this system
.................................................................o.o...o......8........O.........oo.
...O....................#.................Oooooooooooo.@.o.o.oo..oo@................o.......88oOoooo
... # Hash progress legend shortened
.................................o.............@ooooo@oooooooooooooooooo.ooooooooooooooooooooooooooo
..............................................
2015/10/22 23:32:57 - Info  - Scanned 41467 files (297 GB), hashed 484 MB (avg 484 KB/s), during 00:17:04

Comparing with the last committed state from 2015/10/22 19:23:06
Comment: Initial State

Nothing modified

8.3. Check the status more accurately

The super-fast mode can produce inaccurate results: you can miss some modified files.
To increase accuracy, you can use the fast mode (-f option), which checks 3 medium blocks.
Even though this mode is more accurate, it cannot be completely accurate, because only 3 different 1 MB blocks of each file are hashed.
This time, in my case, it takes 42 minutes to check the status.

photos-videos$ fim st -f
2015/10/22 23:33:22 - Info  - Scanning recursively local files, using 'fast' mode and 2 threads
(Hash progress legend for files grouped 10 by 10: # > 1 GB, @ > 200 MB, O > 100 MB, 8 > 50 MB, o > 20 MB, . otherwise)
2015/10/22 23:33:23 - Info  - SELinux is enabled on this system
.................................................................o.o...o......8........O.........oo.
...O....................#.................Oooooooooooo.@.o.o.oo..oo@................o.......88oOoooo
... # Hash progress legend shortened
.................................o.............@ooooo@oooooooooooooooooo.ooooooooooooooooooooooooooo
..............................................
2015/10/23 00:15:05 - Info  - Scanned 41467 files (297 GB), hashed 51 GB (avg 21 MB/s), during 00:41:42

Comparing with the last committed state from 2015/10/22 19:23:06
Comment: Initial State

Nothing modified
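
The idea of hashing only a few blocks per file, as the super-fast and fast modes do, can be sketched as follows. This is an illustration in Python, not Fim's implementation: the block positions (start, middle, end), the SHA-256 algorithm and the function name sampled_hash are assumptions. Only the block sizes (4 KB and 1 MB) and the count of 3 come from the text above.

```python
import hashlib
import os

SMALL_BLOCK = 4 * 1024        # super-fast mode block size (from the text)
MEDIUM_BLOCK = 1024 * 1024    # fast mode block size (from the text)


def sampled_hash(path, block_size, block_count=3):
    """Hash up to block_count blocks spread over the file.

    Assumption: blocks are read at evenly spaced offsets (start, middle, end).
    Files that fit in block_size * block_count bytes are hashed completely.
    """
    size = os.path.getsize(path)
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        if size <= block_size * block_count:
            digest.update(handle.read())
            return digest.hexdigest()
        last_offset = size - block_size
        for index in range(block_count):
            # Evenly spaced offsets from 0 to last_offset.
            handle.seek(last_offset * index // max(block_count - 1, 1))
            digest.update(handle.read(block_size))
    return digest.hexdigest()
```

A change that falls between the sampled blocks leaves the hash unchanged, which is why these modes can miss modified files.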

8.4. Full status checking

If you want to be completely sure of the status result, you need to run a full hash of all the file contents using the status command without any option.
In my case, this takes almost 3 hours, as for the init command.

8.5. Checking without hashing

There is also the do not hash mode (-n option), which does not hash the file contents. It detects changes faster, but it can only detect changes in file names and file attributes.
This time, in my case, it takes 46 seconds.

photos-videos$ fim st -n
2015/10/22 23:14:53 - Info  - Not hashing file content so thread count forced to 1
2015/10/22 23:14:53 - Info  - Scanning recursively local files, using 'do not hash' mode and 1 thread
2015/10/22 23:14:54 - Info  - SELinux is enabled on this system
2015/10/22 23:15:40 - Info  - Scanned 41467 files (297 GB), during 00:00:46

Comparing with the last committed state from 2015/10/22 19:23:06
Comment: Initial State

Nothing modified

9. Super-fast commit

Fim can commit using the super-fast mode (or the fast mode). In that case, it first scans the repository for modifications using the selected mode.
It then does a full hash of the modified files only, in order to fill in their missing hashes.
A super-fast commit saves a lot of time, but because the first scan only samples the file contents, it can miss some modified files.

This means that even if the global hash mode of your repository is hash all, you can commit using either fast mode or super-fast mode.

Super-fast commit example:

$ fim init -y -m "Create the repository slowly in full mode"
Info  - Scanning recursively local files, using 'full' mode and 2 threads
...

# Do some modifications

$ fim ci -s -y -m "Commit modifications very quickly using super-fast commit"
Info  - Scanning recursively local files, using 'super-fast' mode and 2 threads
...
Info  - Retrieving the missing hash for all the modified files, using 'full' mode and 2 threads
...

# Do some modifications

$ fim ci -f -y -m "Commit modifications quickly using fast commit"
Info  - Scanning recursively local files, using 'fast' mode and 2 threads
...
Info  - Retrieving the missing hash for all the modified files, using 'full' mode and 2 threads
...
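
The two passes of such a commit can be sketched in Python as follows. This is a simplified illustration of the behaviour described above, not Fim's code. The function name super_fast_commit and its parameters are hypothetical, and the hash functions are passed in as callables so that any quick or full hashing strategy can be plugged in.

```python
def super_fast_commit(paths, previous_quick, quick_hash, full_hash):
    """Return the full hashes of the files whose quick hash is new or changed.

    previous_quick maps a path to its quick hash in the last committed State.
    quick_hash and full_hash are callables that take a path.
    """
    # First pass: cheap scan of every file with the selected quick mode.
    modified = [path for path in paths if previous_quick.get(path) != quick_hash(path)]
    # Second pass: only the modified files pay the cost of a full hash.
    return {path: full_hash(path) for path in modified}
```

Files whose quick hash did not change are never fully hashed, which is where the time saving comes from.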

10. Dealing with duplicates

Duplicate files are addressed by Fim in two different ways.

10.1. Duplicates inside a Fim repository

Fim allows you to detect duplicates using the fdup command.

You can also remove them.

10.1.1. Find duplicates

Fim is able to display duplicates contained in a repository using the fdup (find-duplicates) command. It displays the list of duplicate files.
See it in action in Search for duplicate files.

$ fim fdup

If the current State is already committed, you can skip the workspace scanning phase with the -l option:

$ fim fdup -l
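
Conceptually, finding duplicates amounts to grouping files by the hash of their content. Here is a minimal Python sketch of that idea. It is not how fdup is implemented: the function name find_duplicates is hypothetical, SHA-256 is an arbitrary choice, and each file is read fully into memory to keep the example short.

```python
import hashlib
from collections import defaultdict


def find_duplicates(paths):
    """Group paths by the SHA-256 of their content and keep groups of two or more."""
    groups = defaultdict(list)
    for path in paths:
        with open(path, "rb") as handle:
            groups[hashlib.sha256(handle.read()).hexdigest()].append(path)
    # A hash shared by several paths means identical content.
    return [sorted(group) for group in groups.values() if len(group) > 1]
```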

10.1.2. Remove duplicates

You can remove duplicate files:

  • Either interactively:

$ fim rdup

  • Or automatically, preserving the first file in the list:

$ fim rdup -y

In both cases, as with fdup, you can add the -l option to skip the workspace scanning phase and use the current State:

$ fim rdup -l

10.2. Duplicates that are outside

Fim can delete duplicate files contained in another repository.
This is useful when you want to clean up old backups that are no longer synchronized, while being sure not to lose any file that has been modified or added.
It erases all local files that already exist in the master workspace.

For example, backup is a copy of the repository named source:

$ cd backup
$ fim rdup -M ../source

When the workspace to clean is remote, you can just copy the master .fim directory into an empty directory and pass that directory to the -M option of the rdup command.
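
The selection rule behind this operation can be sketched in Python as follows. This is only an illustration under stated assumptions: the names content_hash, list_files and duplicates_of_master are hypothetical, SHA-256 is an arbitrary choice, and the master hashes are passed in as a plain set rather than read from a .fim directory. The sketch lists the candidates and deletes nothing.

```python
import hashlib
import os


def content_hash(path):
    """SHA-256 of the whole file content (algorithm chosen for this sketch)."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def list_files(directory):
    """Yield every file path under directory, skipping any .fim directory."""
    for root, dirs, files in os.walk(directory):
        dirs[:] = [name for name in dirs if name != ".fim"]
        for name in files:
            yield os.path.join(root, name)


def duplicates_of_master(local_dir, master_hashes):
    """Return, sorted, the local files whose content hash is in master_hashes."""
    return sorted(
        os.path.relpath(path, local_dir)
        for path in list_files(local_dir)
        if content_hash(path) in master_hashes
    )
```

Because the match is made on content and not on file names, a file that was moved or renamed in the master is still recognized as a duplicate, while a file whose content differs is kept.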

10.2.1. Simple duplicates removing

Here is a step-by-step example of duplicate removal. For the purpose of this example we use small files.

You can try it yourself by using the samples/remove-duplicates-example.sh script.

Create a source directory with some files in it
~$ mkdir rdup-example
~$ cd rdup-example
~/rdup-example$ mkdir source
~/rdup-example$ cd source/
~/rdup-example/source$ for i in 01 02 03 04 05 06 07 08 09 10 ; do echo "New File $i" > file${i} ; done
~/rdup-example/source$ ls -la
total 48
drwxrwxr-x 2 evrignaud evrignaud 4096 mai   21 08:39 .
drwxrwxr-x 3 evrignaud evrignaud 4096 mai   21 08:39 ..
-rw-rw-r-- 1 evrignaud evrignaud   12 mai   21 08:39 file01
-rw-rw-r-- 1 evrignaud evrignaud   12 mai   21 08:39 file02
-rw-rw-r-- 1 evrignaud evrignaud   12 mai   21 08:39 file03
-rw-rw-r-- 1 evrignaud evrignaud   12 mai   21 08:39 file04
-rw-rw-r-- 1 evrignaud evrignaud   12 mai   21 08:39 file05
-rw-rw-r-- 1 evrignaud evrignaud   12 mai   21 08:39 file06
-rw-rw-r-- 1 evrignaud evrignaud   12 mai   21 08:39 file07
-rw-rw-r-- 1 evrignaud evrignaud   12 mai   21 08:39 file08
-rw-rw-r-- 1 evrignaud evrignaud   12 mai   21 08:39 file09
-rw-rw-r-- 1 evrignaud evrignaud   12 mai   21 08:39 file10
Initialize the Fim repository
~/rdup-example/source$ fim init -y
No comment provided. You are going to initialize your repository using the default comment.
2016/05/21 08:39:12 - Info  - Scanning recursively local files, using 'full' mode and 4 threads
(Hash progress legend for files grouped 10 by 10: # > 1 GB, @ > 200 MB, O > 100 MB, 8 > 50 MB, o > 20 MB, . otherwise)
.
2016/05/21 08:39:12 - Info  - Scanned 10 files (120 bytes), hashed 120 bytes (avg 120 bytes/s), during 00:00:00

Added:            file01
Added:            file02
Added:            file03
Added:            file04
Added:            file05
Added:            file06
Added:            file07
Added:            file08
Added:            file09
Added:            file10

10 added
Repository initialized
Create a backup of this directory
~/rdup-example/source$ cd ..
~/rdup-example$ cp -a source backup
Modify two files in the source directory and move two others
~/rdup-example$ cd source/

~/rdup-example/source$ echo modif1 >> file02
~/rdup-example/source$ echo modif2 >> file04

~/rdup-example/source$ mkdir subdir
~/rdup-example/source$ mv file01 subdir
~/rdup-example/source$ mv file03 subdir

~/rdup-example/source$ ls -la
total 48
drwxrwxr-x 4 evrignaud evrignaud 4096 mai   21 08:39 .
drwxrwxr-x 4 evrignaud evrignaud 4096 mai   21 08:39 ..
-rw-rw-r-- 1 evrignaud evrignaud   19 mai   21 08:39 file02
-rw-rw-r-- 1 evrignaud evrignaud   19 mai   21 08:39 file04
-rw-rw-r-- 1 evrignaud evrignaud   12 mai   21 08:39 file05
-rw-rw-r-- 1 evrignaud evrignaud   12 mai   21 08:39 file06
-rw-rw-r-- 1 evrignaud evrignaud   12 mai   21 08:39 file07
-rw-rw-r-- 1 evrignaud evrignaud   12 mai   21 08:39 file08
-rw-rw-r-- 1 evrignaud evrignaud   12 mai   21 08:39 file09
-rw-rw-r-- 1 evrignaud evrignaud   12 mai   21 08:39 file10
drwxrwxr-x 3 evrignaud evrignaud 4096 mai   21 08:39 .fim
drwxrwxr-x 2 evrignaud evrignaud 4096 mai   21 08:39 subdir

~/rdup-example/source$ ls -la subdir
total 16
drwxrwxr-x 2 evrignaud evrignaud 4096 mai   21 08:39 .
drwxrwxr-x 4 evrignaud evrignaud 4096 mai   21 08:39 ..
-rw-rw-r-- 1 evrignaud evrignaud   12 mai   21 08:39 file01
-rw-rw-r-- 1 evrignaud evrignaud   12 mai   21 08:39 file03
Commit all the modifications
~/rdup-example/source$ fim ci -s -m "Modifications"
2016/05/21 08:39:13 - Info  - Scanning recursively local files, using 'super-fast' mode and 4 threads
(Hash progress legend for files grouped 10 by 10: # > 1 GB, @ > 200 MB, O > 100 MB, 8 > 50 MB, o > 20 MB, . otherwise)
.
2016/05/21 08:39:13 - Info  - Scanned 10 files (134 bytes), hashed 134 bytes (avg 134 bytes/s), during 00:00:00

Comparing with the last committed state from 2016/05/21 08:39:12
Comment: Initial State

Content modified: file02
Content modified: file04
Renamed:          file01 -> subdir/file01
Renamed:          file03 -> subdir/file03

2 content modified, 2 renamed

Do you really want to commit (y/n/A)? y
2016/05/21 08:39:14 - Info  - Retrieving the missing hash for all the modified files, using 'full' mode and 4 threads
2016/05/21 08:39:14 - Info  - Scanned 4 files (62 bytes), hashed 62 bytes (avg 62 bytes/s), during 00:00:00
Remove the duplicates
~/rdup-example/source$ cd ../backup/
~/rdup-example/backup$ fim rdup -M ../source
2016/05/21 08:39:14 - Info  - Searching for duplicate files using the ../source directory as master

2016/05/21 08:39:14 - Info  - Scanning recursively local files, using 'full' mode and 4 threads
(Hash progress legend for files grouped 10 by 10: # > 1 GB, @ > 200 MB, O > 100 MB, 8 > 50 MB, o > 20 MB, . otherwise)
.
2016/05/21 08:39:15 - Info  - Scanned 10 files (120 bytes), hashed 120 bytes (avg 120 bytes/s), during 00:00:00

'file01' is a duplicate of '../source/subdir/file01'
Do you really want to remove it (y/n/A)? y
  'file01' removed
'file03' is a duplicate of '../source/subdir/file03'
Do you really want to remove it (y/n/A)? y
  'file03' removed
'file05' is a duplicate of '../source/file05'
Do you really want to remove it (y/n/A)? A
  'file05' removed
'file06' is a duplicate of '../source/file06'
  'file06' removed
'file07' is a duplicate of '../source/file07'
  'file07' removed
'file08' is a duplicate of '../source/file08'
  'file08' removed
'file09' is a duplicate of '../source/file09'
  'file09' removed
'file10' is a duplicate of '../source/file10'
  'file10' removed

8 duplicate files found. 8 duplicate files removed
Important

When you are prompted with a question ending with (y/n/A), the choices are Yes, No, or All Yes.
'All Yes' replies Yes to all the remaining questions. You can see it in action above.

Only the two modified files remain
~/rdup-example/backup$ ls -la
total 20
drwxrwxr-x 3 evrignaud evrignaud 4096 mai   21 08:39 .
drwxrwxr-x 4 evrignaud evrignaud 4096 mai   21 08:39 ..
-rw-rw-r-- 1 evrignaud evrignaud   12 mai   21 08:39 file02
-rw-rw-r-- 1 evrignaud evrignaud   12 mai   21 08:39 file04
drwxrwxr-x 3 evrignaud evrignaud 4096 mai   21 08:39 .fim

10.2.2. Complex duplicates removing

Let's say that you have:

  • a directory with a big file tree that we will call the source location.

  • other locations that contain some files that were copied long ago from this source location. We will call one of those locations the backup location.

Now you want to clean up the backup location by removing the files that are identical to the ones in the source location. To find duplicates in the backup location we will use the hashes stored in the source .fim directory. We will call master location the directory that contains this .fim directory.
Most of the time the master location is the source location.
If the source location is not reachable from the backup location, you just need to put a copy of the source .fim directory near the backup location.

Note

The backup location can also contain its own .fim directory. It will be ignored.

Step by step
  • Go into the source location and ensure that all the hashes are up to date:

$ cd <source location>
$ fim ci -y -m "Content added"
  • If the backup location cannot reach the source location (so master location is not the source location), copy the .fim directory that is in the source location into a place near the backup location.

$ cd <somewhere near the backup location>
$ mkdir <master location>
$ scp -rp <remote host>:<source location>/.fim <master location>
Important

The copy of the source .fim directory can’t be placed directly in the root folder of the backup location.

  • Run the remove duplicates command. For this, go in the backup location.

$ cd <backup location>
$ fim rdup -M <master location>

11. File permissions management

Checking the integrity of data files is good, but we also need to ensure that file permissions are not compromised.

Fim is able to save and restore file permissions.
To do so, it stores for each file:

  • The DAC information:

    • On Linux and Mac OS X, file permissions such as rwxrwxrwx are stored.

    • On Windows, the Archive, Hidden, ReadOnly and System attributes are stored.

  • The MAC information:
    Currently the SELinux label is stored, if supported by the OS.

You can check file permissions using the status command. A quick result can be produced using the do not hash mode (-n option), but in this case Fim will not be able to detect permission changes for files that have been renamed.
If some permissions have changed and you want to restore them, use the rfa command.

When you use the same Fim repository with different operating systems, file permissions that are not supported are ignored.
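
To illustrate the rwxrwxrwx notation, here is a small Python sketch that reads the POSIX permissions of a file in that form. It is not part of Fim, and the function name posix_permissions is hypothetical. It relies only on the standard os and stat modules.

```python
import os
import stat


def posix_permissions(path):
    """Return the permission bits of path as an 'rwxrwxrwx'-style string."""
    mode = os.stat(path).st_mode
    # stat.filemode() yields e.g. '-rw-rw-r--'; drop the leading file-type character.
    return stat.filemode(mode)[1:]
```

Comparing this string with the one stored at commit time is enough to tell whether the permissions of a file have changed.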

12. Hardware corruption detection

Fim is able to detect changes likely caused by hardware corruption or a filesystem bug.
A file is suspected to be corrupted when its content has changed but its timestamps (creation time and modification time) have not.
To run hardware corruption detection, use the dcor command.

The status command can also be used to check if some file content has changed.
The difference with the dcor command is that only corrupted files are displayed. Files that you may have added or modified are not listed in the result.
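
The detection rule stated above can be written as a short predicate. The following Python sketch is an illustration only: the FileState class and the function is_suspected_corrupted are hypothetical names, and the three fields mirror the hash and timestamps mentioned in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileState:
    full_hash: str
    creation_time: int
    last_modified: int


def is_suspected_corrupted(previous, current):
    """True when the content differs while both timestamps are unchanged."""
    return (
        previous.full_hash != current.full_hash
        and previous.creation_time == current.creation_time
        and previous.last_modified == current.last_modified
    )
```

A normal edit changes the modification time as well as the content, so it does not satisfy this rule.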

12.1. False positive

The dcor command can produce false positives.

For example, here is a simulation of hardware corruption based on the repository generated for the Simple example:

  • Change the content of a file

simple-example$ echo bar >> file05
  • Reset the file attributes using the fim rfa command

simple-example$ fim rfa -y
You are going to reset files attributes based on the last committed State done 2015/10/23 07:16:19
Comment: All modifications

2015/10/23 09:18:21 - Info  - SELinux is enabled on this system
Set creation Time: file05 	2015/10/23 09:18:14 -> 2015/10/23 07:16:12
Set last modified: file05 	2015/10/23 09:18:14 -> 2015/10/23 07:16:12

2015/10/23 09:18:21 - Info  - The attributes of 1 file have been reset
  • Now file05 is suspected to be corrupted

simple-example$ fim dcor
2015/10/23 09:19:26 - Info  - Scanning recursively local files, using 'full' mode and 2 threads
(Hash progress legend for files grouped 10 by 10: # > 1 GB, @ > 200 MB, O > 100 MB, 8 > 50 MB, o > 20 MB, . otherwise)
2015/10/23 09:19:27 - Info  - SELinux is enabled on this system
.
2015/10/23 09:19:27 - Info  - Scanned 14 files (164 bytes), hashed 164 bytes (avg 164 bytes/s), during 00:00:00

Comparing with the last committed state from 2015/10/23 07:16:19
Comment: All modifications

Corrupted?:       file05

1 corrupted

13. FAQ

13.1. Run Fim commands from a sub-directory

You can run all the Fim commands from any sub-directory inside a Fim repository. You can see it in action in the Simple example.

Doing this allows you to:

  • Quickly find the modifications done in this specific sub-directory. You will hash only the files contained inside and not the complete file tree

  • Quickly commit the modifications done in this sub-directory

  • Quickly find the duplicate files contained in this sub-directory and remove them

  • Quickly reset the attributes of files contained in this sub-directory

All the other commands will run as if you were at the top of the Fim repository.

13.2. Ignore some difference during State comparison

You may want to get the status of a repository while ignoring some kinds of differences. You can ignore:

  • attrs: File attributes

  • dates: Modification dates

  • renamed: Renamed files

  • all: All of the above

You can specify multiple kinds of differences to ignore, separated by commas.

For example:

$ fim st -i attrs,dates,renamed

13.3. Ignoring files or directories

You can specify files or directories that you want Fim to ignore. To do so, add a .fimignore file in one of the directories contained in the Fim repository.
You can also set global ignores by creating a .fimignore file in the user home directory.

Each line of the .fimignore file specifies a pattern. The pattern is mainly a file or directory name.
Use wildcards in order to match many of them. For example *.mp3 will match all the files ending with .mp3.
A leading ** followed by a slash means match in all directories.
For example, **/foo matches a file or a directory named foo anywhere below the directory of the .fimignore file that contains this pattern.
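
The pattern rules above can be approximated with the following Python sketch. It is an interpretation, not Fim's matcher, and the function name is_ignored is hypothetical. It assumes that a pattern with a leading **/ is tested against every component of the path, and that any other pattern only applies to entries located directly in the directory that holds the .fimignore file.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_ignored(relative_path, patterns):
    """Return True if relative_path matches one of the patterns.

    relative_path is relative to the directory that holds the .fimignore file.
    """
    parts = PurePosixPath(relative_path).parts
    for pattern in patterns:
        if pattern.startswith("**/"):
            # Assumption: '**/name' matches a file or directory at any depth.
            if any(fnmatchcase(part, pattern[3:]) for part in parts):
                return True
        elif parts and fnmatchcase(parts[0], pattern):
            # Assumption: other patterns match entries of this directory only.
            return True
    return False
```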

13.4. Changing default hash mode

If you never want to hash the complete content of your files you can set a global hash mode that will indicate the maximum hash mode you want to use for this repository. You can specify this to the init command:

  • -f: Sets the maximum hash mode to fast. You will be able to use -f, -s or -n afterwards

  • -s: Sets the maximum hash mode to super-fast. You will be able to use -s or -n afterwards

  • -n: Never hash anything. You won’t be able to use any other hash mode afterwards

13.4.1. Example

Initialize the Fim repository specifying the global hash mode.

$ fim init -f

It sets the global hash mode of the complete repository to fast mode.
All the Fim commands that you run afterwards will use the fast hash mode by default (or a lighter mode if specified) and you won’t be able to hash the full file contents.

After the init command that we run in our example, you will be able to run the following commands:

$ fim st    # will run using -f

$ fim st -s

$ fim st -n
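
The restriction imposed by the global hash mode can be summarized with a small Python sketch. The mode names are the ones printed in the Fim logs. The list ordering and the function name allowed_modes are illustrative only.

```python
# Hash modes ordered from the lightest to the most thorough.
MODES = ["do not hash", "super-fast", "fast", "full"]


def allowed_modes(global_mode):
    """Return the modes usable in a repository whose global hash mode is global_mode."""
    return MODES[: MODES.index(global_mode) + 1]
```

For a repository initialized with -f, allowed_modes("fast") returns the do not hash, super-fast and fast modes, which matches the three commands listed above.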

13.5. Hash files in multi-thread

Fim hashes files using several threads. This takes advantage of the computer resources and maximizes the overall performance of file hashing.
By default, the number of threads is dynamic and depends on the disk throughput. This automatically uses more threads with an SSD and fewer with a classical HDD.
You can specify the number of threads to be used for file hashing with the -t option.
The best value depends on the kind of disk you have. The more throughput you have, the more threads you may use.
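
As an illustration of hashing files with several threads, here is a minimal Python sketch. It uses a fixed thread count, unlike the dynamic behaviour described above, and the names hash_file and hash_files as well as the SHA-256 algorithm are assumptions made for the example.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_file(path, chunk_size=1024 * 1024):
    """Hash one file chunk by chunk so that large files do not fill the memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_files(paths, thread_count=2):
    """Hash all the paths with a pool of thread_count worker threads."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        # pool.map keeps the results in the same order as the input paths.
        return dict(zip(paths, pool.map(hash_file, paths)))
```

Threads help here because most of the time is spent waiting for the disk, so several reads can be in flight at once.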

13.6. State integrity

Every State file contains a hash of the State content.
If something is modified in the State file, the hash of the State content will change and the State will be reported as corrupted.
Fim won’t use a corrupted State.
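
The principle of a self-checking State can be sketched in Python as follows. This is not Fim's actual algorithm: the canonical JSON serialization, the SHA-256 hash and the function names compute_state_hash and is_corrupted are all assumptions. Only the stateHash field name comes from the State extract shown earlier.

```python
import hashlib
import json


def compute_state_hash(state):
    """Hash the State content, excluding the stored hash itself.

    Assumption: the content is serialized as JSON with sorted keys.
    """
    content = {key: value for key, value in state.items() if key != "stateHash"}
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_corrupted(state):
    """True when the stored hash no longer matches the State content."""
    return state.get("stateHash") != compute_state_hash(state)
```

Any edit to the content, even a changed comment, produces a different hash and makes the check fail.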

13.7. Cross platform compatibility

The same Fim repository can be used on Linux, Mac OS X and Windows.
State content is normalized, so the same State content can be loaded on all the supported operating systems.

14. Fim requirements

This tool is written in Java.

Fim is compiled using Java 8 and requires Java 8 to run. You can download Java 8 from the Java SE Downloads page.
You need at least the Java 8 Standard Edition JRE. You can also use OpenJDK 8.

15. Supported OS

Fim can be used on Linux, Mac OS X and Windows.

16. They talked about it

Verify the integrity of many files
Fim (File Integrity Manager) is a really great tool for managing the integrity of many files.

Fim (File Integrity Manager) is an open source tool which allows you to check the integrity of all your files after having handled them in bulk.

17. About

Created by Etienne Vrignaud (Twitter: @evrignaud)

18. License

This project is released under the GPLv3 license. For more details, take a look at the LICENSE file in the source.

Basically, that allows you to use all or part of the project for your own business.