synchronizing folders

JuggaloBrotha

Unless someone knows of a way to do this already in Windows, what's the best method (using a BackgroundWorker) for checking all of the files and subfolders in a folder against another folder on an external USB hard drive?

I'd like it to copy only the files that have changed since the last backup. It'd also be nice if I could right-click on a folder and select my program (I can easily google this part, it's a common thing), and have it pop up a window to get the destination, since the source is known from the Windows context menu. It would then copy only what has changed since the last backup, overwriting the existing files and folders.
 
Hello.

Something like this should work:
VB.NET:
        ' Requires "Imports System.IO" at the top of the file.
        ' Recursively copies files that are missing or older in targetFolder.
        Public Sub checkFolder(ByVal sourceFolder As String, ByVal targetFolder As String)
            If Directory.Exists(sourceFolder) Then
                If Not Directory.Exists(targetFolder) Then Directory.CreateDirectory(targetFolder)

                ' GetDirectories returns full paths, so only the last segment is appended to the target.
                For Each folder As String In Directory.GetDirectories(sourceFolder)
                    Me.checkFolder(folder, Path.Combine(targetFolder, Path.GetFileName(folder)))
                Next

                ' GetFiles also returns full paths, so the bare file name is used for the target.
                For Each sourceFile As String In Directory.GetFiles(sourceFolder)
                    Dim targetFile As String = Path.Combine(targetFolder, Path.GetFileName(sourceFile))

                    ' Copy when the target is missing or older than the source.
                    If Not File.Exists(targetFile) OrElse _
                       File.GetLastWriteTime(sourceFile) > File.GetLastWriteTime(targetFile) Then
                        File.Copy(sourceFile, targetFile, True)
                    End If
                Next
            End If
        End Sub

Bobby
 
Yeah, I already know the actual code to do that. I'm just wondering which conditions to check: Date Modified and the Archive attribute, just one or the other, or some combination of these with something else.

I'm also wondering what would be the best way to determine which files on the destination to delete (because they don't exist in the source anymore). I'm not sure how to approach that one conceptually yet.
 
What I do is load two Dictionary(Of String, FileInfo) instances, one for source files and one for target files. The string key I use is the relative path, which is the same for source and target. Then I loop over all source keys: if a key doesn't exist in the target, or the file has changed (1. the size is different, or 2. the archive flag is set), the file is copied (async); finally the key is removed from the target dictionary. By the end of the loop the target dictionary contains only files that don't exist in the source, so they are deleted. The copy operation can optionally clear the archive flag after the file is copied.

The reason I primarily check file size is to more reliably catch changed files that for some reason may not have the archive flag set; for example, in a network environment other backup systems or users may have cleared it.
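
For anyone who wants to see that two-dictionary approach spelled out, here is a rough sketch of how it could look. It is not the code described above: the names FolderSync, LoadFiles and Synchronize are made up for illustration, the copy is synchronous rather than async, and clearing the archive flag is left as an optional parameter.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Public Module FolderSync
    ' Hypothetical helper: maps each file's path relative to root onto its FileInfo.
    Public Function LoadFiles(ByVal root As String) As Dictionary(Of String, FileInfo)
        Dim files As New Dictionary(Of String, FileInfo)(StringComparer.OrdinalIgnoreCase)
        Dim prefix As String = root.TrimEnd(Path.DirectorySeparatorChar) & Path.DirectorySeparatorChar
        For Each fullPath As String In Directory.GetFiles(root, "*", SearchOption.AllDirectories)
            files(fullPath.Substring(prefix.Length)) = New FileInfo(fullPath)
        Next
        Return files
    End Function

    ' Hypothetical entry point: makes targetRoot mirror sourceRoot.
    Public Sub Synchronize(ByVal sourceRoot As String, ByVal targetRoot As String, _
                           Optional ByVal clearArchiveFlag As Boolean = False)
        Directory.CreateDirectory(targetRoot)
        Dim source As Dictionary(Of String, FileInfo) = LoadFiles(sourceRoot)
        Dim target As Dictionary(Of String, FileInfo) = LoadFiles(targetRoot)

        For Each entry As KeyValuePair(Of String, FileInfo) In source
            Dim src As FileInfo = entry.Value
            Dim existing As FileInfo = Nothing

            ' Changed = missing in target, different size, or archive flag set on the source.
            Dim changed As Boolean = Not target.TryGetValue(entry.Key, existing) _
                OrElse existing.Length <> src.Length _
                OrElse (src.Attributes And FileAttributes.Archive) <> 0

            If changed Then
                Dim destination As String = Path.Combine(targetRoot, entry.Key)
                Directory.CreateDirectory(Path.GetDirectoryName(destination))
                src.CopyTo(destination, True)
                If clearArchiveFlag Then
                    src.Attributes = src.Attributes And Not FileAttributes.Archive
                End If
            End If

            ' Whatever is still in target after the loop has no counterpart in source.
            target.Remove(entry.Key)
        Next

        For Each orphan As FileInfo In target.Values
            orphan.Delete()
        Next
    End Sub
End Module
```

Calling FolderSync.Synchronize("C:\Data", "E:\Backup") (both paths are placeholders) would copy new and changed files and delete target files that no longer exist in the source. Empty folders left behind on the target are not cleaned up in this sketch.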
 
I installed Live Mesh. It's quite a bit easier to use a wheel that's already been invented, though I do have some complaints about how much it thrashes the hard disk when starting up.
 
Another possible solution is to use System.IO.FileSystemWatcher and maintain a single Dictionary(Of String, Integer), where the key is the path that changed and the value is a sort of enumeration (0 - Created, 1 - Deleted, 2 - Renamed, etc). You could run that as a Service that uses configurable "Source" paths (which could potentially be set up through the context menu), have it watch for a USB device to be inserted, and then process the list of changes. Come to think of it, I would probably use a memory-mapped file, or maybe manual record syncing to disk, to store the change information so it stays persistent in case of a system failure or restart. It sounds more complicated than it would be in code, probably about a week of free time is all. That would be a fun project... I might just have to do it.
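
A bare-bones sketch of the watcher part, in case it helps. The class and enum names here (ChangeTracker, ChangeKind, TakeChanges) are invented for the example, the value 3 for Changed is my own guess at the "etc", and I've used an Enum instead of raw Integers. Persisting the log to disk and detecting the USB drive are left out.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

' Hypothetical enumeration; only the first three values come from the post above.
Public Enum ChangeKind
    Created = 0
    Deleted = 1
    Renamed = 2
    Changed = 3
End Enum

Public Class ChangeTracker
    Implements IDisposable

    Private ReadOnly _changes As New Dictionary(Of String, ChangeKind)(StringComparer.OrdinalIgnoreCase)
    Private ReadOnly _watcher As FileSystemWatcher

    Public Sub New(ByVal sourcePath As String)
        _watcher = New FileSystemWatcher(sourcePath)
        _watcher.IncludeSubdirectories = True
        AddHandler _watcher.Created, AddressOf OnChanged
        AddHandler _watcher.Changed, AddressOf OnChanged
        AddHandler _watcher.Deleted, AddressOf OnChanged
        AddHandler _watcher.Renamed, AddressOf OnRenamed
        _watcher.EnableRaisingEvents = True
    End Sub

    Private Sub OnChanged(ByVal sender As Object, ByVal e As FileSystemEventArgs)
        Select Case e.ChangeType
            Case WatcherChangeTypes.Created
                Record(e.FullPath, ChangeKind.Created)
            Case WatcherChangeTypes.Deleted
                Record(e.FullPath, ChangeKind.Deleted)
            Case Else
                Record(e.FullPath, ChangeKind.Changed)
        End Select
    End Sub

    Private Sub OnRenamed(ByVal sender As Object, ByVal e As RenamedEventArgs)
        ' Only the new path is kept here; a real tool would also need e.OldFullPath.
        Record(e.FullPath, ChangeKind.Renamed)
    End Sub

    ' Events arrive on worker threads, so access to the dictionary is locked.
    Public Sub Record(ByVal fullPath As String, ByVal kind As ChangeKind)
        SyncLock _changes
            _changes(fullPath) = kind
        End SyncLock
    End Sub

    ' Returns a copy of the pending changes and clears the log.
    Public Function TakeChanges() As Dictionary(Of String, ChangeKind)
        SyncLock _changes
            Dim snapshot As New Dictionary(Of String, ChangeKind)(_changes, StringComparer.OrdinalIgnoreCase)
            _changes.Clear()
            Return snapshot
        End SyncLock
    End Function

    Public Sub Dispose() Implements IDisposable.Dispose
        _watcher.Dispose()
    End Sub
End Class
```

The idea would be to call TakeChanges() when the backup drive shows up and process only those paths. Note that only the latest change per path is kept, so a file that is created and then deleted ends up recorded as Deleted.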
 
I'm planning on making two versions of the program: one that you install and that runs as a Windows service, and, for my own purposes, a standalone app that does it all manually.

I was wondering: would it be better to calculate the CRC32 or the MD5 along with the last modified date and the file size, or should I just do the CRC32 or MD5 and call it good? As for which hash is better, I know CRC32 is faster to calculate than MD5, but I may go with MD5 because collisions are less likely.
 
The fastest way to compare two files of equal size (forget dates) is just to read them sequentially in chunks and compare byte for byte.

CRC or MD5 means you have to read the whole file and calculate the hash, but the files might differ in the first byte, and you would have found that out almost instantly rather than reading all X megabytes for the hash.
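
Something along these lines would do the chunked compare. It's only a sketch: FileComparer, FilesAreEqual and ReadFull are names picked for the example, and the 64 KB chunk size is an arbitrary choice.

```vbnet
Imports System.IO

Public Module FileComparer
    Private Const ChunkSize As Integer = 64 * 1024

    ' Returns True when both files have the same length and the same bytes.
    Public Function FilesAreEqual(ByVal pathA As String, ByVal pathB As String) As Boolean
        If New FileInfo(pathA).Length <> New FileInfo(pathB).Length Then Return False

        Dim bufferA(ChunkSize - 1) As Byte
        Dim bufferB(ChunkSize - 1) As Byte

        Using a As FileStream = File.OpenRead(pathA), b As FileStream = File.OpenRead(pathB)
            Do
                Dim readA As Integer = ReadFull(a, bufferA)
                Dim readB As Integer = ReadFull(b, bufferB)
                If readA <> readB Then Return False
                If readA = 0 Then Return True ' both streams ended without a mismatch

                ' Stop at the first differing byte instead of reading the rest.
                For i As Integer = 0 To readA - 1
                    If bufferA(i) <> bufferB(i) Then Return False
                Next
            Loop
        End Using
    End Function

    ' Stream.Read may return fewer bytes than asked for, so keep reading until
    ' the buffer is full or the stream ends.
    Private Function ReadFull(ByVal s As Stream, ByVal target As Byte()) As Integer
        Dim total As Integer = 0
        While total < target.Length
            Dim n As Integer = s.Read(target, total, target.Length - total)
            If n = 0 Then Exit While
            total += n
        End While
        Return total
    End Function
End Module
```

If the files differ early on, this returns after the first chunk, which is the advantage over hashing the whole file.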

If you're doing an N-way compare, you start with all files in one bucket and progressively split them off into separate buckets: files whose bytes are the same so far stay together in a bucket, and files that differ go into different ones.

Thus, these files:

AABBCCCC
AABBDDDD
ABBBBBBBB
ABBBBBBBB

All 4 files start in one bucket.
After the 2nd byte they are in 2 buckets.
After the 5th byte, the top 2 files are no longer equal, and they were the only 2 files left in their bucket, so the whole bucket is dropped.
That leaves only 2 files that you must read to the end to determine equality.
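
To make the bucket idea concrete, here is one way it could be sketched. This is not the implementation mentioned in this thread, just an illustration: DuplicateFinder and FindIdenticalFiles are hypothetical names, files are first grouped by size, and buckets are split per chunk (keyed by a Base64 string of the chunk) rather than per single byte. It also keeps every file in a size group open at once, which a real tool would want to limit.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Public Module DuplicateFinder
    Private Const ChunkSize As Integer = 4096

    ' Returns groups of paths whose contents are byte-for-byte identical.
    Public Function FindIdenticalFiles(ByVal paths As IEnumerable(Of String)) As List(Of List(Of String))
        Dim bySize As New Dictionary(Of Long, List(Of String))
        For Each p As String In paths
            Dim size As Long = New FileInfo(p).Length
            If Not bySize.ContainsKey(size) Then bySize(size) = New List(Of String)
            bySize(size).Add(p)
        Next

        Dim result As New List(Of List(Of String))
        For Each sameSize As List(Of String) In bySize.Values
            If sameSize.Count > 1 Then result.AddRange(SplitByContent(sameSize))
        Next
        Return result
    End Function

    Private Function SplitByContent(ByVal sameSize As List(Of String)) As List(Of List(Of String))
        Dim identical As New List(Of List(Of String))
        Dim streams As New Dictionary(Of String, FileStream)
        Try
            For Each p As String In sameSize
                streams(p) = File.OpenRead(p)
            Next

            Dim pending As New Queue(Of List(Of String))
            pending.Enqueue(sameSize)
            Dim chunk(ChunkSize - 1) As Byte

            While pending.Count > 0
                Dim bucket As List(Of String) = pending.Dequeue()
                Dim byChunk As New Dictionary(Of String, List(Of String))
                Dim atEnd As Boolean = False

                ' Read the next chunk of every file in the bucket and regroup by its content.
                For Each p As String In bucket
                    Dim n As Integer = ReadFull(streams(p), chunk)
                    If n = 0 Then atEnd = True
                    Dim chunkKey As String = Convert.ToBase64String(chunk, 0, n)
                    If Not byChunk.ContainsKey(chunkKey) Then byChunk(chunkKey) = New List(Of String)
                    byChunk(chunkKey).Add(p)
                Next

                If atEnd Then
                    ' Same size and equal up to the end: the bucket holds identical files.
                    identical.Add(bucket)
                Else
                    ' Buckets that shrink to a single file are dropped.
                    For Each subBucket As List(Of String) In byChunk.Values
                        If subBucket.Count > 1 Then pending.Enqueue(subBucket)
                    Next
                End If
            End While
        Finally
            For Each s As FileStream In streams.Values
                s.Dispose()
            Next
        End Try
        Return identical
    End Function

    ' Fills the buffer unless the stream ends first; returns the number of bytes read.
    Private Function ReadFull(ByVal s As Stream, ByVal target As Byte()) As Integer
        Dim total As Integer = 0
        While total < target.Length
            Dim n As Integer = s.Read(target, total, target.Length - total)
            If n = 0 Then Exit While
            total += n
        End While
        Return total
    End Function
End Module
```

A file that differs from all the others falls out of its bucket as soon as its chunk differs and is never read again.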

I implemented an algorithm for this a while ago.

Note that if you're doing this over a network, MD5ing a chunk or the whole file instead of comparing byte by byte, and then applying the same algorithm, might be better. It depends on network speed and file size.
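
For the hashing route, a minimal sketch using the framework's MD5 class could look like this. FileHasher and Md5OfFile are names made up for the example; it hashes the whole file, and hashing per chunk would follow the same pattern with a byte array instead of the stream.

```vbnet
Imports System
Imports System.IO
Imports System.Security.Cryptography

Public Module FileHasher
    ' Returns the MD5 of the file as a lowercase hex string.
    Public Function Md5OfFile(ByVal filePath As String) As String
        Using hasher As MD5 = MD5.Create()
            Using stream As FileStream = File.OpenRead(filePath)
                Dim hash As Byte() = hasher.ComputeHash(stream)
                Return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant()
            End Using
        End Using
    End Function
End Module
```

Each machine would compute the hash locally, so only the short hex strings need to cross the network to decide whether a file has to be copied.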
 
