
Thread: Remove Duplicate lines in a text File

  1. #1
    mattkw80 is offline VB.NET Forum Enthusiast
    .NET Framework
    .NET 4.0
    Join Date
    Jan 2008
    Location
    Ontario, Canada
    Posts
    49
    Reputation
    69

    Remove Duplicate lines in a text File

    Hey Everyone,

    I'm having trouble removing duplicate lines from a text file (exact duplicates only).

    If I have a text file called animals.txt, how can I use a StreamWriter to loop through the file and erase any duplicate lines?

    For example, if the text file says this...

    cat
    cat
    cat
    dog
    bird

    Then I would want the output to be:

    cat
    dog
    bird

    With all line spaces also left over.

    Any help is greatly appreciated.

    Matt

  2. #2
    jmcilhinney is offline VB.NET Forum Moderator
    .NET Framework
    .NET 4.0
    Join Date
    Aug 2004
    Location
    Sydney, Australia
    Posts
    11,504
    Reputation
    1553
    You don't need to use a StreamReader. There are easier ways, which include LINQ:
    Code:
    Dim lines As String() = IO.File.ReadAllLines("file path here")
    
    lines = lines.Distinct().ToArray()
    IO.File.WriteAllLines("file path here", lines)
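    To see what Distinct() does without touching a file, here is a minimal sketch that runs it on the sample lines from the question. The module name DistinctDemo is just a placeholder, and the array stands in for the result of ReadAllLines. Note that the default comparison is exact and case-sensitive, so "Cat" and "cat" would count as different lines.

```vb
Imports System
Imports System.Linq

Module DistinctDemo
    Sub Main()
        ' Sample data taken from the animals.txt example in the question.
        Dim lines As String() = {"cat", "cat", "cat", "dog", "bird"}

        ' Distinct() drops repeated values using the default (exact, case-sensitive) string comparison.
        Dim distinctLines As String() = lines.Distinct().ToArray()

        ' Three unique lines remain.
        Console.WriteLine(distinctLines.Length)

        For Each line As String In distinctLines
            Console.WriteLine(line)
        Next
    End Sub
End Module
```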

  3. #3
    mattkw80 is offline VB.NET Forum Enthusiast
    Wow, that's incredible, you solved it in 3 lines of code. I googled this for hours, and nobody had a clear answer, or an answer under 20 lines of code.

    I know nothing of LINQ, but it sure is a time saver.

    Thank you so much.

  4. #4
    jmcilhinney is offline VB.NET Forum Moderator
    The LINQ part is the '.Distinct().ToArray()'. Without LINQ you could do it like this:
    Code:
    Dim lines As String() = IO.File.ReadAllLines("file path here")
    Dim distinctLines As New HashSet(Of String)
    
    For Each line As String In lines
        distinctLines.Add(line)
    Next
    
    Array.Resize(lines, distinctLines.Count)
    
    distinctLines.CopyTo(lines)
    IO.File.WriteAllLines("file path here", lines)
    The generic HashSet class was added in .NET 3.5. To do the equivalent without the HashSet:
    Code:
    Dim lines As String() = IO.File.ReadAllLines("file path here")
    Dim distinctLines As New List(Of String)
    
    For Each line As String In lines
        If Not distinctLines.Contains(line) Then
            distinctLines.Add(line)
        End If
    Next
    
    lines = distinctLines.ToArray()
    IO.File.WriteAllLines("file path here", lines)
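    One thing to keep in mind with the List version is that Contains rescans the list for every line, which can get slow on large files, while a HashSet on its own does not promise any particular order when you copy it out. A possible middle ground, sketched here under the assumption that you want to keep the first occurrence of each line in its original position, is to use the HashSet only for the "seen before?" check and a List for the output. The names OrderedDedupe and RemoveDuplicates are made up for illustration.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO

Module OrderedDedupe
    ' Hypothetical helper: returns the lines with later duplicates dropped,
    ' keeping each line at the position where it was first seen.
    Function RemoveDuplicates(lines As IEnumerable(Of String)) As List(Of String)
        Dim seen As New HashSet(Of String)()
        Dim result As New List(Of String)()

        For Each line As String In lines
            ' HashSet.Add returns False when the value is already in the set.
            If seen.Add(line) Then
                result.Add(line)
            End If
        Next

        Return result
    End Function

    Sub Main()
        ' Placeholder path, as in the snippets above.
        Dim filePath As String = "file path here"

        Dim lines As String() = File.ReadAllLines(filePath)
        File.WriteAllLines(filePath, RemoveDuplicates(lines).ToArray())
    End Sub
End Module
```

    This relies on the fact that HashSet(Of T).Add reports whether the item was actually added, so no separate Contains call is needed.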

  5. #5
    JohnH is offline VB.NET Forum Moderator
    .NET Framework
    .NET 4.0
    Join Date
    Dec 2005
    Location
    Norway
    Posts
    14,225
    Reputation
    2370
    Quote Originally Posted by jmcilhinney
    Without LINQ you could do it like this:
    Code:
    Dim distinctLines As New HashSet(Of String)
    
    For Each line As String In lines
        distinctLines.Add(line)
    Next
    
    Array.Resize(lines, distinctLines.Count)
    
    distinctLines.CopyTo(lines)
    Or, more simply, use the constructors to fill the collections:
    Code:
    Dim distinctLines As New HashSet(Of String)(lines)
    
    lines = New List(Of String)(distinctLines).ToArray
    Note that this ToArray method is not the Enumerable.ToArray (LINQ) method used in the first reply, but one specific to List(Of T).
