Thread: scrape html table - question...

  1. #1
    TeachMe, VB.NET Forum Newbie
    .NET Framework: .NET 3.5
    Join Date: Jun 2012
    Posts: 14
    Reputation: 58

    Say I have a table that always contains random data (various product titles, prices, and ratings in no particular order). I noticed that the "Price:" or "Rating:" column sometimes has no value. So when I scrape multiple items into an array and send each column to a ListView, the data falls out of sync whenever a value is missing from, say, the "Price" column.
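
    To make the drift concrete, here is a minimal sketch with made-up data. The module and function names (DesyncDemo, PairByIndex) are hypothetical and only illustrate the effect: when each column is collected into its own list, a missing value leaves no gap, so every later value moves up one row.

```vb
Imports System
Imports System.Collections.Generic

Module DesyncDemo
    ' Hypothetical illustration, not part of the original project.
    ' Pairs the i-th title with the i-th price, which is what happens
    ' when each column is scraped into its own list and joined by index.
    Public Function PairByIndex(titles As List(Of String), prices As List(Of String)) As List(Of String)
        Dim paired As New List(Of String)
        ' Stop at the shorter list so the loop never reads past the end.
        For i As Integer = 0 To Math.Min(titles.Count, prices.Count) - 1
            paired.Add(titles(i) & " -> " & prices(i))
        Next
        Return paired
    End Function
End Module
```

    With titles A, B, C and only two prices (B has none), B ends up paired with the price that belongs to C, and C gets no row at all.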

    Here is an example of an HTML table that I'm trying to scrape data from. Notice that row #5 is missing the price. This is what breaks the syncing of the data as it's added to the ListView in VB.NET:

    HTML Code:
    <html>
    <head>
    <style>
    
    table {
    margin:auto;
    margin-top:50px;
    font-family: arial, sans-serif;
    border-collapse: collapse;
    width: 40%;
    }
    
    td{
    border: 3px solid #000;
    text-align: left;
    padding: 3px;
    }
    
    th {
    border: 3px solid #000;
    background-color:gold;
    text-align: left;
    padding: 3px;
    }
    
    tr:nth-child(even) {
    background-color: #dddddd;
    }
    </style>
    </head>
    <body>
        <table>
            <tr><th>#</th><th>Product Title:</th><th width="60">Price:</th><th width="60">Rating:</th></tr>
            <tr><td width="20">1</td><td class="ProductTitle">Minera Natural Dead Sea Salt, 5lbs Bulk Bag - Fine Grain</td><td class="Price">$20.00</td><td class="Rating">9/10</td></tr>
            <tr><td width="20">2</td><td class="ProductTitle">Minera Dead Sea Salt 2lb Bag Fine Grain, 100% Pure Mineral Salt Treatment</td><td class="Price">$9.99</td><td class="Rating">6/10</td></tr>
            <tr><td width="20">3</td><td class="ProductTitle">Minera Pure Dead Sea Salt 10lbs Fine Grain</td><td class="Price">$15.95</td><td>8/10</td></tr>
            <tr><td width="20">4</td><td class="ProductTitle">Dead Sea Warehouse - Amazing Minerals Dead Sea Bath Salts, Temporary Relief from...</td><td class="Price">$16.00</td><td class="Rating">5/10</td></tr>
            <tr><td width="20">5</td><td class="ProductTitle">Natural Planet Dead Sea Salt, 5lbs Fine Grain - 100% Pure Bath Salt - For Psoriasis...</td><td></td><td class="Rating">5/10</td></tr>
            <tr><td width="20">6</td><td class="ProductTitle">Art Naturals Himalayan Salt Body Scrub 20oz -Deep Cleansing Exfoliator With Shea...</td><td class="Price">$13.95</td><td class="Rating">7/10</td></tr>
            <tr><td width="20">7</td><td class="ProductTitle">Dead Sea Salt 2.2lb try for Psoriasis, Eczema, and Dermatitis (1 x Resealable...</td><td class="Price">$9.99</td><td class="Rating">4/10</td></tr>
            <tr><td width="20">8</td><td class="ProductTitle">Premier Dead Sea Aromatherapy Mineral Body Treatment, Silver, Salt Scrub, 425...</td><td class="Price">$15.95</td><td class="Rating">8/10</td></tr>
            <tr><td width="20">9</td><td class="ProductTitle">Dead Sea Warehouse - Amazing Minerals Dead Sea Bath Salts, Temporary Relief from...</td><td class="Price">$16.00</td><td class="Rating">6/10</td></tr>
            <tr><td width="20">10</td><td>Natural Planet Dead Sea Salt, 50lbs Fine Grain - 100% Pure Bath Salt - For Psoriasis...</td><td class="Price">$90.25</td><td class="Rating">10/10</td></tr>
    
        </table>
    </body>
    </html>

    Here is the VB.NET code I'm using to collect data from this table:


    Code:
    Imports System.Text.RegularExpressions
    Public Class Form1
        Dim ITEM As New ListViewItem
        Dim ProductTitle As String
        Dim ProductPrice As String
        Dim ProductRating As String
    
        Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
            ListView1.Items.Clear()
            ProductTitle = ""
            ProductPrice = ""
            ProductRating = ""
    
            Dim keyword As String = TextBox1.Text
            keyword = keyword.Replace(" ", "+")
            Try
                'This is the HTML table that I'm talking about:
                Dim html As String = "THE HTML TABLE SPECIFIED"
      
                'Product Title:
                Dim regx1 As New Regex("td class=""ProductTitle"">.+?</td>", RegexOptions.IgnoreCase)
                Dim matches1 As MatchCollection = regx1.Matches(html)
                For Each match1 As Match In matches1
                    ProductTitle += match1.Value & "^"
                    ProductTitle = ProductTitle.Replace("td class=""ProductTitle"">", "").Replace("</td>", "")
                Next
    
                'Price: (the table marks these cells with class="Price")
                Dim regx2 As New Regex("td class=""Price"">.+?</td>", RegexOptions.IgnoreCase)
                Dim matches2 As MatchCollection = regx2.Matches(html)
                For Each match2 As Match In matches2
                    ProductPrice += match2.Value & "^"
                    ProductPrice = ProductPrice.Replace("td class=""Price"">", "").Replace("</td>", "")
                Next
    
                'Rating: (the table marks these cells with class="Rating")
                Dim regx3 As New Regex("td class=""Rating"">.+?</td>", RegexOptions.IgnoreCase)
                Dim matches3 As MatchCollection = regx3.Matches(html)
                For Each match3 As Match In matches3
                    ProductRating += match3.Value & "^"
                    ProductRating = ProductRating.Replace("td class=""Rating"">", "").Replace("</td>", "")
                Next
    
                'Create the split & add all items to listview:
                Dim split1() As String = ProductTitle.Split("^")
                Dim split2() As String = ProductPrice.Split("^")
                Dim split3() As String = ProductRating.Split("^")
    
    
                For i = 0 To split1.Count - 2
                    ITEM = ListView1.Items.Add(split1(i))
                    ITEM.SubItems.Add(split2(i))
                    ITEM.SubItems.Add(split3(i))
                Next
    
            Catch ex As Exception
                'Show the error instead of silently swallowing it:
                MessageBox.Show(ex.Message)
            End Try
        End Sub
    End Class


    Again, the problem is that I won't know ahead of time which table will have elements missing (such as a value in the "Price" column), which causes the data not to line up in the rows of the ListView. How could I fix this in the code I've written above? Thanks.
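
    One possible direction, sketched here under assumptions rather than as a definitive fix: match each <tr> first and then read the <td> cells inside that row by position, so an empty or unclassed cell only affects its own row. The module and function names (TableScraper, ParseRows) are hypothetical, and the sketch assumes the table is simple enough for regular expressions (no nested tables, no <td> text containing "</td>").

```vb
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module TableScraper
    ' Hypothetical helper, not part of the original code.
    ' Returns one String array per data row, holding the inner text
    ' of every <td> in that row, in column order.
    Public Function ParseRows(html As String) As List(Of String())
        Dim rows As New List(Of String())
        ' Singleline lets "." span line breaks in case a row is wrapped.
        Dim opts As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Singleline

        ' Match whole rows first so cells can never drift between rows.
        For Each rowMatch As Match In Regex.Matches(html, "<tr[^>]*>(.*?)</tr>", opts)
            ' ".*?" (not ".+?") so an empty <td></td> still produces a cell.
            Dim cellMatches As MatchCollection = Regex.Matches(rowMatch.Groups(1).Value, "<td[^>]*>(.*?)</td>", opts)

            ' The header row contains only <th> cells, so it is skipped here.
            If cellMatches.Count = 0 Then Continue For

            Dim cells As String() = New String(cellMatches.Count - 1) {}
            For i As Integer = 0 To cellMatches.Count - 1
                cells(i) = cellMatches(i).Groups(1).Value.Trim()
            Next
            rows.Add(cells)
        Next
        Return rows
    End Function
End Module
```

    For the table above, each returned array would hold the row number at index 0, the title at index 1, the price at index 2 (an empty string for row 5), and the rating at index 3. Inside Button1_Click, the three column loops could then be replaced by a single loop over ParseRows(html) that adds cells(1) as the ListViewItem text and cells(2) and cells(3) as its sub-items, after checking that the array has at least four entries.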
    Last edited by TeachMe; 12-24-2016 at 6:25 PM.
