How could I extract data from txt included in this post

shurb

I am somewhat of a newbie to VB.NET and need some help extracting specific info from the piece of an HTML file listed below.
I need the following pieces of text specifically, and I am really at a loss as to how to accomplish this.

I need this info:
0725676401,MISDEMEANOR,RESISTING PUBLIC OFFICER,500,SECURED,,
0701831801,MISDEMEANOR,DRUG PARAPHERNALIA - POSSESSION OF,500,SECURED,ORDER FOR ARREST


This is the part of the HTML file I need it from, which can change quite often.
HTML:
<table cellspacing="0" border="0" id="ctl00_cphMainContent_gvCharges" width="100%">
		<tr class="gridHeaderFixedHeader">
			<th align="left" scope="col"><font face="Courier New" size="2">Court Case</font></th><th align="left" scope="col"><font face="Courier New" size="2">Arrest Type</font></th><th scope="col"><font face="Courier New" size="2">Charge Description</font></th><th align="right" scope="col"><font face="Courier New" size="2">Bond $ Amount</font></th><th scope="col"><font face="Courier New" size="2">Type</font></th><th scope="col"><font face="Courier New" size="2">Arrest Process</font></th>
		</tr><tr bgcolor="#333333">
			<td align="left" width="12%"><font face="Courier New" color="White" size="2">0725676401</font></td><td align="left" width="12%"><font face="Courier New" color="White" size="2">MISDEMEANOR</font></td><td width="35%"><font face="Courier New" color="White" size="2">RESISTING PUBLIC OFFICER</font></td><td align="right" width="8%"><font face="Courier New" color="White" size="2">500</font></td><td align="center" width="9%"><font face="Courier New" color="White" size="2">SECURED</font></td><td align="center" width="24%"><font face="Courier New" color="White" size="2"> </font></td>
		</tr><tr bgcolor="Black">
			<td align="left" width="12%"><font face="Courier New" color="White" size="2">0701831801</font></td><td align="left" width="12%"><font face="Courier New" color="White" size="2">MISDEMEANOR</font></td><td width="35%"><font face="Courier New" color="White" size="2">DRUG PARAPHERNALIA - POSSESSION OF</font></td><td align="right" width="8%"><font face="Courier New" color="White" size="2">500</font></td><td align="center" width="9%"><font face="Courier New" color="White" size="2">SECURED</font></td><td align="center" width="24%"><font face="Courier New" color="White" size="2">ORDER FOR ARREST</font></td>
		</tr>
	</table>
 
I loaded the HTML you posted into a WebBrowser control and used the DOM to get the elements. I'm not sure exactly where you're going with this info, so I just put it into string collections and formatted the output as comma-separated lines, like you asked:
VB.NET:
Dim lines As New List(Of String)
Dim fields As New List(Of String)
Dim table As HtmlElement = Me.WebBrowser1.Document.GetElementsByTagName("table")(0) 'gets first table in document
For Each row As HtmlElement In table.GetElementsByTagName("tr")
    For Each cell As HtmlElement In row.GetElementsByTagName("td")
        fields.Add(cell.InnerText)
    Next
    lines.Add(String.Join(",", fields.ToArray))
    fields.Clear()
Next
MsgBox(String.Join(vbNewLine, lines.ToArray))
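If the real page contains more than one table, picking the first one may return a layout table instead of the charges grid. A minimal sketch of looking the table up by the id shown in the posted HTML follows. The module and function names are made up for illustration, and the id is assumed to stay the same across pages, which may not hold.

```vbnet
Imports System.Collections.Generic
Imports System.Windows.Forms

Module ChargeTableSketch
    ' Hypothetical helper: returns one comma-joined line per data row
    ' of the charges table, or an empty list if the table is missing.
    Function ExtractChargeLines(ByVal doc As HtmlDocument) As List(Of String)
        Dim lines As New List(Of String)

        ' The id comes from the posted HTML; it is an assumption that
        ' every downloaded page uses the same one.
        Dim table As HtmlElement = doc.GetElementById("ctl00_cphMainContent_gvCharges")
        If table Is Nothing Then Return lines

        For Each row As HtmlElement In table.GetElementsByTagName("tr")
            Dim fields As New List(Of String)
            For Each cell As HtmlElement In row.GetElementsByTagName("td")
                Dim text As String = cell.InnerText
                ' InnerText may be Nothing for a cell without text
                If text Is Nothing Then text = String.Empty
                fields.Add(text.Trim())
            Next
            ' The header row uses <th> cells, so it yields no fields and is skipped
            If fields.Count > 0 Then lines.Add(String.Join(",", fields.ToArray()))
        Next

        Return lines
    End Function
End Module
```

It could be called as `ExtractChargeLines(WebBrowser1.Document)` once the document has finished loading.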
 
So that works if I set the WebBrowser to a static URL (in this case, a file I downloaded). The issue I encounter now is looping through a directory of downloaded pages, which all use the same format for the info I need. I get the following error with the code pasted below:
Line of code that errors out:
VB.NET:
Dim table As HtmlElement = WebBrowser1.Document.GetElementsByTagName("table")(0) 'gets first table in document
Error Received:
Value of '0' is not valid for 'index'. 'index' should be between 0 and -1.
Parameter name: index

Full code listing:
VB.NET:
For Each FILE_NAME In Directory.GetFiles("c:\cmeck\" & r)
            'MsgBox(FILE_NAME)

            WebBrowser1.Navigate(FILE_NAME)
            Dim lines As New List(Of String)
            Dim fields As New List(Of String)
            Dim table As HtmlElement = WebBrowser1.Document.GetElementsByTagName("table")(0) 'gets first table in document
            For Each row As HtmlElement In table.GetElementsByTagName("tr")
                For Each cell As HtmlElement In row.GetElementsByTagName("td")
                    fields.Add(cell.InnerText)
                Next
                lines.Add(String.Join(",", fields.ToArray))
                'fields.Clear()
            Next
            Dim fileWrite As String = "C:\cmeck\testitems\testelemental2222.txt"
            Dim objWriter As New System.IO.StreamWriter(fileWrite, True)
            objWriter.Write(String.Join(vbNewLine & ",", lines.ToArray))
            objWriter.Close()
        Next
 
Calling Navigate and then immediately accessing the Document is probably not a good idea because, as you've seen in any browser, it takes some time to download and display a web page. There is a DocumentCompleted event that you can use. It usually fires several times while a page loads, but in the handler you can check whether the browser's ReadyState is Complete before you start processing.
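As a rough sketch of that idea, the handler below waits for the Complete state and also guards against a document that contains no table, which is the situation behind the "between 0 and -1" index error. It assumes a form with a WebBrowser control named WebBrowser1 added in the designer; the row and cell loop itself is left out.

```vbnet
Imports System.Windows.Forms

Public Class Form1
    ' Assumes WebBrowser1 was added in the designer (declared WithEvents).
    Private Sub WebBrowser1_DocumentCompleted(ByVal sender As Object, _
            ByVal e As WebBrowserDocumentCompletedEventArgs) _
            Handles WebBrowser1.DocumentCompleted

        ' The event can fire more than once per page (for example, once per frame),
        ' so only continue when the whole document reports Complete.
        If WebBrowser1.ReadyState <> WebBrowserReadyState.Complete Then Return

        Dim tables As HtmlElementCollection = _
            WebBrowser1.Document.GetElementsByTagName("table")

        ' An empty collection is what makes index 0 invalid
        If tables.Count = 0 Then Return

        Dim table As HtmlElement = tables(0)
        ' Loop over the rows and cells of "table" here, as in the code posted earlier.
    End Sub
End Class
```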
 
OK, so I have this piece and I am still getting the error I listed above (Value of '0' is not valid). The one thing I noticed is that if I add a MsgBox (which is a pain, since the user would have to click OK a million times), the page loads and I get the data just fine. For some reason I cannot get the DocumentCompleted part to work for me. Any suggestions?

VB.NET:
Dim lines As New List(Of String)
            Dim fields As New List(Of String)
            If (WebBrowser1.IsBusy = True) Then
               
            ElseIf (WebBrowser1.IsBusy = False) Then



            End If
            ' If I leave this in then it works:
            MsgBox(WebBrowser1.Url.ToString)
            ' This line is where it craps out:
            Dim table As HtmlElement = WebBrowser1.Document.GetElementsByTagName("table")(0) 'gets first table in document
            For Each row As HtmlElement In table.GetElementsByTagName("tr")
                For Each cell As HtmlElement In row.GetElementsByTagName("td")
                    fields.Add(cell.InnerText)
                Next
                lines.Add(String.Join(",", fields.ToArray))
                fields.Clear()
            Next
            Dim strCleanFileName As String
            FILE_NAME = FILE_NAME.Remove(0, 38)
            strCleanFileName = FILE_NAME.Remove(14, 4)
            Dim fileWrite As String = "C:\cmeck\testitems\" & strCleanFileName & "txt"
            Dim objWriter As New System.IO.StreamWriter(fileWrite, True)
            objWriter.Write(String.Join(vbNewLine & ",", lines.ToArray))
            objWriter.Close()
        Next
 
I said in my previous post that you should check in the DocumentCompleted event whether the browser's ReadyState is Complete before you start processing. You haven't done that. It should be something like this:
VB.NET:
If WebBrowser1.ReadyState = WebBrowserReadyState.Complete Then
    ' it's a go.
End If
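Because DocumentCompleted is raised asynchronously, a plain For Each over the files cannot simply wait for each page. One possible arrangement, sketched here under assumptions, is to queue the file names and load the next one from inside the handler. StartProcessing, LoadNext and the pending queue are hypothetical names, and WebBrowser1 is assumed to be a designer-added control.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports System.Windows.Forms

Public Class Form1
    ' Files still waiting to be loaded; filled once, then drained one page at a time.
    Private pending As New Queue(Of String)

    ' Hypothetical entry point: call this instead of looping over the files directly.
    Private Sub StartProcessing(ByVal folder As String)
        For Each fileName As String In Directory.GetFiles(folder)
            pending.Enqueue(fileName)
        Next
        LoadNext()
    End Sub

    ' Starts loading the next queued file, if any are left.
    Private Sub LoadNext()
        If pending.Count > 0 Then WebBrowser1.Navigate(pending.Dequeue())
    End Sub

    Private Sub WebBrowser1_DocumentCompleted(ByVal sender As Object, _
            ByVal e As WebBrowserDocumentCompletedEventArgs) _
            Handles WebBrowser1.DocumentCompleted

        ' Ignore the intermediate firings of the event
        If WebBrowser1.ReadyState <> WebBrowserReadyState.Complete Then Return

        ' Extract and save the data for the current document here,
        ' then move on to the next file.
        LoadNext()
    End Sub
End Class
```

With this layout only one page is in flight at a time, so neither a MsgBox nor a polling loop should be needed to give the browser time to load.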
 
Any idea what is wrong with this code? It is not saving the output as one line with comma delimiters.

Here is the code:

VB.NET:
For Each FILE_NAME In Directory.GetFiles("c:\cmeck\" & r)


            WebBrowser1.Navigate(FILE_NAME)
            Dim lines As New List(Of String)
            Dim fields As New List(Of String)

            While (WebBrowser1.ReadyState <> WebBrowserReadyState.Complete)
                Application.DoEvents()
            End While

            Dim table As HtmlElement = WebBrowser1.Document.GetElementsByTagName("table")(0) 'gets first table in document
            For Each row As HtmlElement In table.GetElementsByTagName("tr")
                For Each cell As HtmlElement In row.GetElementsByTagName("td")
                    fields.Add(cell.InnerText)
                    lines.Add(String.Join(",", fields.ToArray))
                Next

                fields.Clear()
            Next

            Dim strCleanFileName As String
            FILE_NAME = FILE_NAME.Remove(0, 38)
            strCleanFileName = FILE_NAME
            Dim fileWrite As String = "C:\cmeck\forupload\" & r & "\" & strCleanFileName
            Dim objWriter As New System.IO.StreamWriter(fileWrite, True)
            objWriter.Write(String.Join(",", lines.ToArray))
            objWriter.Close()

        Next

The resulting file should (I hoped) be one line, with comma delimiters between each item found between the <td> tags. Does anyone see something I am missing or have incorrect?
 
So what is the problem? Apart from the file path, which I can't interpret from the info posted, the code looks the same as what I posted, and that worked the last time I ran it.
 
It ends up duplicating the items and giving me a format like this:

Name :BURCH, ELIJAH Arrest#: 1347495 1332672 1280084 1279568 1275485 All
Alias:PID#:327774
DOB:09/24/1989Race/Sex:B/M
Height:5'07"Weight:145
Arrested:12/18/2007
At:16:29By:CMPD
Address:1335 LONGBRANCH GASTONIA NC

Charges for Arrest #: 1347495
Court CaseArrest TypeCharge DescriptionBond $ AmountTypeArrest Process
0725925201MISDEMEANORSTOLEN GOODS - POSSESSION OF (MISDEMEANOR)1000SECURED



Living / Working / Governing / Visiting / Contacts
Jobs / Services / Departments / Using this Site , ,,,,,,,, ,,,,
Name :BURCH, ELIJAH Arrest#: 1347495 1332672 1280084 1279568 1275485 All
Alias:PID#:327774
DOB:09/24/1989Race/Sex:B/M
Height:5'07"Weight:145
Arrested:12/18/2007
At:16:29By:CMPD
Address:1335 LONGBRANCH GASTONIA NC

Charges for Arrest #: 1347495
Court CaseArrest TypeCharge DescriptionBond $ AmountTypeArrest Process
0725925201MISDEMEANORSTOLEN GOODS - POSSESSION OF (MISDEMEANOR)1000SECURED

, , , , ,,Name :,BURCH, ELIJAH ,Arrest#:, 1347495 1332672 1280084 1279568 1275485 All,Alias:,,PID#:,327774,DOB:09/24/1989Race/Sex:B/M
Height:5'07"Weight:145,DOB:,09/24/1989,Race/Sex:,B/M,Height:,5'07",Weight:,145,Arrested:,12/18/2007,At:16:29,By:CMPD,Address:,1335 LONGBRANCH GASTONIA NC ,Charges for Arrest #: 1347495,Court CaseArrest TypeCharge DescriptionBond $ AmountTypeArrest Process
0725925201MISDEMEANORSTOLEN GOODS - POSSESSION OF (MISDEMEANOR)1000SECURED,0725925201,MISDEMEANOR,STOLEN GOODS - POSSESSION OF (MISDEMEANOR),1000,SECURED,, , Living / Working / Governing / Visiting / Contacts
Jobs / Services / Departments / Using this Site ,Living / Working / Governing / Visiting / Contacts ,Jobs / Services / Departments / Using this Site ,
,
,
, Sheriff's Office Website
,
, ,,,,,,,, ,,,
,,,,,,,, ,
,
,
,
Name :BURCH, ELIJAH Arrest#: 1347495 1332672 1280084 1279568 1275485 All
Alias:PID#:327774
DOB:09/24/1989Race/Sex:B/M
Height:5'07"Weight:145
Arrested:12/18/2007
At:16:29By:CMPD
Address:1335 LONGBRANCH GASTONIA NC

Charges for Arrest #: 1347495
Court CaseArrest TypeCharge DescriptionBond $ AmountTypeArrest Process
0725925201MISDEMEANORSTOLEN GOODS - POSSESSION OF (MISDEMEANOR)1000SECURED

, , , , ,,Name :,BURCH, ELIJAH ,Arrest#:, 1347495 1332672 1280084 1279568 1275485 All,Alias:,,PID#:,327774,DOB:09/24/1989Race/Sex:B/M
Height:5'07"Weight:145,DOB:,09/24/1989,Race/Sex:,B/M,Height:,5'07",Weight:,145,Arrested:,12/18/2007,At:16:29,By:CMPD,Address:,1335 LONGBRANCH GASTONIA NC ,Charges for Arrest #: 1347495,Court CaseArrest TypeCharge DescriptionBond $ AmountTypeArrest Process
0725925201MISDEMEANORSTOLEN GOODS - POSSESSION OF (MISDEMEANOR)1000SECURED,0725925201,MISDEMEANOR,STOLEN GOODS - POSSESSION OF (MISDEMEANOR),1000,SECURED,,
, , , , ,
,Name :,BURCH, ELIJAH ,Arrest#:, 1347495 1332672 1280084 1279568 1275485 All
,Alias:,,PID#:,327774
,DOB:09/24/1989Race/Sex:B/M
Height:5'07"Weight:145,DOB:,09/24/1989,Race/Sex:,B/M,Height:,5'07",Weight:,145,Arrested:,12/18/2007
,DOB:,09/24/1989,Race/Sex:,B/M
,Height:,5'07",Weight:,145
,At:16:29,By:CMPD
,Address:,1335 LONGBRANCH GASTONIA NC
,Charges for Arrest #: 1347495
,Court CaseArrest TypeCharge DescriptionBond $ AmountTypeArrest Process
0725925201MISDEMEANORSTOLEN GOODS - POSSESSION OF (MISDEMEANOR)1000SECURED,0725925201,MISDEMEANOR,STOLEN GOODS - POSSESSION OF (MISDEMEANOR),1000,SECURED,
,
,0725925201,MISDEMEANOR,STOLEN GOODS - POSSESSION OF (MISDEMEANOR),1000,SECURED,
,
, Living / Working / Governing / Visiting / Contacts
Jobs / Services / Departments / Using this Site ,Living / Working / Governing / Visiting / Contacts ,Jobs / Services / Departments / Using this Site
,Living / Working / Governing / Visiting / Contacts
,Jobs / Services / Departments / Using this Site
 
OK, I see you changed the loop a little. Are you trying to add all cells, regardless of row, into one long line? If so, add all the fields in the loop (without clearing the list) and join them afterwards.
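In the posted loop, lines.Add(String.Join(...)) runs once per cell, so every cell appends another copy of the fields gathered so far, which would explain part of the repetition. A minimal sketch of the "add everything, join once" approach follows. JoinAllCells is a hypothetical helper name, and the sketch assumes the table passed in holds no nested tables; nested tables would still repeat cells.

```vbnet
Imports System.Collections.Generic
Imports System.Windows.Forms

Module SingleLineSketch
    ' Hypothetical helper: joins the text of every <td> in the table into one line.
    Function JoinAllCells(ByVal table As HtmlElement) As String
        Dim fields As New List(Of String)

        For Each row As HtmlElement In table.GetElementsByTagName("tr")
            For Each cell As HtmlElement In row.GetElementsByTagName("td")
                ' Only collect here: no Join and no Clear inside the loops
                fields.Add(cell.InnerText)
            Next
        Next

        ' Join once, after all cells have been collected
        Return String.Join(",", fields.ToArray())
    End Function
End Module
```

The returned string can then be written to the file with a single objWriter.Write call.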
 