Question Data Scrub a Collection of Data in Class

gspeedtech

How can I data scrub a collection of data?

I am working with existing VB.NET code for a Windows Application that uses StreamWriter and Serializer to output an XML document of transaction data. Code below.

Private TransactionFile As ProjectSchema.TransactionFile

Dim Serializer As New Xml.Serialization.XmlSerializer(GetType(ProjectSchema.TransactionFile))
Dim Writer As TextWriter
Dim FilePath As String

Writer = New StreamWriter(FilePath)

Serializer.Serialize(Writer, TransactionFile)
Writer.Close()

The XML document is being uploaded to another application that does not accept "crlf".

The "TransactionFile" is a collection of data in a Class named ProjectSchema.TransactionFile. It contains various data types.

There are five functions that create the nodes making up a master transaction file named TransactionFile.

I need to find CRLF characters in the collection of data and replace the CRLF characters with a space.

I am able to replace illegal characters at the field level with

.Name = Regex.Replace((Mid(CustomerName.Name, 1, 30)), "[^A-Za-z0-9\-/]", " ")

But I need to scrub the entire collection of data.

If I try:

TransactionFile = Regex.Replace(TransactionFile, "[^A-Za-z0-9\-/]", " ")

I get a "Conversion from type 'Transaction' to type 'String' is not valid" message.
 
Let's say that it's my job to rubber stamp an application form. If you have one application form you give it to me and I rubber stamp it, right? Now, what if you have a stack of application forms? Do you hand it to me and I rubber stamp the stack? Of course not. I go through the stack and rubber stamp each form one by one.

You need to do the same in this situation. This is exactly the sort of thing that loops are for. You want to do the same thing multiple times? Do it in a loop. You just need to decide which is the most appropriate loop to use: For, For Each, Do or While.
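
As a minimal sketch of that idea, assuming the data were nothing more than a list of strings (the module name and sample values here are hypothetical, not taken from the project), each item is processed one by one and the result stored back:

```vbnet
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Module LoopDemo
    Sub Main()
        ' Hypothetical sample data; real values would come from the transaction.
        Dim names As New List(Of String) From {"Smith" & vbCrLf & "Jr", "Jones"}

        ' An index-based For loop lets each processed value be stored back in the list.
        For i As Integer = 0 To names.Count - 1
            names(i) = names(i).Replace(vbCrLf, " ")
        Next

        ' For Each is enough when the items only need to be read.
        For Each name As String In names
            Console.WriteLine(name)
        Next
    End Sub
End Module
```

A For loop is used for the update because the For Each control variable is only a copy of the reference; assigning to it does not change the list.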
 
Thanks for the reply,

My problem is that, as far as my limited knowledge goes, "Replace" only works on String objects.
How do you For Each through a collection of Objects and target only the String objects?
 
Pardon my failure to adequately describe my inquiry.

This is an example of the code I am dealing with. It creates a transaction that is converted to XML and then exported. (I have changed the names to preserve confidentiality, hopefully without breaking the code.)

VB.NET:
    Function BuildTransaction(ByVal DSTransaction As DataSetTransactions.TableTransactionsRow, ByVal SeqNumber As Integer, _
        ByVal b_GetCustomer As GetCustomer, ByVal c_GetProduct As GetProduct, ByVal d_GetAccount As GetAccount) _
     As ProjectSchema.Transaction

        Dim z_Transaction As New ProjectSchema.Transaction
        Dim x_Transactions As New DataSetTransactions
        Dim Type As String
        
        Try

            'Create MasterTransaction node
            z_Transaction.MasterTransaction = GetMasterTransaction(a_Transaction)
            Type = z_Transaction.MasterTransaction.TransactionType
            z_Transaction.MasterTransaction.TransactionSeqNumber = SeqNumber


            If Type <> "Cancel" Then

                'Create Customer node
                b_GetCustomer.Invoke(DSTransaction.ID, z_Transaction, DSTransaction, SeqNumber)

                'Create Product nodes
                c_GetProduct.Invoke(DSTransaction.ID, z_Transaction, DSTransaction, SeqNumber)
                
                'Create Account node
                d_GetAccount.Invoke(DSTransaction.ID, z_Transaction, DSTransaction, SeqNumber)
             
            End If

        Catch ex As Exception
            Error = String.Concat(Error, vbCrLf, ex.Message, vbCrLf, ex.StackTrace)
            Me.m_oErrors.Add(Error)
            z_Transaction = Nothing

        End Try

        Return z_Transaction

    End Function

If I try:
VB.NET:
Dim TransactionData as datarow
For Each TransactionData In z_Transaction
                Regex.Replace(TransactionData, "vbCrLf", " ")
        Next
End If

I get a "datarow cannot be converted to a String" error on the "Replace" line of code.

I am trying to create a reusable Function to replace vbCrLf with a space.

Perhaps I am going about it the wrong way?
 
First up, why are you using Regex.Replace when you're not using a pattern? If you're doing a straight substring replacement then you should just be using String.Replace.

Secondly, whether you use Regex.Replace or String.Replace, the original String doesn't change. Both methods return a new String containing the replacements. Even if your code did run, it wouldn't do anything useful because you're not using the new String returned by the Replace method.
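
A minimal sketch of both points, using a made-up sample string. It also shows a detail of the posted loop: "vbCrLf" inside quotes is the six literal characters v, b, C, r, L, f, not a line break, so that pattern matches nothing in ordinary data.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module ReplaceDemo
    Sub Main()
        Dim original As String = "Line one" & vbCrLf & "Line two"

        ' The return value is discarded here, so original is unchanged.
        original.Replace(vbCrLf, " ")
        Console.WriteLine(original.Contains(vbCrLf))    ' True

        ' Assigning the returned String keeps the replacement.
        Dim cleaned As String = original.Replace(vbCrLf, " ")
        Console.WriteLine(cleaned)                      ' Line one Line two

        ' The quoted text "vbCrLf" is not a line break, so nothing is replaced.
        Console.WriteLine(Regex.Replace(original, "vbCrLf", " ") = original)   ' True

        ' If a Regex is wanted anyway, \r\n is the pattern for a CR/LF pair.
        Console.WriteLine(Regex.Replace(original, "\r\n", " ") = cleaned)      ' True
    End Sub
End Module
```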

Finally, a DataRow is not a String. It contains multiple fields of any data type. If you want the data from a particular field then you need to get the data from that field, either by column name or numeric index.
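
For illustration only, here is a short sketch of reading one field by column name and by numeric index. The table and column names are invented for the example and are not from the project.

```vbnet
Imports System
Imports System.Data

Module DataRowDemo
    Sub Main()
        ' Hypothetical table; the column names are made up for this example.
        Dim table As New DataTable("Customers")
        table.Columns.Add("LastName", GetType(String))
        table.Columns.Add("Balance", GetType(Decimal))

        Dim row As DataRow = table.Rows.Add("Smith", 12.5D)

        ' The same field read by column name and by zero-based column index.
        Dim byName As String = CStr(row("LastName"))
        Dim byIndex As String = CStr(row(0))

        Console.WriteLine(byName = byIndex)   ' True
    End Sub
End Module
```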
 
Thanks for the reply,

First, the reason I am using Regex.Replace is that I used it with success at the field level:
.Name = Regex.Replace((Mid(CustomerName.Name, 1, 30)), "[^A-Za-z0-9\-/]", " ")

I changed the criteria to "vbCrLf", " " just to reduce it to the minimum requirement.

Second, I am trying to get the Replace to work before completing my code to use the new string.

Finally, I realize a DataRow is not a String, and that is the root of my question. If Replace only works with a String, how can I parse through a DataRow and remove vbCrLf?
 
Are you saying, without actually saying, that you don't know how to access a specific field in a DataRow? I would guess that there are thousands of code examples on the web that would demonstrate that. Here's one I found with a simple search:

VB.NET DataRow Tips
 
I can use this:

VB.NET:
Public Function ScrubData(ByRef STransaction)

    Dim sTable As New ProjectSchema.Transaction()
    Dim row1 As String
    sTable = STransaction
    row1 = STransaction.LastName()
    If row1 <> " " Or row1 <> "" Then
        Regex.Replace((row1), "[^A-Za-z0-9\-/]", " ")
    End If
End Function

to access individual fields, i.e. sTransaction.LastName or sTransaction.FirstName, no problem.

What I want is to access all the fields in sTable. If I try something like:

VB.NET:
Public Function ScrubData(ByRef STransaction)
        
        Dim sTable As New PointSchema.TransactionMaster()
        Dim row1 As String
        sTable = STransaction
        row1 = STransaction.???() ' I don't know what goes here
        'row1 = sTable.InsuredLastName()
        For Each row1 In sTable
            If row1 <> " " Or row1 <> "" Then
                Regex.Replace((row1), "[^A-Za-z0-9\-/]", " ")
            End If
        Next
End Function

I can't figure out how to get row1 to equal a variable for each field in sTable.
 
You can't turn multiple fields into a single String, or not in a useful way anyway. If you have multiple fields then you must work with each one individually. Generally, when you want to do the same thing multiple times, you use a loop. Assuming that all the columns in your table contain Strings, you could loop through the columns of the DataTable and, for each one, index the row appropriately to get the field value.

Something you have apparently missed is that Regex.Replace is not going to make any changes to the String you pass in. It returns a new String containing the changes, so you have to use that new String. If you want to make changes to your DataRow then you must get a field value, process it and then assign the result back to the field.
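
A rough sketch of that get, process and assign-back cycle for a DataRow follows. ScrubRow and the sample table are hypothetical names. Because the String-only assumption may not hold, the sketch checks each column's data type and skips null fields.

```vbnet
Imports System
Imports System.Data
Imports Microsoft.VisualBasic

Module ColumnLoopDemo
    ' Replaces CR/LF pairs with a space in every String column of the given row.
    Sub ScrubRow(ByVal row As DataRow)
        For Each column As DataColumn In row.Table.Columns
            ' Skip non-String columns and fields that contain DBNull.
            If column.DataType Is GetType(String) AndAlso Not row.IsNull(column) Then
                Dim value As String = CStr(row(column))
                ' Replace returns a new String, so assign it back to the field.
                row(column) = value.Replace(vbCrLf, " ")
            End If
        Next
    End Sub

    Sub Main()
        ' Hypothetical table and values, for demonstration only.
        Dim table As New DataTable("Customers")
        table.Columns.Add("FirstName", GetType(String))
        table.Columns.Add("LastName", GetType(String))
        table.Columns.Add("Balance", GetType(Decimal))
        Dim row As DataRow = table.Rows.Add("Ann", "Smith" & vbCrLf & "Jr", 12.5D)

        ScrubRow(row)
        Console.WriteLine(CStr(row("LastName")))   ' Smith Jr
    End Sub
End Module
```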
 
My main problem has been lack of knowledge, and not knowing how to ask the right questions.

Now that I know that I am dealing with properties and values of a Custom Class, I was able to find a solution to accomplish what I need through reflection:

VB.NET:
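Public Sub ScrubData(ByVal STransaction As ProjectSchema.Location)

    For Each p As System.Reflection.PropertyInfo In STransaction.GetType().GetProperties()
        ' Only String properties that can be read and written can be scrubbed.
        ' Indexed properties are skipped because GetValue would need index arguments.
        If p.PropertyType Is GetType(String) AndAlso p.CanRead AndAlso p.CanWrite _
           AndAlso p.GetIndexParameters().Length = 0 Then
            Dim value1 As String = TryCast(p.GetValue(STransaction, Nothing), String)
            If Not String.IsNullOrEmpty(value1) Then
                ' Regex.Replace returns a new String, so write the result back to the property.
                p.SetValue(STransaction, Regex.Replace(value1, "[^A-Za-z0-9\-/]", " "), Nothing)
            End If
        End If
    Next

End Sub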
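
For anyone who wants to try the reflection approach in isolation, here is a self-contained sketch aimed at the original requirement of replacing CRLF with a space. SampleLocation and ReplaceLineBreaks are hypothetical names standing in for the real schema classes, and the sketch assumes the string data lives in simple read/write String properties.

```vbnet
Imports System
Imports System.Reflection
Imports Microsoft.VisualBasic

' Hypothetical stand-in for one of the schema classes.
Public Class SampleLocation
    Public Property Street As String
    Public Property City As String
    Public Property UnitCount As Integer
End Class

Module ReflectionScrubDemo
    ' Replaces CR/LF pairs with a space in every read/write String property of target.
    Sub ReplaceLineBreaks(ByVal target As Object)
        For Each p As PropertyInfo In target.GetType().GetProperties()
            If p.PropertyType Is GetType(String) AndAlso p.CanRead AndAlso p.CanWrite _
               AndAlso p.GetIndexParameters().Length = 0 Then
                Dim value As String = TryCast(p.GetValue(target, Nothing), String)
                If value IsNot Nothing Then
                    ' Write the new String back; Replace does not modify the original.
                    p.SetValue(target, value.Replace(vbCrLf, " "), Nothing)
                End If
            End If
        Next
    End Sub

    Sub Main()
        Dim loc As New SampleLocation With {
            .Street = "1 Main St" & vbCrLf & "Suite 2",
            .City = "Springfield",
            .UnitCount = 3
        }

        ReplaceLineBreaks(loc)
        Console.WriteLine(loc.Street)   ' 1 Main St Suite 2
    End Sub
End Module
```

This sketch only touches the object's own String properties. Nested objects and collections, which a serialized transaction class is likely to contain, would need the same routine applied to each of them in turn.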
 