
Thread: Simple substitution for a long string?

  1. #1
    Super_Collector is offline VB.NET Forum Newbie
    .NET Framework
    .NET 3.5 (VS 2008)
    Join Date
    Aug 2008
    Posts
    7
    Reputation
    0

    Question Simple substitution for a long string?

    I have 180,000 strings, each 300-1,100 characters long. I want to do a simple substitution on every character of every string.

    For X0=1 to 180000
    LineIn = Origin(X0)
    Y0 = 0
    Do
    Y0 = Y0 + 1
    If asc(Mid(Linein, Y0, 1)) < 126 then Mid(Linein, Y0, 1) = chr(asc(Mid(Linein, Y0, 1)) + 126) else Mid(Linein, Y0, 1) = chr(asc(Mid(Linein, Y0, 1)) - 126)
    Loop until Y0 = Len(LineIn)
    Origin(X0) = LineIn
    Next

    The Do : Loop is a time killer. Is there a function that does simple substitution on the string without having to manually check every position?
    Last edited by Super_Collector; 06-10-2012 at 1:40 PM. Reason: typo in example code

  2. #2
    jmcilhinney is offline VB.NET Forum Moderator
    .NET Framework
    .NET 4.0
    Join Date
    Aug 2004
    Location
    Sydney, Australia
    Posts
    11,333
    Reputation
    1543
    This is an example of the sort of operation that gets a speed boost from using pointers, which are not supported by VB. You would benefit from creating a C# library that used unsafe code to perform that processing.

  3. #3
    Dunfiddlin is offline VB.NET Forum Master
    .NET Framework
    .NET 4.0
    Join Date
    Jun 2012
    Posts
    253
    Reputation
    31
    Well, when you say simple substitution and it turns out not to be so simple ....

    What are you actually achieving here? As far as I can see, you'd be converting "z" to something that I probably can't show on this forum (or at least I'm too lazy to work out how) whilst a-umlaut becomes Backspace? If this is intended as some form of encryption there are far better ways to go about it, surely?

    For a start on your code, I'm not sure why you didn't go for ...

    For Y0 = 0 to LineIn.Length-1
    ....
    Next

    ... which would cut out at least one function.

    You'd also certainly benefit from creating a list from your strings rather than using the array (if not indeed replacing the array altogether) which would expose some useful methods. But it's hard to be specific until I have some idea of what you're actually trying to do (or indeed why!)

  4. #4
    Super_Collector is offline VB.NET Forum Newbie
    .NET Framework
    .NET 3.5 (VS 2008)
    Join Date
    Aug 2008
    Posts
    7
    Reputation
    0
    Here is the actual code I'm trying to optimize...

    It's the character by character substitution for 'Ask' (180,000 records with variable lengths) that's slowing me down.

    ---

    'Load Master Records
    X0 = 0 : Z0 = 0 : FileOpen(1, Path + "Master_File.ilx", OpenMode.Input, , , )
    Do
    X0 = X0 + 1 : Ask = LineInput(1)

    If X0 / Bar = Int(X0 / Bar) Then Form6.ProgressBar1.Increment(1) : Form6.Show()

    For Y0 = 1 To Len(Ask)
    If Asc(Mid(Ask, Y0, 1)) < 128 Then Mid(Ask, Y0, 1) = Chr(Asc(Mid(Ask, Y0, 1)) + 127) Else Mid(Ask, Y0, 1) = Chr(Asc(Mid(Ask, Y0, 1)) - 127)
    Next

    Items(X0) = Ask

    PC = Items(X0).Split("|")
    Dim exp As New Regex("-", RegexOptions.IgnoreCase)
    If exp.Matches(PC(22)).Count > 0 Then Z0 = Z0 + (26 - exp.Matches(PC(22)).Count)

    Loop Until EOF(1) : FileClose(1)

  5. #5
    Dunfiddlin is offline VB.NET Forum Master
    .NET Framework
    .NET 4.0
    Join Date
    Jun 2012
    Posts
    253
    Reputation
    31
    Yes I know where the problem is. What I still don't know is why you need to do this at all. Why aren't the strings in the file in the form that you're actually going to use in the program? And it's difficult to suggest an alternative to a character by character substitution without some idea of what the substitution is actually meant to achieve. Does the new string absolutely have to have this very odd format? Why? And what do the original strings actually represent? The code doesn't really tell me anything when I have no real idea what all the variables actually represent.

  6. #6
    Herman is offline VB.NET Forum Miyagee
    .NET Framework
    .NET 4.0
    Join Date
    Oct 2011
    Location
    Montreal, QC, CA
    Posts
    448
    Reputation
    346
    Personally I would use a static table for this, and a byte array or StringBuilder. Get rid of the Mid(), Asc(), and conditional blocks.

    Build a table in the form of an array with 256 elements (assuming single-byte characters). It maps each input value to its replacement value. Build it beforehand and keep it in memory.

    Example:
    ...
    arrLookupTable(126) = 253
    arrLookupTable(127) = 254
    arrLookupTable(128) = 1
    arrLookupTable(129) = 2
    ...

    Then:
    For i As Integer = 0 to arrInputString.Length - 1
    arrInputString(i) = arrLookupTable(arrInputString(i))
    Next


    That is essentially how you would do it with a lookup table and pointers in C, although in C I would likely use a bitwise left circular shift.
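
    A minimal sketch of the table approach described above, assuming the +127 / -127 rule from post #4 and that each record is already held as a Byte array. The names LookupTableSketch, BuildLookupTable and Translate are hypothetical and exist only for illustration.

```vbnet
Module LookupTableSketch
    ' Build the 256-entry table once, using the +127 / -127 rule from post #4.
    Function BuildLookupTable() As Byte()
        Dim table(255) As Byte   ' indices 0..255
        For i As Integer = 0 To 255
            If i < 128 Then
                table(i) = CByte(i + 127)
            Else
                table(i) = CByte(i - 127)
            End If
        Next
        Return table
    End Function

    ' Replace every byte in place: one array lookup per character,
    ' with no Mid(), Asc(), Chr() or If inside the loop.
    Sub Translate(ByVal data() As Byte, ByVal table() As Byte)
        For i As Integer = 0 To data.Length - 1
            data(i) = table(data(i))
        Next
    End Sub
End Module
```

    With this rule the table holds 253 at index 126 and 1 at index 128, matching the example entries above. The table would be built once before the file loop and reused for all 180,000 records.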
    Last edited by Herman; 06-17-2012 at 11:58 AM.

  7. #7
    Dunfiddlin is offline VB.NET Forum Master
    .NET Framework
    .NET 4.0
    Join Date
    Jun 2012
    Posts
    253
    Reputation
    31
    Originally posted by Herman:
    "I would likely use a bitwise left circular shift."

    Oooh, fancy!

    The array is obviously the way to go but I'm not sure it really merits a look-up table given that it's only a choice of + or -127 and it's yet to be confirmed that all 256 characters are used in the master strings anyway. It may well turn out to be possible to do this in a calculation (the equivalent of your shift) which is why I was angling after information as to exactly what the process is meant to achieve (being a bear of little brain, I honestly can't imagine any practical purpose it serves)!

  8. #8
    Herman is offline VB.NET Forum Miyagee
    .NET Framework
    .NET 4.0
    Join Date
    Oct 2011
    Location
    Montreal, QC, CA
    Posts
    448
    Reputation
    346
    The purpose of the lookup table is to eliminate the CPU cycles it takes to repeat the same operations thousands of times. Instead you calculate your table once and you are done with it.

  9. #9
    Dunfiddlin is offline VB.NET Forum Master
    .NET Framework
    .NET 4.0
    Join Date
    Jun 2012
    Posts
    253
    Reputation
    31
    Fair 'enuff but shouldn't it be ...

    For i As Integer = 0 to arrInputString.Length - 1
    arrInputString(i) = arrLookupTable(Val(arrInputString(i)))
    Next

  10. #10
    Herman is offline VB.NET Forum Miyagee
    .NET Framework
    .NET 4.0
    Join Date
    Oct 2011
    Location
    Montreal, QC, CA
    Posts
    448
    Reputation
    346
    No need to cast; both arrays are already of type Byte. The value of the input byte is used as the index into the lookup table, and the lookup table returns a replacement value of type Byte, which you just put back into the original array. Val() would serve no purpose here.
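
    To connect this to the String records read from the file, one possible sketch follows. It assumes every character in a record has a code of 255 or below, so that ISO-8859-1 maps each character to exactly one byte. That encoding choice and the names RecordSketch and TranslateRecord are assumptions, not something stated in the thread. Asc() and Chr() work through the system's ANSI code page, so results for characters above 127 may differ from the original loop.

```vbnet
Imports System.Text

Module RecordSketch
    ' Hypothetical helper: translate one record through a 256-entry Byte table.
    Function TranslateRecord(ByVal record As String, ByVal table() As Byte) As String
        ' Assumption: ISO-8859-1, so each character becomes exactly one byte.
        Dim latin1 As Encoding = Encoding.GetEncoding("ISO-8859-1")
        Dim data() As Byte = latin1.GetBytes(record)

        For i As Integer = 0 To data.Length - 1
            ' The input byte is the index; the table entry is the replacement.
            data(i) = table(data(i))
        Next

        Return latin1.GetString(data)
    End Function
End Module
```

    Because both arrays are Byte, the lookup needs no Val() or other conversion; the only conversions are the single GetBytes and GetString calls per record.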
