Sideway
output.to from Sideway


Built-in Componet


Regular Expression




Draft for Information Only

Content

Regular Expression Backtracking
 Linear Comparison Without Backtracking
 Backtracking with Optional Quantifiers or Alternation Constructs
 Backtracking with Nested Optional Quantifiers
 Controlling Backtracking
  Defining a Time-out Interval
  Nonbacktracking Subexpression
  Lookbehind Assertions
  Lookahead Assertions
 Scanning of Valid Email Format
 See also
 Source/Referencee

Regular Expression Backtracking

Backtracking occurs when a regular expression pattern contains optional quantifiers or alternation constructs, and the regular expression engine returns to a previous saved state to continue its search for a match. Backtracking is central to the power of regular expressions; it makes it possible for expressions to be powerful and flexible, and to match very complex patterns. At the same time, this power comes at a cost. Backtracking is often the single most important factor that affects the performance of the regular expression engine. Fortunately, the developer has control over the behavior of the regular expression engine and how it uses backtracking. This topic explains how backtracking works and how it can be controlled.

In general, a Nondeterministic Finite Automaton (NFA) engine like .NET regular expression engine places the responsibility for crafting efficient, fast regular expressions on the developer.

This topic contains the following sections:

Linear Comparison Without Backtracking

If a regular expression pattern has no optional quantifiers or alternation constructs, the regular expression engine executes in linear time. That is, after the regular expression engine matches the first language element in the pattern with text in the input string, it tries to match the next language element in the pattern with the next character or group of characters in the input string. This continues until the match either succeeds or fails. In either case, the regular expression engine advances by one character at a time in the input string.

If a regular expression pattern includes no optional quantifiers or alternation constructs, the maximum number of comparisons required to match the regular expression pattern with the input string is roughly equivalent to the number of characters in the input string.In other words, the regular expression engine runs in near-linear time if it contains no optional quantifiers or alternation constructs.

Backtracking with Optional Quantifiers or Alternation Constructs

When a regular expression includes optional quantifiers or alternation constructs, the evaluation of the input string is no longer linear. Pattern matching with an NFA engine is driven by the language elements in the regular expression and not by the characters to be matched in the input string. Therefore, the regular expression engine tries to fully match optional or alternative subexpressions. When it advances to the next language element in the subexpression and the match is unsuccessful, the regular expression engine can abandon a portion of its successful match and return to an earlier saved state in the interest of matching the regular expression as a whole with the input string. This process of returning to a previous saved state to find a match is known as backtracking.

Generally, if a regular expression pattern has a single alternation construct or a single optional quantifier, the number of comparison operations required to match the pattern is more than twice the number of characters in the input string.

Backtracking with Nested Optional Quantifiers

The number of comparison operations required to match a regular expression pattern can increase exponentially if the pattern includes a large number of alternation constructs, if it includes nested alternation constructs, or, most commonly, if it includes nested optional quantifiers.

the regular expression engine took about twice as long to find that an input string did not match the pattern as it did to identify a matching string. This is because an unsuccessful match always represents a worst-case scenario. The regular expression engine must use the regular expression to follow all possible paths through the data before it can conclude that the match is unsuccessful, and the nested parentheses create many additional paths through the data.

Comparison of the input string with the regular expression continues in this way until the regular expression engine has tried all possible combinations of matches, and then concludes that there is no match. Because of the nested quantifiers, this comparison is an O(2n) or an exponential operation, where n is the number of characters in the input string. This means that in the worst case, an input string of 30 characters requires approximately 1,073,741,824 comparisons, and an input string of 40 characters requires approximately 1,099,511,627,776 comparisons. If you use strings of these or even greater lengths, regular expression methods can take an extremely long time to complete when they process input that does not match the regular expression pattern.

Controlling Backtracking

Backtracking lets you create powerful, flexible regular expressions. However, as the previous section showed, these benefits may be coupled with unacceptably poor performance. To prevent excessive backtracking, you should define a time-out interval when you instantiate a Regex object or call a static regular expression matching method. This is discussed in the next section. In addition, .NET supports three regular expression language elements that limit or suppress backtracking and that support complex regular expressions with little or no performance penalty: nonbacktracking subexpressions, lookbehind assertions, and lookahead assertions.

Defining a Time-out Interval

Starting with the .NET Framework 4.5, you can set a time-out value that represents the longest interval the regular expression engine will search for a single match before it abandons the attempt and throws a RegexMatchTimeoutException exception. You specify the time-out interval by supplying a TimeSpan value to the Regex.Regex(String, RegexOptions, TimeSpan) constructor for instance regular expressions. In addition, each static pattern matching method has an overload with a TimeSpan parameter that allows you to specify a time-out value. By default, the time-out interval is set to Regex.InfiniteMatchTimeout and the regular expression engine does not time out.

We recommend that you always set a time-out interval if your regular expression relies on backtracking.

A RegexMatchTimeoutException exception indicates that the regular expression engine was unable to find a match within the specified time-out interval but does not indicate why the exception was thrown. The reason might be excessive backtracking, but it is also possible that the time-out interval was set too low given the system load at the time the exception was thrown. When you handle the exception, you can choose to abandon further matches with the input string or increase the time-out interval and retry the matching operation.

Nonbacktracking Subexpression

The (?> subexpression) language element suppresses backtracking in a subexpression. It is useful for preventing the performance problems associated with failed matches.

Lookbehind Assertions

.NET includes two language elements, (?<=subexpression) and (?<!subexpression), that match the previous character or characters in the input string. Both language elements are zero-width assertions; that is, they determine whether the character or characters that immediately precede the current character can be matched by subexpression, without advancing or backtracking.

(?<= subexpression ) is a positive lookbehind assertion; that is, the character or characters before the current position must match subexpression. (?<!subexpression) is a negative lookbehind assertion; that is, the character or characters before the current position must not match subexpression. Both positive and negative lookbehind assertions are most useful when subexpression is a subset of the previous subexpression.

Lookahead Assertions

.NET includes two language elements, (?=subexpression) and (?!subexpression), that match the next character or characters in the input string. Both language elements are zero-width assertions; that is, they determine whether the character or characters that immediately follow the current character can be matched by subexpression, without advancing or backtracking.

(?= subexpression ) is a positive lookahead assertion; that is, the character or characters after the current position must match subexpression. (?!subexpression) is a negative lookahead assertion; that is, the character or characters after the current position must not match subexpression. Both positive and negative lookahead assertions are most useful when subexpression is a subset of the next subexpression.

 

Scanning of Valid Email Format

Examples of Scanning of Valid Email Format.

Vb.NET ASPX File:
<%@ Page Language="VB" %>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
    <head>
       <title>Sample Page</title>
       <meta http-equiv="Content-Type" content="text/html;charset=utf-8">
       <script runat="server">
           Sub Page_Load()
               Dim xstring As String = "dahhhvid.johgnes@prghjjosehware.com"
               Dim xmatchstr As String = ""
               lbl01.Text = xmatchstr
           End Sub
           Function showresult(xstring,xpattern)
               Dim xmatch As Match
               Dim xcaptures As CaptureCollection
               Dim ycaptures As CaptureCollection
               Dim xgroups As GroupCollection
               Dim xmatchstr As String = ""
               Dim xint As Integer
               Dim yint As Integer
               Dim zint As Integer
               xmatchstr = xmatchstr & "<br />Result of Regex.Match(string,""" & Replace(xpattern,"<","&lt;") & """): """
               xmatch = Regex.Match(xstring,xpattern)
               xmatchstr = xmatchstr & xmatch.value & """<br />"
               xcaptures = xmatch.Captures
               xmatchstr = xmatchstr & "->Result of CaptureCollection.Count: """
               xmatchstr = xmatchstr & xcaptures.Count & """<br />"
               For xint = 0 to xcaptures.Count - 1
                   xmatchstr = xmatchstr & "->->Result of CaptureCollection("& xint & ").Value, Index, Length: """
                   xmatchstr = xmatchstr & xcaptures(xint).Value & ", " & xcaptures(xint).Index & ", " & xcaptures(xint).Length & """<br />"
                   xgroups = xmatch.Groups
                   xmatchstr = xmatchstr & "->Result of GroupCollection.Count: """
                   xmatchstr = xmatchstr & xgroups.Count & """<br />"
                   For yint = 0 to xgroups.Count - 1
                       xmatchstr = xmatchstr & "->->Result of GroupCollection("& yint & ").Value, Index, Length: """
                       xmatchstr = xmatchstr & xgroups(yint).Value & ", " & xgroups(yint).Index & ", " & xgroups(yint).Length & """<br />"
                       ycaptures = xgroups(yint).Captures
                       xmatchstr = xmatchstr & "->->->Result of CaptureCollection.Count: """
                       xmatchstr = xmatchstr & ycaptures.Count & """<br />"
                       For zint = 0 to ycaptures.Count - 1
                           xmatchstr = xmatchstr & "->->->->Result of CaptureCollection("& zint & ").Value, Index, Length: """
                           xmatchstr = xmatchstr & ycaptures(zint).Value & ", " & ycaptures(zint).Index & ", " & ycaptures(zint).Length & """<br />"
                       Next
                   Next
               Next
               Return xmatchstr
           End Function
       </script>
    </head>
    <body>
       <% Response.Write ("<h1>This is a Sample Page of Scanning of of Valid Email Format</h1>") %>
       <p>
           <%-- Set on Page_Load --%>
           <asp:Label id="lbl01" runat="server" />
       </p>
    </body>
</html>
HTML Web Page Embedded Output:

See also

Source/Referencee


©sideway
close

References

  1. Active Server Pages, http://msdn.microsoft.com/en-us/library/aa286483.aspx, updated 20Dec2012
  2. ASP Overview, http://msdn.microsoft.com/en-us/library/ms524929%28v=vs.90%29.aspx, updated 21Dec2012
  3. ASP Best Practices, http://technet.microsoft.com/en-us/library/cc939157.aspx, updated 15Apr2013
  4. ASP Built-in Objects, http://msdn.microsoft.com/en-us/library/ie/ms524716(v=vs.90).aspx, updated 20Dec2012
  5. Response Object, http://msdn.microsoft.com/en-us/library/ms525405(v=vs.90).aspx, updated 20Dec2012
  6. Request Object, http://msdn.microsoft.com/en-us/library/ms524948(v=vs.90).aspx, updated 20Dec2012
  7. Server Object (IIS), http://msdn.microsoft.com/en-us/library/ms525541(v=vs.90).aspx, updated 20Dec2012
  8. Application Object (IIS), http://msdn.microsoft.com/en-us/library/ms525360(v=vs.90).aspx, updated 20Dec2012
  9. Session Object (IIS), http://msdn.microsoft.com/en-us/library/ms524319(8v=vs.90).aspx, updated 20Dec2012
  10. ASPError Object, http://msdn.microsoft.com/en-us/library/ms524942(v=vs.90).aspx, updated 20Dec2012
  11. ObjectContext Object (IIS), http://msdn.microsoft.com/en-us/library/ms525667(v=vs.90).aspx, updated 20Dec2012
  12. Debugging Global.asa Files, http://msdn.microsoft.com/en-us/library/aa291249(v=vs.71).aspx, updated 15Apr2013
  13. How to: Debug Global.asa files, http://msdn.microsoft.com/en-us/library/ms241868(v=vs.80).aspx, updated 15Apr2013
  14. Calling COM Components from ASP Pages, http://msdn.microsoft.com/en-us/library/ms524620(v=VS.90).aspx, updated 15Apr2013
  15. IIS ASP Scripting Reference, http://msdn.microsoft.com/en-us/library/ms524664(v=vs.90).aspx, updated 15Apr2013
  16. ASP Keywords, http://msdn.microsoft.com/en-us/library/ms524672(v=vs.90).aspx, updated 15Apr2013
  17. Creating Simple ASP Pages, http://msdn.microsoft.com/en-us/library/ms524741(v=vs.90).aspx, updated 15Apr2013
  18. Including Files in ASP Applications, http://msdn.microsoft.com/en-us/library/ms524876(v=vs.90).aspx, updated 15Apr2013
  19. ASP Overview, http://msdn.microsoft.com/en-us/library/ms524929(v=vs.90).aspx, updated 15Apr2013
  20. FileSystemObject Object, http://msdn.microsoft.com/en-us/library/z9ty6h50(v=vs.84).aspx, updated 14Mayr2013
  21. http://msdn.microsoft.com/en-us/library/windows/desktop/ms675944(v=vs.85).aspx, ADO Object Model, updated 15Apr2013
  22. ADO Fundamentals, http://msdn.microsoft.com/en-us/library/windows/desktop/ms680928(v=vs.85).aspx, updated 14Mayr2013
close

ID: 190800006 Last Updated: 8/6/2019 Revision: Ref:

IMAGE

Home (5)

Business

Management

HBR (3)

Information

Recreation

Hobbies (7)

Culture

Chinese (1097)

English (336)

Reference (66)

Computer

Hardware (149)

Software

Application (187)

Digitization (24)

Numeric (19)

Programming

Web (648)new

CSS (SC)

ASP.NET (SC)

Regular Expression (SC)

HTML

Knowledge Base

Common Color (SC)

Html Entity (Unicode) (SC)

Html 401 Special (SC)

OS (389)

MS Windows

Windows10 (SC)

.NET Framework (SC)

DeskTop (7)

Knowledge

Mathematics

Formulas (8)

Number Theory (206)

Algebra (20)

Trigonometry (18)

Geometry (18)

Calculus (67)

Complex Analysis (21)

Engineering

Tables (8)

Mechanical

Mechanics (1)

Rigid Bodies

Statics (92)

Dynamics (37)

Fluid (5)

Fluid Kinematics (5)

Control

Process Control (1)

Acoustics (19)

FiniteElement (2)

Biology (1)

Geography (1)


Copyright © 2000-2019 Sideway . All rights reserved Disclaimers last modified on 10 Feb 2019