Sideway
output.to from Sideway
Draft for Information Only

Content

Regular Expression Backtracking
 Linear Comparison Without Backtracking
 Backtracking with Optional Quantifiers or Alternation Constructs
 Backtracking with Nested Optional Quantifiers
 Controlling Backtracking
  Defining a Time-out Interval
  Nonbacktracking Subexpression
  Lookbehind Assertions
  Lookahead Assertions
 Scanning of Valid Email Format
 See also
 Source/Referencee

Regular Expression Backtracking

Backtracking occurs when a regular expression pattern contains optional quantifiers or alternation constructs, and the regular expression engine returns to a previous saved state to continue its search for a match. Backtracking is central to the power of regular expressions; it makes it possible for expressions to be powerful and flexible, and to match very complex patterns. At the same time, this power comes at a cost. Backtracking is often the single most important factor that affects the performance of the regular expression engine. Fortunately, the developer has control over the behavior of the regular expression engine and how it uses backtracking. This topic explains how backtracking works and how it can be controlled.

In general, a Nondeterministic Finite Automaton (NFA) engine like .NET regular expression engine places the responsibility for crafting efficient, fast regular expressions on the developer.

This topic contains the following sections:

Linear Comparison Without Backtracking

If a regular expression pattern has no optional quantifiers or alternation constructs, the regular expression engine executes in linear time. That is, after the regular expression engine matches the first language element in the pattern with text in the input string, it tries to match the next language element in the pattern with the next character or group of characters in the input string. This continues until the match either succeeds or fails. In either case, the regular expression engine advances by one character at a time in the input string.

If a regular expression pattern includes no optional quantifiers or alternation constructs, the maximum number of comparisons required to match the regular expression pattern with the input string is roughly equivalent to the number of characters in the input string.In other words, the regular expression engine runs in near-linear time if it contains no optional quantifiers or alternation constructs.

Backtracking with Optional Quantifiers or Alternation Constructs

When a regular expression includes optional quantifiers or alternation constructs, the evaluation of the input string is no longer linear. Pattern matching with an NFA engine is driven by the language elements in the regular expression and not by the characters to be matched in the input string. Therefore, the regular expression engine tries to fully match optional or alternative subexpressions. When it advances to the next language element in the subexpression and the match is unsuccessful, the regular expression engine can abandon a portion of its successful match and return to an earlier saved state in the interest of matching the regular expression as a whole with the input string. This process of returning to a previous saved state to find a match is known as backtracking.

Generally, if a regular expression pattern has a single alternation construct or a single optional quantifier, the number of comparison operations required to match the pattern is more than twice the number of characters in the input string.

Backtracking with Nested Optional Quantifiers

The number of comparison operations required to match a regular expression pattern can increase exponentially if the pattern includes a large number of alternation constructs, if it includes nested alternation constructs, or, most commonly, if it includes nested optional quantifiers.

the regular expression engine took about twice as long to find that an input string did not match the pattern as it did to identify a matching string. This is because an unsuccessful match always represents a worst-case scenario. The regular expression engine must use the regular expression to follow all possible paths through the data before it can conclude that the match is unsuccessful, and the nested parentheses create many additional paths through the data.

Comparison of the input string with the regular expression continues in this way until the regular expression engine has tried all possible combinations of matches, and then concludes that there is no match. Because of the nested quantifiers, this comparison is an O(2n) or an exponential operation, where n is the number of characters in the input string. This means that in the worst case, an input string of 30 characters requires approximately 1,073,741,824 comparisons, and an input string of 40 characters requires approximately 1,099,511,627,776 comparisons. If you use strings of these or even greater lengths, regular expression methods can take an extremely long time to complete when they process input that does not match the regular expression pattern.

Controlling Backtracking

Backtracking lets you create powerful, flexible regular expressions. However, as the previous section showed, these benefits may be coupled with unacceptably poor performance. To prevent excessive backtracking, you should define a time-out interval when you instantiate a Regex object or call a static regular expression matching method. This is discussed in the next section. In addition, .NET supports three regular expression language elements that limit or suppress backtracking and that support complex regular expressions with little or no performance penalty: nonbacktracking subexpressions, lookbehind assertions, and lookahead assertions.

Defining a Time-out Interval

Starting with the .NET Framework 4.5, you can set a time-out value that represents the longest interval the regular expression engine will search for a single match before it abandons the attempt and throws a RegexMatchTimeoutException exception. You specify the time-out interval by supplying a TimeSpan value to the Regex.Regex(String, RegexOptions, TimeSpan) constructor for instance regular expressions. In addition, each static pattern matching method has an overload with a TimeSpan parameter that allows you to specify a time-out value. By default, the time-out interval is set to Regex.InfiniteMatchTimeout and the regular expression engine does not time out.

We recommend that you always set a time-out interval if your regular expression relies on backtracking.

A RegexMatchTimeoutException exception indicates that the regular expression engine was unable to find a match within the specified time-out interval but does not indicate why the exception was thrown. The reason might be excessive backtracking, but it is also possible that the time-out interval was set too low given the system load at the time the exception was thrown. When you handle the exception, you can choose to abandon further matches with the input string or increase the time-out interval and retry the matching operation.

Nonbacktracking Subexpression

The (?> subexpression) language element suppresses backtracking in a subexpression. It is useful for preventing the performance problems associated with failed matches.

Lookbehind Assertions

.NET includes two language elements, (?<=subexpression) and (?<!subexpression), that match the previous character or characters in the input string. Both language elements are zero-width assertions; that is, they determine whether the character or characters that immediately precede the current character can be matched by subexpression, without advancing or backtracking.

(?<= subexpression ) is a positive lookbehind assertion; that is, the character or characters before the current position must match subexpression. (?<!subexpression) is a negative lookbehind assertion; that is, the character or characters before the current position must not match subexpression. Both positive and negative lookbehind assertions are most useful when subexpression is a subset of the previous subexpression.

Lookahead Assertions

.NET includes two language elements, (?=subexpression) and (?!subexpression), that match the next character or characters in the input string. Both language elements are zero-width assertions; that is, they determine whether the character or characters that immediately follow the current character can be matched by subexpression, without advancing or backtracking.

(?= subexpression ) is a positive lookahead assertion; that is, the character or characters after the current position must match subexpression. (?!subexpression) is a negative lookahead assertion; that is, the character or characters after the current position must not match subexpression. Both positive and negative lookahead assertions are most useful when subexpression is a subset of the next subexpression.

 

Scanning of Valid Email Format

Examples of Scanning of Valid Email Format.
ASP.NET Code Input:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
    <head>
       <title>Sample Page</title>
       <meta http-equiv="Content-Type" content="text/html;charset=utf-8">
       <script runat="server">
           Sub Page_Load()
               Dim xstring As String = "dahhhvid.johgnes@prghjjosehware.com"
               Dim xmatchstr As String = ""
               lbl01.Text = xmatchstr
           End Sub
           Function showresult(xstring,xpattern)
               Dim xmatch As Match
               Dim xcaptures As CaptureCollection
               Dim ycaptures As CaptureCollection
               Dim xgroups As GroupCollection
               Dim xmatchstr As String = ""
               Dim xint As Integer
               Dim yint As Integer
               Dim zint As Integer
               xmatchstr = xmatchstr & "<br />Result of Regex.Match(string,""" & Replace(xpattern,"<","&lt;") & """): """
               xmatch = Regex.Match(xstring,xpattern)
               xmatchstr = xmatchstr & xmatch.value & """<br />"
               xcaptures = xmatch.Captures
               xmatchstr = xmatchstr & "->Result of CaptureCollection.Count: """
               xmatchstr = xmatchstr & xcaptures.Count & """<br />"
               For xint = 0 to xcaptures.Count - 1
                   xmatchstr = xmatchstr & "->->Result of CaptureCollection("& xint & ").Value, Index, Length: """
                   xmatchstr = xmatchstr & xcaptures(xint).Value & ", " & xcaptures(xint).Index & ", " & xcaptures(xint).Length & """<br />"
                   xgroups = xmatch.Groups
                   xmatchstr = xmatchstr & "->Result of GroupCollection.Count: """
                   xmatchstr = xmatchstr & xgroups.Count & """<br />"
                   For yint = 0 to xgroups.Count - 1
                       xmatchstr = xmatchstr & "->->Result of GroupCollection("& yint & ").Value, Index, Length: """
                       xmatchstr = xmatchstr & xgroups(yint).Value & ", " & xgroups(yint).Index & ", " & xgroups(yint).Length & """<br />"
                       ycaptures = xgroups(yint).Captures
                       xmatchstr = xmatchstr & "->->->Result of CaptureCollection.Count: """
                       xmatchstr = xmatchstr & ycaptures.Count & """<br />"
                       For zint = 0 to ycaptures.Count - 1
                           xmatchstr = xmatchstr & "->->->->Result of CaptureCollection("& zint & ").Value, Index, Length: """
                           xmatchstr = xmatchstr & ycaptures(zint).Value & ", " & ycaptures(zint).Index & ", " & ycaptures(zint).Length & """<br />"
                       Next
                   Next
               Next
               Return xmatchstr
           End Function
       </script>
    </head>
    <body>
       <% Response.Write ("<h1>This is a Sample Page of Scanning of of Valid Email Format</h1>") %>
       <p>
           <%-- Set on Page_Load --%>
           <asp:Label id="lbl01" runat="server" />
       </p>
    </body>
</html>
HTML Web Page Embedded Output:

See also

Source/Referencee


©sideway

ID: 190800006 Last Updated: 8/6/2019 Revision: 0 Ref:

close

References

  1. Active Server Pages,  , http://msdn.microsoft.com/en-us/library/aa286483.aspx
  2. ASP Overview,  , http://msdn.microsoft.com/en-us/library/ms524929%28v=vs.90%29.aspx
  3. ASP Best Practices,  , http://technet.microsoft.com/en-us/library/cc939157.aspx
  4. ASP Built-in Objects,  , http://msdn.microsoft.com/en-us/library/ie/ms524716(v=vs.90).aspx
  5. Response Object,  , http://msdn.microsoft.com/en-us/library/ms525405(v=vs.90).aspx
  6. Request Object,  , http://msdn.microsoft.com/en-us/library/ms524948(v=vs.90).aspx
  7. Server Object (IIS),  , http://msdn.microsoft.com/en-us/library/ms525541(v=vs.90).aspx
  8. Application Object (IIS),  , http://msdn.microsoft.com/en-us/library/ms525360(v=vs.90).aspx
  9. Session Object (IIS),  , http://msdn.microsoft.com/en-us/library/ms524319(8v=vs.90).aspx
  10. ASPError Object,  , http://msdn.microsoft.com/en-us/library/ms524942(v=vs.90).aspx
  11. ObjectContext Object (IIS),  , http://msdn.microsoft.com/en-us/library/ms525667(v=vs.90).aspx
  12. Debugging Global.asa Files,  , http://msdn.microsoft.com/en-us/library/aa291249(v=vs.71).aspx
  13. How to: Debug Global.asa files,  , http://msdn.microsoft.com/en-us/library/ms241868(v=vs.80).aspx
  14. Calling COM Components from ASP Pages,  , http://msdn.microsoft.com/en-us/library/ms524620(v=VS.90).aspx
  15. IIS ASP Scripting Reference,  , http://msdn.microsoft.com/en-us/library/ms524664(v=vs.90).aspx
  16. ASP Keywords,  , http://msdn.microsoft.com/en-us/library/ms524672(v=vs.90).aspx
  17. Creating Simple ASP Pages,  , http://msdn.microsoft.com/en-us/library/ms524741(v=vs.90).aspx
  18. Including Files in ASP Applications,  , http://msdn.microsoft.com/en-us/library/ms524876(v=vs.90).aspx
  19. ASP Overview,  , http://msdn.microsoft.com/en-us/library/ms524929(v=vs.90).aspx
  20. FileSystemObject Object,  , http://msdn.microsoft.com/en-us/library/z9ty6h50(v=vs.84).aspx
  21. http://msdn.microsoft.com/en-us/library/windows/desktop/ms675944(v=vs.85).aspx,  , ADO Object Model
  22. ADO Fundamentals,  , http://msdn.microsoft.com/en-us/library/windows/desktop/ms680928(v=vs.85).aspx
close

Latest Updated LinksValid XHTML 1.0 Transitional Valid CSS!Nu Html Checker Firefox53 Chromena IExplorerna
IMAGE

Home 5

Business

Management

HBR 3

Information

Recreation

Hobbies 8

Culture

Chinese 1097

English 339

Reference 79

Computer

Hardware 249

Software

Application 213

Digitization 32

Latex 52

Manim 205

KB 1

Numeric 19

Programming

Web 289

Unicode 504

HTML 66

CSS 65

SVG 46

ASP.NET 270

OS 429

DeskTop 7

Python 72

Knowledge

Mathematics

Formulas 8

Algebra 84

Number Theory 206

Trigonometry 31

Geometry 34

Coordinate Geometry 2

Calculus 67

Complex Analysis 21

Engineering

Tables 8

Mechanical

Mechanics 1

Rigid Bodies

Statics 92

Dynamics 37

Fluid 5

Fluid Kinematics 5

Control

Process Control 1

Acoustics 19

FiniteElement 2

Natural Sciences

Matter 1

Electric 27

Biology 1

Geography 1


Copyright © 2000-2024 Sideway . All rights reserved Disclaimers last modified on 06 September 2019