Sideway
output.to from Sideway


Regular Expression Object




Draft for Information Only


A regular expression is a technique that uses an expression to match against the text being searched. The expression is a pattern of characters constructed according to a set of syntax rules. By matching the expression against a text string, a regular expression can be used to search for patterns in the text, to replace parts of the text, and to extract substrings from it. A regular expression object is not an ASP object but a scripting object that provides regular expression features. In the VBScript engine, the regular expression object is implemented as a COM object.
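
As a rough illustration of those three uses (search, replace, extract), the sketch below uses modern JavaScript, whose regular expression syntax closely follows JScript's. It is not the VBScript object interface, and the sample text is made up.

```javascript
// A pattern for runs of one or more digits; the g flag asks for every match.
const digits = /\d+/g;
const text = "Order 12 shipped on day 345";

// Search: does the text contain the pattern at all?
const found = /\d+/.test(text);            // true

// Replace: swap every run of digits for "#".
const masked = text.replace(digits, "#");  // "Order # shipped on day #"

// Extract: collect every matching substring.
const numbers = text.match(digits);        // ["12", "345"]

console.log(found, masked, numbers);
```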

In general, regular expressions are provided as scripting objects with regular expression features. In other words, using a regular expression object involves two syntaxes: the pattern follows regular expression syntax, while the code that creates and uses the object follows the syntax of the scripting language. Therefore, the features of a regular expression object include both the language of regular expressions and the object model of the regular expression objects, in addition to the syntax of the scripting language.

Regular Expression Features

The features of a regular expression include the pattern expression together with the character set, ordinary characters, special characters, and metacharacters used in it.

Expression

The expression of a regular expression is a symbolic character pattern. In a general sense, a regular expression is an arithmetic-like, symbolic, single-line program bounded by a pair of delimiters.

In JScript, a pair of forward slash (/) characters is used as the delimiters.

/expression/

In VBScript, which has no literal regular expression syntax, the pattern is an ordinary string, so a pair of quotation mark (") characters is used as the delimiters.

"expression"

Basically, an expression describes the string to be matched against the searched text. An expression is therefore a matching template, composed of ordinary characters and special characters, that describes the character pattern to look for in the string being searched.
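
A minimal JavaScript sketch of the two delimiter styles described above. The JavaScript `RegExp` constructor is used here as a stand-in for a pattern supplied as a quoted string, which is how VBScript passes a pattern; that substitution is an assumption made for illustration.

```javascript
// JScript-style literal: the pattern sits between two forward slashes,
// and flags such as i (ignore case) follow the closing slash.
const literal = /colou?r/i;

// String form: the pattern is an ordinary quoted string.
const fromString = new RegExp("colou?r", "i");

// In a JavaScript string a backslash must itself be escaped, so \d is written "\\d".
// (VBScript strings do not treat the backslash as an escape, so there it stays "\d".)
const digitLiteral = /\d{3}/;
const digitString = new RegExp("\\d{3}");

console.log(literal.test("COLOR"), fromString.test("Colour"));  // true true
console.log(digitLiteral.source === digitString.source);        // true
```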

Ordinary characters are literal characters, used either as members of a character set inside a pair of square brackets or as matching characters between the pair of delimiters outside square brackets. An ordinary character always represents the character itself.

Special characters are also metacharacters. A special character has a special meaning instead of representing the literal character itself.
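
A small JavaScript sketch contrasting ordinary characters with a special one; the backslash escape used in `literalDot` is covered under Escaped Characters.

```javascript
// "." is a special character: it matches any single character except a newline.
const anyChar = /a.c/;
// "\." escapes it, so it becomes an ordinary character matching only a literal dot.
const literalDot = /a\.c/;
// Letters are ordinary characters: /abc/ matches only the text "abc".
const plain = /abc/;

console.log(anyChar.test("abc"), anyChar.test("a.c"));        // true true
console.log(literalDot.test("abc"), literalDot.test("a.c"));  // false true
console.log(plain.test("xabcx"), plain.test("ABC"));          // true false
```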

In a general sense, a metacharacter is a character, or a sequence of characters, that has a special meaning in a computing application. Seldom-used characters are chosen as such indicators to make programming easier.

The Character Set

Although regular expressions use only one character set, the same character may have a different meaning in a regular expression, between a pair of square brackets, and in a replacement pattern.
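
The JavaScript sketch below shows `$` and `^` taking different meanings in a pattern, inside square brackets, and (for `$`) in a replacement string. It assumes JavaScript's `$1`/`$2` replacement syntax for saved groups.

```javascript
// In a pattern, $ anchors the match to the end of the string.
const endsWithB = /b$/;
// Inside square brackets, $ is just a member of the character set.
const dollarSign = /[$]/;
// In a replacement string, $1 and $2 stand for the text saved by the groups.
const swapped = "John Smith".replace(/(\w+) (\w+)/, "$2 $1");

// ^ anchors to the start of the string, but as the first character
// inside square brackets it negates the set.
const startsWithA = /^a/;
const notA = /^[^a]/;

console.log(endsWithB.test("ab"), endsWithB.test("ba"));  // true false
console.log(dollarSign.test("US$5"));                     // true
console.log(swapped);                                     // Smith John
console.log(startsWithA.test("abc"), notA.test("abc"));   // true false
```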

The character set used to construct a regular expression consists of:

  1. English Alphabet Capital Letters: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
  2. English Alphabet Small Letters: a b c d e f g h i j k l m n o p q r s t u v w x y z
  3. Arabic numerals: 0 1 2 3 4 5 6 7 8 9
  4. ASCII Special Symbols: (space) =(equals) +(plus) -(hyphen-minus) *(asterisk) /(solidus) \(reverse solidus) ^(circumflex accent) ((left parenthesis) )(right parenthesis) &(ampersand) .(full stop) :(colon) <(less-than sign) >(greater-than sign) "(quotation mark) '(apostrophe) _(low line) [(left square bracket) ](right square bracket) |(vertical bar) $(dollar sign) ?(question mark) {(left curly bracket) }(right curly bracket) ,(comma) !(exclamation mark)

Ordinary Characters

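An ordinary character is a character that represents itself. Letters and digits are always ordinary characters. A symbol that also appears in the list of special characters, such as * + ^ . ( ) [ ] \ or /, represents itself only where it has no special meaning, for example when it is escaped with a backslash or, for most such symbols, when it appears inside a pair of square brackets.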

  1. English Alphabet Capital Letters: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
  2. English Alphabet Small Letters: a b c d e f g h i j k l m n o p q r s t u v w x y z
  3. Arabic numerals: 0 1 2 3 4 5 6 7 8 9
  4. ASCII Special Symbols: (space) =(equals) +(plus) -(hyphen-minus) *(asterisk) /(solidus) \(reverse solidus) ^(circumflex accent) ((left parenthesis) )(right parenthesis) &(ampersand) .(full stop) :(colon) <(less-than sign) >(greater-than sign) "(quotation mark) '(apostrophe) _(low line) [(left square bracket) ](right square bracket)

Special Characters

A special character is a metacharacter that has a special meaning instead of representing the literal character itself.

  • *: to match the previous character or subexpression zero or more times. Equivalent to {0,}.
  • +: to match the previous character or subexpression one or more times. Equivalent to {1,}.
  • ?: to match the previous character or subexpression zero or one time. Equivalent to {0,1}.
    When placed immediately after another quantifier (*, +, ?, {n}, {n,}, or {n,m}), to make that quantifier non-greedy, so that it matches as few characters as possible instead of as many as possible.
  • ^: to match the start position of the searched string, and also the position following \n or \r when the Multiline property is set.
    When used as the first character in a bracket expression, to negate the character set instead.
  • $: to match the end position of the searched string, and also the position before \n or \r when the Multiline property is set.
  • .: to match any single character except the newline character (\n).
  • [: to mark the start of a bracket expression.
  • ]: to mark the end of a bracket expression.
  • {: to mark the start of a quantifier expression.
  • }: to mark the end of a quantifier expression.
  • (: to mark the start of a subexpression.
  • ): to mark the end of a subexpression.
  • |: to indicate a choice between two or more items.
  • /: to denote the start of a literal regular expression pattern in JScript.
    to denote the end of a literal regular expression pattern in JScript; single-character flags can be added after it to specify search behavior.
  • ": to denote the start of a regular expression pattern string in VBScript.
    to denote the end of a regular expression pattern string in VBScript.
  • \: to mark the next character as a special character, a literal, a backreference, or an octal escape.
  • -: to specify a range of characters, from the character before the hyphen to the character after it, inside square brackets.
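
A JavaScript sketch of several of the special characters above. The `m` flag is used here as JavaScript's counterpart of the Multiline property; treating the two as equivalent is an assumption of this sketch.

```javascript
// * allows zero repetitions of the previous character, + requires at least one.
console.log(/ab*c/.test("ac"), /ab+c/.test("ac"));              // true false
// ? makes the previous character optional.
console.log(/colou?r/.test("color"), /colou?r/.test("colour"));  // true true

const html = "<b>bold</b>";
// Greedy: .* runs as far as it can, so the match spans both tags.
const greedy = html.match(/<.*>/)[0];   // "<b>bold</b>"
// Non-greedy: the ? after * makes it stop at the first ">".
const lazy = html.match(/<.*?>/)[0];    // "<b>"

// ^ and $ anchor the whole string, | chooses between 19 and 20, {2} repeats \d twice.
const year = /^(19|20)\d{2}$/;
console.log(year.test("1999"), year.test("2105"));  // true false

// With the m flag, ^ and $ also match next to each line break.
const firstWords = "one\ntwo\nthree".match(/^\w+$/gm);  // ["one", "two", "three"]
console.log(greedy, lazy, firstWords);
```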

MetaCharacters

Besides single special characters, a metacharacter can also be a sequence of characters, namely an escaped character or a group of characters.

Escaped Characters

Escaped characters are matching characters written with a preceding backslash (\). An escape sequence may represent an ordinary character, a nonprinting character, or a metacharacter.

Ordinary Characters

An escape sequence can be used to represent a special character as an ordinary, literal character. The escape character (a single backslash \) indicates that the following special character is not an operator.

  • \*: to match a literal *
  • \+: to match a literal +
  • \?: to match a literal ?
  • \^: to match a literal ^
  • \$: to match a literal $
  • \.: to match a literal .
  • \[: to match a literal [
  • \]: to match a literal ]. The backslash is usually not necessary outside square brackets.
  • \{: to match a literal {
  • \}: to match a literal }. The backslash is usually not necessary.
  • \(: to match a literal (
  • \): to match a literal ). The backslash is not necessary inside square brackets.
  • \|: to match a literal |
  • \/: to match a literal /
  • \\: to match a literal \
  • \-: to match a literal -. The backslash is usually not necessary outside square brackets, or inside square brackets when the hyphen is the first or last character and therefore cannot form a range.

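
When a pattern is built from arbitrary text, every special character in that text needs a backslash. The `escapeRegExp` function below is a hypothetical helper, not part of JScript or VBScript, sketched in JavaScript.

```javascript
// Hypothetical helper: put a backslash in front of every special character
// so that the text is matched literally.
function escapeRegExp(text) {
  // $& in the replacement string stands for the whole matched character.
  return text.replace(/[.*+?^${}()|[\]\\\/-]/g, "\\$&");
}

const price = "$5.00 (approx.)";
const pattern = new RegExp(escapeRegExp(price));

console.log(escapeRegExp("1+1=2"));                  // 1\+1=2
console.log(pattern.test("Cost: $5.00 (approx.)"));  // true
console.log(pattern.test("Cost: $5000 approx"));     // false
```
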
Nonprinting Characters

An escape sequence can also represent a nonprinting character. When the backslash (\) is followed by one of the letters below, rather than by a special character, the sequence denotes the corresponding nonprinting character.

  • \f: to match or represent a form-feed character. Equivalent to \x0C and \cL
  • \n: to match or represent a newline character. Equivalent to \x0A and \cJ
  • \r: to match or represent a carriage return character. Equivalent to \x0D and \cM
  • \t: to match or represent a tab character. Equivalent to \x09 and \cI
  • \v: to match or represent a vertical tab character. Equivalent to \x0B and \cK

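
A JavaScript sketch of the nonprinting escapes, using a made-up tab-separated record with a Windows line ending.

```javascript
const record = "name\tage\r\nAda\t36";

// \t matches a tab; \x09 and \cI are alternative spellings of the same character.
console.log(/\t/.test(record), /\x09/.test(record), /\cI/.test(record));  // true true true

// \r\n matches a Windows-style line ending.
const rows = record.split(/\r\n/);   // ["name\tage", "Ada\t36"]
const cells = rows[1].split(/\t/);   // ["Ada", "36"]

// The record contains no vertical tab (\v) or form feed (\f).
console.log(/[\v\f]/.test(record));  // false
```
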
MetaCharacters by escaped character

An escape sequence can also represent a metacharacter that matches a class of characters or a position in the string.

  • \b: to match a word boundary, that is, the position between a word character and a non-word character, or the start or end of the string next to a word character.
  • \B: to match a non-boundary, that is, any position that is not a word boundary, such as the position between two letters inside a word.
  • \d: to match a digit character. Equivalent to [0-9]
  • \D: to match a nondigit character. Equivalent to [^0-9]
  • \s: to match any white-space character, including space, tab, form feed, newline, carriage return, and vertical tab. Equivalent to [ \f\n\r\t\v]
  • \S: to match any non-white-space character. Equivalent to [^ \f\n\r\t\v]
  • \w: to match any word character (alphanumeric or underscore), that is A-Z, a-z, 0-9, and underscore. Equivalent to [A-Za-z0-9_]
  • \W: to match any non-word character, that is any character except A-Z, a-z, 0-9, and underscore. Equivalent to [^A-Za-z0-9_]

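
The following JavaScript sketch applies the class and boundary escapes to a sample sentence invented for illustration.

```javascript
const sentence = "The cat scattered 42 catalogs in 3 days.";

// \b...\b keeps only the whole word "cat", not "scattered" or "catalogs".
const wholeWord = sentence.match(/\bcat\b/g);     // ["cat"]
// \B before "cat" requires it to sit inside a word (here, in "scattered").
const inside = sentence.match(/\Bcat/g);          // ["cat"]
// \d+ collects runs of digits; \w+ collects words.
const numbers = sentence.match(/\d+/g);           // ["42", "3"]
const wordCount = sentence.match(/\w+/g).length;  // 8
// \s+ splits on any run of white space.
const parts = "a \t b\nc".split(/\s+/);           // ["a", "b", "c"]

console.log(wholeWord, inside, numbers, wordCount, parts);
```
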
MetaCharacters by escaped word

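A longer escape sequence, in which the backslash is followed by a letter and a value or by digits, can be used to represent a character code or a reference to a saved match.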

  • \cx: to match the ASCII control character specified by x. x must be in the range of A-Z or a-z, otherwise c is assumed to be a literal "c" character, that is a simple escaped character.
  • \xn: to match the ASCII character specified by n, where n is a hexadecimal escape value of an ASCII code with exactly two digits.
  • \num: to match the text saved by the captured subexpression numbered num, where num is a positive integer.
  • \n: an escape that matches either a backreference or an octal escape of an ASCII character code. In general, \1 through \9 refer to backreferences. With a single digit n:
    • If n=0, \0 is an octal escape value.
    • If 1<=n<=7 and \n is preceded by at least n captured subexpressions, n is the number of a backreference. Otherwise, n is an octal escape value.
    • If n=8 or n=9, n is the number of a backreference.
  • \nm: an escape that matches either a backreference or an octal escape of an ASCII character code. In general, \nm is treated as a backreference only if a captured subexpression with that number exists. With two digits n and m:
    • If n=0, nm is an octal escape value. m must be an octal digit (0-7); otherwise m is a literal digit m and \nm falls back to \n.
    • If 1<=n<=7:
      • If \nm is preceded by at least nm captured subexpressions, nm is the number of a backreference.
      • If \nm is preceded by at least n but fewer than nm captured subexpressions, m is a literal digit m and \nm falls back to \n.
      • If \nm is preceded by fewer than n captured subexpressions, nm is an octal escape value. In that case m must be an octal digit (0-7); otherwise m is a literal digit m and \nm falls back to \n.
    • If n=8 or n=9:
      • If \nm is preceded by at least nm captured subexpressions, nm is the number of a backreference.
      • If \nm is preceded by at least n but fewer than nm captured subexpressions, m is a literal digit m and \nm falls back to \n.
  • \nml: an escape that matches either a backreference or an octal escape of an ASCII character code. In general, unless a backreference with that number exists, \nml is treated as an octal escape value if n is an octal digit from 0 to 3 and m and l are octal digits from 0 to 7.
  • \un: to match the Unicode character specified by n, where n is a hexadecimal escape value of a Unicode code point with exactly four digits.
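
A JavaScript sketch of the character-code escapes and a one-digit backreference. Octal escapes are left out because modern JavaScript treats them only as legacy syntax.

```javascript
// \xnn: exactly two hexadecimal digits; \x41 is "A".
const hexMatch = /\x41/.test("A");                      // true
// \unnnn: exactly four hexadecimal digits; \u00E9 and \xE9 both name "é".
const unicodeMatch = /caf\u00E9/.test("caf\xE9");       // true
// \cx: a control character; \cJ is line feed, the same as \n.
const controlMatch = /\cJ/.test("line1\nline2");        // true
// \num: matches the text saved by group num again; (\w)\1 finds doubled letters.
const doubled = "bookkeeper".match(/(\w)\1/g);          // ["oo", "kk", "ee"]

console.log(hexMatch, unicodeMatch, controlMatch, doubled);
```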

Grouped Characters

Characters grouped within square brackets, curly brackets, or parentheses can also act as metacharacters that represent matching strings.

  • [...]: to mark the boundary of a character set in a bracket expression.
  • [^...]: to mark the boundary of a negative character set in a bracket expression.
  • [xyz]: to match any one of the specified characters of the character set between the pair of square brackets. e.g. x, y, or z.
  • [^xyz]: to match any one character that is not among the characters between the pair of square brackets, as indicated by the ^. e.g. any character other than x, y, or z.
  • [a-z]: to match any one character in the specified range inside the pair of square brackets. e.g. a, b, c, ..., or z.
  • [^a-z]: to match any one character outside the specified range, as indicated by the ^ at the start of the bracket expression. e.g. any character other than a, b, c, ..., z.
  • {...}: to mark the boundary of a quantifier expression.
  • {n}: to match the previous character or subexpression exactly n times, where n must be a nonnegative integer.
  • {n,}: to match the previous character or subexpression at least n times, where n must be a nonnegative integer.
  • {n,m}: to match the previous character or subexpression at least n and at most m times, where n and m must be nonnegative integers and n<=m.
  • (...): to mark the boundary of a subexpression.
  • (pattern): to match the pattern in the subexpression as one group and save (capture) the match, so that it can be used later, for example by a backreference.
  • (?:pattern): to match the pattern in the subexpression as one group without saving the match (a non-capturing group).
  • (?=pattern): a positive lookahead that matches at any position where a string matching pattern begins. The lookahead is not saved and consumes no characters, so the search for the next match begins right after the previous match, not after the characters that matched the lookahead.
  • (?!pattern): a negative lookahead that matches at any position where no string matching pattern begins. Like a positive lookahead, it is not saved and consumes no characters.
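
A JavaScript sketch of capturing and non-capturing groups, lookaheads, and a bounded negated character set. The Windows strings follow the example commonly used to illustrate lookahead and are illustrative only.

```javascript
// (pattern) captures: the two parts of the number are saved as groups 1 and 2.
const phone = /^(\d{3})-(\d{4})$/.exec("555-1234");
console.log(phone[1], phone[2]);                           // 555 1234

// (?:pattern) groups without saving, so + repeats the whole "ab" unit.
const repeated = /^(?:ab)+$/.test("ababab");               // true
const savedCount = /^(?:ab)+$/.exec("abab").length - 1;    // 0 saved groups

// (?=pattern): "Windows" only when followed by " 95", " 98" or " NT".
const positive = /Windows(?= (?:95|98|NT))/;
// (?!pattern): "Windows" only when NOT followed by them.
const negative = /Windows(?! (?:95|98|NT))/;
console.log(positive.test("Windows 98"), positive.test("Windows 2000"));  // true false
console.log(negative.test("Windows 98"), negative.test("Windows 2000"));  // false true

// The lookahead consumes nothing, so the matched text is only "Windows".
const matchedText = "Windows NT".match(positive)[0];       // "Windows"

// [^0-9]{2,4}: two to four characters in a row that are not digits.
const runs = "ab12cdef".match(/[^0-9]{2,4}/g);             // ["ab", "cdef"]
console.log(repeated, savedCount, matchedText, runs);
```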

Regular Expression Objects

Regular expression object is the generic name for the group of scripting objects related to regular expressions. The VBScript regular expression objects are the RegExp object, the Matches collection object, and the Match object.

  • VBScript RegExp Object provides 3 properties and 3 methods
    • Properties
      • Pattern: a string expression used to define the regular expression
      • IgnoreCase: a Boolean used to indicate whether the case of letters should be ignored when matching
      • Global: a Boolean used to indicate whether all possible matches in a string should be found, or only the first one
    • Methods
      • Test: to execute the search and return a Boolean value indicating whether the pattern matches the searched string.
      • Replace: to return a copy of the searched string in which the matched text is replaced by a replacement string; if no match is found, the original string is returned unchanged.
      • Execute: to execute the search and return a Matches collection containing one Match object for each match found; the collection is empty if nothing matches.
  • VBScript Matches Collection Object, which is the result returned by the RegExp.Execute method, provides 2 read-only properties
    • Count: a read-only value of the number of Match objects in the collection
    • Item: a read-only value giving the Match object at a specified index, so that any match in the collection can be accessed directly.
  • VBScript Match Object, which represents each successful match contained in a Matches collection object, provides 3 read-only properties
    • FirstIndex: a read-only value of the zero-based position at which the match occurred in the searched string.
    • Length: a read-only value of the total length of the matched string.
    • Value: a default read-only value of the content of the matched string.
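
To make the object model concrete, the sketch below wraps a JavaScript `RegExp` in a hypothetical class, `VbRegExp`, whose members mirror the VBScript names. It is an illustrative stand-in, not the COM object itself; mapping `IgnoreCase` to the `i` flag and `Global` to the `g` flag is the assumption it rests on.

```javascript
// Hypothetical stand-in for the VBScript RegExp object (illustrative only).
class VbRegExp {
  constructor() {
    this.Pattern = "";        // the regular expression, as a string
    this.IgnoreCase = false;  // assumed to map to the "i" flag
    this.Global = false;      // assumed to map to the "g" flag
  }

  // Build a native RegExp from the current property values.
  toNative(forceGlobal) {
    let flags = this.IgnoreCase ? "i" : "";
    if (this.Global || forceGlobal) flags += "g";
    return new RegExp(this.Pattern, flags);
  }

  // Test: true if the pattern matches anywhere in str.
  Test(str) {
    return this.toNative(false).test(str);
  }

  // Replace: replaces the first match, or every match when Global is true.
  Replace(str, replacement) {
    return str.replace(this.toNative(false), replacement);
  }

  // Execute: an array of Match-like objects; its length plays the role of Count.
  // Without Global, at most the first match is returned.
  Execute(str) {
    const matches = [];
    for (const m of str.matchAll(this.toNative(true))) {
      matches.push({ FirstIndex: m.index, Length: m[0].length, Value: m[0] });
      if (!this.Global) break;
    }
    return matches;
  }
}

const re = new VbRegExp();
re.Pattern = "\\bis\\b";  // \bis\b: the word "is" on its own
re.IgnoreCase = true;
re.Global = true;

const text = "IS this what it is?";
for (const m of re.Execute(text)) {
  console.log(m.FirstIndex, m.Length, m.Value);  // 0 2 IS, then 16 2 is
}
console.log(re.Test(text));            // true
console.log(re.Replace(text, "was"));  // was this what it was?
```

With `Global` left as `false`, the same pattern would report only the first match and replace only "IS", which mirrors the VBScript behavior described above.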

©sideway

ID: 160800009 Last Updated: 2016/8/6 Revision:


Copyright © 2000-2018 Sideway. All rights reserved. Last modified on 08 Mar 2018.