Joe Wicentowski
2018-04-23 16:22:40 UTC
Hi all,
I have run into an unexpected challenge constructing a regex to match
numbers in parentheses. For example, in the following string:
"On February 13, 1968, Secretary of State Dean Rusk sent a
message to Israeli Foreign Minister Abba Eban calling upon Israel to
endorse openly Resolution 242, and on May 13 President Johnson sent a
letter to United Arab Republic (UAR) President Gamal Abdel Nasser,
urging him to seize the unique opportunity offered by the Jarring
mission to achieve peace. (79, 171)"
... I would like to match "79" and "171" (but not "UAR" or "13" or
"1968"). I have been trying to construct a regex for use with
analyze-string to capture this pattern, but I have not been successful. I
have tried the following:
analyze-string($string, "(?:\()(?:(\d+)(?:, )?)+(?:\))")
In other words, there are these 3 components:
1. (?:\() a non-capturing group consisting of an opening parenthesis,
followed by
2. (?:(\d+)(?:, )?)+ one or more non-capturing groups, each consisting of a
captured number followed by an optional, non-capturing comma-and-space,
followed by
3. (?:\)) a non-capturing group consisting of a closing parenthesis
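To make the three components easier to test one at a time, the same pattern can be assembled piece by piece. This is only a sketch: the variable name $regex is illustrative, and the input is shortened to the final parenthetical.

```xquery
xquery version "3.1";

(: $regex is an illustrative name; the pattern is the one shown above, :)
(: split into its three components and joined with ||.                :)
let $regex :=
    "(?:\()"                 (: 1. opening parenthesis :)
    || "(?:(\d+)(?:, )?)+"   (: 2. repeated: captured number, optional ", " :)
    || "(?:\))"              (: 3. closing parenthesis :)
return analyze-string("(79, 171)", $regex)
```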
I was expecting to get the following output:
<fn:analyze-string-result xmlns:fn="http://www.w3.org/2005/xpath-functions">
<fn:non-match>On February 13, 1968, Secretary of State Dean Rusk sent a
message to Israeli Foreign Minister Abba Eban calling upon Israel to
endorse openly Resolution 242, and on May 13 President Johnson sent a
letter to United Arab Republic (UAR) President Gamal Abdel Nasser,
urging him to seize the unique opportunity offered by the Jarring
mission to achieve peace. </fn:non-match>
<fn:match>(<fn:group nr="1">79</fn:group>,
<fn:group nr="1">171</fn:group>)</fn:match>
</fn:analyze-string-result>
However, the actual result is that the first number ("79") is skipped, and
only the 2nd number ("171") is captured:
<fn:analyze-string-result xmlns:fn="http://www.w3.org/2005/xpath-functions">
<fn:non-match>On February 13, 1968, Secretary of State Dean Rusk sent a
message to Israeli Foreign Minister Abba Eban calling upon Israel to
endorse openly Resolution 242, and on May 13 President Johnson sent a
letter to United Arab Republic (UAR) President Gamal Abdel Nasser,
urging him to seize the unique opportunity offered by the Jarring
mission to achieve peace. </fn:non-match>
<fn:match>(79,
<fn:group nr="1">171</fn:group>)</fn:match>
</fn:analyze-string-result>
What am I missing? Can anyone suggest a regex that is able to capture both
numbers inside the parentheses? Or do I need to make a two-pass run
through this, finding parenthetical text with a first analyze-string like
"\(.+\)" and then looking inside its matches with a second analyze-string
like "(\d+)(?:, )?"?
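For reference, a minimal sketch of that two-pass idea. It makes two assumptions that are not in the original attempt: the first pass uses a tighter pattern than "\(.+\)", because a greedy ".+" could run from the "(" of "(UAR" to the final ")", and the sample string is abbreviated for illustration.

```xquery
xquery version "3.1";

(: Abbreviated sample text; the full string behaves the same way. :)
let $string := "letter to United Arab Republic (UAR) President Gamal Abdel Nasser. (79, 171)"

(: Pass 1: match only parenthesized lists of numbers, such as "(79, 171)". :)
for $list in analyze-string($string, "\(\d+(?:, \d+)*\)")/fn:match

(: Pass 2: pull each run of digits out of the text matched in pass 1. :)
for $number in analyze-string(string($list), "\d+")/fn:match
return string($number)
```

With this input the query is expected to return the two strings "79" and "171", while "(UAR)" fails the first pass because it contains no digits.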
Thanks,
Joe