Thursday, April 17, 2014

Scala Notes: Regular Expressions Are Your Good Friend

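To check whether a string a contains another string b, we use a.contains(b).
To check whether a string a starts with another string b, we use a.startsWith(b).
To take the substring of a that follows the last occurrence of b, we use a.substring(a.lastIndexOf(b)+b.size).
...
All of these string checks can also be done with regular expressions (supported since JDK 1.4).
The Regex class (scala.util.matching.Regex) is there to let you use regular expressions easily!?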
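
As a rough sketch of that claim, the three string checks above could be written with Regex roughly as follows. The object name, the helper names (containsRe, startsWithRe, afterLastRe) and the sample input are made up for illustration; Pattern.quote is used on the assumption that b should be treated as literal text rather than as a pattern.

```scala
import java.util.regex.Pattern

object StringVsRegex {
  // Regex counterpart of a.contains(b); Pattern.quote makes b literal text.
  def containsRe(a: String, b: String): Boolean =
    Pattern.quote(b).r.findFirstIn(a).isDefined

  // Regex counterpart of a.startsWith(b); "^" anchors at the start of the input.
  def startsWithRe(a: String, b: String): Boolean =
    ("^" + Pattern.quote(b)).r.findFirstIn(a).isDefined

  // Regex counterpart of a.substring(a.lastIndexOf(b) + b.size).
  // The greedy ".*" consumes everything up to the last occurrence of b.
  // Unlike the substring version, this returns None when b does not occur.
  def afterLastRe(a: String, b: String): Option[String] = {
    val re = ("(?s).*" + Pattern.quote(b) + "(.*)").r
    a match {
      case re(rest) => Some(rest)
      case _        => None
    }
  }

  def main(args: Array[String]): Unit = {
    val a = "notes/scala/regex.txt" // hypothetical sample input
    println(containsRe(a, "scala"))
    println(startsWithRe(a, "notes"))
    println(afterLastRe(a, "/"))
  }
}
```

For checks this simple the plain string methods are shorter; the regex versions start to pay off once the thing being searched for is a pattern rather than a fixed string.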

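There are a few ways to create a Regex pattern.
The first, shown in the example below, is to call r on a string, which gives you a Regex object. (String itself has no r method; it is found through an implicit conversion to StringOps. If you don't know what implicit is, you can refer to this post.)
The triple quotes save you the trouble of escaping the backslashes inside the pattern.
When matching a string, you can then pull out each group captured by the pattern directly,
as in the example below, which splits a date into three groups: year, month, and day.
This is easy to use, but the downside is that a failed match immediately throws a scala.MatchError.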
scala> val dateP1 = """(\d\d\d\d)-(\d\d)-(\d\d)""".r
dateP1: scala.util.matching.Regex = (\d\d\d\d)-(\d\d)-(\d\d)

scala> val dateP1(year, month, day) = "2011-07-15"
year: String = 2011
month: String = 07
day: String = 15

scala> val dateP1(year, month, day) = "2011-7-15"
scala.MatchError: 2011-7-15 (of class java.lang.String)
 at .<init>(<console>:11)
 at .<clinit>(<console>)
 at .<init>(<console>:11)
 at .<clinit>(<console>)
 at $print(<console>)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:704)
 at scala.tools.nsc.interpreter.IMain$Request$$anonfun$14.apply(IMain.scala:920)
 at scala.tools.nsc.interpreter.Line$$anonfun$1.apply$mcV$sp(Line.scala:43)
 at scala.tools.nsc.io.package$$anon$2.run(package.scala:25)
 at java.lang.Thread.run(Thread.java:680)
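
One way to keep the extractor syntax without risking the MatchError above is to put the pattern inside a match expression with a fallback case. This is only a minimal sketch; the object name SafeDateMatch and the helper parseDate are invented for illustration.

```scala
object SafeDateMatch {
  val dateP1 = """(\d\d\d\d)-(\d\d)-(\d\d)""".r

  // Returns the three captured groups, or None instead of throwing scala.MatchError.
  def parseDate(s: String): Option[(String, String, String)] = s match {
    case dateP1(year, month, day) => Some((year, month, day))
    case _                        => None
  }

  def main(args: Array[String]): Unit = {
    println(parseDate("2011-07-15")) // Some((2011,07,15))
    println(parseDate("2011-7-15"))  // None
  }
}
```

Note that the extractor only succeeds when the whole string matches the pattern, which is why "2011-7-15" falls through to the fallback case.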

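Alternatively, you can match with the methods Regex provides, such as findFirstIn and findFirstMatchIn.
Even when the match fails you simply get None, so the follow-up handling is fairly painless.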
scala> val copyright: String = dateP1 findFirstIn "Date of this document: 2011-07-15" match {
     |   case Some(dateP1(year, month, day)) => "Copyright "+year
     |   case None                           => "No copyright"
     | }
copyright: String = Copyright 2011

scala> val copyright: String = dateP1 findFirstIn "Date of this document: 2011-7-15" match {
     |   case Some(dateP1(year, month, day)) => "Copyright "+year
     |   case None                           => "No copyright"
     | }
copyright: String = No copyright
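
findFirstMatchIn is mentioned above but not shown, so here is a small sketch of how it might be used. It returns an Option[Regex.Match], which lets you read the groups by index without applying the pattern a second time. The object and method names are made up for illustration, and findAllIn is included only as one more method from the same class.

```scala
import scala.util.matching.Regex

object FirstMatchDemo {
  val dateP1: Regex = """(\d\d\d\d)-(\d\d)-(\d\d)""".r

  // findFirstMatchIn yields Option[Regex.Match]; group(1) is the year group.
  def copyright(text: String): String =
    dateP1.findFirstMatchIn(text)
      .map(m => "Copyright " + m.group(1))
      .getOrElse("No copyright")

  // findAllIn iterates over every matched substring in the text.
  def allDates(text: String): List[String] =
    dateP1.findAllIn(text).toList

  def main(args: Array[String]): Unit = {
    println(copyright("Date of this document: 2011-07-15"))
    println(copyright("Date of this document: 2011-7-15"))
    println(allDates("from 2011-07-15 to 2014-04-17"))
  }
}
```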
The other way to create a Regex is to plainly use the constructor!
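
A minimal sketch of the constructor form, assuming the same date pattern as before. The names dateP2, dateP3 and yearOf are made up for illustration. The second value shows what the pattern looks like without triple quotes, where every backslash has to be doubled.

```scala
import scala.util.matching.Regex

object RegexConstructorDemo {
  // Same pattern as dateP1, built with the constructor instead of .r.
  val dateP2: Regex = new Regex("""(\d\d\d\d)-(\d\d)-(\d\d)""")

  // With an ordinary string literal each backslash must be escaped.
  val dateP3: Regex = new Regex("(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d)")

  // The constructed Regex works as an extractor exactly like one made with .r.
  def yearOf(s: String): Option[String] = s match {
    case dateP2(year, _, _) => Some(year)
    case _                  => None
  }

  def main(args: Array[String]): Unit = {
    println(dateP2.toString == dateP3.toString) // both hold the same pattern text
    println(yearOf("2011-07-15"))
  }
}
```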

ref:
- scala api document: Regex
- regular expression metacharacter syntax
- Java Regular Expression notes