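Regular expressions (regex or regexp for short) are a powerful tool for searching and matching strings. They are commonly used to find, replace, and split strings, and to match specific patterns. In Python, regular expressions are provided by the re module, which must be imported first (import re). The examples below illustrate some basic rules and metacharacters.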
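As a rough sketch of the operations mentioned above (searching, replacing, and splitting), the snippet below uses re.search, re.findall, re.sub, and re.split from Python's standard re module. The sample string and the date-like pattern are made up for illustration and are not taken from the original text.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "2023-01-15, 2024-02-29; 2025-12-31"
date_pattern = r'\d{4}-\d{2}-\d{2}'

# Search: return the first substring that matches the pattern.
first = re.search(date_pattern, text)
print(first.group())  # 2023-01-15

# Find all: return every non-overlapping match as a list of strings.
all_dates = re.findall(date_pattern, text)
print(all_dates)  # ['2023-01-15', '2024-02-29', '2025-12-31']

# Replace: substitute every digit with '#'.
masked = re.sub(r'\d', '#', text)
print(masked)  # ####-##-##, ####-##-##; ####-##-##

# Split: cut the string at each comma or semicolon plus any following spaces.
parts = re.split(r'[,;]\s*', text)
print(parts)  # ['2023-01-15', '2024-02-29', '2025-12-31']
```

Note that re.search returns None when nothing matches, so real code would normally check the result before calling group().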
Examples:

    import re

    # . matches any single character except a newline
    pattern_dot = re.compile(r'he..o')
    match_dot = pattern_dot.search("hello")
    print(match_dot.group())  # Output: hello

    # ^ anchors the match at the start of the string
    pattern_start = re.compile(r'^hello')
    match_start = pattern_start.search("hello world")
    print(match_start.group())  # Output: hello

    # $ anchors the match at the end of the string
    pattern_end = re.compile(r'world$')
    match_end = pattern_end.search("hello world")
    print(match_end.group())  # Output: world

    # * matches zero or more repetitions of the preceding element
    pattern_star = re.compile(r'go*al')
    match_star = pattern_star.search("goooooal")
    print(match_star.group())  # Output: goooooal

    # + matches one or more repetitions of the preceding element
    pattern_plus = re.compile(r'go+al')
    match_plus = pattern_plus.search("goooooal")
    print(match_plus.group())  # Output: goooooal

    # ? makes the preceding element optional (zero or one occurrence)
    pattern_question = re.compile(r'colou?r')
    match_question = pattern_question.search("color")
    print(match_question.group())  # Output: color

    # {n} matches exactly n repetitions
    pattern_exact = re.compile(r'\d{3}')
    match_exact = pattern_exact.search("123456")
    print(match_exact.group())  # Output: 123

    # {m,n} matches between m and n repetitions, as many as possible
    pattern_range = re.compile(r'\d{2,4}')
    match_range = pattern_range.search("12345")
    print(match_range.group())  # Output: 1234

    # [...] matches any one character in the set
    pattern_char_set = re.compile(r'[aeiou]')
    match_char_set = pattern_char_set.search("hello")
    print(match_char_set.group())  # Output: e

    # [^...] matches any one character NOT in the set
    pattern_not_char_set = re.compile(r'[^aeiou]')
    match_not_char_set = pattern_not_char_set.search("hello")
    print(match_not_char_set.group())  # Output: h

    # | matches either the expression on its left or the one on its right
    pattern_or = re.compile(r'cat|dog')
    match_or = pattern_or.search("I have a cat")
    print(match_or.group())  # Output: cat

    # (...) captures the enclosed part of the match as a group
    pattern_group = re.compile(r'(\d{2})-(\d{2})-(\d{4})')
    match_group = pattern_group.search("01-15-2023")
    print(match_group.groups())  # Output: ('01', '15', '2023')

    # \ escapes a metacharacter so that it is matched literally
    pattern_escape = re.compile(r'\d\.\d')
    match_escape = pattern_escape.search("3.14")
    print(match_escape.group())  # Output: 3.1

Non-Greedy Matching:
Quantifiers in regular expressions are greedy by default: they match as many characters as possible. When the shortest possible match is wanted instead, append the ? modifier to the quantifier (for example *? or +?) to make it non-greedy.

Example:

A greedy expression:

    import re

    pattern_greedy = re.compile(r'<.*>')
    match_greedy = pattern_greedy.search("<tag1>content1</tag1><tag2>content2</tag2>")
    print(match_greedy.group())  # Output: <tag1>content1</tag1><tag2>content2</tag2> (the whole string)

A non-greedy expression:

    import re

    pattern_non_greedy = re.compile(r'<.*?>')
    match_non_greedy = pattern_non_greedy.search("<tag1>content1</tag1><tag2>content2</tag2>")
    print(match_non_greedy.group())  # Output: <tag1> (only the first tag)

Backreference:
A backreference reuses text matched earlier in the same pattern for further matching. A part of the pattern enclosed in parentheses () forms a group, and later parts of the expression can refer to that group: \1 refers to the first group, \2 to the second, and so on.

Example:

    import re

    # \1 must match exactly the same text that group 1 captured
    pattern_backreference = re.compile(r'(\w+) is \1')
    match_backreference = pattern_backreference.search("word is word, python is python, regex is regex")
    print(match_backreference.group())  # Output: word is word
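The two ideas above, non-greedy matching and backreferences, can also be used with re.findall and re.sub. The following is a minimal sketch: the helper names extract_tags and collapse_repeats are hypothetical and chosen only for this illustration, and the second sample sentence is made up.

```python
import re

def extract_tags(markup):
    """Return every <...> tag, using a non-greedy quantifier so each
    match stops at the first closing '>'. (Hypothetical helper.)"""
    return re.findall(r'<.*?>', markup)

def collapse_repeats(text):
    """Collapse runs of an immediately repeated word into one copy.
    \\1 in the pattern is a backreference to the captured word, and
    \\1 in the replacement inserts that word once. (Hypothetical helper.)"""
    return re.sub(r'\b(\w+)(?:\s+\1\b)+', r'\1', text)

print(extract_tags("<tag1>content1</tag1><tag2>content2</tag2>"))
# ['<tag1>', '</tag1>', '<tag2>', '</tag2>']

print(collapse_repeats("this is is a test test test"))
# this is a test
```

Patterns like <.*?> are fine for short demonstrations, but a regex is generally not a robust way to parse real HTML or XML; a dedicated parser is usually the safer choice there.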