String Operations

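What You Will Learn in This Chapter

Basic string operations and slicing
Key string methods (split, join, strip, replace, find)
String formatting with f-strings
How to use escape sequences
Common mistakes in string handling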

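Basics of Strings

A Python string (the str type) is a data type that represents a sequence of characters. Strings are immutable: once a string has been created, it cannot be modified in place.

s = "Hello, Python!"
print(len(s))    # 14 (number of characters)
print(s[0])      # H (first character)
print(s[-1])     # ! (last character)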

String Slicing

Strings can be sliced in the same way as lists.

s = "プログラミング"  # 7 characters

print(s[0:3])    # プログ (the first 3 characters)
print(s[3:])     # ラミング (from the 4th character to the end)
print(s[::-1])   # グンミラグロプ (reversed)
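
As a small sketch of how the start:stop:step indices used above behave, the following repeats the idea with an ASCII sample. The variable `word` and its value are hypothetical and chosen only for illustration.

```python
# Hypothetical sample string chosen for illustration.
word = "programming"

first_three = word[:3]       # indices 0, 1, 2
from_fourth = word[3:]       # index 3 through the end
every_other = word[::2]      # every second character
reversed_word = word[::-1]   # a negative step walks backwards
past_the_end = word[8:100]   # slices clamp to the length instead of raising IndexError

print(first_three)    # pro
print(from_fourth)    # gramming
print(every_other)    # pormig
print(reversed_word)  # gnimmargorp
print(past_the_end)   # ing
```

Unlike indexing with a single out-of-range number, an out-of-range slice simply returns whatever characters exist.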

String Concatenation and Repetition

# Concatenation (+)
first = "Hello"
second = "World"
greeting = first + ", " + second + "!"
print(greeting)  # Hello, World!

# Repetition (*)
line = "-" * 30
print(line)  # ------------------------------

Common mistakes

Strings are immutable and cannot be modified in place:

s = "Hello"

# Wrong: trying to change part of a string directly
# s[0] = "h"  # TypeError: 'str' object does not support item assignment

# Correct: build a new string and assign it
s = "h" + s[1:]  # creates a new string
print(s)  # hello

# Alternatively, use replace
s = "Hello"
s = s.replace("H", "h")
print(s)  # hello

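Trying to join a string and a number with +:

age = 20

# Wrong: a string and a number cannot be concatenated directly
# message = "Age: " + age + " years"  # TypeError

# Correct, option 1: convert with str()
message = "Age: " + str(age) + " years"

# Correct, option 2: use an f-string (recommended)
message = f"Age: {age} years"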

Example Session

>>> s = "Python"
>>> len(s)
6
>>> s[0]
'P'
>>> s[-1]
'n'
>>> s[0:3]
'Pyt'
>>> s[::-1]
'nohtyP'
>>> s + " 3.12"
'Python 3.12'
>>> "-" * 10
'----------'
>>> s[0] = "p"
Traceback (most recent call last):
  ...
TypeError: 'str' object does not support item assignment

Key String Methods

split: Splitting a String

split divides a string at the specified separator and returns a list.

# Split on whitespace (the default)
sentence = "Python is a fun language"
words = sentence.split()
print(words)  # ['Python', 'is', 'a', 'fun', 'language']

# Split on a specific separator
data = "太郎,20,横浜"
fields = data.split(",")
print(fields)  # ['太郎', '20', '横浜']

# Limit the number of splits
path = "usr/local/bin/python"
parts = path.split("/", 2)  # split at most 2 times
print(parts)  # ['usr', 'local', 'bin/python']

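Common mistakes

Forgetting that split() returns a list of strings:

data = "太郎,20,横浜"

# Wrong: treating an element of the split result as a number
result = data.split(",")
# print(result[1] + 1)  # TypeError: can only concatenate str (not "int") to str
# split returns a list, and every element is a string

# Correct: convert the type where needed
result = data.split(",")
age = int(result[1])     # convert the string "20" to the integer 20
print(age + 1)           # 21

When the separator is not found:

>>> "hello".split(",")
['hello']    # a one-element list is returned even without the separator (no error is raised)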
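
One more detail worth a quick sketch: split() with no argument and split(" ") with an explicit space are not equivalent. The sample string below is hypothetical and is only there to make the difference visible.

```python
text = "a  b "  # two spaces between a and b, one trailing space

by_whitespace = text.split()   # runs of whitespace act as one separator
by_space = text.split(" ")     # every single space is a separator

print(by_whitespace)  # ['a', 'b']
print(by_space)       # ['a', '', 'b', '']

# The same difference shows up with an empty string.
print("".split())     # []
print("".split(","))  # ['']
```

With an explicit separator, empty strings appear wherever two separators are adjacent, which matters when parsing input with irregular spacing.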

join: Joining a List into a String

join is the inverse of split. It concatenates the elements of a list, placing the specified separator between them.

words = ["Python", "is", "fun"]

# Join with spaces
sentence = " ".join(words)
print(sentence)  # Python is fun

# Join with commas
csv_line = ",".join(["太郎", "20", "横浜"])
print(csv_line)  # 太郎,20,横浜

# Join with newlines
lines = ["Line 1", "Line 2", "Line 3"]
text = "\n".join(lines)
print(text)
# Line 1
# Line 2
# Line 3

Common mistakes

Passing join an iterable that contains non-string elements:

# Wrong: the list contains numbers
numbers = [1, 2, 3]
# result = ",".join(numbers)  # TypeError: sequence item 0: expected str instance, int found

# Correct: convert every element to a string before joining
numbers = [1, 2, 3]
result = ",".join(str(n) for n in numbers)
print(result)  # 1,2,3

Calling join on the wrong object:

words = ["a", "b", "c"]

# Wrong: join is a string method, not a list method
# result = words.join(",")  # AttributeError: 'list' object has no attribute 'join'

# Correct: call join on the separator string
result = ",".join(words)  # "a,b,c"
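
As a hedged aside on the numeric join case above, map(str, ...) is an equivalent way to convert every element before joining. The list `values` is hypothetical sample data.

```python
values = [85, 72.5, None]  # hypothetical mixed-type data

# map(str, values) applies str() to each element lazily, just like the generator expression.
line = ",".join(map(str, values))
print(line)  # 85,72.5,None
```

Either form works; the generator expression is easier to extend when each element needs more than a plain str() call.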

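strip: Removing Leading and Trailing Whitespace

# Remove leading and trailing whitespace (spaces, tabs, newlines)
text = "  Hello, World!  \n"
print(text.strip())   # "Hello, World!"

# Remove from the left side only
print(text.lstrip())  # "Hello, World!  \n"

# Remove from the right side only
print(text.rstrip())  # "  Hello, World!"

# Remove specific characters
filename = "===report===.txt==="
print(filename.strip("="))  # "report===.txt"

strip when reading files

When a file is read line by line, each line normally keeps its trailing newline character \n. It is common to remove it with strip() or rstrip().

with open("data.txt", "r", encoding="utf-8") as f:
    for line in f:
        line = line.strip()  # remove leading/trailing whitespace and the newline
        print(line)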
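
The choice between strip() and rstrip() matters when leading whitespace is meaningful. The sketch below uses io.StringIO as a stand-in for an open file so that it runs without data.txt; that substitution and the sample content are assumptions made for illustration.

```python
import io

# io.StringIO stands in for a real text file (an assumption for this sketch).
fake_file = io.StringIO("def f():\n    return 1\n")

stripped = [line.strip() for line in fake_file]          # drops indentation too
fake_file.seek(0)                                        # rewind to read again
kept_indent = [line.rstrip("\n") for line in fake_file]  # drops only the newline

print(stripped)     # ['def f():', 'return 1']
print(kept_indent)  # ['def f():', '    return 1']
```

Passing "\n" to rstrip removes only trailing newlines, so indentation and other leading whitespace survive.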

replace: Replacing Substrings

text = "I like Java. Java is great."

# Replace "Java" with "Python"
new_text = text.replace("Java", "Python")
print(new_text)  # I like Python. Python is great.

# Limit the number of replacements (only the first occurrence)
new_text = text.replace("Java", "Python", 1)
print(new_text)  # I like Python. Java is great.

replace returns a new string

Because strings are immutable, replace does not modify the original string; it returns a new one. To use the result, assign it to a variable.

text = "hello"
text.replace("h", "H")  # the result is discarded!
print(text)              # hello (unchanged)

# Correct: assign the return value to a variable
text = text.replace("h", "H")
print(text)              # Hello

find / index: Searching in a String

text = "Hello, Python World!"

# find: returns the index where the substring starts, or -1 if it is not found
print(text.find("Python"))   # 7
print(text.find("Java"))     # -1

# index: same as find, but raises ValueError if the substring is not found
print(text.index("Python"))  # 7
# print(text.index("Java"))  # ValueError

# in operator: returns True/False depending on whether the substring is present
print("Python" in text)  # True
print("Java" in text)    # False

Choosing between find, index, and in

Method	When found	When not found
`find()`	index (integer)	`-1`
`index()`	index (integer)	raises `ValueError`
`in`	`True`	`False`

When you only need to know whether a substring is present, in is the most concise choice.
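
To illustrate why the -1 return value of find() needs an explicit check, here is a minimal sketch. The helper `text_after` is a hypothetical name introduced for this example, not part of the standard library.

```python
def text_after(text, marker):
    """Return the part of text that follows marker, or None if marker is absent."""
    pos = text.find(marker)
    if pos == -1:  # -1 means "not found"; it must not be used as an index
        return None
    return text[pos + len(marker):]


print(text_after("name=太郎", "="))   # 太郎
print(text_after("no marker", "="))  # None

# Pitfall: a match at index 0 is falsy, so compare against -1 (or use `in`).
print(bool("Hello".find("H")))       # False, even though "H" was found
```

Writing `if text.find(x):` therefore gives the wrong answer in two cases: a match at position 0 looks false, and a miss (-1) looks true.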

Other Useful Methods

text = "Hello, World!"

# Upper/lower case conversion
print(text.upper())       # HELLO, WORLD!
print(text.lower())       # hello, world!
print(text.capitalize())  # Hello, world!
print(text.title())       # Hello, World!

# Predicate methods (return True/False)
print("abc".isalpha())    # True (all letters?)
print("123".isdigit())    # True (all digits?)
print("abc123".isalnum()) # True (letters and digits only?)
print("  ".isspace())     # True (whitespace only?)

# Counting occurrences
print("banana".count("a"))  # 3

# Checking the start and end
print("hello.py".endswith(".py"))    # True
print("hello.py".startswith("he"))   # True

String Method Examples

>>> "  hello  ".strip()
'hello'
>>> "hello world".split()
['hello', 'world']
>>> ",".join(["a", "b", "c"])
'a,b,c'
>>> "hello".upper()
'HELLO'
>>> "HELLO".lower()
'hello'
>>> "hello".replace("l", "L")
'heLLo'
>>> "hello world".find("world")
6
>>> "hello world".find("python")
-1
>>> "hello".startswith("he")
True
>>> "hello".endswith("lo")
True
>>> "hello".count("l")
2

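f-strings (Formatted String Literals)

An f-string is syntax for embedding variables and expressions inside a string. Prefix the string with f and write an expression inside curly braces {}.

Basic Usage

name = "太郎"
age = 20

# Embed variables with an f-string
print(f"My name is {name}. I am {age} years old.")
# My name is 太郎. I am 20 years old.

# Expressions can be written directly
print(f"Next year I will be {age + 1}.")
# Next year I will be 21.

# Methods can be called too
print(f"Name (uppercase): {name.upper()}")
# Name (uppercase): 太郎 (upper() has no effect on characters without case, such as kanji)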

Format Specifiers

In an f-string, a format specifier can follow a colon :.

# Specify the number of decimal places
pi = 3.141592653589793
print(f"Pi: {pi:.2f}")     # Pi: 3.14
print(f"Pi: {pi:.4f}")     # Pi: 3.1416

# Specify a width and right-align
for i in range(1, 4):
    print(f"{i:3d} x 5 = {i * 5:3d}")
# Output:
#   1 x 5 =   5
#   2 x 5 =  10
#   3 x 5 =  15

# Percentage
ratio = 0.856
print(f"Accuracy: {ratio:.1%}")  # Accuracy: 85.6%

# Zero padding
num = 42
print(f"{num:05d}")  # 00042

# Thousands separator
big = 1234567890
print(f"{big:,}")  # 1,234,567,890

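Summary of f-string Format Specifiers

Specifier	Meaning	Example	Result
`:.2f`	2 decimal places	`f"{3.14159:.2f}"`	`3.14`
`:3d`	right-aligned, width 3	`f"{5:3d}"`	`'  5'`
`:05d`	zero-padded to width 5	`f"{42:05d}"`	`00042`
`:,`	thousands separator	`f"{1000000:,}"`	`1,000,000`
`:.1%`	percentage	`f"{0.856:.1%}"`	`85.6%`
`:<10`	left-aligned, width 10	`f"{'hi':<10}"`	`'hi        '`
`:>10`	right-aligned, width 10	`f"{'hi':>10}"`	`'        hi'`
`:^10`	centered, width 10	`f"{'hi':^10}"`	`'    hi    '`

Results shown in quotes include the padding spaces.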
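
Padding is hard to see in a rendered table, so the following sketch prints the alignment results with repr(), which keeps the surrounding quotes. It uses the built-in format() function, which accepts the same specifiers; the variable `width` is a hypothetical value chosen for illustration.

```python
# repr() shows the quotes, which makes the padding spaces visible.
for spec in ("<10", ">10", "^10"):
    print(spec, repr(format("hi", spec)))
# <10 'hi        '
# >10 '        hi'
# ^10 '    hi    '

# The width can also come from a variable by nesting braces.
width = 6  # hypothetical width chosen for illustration
padded = f"{'hi':>{width}}"
print(repr(padded))  # '    hi'
```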

f-strings are available in Python 3.6 and later

f-strings are highly readable and are generally preferred over string concatenation (+) and the format() method. This course uses f-strings throughout.

Escape Sequences

An escape sequence is a special character sequence that begins with a backslash \.

Escape sequence	Meaning
`\n`	newline
`\t`	tab
`\\`	a literal backslash
`\'`	single quote
`\"`	double quote

# Newline
print("Line 1\nLine 2\nLine 3")
# Line 1
# Line 2
# Line 3

# Tab
print("Name\tScore")
print("太郎\t85")
print("花子\t92")
# Name    Score
# 太郎    85
# 花子    92

# Strings containing quotes
print("He likes \"Python\".")  # He likes "Python".
print('It\'s a pen.')          # It's a pen.

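Raw Strings

Prefixing a string with r disables escape-sequence processing, so backslashes are kept as literal characters. This is handy for file paths and regular expressions.

# Normal string (\n is interpreted as a newline and \t as a tab)
print("C:\new_folder\test")
# C:
# ew_folder    est

# Raw string (escape sequences are not processed)
print(r"C:\new_folder\test")
# C:\new_folder\test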
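
One way to see what the r prefix changes is to compare string lengths. This is a minimal sketch; the variable names are arbitrary.

```python
normal = "a\nb"  # 3 characters: a, newline, b
raw = r"a\nb"    # 4 characters: a, backslash, n, b

print(len(normal))     # 3
print(len(raw))        # 4
print(raw == "a\\nb")  # True: the same text written with an escaped backslash
```

Note that a raw string literal cannot end with a single backslash, so a path with a trailing backslash still needs the ordinary "\\" form.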

Common Mistakes with Strings

Common mistakes

1. Strings are immutable:

s = "hello"
# s[0] = "H"  # TypeError! Strings cannot be modified in place

# Correct: build a new string (two alternatives)
t = "H" + s[1:]          # "Hello"
t = s.replace("h", "H")  # "Hello"

2. Forgetting that split() returns a list of strings:

line = "Alice,25,Tokyo"
parts = line.split(",")
# parts is ['Alice', '25', 'Tokyo']

# Wrong: parts[1] is the string "25", not a number
# age = parts[1] + 5  # TypeError

# Correct: convert it before using it as a number
age = int(parts[1]) + 5  # 30

3. String methods return a new string (the original is unchanged):

text = "hello"

# Wrong: the return value of the method is discarded
text.upper()
print(text)  # hello (unchanged!)

# Correct: assign the return value to a variable
text = text.upper()
print(text)  # HELLO

4. Indexing out of range:

s = "abc"
# print(s[3])  # IndexError: string index out of range
# The valid indices of s are 0, 1, 2 (or -1, -2, -3)

# A safe way to access an index
if len(s) > 3:
    print(s[3])

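5. Comparing strings with is:

a = "hello"
b = "hello"

# Wrong: is checks whether both names refer to the same object, not whether the values are equal
# It may happen to be True, but that is not guaranteed
print(a is b)   # may be True, but cannot be relied on

# Correct: use == to compare values
print(a == b)   # True (always compares the values)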
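
The difference is easier to observe when one of the strings is built at runtime. The sketch below is an illustration only: whether two equal strings share one object is an implementation detail, so the result of is here is what CPython typically does, not a guarantee.

```python
a = "hello"
b = "".join(["hel", "lo"])  # built at runtime; equal in value to a

print(a == b)  # True: the values match
print(a is b)  # typically False in CPython, but this is an implementation detail
```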

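6. Encoding problems:

# When reading or writing files that contain non-ASCII text such as Japanese
# Wrong: no encoding specified
# with open("data.txt", "r") as f:  # may produce garbled text depending on the environment

# Correct: specify the encoding explicitly
with open("data.txt", "r", encoding="utf-8") as f:
    text = f.read()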
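
As a runnable sketch of the same idea, the following writes and reads a temporary file with an explicit encoding. The temporary directory, the file name, and the sample text are assumptions made so the example does not depend on an existing data.txt.

```python
import os
import tempfile

sample = "日本語 text"  # hypothetical sample content

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")
    # Use the same explicit encoding for writing and reading.
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)
    with open(path, "r", encoding="utf-8") as f:
        round_trip = f.read()
    with open(path, "rb") as f:
        raw_bytes = f.read()

print(round_trip == sample)         # True
print(len(sample), len(raw_bytes))  # 8 14
```

The character count and the byte count differ because each of the three Japanese characters takes three bytes in UTF-8.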

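Practical Example: Combining String Operations

Parsing CSV Data

# Parse data in CSV format
# (columns: name, math, English, physics; the printed 平均 column is the average)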
csv_data = """名前,数学,英語,物理
太郎,85,72,90
花子,92,88,76
次郎,78,65,82"""

lines = csv_data.strip().split("\n")
header = lines[0].split(",")

print(f"{'名前':>4} {'数学':>4} {'英語':>4} {'物理':>4} {'平均':>6}")
print("-" * 30)

for line in lines[1:]:
    fields = line.split(",")
    name = fields[0]
    scores = [int(s) for s in fields[1:]]
    avg = sum(scores) / len(scores)
    print(f"{name:>4} {scores[0]:>4} {scores[1]:>4} {scores[2]:>4} {avg:>6.1f}")

Output:

  名前   数学   英語   物理     平均
------------------------------
  太郎   85   72   90   82.3
  花子   92   88   76   85.3
  次郎   78   65   82   75.0

Examples: Common Combinations of String Operations

>>> # Strip surrounding whitespace from user input and normalize to lowercase
>>> user_input = "  Python  "
>>> cleaned = user_input.strip().lower()
>>> cleaned
'python'

>>> # Count the words in a string
>>> sentence = "Python is a very fun language"
>>> word_count = len(sentence.split())
>>> word_count
6

>>> # Get the file extension
>>> filename = "report_2024.pdf"
>>> ext = filename.split(".")[-1]
>>> ext
'pdf'

>>> # Reverse a string to check whether it is a palindrome
>>> word = "level"
>>> word == word[::-1]
True
>>> word = "hello"
>>> word == word[::-1]
False
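
The file-extension example above relies on the name containing a dot. The sketch below compares it with os.path.splitext from the standard library; the file names are hypothetical samples.

```python
import os.path

results = {}
for name in ("report_2024.pdf", "archive.tar.gz", "README"):
    by_split = name.split(".")[-1]           # whole name when there is no dot
    by_splitext = os.path.splitext(name)[1]  # "" when there is no extension
    results[name] = (by_split, by_splitext)
    print(name, repr(by_split), repr(by_splitext))
# report_2024.pdf 'pdf' '.pdf'
# archive.tar.gz 'gz' '.gz'
# README 'README' ''
```

For a name without a dot, split(".")[-1] returns the entire name, while splitext returns an empty string, which is usually easier to test for.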

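Summary

Strings are immutable, and string methods return new strings
split() splits a string into a list, and join() joins a list into a string
strip() removes leading and trailing whitespace, and replace() replaces substrings
find() and the in operator search within a string
f-strings (f"...{variable}...") make embedding variables in strings concise
f-string format specifiers (:.2f, :3d, :, and so on) control how values are displayed
Escape sequences (\n, \t, and so on) represent special characters such as newlines and tabs
Raw strings, written r"...", disable escape-sequence processing
Remember to assign the return value of a string method to a variable (the original string is unchanged)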