TidyTuesday Week 46: Sherlock Holmes

This week we’re exploring the complete line-by-line text of the Sherlock Holmes stories and novels, made available through the {sherlock} R package by Emil Hvitfeldt. The dataset includes the full collection of Holmes texts, organized by book and line number, and is ideal for stylometry, sentiment analysis, and literary exploration.
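Each row holds one line of text tagged with its book, so stylometric work usually starts by regrouping lines per book. The minimal sketch below assumes rows shaped like `(book, line_num, text)`. The toy rows and the "Book A"/"Book B" names are hypothetical and are not taken from holmes.csv. It computes a type-token ratio (distinct words divided by total words), one simple measure of vocabulary richness.

```python
import re
from collections import defaultdict


def words_by_book(rows):
    """Group line-level rows into one lowercase word list per book."""
    books = defaultdict(list)
    # Sort by (book, line number) so each book's lines are read in order
    for book, _line_num, text in sorted(rows, key=lambda r: (r[0], r[1])):
        if text:  # skip blank or missing lines
            books[book].extend(re.findall(r"[a-z']+", text.lower()))
    return dict(books)


def type_token_ratio(words):
    """Distinct words / total words; returns 0.0 for an empty list."""
    return len(set(words)) / len(words) if words else 0.0


# Hypothetical toy rows shaped like (book, line_num, text)
rows = [
    ("Book A", 2, "a cab waited by the door"),
    ("Book A", 1, "the fog rolled over the river"),
    ("Book B", 1, "a letter lay on the table"),
    ("Book B", 2, None),  # blank lines arrive as missing values
]

books = words_by_book(rows)
for book, words in books.items():
    print(book, len(words), round(type_token_ratio(words), 2))
```

Type-token ratios fall as texts get longer, so comparisons are fairer between samples of similar length (for example, the first N words of each book).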

TidyTuesday

Data Visualization

Python Programming

2025

Author

Peter Gray

Published

November 18, 2025

Word cloud of the most common words in the Sherlock Holmes texts

1. Python code


import re

import matplotlib.pyplot as plt
import pandas as pd
from wordcloud import STOPWORDS, WordCloud

# Load data
holmes = pd.read_csv(
    "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-11-18/holmes.csv"
)

# Drop missing lines (pandas already parses empty cells and "NaN" strings as NaN)
holmes = holmes.dropna(subset=["text"])

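# Combine every line into one lowercase string, normalising curly apostrophes
text = " ".join(holmes["text"].astype(str)).lower().replace("\u2019", "'")

# Replace everything except letters and apostrophes with spaces, so dashes
# split words and contractions such as "don't" still match STOPWORDS
text = re.sub(r"[^a-z'\s]", " ", text)

# Remove common English stopwords, trimming apostrophes used as quotation marks
stopwords = set(STOPWORDS)
words = (word.strip("'") for word in text.split())
text = " ".join(word for word in words if word and word not in stopwords)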


# Render the word cloud (WordCloud also applies its default STOPWORDS list)
wordcloud = WordCloud(width=800, height=400, background_color="white").generate(text)

plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.title("The Complete Sherlock Holmes Word Cloud")
plt.show()
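In the word cloud, larger words are the ones that appear more often after cleaning, so it can help to inspect those counts directly. This standard-library-only sketch applies the same cleaning rules as the code above to a few made-up lines and tallies the result with `collections.Counter`. The small stopword set is a hypothetical stand-in for `wordcloud.STOPWORDS`, and `clean_tokens` is an illustrative helper, not part of any library.

```python
import re
from collections import Counter


def clean_tokens(text, stopwords):
    """Lowercase, keep only letters and apostrophes, and drop stopwords."""
    text = re.sub(r"[^a-z'\s]", " ", text.lower())
    # Strip apostrophes that act as quotation marks around words
    words = (word.strip("'") for word in text.split())
    return [word for word in words if word and word not in stopwords]


# Hypothetical stopword set; the post itself uses wordcloud.STOPWORDS
stopwords = {"the", "a", "of", "and", "i", "don't", "it", "is"}

# Made-up sample lines, not quotations from the dataset
lines = [
    "I don't think, Watson, that the case is closed.",
    "Holmes lit his pipe--the case was far from simple.",
    "'Watson,' said Holmes, 'the case is simple.'",
]

counts = Counter(clean_tokens(" ".join(lines), stopwords))
print(counts.most_common(3))
```

Because a `Counter` is a mapping of words to frequencies, it could also be passed to `WordCloud.generate_from_frequencies` to draw the cloud from precomputed counts instead of raw text.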