TidyTuesday Week 46: Sherlock Holmes

This week we’re exploring the complete line-by-line text of the Sherlock Holmes stories and novels, made available through the {sherlock} R package by Emil Hvitfeldt. The dataset includes the full collection of Holmes texts, organized by book and line number, and is ideal for stylometry, sentiment analysis, and literary exploration.

Categories: TidyTuesday, Data Visualization, Python Programming, 2025

Author: Peter Gray

Published: November 18, 2025

Chart: Word cloud of Sherlock Holmes words

1. Python code

import re

import matplotlib.pyplot as plt
import pandas as pd
from wordcloud import STOPWORDS, WordCloud

# Load data
holmes = pd.read_csv(
    "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-11-18/holmes.csv"
)

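# Drop rows with no text (blank lines in the source are read as NaN)
holmes = holmes.dropna(subset=["text"])

# Combine every line into one long string
text = " ".join(holmes["text"].astype(str).tolist())

# Lowercase, then replace everything except letters, apostrophes, and whitespace
# with a space, so words joined by dashes or punctuation are not fused together
text = text.lower()
text = re.sub(r"[^a-z'\s]", " ", text)

# Remove common English stop words; apostrophes are kept above so that
# contractions such as "don't" still match the stop-word list
stopwords = set(STOPWORDS)
text = " ".join(word for word in text.split() if word not in stopwords)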


wordcloud = WordCloud(width=800, height=400, background_color="white").generate(text)

plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.title("The Complete Sherlock Holmes Word Cloud")
plt.show()
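
A word cloud shows which words stand out but not how often each one occurs. As a minimal sketch of the same cleaning steps with the counts made explicit, the helper `top_words` below (a hypothetical name, not part of the original post) lowercases each line, keeps only runs of letters, drops a stop-word set, and tallies the rest with `collections.Counter`. The sample lines and the small stop-word set are illustrative assumptions so the snippet runs without downloading anything.

```python
import re
from collections import Counter


def top_words(lines, stopwords, n=10):
    """Return the n most common non-stop-words across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # Lowercase, then extract runs of letters (punctuation and digits are skipped)
        tokens = re.findall(r"[a-z]+", str(line).lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts.most_common(n)


# Illustrative sample and stop-word set (assumptions, not the real dataset)
sample_lines = [
    "To Sherlock Holmes she is always the woman.",
    "Holmes rubbed his hands and smiled.",
]
sample_stopwords = {"to", "she", "is", "the", "his", "and", "always"}

result = top_words(sample_lines, sample_stopwords, n=3)
print(result)
```

With the real data, one would presumably pass `holmes["text"].dropna()` and `wordcloud.STOPWORDS` instead of the sample inputs, and the resulting pairs could feed a bar chart alongside the cloud.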