This week we’re exploring the complete line-by-line text of the Sherlock Holmes stories and novels, made available through the {sherlock} R package by Emil Hvitfeldt. The dataset includes the full collection of Holmes texts, organized by book and line number, and is ideal for stylometry, sentiment analysis, and literary exploration.
TidyTuesday
Data Visualization
Python Programming
2025
Author
Peter Gray
Published
November 18, 2025
Word cloud of the most frequent words in the Sherlock Holmes texts
1. Python code
import re

import matplotlib.pyplot as plt
import pandas as pd
from wordcloud import STOPWORDS, WordCloud

# Load data
holmes = pd.read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-11-18/holmes.csv")

# Drop missing lines, including any stored as the literal string "NaN"
holmes = holmes[(holmes["text"].notna()) & (holmes["text"] != "NaN")]

# Join all lines into one string, keep only letters and whitespace, lowercase
text = " ".join(holmes["text"].astype(str).tolist())
text = re.sub(r"[^A-Za-z\s]", "", text)
text = text.lower()

# Remove common English stop words
stopwords = set(STOPWORDS)
text = " ".join(word for word in text.split() if word not in stopwords)

# Build the word cloud and draw it
wordcloud = WordCloud(width=800, height=400, background_color="white").generate(text)

plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.title("The Complete Sherlock Holmes Word Cloud")
plt.show()
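The cleaning steps above (strip non-letters, lowercase, drop stop words) can be sanity-checked on a short sample before running them on the full corpus. This is a minimal sketch, not part of the original analysis: `top_words` is a hypothetical helper, and `SAMPLE_STOPWORDS` is a tiny hand-picked set standing in for `wordcloud.STOPWORDS` so the snippet runs with the standard library alone.

```python
import re
from collections import Counter

# Tiny illustrative stop word set (an assumption for this sketch);
# the post itself uses wordcloud.STOPWORDS.
SAMPLE_STOPWORDS = {"the", "a", "is", "of", "to", "and", "it"}


def top_words(text, stopwords=SAMPLE_STOPWORDS, n=3):
    """Return the n most frequent non-stop words (hypothetical helper)."""
    # Same cleaning as the word cloud code: keep letters and whitespace, then lowercase.
    cleaned = re.sub(r"[^A-Za-z\s]", "", text).lower()
    words = [w for w in cleaned.split() if w not in stopwords]
    return Counter(words).most_common(n)


sample = "The game is afoot, Watson! The game, Watson, is the thing."
print(top_words(sample))
```

Word size in the cloud follows these counts, so a quick frequency table like this is a handy way to confirm that the words drawn largest really are the most common ones after cleaning.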