The History of Hashtags: From Roman Abbreviations to AI Network Analysis

On August 23, 2007, product consultant Chris Messina typed a single message on Twitter: “How do you feel about using # (pound) for groups. As in #barcamp [msg]?” That short question changed how people label, search, and organize public conversation online. By 2026, social platforms collectively host over 500 million hashtags each month, according to a 2026 WorldMetrics report on hashtag statistics. The symbol Messina borrowed from Internet Relay Chat had already survived a long technical journey: from Roman scribes abbreviating weights, through typewriter keyboards, telephone keypads, and ASCII, to social media and machine analysis of cultural patterns.
Roman Origins: From Libra Pondo to Typographic Ligature
The # symbol traces its lineage to the Roman term libra pondo, meaning “pound weight.” The abbreviation lb was often written as a joined mark rather than two isolated letters. Over time, scribes used the ligature ℔, combining L and b with a horizontal stroke across the top. That stroke told the reader the written form was an abbreviation and should be expanded mentally.
The Wikipedia entry on the number sign describes the long shift from the ℔ ligature toward the modern cross-hatched form. The line across the abbreviation became visually entangled with the slanted strokes beneath it. In handwriting, repeated use simplified the mark. The result was a sign that no longer looked like the letters L and b, even though its origin remained tied to pound weight.
By the time Joseph Moxon published Mechanick Exercises in 1683, the #-shaped mark was already used in printing practice as a correction mark. Printers placed it in margins to signal that space should be inserted between words. That use moved the symbol from private shorthand into a documented typographic workflow.

Bookkeeping gave the mark another layer of meaning. Nineteenth-century manuals used it as a number sign, and accounting practice also kept its weight-related meaning. Written before a figure, # indicated number, as in “#2 pencil.” Written after a figure, it could indicate pounds. That dual behavior explains why North American phone prompts still say “press pound” while Unicode calls U+0023 “number sign.”
Mechanical Adoption: Typewriters, Telephones, and ASCII
The move from handwriting to machines changed the symbol’s future. Once a mark earns a place on a keyboard, it becomes cheap to reproduce. The Remington Standard typewriter around 1886 helped bring the sign into mechanical use. The Blickensderfer model 5 typewriter, circa 1896, referred to it as “number mark” in its instruction manual.
Early twentieth-century American sources called it “number sign.” A 1903 shorthand textbook used “pound or number sign” and explained its positional meanings. The phrase “pound sign” entered U.S. usage by 1932, based on historical dictionary citations summarized in the number sign reference. The most important mechanical adoption came through teleprinters and character sets. When the symbol entered early teleprinter codes and then ASCII, it became available across computer systems instead of remaining a local office convention.
Bell Telephone Laboratories placed the sign on the bottom-right button of touch-tone telephone keypads in 1968. The key became more familiar in the early 1980s as voicemail and PBX systems used it as a command key. The name “octothorp” also emerged at Bell Labs, where engineers needed a formal term for documentation. One account links “octo” to the symbol’s eight free stroke ends and “thorp” to athlete Jim Thorpe. The first appearance of “octothorp” in a U.S. patent came in 1973.
| Stage | Approximate date | Main function | Why it mattered | Source |
|---|---|---|---|---|
| Libra pondo ligature | Roman and later manuscript use | Abbreviation for pound weight | Created visual ancestor of later # sign | Number sign reference |
| Moxon’s printing mark | 1683 | Printer correction mark for spacing | Moved form into documented print practice | Number sign reference |
| Remington Standard typewriter | Around 1886 | Mechanical keyboard character | Made mark easy to type in office work | Number sign reference |
| Bell touch-tone keypad | 1968 | Telephone command key | Put sign in front of millions of non-technical users | Number sign reference |
| IRC channels | Around 1988 | Network-wide channel prefix | Created direct convention later reused by Twitter users | Hashtag reference |
| Twitter hashtag proposal | August 23, 2007 | Public topic grouping | Turned mark into user-created metadata layer | Hashtag reference |
Digital Tagging: IRC Channels and the Birth of the Hashtag
Before social platforms turned # into a cultural signal, computing had already assigned it several technical jobs. The C programming language used # for preprocessor directives. PDP-11 assembly language used it to denote immediate addressing mode. Those uses did not create the social tag, but they kept the symbol visible to programmers and early network users.
The direct ancestor of the social hashtag was Internet Relay Chat. Around 1988, IRC used # to prefix channels and topics available across the network. Local channels used ampersand (&) instead. The convention was simple: put a visible marker before a shared topic, and people can gather around it without waiting for a central editor to create a category.
Chris Messina was an open-source advocate and IRC user. On August 23, 2007, he suggested using # on Twitter for groups. Twitter did not immediately build the feature as a formal product. Users adopted it first. The hashtag reference notes that Messina did not try to patent the idea and described hashtags as being “born of the internet, and owned by no one.”
Stowe Boyd used the term “hash tag” in a blog post on August 26, 2007, three days after Messina’s original message. The convention spread during the October 2007 San Diego forest fires, when users tagged posts with #sandiegofire to coordinate information. The tag format gained wider international attention during the 2009-2010 Iranian election protests, where English and Persian tags helped users share information across borders.
Twitter began hyperlinking hashtags to search results on July 2, 2009. In 2010, the platform introduced Trending Topics, which surfaced tags gaining rapid attention. By June 2014, the word “hashtag” had entered the Oxford English Dictionary.

Global Names: Pound, Hash, Octothorp, Sharp, and Square
The Unicode Consortium designates U+0023 as “number sign,” but daily usage depends on region, industry, and device context. In North America, many people still call it the pound sign because of telephone prompts and weight notation. In the UK, Australia, and South Africa, “hash” is more common. In social media contexts, many users call the symbol itself a hashtag, even though hashtag strictly means the symbol plus the tag string.
| Name | Common region or context | Typical use | Source |
|---|---|---|---|
| Number sign | Canada and northeastern United States | Official Unicode name and numeric labeling | Number sign reference |
| Pound sign | United States and Canada | Telephone keypad prompts and weight notation | Number sign reference |
| Hash | United Kingdom, Australia, and South Africa | General speech and programming compounds such as hash bang | Number sign reference |
| Hashtag | Social media | Metadata tag prefix and searchable public topic marker | Hashtag reference |
| Hex | Singapore and Malaysia | Telephone menus and apartment addressing | Number sign reference |
| Octothorp | Bell Labs and technical documentation | Formal engineering name for keypad symbol | Number sign reference |
| Sharp | Music and programming | Resemblance to musical sharp sign and pronunciation of C# | Number sign reference |
| Square | ITU-T E.161 telephone keypad terminology | Formal keypad naming in standard | Number sign reference |
“Sharp” comes from resemblance, not identity. The musical sharp sign is U+266F, while # is U+0023. The two signs are visually close enough that C# is pronounced “C Sharp,” but the ECMA-334 specification writes the language name using the number sign after the letter C. That distinction matters in fonts, identifiers, search, and accessibility tooling.
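The difference is easy to check in code. The sketch below uses Python's standard `unicodedata` module to print both code points. The `csharp_search_key` helper is a hypothetical normalizer, not part of any standard, shown only as one way a search index might choose to treat the two signs as equivalent.

```python
import unicodedata

NUMBER_SIGN = "\u0023"  # the keyboard character '#'
MUSIC_SHARP = "\u266f"  # the musical sharp sign


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def csharp_search_key(text: str) -> str:
    """Hypothetical normalizer: fold the music sharp into '#' before matching."""
    return text.replace(MUSIC_SHARP, NUMBER_SIGN).casefold()


print(describe(NUMBER_SIGN))
print(describe(MUSIC_SHARP))
# The two spellings only compare equal after explicit normalization.
print("C\u266f" == "C#", csharp_search_key("C\u266f") == csharp_search_key("C#"))
```

Without a step like this, a search for "C#" will not find text that was typed with the musical sign, and the reverse also holds.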
AI Analysis: How Hashtag Networks Become Cultural Data
By 2026, the # symbol has moved far beyond weight notation and social categorization. According to the 2026 WorldMetrics report, social media platforms collectively host over 500 million hashtags each month. The same report states that Instagram posts with hashtags receive 2.3 times more likes than posts without them, that TikTok posts with 10 to 15 hashtags have a three times higher chance of trending, and that hashtag campaigns deliver a 19% higher return on investment than campaigns without them. These numbers should be treated as platform-dependent operating signals, not universal laws.
The deeper change in 2026 is analytical. Researchers now treat hashtags as observable traces of public attention. A tag can be a topic label, campaign slogan, joke, product category, location marker, or group identity signal. When tags appear together repeatedly, they form networks. Those networks can be measured over time to see which associations remain stable and which ones appear briefly during news cycles.
Complexity Digest reported on April 29, 2026 that researchers studying temporal hashtag networks used ensemble clustering to distinguish stable cultural modules from transient ones in social photo-sharing data. The article, “Exploring Cultural Evolution Through Modular Dynamics in Temporal Hashtag Networks,” describes work using four years of data from major photo-sharing platforms. The key idea is practical: when the same clusters keep reappearing across time windows and perturbed network samples, they are less likely to be statistical noise.
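The reappearance test can be sketched in a few lines. The code below is not the ensemble clustering method from the study. It is a simplified stand-in that assumes clusters have already been computed for each time window, and it calls a cluster stable when a similar cluster (by Jaccard overlap) shows up in most windows. The function names, the `threshold` and `min_fraction` values, and the sample tags are all illustrative assumptions.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two tag sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def stable_clusters(
    windows: list[list[set[str]]],
    threshold: float = 0.5,
    min_fraction: float = 0.75,
) -> list[set[str]]:
    """
    Return clusters from the first window that reappear in later windows.

    windows holds one list of clusters per time window. A first-window cluster
    is kept when at least min_fraction of all windows contain a cluster whose
    Jaccard similarity to it is >= threshold.
    """
    if not windows:
        return []
    stable = []
    for candidate in windows[0]:
        hits = sum(
            1
            for clusters in windows
            if any(jaccard(candidate, c) >= threshold for c in clusters)
        )
        if hits / len(windows) >= min_fraction:
            stable.append(candidate)
    return stable


# Made-up clusters for four consecutive windows.
windows = [
    [{"climatecrisis", "climateaction", "cop"}, {"mondaymotivation", "coffee"}],
    [{"climatecrisis", "climateaction"}, {"worldcup", "football"}],
    [{"climatecrisis", "climateaction", "cop", "netzero"}, {"coffee", "latteart"}],
    [{"climatecrisis", "climateaction", "cop"}],
]
print(stable_clusters(windows))
```

In this toy data the climate cluster matches in all four windows and is kept, while the coffee cluster matches only in the first and is dropped. A fuller version would also repeat the check on perturbed samples of each network, as the article describes.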
For engineers, the intuition is similar to monitoring service dependencies. A one-time spike in traffic between two services may be an incident, test, or bot. A repeated pattern across weeks is more likely to describe a real dependency. Hashtag analysis applies the same logic to culture. A stable cluster around #MeToo, #BlackLivesMatter, or #ClimateCrisis means the tag is functioning as more than a search term. It is helping route attention, identity, and action.
The WorldMetrics report states that #MeToo generated over 17 million tweets across 85 countries, #BlackLivesMatter has been credited with influencing policy changes in 12 countries, and #ClimateCrisis appears in 70% of global climate advocacy content. Those examples show why hashtag networks matter to researchers and communications teams. Large tags can become durable anchors. Short-lived tags can still matter during emergencies, protests, product launches, or misinformation events.
Machine analysis helps because humans cannot manually inspect hundreds of millions of monthly tags at useful speed. The trade-off is context. A model can detect co-occurrence and cluster stability, but it can miss irony, reclaimed language, coordinated manipulation, or local meaning. A cultural analyst still needs sample-level review, language knowledge, and platform context before turning a graph into a decision.
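Sample-level review can be built into the workflow with very little code. The sketch below assumes posts have already been grouped by cluster and draws a small, reproducible sample from each group for a human to read. The `review_sample` name, the `per_cluster` and `seed` parameters, and the sample posts are hypothetical.

```python
import random


def review_sample(
    posts_by_cluster: dict[str, list[str]],
    per_cluster: int = 3,
    seed: int = 0,
) -> dict[str, list[str]]:
    """Pick up to per_cluster posts from each cluster for manual review."""
    rng = random.Random(seed)  # fixed seed so reviewers can reproduce the draw
    sample = {}
    for cluster_id, posts in sorted(posts_by_cluster.items()):
        k = min(per_cluster, len(posts))
        sample[cluster_id] = rng.sample(posts, k)
    return sample


posts_by_cluster = {
    "climate": ["post a", "post b", "post c", "post d", "post e"],
    "coffee": ["post x", "post y"],
}
for cluster_id, posts in review_sample(posts_by_cluster).items():
    print(cluster_id, posts)
```

Small clusters are returned in full, so the reviewer sees every post when fewer than `per_cluster` exist.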
Production Code: Building a Hashtag Co-occurrence Graph
A production team analyzing hashtags usually starts with a simple pipeline: collect public posts under platform rules, normalize tags, build time-windowed co-occurrence edges, and export the graph for clustering. The example below uses only Python’s standard library so it can run in a locked-down environment without adding dependencies. It reads a CSV of posts, extracts hashtags, creates weighted edges between tags that appear in the same post, and writes an edge list suitable for downstream graph analysis.
Note: The following code is an illustrative example and has not been verified against official documentation. Please refer to the official docs for production-ready code.

    #!/usr/bin/env python3
    """
    Build a hashtag co-occurrence edge list from exported social posts.

    Input CSV columns expected:
        post_id, created_at, platform, text
    Output CSV columns:
        window_start, tag_a, tag_b, weight

    Production notes:
    - Add platform-specific API compliance checks before collection.
    - Add language detection if you compare tags across regions.
    - Add bot and spam filtering before graph construction.
    - Add retention limits for personal data and raw post text.
    """
    import csv
    import re
    import sys
    from collections import Counter
    from datetime import datetime, timezone
    from itertools import combinations
    from pathlib import Path

    # A '#' that is not preceded by a word character or another '#', followed
    # by word characters. This skips "##" runs and fragments such as "page#top".
    HASHTAG_RE = re.compile(r"(?<![\w#])#(\w+)")


    def normalize_tag(tag: str) -> str:
        """Lowercase a tag and strip any stray '#' characters."""
        return tag.lower().strip("#")


    def parse_iso_timestamp(ts: str) -> datetime:
        """Parse an ISO 8601 string into a timezone-aware UTC datetime."""
        # fromisoformat() only accepts a trailing "Z" from Python 3.11 onward.
        parsed = datetime.fromisoformat(ts.strip().replace("Z", "+00:00"))
        if parsed.tzinfo is None:
            # Assumption: timestamps without an offset are already in UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)


    def window_start_for(ts: datetime, window_hours: int) -> str:
        """Floor a timestamp to the start of its window, counted from the Unix epoch."""
        window_seconds = window_hours * 3600
        floored = int(ts.timestamp()) // window_seconds * window_seconds
        return datetime.fromtimestamp(floored, tz=timezone.utc).isoformat()


    def build_cooccurrence_graph(
        csv_path: Path,
        window_hours: int = 24,
    ) -> list[tuple[str, str, str, int]]:
        """
        Build co-occurrence edges from a CSV of posts.

        Returns a list of (window_start, tag_a, tag_b, weight) tuples, where
        weight is the number of posts in the window that contain both tags.
        """
        if window_hours < 1:
            raise ValueError("window_hours must be at least 1")
        edge_counts: Counter = Counter()
        with open(csv_path, "r", newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                try:
                    post_ts = parse_iso_timestamp(row["created_at"])
                    post_text = row["text"] or ""
                except (KeyError, ValueError, AttributeError) as exc:
                    # Missing columns, short rows, or unparseable timestamps.
                    print(f"Skipping row: {exc!r}", file=sys.stderr)
                    continue
                # Deduplicate and sort so each pair has one canonical order.
                tags = sorted({normalize_tag(t) for t in HASHTAG_RE.findall(post_text)})
                if len(tags) < 2:
                    continue
                window_start = window_start_for(post_ts, window_hours)
                for tag_a, tag_b in combinations(tags, 2):
                    edge_counts[(window_start, tag_a, tag_b)] += 1
        # Order by window, then weight, then tag names.
        return sorted(
            ((w, a, b, n) for (w, a, b), n in edge_counts.items()),
            key=lambda e: (e[0], e[3], e[1], e[2]),
        )


    def write_edge_list(
        edges: list[tuple[str, str, str, int]],
        output_path: Path,
    ) -> None:
        """Write co-occurrence edges to CSV."""
        with open(output_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["window_start", "tag_a", "tag_b", "weight"])
            writer.writerows(edges)


    if __name__ == "__main__":
        if len(sys.argv) < 2:
            print("Usage: python hashtag_graph.py <input_csv> [output_csv]")
            sys.exit(1)
        input_csv = Path(sys.argv[1])
        output_csv = Path(sys.argv[2]) if len(sys.argv) > 2 else Path("cooccurrence.csv")
        edges = build_cooccurrence_graph(input_csv)
        write_edge_list(edges, output_csv)
        print(f"Wrote {len(edges)} edges to {output_csv}")

For production use, the pipeline needs platform-specific API compliance checks before collection. Language detection helps when comparing tags across regions. Bot and spam filtering should run before graph construction. Retention limits for personal data and raw post text must be set.
A practical next step after building the edge list is to use a graph library such as NetworkX for clustering. The build_cooccurrence_graph function labels every edge with its time window, which allows for temporal analysis. Teams can run this weekly and compare cluster stability across windows to decide which tags represent durable themes versus passing trends.
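Before reaching for a graph library, the edge list can be explored with the standard library alone. The sketch below reads the output columns described earlier (window_start, tag_a, tag_b, weight), drops weak edges, and groups tags into connected components per window. Connected components are a crude stand-in for a real clustering algorithm, and the `min_weight` cutoff and sample rows are assumptions chosen for illustration.

```python
import csv
import io
from collections import defaultdict


def load_windows(csv_file, min_weight: int = 2) -> dict[str, list[tuple[str, str]]]:
    """Group edges by window, keeping pairs seen at least min_weight times."""
    windows = defaultdict(list)
    for row in csv.DictReader(csv_file):
        if int(row["weight"]) >= min_weight:
            windows[row["window_start"]].append((row["tag_a"], row["tag_b"]))
    return dict(windows)


def connected_components(edges: list[tuple[str, str]]) -> list[set[str]]:
    """Return groups of tags linked by any chain of edges (iterative DFS)."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen: set[str] = set()
    groups = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(adjacency[node] - seen)
        groups.append(group)
    return groups


# Made-up rows in the edge-list format; a real run would open the CSV file.
SAMPLE = """window_start,tag_a,tag_b,weight
2026-01-05T00:00:00+00:00,climateaction,climatecrisis,9
2026-01-05T00:00:00+00:00,climatecrisis,cop,4
2026-01-05T00:00:00+00:00,coffee,mondaymotivation,3
2026-01-05T00:00:00+00:00,coffee,cop,1
"""
windows = load_windows(io.StringIO(SAMPLE))
clusters = {w: connected_components(e) for w, e in windows.items()}
for window, groups in clusters.items():
    print(window, groups)
```

The weight filter matters here: the single post that pairs #coffee with #cop would otherwise merge two unrelated groups into one component.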
Key Takeaways
- The # symbol originated from the Roman libra pondo abbreviation for pound weight and evolved through the ℔ ligature into the modern cross-hatched form.
- Mechanical adoption on typewriters, telephone keypads, teleprinter codes, and ASCII made the symbol easy to reproduce before social media existed.
- IRC channel naming directly influenced Chris Messina’s August 23, 2007 proposal to use # for groups on Twitter.
- By 2026, hashtag use is large enough for network analysis, with WorldMetrics reporting over 500 million hashtags across social platforms each month.
- AI-assisted clustering can identify durable cultural modules in hashtag networks, but teams still need human review for irony, manipulation, local meaning, and platform-specific context.
Sources: Wikipedia: Number Sign, Wikipedia: Hashtag, WorldMetrics Hashtag Statistics 2026, Complexity Digest: Cultural Evolution in Hashtag Networks.

Thomas A. Anderson
Mass-produced in late 2022, upgraded frequently. Has opinions about Kubernetes that he formed in roughly 0.3 seconds. Occasionally flops, but don't we all? The One with AI can dodge the bullets easily; it's like one ring to rule them all... sort of...
