Organising your database and keyword categorisation – Lesson Preview
If your keyword universe feels messy, this lesson shows how to turn it into clear, decision-ready segments. You’ll learn practical, beginner-friendly labelling methods that reveal patterns fast, so you can spot branded vs non-branded opportunities, align content to real search intent, and prioritise the keywords most likely to drive traffic and conversions. The focus is on quick wins in Google Sheets, with regex-based rules and lightweight automation you can run immediately.
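This minimal Python sketch makes the regex-based branded vs non-branded idea concrete. A Google Sheets formula along the lines of `=IF(REGEXMATCH(A2, "(?i)\bacme\b"), "Branded", "Non-branded")` applies the same kind of rule. The brand name `acme`, the competitor names `globex` and `initech`, the misspelling `acmee` and the separate "Competitor" label are all hypothetical placeholders. They are not the lesson's own rules.

```python
import re

# Hypothetical patterns: swap in your own brand, product and competitor names,
# plus common misspellings. \b stops "acme" matching inside longer words.
BRAND_RE = re.compile(r"\b(acme|acmee|acme\s?corp)\b", re.IGNORECASE)
COMPETITOR_RE = re.compile(r"\b(globex|initech)\b", re.IGNORECASE)


def brand_label(keyword: str) -> str:
    """Return 'Branded', 'Competitor' or 'Non-branded' for a single keyword."""
    # Own-brand terms are checked first, so "globex vs acme" counts as Branded.
    if BRAND_RE.search(keyword):
        return "Branded"
    if COMPETITOR_RE.search(keyword):
        return "Competitor"
    return "Non-branded"


for kw in ["Acme running shoes", "globex vs acme", "best trail running shoes", "initech login"]:
    print(f"{kw}: {brand_label(kw)}")
```

The check order is a design choice. A comparison query that names your own brand is treated as brand equity, not competitor territory.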
Beyond simple labels, you’ll see how topic and entity views expose what your market really searches for. The instructor demonstrates how to enrich a plain keyword list with n-grams, topic clusters, SERP features, desired content formats and depth, and even user personas, then roll it all up into an at-a-glance dashboard. You’ll also hear where rule-based methods shine, where ML models help (e.g., Sentence-BERT, BERTopic), and how to keep error rates in check with simple manual passes.
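Enriching a keyword list with n-grams can start as simple frequency counting. The sketch below is plain Python, not KeyBERT. It counts contiguous word bigrams across a handful of made-up keywords. Each n-gram is counted at most once per keyword, so a phrase repeated inside one query does not inflate its score. The function names are illustrative.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-word sequences in a list of tokens."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(keywords, n=2):
    """Count in how many keywords each n-gram appears (once per keyword)."""
    counts = Counter()
    for kw in keywords:
        tokens = kw.lower().split()
        counts.update(set(ngrams(tokens, n)))
    return counts


keywords = [
    "best running shoes for women",
    "running shoes for flat feet",
    "cheap running shoes",
    "trail running shoes review",
    "how to clean running shoes",
]
counts = count_ngrams(keywords, n=2)
print(counts.most_common(3))
```

Here "running shoes" appears in all five keywords and "shoes for" in two. Every other bigram appears once. Setting `n` to 1 or 3 gives unigram or trigram views of the same list.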
By the end, you’ll have a repeatable way to organise keywords that supports smarter briefs and roadmap choices, without needing a huge dataset or complex pipelines. The lesson is aimed at SEOs and marketers who want credible, data-driven prioritisation they can put in place in under an hour.
What you’ll learn (why it matters)
- Label branded vs non-branded because it separates brand equity from growth space.
- Classify search intent because matching motivation drives relevance and conversions.
- Split short- vs long-tail because balancing broad, high-volume terms with specific, lower-competition ones improves reach and ROI.
- Extract n-grams and clusters because themes reveal coverage gaps and opportunities.
- Group by topics and entities because semantic views unlock topical authority planning.
- Map formats, platforms, depth, and personas because content should fit how users search.
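Rule-based intent labelling usually means matching modifier words. This is a minimal sketch that assumes hypothetical modifier lists and a first-match-wins order. Under that order, "cheap running shoes review" is labelled Transactional rather than Commercial. Real rules should come from your own data and be checked against SERP signals.

```python
import re

# Hypothetical modifier lists. Rules are checked top to bottom and the
# first match wins, so put the most decisive intents first.
INTENT_RULES = [
    ("Transactional", r"\b(buy|price|cheap|deals?|coupons?|discount|order)\b"),
    ("Commercial", r"\b(best|top|reviews?|vs|compare|alternatives?)\b"),
    ("Informational", r"\b(how|what|why|when|guide|tutorial|tips|ideas)\b"),
    ("Navigational", r"\b(login|log in|sign in|website|official)\b"),
]
COMPILED_RULES = [(label, re.compile(pattern, re.IGNORECASE)) for label, pattern in INTENT_RULES]


def intent_label(keyword: str) -> str:
    """Return the first intent whose modifiers appear in the keyword."""
    for label, pattern in COMPILED_RULES:
        if pattern.search(keyword):
            return label
    return "Unclassified"


for kw in ["cheap running shoes review", "best running shoes",
           "how to clean running shoes", "acme login", "running shoes"]:
    print(f"{kw}: {intent_label(kw)}")
```

Anything that falls through to "Unclassified" is the pool to review manually and mine for new modifiers on the next pass.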
Key concepts (with mini-definitions)
- Branded vs Non-branded – segmentation by brand and competitor terms that separates brand equity from expansion opportunities.
- Search intent – labels (informational, commercial, transactional, etc.) assigned from keyword rules and SERP signals.
- Short- vs Long-tail – labels based on word count that balance volume against specificity.
- Keyword clusters & n-grams – core terms and bigrams that surface recurring themes across queries.
- Topic clusters – groups of queries assigned to broad or niche topics with methods such as Sentence-BERT or BERTopic.
- Entity-based clustering – grouping by main and sub-entities, using salience scores to reflect how central each entity is.
- SERP features – result-page elements (e.g., featured snippets, video results) used as signals for intent and preferred formats.
- Content format/platform/depth – labels for the preferred format (e.g., video, reviews), the platform, and the depth of need, from beginner to advanced.
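The short- vs long-tail split reduces to counting words. This sketch assumes a cut-off of three words for long-tail. That threshold is a common convention rather than a rule stated in the lesson, so it is exposed as a parameter.

```python
def tail_label(keyword: str, long_tail_min_words: int = 3) -> str:
    """Label a keyword as Short-tail or Long-tail by its word count."""
    # split() with no argument collapses repeated whitespace, so
    # "  running   shoes " counts as two words.
    word_count = len(keyword.split())
    return "Long-tail" if word_count >= long_tail_min_words else "Short-tail"


for kw in ["running shoes", "best trail running shoes for women"]:
    print(f"{kw}: {tail_label(kw)}")
```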
Tools mentioned
Google Sheets, regex, Looker Studio, SEMrush, Ahrefs, SE Ranking, DataForSEO, SERP API, Google Colab, KeyBERT, Sentence-BERT, BERTopic, k-means, Google Natural Language API, ML4SEO templates, ChatGPT and Gemini.
Practice & readings
- Use the provided Google Sheets formulas to label branded vs non-branded, intent, and tail length.
- Run the linked Colab to extract n-grams with KeyBERT; add topics via Sentence-BERT.
- Load the sample into the Looker Studio dashboard to visualise categories and volume.
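Before charting in Looker Studio, it helps to have one row per category with its keyword count and total volume. The sketch below aggregates a few labelled rows with made-up volumes. It emits CSV text that you could upload or paste into a sheet. The column names and the `summarise`/`to_csv` helpers are hypothetical and are not part of the lesson's dashboard template.

```python
import csv
import io
from collections import defaultdict

# Hypothetical labelled rows: (keyword, category, monthly volume). Volumes are made up.
rows = [
    ("acme running shoes", "Branded", 1000),
    ("best running shoes", "Non-branded", 5000),
    ("running shoes for flat feet", "Non-branded", 800),
    ("acme login", "Branded", 300),
]


def summarise(rows):
    """Count keywords and sum volume for each category."""
    totals = defaultdict(lambda: {"keywords": 0, "volume": 0})
    for _keyword, category, volume in rows:
        totals[category]["keywords"] += 1
        totals[category]["volume"] += volume
    return dict(totals)


def to_csv(summary):
    """Render the summary as CSV text, highest-volume category first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["category", "keywords", "volume"])
    for category, stats in sorted(summary.items(), key=lambda item: item[1]["volume"], reverse=True):
        writer.writerow([category, stats["keywords"], stats["volume"]])
    return buffer.getvalue()


summary = summarise(rows)
print(to_csv(summary))
```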
Key insights & takeaways
- Start rule-based; add ML where semantics matter most.
- Reduce “unclassified” terms on each pass; refine labels from real data.
- Pair labels with search volume/competition to prioritise quickly.
- Entity and topic views surface gaps for information gain and authority.
- A short manual review dramatically improves automated labels.
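This sketch combines two of the takeaways: tracking the unclassified share, and pairing labels with volume and competition. The scoring formula `volume * (1 - competition)` and the 0-1 competition scale are assumptions for illustration only. The data is made up, and tools such as SEMrush or Ahrefs report difficulty on their own scales.

```python
# Hypothetical labelled keywords: (keyword, intent, monthly volume, competition 0-1).
keywords = [
    ("best running shoes", "Commercial", 5000, 0.8),
    ("running shoes for flat feet", "Commercial", 800, 0.2),
    ("how to clean running shoes", "Informational", 1200, 0.1),
    ("running shoes", "Unclassified", 9000, 0.9),
]


def unclassified_share(rows):
    """Fraction of keywords still labelled 'Unclassified' (track it per pass)."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if row[1] == "Unclassified") / len(rows)


def priority(volume, competition):
    """Toy score: volume discounted by competition. An assumed heuristic."""
    return volume * (1 - competition)


# Rank only labelled keywords; unclassified ones go to manual review instead.
ranked = sorted(
    (row for row in keywords if row[1] != "Unclassified"),
    key=lambda row: priority(row[2], row[3]),
    reverse=True,
)

print(f"Unclassified: {unclassified_share(keywords):.0%}")
for keyword, intent, volume, competition in ranked:
    print(f"{priority(volume, competition):8.0f}  {intent:<13}  {keyword}")
```

Here the unclassified share starts at 25%. After you add rules or manual labels for "running shoes", rerunning the same check shows whether the next pass actually reduced it.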
Ready for the next step? Start your learning journey with MLforSEO
Discover categorisation workflows and get instant access to ready-to-use templates.
