{"id":53864,"date":"2025-09-15T11:36:36","date_gmt":"2025-09-15T01:36:36","guid":{"rendered":"https:\/\/www.cloudproinc.com.au\/?p=53864"},"modified":"2025-09-15T11:36:38","modified_gmt":"2025-09-15T01:36:38","slug":"preparing-input-text-for-training-llms","status":"publish","type":"post","link":"https:\/\/www.cloudproinc.com.au\/index.php\/2025\/09\/15\/preparing-input-text-for-training-llms\/","title":{"rendered":"Preparing Input Text for Training LLMs"},"content":{"rendered":"\n<p>In this blog post Preparing Input Text for Training LLMs that Perform in Production we will walk through the decisions and steps that make training data truly useful. Whether you\u2019re pretraining from scratch or fine-tuning an existing model, disciplined data prep is where most of the performance is won.<\/p>\n\n\n\n<!--more-->\n\n\n\n<p>Preparing Input Text for Training <a href=\"https:\/\/www.cloudproinc.com.au\/index.php\/category\/llm\/\">LLMs<\/a> that Perform in Production starts with a simple idea: the model learns to predict the next token. Everything you do\u2014cleaning, deduping, chunking, and formatting\u2014should increase the signal-to-noise ratio of those tokens. We\u2019ll keep things practical and friendly, with clear explanations and copy-pasteable snippets you can adapt to your pipeline.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-the-technology-behind-modern-llm-training\">The technology behind modern LLM training<\/h2>\n\n\n\n<p>Large Language Models are transformer networks trained on next-token prediction. Text is converted into tokens by a tokenizer (commonly BPE, WordPiece, or SentencePiece). The model attends over a finite context window\u2014thousands of tokens\u2014and updates weights to minimize prediction error. Quality and structure of the token stream matter:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Tokenization: Determines how text is split. Consistency affects chunking, special tokens, and chat formatting.<\/li>\n\n\n\n<li>Objective: Pretraining uses raw text. Supervised fine-tuning (SFT) uses instruction\/response pairs. RLHF or DPO layers preference signals on top.<\/li>\n\n\n\n<li>Context window: Long documents must be chunked; boundaries and overlaps matter.<\/li>\n\n\n\n<li>Data mixture and weighting: Metadata lets you weight sources and balance domains.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-why-data-preparation-matters\">Why data preparation matters<\/h2>\n\n\n\n<p>Badly prepared data leads to brittle models: memorization from duplicates, confusion from mixed formats, or hallucinations from noisy sources. 
<h2>Why data preparation matters</h2>

<p>Badly prepared data leads to brittle models: memorization from duplicates, confusion from mixed formats, or hallucinations from noisy sources. Good prep yields stable loss curves, better generalization, and fewer surprises in production.</p>

<h2>Decide your dataset shape before you start</h2>

<h3>For pretraining-style data</h3>

<ul>
<li>Use JSONL with a <code>text</code> field.</li>
<li>Insert explicit document separators (e.g., <code>&lt;|doc|&gt;</code>) so the model learns boundaries.</li>
<li>Keep metadata in side fields for weighting and auditing.</li>
</ul>

<pre><code>{"text": "&lt;|doc|&gt;Title\nBody...\n", "source": "kb", "lang": "en", "license": "CC-BY"}
</code></pre>

<h3>For instruction-tuning (chat) data</h3>

<ul>
<li>Prefer a <code>messages</code> array with roles: <code>system</code>, <code>user</code>, <code>assistant</code>.</li>
<li>Apply the tokenizer's chat template so special tokens are correct.</li>
</ul>

<pre><code>{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How do I restart the service?"},
    {"role": "assistant", "content": "Run `systemctl restart mysvc`."}
  ],
  "source": "support_runbook",
  "lang": "en"
}
</code></pre>

<h2>Cleaning and normalization that pays off</h2>

<ul>
<li>HTML to text: Strip tags but preserve structure (headings, lists, code blocks). Avoid including boilerplate (menus, cookie banners).</li>
<li>Unicode normalization: Apply NFKC to unify visually similar code points. Fix mojibake with ftfy-like tools (a small sketch follows this list).</li>
<li>Whitespace control: Collapse repeated spaces; normalize newlines to <code>\n</code>.</li>
<li>Language filtering: Keep only target languages or label them accurately.</li>
<li>PII and secrets: Redact or drop. Decide your policy upfront and apply it consistently.</li>
<li>Length filtering: Drop extremely short or extremely long junk.</li>
</ul>
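<p>As a minimal sketch of the mojibake and Unicode steps (assuming the <code>ftfy</code> package is installed; the helper name is ours):</p>

<pre><code># pip install ftfy
import re
import unicodedata

import ftfy

def normalize_text(s: str) -&gt; str:
    # Repair mojibake (e.g., "â€™" becomes "'"), then unify code points with NFKC.
    s = ftfy.fix_text(s)
    s = unicodedata.normalize("NFKC", s)
    # Normalize newlines and collapse runs of spaces and tabs.
    s = s.replace("\r\n", "\n").replace("\r", "\n")
    s = re.sub(r"[ \t]{2,}", " ", s)
    return s.strip()
</code></pre>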
<h2>Deduplication and diversity</h2>

<p>Duplicates inflate loss and promote memorization. Use both exact and near-duplicate detection:</p>

<ul>
<li>Exact dedup: Hash normalized text (e.g., SHA-1).</li>
<li>Near-dup: MinHash/SimHash over token shingles to catch minor variations.</li>
<li>Boilerplate removal: Dedup at paragraph or section level, not only at document level.</li>
</ul>

<h2>Chunking for the model's context window</h2>

<p>Chunk by tokens, not characters. Keep chunks coherent:</p>

<ul>
<li>Target length: e.g., 1,024–4,096 tokens depending on model and context window.</li>
<li>Overlap: 50–200 tokens can preserve continuity for long narratives.</li>
<li>Respect blocks: Don't split inside code fences or tables; close blocks before cutting.</li>
<li>Add markers: Use <code>&lt;|doc|&gt;</code> or end-of-turn tokens to teach structure.</li>
</ul>

<h2>Metadata that unlocks control</h2>

<p>Attach fields like <code>source</code>, <code>domain</code>, <code>lang</code>, <code>license</code>, <code>timestamp</code>, <code>quality_score</code>, and <code>pii_redacted</code>. This enables:</p>

<ul>
<li>Mixture weighting (e.g., upweight docs and downweight forums); see the sketch below.</li>
<li>Auditable provenance for governance and takedowns.</li>
<li>Targeted evaluation slices by domain or timeframe.</li>
</ul>
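<p>A minimal sketch of metadata-driven mixture weighting. The source names and weights here are hypothetical, and real pipelines often weight at the token or document-count level instead:</p>

<pre><code>import random

# Hypothetical per-source sampling weights; tune for your own domain balance.
SOURCE_WEIGHTS = {"kb": 3.0, "docs": 2.0, "forum": 0.5}

def sample_mixture(records, k, seed=42):
    # Draw k records with probability proportional to each record's source weight.
    rng = random.Random(seed)
    weights = [SOURCE_WEIGHTS.get(r.get("source"), 1.0) for r in records]
    return rng.choices(records, weights=weights, k=k)
</code></pre>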
<h2>Instruction fine-tuning specifics</h2>

<ul>
<li>Consistency: One schema and punctuation style for answers.</li>
<li>Coverage: Include reasoning, tool use, code, and the error-handling patterns your product needs.</li>
<li>Difficulty curriculum: Start simple, include moderately hard tasks, and avoid contrived corner cases unless they are product-relevant.</li>
<li>No leakage: Keep eval-like prompts out of training.</li>
</ul>

<h2>Train/validation/test splits the right way</h2>

<ul>
<li>Split by document, not by chunk, to avoid leakage.</li>
<li>Deduplicate across the entire corpus before splitting.</li>
<li>Stratify by source, language, and length so each split mirrors production.</li>
</ul>

<h2>Governance and safety</h2>

<ul>
<li>Licensing: Track licenses and respect usage terms.</li>
<li>PII: Redact or remove. Consider hashing or placeholder tokens for structured IDs (see the sketch below).</li>
<li>Safety categories: Label sensitive content to control sampling or train refusal behavior.</li>
</ul>
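<p>A small sketch of placeholder-token redaction. The regexes are illustrative only; production PII handling usually relies on dedicated detection tooling rather than a couple of patterns:</p>

<pre><code>import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact_pii(text: str) -&gt; str:
    # Replace matches with stable placeholder tokens instead of deleting them,
    # so sentence structure is preserved for training.
    text = EMAIL.sub("&lt;EMAIL&gt;", text)
    text = PHONE.sub("&lt;PHONE&gt;", text)
    return text

print(redact_pii("Contact jane.doe@example.com or +61 3 9999 9999."))
# Contact &lt;EMAIL&gt; or &lt;PHONE&gt;.
</code></pre>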
<h2>A practical pipeline you can adapt</h2>

<p>The end-to-end sketch below ties the steps together: clean, deduplicate, chunk, split, and save.</p>

<pre><code># pip install datasets beautifulsoup4 ftfy langdetect datasketch transformers
import hashlib
import json
import random
import re
import unicodedata
from collections import defaultdict

from bs4 import BeautifulSoup
from datasets import load_dataset
from datasketch import MinHash, MinHashLSH
from langdetect import detect
from transformers import AutoTokenizer

# 1) Load raw HTML/text data (example: local JSONL with {"html": ..., "url": ...})
raw = load_dataset("json", data_files={"train": "raw.jsonl"})["train"]

def html_to_text(html):
    soup = BeautifulSoup(html, "html.parser")
    # Remove obvious boilerplate elements.
    for tag in soup(["script", "style", "nav", "footer", "header"]):
        tag.decompose()
    text = soup.get_text("\n")
    # Normalize whitespace.
    text = re.sub(r"\n{3,}", "\n\n", text)
    text = re.sub(r"[ \t]{2,}", " ", text)
    return text.strip()

def normalize_unicode(s):
    s = unicodedata.normalize("NFKC", s)
    s = s.replace("\r\n", "\n").replace("\r", "\n")
    return s

def clean_record(rec):
    txt = rec.get("text") or html_to_text(rec.get("html", ""))
    txt = normalize_unicode(txt)
    if len(txt) &lt; 100:  # length filter
        return None
    try:
        lang = detect(txt)
    except Exception:
        lang = "unk"
    if lang != "en":  # keep only English for this example
        return None
    return {
        "text": f"&lt;|doc|&gt;\n{txt}\n",
        "source": rec.get("url", "unknown"),
        "lang": lang,
    }

cleaned = [c for c in (clean_record(r) for r in raw) if c]

# 2) Exact dedup by hash of the normalized text
seen = set()
unique = []
for r in cleaned:
    h = hashlib.sha1(r["text"].encode("utf-8")).hexdigest()
    if h not in seen:
        seen.add(h)
        unique.append(r)

# 3) Near-duplicate removal with MinHash over 5-gram word shingles
lsh = MinHashLSH(threshold=0.8, num_perm=64)
minhashes = []
for i, r in enumerate(unique):
    m = MinHash(num_perm=64)
    tokens = re.findall(r"\w+", r["text"].lower())
    shingles = [" ".join(tokens[j:j + 5]) for j in range(max(1, len(tokens) - 4))]
    for s in shingles:
        m.update(s.encode("utf-8"))
    lsh.insert(str(i), m)
    minhashes.append(m)

keep_mask = [True] * len(unique)
for i, m in enumerate(minhashes):
    if not keep_mask[i]:
        continue
    for j in lsh.query(m):
        j = int(j)
        if j &gt; i:  # keep the earliest copy, drop later near-duplicates
            keep_mask[j] = False

deduped = [r for r, k in zip(unique, keep_mask) if k]

# 4) Token-aware chunking
model_id = "meta-llama/Llama-3.1-8B"  # example; use your target tokenizer
_tok = AutoTokenizer.from_pretrained(model_id)
MAX_TOK = 2048
OVERLAP = 128

def chunk_text(record):
    ids = _tok.encode(record["text"])
    chunks = []
    start = 0
    while start &lt; len(ids):
        end = min(start + MAX_TOK, len(ids))
        piece = _tok.decode(ids[start:end], skip_special_tokens=False)
        chunks.append({
            "text": piece,
            "source": record["source"],
            "lang": record["lang"],
        })
        if end == len(ids):
            break
        start = end - OVERLAP  # overlap chunks to maintain continuity
    return chunks

chunked = []
for r in deduped:
    chunked.extend(chunk_text(r))

# 5) Train/val/test split by source (document level, to avoid leakage)
by_source = defaultdict(list)
for r in chunked:
    by_source[r["source"]].append(r)

sources = list(by_source.keys())
random.seed(42)
random.shuffle(sources)

n = len(sources)
train_s = sources[:int(0.9 * n)]
val_s = sources[int(0.9 * n):int(0.95 * n)]
test_s = sources[int(0.95 * n):]

def gather(source_list):
    out = []
    for s in source_list:
        out.extend(by_source[s])
    return out

train, val, test = gather(train_s), gather(val_s), gather(test_s)

# 6) Save JSONL
for name, data in [("train.jsonl", train), ("val.jsonl", val), ("test.jsonl", test)]:
    with open(name, "w", encoding="utf-8") as f:
        for r in data:
            f.write(json.dumps(r, ensure_ascii=False) + "\n")

print("Saved chunks:", len(train), len(val), len(test))
</code></pre>

<h3>Applying a chat template for instruction data</h3>

<pre><code>from transformers import AutoTokenizer
import json

# Suppose each record has a `messages` array as shown earlier.
chat_tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

def render_chat(record):
    rendered = chat_tokenizer.apply_chat_template(
        record["messages"], tokenize=False, add_generation_prompt=False
    )
    return {"text": rendered, **{k: v for k, v in record.items() if k != "messages"}}

with open("sft_raw.jsonl", "r", encoding="utf-8") as inp, open("sft_prepared.jsonl", "w", encoding="utf-8") as out:
    for line in inp:
        rec = json.loads(line)
        out.write(json.dumps(render_chat(rec), ensure_ascii=False) + "\n")
</code></pre>
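<p>It is worth eyeballing the rendered output once to confirm the turn boundaries look right for your tokenizer. A small sketch reusing <code>render_chat</code> from above:</p>

<pre><code>sample = {"messages": [
    {"role": "user", "content": "ping"},
    {"role": "assistant", "content": "pong"},
]}
print(render_chat(sample)["text"])
# For Mistral-style templates, expect markers such as [INST] ... [/INST] around user turns.
</code></pre>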
<h2>Common pitfalls to avoid</h2>

<ul>
<li>Token-agnostic chunking: Splitting by characters causes variable token lengths and broken structures.</li>
<li>Inconsistent schemas: Mixing prompt/completion and chat formats without clear separators confuses the model.</li>
<li>Leakage: Randomly splitting chunks rather than documents leaks content across train and eval.</li>
<li>Overlapping duplicates: Deduplicating only at document level misses repeated paragraphs and boilerplate.</li>
<li>Missing special tokens: Skipping the tokenizer's chat template leads to mismatched turn boundaries.</li>
<li>Ignoring licensing and PII: Legal and privacy risks can derail deployments.</li>
</ul>

<h2>A quick checklist</h2>

<ul>
<li>Define your dataset shape: raw text vs chat messages.</li>
<li>Normalize Unicode, strip boilerplate, and fix whitespace.</li>
<li>Filter by language and length; redact PII.</li>
<li>Deduplicate exact and near-duplicates across the whole corpus.</li>
<li>Chunk by tokens with sensible overlap; mark document boundaries.</li>
<li>Attach useful metadata for weighting and audits.</li>
<li>Split by document for train/val/test and verify no leakage.</li>
<li>Use chat templates for instruction data; verify special tokens.</li>
<li>Log every transform for reproducibility and rollback.</li>
</ul>

<h2>Closing thoughts</h2>

<p>LLM quality is a direct reflection of the tokens you feed the model. With consistent schemas, careful normalization, robust deduplication, and token-aware chunking, you'll get cleaner signals, smoother training, and better production behavior. Start simple, instrument your pipeline, and iterate with small evaluations. Your future self (and your model) will thank you.</p>