{"id":143019,"date":"2026-01-24T17:22:05","date_gmt":"2026-01-24T17:22:05","guid":{"rendered":"https:\/\/darkopavic.xyz\/?p=143019"},"modified":"2026-01-24T17:24:19","modified_gmt":"2026-01-24T17:24:19","slug":"your-ai-assistant-is-lying-to-you-unless-your-retail-data-looks-like-this-1803-euroshop-exhibitors-proof-inside","status":"publish","type":"post","link":"https:\/\/darkopavic.xyz\/index.php\/2026\/01\/24\/your-ai-assistant-is-lying-to-you-unless-your-retail-data-looks-like-this-1803-euroshop-exhibitors-proof-inside\/","title":{"rendered":"Your AI Assistant Is Lying to You \u2014 Unless Your Retail Data Looks Like This (1,803 EuroShop Exhibitors, Proof Inside)"},"content":{"rendered":"\n<p>I wanted to learn what \u201cAI-ready data\u201d really means in practice, so I built an enrichment pipeline from scratch and stress\u2011tested it on the EuroShop 2026 exhibitor list. Here\u2019s what broke, what I changed, and what I learned, plus a free dataset you can use for search, rankings, planning, and chatbots.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">A 10\u2011minute problem every retailer recognizes<\/h1>\n\n\n\n<p>You\u2019re walking into a vendor meeting at a trade show. Ten minutes to go. You open an \u201cAI assistant\u201d and ask:<br>\u201cGive me a one\u2011page brief: what they do, likely strengths, risks\/unknowns, and 5 sharp questions, and back it up with evidence.\u201d<br><br>If your dataset is just a directory listing, the answer is generic. Worse: the model starts guessing.<br><br>So I tried something simple: stop blaming the model\u2026 and fix the data.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Why this matters now (retail + agentic AI)<\/h1>\n\n\n\n<p>Retail is entering an era where the *interface* to software is no longer a screen \u2014 it\u2019s a conversation, an agent, a workflow.<br>But agentic AI doesn\u2019t run on vibes. 
It runs on structured facts: consistent fields, reliable evidence, and clear confidence signals.<br><br>If you want AI to support real decisions (vendor ranking, expansion planning, rollout risk, investment priority), you need more than \u201ccategory tags.\u201d You need data you can trust and explain.<br><br>I wrote more about the broader retail context in my post <a href=\"https:\/\/darkopavic.xyz\/index.php\/2026\/01\/24\/from-raw-retail-data-to-agentic-ai\/\">\u201cFrom Raw Retail Data to Agentic AI\u201d<\/a>.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">My experiment: EuroShop\u2019s public exhibitor list<\/h1>\n\n\n\n<p>Instead of theorizing, I picked a dataset that\u2019s familiar to everyone in retail tech:<br>the public exhibitor list for EuroShop 2026.<br><br>Goal: turn it into something you can safely use for:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>AI search (semantic + keyword)<\/li>\n\n\n\n<li>Ranking &amp; shortlists (explainable)<\/li>\n\n\n\n<li>Predictions &amp; planning (trend signals)<\/li>\n\n\n\n<li>Chatbot \/ RAG training (low hallucination risk)<\/li>\n<\/ul>\n\n\n\n<p>The initial export looks \u201cokay\u201d\u2026 until you try to use it as an AI brain.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">The first surprise: the dataset is not the problem \u2014 the missing context is<\/h1>\n\n\n\n<p>A trade\u2011show directory is optimized for humans scrolling, not for machines reasoning.<br><br>In the raw-ish export (as of this writing; the organizers may fill in more later):<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Only 43.5% of exhibitors even have a description.<\/li>\n\n\n\n<li>Only 35.6% have any social link.<\/li>\n\n\n\n<li>PII is everywhere (emails in 90.5% of entries; phone numbers in nearly all).<\/li>\n\n\n\n<li>And here\u2019s a fun one: 1327 exhibitors contained a placeholder image fragment inside category text (pure noise).<\/li>\n<\/ul>\n\n\n\n<p>When you ask an LLM to \u201canalyze\u201d this, it has two choices: be vague \u2014 or 
hallucinate.<br>So the real task became: create a dataset where the model can stay grounded.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">The second surprise: websites are 60\u201380% junk<\/h1>\n\n\n\n<p>If you crawl exhibitor sites, you quickly learn that most text is not product truth:<br>cookie banners, navigation menus, country dropdowns, footers, job pages, login screens\u2026<br><br>That\u2019s why the pipeline needed an aggressive cleaning layer before any extraction.<br>Otherwise you end up tagging a company as \u201cAI\u201d because their cookie policy says \u201cwe use analytics.\u201d<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">The pipeline (plain English, no hype)<\/h1>\n\n\n\n<p>I ended up building a multi\u2011phase enrichment pipeline:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Crawl: start from the homepage, then prioritize the pages that usually contain truth (About, Products, Technology).<\/li>\n\n\n\n<li>Clean: remove boilerplate, menus, cookie text; detect navigation patterns; truncate safely.<\/li>\n\n\n\n<li>Extract: pull structured fields (what they do, products, capabilities, industries, tech signals) and capture evidence snippets.<\/li>\n\n\n\n<li>Classify: assign business category, themes, and controlled tags (not free\u2011form chaos).<\/li>\n\n\n\n<li>Score: separate truth confidence (how verified) from data richness (how complete) and flag quality issues.<\/li>\n<\/ol>\n\n\n\n<p>Key design choice: evidence-first. 
If a field has no evidence, it gets lower confidence \u2014 and your AI can be instructed to treat it as a hint, not a fact.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"728\" src=\"https:\/\/darkopavic.xyz\/wp-content\/uploads\/2026\/01\/Bildschirmfoto-2026-01-24-um-12.12.30-1024x728.png\" alt=\"\" class=\"wp-image-143020\" srcset=\"https:\/\/darkopavic.xyz\/wp-content\/uploads\/2026\/01\/Bildschirmfoto-2026-01-24-um-12.12.30-1024x728.png 1024w, https:\/\/darkopavic.xyz\/wp-content\/uploads\/2026\/01\/Bildschirmfoto-2026-01-24-um-12.12.30-300x213.png 300w, https:\/\/darkopavic.xyz\/wp-content\/uploads\/2026\/01\/Bildschirmfoto-2026-01-24-um-12.12.30-768x546.png 768w, https:\/\/darkopavic.xyz\/wp-content\/uploads\/2026\/01\/Bildschirmfoto-2026-01-24-um-12.12.30-1536x1092.png 1536w, https:\/\/darkopavic.xyz\/wp-content\/uploads\/2026\/01\/Bildschirmfoto-2026-01-24-um-12.12.30-2048x1456.png 2048w, https:\/\/darkopavic.xyz\/wp-content\/uploads\/2026\/01\/Bildschirmfoto-2026-01-24-um-12.12.30-416x295.png 416w, https:\/\/darkopavic.xyz\/wp-content\/uploads\/2026\/01\/Bildschirmfoto-2026-01-24-um-12.12.30-420x300.png 420w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p class=\"has-text-align-left\">Links to the technical deep dive:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/docs.google.com\/document\/d\/1B6JQN12pcs6QqNCRCkQzbaT4165dasa4\/edit?usp=sharing&amp;ouid=104407933488072619673&amp;rtpof=true&amp;sd=true\">App overview<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/docs.google.com\/document\/d\/1bfY_MjxneVlsmPpu6D3MEmJ3EouE8hdl\/edit?usp=sharing&amp;ouid=104407933488072619673&amp;rtpof=true&amp;sd=true\">Algo details<\/a><\/li>\n<\/ul>\n\n\n\n<h2 data-wp-context---core-fit-text=\"core\/fit-text::{&quot;fontSize&quot;:&quot;&quot;}\" data-wp-init---core-fit-text=\"core\/fit-text::callbacks.init\" data-wp-interactive 
data-wp-style--font-size=\"core\/fit-text::context.fontSize\" class=\"wp-block-heading has-fit-text\"><br>Raw vs AI\u2011ready: the numbers that actually matter<\/h2>\n\n\n\n<p>Here\u2019s what changed after enrichment (same 1,803 exhibitors):<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>Metric<\/td><td>Original export<\/td><td>AI\u2011ready export<\/td><\/tr><tr><td>Exhibitors<\/td><td>1803<\/td><td>1803<\/td><\/tr><tr><td>Countries<\/td><td>63<\/td><td>63<\/td><\/tr><tr><td>Approx. file size<\/td><td>8.8 MB (JSON)<\/td><td>33.4 MB (JSON, AI-ready)<\/td><\/tr><tr><td>Websites present<\/td><td>1717 (95.2%)<\/td><td>1717 (95.2%)<\/td><\/tr><tr><td>Descriptions present<\/td><td>784 (43.5%)<\/td><td>784 (43.5%)<\/td><\/tr><tr><td>Value proposition generated<\/td><td>\u2014<\/td><td>1587 (88.0%)<\/td><\/tr><tr><td>Any social link found<\/td><td>641 (35.6%)<\/td><td>893 (49.5%)<\/td><\/tr><tr><td>Avg. pages fetched per exhibitor<\/td><td>\u2014<\/td><td>1.98 (max 3)<\/td><\/tr><tr><td>PII included<\/td><td>Emails: 1632 (90.5%), Phones: 1803 (100.0%)<\/td><td>PII stripped: True (tracked via redaction counts)<\/td><\/tr><tr><td>Avg. 
tags per exhibitor<\/td><td>\u2014<\/td><td>2.61 (cap 12)<\/td><\/tr><tr><td>Data Richness score (median)<\/td><td>\u2014<\/td><td>47\/100<\/td><\/tr><tr><td>Overall Confidence score (median)<\/td><td>\u2014<\/td><td>53\/100<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>A couple of extra signals that became possible:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Business categories (distribution): services-other 593, industrial-manufacturing 562, retail-tech 428, storefit-fixtures 220.<\/li>\n\n\n\n<li>Top themes detected: Sustainability 207, Digital Signage 200, Other 175, Logistics &amp; Supply Chain 137, Store Design &amp; Fixtures 131, Point of Sale 122, Lighting &amp; Energy 119, Payments &amp; Checkout 103.<\/li>\n\n\n\n<li>Top exhibitor countries: Germany 502, China 236, Italy 154, Netherlands 76, Poland 66.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1024\" height=\"767\" src=\"https:\/\/darkopavic.xyz\/wp-content\/uploads\/2026\/01\/Bildschirmfoto-2026-01-24-um-12.14.14-1024x767.png\" alt=\"\" class=\"wp-image-143021\" srcset=\"https:\/\/darkopavic.xyz\/wp-content\/uploads\/2026\/01\/Bildschirmfoto-2026-01-24-um-12.14.14-1024x767.png 1024w, https:\/\/darkopavic.xyz\/wp-content\/uploads\/2026\/01\/Bildschirmfoto-2026-01-24-um-12.14.14-300x225.png 300w, https:\/\/darkopavic.xyz\/wp-content\/uploads\/2026\/01\/Bildschirmfoto-2026-01-24-um-12.14.14-768x575.png 768w, https:\/\/darkopavic.xyz\/wp-content\/uploads\/2026\/01\/Bildschirmfoto-2026-01-24-um-12.14.14-1536x1150.png 1536w, https:\/\/darkopavic.xyz\/wp-content\/uploads\/2026\/01\/Bildschirmfoto-2026-01-24-um-12.14.14-308x230.png 308w, https:\/\/darkopavic.xyz\/wp-content\/uploads\/2026\/01\/Bildschirmfoto-2026-01-24-um-12.14.14.png 1752w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h1 class=\"wp-block-heading\">What you can do with this dataset (real prompts, real value)<\/h1>\n\n\n\n<p>Below are examples that go way beyond 
\u201cshow me exhibitors in category X\u201d. They\u2019re designed to save time and reduce bad vendor meetings.<\/p>\n\n\n\n<p><strong>1) The 10\u2011minute meeting brief<\/strong><\/p>\n\n\n\n<p>I\u2019m meeting {company_name} in 10 minutes. Create a 1\u2011page brief: what they do, 3 likely strengths, 3 risks\/unknowns, 5 sharp questions, and a recommended next step (demo\/pilot\/workshop). Use only evidence-backed info from the dataset, and cite the evidence fields.<\/p>\n\n\n\n<p><strong>2) \u201cEdge cases only\u201d vendor shortlist (returns\/offline\/promos)<\/strong><\/p>\n\n\n\n<p>My biggest issues: offline transactions, cross\u2011channel returns, coupon stacking, and fiscal compliance changes. Find exhibitors whose capabilities suggest they can handle these edge cases. For each exhibitor: which edge case they likely handle, what evidence supports it, and what you\u2019d validate live.<\/p>\n\n\n\n<p><strong>3) Build my EuroShop visit plan (who + where + when)<\/strong><\/p>\n\n\n\n<p>I have 1 day at EuroShop. I care about {themes}. Build an itinerary with 12 exhibitors: morning\/afternoon blocks, halls\/stands, and 2 goals per meeting. Prioritize high confidence + high richness profiles, but include 2 \u2018wildcards\u2019 with strong innovation signals.<\/p>\n\n\n\n<p><strong>4) Create a retailer\u2011specific top\u201120 list<\/strong><\/p>\n\n\n\n<p>I run 180 fashion stores in 5 countries. Recommend 20 exhibitors for (checkout, loss prevention, ESL, store ops automation). Group them by problem. For each, give a 2\u2011line fit summary, key capabilities, and 3 questions that expose integration and operational risks.<\/p>\n\n\n\n<p><strong>5) Vendor risk radar (what the dataset cannot prove)<\/strong><\/p>\n\n\n\n<p>For the top 30 exhibitors by innovation_score, list the top 5 unknowns you would validate before a pilot (integration, support model, certifications, regional readiness, references). 
Use data_quality_flags and confidence scores to justify each unknown.<\/p>\n\n\n\n<p><strong>6) Competitive landscape map<\/strong><\/p>\n\n\n\n<p>Cluster exhibitors into 8\u201312 clusters based on themes + tech signals. For each cluster, describe the \u2018battlefield\u2019 (what they compete on), typical buyer persona, and what\u2019s changing right now.<\/p>\n\n\n\n<p><strong>7) Find partners, not vendors (for POS \/ platform providers)<\/strong><\/p>\n\n\n\n<p>I\u2019m a POS platform vendor. Identify exhibitors that look like strong partnership targets (complementary tech, integration-friendly, clear value proposition). Give 10 targets, the partnership angle, and the evidence behind it.<\/p>\n\n\n\n<p><strong>8) \u201cExplainable ranking\u201d for a decision meeting<\/strong><\/p>\n\n\n\n<p>Rank exhibitors for {use_case} using a weighted approach: 40% capability fit, 25% confidence, 20% evidence quality, 15% innovation signals. Output the score breakdown and the evidence lines used. 
No guesses.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">How to use it safely (so your AI doesn\u2019t overpromise)<\/h1>\n\n\n\n<p>A few rules I recommend when you plug this into an LLM, search engine, or agent:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Treat low\u2011confidence fields as hypotheses \u2014 not truth.<\/li>\n\n\n\n<li>Prefer evidence-backed fields (snippets\/sources) when generating recommendations.<\/li>\n\n\n\n<li>Use data_quality_flags to avoid ranking errors (e.g., no_pages_fetched, listing_based_extraction).<\/li>\n\n\n\n<li>Keep the OPS dataset separate from AI-ready exports (PII vs non\u2011PII).<\/li>\n\n\n\n<li>For prediction: never train on one signal; combine themes + capabilities + evidence + market context.<\/li>\n<\/ul>\n\n\n\n<h1 class=\"wp-block-heading\">What I learned (the part nobody tells you)<\/h1>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Enrichment is less about \u2018more data\u2019 and more about better structure.<\/li>\n\n\n\n<li>Controlled vocabularies beat free-form tags. Otherwise your embeddings become a junkyard.<\/li>\n\n\n\n<li>Separating product technology from website stack prevents embarrassing false positives.<\/li>\n\n\n\n<li>Two scores matter: truth confidence (trust) and data richness (coverage). 
Don\u2019t mix them.<\/li>\n\n\n\n<li>Quality flags are not negative \u2014 they\u2019re what makes the dataset usable for serious AI.<\/li>\n<\/ul>\n\n\n\n<h1 class=\"wp-block-heading\">Want the dataset?<\/h1>\n\n\n\n<p>I\u2019m sharing the AI\u2011ready enriched dataset for EuroShop exhibitors for free (<a href=\"https:\/\/darkopavic.substack.com\/\">link in my newsletter<\/a>).<br><br>If you\u2019re a retailer, it helps you plan meetings and cut through noise.<br>If you\u2019re a POS vendor or solution provider, it helps you map the ecosystem, find partners, and spot themes early.<br><br>And if you\u2019re building agentic AI in retail: this is a practical example of how to keep LLMs grounded.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I wanted to learn what \u201cAI-ready data\u201d really means in practice, so I built an enrichment pipeline from scratch and stress\u2011tested it on the EuroShop 2026 exhibitor list. Here\u2019s what broke, what I changed, and what I learned, plus a free dataset you can use for search, rankings, planning, and chatbots. 
A 10\u2011minute problem every&#8230;<\/p>\n","protected":false},"author":1,"featured_media":143023,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"episode_type":"","audio_file":"","cover_image":"","cover_image_id":"","duration":"","filesize":"","date_recorded":"","explicit":"","block":"","itunes_episode_number":"","itunes_title":"","itunes_season_number":"","itunes_episode_type":"","filesize_raw":"","footnotes":""},"categories":[1,56],"tags":[],"class_list":["post-143019","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized","category-technology"],"_links":{"self":[{"href":"https:\/\/darkopavic.xyz\/index.php\/wp-json\/wp\/v2\/posts\/143019","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/darkopavic.xyz\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/darkopavic.xyz\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/darkopavic.xyz\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/darkopavic.xyz\/index.php\/wp-json\/wp\/v2\/comments?post=143019"}],"version-history":[{"count":1,"href":"https:\/\/darkopavic.xyz\/index.php\/wp-json\/wp\/v2\/posts\/143019\/revisions"}],"predecessor-version":[{"id":143024,"href":"https:\/\/darkopavic.xyz\/index.php\/wp-json\/wp\/v2\/posts\/143019\/revisions\/143024"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/darkopavic.xyz\/index.php\/wp-json\/wp\/v2\/media\/143023"}],"wp:attachment":[{"href":"https:\/\/darkopavic.xyz\/index.php\/wp-json\/wp\/v2\/media?parent=143019"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/darkopavic.xyz\/index.php\/wp-json\/wp\/v2\/categories?post=143019"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/darkopavic.xyz\/index.php\/wp-json\/wp\/v2\/tags?post=143019"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":tru
e}]}}