JD Oxygen AI Item Center (Oxygen AIIC) V1: An Industrial-Scale LLM/VLM-Centric Solution for Item Understanding, Management, and Applications
Summary
This paper presents JD Oxygen AI Item Center (Oxygen AIIC), an industrial-scale platform leveraging LLMs/VLMs for item knowledge production and service, achieving high precision and recall, and delivering measurable gains in search, recommendation, and operations on JD.com.
# JD Oxygen AI Item Center (Oxygen AIIC) V1: An Industrial-Scale LLM/VLM-Centric Solution for Item Understanding, Management, and Applications
Source: [https://arxiv.org/html/2606.28070](https://arxiv.org/html/2606.28070)
Oxygen AIIC, Chan Long, Chao Liu, Chaofan Chen, Chaohui Dong, Chunyuan Guo, Danping Liu, Debin Liu, Deping Xiang, Fulai Xu, Guangyue Liu, Hao Li, Huichun Hu, Jian Yang, Jianan Wang, Jianbo Zhao, Jiaoyang Li, Jiaxing Wang, Jinglong Li, Jinjin Guo, Jun Fang, Jun Liu, Kai Zhou, Li Wang, Lili Gao, Liying Chen, Luning Yang, Mengdi Zhou, Pengzhang Liu, Qi Lv, Qianyun Wang, Qixia Jiang, Ruyue Li, Shimu Liang, Shuxing Wang, Sijie Zhang, Siqi Li, Tianhao Gao, Wang Ke, Weihu Huang, Wencan Lai, Wenjie Zhang, Xiaohui Zhang, Xiaojing Dong, Ya Liu, Yifeng Zhang, Yixiang Wang, Yongtai Zhang, Yongyi Liao, Zhaoru Chen, Zhen Chen, Zhiyong Ma, Zhiyuan Liu, Zhongwei Liu, Ziyan Xing
oxygen-aiic@jd.com
###### Abstract
JD.com (about JD.com: [https://corporate.jd.com/](https://corporate.jd.com/)), one of the world’s largest e-commerce platforms, serves over 700 million active users and millions of merchants, with a catalog of tens of billions of SKUs. At this scale, high-quality, structured item knowledge underpins a better consumer experience, lower management costs, and higher operational efficiency. Yet producing and serving it poses three industrial-scale challenges: fast-emerging concepts, high-quality knowledge production for massive SKUs, and diverse downstream requirements. To address these challenges, we present the JD Oxygen AI Item Center (Oxygen AIIC), an industrial-scale platform built on LLMs/VLMs for item-knowledge production and service. Oxygen AIIC is built around four core pillars: (i) ontology engineering driven by efficient human–AI collaboration, which supports the dynamic evolution and agile expansion of an ontology with millions of entries; (ii) a “Semantic Search then Discrimination” ($\mathsf{S}^{2}\mathsf{D}$) knowledge-identification architecture that, combined with throughput-improvement strategies, enables scalable, extensible, and high-throughput AI Item Library production for tens of billions of SKUs; (iii) self-evolving item-understanding LLMs/VLMs that improve in a stable and controllable manner, enabling knowledge production with 94.2% precision and 82.8% recall; and (iv) a unified item tunnel that serves as the data and service hub, delivering item knowledge with tiered freshness. Oxygen AIIC now covers tens of thousands of JD categories and processes hundreds of millions of item updates per day on Huawei Ascend NPUs. It has accumulated hundreds of billions of item-knowledge assets and increased item-information richness to $3.35\times$ its previous level.
Deployed across core business scenarios, including search, recommendation, operations, and category planning, Oxygen AIIC has delivered measurable gains at scale. For example, search-traffic coverage reaches 80.4%, item-information quality issues drop by 37%, the automated fill rate of core attributes during item listing exceeds 80%, and intelligent optimization of item creatives increases click-through rate by 9%.
## 1 Introduction
JD.com is a leading e-commerce platform that serves over 700 million active users and millions of merchants, and manages a catalog of tens of billions of SKUs. To deliver on its retail value proposition of broader selection, faster delivery, better quality, and lower cost, JD has made “cost, efficiency, and experience” its core strategic priorities. As e-commerce has grown rapidly, traditional item knowledge systems can no longer support this strategy effectively, giving rise to three industrial-scale bottlenecks across the demand, supply, and operations sides, as illustrated in Figure [1](https://arxiv.org/html/2606.28070#S1.F1):
- Demand side: incomplete item information and semantic gaps. Incomplete item information, combined with the varied ways users describe their needs (e.g., “charcoal gray” vs. “Morandi palette”), leads to semantic mismatches that degrade the user experience and reduce traffic-allocation efficiency (Nigam et al., [2019](https://arxiv.org/html/2606.28070#bib.bib46)).
- Supply side: costly item management and inefficient traffic acquisition. Merchants are required to provide and continuously maintain multi-dimensional product information. However, the manual nature of this process makes it costly and inefficient, leading to lower product information quality and limiting merchants’ ability to attract traffic.
- Operations side: fast-changing market trends and growing demand for fine-grained operations. Frequent trend shifts and increasingly fine-grained operational requirements make trend sensing and item operations more difficult, ultimately constraining the platform’s efficiency.
To address these bottlenecks, earlier industrial systems adopted traditional NLP techniques and pretrained models, such as BERT-based architectures for named entity recognition (NER) (Luo et al., [2020](https://arxiv.org/html/2606.28070#bib.bib11)). However, these methods remain limited in two major respects. First, owing to their limited model capacity and reliance on task-specific fine-tuning, they struggle to bridge distributional gaps across heterogeneous e-commerce data sources and lack robustness to emerging concepts. Second, they suffer from a “manual-annotation bottleneck”, in which system accuracy is tightly coupled with costly and unsustainable human labeling. As a result, these methods are difficult to deploy at scale while keeping costs low and quality high.
Figure 1: Typical failure cases in traditional item knowledge systems across the demand, supply, and operations sides.
The rapid progress of LLMs/VLMs offers a way out of the long-standing impasse of high labor cost and weak generalization (Brown et al., [2020](https://arxiv.org/html/2606.28070#bib.bib47); Ouyang et al., [2022](https://arxiv.org/html/2606.28070#bib.bib48); Radford et al., [2021](https://arxiv.org/html/2606.28070#bib.bib49); Liu et al., [2023](https://arxiv.org/html/2606.28070#bib.bib50)). Benefiting from extensive world knowledge, strong zero-/few-shot generalization, and reasoning ability, these models enable more accurate, comprehensive, and timely ontology engineering and item-knowledge production.
Academia and industry have advanced intelligent item understanding along four directions: (1) *Domain-specific Foundation Models*, which inject e-commerce knowledge into general-purpose LLMs/VLMs to equip them with retail-domain knowledge (Shi et al., [2025](https://arxiv.org/html/2606.28070#bib.bib2); Peng et al., [2024](https://arxiv.org/html/2606.28070#bib.bib3); Herold et al., [2024](https://arxiv.org/html/2606.28070#bib.bib4)); (2) *Automated Ontology Expansion*, which turns the traditional expert-driven paradigm into a semi-automated, human–AI collaboration for dynamic ontology evolution (Shen et al., [2020](https://arxiv.org/html/2606.28070#bib.bib5); Mao et al., [2020](https://arxiv.org/html/2606.28070#bib.bib6); Er-Rahmadi et al., [2023](https://arxiv.org/html/2606.28070#bib.bib7)); (3) *Large-scale Attribute Extraction*, which moves item attribute recognition from closed-domain entity extraction to retrieval-augmented generation (RAG) and multimodal understanding (Zheng et al., [2018](https://arxiv.org/html/2606.28070#bib.bib8); Wang et al., [2020a](https://arxiv.org/html/2606.28070#bib.bib9); Zhang et al., [2025](https://arxiv.org/html/2606.28070#bib.bib10)); and (4) *Web-scale Item Knowledge Graphs*, which build large, well-aligned retail knowledge networks linking items, attributes, and complex user intents (Luo et al., [2020](https://arxiv.org/html/2606.28070#bib.bib11); [2021](https://arxiv.org/html/2606.28070#bib.bib12); Zalmout et al., [2021](https://arxiv.org/html/2606.28070#bib.bib13)).
Figure 2: Overview of Oxygen AIIC across the item lifecycle. The ontology and the AI Item Library jointly support category planning, merchant workflows, user understanding, search, recommendation, and platform operations.
These efforts confirm the feasibility of large models for intelligent item understanding. However, deploying them at JD, a platform that spans virtually every retail category and manages tens of billions of items, remains challenging. To build a highly available, high-throughput item-knowledge infrastructure, the following three fundamental challenges must be addressed:
- Evolving the ontology to keep pace with heterogeneous sources and fast-emerging concepts. Item knowledge is multi-source and heterogeneous, scattered across product information (titles, main images, detail pages, etc.), user queries, and public web content. New market segments and concepts emerge constantly, and the required granularity of detail continues to increase. Capturing this knowledge comprehensively and in a timely manner, while expanding the ontology backbone quickly enough to keep pace, is the first challenge for industrial deployment.
- Scalable, high-throughput, low-cost, and high-quality knowledge production at massive item scale. At the scale of tens of billions of items, the AI Item Library must satisfy several stringent requirements at once: it must scale seamlessly as the ontology evolves, sustain high-throughput production across the full catalog, and keep inference cost and latency within tight budgets, all while keeping knowledge quality consistently high. Directly invoking large models to produce and update item knowledge would incur prohibitive inference cost and unacceptable latency (Dao et al., [2022](https://arxiv.org/html/2606.28070#bib.bib51); Kwon et al., [2023b](https://arxiv.org/html/2606.28070#bib.bib52)), and still fall short on quality. Building an industrial-scale AI Item Library therefore demands a solution that is scalable and extensible by design.
- Efficient support for common and scenario-specific needs. JD’s downstream ecosystem places highly diverse demands on the format and freshness of item knowledge. These scenarios rest on a shared knowledge foundation, yet each carries distinct service requirements: item governance (e.g., information pre-fill and compliance checks) depends on real-time services; search and recommendation need high-throughput nearline features; and category operations require offline post-processing driven by business logic. A single platform must serve all of these highly concurrent, domain-specific demands at once, efficiently building on what the scenarios share while supporting what makes each distinct.
To bridge the gap between the potential of LLMs/VLMs and the realities of industrial deployment, we build the JD Oxygen AI Item Center (Oxygen AIIC). Oxygen AIIC constructs an item ontology with millions of entries and produces high-quality item knowledge at high throughput across tens of thousands of categories and tens of billions of SKUs. It achieves a knowledge-production precision/recall of 94.2%/82.8% with a more than $10\times$ gain in throughput efficiency on Huawei Ascend NPUs, and has accumulated hundreds of billions of clean item-knowledge assets, increasing item-information richness to $3.35\times$ its previous level. As shown in Figure [2](https://arxiv.org/html/2606.28070#S1.F2), Oxygen AIIC serves as an item-knowledge hub across the full item lifecycle. In the search scenario, Oxygen AIIC covers 80.4% of traffic and reduces item-information quality issues by 37%, thereby improving the shopping experience. For category planning, Oxygen AIIC shortens decision cycles from weeks to days compared with manual workflows. The automated fill rate of core attributes exceeds 80%, and optimization of item creatives improves click-through rate by about 9%.
This paper presents an industrial\-scale deployment of LLMs/VLMs for item\-knowledge infrastructure\. Our main contributions are threefold:
- An extensible, generalizable, and self-evolving item-understanding LLM/VLM framework. Leveraging the extensive world knowledge and strong reasoning capabilities of large models, we develop item-understanding LLMs/VLMs through incremental learning and model self-evolution, thereby improving knowledge-production quality in a continuous and controllable manner.
- Knowledge production at the scale of tens of billions of items. We rapidly enrich the ontology through human–AI collaboration, decouple it from model parameters through the $\mathsf{S}^{2}\mathsf{D}$ mechanism to accommodate continuous ontology changes and mitigate model hallucinations, and reduce computational cost so that knowledge production at the hundred-billion scale stays efficient.
- A unified item tunnel and application matrix. To address diverse requirements across business scenarios, we build an “item tunnel” as the shared data and service hub. It maintains data freshness through tiered service levels and, together with the application matrix, supports a wide range of downstream applications, forming a sustainable business ecosystem.
## 2 Architecture Overview
Oxygen AIIC adopts a modular architecture with high cohesion and low coupling: core capability modules are decoupled and can be iterated independently, improving development efficiency, maintainability, and the platform’s ability to evolve in a stable and controlled manner. As shown in Figure [3](https://arxiv.org/html/2606.28070#S2.F3), the architecture consists of five tightly coordinated modules:
Figure 3: Overall architecture of JD Oxygen AI Item Center. Oxygen AIIC integrates ontology engineering, the AI Item Library, the item-understanding LLMs/VLMs, the item tunnel, and the application matrix into a closed-loop industrial system.
##### Ontology Engineering
The ontology is the knowledge foundation of Oxygen AIIC and determines the upper bound of item-knowledge quality and application potential. Through efficient human–AI collaboration, Oxygen AIIC combines more than 20 years of JD expert knowledge with the world knowledge and reasoning capabilities of LLMs/VLMs to produce a high-quality, comprehensive, and timely ontology. Experts focus on distilling industry knowledge, while algorithms learn from it to scale ontology construction and drive continuous evolution.
##### AI Item Library
The AI Item Library maps items to the ontology and serves as the source of item knowledge for downstream applications\. Given a continuously evolving ontology and tens of billions of items, we achieve scalable, high\-throughput production by constructing a jointly optimized model\-data\-engineering pipeline\. Decoupling the ontology from the model parameters reduces hallucinations and improves generalization, while computational load reduction, cache reuse, and asynchronous pipeline parallelism ensure efficient production at scale\.
##### Item\-Understanding LLMs/VLMs
The item-understanding LLMs/VLMs support both ontology construction and AI Item Library production, serving as the foundation for continuous improvement in Oxygen AIIC’s data quality. We integrate the algorithmic capabilities required by Oxygen AIIC into a highly generalizable and scalable foundation model. Through incremental learning and model self-evolution, the system fills targeted knowledge gaps and mitigates catastrophic forgetting, enabling model capabilities to evolve in a stable and controlled manner.
##### Item Tunnel
The item tunnel is the central hub between Oxygen AIIC and business applications, providing a unified service layer. To meet downstream requirements for different levels of freshness and throughput, it supports daily-, minute-, and second-level production and distribution pipelines while preserving data consistency. It enables downstream applications to consume Oxygen AIIC capabilities efficiently.
##### Applications
The application matrix is where Oxygen AIIC delivers its value. It turns the item-knowledge assets and model capabilities exposed by the tunnel into standardized services and deploys them at scale across business formats, scenarios, and the end-to-end item lifecycle. It bridges technology and business, serving as core infrastructure for the platform’s AI-driven e-commerce transformation and sustainable growth.
In summary, the five modules of Oxygen AIIC are tightly integrated rather than operating in isolation. Together, they form an end-to-end loop spanning large-scale ontology construction, massive knowledge production, centralized asset management, tiered access, and cross-domain feedback. This loop preserves stable, high-quality item knowledge while allowing the system to evolve continuously. The following sections introduce each module in detail.
## 3 Ontology Engineering
Ontology engineering aims to build a high-quality, comprehensive, and timely item-knowledge foundation (Luo et al., [2020](https://arxiv.org/html/2606.28070#bib.bib11); [2021](https://arxiv.org/html/2606.28070#bib.bib12); Yu et al., [2024](https://arxiv.org/html/2606.28070#bib.bib65); [2023](https://arxiv.org/html/2606.28070#bib.bib66); Dong et al., [2020](https://arxiv.org/html/2606.28070#bib.bib67); Huang et al., [2025b](https://arxiv.org/html/2606.28070#bib.bib68)). To achieve this goal, it is essential to effectively combine the domain expertise accumulated by JD over more than two decades with the large-scale concept-mining capabilities of LLMs/VLMs. However, several practical challenges arise in this process:
- Continuously emerging concepts: The volume of new concepts emerging daily makes fully manual ontology construction increasingly inadequate in both timeliness and coverage.
- High semantic redundancy: Knowledge extracted from heterogeneous data sources often contains numerous synonymous or highly overlapping concepts, which can quickly inflate the ontology and reduce its consistency.
- Achieving scale and quality at once: Ontology construction must simultaneously scale to broad coverage and maintain high, controllable quality. Expert-driven construction ensures quality but cannot scale, whereas large-scale automated generation by LLMs/VLMs scales but, without sufficient oversight, introduces hallucinations and quality issues.
To address these challenges, we adopt a human–AI collaborative framework in which experts define ontology standards and perform final quality audits, while LLMs conduct large\-scale ontology discovery, expansion, and refinement under expert guidance\. This framework enables the ontology to evolve continuously while maintaining both high quality and broad coverage\.
### 3.1 Method Overview
Figure 4: Human–AI collaborative ontology engineering. Human experts establish the fundamental ontology backbone, while an automated pipeline dynamically discovers, fuses, and validates emerging concepts from multi-source heterogeneous data.
As shown in Figure [4](https://arxiv.org/html/2606.28070#S3.F4), the ontology is organized around downstream business requirements. Experts define four core element types: (i) *Category*, the item taxonomy (e.g., apparel and underwear → men’s clothing → men’s shirts); (ii) *Attribute Key*, an item feature dimension (e.g., sleeve length for men’s shirts); (iii) *Attribute Value*, a specific instantiation of an attribute (e.g., long sleeve for sleeve length); and (iv) *Scenario Tag*, a higher-level composite concept that captures a consumption context (e.g., World Cup Watch Party Bundle). Categories, attribute keys, and attribute values constitute the backbone of the ontology, carrying atomic item knowledge that supports the platform’s core business operations. Scenario tags provide an additional semantic layer by aggregating atomic knowledge into higher-level concepts, capturing multi-dimensional semantic relationships and enabling rapid support for downstream scenario-based demands.
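The four element types can be pictured as a small data model. The following is a minimal sketch only: the class names, fields, and the `child` helper are assumptions made for illustration and are not taken from the paper or from any JD system.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeKey:
    """An item feature dimension, e.g. 'sleeve length' (hypothetical schema)."""
    name: str
    definition: str = ""  # business definition of the key; empty in this toy example
    values: set[str] = field(default_factory=set)  # attribute values under this key


@dataclass
class Category:
    """A node in the item taxonomy (hypothetical schema)."""
    name: str
    children: dict[str, "Category"] = field(default_factory=dict)
    attributes: dict[str, AttributeKey] = field(default_factory=dict)

    def child(self, name: str) -> "Category":
        # Return the existing child or create it, so a path can be built step by step.
        return self.children.setdefault(name, Category(name))


@dataclass
class ScenarioTag:
    """A composite concept that aggregates atomic (category, key, value) knowledge."""
    name: str
    members: list[tuple[str, str, str]] = field(default_factory=list)


# Build the taxonomy path used as the example in the text.
root = Category("root")
shirts = root.child("apparel and underwear").child("men's clothing").child("men's shirts")
shirts.attributes["sleeve length"] = AttributeKey("sleeve length", values={"long sleeve"})

# A scenario tag; its members are left empty because the text does not list them.
tag = ScenarioTag("World Cup Watch Party Bundle")
```

Keeping scenario tags as references to backbone triples, rather than as free text, is one way to reflect the statement that they aggregate atomic knowledge into higher-level concepts.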
Our human–AI collaborative framework operates in an expert\-guided, AI\-driven loop\. Initially, experts establish the ontology backbone, which supplies structural prior knowledge and serves as a semantic anchor\. Based on this foundation, the automated pipeline implements a three\-stage workflow: knowledge discovery, fusion, and validation\. This pipeline continuously harvests emerging high\-frequency concepts from heterogeneous data sources and integrates them into the ontology, ensuring scalable and dynamic infrastructure evolution\.
### 3.2 Ontology Construction
#### 3.2.1 Expert-defined ontology backbone (top-down)
The category knowledge accumulated by JD’s domain experts provides a strong foundation for identifying the item attributes that influence purchase decisions and capture business\-relevant distinctions\. To transform this expertise into structured priors that can guide algorithmic discovery, we establish a standardized expert workflow that defines the backbone of the ontology\. Specifically, experts curate the major category hierarchy together with its core attribute sets, representative attribute values, and characteristic scenario tags\.
At this stage, experts focus only on high-value and representative knowledge rather than pursuing exhaustive coverage. This top-down design effectively constrains the space of knowledge generation and establishes clear semantic boundaries for the algorithms’ automated, large-scale knowledge mining (Lippolis et al., [2024](https://arxiv.org/html/2606.28070#bib.bib69); Saeedizade and Blomqvist, [2024](https://arxiv.org/html/2606.28070#bib.bib70); Lippolis et al., [2025](https://arxiv.org/html/2606.28070#bib.bib71); Babaei Giglou et al., [2023](https://arxiv.org/html/2606.28070#bib.bib72); Mateiu and Groza, [2023](https://arxiv.org/html/2606.28070#bib.bib73); Fathallah et al., [2024a](https://arxiv.org/html/2606.28070#bib.bib74); [b](https://arxiv.org/html/2606.28070#bib.bib75); Sun et al., [2025](https://arxiv.org/html/2606.28070#bib.bib76)).
#### 3.2.2 Algorithm-driven ontology growth (bottom-up)
Building upon the expert-defined ontology backbone and continuously incorporating signals from user behavior and industry trends, the algorithms expand the ontology at scale through a bottom-up “discovery–fusion–validation” pipeline. Taking attribute-value expansion as an example, the discovery stage identifies emerging concepts from heterogeneous data sources; the fusion stage consolidates synonymous and semantically related concepts into normalized candidates; and the validation stage evaluates each candidate for both quality and business importance. Guided by continuous expert feedback, validated concepts are incorporated into the ontology in a controlled and scalable manner, enabling its sustained evolution while preserving high quality (Edge et al., [2024](https://arxiv.org/html/2606.28070#bib.bib77); Tiwari et al., [2025](https://arxiv.org/html/2606.28070#bib.bib78); Zhang and Soh, [2024](https://arxiv.org/html/2606.28070#bib.bib79); Ye et al., [2023](https://arxiv.org/html/2606.28070#bib.bib80); Bai et al., [2025](https://arxiv.org/html/2606.28070#bib.bib81)).
##### \(1\) Knowledge discovery\.
Knowledge discovery aims to identify latent concepts from large\-scale heterogeneous data\. Although LLMs/VLMs can assist with this task, general\-purpose models are not explicitly aligned with the e\-commerce ontology and often fail to capture domain\-specific concepts, industry standards, and emerging terminology\. To address this limitation, we train a dedicated knowledge\-discovery model for the e\-commerce domain\.
Training data construction. To support latent-concept discovery, we organize the training data into a unified triple format $\langle x,k,v\rangle$, where $x$ denotes product information, user queries, or external web content; $k$ denotes an attribute key; and $v$ denotes the corresponding attribute value. We construct the training data using state-of-the-art LLMs/VLMs with hundreds of billions of parameters (strong reasoning and generalization capabilities), via two strategies: open information extraction (OpenIE) (Wang et al., [2023](https://arxiv.org/html/2606.28070#bib.bib82); Gui et al., [2023](https://arxiv.org/html/2606.28070#bib.bib83); Han et al., [2023](https://arxiv.org/html/2606.28070#bib.bib84)) and targeted attribute filling (Lu et al., [2022](https://arxiv.org/html/2606.28070#bib.bib85)).
(i) OpenIE: This method is suitable for mining large-scale corpora. The model discovers potential attribute keys $k_i$ and their corresponding attribute value sets $V_{k_i}$ from $x$:
$$f_{\text{OpenIE}}(x)\rightarrow\{(k_1,V_{k_1}),(k_2,V_{k_2}),\dots,(k_n,V_{k_n})\}.$$
For each extracted pair $(k_i,V_{k_i})$, we combine it with the original input $x$ to form triple samples:
$$\mathcal{T}_{\text{OpenIE}}=\{\langle x,k_i,v_{ij}\rangle\mid v_{ij}\in V_{k_i}\}.$$
For example, given the item title “summer ice-silk sun-protective cardigan”, the model extracts {“material”: “ice silk”, “applicable season”: “summer”, “function”: “sun protection”}.
(ii) Targeted attribute filling: This method is implemented as an attribute-specific NER task and is suitable for completing attribute values for expert-defined or core attributes. Given an attribute $k$ and its business definition $d_k$, the model is required to extract $V_k$ for that attribute from $x$:
$$f_{\text{NER}}(x,k,d_k)\rightarrow\{k:V_k\}.$$
Each $v_j$ in $V_k$ is similarly converted into a triple sample:
$$\mathcal{T}_{\text{NER}}=\{\langle x,k,v_j\rangle\mid v_j\in V_k\}.$$
For example, given the input “men’s business commuting non-iron three-quarter-sleeve white shirt” and the key “sleeve length” with the description “including long sleeve, short sleeve, etc.”, the model extracts sleeve length: [“three-quarter-sleeve”].
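The conversion from extraction output to triples can be sketched in a few lines. The function names `openie_to_triples` and `ner_to_triples` are hypothetical, and the extraction outputs are hard-coded from the two examples in the text rather than produced by a model.

```python
def openie_to_triples(x: str, extracted: dict[str, list[str]]) -> list[tuple[str, str, str]]:
    """Flatten {k_i: V_{k_i}} into triples <x, k_i, v_ij>."""
    return [(x, k, v) for k, values in extracted.items() for v in values]


def ner_to_triples(x: str, k: str, values: list[str]) -> list[tuple[str, str, str]]:
    """Convert the values V_k extracted for a single key k into triples <x, k, v_j>."""
    return [(x, k, v) for v in values]


# OpenIE example from the text (model output hard-coded here).
title = "summer ice-silk sun-protective cardigan"
openie_output = {
    "material": ["ice silk"],
    "applicable season": ["summer"],
    "function": ["sun protection"],
}
t_openie = openie_to_triples(title, openie_output)

# Targeted attribute filling example from the text (model output hard-coded here).
shirt = "men's business commuting non-iron three-quarter-sleeve white shirt"
t_ner = ner_to_triples(shirt, "sleeve length", ["three-quarter-sleeve"])
```

Both strategies end in the same triple format, which is what allows a single discovery model to be trained on their union.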
Discovery model training. We adopt a unified supervised fine-tuning (SFT) framework and formulate the two tasks above as follows:
$$f_{\text{ext\_kv}}(x,I_{\text{OpenIE}})\rightarrow\{k_i:[v_{i1},v_{i2},\dots]\},$$
$$f_{\text{ext\_v}}(x,k,d_k,I_{\text{NER}})\rightarrow\{v_1,v_2,\dots,v_m\}.$$
Both $I_{\text{OpenIE}}$ and $I_{\text{NER}}$ denote task instructions that explicitly constrain the model’s extraction objective. After training, the model achieves 91% precision and 79% recall on the candidate knowledge extraction task.
Unless otherwise stated, all experiments use an 8B\-parameter base model\. We observe that after SFT, different base models exhibit comparable performance, with variations of less than 2% across evaluation metrics\. Therefore, the remainder of the paper focuses on the framework rather than base\-model selection\.
Applying the trained discovery model to multi-source corpora, including item information, user queries, and external web content, yields approximately 4.5 million latent attribute values. Each value is represented as a standardized knowledge unit $\langle c,x,k,d_k,v\rangle$, where $c$ denotes the category.
##### \(2\) Knowledge fusion\.
Knowledge discovery improves ontology coverage but also introduces a large number of heterogeneous synonymous expressions, resulting in redundancy\. To address this issue, we adopt a three\-stage knowledge fusion strategy consisting of representation, clustering, and selection\.
- Representation stage: latent concepts are encoded into vector representations.
- Clustering stage: semantically identical entities are grouped together.
- Selection stage: a standard ontology concept is extracted from each cluster.
Because general\-purpose representation models lack sufficient understanding of e\-commerce semantics, we train a domain\-adapted encoder\. The encoder captures contextual semantics in e\-commerce scenarios, providing high\-fidelity and effective representations of latent concepts\. This section consists of three parts: training data construction, representation model training, and ontology fusion\.
Training data construction. Given the input $\langle c,x,k,d_k,v\rangle$ above, the instruction for the encoder is defined as $I_{\text{fuse}}(c,k,d_k)$. This instruction explicitly injects category $c$, attribute key $k$, and attribute description $d_k$ into the encoder as context, so that attribute value $v$ is encoded as:
$$f_{\text{enc}}\bigl(v,\,I_{\text{fuse}}(c,k,d_k)\bigr)\rightarrow\mathbf{e}$$
Training sample construction focuses on two questions: which ontology entries should be pulled closer as positives, and which should be pushed apart as negatives. For an attribute value $v$, the LLM generates $N$ synonymous rewrites as the positive set $V^{+}=\{v^{+}_{1},\dots,v^{+}_{|V^{+}|}\}$. For the current $\langle c,k\rangle$, a general encoder retrieves expressions similar to $v$, and an LLM then judges synonymy among the retrieved results. Synonymous pairs are removed, while the near-neighbor but non-synonymous expressions are retained as $V^{-}=\{v^{-}_{1},\dots,v^{-}_{|V^{-}|}\}$.
Each training sample consists of an anchor $v$, a positive set $V^{+}$, and a negative set $V^{-}$. All samples share the same category–attribute context $\langle c,k,d_k\rangle$, forcing the encoder to learn fine-grained semantic distinctions among attribute values. To improve representation quality for infrequent concepts, long-tail attribute values are additionally oversampled during training.
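The sample-construction procedure above can be sketched as follows. The three callables (`rewrite`, `retrieve`, `is_synonym`) stand in for the LLM rewriter, the general-encoder retrieval step, and the LLM synonymy judge; they are hypothetical interfaces, and the toy stubs below return hand-written values purely for illustration.

```python
from typing import Callable


def build_sample(
    v: str,
    rewrite: Callable[[str], list[str]],
    retrieve: Callable[[str], list[str]],
    is_synonym: Callable[[str, str], bool],
) -> dict:
    """Assemble one (anchor, positives, negatives) sample for a fixed <c, k> context."""
    positives = rewrite(v)  # synonymous rewrites of the anchor form V+
    # Near neighbors that the judge does NOT consider synonyms become hard negatives V-.
    negatives = [u for u in retrieve(v) if u != v and not is_synonym(v, u)]
    return {"anchor": v, "positives": positives, "negatives": negatives}


# Toy stand-ins (assumptions for illustration, not real model calls).
toy_synonyms = {("long sleeve", "long-sleeved"), ("long sleeve", "full sleeve")}


def toy_rewrite(v: str) -> list[str]:
    return ["long-sleeved", "full sleeve"]


def toy_retrieve(v: str) -> list[str]:
    return ["long-sleeved", "three-quarter sleeve", "short sleeve"]


def toy_is_synonym(a: str, b: str) -> bool:
    return (a, b) in toy_synonyms or (b, a) in toy_synonyms


sample = build_sample("long sleeve", toy_rewrite, toy_retrieve, toy_is_synonym)
```

Filtering the retrieved neighbors through a synonymy judge keeps the negatives close to the anchor in embedding space while avoiding false negatives, which is the property the text relies on to teach fine-grained distinctions.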
Representation model training. The representation model adopts a bi-encoder architecture with shared parameters across the two towers and an 8B-parameter LLM as the backbone. Given an input sequence, we append a special $\langle\mathrm{eos}\rangle$ token to its end and use the final-layer hidden state at this position as the sequence representation. The resulting embedding is $L_2$-normalized, yielding the final sentence vector $\mathbf{e}$. The training objective is the InfoNCE loss (Oord et al., [2018](https://arxiv.org/html/2606.28070#bib.bib86)):
$$\mathcal{L}_{\text{InfoNCE}}=-\frac{1}{|V^{+}|}\sum_{i=1}^{|V^{+}|}\log\frac{\exp\bigl(\mathrm{sim}(\mathbf{e},\,\mathbf{e}^{+}_{i})/\tau\bigr)}{\exp\bigl(\mathrm{sim}(\mathbf{e},\,\mathbf{e}^{+}_{i})/\tau\bigr)+\sum_{j=1}^{|V^{-}|}\exp\bigl(\mathrm{sim}(\mathbf{e},\,\mathbf{e}^{-}_{j})/\tau\bigr)}.$$
Here, $\mathbf{e}$, $\mathbf{e}^{+}_{i}$, and $\mathbf{e}^{-}_{j}$ are the vectors obtained by encoding the anchor $v$, the $i$-th positive sample $v^{+}_{i}$, and the $j$-th negative sample $v^{-}_{j}$ through $f_{\text{enc}}$, respectively; $\mathrm{sim}(\cdot,\cdot)$ denotes cosine similarity, and $\tau$ denotes the temperature coefficient. The loss is averaged over all $|V^{+}|$ positive samples.
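As a numerical illustration of the loss above, the sketch below evaluates it for a single anchor using only the standard library. The temperature value `tau=0.05` and the toy vectors are assumptions; the paper does not report the temperature, and a real implementation would compute this in batch on the encoder outputs.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity sim(a, b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(anchor, positives, negatives, tau: float = 0.05) -> float:
    """InfoNCE for one anchor, averaged over its positives (tau is an assumed value)."""
    # The negative term is shared by every positive in the sum.
    neg_sum = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    total = 0.0
    for p in positives:
        pos = math.exp(cosine(anchor, p) / tau)
        total += -math.log(pos / (pos + neg_sum))
    return total / len(positives)


# Toy 2-D embeddings: the positive is aligned with the anchor, the negative is orthogonal.
e = [1.0, 0.0]
e_pos = [1.0, 0.0]
e_neg = [0.0, 1.0]

good = info_nce(e, [e_pos], [e_neg])  # well-separated case: loss close to zero
bad = info_nce(e, [e_neg], [e_pos])   # roles swapped: loss is large
```

The loss shrinks as positives move toward the anchor and negatives move away, which matches the pull-closer and push-apart roles described for $V^{+}$ and $V^{-}$.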
On the ontology\-similarity test set, our encoder raises the Spearman correlation coefficient from 0\.62 \(general\-purpose encoder\) to 0\.86\.
Ontology fusion. Based on the representation model above, all candidate ontology units $\langle c,k,v\rangle$ are processed as follows:
- •Representation: each attribute value is encoded into a vector within its corresponding category–attribute context $\langle c,k\rangle$.
- •Clustering: vectors belonging to the same $\langle c,k\rangle$ subspace are grouped using hierarchical clustering (Murtagh and Legendre, [2014](https://arxiv.org/html/2606.28070#bib.bib87)) to identify semantically equivalent concepts.
- •Selection: within each cluster, candidate terms are ranked based on their occurrence frequency and LLM-driven semantic evaluation. The highest-ranked term, $v^{*}$, is designated as the canonical ontology entry, while the remaining terms form its synonym set, $V_{\text{syn}}$.
The fusion stage produces standardized ontology candidates of the form $\langle c,x,k,d_{k},v^{*},V_{\text{syn}}\rangle$, which are subsequently passed to the validation stage. Applying the fusion pipeline reduces the number of discovered concepts from 4.5 million to 2.1 million candidate ontology concepts, consolidating approximately 2.4 million redundant concepts into their corresponding synonym sets $V_{\text{syn}}$.
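The clustering and selection steps can be sketched as follows for one $\langle c,k\rangle$ subspace. This is a simplified stand-in, not the production method: it uses single-link grouping cut at a cosine threshold (one simple form of hierarchical clustering), ranks terms by frequency only, and omits the LLM-driven semantic evaluation. The function name `fuse`, the `threshold` value, and the example inputs are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two raw (not necessarily normalized) vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def fuse(values, embeddings, freq, threshold=0.9):
    """Group near-duplicate values, then pick a canonical term per group.

    Returns a list of (canonical_value, synonym_list) pairs.
    """
    parent = list(range(len(values)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Single-link merge: any pair above the threshold joins the same cluster.
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i, value in enumerate(values):
        clusters.setdefault(find(i), []).append(value)

    fused = []
    for members in clusters.values():
        canonical = max(members, key=lambda v: freq[v])  # frequency-only ranking
        fused.append((canonical, [v for v in members if v != canonical]))
    return fused
```

The quadratic pairwise loop is acceptable here only because clustering is restricted to a single $\langle c,k\rangle$ subspace; a production system would presumably rely on an optimized clustering library.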
##### \(3\) Knowledge validation\.
Knowledge validation aims to ensure the quality of ontology candidates generated through large\-scale discovery and fusion\. While task\-specific models achieve strong performance within their training distribution, they often suffer from limited generalization when confronted with emerging concepts and distribution shifts\(Huanget al\.,[2025a](https://arxiv.org/html/2606.28070#bib.bib88); Wataokaet al\.,[2024](https://arxiv.org/html/2606.28070#bib.bib89)\)\. In contrast, general LLMs exhibit stronger open\-domain reasoning capabilities and broader semantic coverage, enabling more reliable assessment of candidate ontology concepts\.
To combine these complementary strengths, we propose a Multi\-LLM Collaborative Verification Framework that integrates multiple LLM validators with expert knowledge\. The framework evaluates candidate ontology concepts through a sequence of quality and business\-importance assessments, ensuring that only high\-quality, important concepts are incorporated into the ontology\.
Knowledge quality validation. To strictly control ontology quality, the validation process follows a sequential procedure: a candidate ontology entry must pass plausibility and duplication checks in order. If any step returns “Reject”, the process stops and the candidate is blocked.
- •Plausibility validation: the system checks whether the candidate makes common sense and satisfies category–attribute constraints. For example, “spicy-flavored computer” violates common sense and is rejected outright; “operating system is Android” under “laptop computer” violates category constraints and is also rejected.
- •Duplication validation: the system incrementally compares the candidate with existing ontology entries to prevent duplicate insertion. For example, if “CPU model” already exists, “central processing unit model” is rejected.
To reduce variation and one-off misjudgments from any single model, each validation stage introduces a multi-model majority-voting mechanism (Schoenegger et al., [2024](https://arxiv.org/html/2606.28070#bib.bib90)): $M$ general LLMs are called simultaneously, and each model independently outputs $y_{i}\in\{\text{Pass},\text{Reject},\text{Unclear}\}$. The final decision follows majority voting:
$$y=\begin{cases}\text{Pass},&\text{if }\sum_{i=1}^{M}\mathbb{I}(y_{i}=\text{Pass})>\dfrac{M}{2},\\[4pt] \text{Reject},&\text{otherwise.}\end{cases}$$
where $\mathbb{I}(\cdot)$ is the indicator function and $y_{i}$ is the judgment of the $i$-th model. Only when a majority of models output “Pass” does the candidate enter the next stage; otherwise, it is immediately blocked and sent to an exception or manual-review pool.
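The voting rule and the sequential stage order can be written down directly. In the sketch below, the validators are hypothetical callables standing in for LLM calls; `majority_vote` and `run_validation` are illustrative names, not part of the described system.

```python
def majority_vote(votes):
    """Return "Pass" only if strictly more than half of the votes are "Pass".

    "Unclear" and "Reject" both count against the candidate.
    """
    passes = sum(1 for v in votes if v == "Pass")
    return "Pass" if passes > len(votes) / 2 else "Reject"

def run_validation(candidate, stages):
    """Run stages in order and stop at the first rejected stage.

    stages is a list of (stage_name, validators); each validator is a
    callable returning "Pass", "Reject", or "Unclear" for the candidate.
    Returns (decision, name_of_blocking_stage_or_None).
    """
    for name, validators in stages:
        votes = [validate(candidate) for validate in validators]
        if majority_vote(votes) == "Reject":
            return "Reject", name
    return "Pass", None
```

Note that with an even number of models a tie is a rejection, since the rule requires strictly more than $M/2$ passing votes.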
Knowledge importance assessment. After passing quality validation, candidate knowledge is further assessed for its business importance and future growth potential to determine its insertion priority and downstream processing strategy. Inspired by the Boston Matrix, we characterize each candidate along two dimensions: *scale* and *trend*.
- •Scale: measures the current prevalence of a candidate concept within the e-commerce ecosystem, including item coverage and query frequency. Concepts with high scale primarily help broaden ontology coverage.
- •Trend: measures the future growth potential of a candidate concept, including short-term frequency growth and changes in search popularity. Concepts with high trend help the ontology track emerging hot topics.
Using these two dimensions, the system classifies candidate knowledge into four types: high trend $\times$ high scale (star concepts), high trend $\times$ low scale (emerging-trend concepts), low trend $\times$ high scale (stable general concepts), and low trend $\times$ low scale (low-value concepts). By default, low-value concepts are not inserted automatically.
Expert validation and insertion. After completing the importance assessment, the system performs tiered ontology insertion based on the evaluation results. Candidate knowledge that receives a unanimous “Pass” during the quality validation stage is automatically inserted into the ontology. Candidate knowledge with “Unclear” judgments or substantial disagreement across models is forwarded to domain experts for final confirmation and boundary determination.
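A minimal sketch of the importance quadrant and the tiered insertion decision is given below. Several details are assumptions made only for illustration: scale and trend are taken to be scores in $[0,1]$ with a cut at 0.5, any non-unanimous vote is treated as "substantial disagreement", and low-value concepts are held back even when the vote is unanimous. None of these thresholds or precedence rules is specified in the text.

```python
def quadrant(scale, trend, scale_cut=0.5, trend_cut=0.5):
    """Map (scale, trend) scores to one of the four concept types.

    The cut-off values are hypothetical.
    """
    high_scale = scale >= scale_cut
    high_trend = trend >= trend_cut
    if high_trend and high_scale:
        return "star"
    if high_trend:
        return "emerging-trend"
    if high_scale:
        return "stable general"
    return "low-value"

def route(votes, scale, trend):
    """Decide the next step for a candidate that passed majority voting."""
    if quadrant(scale, trend) == "low-value":
        return "hold"            # not inserted automatically by default
    if all(v == "Pass" for v in votes):
        return "auto-insert"     # unanimous Pass
    return "expert-review"       # Unclear votes or disagreement
```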
In summary, the knowledge validation module focuses not only on whether candidate knowledge is correct, but also on whether it is important\. By keeping the ontology accurate, comprehensive, and responsive to emerging hot topics at once, the framework supports continuous and controllable ontology evolution\.
### 3.3 Results
Under the human–AI collaborative framework, the system has built a million-scale, high-quality ontology encompassing attributes, product terms, brands, and other entities. Compared with the previous generation, the average number of characterization dimensions per item has increased to 1.44$\times$ the previous level, and the ontology scale has expanded by 64.5%, significantly enhancing product information richness. The resulting ontology covers 80.4% of JD’s user traffic.
## 4 AI Item Library
The ontology provides the system with a standardized knowledge foundation, while the core mission of the AI Item Library is to achieve a scalable and extensible semantic mapping from large\-scale unstructured item information to the ontology\. Essentially, it identifies ontology elements from items with high precision, thereby building a stable item\-to\-ontology association chain\.
The data carried by this association chain exhibits typical industrial\-scale characteristics, covering tens of billions of SKUs, tens of thousands of categories, and millions of dynamically evolving ontology entries\. It must handle a daily stream of hundreds of millions of item\-information changes while supporting JD’s core application scenarios with high freshness\. This leads to two core technical challenges:
- •Scalability under dynamic ontology. In conventional end-to-end models, ontology knowledge is tightly coupled with model parameters, making adaptation to ontology updates costly and often leading to degraded out-of-distribution (OOD) performance. Alternatively, adapting to ontology evolution through frequent fine-tuning or retraining incurs prohibitive computational and time costs.
- •Throughput bottleneck under massive data. Exhaustively evaluating all attributes for tens of billions of SKUs is highly inefficient. Large groups of homogeneous SKU variants incur redundant processing, while most computations on sparse long-tail attributes yield no useful signals.
### 4.1 Method Overview
Existing methods for item knowledge recognition (Chen et al., [2023](https://arxiv.org/html/2606.28070#bib.bib92); Shinzato et al., [2022](https://arxiv.org/html/2606.28070#bib.bib93); Yang et al., [2022](https://arxiv.org/html/2606.28070#bib.bib20); Yan et al., [2021](https://arxiv.org/html/2606.28070#bib.bib94)) still face significant structural limitations in real-world industrial scenarios. Extraction-and-mapping methods first extract non-standardized values using entity recognition (Zheng et al., [2018](https://arxiv.org/html/2606.28070#bib.bib8); Xu et al., [2019](https://arxiv.org/html/2606.28070#bib.bib19); Yan et al., [2021](https://arxiv.org/html/2606.28070#bib.bib94)) or question-answering models (Wang et al., [2020b](https://arxiv.org/html/2606.28070#bib.bib95); Yang et al., [2023](https://arxiv.org/html/2606.28070#bib.bib96)), and then map them to standard values. Nevertheless, the ontology knowledge is often implicitly encoded in model parameters, making such methods difficult to adapt to ontology updates and leading to degraded generalization. Classification-based methods (Chen et al., [2022](https://arxiv.org/html/2606.28070#bib.bib97)) formulate attribute value recognition as a multi-label classification problem over a closed label space. Consequently, they fail to accommodate newly added open-domain attributes, and each ontology update may require model retraining, making them poorly suited to the dynamically evolving attribute systems of e-commerce platforms. Generative models (Sabeh et al., [2024](https://arxiv.org/html/2606.28070#bib.bib98); Nikolakopoulos et al., [2023](https://arxiv.org/html/2606.28070#bib.bib99); Shinzato et al., [2023](https://arxiv.org/html/2606.28070#bib.bib100)) exhibit stronger generalization capabilities and can partially mitigate OOD issues, but their outputs are difficult to constrain and are susceptible to model hallucination during direct attribute value generation. Finally, similarity-based retrieval methods (Su et al., [2025](https://arxiv.org/html/2606.28070#bib.bib101)) select candidate values via semantic vector matching, but their effectiveness is highly sensitive to threshold settings, resulting in diminished robustness in production environments.
Therefore, scalability and high throughput must be treated as joint design objectives to enable effective implementation and long\-term stability in industrial\-scale business scenarios\. In light of these considerations, we propose a collaborative optimization system spanning models, data, and engineering\.
Figure 5: Production architecture of the AI Item Library. Taking item data and a dynamically evolving ontology as input, the pipeline first mitigates computational redundancy across the SKU and attribute dimensions, and then performs precise item-to-ontology recognition through a two-stage “Semantic Search then Discrimination” ($\mathsf{S}^{2}\mathsf{D}$) engine, powered by the item understanding LLMs/VLMs.
As shown in Figure [5](https://arxiv.org/html/2606.28070#S4.F5), we propose an industrial-scale “Semantic Search then Discrimination” ($\mathsf{S}^{2}\mathsf{D}$) architecture (Zou et al., [2025](https://arxiv.org/html/2606.28070#bib.bib102)) that decouples the ontology from model parameters. In the semantic search stage, the dynamically evolving ontology is externalized as a separate ontology knowledge base, enabling continuous ontology updates without model retraining. Semantic encoders retrieve ontology entries relevant to the given item. In the discrimination stage, the model only determines whether the item matches the retrieved ontology entries. This formulation substantially reduces task complexity, mitigates model hallucination, and enhances generalization to ontology evolution.
To further improve efficiency, we apply computational load reduction across both the SKU and attribute dimensions. In the SKU dimension, similarity-based deduplication eliminates redundant computation over homogeneous item variants. In the attribute dimension, the conventional “full-attribute scanning” paradigm is reformulated as “highly relevant attribute probing”, where the $\mathsf{S}^{2}\mathsf{D}$ process is initiated only for attributes deemed relevant. In addition, cache reuse and asynchronous pipeline parallelism are employed to minimize redundant computation, optimize NPU utilization, and improve overall system throughput.
### 4.2 Item Knowledge Recognition
The $\mathsf{S}^{2}\mathsf{D}$ architecture formulates item knowledge recognition as a two-stage process. The detailed implementation is described below.
#### 4.2.1 Semantic Search Stage
The primary objective of this stage is to model the matching relationships between items and ontology entries. Given item information, the system retrieves the Top-$K$ most relevant ontology entries from the dynamically evolving ontology as candidates. As in Section [3.2.2](https://arxiv.org/html/2606.28070#S3.SS2.SSS2), this stage injects domain knowledge into the representation model, aligning items and ontology entries within a unified semantic space.
##### \(1\) Representation model training\.
Data construction. In contrast to Section [3.2.2](https://arxiv.org/html/2606.28070#S3.SS2.SSS2), this stage adjusts both the encoding inputs and the positive/negative sample construction method.
For item encoding, the category $c$ serves as contextual information and is injected through the instruction template $I_{\text{sku}}(c)$. Given an item $x$, its representation is derived as:
$$f_{\text{enc}}\bigl(x,\,I_{\text{sku}}(c)\bigr)\rightarrow\mathbf{e}_{\text{sku}}.$$
For ontology encoding, the category $c$, attribute key $k$, and attribute description $d_{k}$ serve as contextual information and are injected through the instruction template $I_{\text{val}}(c,k,d_{k})$. Given an attribute value $v$, its representation is derived as:
$$f_{\text{enc}}\bigl(v,\,I_{\text{val}}(c,k,d_{k})\bigr)\rightarrow\mathbf{e}_{\text{val}}.$$
Both input types share the same encoder parameters $f_{\text{enc}}$ and are differentiated solely by their respective instruction templates.
Positive and negative sample construction. Positive samples $V_{k}^{+}$ are derived from high-confidence results generated during knowledge discovery. Negative samples $V_{k}^{-}$ are drawn from other attribute values under the same category $c$ and attribute key $k$. These candidate values are further screened by a large language model, and only those that are not semantically grounded in the item $x$ are designated as negative samples. This strategy prioritizes standard values within the same category and attribute, which are the hardest to distinguish and therefore provide highly informative contrastive signals.
Model training. The model architecture, loss function, and training strategies are consistent with those described in Section [3.2.2](https://arxiv.org/html/2606.28070#S3.SS2.SSS2). Furthermore, we perform multi-task joint training to consolidate interrelated capabilities into a unified model, using instruction templates $I_{\text{fuse}}$, $I_{\text{sku}}$, and $I_{\text{val}}$ to distinguish the different tasks.
This design aligns items and ontology entries within a unified semantic space\. Moreover, because the large\-scale representation model generalizes robustly, newly added ontology entries can be directly encoded by the same encoder and integrated into retrieval without model retraining\.
##### \(2\) Knowledge retrieval\.
Retrieval with the representation model. For offline retrieval, the system maintains two vector indexes generated by the same encoder. The first is an attribute-value index, which encodes attribute values under each category and attribute key. The second is an item index, which encodes comprehensive item information. Both indexes store vector representations along with their corresponding metadata and support incremental updates. Consequently, newly added ontology entries can be rapidly integrated without triggering model retraining.
During semantic search, for each target category and attribute key, the system calculates the cosine similarity between the item representation and the attribute-value representations. Then, it retrieves the Top-$K$ candidate attribute values:
$$C_{k}=\{v_{1},v_{2},\dots,v_{K}\},$$
where $K=10$ is empirically chosen to achieve strong recall while keeping downstream discrimination cost low. Thus, the initial candidate space is reduced to a compact set of Top-$K$ candidates, significantly alleviating the computational burden of the subsequent discrimination stage.
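The retrieval step for one category and attribute key amounts to a cosine Top-$K$ lookup. The sketch below is a brute-force illustration over an in-memory dictionary; the function name `top_k` and the dictionary-based index are assumptions, whereas the production system uses dedicated vector indexes. Vectors are assumed to be L2-normalized, so the dot product equals cosine similarity.

```python
import heapq

def top_k(item_vec, value_index, k=10):
    """Return the k attribute values most similar to the item embedding.

    value_index maps each attribute value (under one category and one
    attribute key) to its L2-normalized embedding.
    """
    scored = (
        (sum(a * b for a, b in zip(item_vec, vec)), value)
        for value, vec in value_index.items()
    )
    # nlargest sorts by score first; ties fall back to the value string.
    return [value for _, value in heapq.nlargest(k, scored)]
```

Because new ontology entries only add rows to `value_index`, they become retrievable as soon as they are encoded, which is the decoupling property the stage relies on.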
#### 4.2.2 Discrimination Stage
Using the Top-$K$ candidate set from the semantic search stage, the discrimination stage performs fine-grained matching between the item and its retrieved candidates. Although general LLMs/VLMs contain hundreds of billions of parameters, their lack of domain-specific knowledge prevents them from achieving high precision, typically keeping it below 80%. Such models often cannot meet throughput and recognition-accuracy requirements at the same time, motivating the need for a specialized discrimination model. This section comprises two primary phases: discrimination model training and knowledge recognition.
##### \(1\) Discrimination model training\.
Training data construction. Based on the extraction training data constructed in Section [3.2.2](https://arxiv.org/html/2606.28070#S3.SS2.SSS2), given item information $x$ and a target attribute key $k$, the input to the discrimination model is formalized as $\langle x,k,C_{k}\rangle$, where $C_{k}$ represents the candidate attribute value set under $k$. The model is tasked with identifying which values in $C_{k}$ are semantically consistent with the item and outputting the corresponding matching subset $C_{k}^{*}\subseteq C_{k}$.
Training data construction consists of two steps: constructing the candidate set $C_{k}$ and deriving the supervised label set $C_{k}^{*}$. The candidate set comprises potential positive samples and same-attribute negative samples:
$$C_{k}=V_{k}^{+}\cup V_{k}^{-},$$
where $V_{k}^{+}$ denotes the set of potential positives and $V_{k}^{-}$ denotes the set of negatives. The construction of $V_{k}^{+}$ and $V_{k}^{-}$ aligns with the strategy employed for representation model training within the semantic search stage, as described in Section [4.2.1](https://arxiv.org/html/2606.28070#S4.SS2.SSS1).
After obtaining $C_{k}$, we further generate the supervised output $C_{k}^{*}$ to guide model training. Since the candidate values are highly relevant to the target attribute key, standard off-the-shelf LLMs are prone to generating false positives or false negatives when encountering closely related values, implicit item descriptions, or ambiguous boundary samples. Therefore, we first fine-tune an industrial-scale foundation model on a small set of high-quality data to derive a reference model $M_{T}$. This reference model offers more stable attribute value recognition and stronger boundary discrimination.
To construct large-scale supervision data, we use the reference model $M_{T}$ to generate pseudo labels, distilling its discrimination capability into the training corpus:
$$M_{T}(x,k,C_{k})\rightarrow C_{k}^{*}.$$
To further improve robustness in production scenarios, we implement multi-granularity data augmentation during training, including candidate-set augmentation and sample-distribution calibration. Candidate-set augmentation randomly shuffles the order of candidate values, mitigating the model’s reliance on spurious correlations arising from candidate sequencing, attribute combinations, or template positions. Sample-distribution calibration controls the ratio of positive to negative samples, preventing the model from overfitting to distributions dominated by negative candidates and thereby avoiding overly conservative rejection behavior.
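The two augmentations can be illustrated with a short sketch. It rests on an interpretation that the text does not confirm: calibration is applied inside each candidate set by capping the number of negatives per positive. The function name `augment` and the parameter `max_neg_per_pos` are hypothetical.

```python
import random

def augment(candidates, labels, rng, max_neg_per_pos=3):
    """Shuffle candidate order and cap the negative-to-positive ratio.

    candidates: the candidate set C_k as a list of values.
    labels: the supervised subset C_k* as a set of values.
    rng: a random.Random instance, passed in for reproducibility.
    Returns (augmented_candidates, ordered_labels).
    """
    positives = [c for c in candidates if c in labels]
    negatives = [c for c in candidates if c not in labels]

    # Sample-distribution calibration (assumed form): keep at most
    # max_neg_per_pos negatives per positive, and at least that many overall.
    budget = max_neg_per_pos * max(1, len(positives))
    negatives = rng.sample(negatives, min(budget, len(negatives)))

    # Candidate-set augmentation: randomize positions so the model cannot
    # rely on where a value appears in the prompt.
    mixed = positives + negatives
    rng.shuffle(mixed)
    return mixed, [c for c in mixed if c in labels]
```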
Model training. The discrimination model is trained via supervised fine-tuning. Let $I_{\text{dis}}(x,k)$ represent the discrimination instruction formulated from item information $x$ and the target attribute key $k$. Conditioned on the candidate set $C_{k}$, the model is optimized to generate the supervised label set $C_{k}^{*}$:
$$f_{\text{dis}}\bigl(C_{k},I_{\text{dis}}(x,k)\bigr)\rightarrow C_{k}^{*}.$$
Through this training paradigm, the model learns to perform set\-based filtering within a constrained candidate space\. This design minimizes the risk of model hallucination and enhances output controllability, as the model is restricted to selecting values from the ontology\-constrained candidate set instead of freely generating unconstrained attribute values\. Furthermore, the expertise of the large\-scale reference model is transferred to a more efficient 8B\-parameter model, substantially reducing inference cost in production scenarios\.
##### \(2\) Knowledge recognition\.
During online knowledge recognition, given an item $x_{i}$ and an attribute key $k$, the recall module initially returns the Top-$K$ candidate attribute value set $C_{k}$. The discrimination model then uses the formulation above to select the subset $C_{k}^{*}$ that is semantically consistent with the item. If no candidate value matches the item information, the model outputs an empty set, indicating that the item has no compatible attribute value for the specified $k$.
Finally, the predicted result is converted into structured item knowledge:
$$\mathcal{R}_{i,k}=\{\langle x_{i},k,v\rangle\mid v\in C_{k}^{*}\}.$$
Thus, the discrimination model performs precise knowledge recognition within the recalled candidate space\. The output is strictly constrained by the item ontology, which enhances system reliability and mitigates model hallucination\. Furthermore, newly added ontology entries can be readily incorporated into the online recognition workflow as soon as they are encoded into the ontology index\.
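The conversion from the selected subset to structured triples can be sketched as below. The `discriminate` argument is a hypothetical callable standing in for the fine-tuned model, and the explicit membership filter is an added safeguard assumed for illustration; the text states only that the model is restricted to the candidate set.

```python
def recognize(item_id, key, candidates, discriminate):
    """Build the triples <x_i, k, v> for every value the model selects.

    discriminate(item_id, key, candidates) returns the selected values;
    an empty result means the item has no compatible value for this key.
    """
    selected = discriminate(item_id, key, candidates)
    allowed = set(candidates)
    # Keep the output ontology-constrained: drop anything outside C_k.
    return [(item_id, key, v) for v in selected if v in allowed]
```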
#### 4.2.3 Results
To assess the efficacy of $\mathsf{S}^{2}\mathsf{D}$, we conducted random sampling of items across diverse categories to evaluate their knowledge recognition results. Ground-truth labels were established through a machine-assisted pre-labeling and human-verification procedure: the system first generated preliminary labels, which human annotators then validated against item information and item ontology definitions to produce final gold labels. The resulting samples are aggregated into the item knowledge test set, constituting the unified evaluation benchmark for subsequent model iterations.
Following this evaluation protocol, $\mathsf{S}^{2}\mathsf{D}$ achieves 92% precision and 78.3% recall. In terms of item knowledge asset gain, compared with merchant-provided data, the average number of attributes per SKU increased to 1.5$\times$ the original level, while the total volume of item-knowledge assets (SKU $\times$ attribute key $\times$ attribute value) expanded by a factor of 3.35, reaching hundreds of billions. Of this total, merchant data accounts for 30% and AI-generated data for 70%. These results demonstrate that $\mathsf{S}^{2}\mathsf{D}$ effectively enhances item knowledge coverage while maintaining high recognition accuracy.
Although the methods in Sections [3.2](https://arxiv.org/html/2606.28070#S3.SS2) and [4.2](https://arxiv.org/html/2606.28070#S4.SS2) have addressed item knowledge discovery and recognition, stable and controllable iteration remains an open challenge. In Section [5](https://arxiv.org/html/2606.28070#S5), we consolidate and extend these capabilities into the item understanding LLMs/VLMs with a controllable, continuously evolving training framework.
### 4.3 Throughput Efficiency Improvement
The $\mathsf{S}^{2}\mathsf{D}$ architecture effectively resolves the scalability challenges associated with item knowledge recognition. Nevertheless, in industrial-scale production environments, tens of billions of items still impose stringent requirements on data freshness and system throughput. Consequently, throughput optimization must be grounded in a rigorous analysis of the intrinsic characteristics of large-scale e-commerce data. We summarize three key observations:
- •Homogeneity of item features. A substantial number of redundant SKU listings exist for identical physical items (e.g., the iPhone 17 Pro Max).
- •Skewed attribute distribution. Common attributes are invoked frequently, whereas long-tail attributes, such as “cuff pleat type”, are extremely sparse.
- •Repetitive computation across SKUs. Multiple SKUs associated with the same SPU share the vast majority of core knowledge, resulting in redundant recognition computations if processed in isolation.
These observations inform our primary optimization objective: shifting the computational paradigm from “all SKUs $\times$ all attributes” to “differentiated SKUs $\times$ highly relevant attributes”. This transformation substantially reduces compute costs while preserving recognition accuracy. To this end, we enhance throughput through three complementary strategies: computational load reduction, cache reuse, and asynchronous pipeline parallelism.
#### 4.3.1 Computational Load Reduction
Computational load reduction implements this paradigm shift across two dimensions: SKU\-level load reduction and attribute\-level load reduction\.
##### \(1\) SKU\-level load reduction\.
SKU\-level load reduction condenses “all SKUs” into a smaller set of differentiated SKUs\. In large\-scale item catalogs, many SKUs are homogeneous variants characterized by highly similar item information and attributes\. To reduce computational redundancy, the system first determines through semantic retrieval whether the current item can reuse existing attribute recognition results\.
Specifically, given an item $x$, we use the encoder $f_{\text{enc}}$ trained in Section [4.2.1](https://arxiv.org/html/2606.28070#S4.SS2.SSS1) to derive its semantic representation. We then construct a category-stratified vector retrieval index. Rather than conducting an exhaustive search over the entire item space $S$, the system constrains retrieval to the same-category candidate set $S_{c}\subset S$, where $|S_{c}|\ll|S|$. Within this reduced candidate set, the system computes cosine similarity to retrieve the Top-1 most similar item.
If the similarity score of the Top\-1 item exceeds a predefined threshold, the system directly reuses that item’s existing attribute recognition results for the current item\. This mechanism effectively reduces redundant computations caused by homogeneous SKU variants\.
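The reuse decision can be sketched as a same-category Top-1 lookup followed by a threshold test. The names `reuse_or_compute` and `run_s2d`, the dictionary-based index, and the threshold value 0.95 are assumptions for illustration; the text does not state the threshold used in production. Vectors are assumed to be L2-normalized.

```python
def dot(a, b):
    """Dot product; equals cosine similarity for L2-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))

def reuse_or_compute(item_vec, category, index, run_s2d, threshold=0.95):
    """Reuse a near-duplicate's attributes or fall back to full recognition.

    index maps a category to a list of (embedding, attributes) pairs, so the
    search never leaves the same-category candidate set S_c.
    run_s2d is a zero-argument callable that runs the full pipeline.
    Returns (attributes, reused_flag).
    """
    best_score, best_attrs = -1.0, None
    for vec, attrs in index.get(category, []):
        score = dot(item_vec, vec)
        if score > best_score:
            best_score, best_attrs = score, attrs
    if best_attrs is not None and best_score > threshold:
        return dict(best_attrs), True   # copy so the cached entry stays intact
    return run_s2d(), False
```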
##### \(2\) Attribute\-level load reduction\.
Attribute-level load reduction further recasts “full-attribute scanning” as “highly relevant attribute probing”. Rather than running $\mathsf{S}^{2}\mathsf{D}$ across all attributes within a category, the system adaptively constrains the attribute space through two relevance-probing tasks: single-SKU attribute relevance probing (Task A) and SKU-pair attribute relevance probing (Task B).
###### Task definition and data construction\.
To enable attribute-level load reduction, we train an attribute relevance-probing model prior to the $\mathsf{S}^{2}\mathsf{D}$ stage. The training data are uniformly structured as discrimination triples $\langle x,K,K^{*}\rangle$, where $K$ denotes the complete set of attribute keys under a category and $K^{*}$ denotes the subset of highly relevant attribute keys predicted by the model. By injecting the relevance instruction $I_{\text{dis-k}}$, the task is formulated as:
$$f_{\text{dis-k}}(x,K,I_{\text{dis-k}})\rightarrow K^{*},\quad\text{where}\quad K^{*}\subseteq K.$$
The model jointly learns two sparsification strategies under a unified multi\-task supervised fine\-tuning framework:
- •Task A: single-SKU attribute relevance probing. Given an item $p$ with item information $x_{p}$ and the complete category-level attribute key set $K$, the objective is to identify the subset of attributes $K^{*}(p)$ that are relevant to the item.
- •Task B: SKU-pair attribute relevance probing. Given a new item $p$ and a reference item $p^{\prime}$ associated with the same SPU, the input $x$ is formulated as the differential fields $\Delta(x_{p},x_{p^{\prime}})$. The objective is to identify the subset of attributes $K^{*}_{\Delta}$ that are affected by the differences between $p$ and $p^{\prime}$.
###### Multi\-task training and unified optimization\.
We integrate Task A and Task B into a unified multi\-task SFT framework and use a dynamic mixing strategy to balance their data distributions during training\. The model is optimized using the following autoregressive objective:
$$\mathcal{L}=-\sum_{t}\log P\left(K^{*}_{t}\mid K^{*}_{<t},x,K,I_{\text{dis-k}}\right).$$
This training objective encourages the model to acquire a generalized mapping from item semantics to attribute relevance, thereby enhancing its ability to determine which attributes require further processing by the $\mathsf{S}^{2}\mathsf{D}$ pipeline.
###### Inference\.
During inference, different capabilities are activated through task\-specific instruction prefixes, enabling a unified model interface with task\-adaptive behavior\.
- •Task A inference. For an individual item $p$, the model receives $x_{p}$ and the category-level attribute key set $K$ as input and generates the highly relevant attribute subset $K^{*}(p)$. The subsequent $\mathsf{S}^{2}\mathsf{D}$ process is initiated exclusively for attributes within $K^{*}(p)$, whereas irrelevant attributes are assigned null values.
- •Task B inference. For an item pair $(p,p^{\prime})$, where the attribute recognition results of the reference item $p^{\prime}$ are accessible, the model uses the differential fields $\Delta(x_{p},x_{p^{\prime}})$ and the attribute key set $K$ as input to predict the affected attribute subset $K^{*}_{\Delta}$. Attributes excluded from this affected subset ($K\setminus K^{*}_{\Delta}$) directly inherit the recognition results of the reference item $p^{\prime}$, thereby substantially reducing computational redundancy.
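The two inference modes differ only in what happens to attributes outside the predicted subset. The sketch below assumes the relevance-probing model has already produced $K^{*}(p)$ or $K^{*}_{\Delta}$; `run_s2d` is a hypothetical callable that runs recognition for one attribute key, and both function names are illustrative.

```python
def recognize_single(all_keys, relevant_keys, run_s2d):
    """Task A: run recognition only for K*(p); other keys get a null value."""
    return {k: (run_s2d(k) if k in relevant_keys else None) for k in all_keys}

def recognize_with_reference(all_keys, affected_keys, reference_result, run_s2d):
    """Task B: rerun recognition for the affected keys; inherit the rest.

    reference_result holds the recognized attributes of the reference
    item p' from the same SPU.
    """
    return {
        k: (run_s2d(k) if k in affected_keys else reference_result.get(k))
        for k in all_keys
    }
```

In both cases the number of recognition calls equals the size of the predicted subset rather than $|K|$, which is where the saving comes from.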
Through SKU-level semantic deduplication and attribute-level sparsification alone, the Oxygen AIIC production pipeline achieves a threefold improvement in throughput efficiency.
#### 4.3.2 Extreme Cache Reuse
The $\mathsf{S}^{2}\mathsf{D}$ inference prompt comprises task instructions, item information, attribute keys, and candidate values. In practice, different SKUs under the same SPU share approximately 85% of their item information, such as item descriptions and other detailed technical content. Nevertheless, conventional approaches compute the entire prompt independently for each SKU, resulting in substantial redundancy in both memory consumption and computation. To mitigate this inefficiency, we implement SPU-level prefix cache reuse.
- •Prompt structure optimization. We place invariant information shared within an SPU, such as item descriptions, at the start of the prompt to establish a shared prefix, thereby maximizing prefix-cache hit rates.
- •Cache-aware locality guarantee. Requests are grouped by SPU and routed to designated NPU devices, preventing frequent cache eviction caused by stateless load balancing.
- •Memory management tuning. Given the structured nature of our prompts, we conduct source-code-level tuning of the `block_size` parameter in vLLM (Kwon et al., [2023b](https://arxiv.org/html/2606.28070#bib.bib52)). By keeping cache hit rates high while limiting block-management overhead, we empirically identify `block_size=16` as the optimal configuration for this scenario, compared with the default value of 128. This adjustment substantially enhances memory efficiency.
These optimizations alone improve production throughput by more than sixfold\.
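The three measures above can be illustrated with a small sketch. It assumes that the serving engine reuses cached prefixes in whole blocks only, so that a partially filled trailing block is recomputed; the function names, the hash-based routing rule, and the token counts are hypothetical and are not taken from the paper.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple


def build_prompt(shared_item_info: str, sku_info: str, instructions: str) -> str:
    # SPU-invariant text goes first so sibling SKUs share the longest possible prefix.
    return f"{shared_item_info}\n{sku_info}\n{instructions}"


def route(spu_id: str, num_devices: int) -> int:
    # Stable hash: every request of one SPU lands on the same device,
    # so its cached prefix is not evicted by unrelated traffic elsewhere.
    digest = hashlib.sha256(spu_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_devices


def group_by_device(requests: List[Tuple[str, str]],
                    num_devices: int) -> Dict[int, List[Tuple[str, str]]]:
    groups: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
    for spu_id, sku_id in requests:
        groups[route(spu_id, num_devices)].append((spu_id, sku_id))
    return dict(groups)


def reusable_tokens(shared_prefix_tokens: int, block_size: int) -> int:
    # Assumption: only completely filled blocks of the shared prefix are reusable.
    return (shared_prefix_tokens // block_size) * block_size


requests = [("spu-1", "sku-a"), ("spu-2", "sku-b"), ("spu-1", "sku-c")]
groups = group_by_device(requests, num_devices=4)
```

Under the whole-block assumption, a hypothetical 1,000-token shared prefix yields 992 reusable tokens with `block_size=16` but only 896 with `block_size=128`, which is one intuition for why a smaller block can raise the effective hit rate at the cost of more blocks to manage.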
#### 4.3.3 Asynchronous Pipeline Parallelism
The three-stage Oxygen AIIC production workflow, comprising vector generation, semantic search, and discrimination, requires coordination between heterogeneous hardware, specifically NPUs and CPUs. In this architecture, vector generation and discrimination are offloaded to NPUs, while semantic search is processed by a Faiss-based CPU cluster. However, data dependencies and inter-processor communication introduce idle intervals, resulting in compute resource underutilization and the formation of compute bubbles. To address these inefficiencies, we propose a hierarchical architecture that integrates L0 fine-grained data parallelism with an L1 cross-chunk asynchronous pipeline.
- L0 fine-grained data parallelism. Rather than employing conventional coarse-grained SPU-level chunk partitioning, we partition large SPUs into fine-grained SKU chunks and implement dynamic load balancing. This prevents a single large chunk from occupying CPU resources for an extended period and blocking the entire pipeline.
- L1 cross-chunk asynchronous pipeline. We decouple sequential dependencies across chunks temporally, enabling the three stages to operate seamlessly and asynchronously. In the vector generation stage on NPUs, vLLM workers perform dynamic sharding and process different chunks according to available memory, eliminating the need to wait for an entire chunk to be ready. In the retrieval stage on CPUs, embeddings from multiple chunks are aggregated asynchronously to construct optimized batches, thereby maximizing vector database throughput. In the discrimination stage on NPUs, recalled sub-tasks are dynamically scheduled according to memory pressure, guaranteeing sustained utilization of NPU compute resources.
This architecture alone reduces waiting latency across heterogeneous hardware and achieves more than a twofold improvement in overall system throughput\.
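The cross-chunk pipeline idea can be sketched with bounded queues connecting three concurrently running stages. The sketch below uses Python's `asyncio` purely for illustration; the stage bodies are placeholders (string lengths instead of embeddings, a modulo instead of vector search), and none of the names reflect the actual system.

```python
import asyncio
from typing import List, Tuple

STOP = None  # sentinel marking the end of the chunk stream


async def embed_stage(chunks: List[List[str]], out_q: asyncio.Queue) -> None:
    for chunk in chunks:
        await asyncio.sleep(0)  # placeholder for NPU vector generation
        await out_q.put((chunk, [len(sku) for sku in chunk]))
    await out_q.put(STOP)


async def search_stage(in_q: asyncio.Queue, out_q: asyncio.Queue) -> None:
    while (item := await in_q.get()) is not STOP:
        chunk, vectors = item
        await asyncio.sleep(0)  # placeholder for CPU semantic search
        await out_q.put((chunk, [v % 3 for v in vectors]))
    await out_q.put(STOP)


async def discriminate_stage(in_q: asyncio.Queue,
                             results: List[Tuple[str, int]]) -> None:
    while (item := await in_q.get()) is not STOP:
        chunk, candidates = item
        await asyncio.sleep(0)  # placeholder for NPU discrimination
        results.extend(zip(chunk, candidates))


async def run_pipeline(skus: List[str], chunk_size: int = 2) -> List[Tuple[str, int]]:
    # Fine-grained chunks keep any single unit of work small (the L0 idea).
    chunks = [skus[i:i + chunk_size] for i in range(0, len(skus), chunk_size)]
    # Bounded queues let a stage start on chunk k+1 while the next stage
    # is still busy with chunk k (the L1 idea), with back-pressure.
    q1: asyncio.Queue = asyncio.Queue(maxsize=4)
    q2: asyncio.Queue = asyncio.Queue(maxsize=4)
    results: List[Tuple[str, int]] = []
    await asyncio.gather(embed_stage(chunks, q1),
                         search_stage(q1, q2),
                         discriminate_stage(q2, results))
    return results
```

Calling `asyncio.run(run_pipeline(["a", "bb", "ccc"]))` drives all three stages to completion; because each stage has a single consumer and the queues are FIFO, results come back in input order.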
#### 4.3.4 Overall Effectiveness
While each of the aforementioned strategies yields significant throughput gains, they are difficult to stack efficiently in practice. To assess the comprehensive benefits of the integrated system, we conducted controlled experiments under a uniform physical environment (a single Huawei Ascend 910C NPU), using the identical ontology and sampled item set. We measure throughput by the number of SKU $\times$ attribute key pairs processed per unit time per unit of compute, thereby reflecting the system's capability to perform large-scale knowledge inference. Under identical evaluation conditions, we executed the end-to-end inference pipeline before and after optimization and compared the total execution times to determine the throughput improvement ratio. All efficiency gains reported in this paper are computed according to this evaluation protocol.
By jointly applying the three optimization strategies (computational load reduction, cache reuse, and asynchronous pipeline parallelism), the overall throughput efficiency is improved by more than tenfold. As the dynamically evolving ontology expands and the attribute dimension increases, the optimizations exhibit strong scaling effects: a larger attribute space provides greater potential for sparsification and enhances prefix cache reuse. Consequently, system-level throughput gains are expected to become even more pronounced as the attribute dimension continues to expand.
## 5 Item Understanding LLMs/VLMs
Figure 6: Overview of the Oxygen AIIC framework for the item understanding LLMs/VLMs. Constructed upon a unified multi-task item understanding foundation model, the framework supports incremental capability expansion, incorporates instruction-following knowledge representation, and implements a closed-loop model self-evolution mechanism to continuously enhance model performance and data quality.

Along two dimensions, ontology engineering and the AI Item Library, the previous sections present an end-to-end pipeline that spans from dynamically evolving ontology construction to large-scale item knowledge production. Sustaining the long-term reliability of this system, while enabling its continuous evolution, requires a robust and controllable iteration framework. Building such a framework poses the following challenges:
- Weak off-the-shelf performance in knowledge-intensive domains: Oxygen AIIC draws on extensive industry knowledge that off-the-shelf models lack, which limits their effectiveness (Gururangan et al., [2020](https://arxiv.org/html/2606.28070#bib.bib53)).
- Costly full fine-tuning under frequent incremental updates: Oxygen AIIC continually expands with new domains, ontology entries, and tasks; static models cannot keep pace, while frequent full retraining is inefficient and risks catastrophic forgetting (McCloskey and Cohen, [1989](https://arxiv.org/html/2606.28070#bib.bib33); Kirkpatrick et al., [2017](https://arxiv.org/html/2606.28070#bib.bib34)).
- Key features buried in strong noise: items carry rich information, and their implicit fine-grained features are easily obscured by irrelevant signals; this problem is especially pronounced for representation models.
- Subtle, hard-to-fix recognition defects: as the overall quality of item knowledge improves, the model still performs poorly on a small number of ontology recognition cases; such failures are subtle, rarely surface in aggregate metrics, cannot be resolved simply by adding training data, and instead require tailored data for repair.

To address these challenges, and building on the stable, controllable production pipeline established before, this section introduces the item understanding LLMs/VLMs framework of Oxygen AIIC (Figure [6](https://arxiv.org/html/2606.28070#S5.F6)). Developed upon a unified multi-task item understanding LLMs/VLMs foundation, it supports incremental capability expansion, instruction-following knowledge representation, and a closed-loop self-evolution mechanism, continuously enhancing model performance while supporting both ontology engineering and the AI Item Library.
### 5.1 Multi-task Item Understanding LLMs/VLMs Foundation
Oxygen AIIC originally trained separate models for scenarios such as knowledge discovery, semantic representation, and discrimination-based knowledge recognition. Although their training data all describe similar item-ontology semantics, such isolated training limited data reuse and knowledge transfer across capabilities while increasing model-management overhead (Ruder, [2017](https://arxiv.org/html/2606.28070#bib.bib56); Raffel et al., [2020](https://arxiv.org/html/2606.28070#bib.bib57)).
In the unified foundation, we organize the generative (non-representation) knowledge-production tasks across these scenarios into two families: knowledge extraction and knowledge recognition. We reorganize the existing training data into a multi-task supervised fine-tuning (multi-task SFT) format (Wei et al., [2022a](https://arxiv.org/html/2606.28070#bib.bib35); Chung et al., [2024](https://arxiv.org/html/2606.28070#bib.bib36)) and consolidate these capabilities into the item understanding LLMs/VLMs:
1. Knowledge extraction: autonomously identifies and extracts knowledge from item information, encompassing tasks such as attribute key extraction, attribute value extraction, and key-value extraction.
2. Knowledge recognition: executes authenticity judgment or subset selection based on item information and a specified candidate space, including key discrimination, value discrimination, and key-value discrimination.

Through unified modeling of these tasks, the item understanding LLMs/VLMs acquire a generalized understanding of items, ontology entries, and other task patterns, thereby serving as the foundation for Oxygen AIIC's model capabilities.
### 5.2 Incremental Adaptation
The multi-task foundation provides strong, general item understanding, but it exhibits limited generalization to continuously emerging ontology entries (such as a newly added "pet food" category), and frequent full retraining significantly reduces iteration efficiency.
To address this, we introduce a task-free, lightweight incremental learning mechanism to expand the model's capability boundaries without fully retraining the foundation model, while preserving existing capabilities (Houlsby et al., [2019](https://arxiv.org/html/2606.28070#bib.bib58); Biesialska et al., [2020](https://arxiv.org/html/2606.28070#bib.bib59)). The core method is to build on the robust multi-task foundation, develop lightweight "expert modules" for incremental requirements, and dynamically integrate them into the expert pool, enabling agile capability expansion. Figure [7](https://arxiv.org/html/2606.28070#S5.F7) illustrates the incremental adaptation mechanism.

Figure 7: Incremental adaptation based on LoRAM experts and adaptive expert composition. A frozen SFT backbone is combined with multiple lightweight expert updates, and GRPO optimizes expert composition via task feedback.

##### (1) Rapid adaptation to incremental tasks.
To keep up with ontology iteration, the system incrementally trains the corresponding expert modules, such as “pet food” and “beauty and skincare” domain experts\.
Low-Rank Adaptation (LoRA) has become the preferred method for incremental adaptation owing to its high parameter efficiency (Hu et al., [2022](https://arxiv.org/html/2606.28070#bib.bib37)). It implements weight updates via the product of low-rank matrices: $\Delta W = \frac{\rho}{r} BA$, where $B \in \mathbb{R}^{n \times r}$, $A \in \mathbb{R}^{r \times m}$, and $r$ and $\rho$ represent the rank and the scaling factor, respectively.
However, conventional LoRA is constrained by its inherent low-rank structure, and the magnitude of its weight updates is significantly lower than that of full fine-tuning. This impairs the early-stage fitting dynamics, slows convergence, and prevents the model from reaching the desired accuracy under limited business data, even when the learning rate is increased. Consequently, we introduce LoRAM initialization based on the Magnitude Principle (Zhang et al., [2026b](https://arxiv.org/html/2606.28070#bib.bib38)). By directly constructing an initial state with stronger update momentum, LoRAM accelerates convergence and enhances accuracy. The strategy uses the discrete sine transform (DST) to construct deterministic orthogonal bases $\xi$ and integrates them with a pretrained-weight-based gain coefficient $\beta$ to define the initial states of the low-rank matrices:

$$B^{(0)} = \beta \cdot \xi_{n}, \quad A^{(0)} = \beta \cdot \xi_{m}^{\top}.$$

To guarantee that the model's initial behavior matches that of the original foundation model, the system simultaneously performs weight compensation:

$$W \leftarrow W - \beta^{2} \xi_{n} \xi_{m}^{\top}.$$

LoRAM optimizes the adapter's magnitude dynamics without additional memory overhead or preprocessing cost, improving the fitting efficiency while raising the performance ceiling for incremental requirements.
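The compensation step can be checked numerically with a small sketch. Several choices here are assumptions made only for illustration: the bases are taken as the first $r$ columns of the orthonormal DST-I matrix, the LoRA scaling $\rho / r$ is set to 1, and $\beta$ is passed in as a plain number rather than derived from the pretrained weights. The helper names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def dst_basis(n: int, r: int) -> Matrix:
    """First r columns of the orthonormal n x n DST-I matrix (assumed basis)."""
    scale = math.sqrt(2.0 / (n + 1))
    return [[scale * math.sin(math.pi * (i + 1) * (j + 1) / (n + 1))
             for j in range(r)] for i in range(n)]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def loram_init(w: Matrix, r: int, beta: float) -> Tuple[Matrix, Matrix, Matrix]:
    """Return (compensated W, B0, A0) so that W_comp + B0 @ A0 equals the input W."""
    n, m = len(w), len(w[0])
    b0 = [[beta * v for v in row] for row in dst_basis(n, r)]             # n x r
    a0 = [[beta * v for v in row] for row in transpose(dst_basis(m, r))]  # r x m
    delta = matmul(b0, a0)                                                # beta^2 * xi_n xi_m^T
    w_comp = [[w[i][j] - delta[i][j] for j in range(m)] for i in range(n)]
    return w_comp, b0, a0


w = [[0.1 * (i + j) for j in range(4)] for i in range(5)]  # toy 5 x 4 weight
w_comp, b0, a0 = loram_init(w, r=2, beta=0.3)
restored = [[w_comp[i][j] + d for j, d in enumerate(row)]
            for i, row in enumerate(matmul(b0, a0))]
```

Here `restored` matches `w` up to floating-point error, so the adapted model starts from the foundation model's behavior even though $B^{(0)}$ and $A^{(0)}$ are both nonzero.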
##### \(2\) Expert\-pool scheduling\.
LoRAM produces single\-scenario experts\. However, during inference, the model achieves higher recognition precision by dynamically aggregating multiple expert capabilities and sharing knowledge across all categories\.
To this end, we propose GROLE, an incremental learning strategy based on adaptive LoRA expert composition (Liao et al., [2026](https://arxiv.org/html/2606.28070#bib.bib91)). We establish a modular expert pool in which each $\Delta W_{\text{LoRAM},i}$ represents an independent capability unit. An adaptive selector $g_{\phi}$ estimates the fusion weights $\bm{\alpha}$ of incremental experts from input $x$ and task instruction $I$, with weights constrained to be nonnegative and to sum to one. The model output is determined by the linear combination of the foundation and experts:

$$h = \left(W_{0} + \sum_{i=1}^{n} \alpha_{i} \Delta W_{\text{LoRAM},i}\right) x,$$

where $W_{0}$ represents the multi-task foundation model and $n$ is the number of experts. This mechanism enables the model to achieve both general semantic understanding and specific logical judgment within a unified architecture.
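A minimal sketch of the composition rule follows. The softmax used to turn selector scores into weights is an assumption (the text only requires nonnegative weights that sum to one), and the toy matrices are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(scores: Sequence[float]) -> List[float]:
    """One possible way to obtain nonnegative weights that sum to one."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def compose(w0: Matrix, deltas: Sequence[Matrix], alpha: Sequence[float]) -> Matrix:
    """W0 + sum_i alpha_i * DeltaW_i, computed element-wise."""
    return [[w0[i][j] + sum(a * d[i][j] for a, d in zip(alpha, deltas))
             for j in range(len(w0[0]))] for i in range(len(w0))]


def forward(w: Matrix, x: Sequence[float]) -> List[float]:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


w0 = [[1.0, 0.0], [0.0, 1.0]]                      # toy foundation weight
deltas = [[[1.0, 0.0], [0.0, 0.0]],                # toy expert 1 update
          [[0.0, 0.0], [0.0, 1.0]]]                # toy expert 2 update
h = forward(compose(w0, deltas, [0.25, 0.75]), [2.0, 4.0])  # [2.5, 7.0]
```

Because the weights form a convex combination, setting one weight to 1 recovers a single expert, and the foundation weight $W_0$ always contributes in full.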
Given that expert weights lack explicit labels, we employ Group Relative Policy Optimization (GRPO) (Shao et al., [2024](https://arxiv.org/html/2606.28070#bib.bib41)) to model the allocation process as feedback-driven policy learning. The system samples $G$ groups of expert weights $\{\bm{\alpha}_{j}\}_{j=1}^{G}$ from a Dirichlet distribution and calculates relative advantages $\{A_{j}\}_{j=1}^{G}$:

$$A_{j} = \frac{r_{j} - \text{mean}(\bm{R})}{\text{std}(\bm{R})},$$

where $r_{j}$ denotes the overall reward of the $j$-th weight vector $\bm{\alpha}_{j}$ across mixed tasks (defined as the negative loss), and $\bm{R}$ is the set of rewards for the current group $\{r_{j}\}_{j=1}^{G}$. The selector identifies the optimal policy by optimizing:

$$\mathcal{J}_{GRPO}(\bm{\phi}) = \frac{1}{G} \sum_{j=1}^{G} \min\left[\rho_{j} A_{j}, \operatorname{clip}(\rho_{j}, 1-\epsilon, 1+\epsilon) A_{j}\right],$$

where $\rho_{j}$ denotes the importance sampling ratio of the selector with respect to $\bm{\alpha}_{j}$, and $\epsilon$ is the clipping coefficient. This mechanism enables the flexible composition of novel knowledge and task logic while preserving foundation capabilities, thereby supporting the ongoing expansion of item understanding.
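The group-relative advantage and the clipped objective can be written out directly. In the sketch below, the small `eps` added to the standard deviation, the use of the population standard deviation, and the Dirichlet sampler built from gamma draws are illustrative assumptions; the importance ratios are supplied as plain numbers rather than computed from a selector.

```python
import random
import statistics
from typing import List, Sequence


def sample_dirichlet(concentration: Sequence[float], rng: random.Random) -> List[float]:
    """Draw one weight vector from a Dirichlet by normalizing gamma samples."""
    draws = [rng.gammavariate(c, 1.0) for c in concentration]
    total = sum(draws)
    return [d / total for d in draws]


def group_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """A_j = (r_j - mean(R)) / std(R); eps guards against a zero std (assumption)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_objective(ratios: Sequence[float], advantages: Sequence[float],
                      clip_eps: float = 0.2) -> float:
    """Mean over the group of min(rho * A, clip(rho, 1 - eps, 1 + eps) * A)."""
    terms = []
    for rho, adv in zip(ratios, advantages):
        clipped = max(1.0 - clip_eps, min(rho, 1.0 + clip_eps))
        terms.append(min(rho * adv, clipped * adv))
    return sum(terms) / len(terms)


rng = random.Random(0)
group = [sample_dirichlet([1.0, 1.0, 1.0], rng) for _ in range(4)]  # G = 4 candidates
advantages = group_advantages([-0.9, -0.5, -0.7, -0.3])             # rewards = negative losses
```

The advantages are centered within each group, so a candidate weight vector is reinforced only if it beats its siblings sampled for the same input.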
This approach enables Oxygen AIIC to iterate rapidly. Meanwhile, to share item knowledge across all categories, the system periodically performs full fine-tuning, consolidating distributed incremental capabilities back into the foundation model. This supports the continuous integration of expert capabilities and the long-term evolution of the foundation model.
### 5.3 Instruction-following Knowledge Representation
In contrast to the generation\-oriented item understanding LLMs/VLMs, knowledge representation aims to build a unified semantic space that maps item information, user queries, standardized ontology entries, and related information into the same vector space, supporting efficient alignment and retrieval at an industrial scale\.
The knowledge representation component illustrated in Figure [8](https://arxiv.org/html/2606.28070#S5.F8) is trained to follow task instructions while maintaining robust fine-grained semantic alignment.

Figure 8: Instruction-following knowledge representation training. The framework transfers reasoning capability through latent chain-of-thought (Latent CoT) distillation and enhances representational robustness via adaptive feature-space perturbation.

In e-commerce, representation models must extract knowledge signals from comprehensive item information. Traditional embeddings are susceptible to interference from literal similarity and noise, often failing to capture fine-grained details. For example, a product detail page may obscure the "waterproof level" within text such as "waterproofing has been further upgraded, supporting IP68-grade dust and water resistance, and passing the TÜV SÜD 2-meter, 24-hour test." To address this challenge, we propose a "Reasoning-then-Embedding" framework that performs reasoning prior to generating an embedding vector, distinguishing essential knowledge from redundant noise (Wei et al., [2022b](https://arxiv.org/html/2606.28070#bib.bib60)).
The framework is underpinned by two mechanisms. First, a reasoning-based representation mechanism built on Latent CoT distillation (Deng et al., [2023](https://arxiv.org/html/2606.28070#bib.bib42)) transfers the teacher model's reasoning capability into the hidden states of the student model. Second, a spectral-structure-based adaptive noise injection strategy (Li et al., [2026](https://arxiv.org/html/2606.28070#bib.bib43); Guo et al., [2026](https://arxiv.org/html/2606.28070#bib.bib116)) dynamically adjusts perturbation intensity according to the representation distribution, strengthening the model's semantic robustness. Collectively, these mechanisms enable complex semantic modeling while remaining both efficient and accurate.
##### \(1\) Implicit reasoning representation\.
In retrieval scenarios, representations need instruction-following capability to suppress irrelevant information and noise. To avoid the high latency associated with explicit CoT decoding (Wei et al., [2022b](https://arxiv.org/html/2606.28070#bib.bib60)), following InstEmb (Gao et al., [2026](https://arxiv.org/html/2606.28070#bib.bib119)), we adopt implicit reasoning, completing the reasoning process within a single forward pass.
We append $L$ learnable tokens $s = [\langle\text{think}\rangle, s_{1}, s_{2}, \dots, s_{L}, \langle/\text{think}\rangle, \langle\text{eos}\rangle]$ after the input sequence $x$ as carriers for reasoning. During training, a frozen teacher model provides supervision (Hinton et al., [2015](https://arxiv.org/html/2606.28070#bib.bib61)). The inputs are defined as follows:

$$x_{\text{student}} = [x; s], \quad x_{\text{teacher}} = [x; r_{\text{cot}}],$$

where $r_{\text{cot}}$ is the explicit reasoning chain generated by the teacher model. For example, for the attribute "waterproof level", the teacher model can infer from wording such as "…supports IP68-grade dust and water resistance…" that "IP68-grade" is the core information, guiding the student model to internalize deeper semantics.
We design a position-aligned distillation loss that constrains the student model's hidden state $h^{S}_{i}$ at implicit-token positions to align with the teacher model's hidden state $h^{T}_{i-1}$ when processing the corresponding reasoning logic:

$$L_{\text{distill}} = \frac{1}{L} \sum_{i=|x|+2}^{|x|+L+1} \left\lVert h^{S}_{i} - h^{T}_{i-1} \right\rVert_{2}^{2}.$$

During inference, the hidden state of the last token $\langle\text{eos}\rangle$ in the implicit sequence is extracted as the reasoning-based representation vector: $e = h^{S}_{\langle\text{eos}\rangle}$. Without introducing autoregressive overhead, this method compresses core knowledge from unstructured descriptions into vector representations and enhances recall accuracy for fine-grained knowledge.
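The index bookkeeping in the distillation loss is easy to get wrong, so a small sketch may help. It assumes hidden states are given as plain per-position vectors, with position 1 stored at list index 0, and that the teacher's reasoning chain supplies at least $L$ positions after the input; the function name and toy values are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def distill_loss(student_hidden: List[Vector], teacher_hidden: List[Vector],
                 input_len: int, num_latent: int) -> float:
    """Mean squared L2 distance between student state i and teacher state i-1
    for 1-indexed positions i = |x|+2 .. |x|+L+1 (the latent tokens s_1..s_L)."""
    total = 0.0
    for i in range(input_len + 2, input_len + num_latent + 2):
        h_s = student_hidden[i - 1]  # student state at position i
        h_t = teacher_hidden[i - 2]  # teacher state at position i - 1
        total += sum((a - b) ** 2 for a, b in zip(h_s, h_t))
    return total / num_latent


# Toy setup: |x| = 2 and L = 2, so the student sequence has 2 + 1 + 2 + 2 = 7
# positions: x1, x2, <think>, s1, s2, </think>, <eos>.
student = [[float(p)] for p in range(1, 8)]  # 1-d state equal to its position
teacher = [[float(p)] for p in range(1, 5)]  # x1, x2, and two reasoning tokens
loss = distill_loss(student, teacher, input_len=2, num_latent=2)
```

In this toy case each aligned pair differs by exactly 1, so `loss` evaluates to 1.0; the one-position offset pairs latent token $s_1$ with the first token of the teacher's reasoning chain.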
##### \(2\) Representation robustness enhancement\.
After the reasoning\-based representationeeis obtained, high\-frequency marketing terms and redundant descriptions in e\-commerce text may still cause the model to overfit superficial noise, weakening generalization in practical alignment scenarios\. To address this, we propose a spectral\-structure\-based adaptive noise injection strategy at the feature layer to enhance representation robustness\.
Given the batch representation set $E \in \mathbb{R}^{B \times d}$, we perform singular value decomposition (SVD): $E = P \Sigma Q^{\top}$. The singular value matrix $\Sigma$ reflects the energy distribution across semantic directions. Based on this structure, we dynamically adjust noise intensity: perturbation is increased in dominant directions, which often correspond to high-frequency noise patterns, to suppress overfitting, whereas perturbation is constrained in weak-signal directions to preserve fine-grained logical signals. The noise modulation process is defined as follows:

$$N_{\text{scaled}} = N_{\text{rand}} \odot S(\Sigma_{\mathrm{diag}}), \quad \text{where } S \in \left\{ \frac{\Sigma_{\mathrm{diag}}}{\overline{\Sigma_{\mathrm{diag}}}}, \frac{\sqrt{\Sigma_{\mathrm{diag}}}}{\sqrt{\overline{\Sigma_{\mathrm{diag}}}}} \right\},$$

where $N_{\text{rand}}$ denotes Gaussian noise, $\Sigma_{\mathrm{diag}}$ is the diagonal singular-value vector, and $\overline{\Sigma_{\mathrm{diag}}}$ represents the mean singular value. The noise is subsequently projected back to the representation space through a basis transformation and normalized by dimension to generate the perturbed reasoning representation $\tilde{e}$:

$$N' = N_{\text{scaled}} Q^{\top}, \quad \tilde{e} = e + \frac{\delta}{\sqrt{n}} N',$$

where $\delta$ is the global noise intensity coefficient and $n$ is the feature dimension.
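As a small worked illustration with hypothetical singular values (not taken from the paper), suppose $\Sigma_{\mathrm{diag}} = (4, 1, 1)$, so that $\overline{\Sigma_{\mathrm{diag}}} = 2$. The two candidate scaling functions then give:

$$\frac{\Sigma_{\mathrm{diag}}}{\overline{\Sigma_{\mathrm{diag}}}} = (2,\ 0.5,\ 0.5), \qquad \frac{\sqrt{\Sigma_{\mathrm{diag}}}}{\sqrt{\overline{\Sigma_{\mathrm{diag}}}}} = \left(\sqrt{2},\ \tfrac{1}{\sqrt{2}},\ \tfrac{1}{\sqrt{2}}\right) \approx (1.414,\ 0.707,\ 0.707).$$

Both variants amplify noise along the dominant direction and damp it along the weak ones; the square-root variant does so more gently, with a ratio of 2 between strongest and weakest direction instead of 4.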
In the contrastive learning stage, we follow the InfoNCE loss defined in Section [3.2.2](https://arxiv.org/html/2606.28070#S3.SS2.SSS2) and replace $e$ with $\tilde{e}$ to explicitly strengthen the semantic alignment:

$$L_{\widetilde{\text{InfoNCE}}} = -\frac{1}{|V^{+}|} \sum_{i=1}^{|V^{+}|} \log \frac{\exp(\operatorname{sim}(\tilde{e}, \tilde{e}_{i}^{+})/\tau)}{\exp(\operatorname{sim}(\tilde{e}, \tilde{e}_{i}^{+})/\tau) + \sum_{j=1}^{|V^{-}|} \exp(\operatorname{sim}(\tilde{e}, \tilde{e}_{j}^{-})/\tau)}.$$

The final training objective comprises the distillation loss and the modified InfoNCE loss, where $\zeta$ is a weighting coefficient:

$$L_{\text{total}} = L_{\text{distill}} + \zeta \cdot L_{\widetilde{\text{InfoNCE}}}.$$

In summary, the model accurately extracts core semantics from complex e-commerce item information and achieves high-quality alignment and retrieval of atomic knowledge in vector space.
### 5.4 Model Self-evolution
As Oxygen AIIC developed, the item understanding LLMs/VLMs have achieved relatively stable performance on tasks such as knowledge extraction and recognition. However, in practical business scenarios, model issues gradually shift from explicit errors to more subtle long-tail defects. Relying solely on full-data expansion or periodic fine-tuning makes stable, low-cost continuous optimization difficult.
We consequently develop a self-evolution framework for the item understanding LLMs/VLMs (Zelikman et al., [2022](https://arxiv.org/html/2606.28070#bib.bib62); Madaan et al., [2023](https://arxiv.org/html/2606.28070#bib.bib63)). The process comprises four stages (data evaluation, data analysis, data synthesis, and data selection) and establishes a closed-loop system through model iteration.
Figure [9](https://arxiv.org/html/2606.28070#S5.F9) provides a system-level view of the self-evolution loop; the following paragraphs delineate its four modules.

Figure 9: Model self-evolution framework. Online feedback and proactive mining yield bad cases and hard cases, which are analyzed, converted into targeted synthetic data, prioritized based on sample value, and used for controlled model iteration.

##### Module 1: Data evaluation.
The data evaluation module identifies bad cases and hard cases from online feedback and actively probed samples\. Samples are assessed along three dimensions: evidence consistency, model confidence, and perturbation stability\.
First, the system implements prompt-constrained evidence consistency judgment, namely an LLM-as-a-judge mechanism (Zheng et al., [2023](https://arxiv.org/html/2606.28070#bib.bib44)), to validate the triples $\langle x, k, v \rangle$ produced by the model. Here, $x$ denotes item information, $k$ denotes the attribute key, and $v$ denotes the attribute value. The evaluation criterion is whether $v$ can be rigorously derived from the current input $x$ alone. If the evidence is contradictory or unsupported, the sample is assigned to the bad-case set.
Second, the system performs confidence estimation based on the model output distribution under greedy decoding. At each decoding step $i$, the system computes token-level confidence as the probability margin between the top-ranked token and the second-ranked candidate token across the vocabulary $V$, and subsequently determines sequence-level confidence:

$$C_{i} = \max_{j \in V} P_{i}(j) - \underset{k \in V}{\text{secondmax}}\, P_{i}(k), \qquad C_{\text{avg}} = \frac{1}{N} \sum_{i=1}^{N} C_{i},$$

where $N$ is the output sequence length. Low overall confidence indicates ambiguity among multiple candidate answers, and the sample is designated as a hard-case candidate.
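The margin-based confidence is straightforward to compute from per-step token probabilities. The sketch below assumes each decoding step exposes a probability list over the vocabulary; the function names, the toy distributions, and the `0.5` threshold are hypothetical.

```python
from typing import List, Sequence


def token_margin(probs: Sequence[float]) -> float:
    """C_i: probability gap between the top-1 and top-2 tokens at one step."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def sequence_confidence(step_probs: Sequence[Sequence[float]]) -> float:
    """C_avg: mean of the per-step margins over the output sequence."""
    margins: List[float] = [token_margin(p) for p in step_probs]
    return sum(margins) / len(margins)


def is_hard_case_candidate(step_probs: Sequence[Sequence[float]],
                           threshold: float = 0.5) -> bool:
    # The threshold is an illustrative value, not one reported in the paper.
    return sequence_confidence(step_probs) < threshold


steps = [
    [0.90, 0.05, 0.05],  # decisive step: margin about 0.85
    [0.50, 0.40, 0.10],  # ambiguous step: margin about 0.10
]
c_avg = sequence_confidence(steps)  # about 0.475
```

A margin captures ambiguity between the two leading candidates more directly than the top-1 probability alone: a step with probabilities 0.5 and 0.4 is far less certain than one with 0.5 followed by many tiny alternatives.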
Finally, the system checks stability under input perturbation\. While preserving core semantics, it introduces minor perturbations to the order of candidate attributes and item descriptions\. If the output varies, the sample is also designated as a hard\-case candidate\.
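A perturbation-stability check of this kind can be sketched as repeated reordering followed by an output comparison. The `predict` callable stands in for the model, and treating the description as a list of reorderable sentences is an assumption made for illustration; names and defaults are hypothetical.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical model interface: (description sentences, candidate values) -> prediction
Predictor = Callable[[Sequence[str], Sequence[str]], object]


def is_stable(predict: Predictor, sentences: List[str], candidates: List[str],
              trials: int = 5, seed: int = 0) -> bool:
    """Return False as soon as any reordering changes the model's output."""
    rng = random.Random(seed)
    baseline = predict(sentences, candidates)
    for _ in range(trials):
        shuffled_sentences = sentences[:]
        shuffled_candidates = candidates[:]
        rng.shuffle(shuffled_sentences)   # reorder description sentences
        rng.shuffle(shuffled_candidates)  # reorder candidate values
        if predict(shuffled_sentences, shuffled_candidates) != baseline:
            return False
    return True


def grounded_predictor(sentences: Sequence[str], candidates: Sequence[str]) -> List[str]:
    # Toy order-invariant predictor: candidates literally mentioned in the text.
    return sorted(c for c in candidates if any(c in s for s in sentences))


text = ["metal lamp base", "glass lampshade"]
stable = is_stable(grounded_predictor, text, ["glass", "metal", "wood"])
```

A predictor that depends only on the evidence passes the check, whereas one that is biased toward candidate position would be flagged as a hard-case candidate.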
The data evaluation module yields two sample types: bad cases and hard cases\. Bad cases are used for defect attribution and corrective data synthesis, while hard cases help identify the model’s capability boundaries\.
##### Module 2: Data analysis\.
The data analysis module uses large models to perform defect attribution for bad cases and hard cases\. For bad cases, it analyzes erroneous outputs to identify underlying defect causes\. For hard cases, it investigates sources of instability and categorizes samples for corrective training, boundary enhancement, or manual review\.
Common defects include evidence\-missing hallucination, where the input lacks explicit evidence for the target attribute while the model predicts an attribute value based on category commonsense; attribute\-boundary confusion, where the model identifies valid evidence but incorrectly associates it with a semantically similar attribute, resulting in an incorrect key\-value binding; attribute\-value expression deviation, where the output drifts semantically from the original evidence or over\-refines it by introducing unexpressed modifiers; and missing category\-specific rules, where the same attribute has different judgment criteria across categories and the model fails to capture category\-context differences\.
Using “lampshade material” as an example, if the item information only describes a “Nordic\-style desk lamp” but the model outputs “wood”, this is evidence\-missing hallucination\. If the text contains “metal lamp base, glass lampshade” but the model associates “metal” with “lampshade material”, this is attribute\-boundary confusion\. If the source text only states “glass material” but the model outputs “frosted glass”, this is attribute\-value expression deviation\.
The module outputs the defect type, impact scope, and suggested repair strategy, transitioning model iteration from indiscriminate data expansion to defect\-targeted remediation\.
##### Module 3: Data synthesis\.
The data synthesis module uses large models to synthesize targeted training samples based on identified defect types and repair suggestions\. Rather than merely increasing data volume, its primary goal is to produce information\-dense repair samples\.
For evidence\-missing hallucination, the system employs constrained synthesis to generate input samples lacking explicit evidence for the target attribute while constraining the target output to “not mentioned” or “cannot determine”, thereby reinforcing the model’s awareness of evidence boundaries\. For attribute\-boundary confusion, the system uses boundary\-calibration synthesis to formulate contrastive samples targeting easily confused attributes, helping the model learn the correct bindings between attributes and evidence spans\. For attribute\-value expression deviation, the system implements fact\-alignment synthesis to generate strictly consistent standardized outputs based on factual anchors in the source input, preventing the introduction of extraneous semantics during extraction\. For missing category\-specific rules, the system applies category\-rule enhancement synthesis, integrating category ontology definitions, attribute value spaces, and historical error distributions to generate training samples that align with the business interpretations of specific categories\.
For example, for the “lampshade material” issue, the system generates negative samples devoid of material evidence and trains the model to output “not mentioned”\. It also constructs contrastive samples containing both “metal lamp base” and “glass lampshade” to help the model distinguish the attribute boundaries between diverse components\.
To ensure the quality of synthetic data, all samples undergo evidence consistency validation and format validation; high\-risk samples are further evaluated via manual spot checks\. The module generates candidate training data accompanied by defect labels, repair objectives, and sample\-source markers\.
##### Module 4: Data selection\.
The data selection module identifies high\-utility samples from synthetic samples, historical samples, and online feedback samples to enhance iteration efficiency and model performance\.
We introduce a sample utility measurement strategy grounded in the cognitive gap (Xie et al., [2023](https://arxiv.org/html/2606.28070#bib.bib45); Wang et al., [2026a](https://arxiv.org/html/2606.28070#bib.bib117); [b](https://arxiv.org/html/2606.28070#bib.bib118)). For a candidate sample $x$, we calculate the losses of the current model $u$ and the expert model $w$ on the sample, and define the sample utility score as:

$$L_{\text{excess}}(x) = \mathcal{L}_{\mathbf{u}}(x) - \mathcal{L}_{\mathbf{w}}(x),$$

where the current model $u$ represents the active online model or the model to be optimized, and the expert model $w$ can be a higher-capacity teacher model, a strong baseline model, or a domain model.
A high $L_{\text{excess}}$ indicates that the current model has not yet mastered the sample, whereas the expert model exhibits proficiency, making the sample highly informative and valuable for training. If both losses are low, the sample is likely redundant and may be reserved for distribution preservation or deferred. If both losses are high, the sample may suffer from label noise, insufficient evidence, or semantic ambiguity, and is subsequently directed to the hard-case pool for manual verification.
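The three cases described above map naturally onto a small triage function. All thresholds in the sketch below (`low`, `high`, `min_excess`) and the bucket labels are hypothetical values chosen for illustration; the paper does not state concrete cut-offs, and the fallback bucket for samples matching none of the three cases is likewise an assumption.

```python
from typing import Tuple


def triage(loss_current: float, loss_expert: float,
           low: float = 0.5, high: float = 2.0,
           min_excess: float = 1.0) -> Tuple[float, str]:
    """Return (L_excess, bucket) for one candidate sample."""
    excess = loss_current - loss_expert
    if loss_current >= high and loss_expert >= high:
        # Both models struggle: possible label noise or ambiguity, send to review.
        return excess, "hard_case_pool"
    if loss_current <= low and loss_expert <= low:
        # Both models already handle it: little new information.
        return excess, "redundant"
    if excess >= min_excess:
        # Expert succeeds where the current model fails: informative sample.
        return excess, "train"
    return excess, "low_utility"


buckets = [triage(3.0, 0.2)[1], triage(0.1, 0.1)[1], triage(3.0, 2.5)[1]]
```

Checking the both-high case before the excess threshold matters: a noisy sample can still show a positive excess loss, and it should go to manual verification rather than straight into training.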
Data selection operates in two modes\. Static offline selection evaluates the candidate sample pool at scale prior to training and constructs a specialized training set\. Dynamic online selection adaptively adjusts training batches based on model convergence during incremental training, allowing the model to prioritize the hardest and most informative samples\.
The module categorizes candidate samples based on their respective roles in the subsequent iteration: \(1\)Training set\.High\-utility samples are incorporated into the training set for model optimization\. \(2\)Current\-round test set\.Representative bad cases and stability hard cases identified in the current iteration are maintained as test samples for the attributes currently being optimized\. After each iteration, these samples are annotated via the procedure described in Section[4\.2\.3](https://arxiv.org/html/2606.28070#S4.SS2.SSS3)and then merged into the item knowledge test set, ensuring the benchmark continuously expands and supports all subsequent evaluation scenarios\. Samples identified as redundant, low\-utility, ambiguous, insufficiently grounded, or confirmed noisy are excluded from both training and validation\.
### 5\.5 Metrics
Throughout the continuous iteration of the item understanding LLMs/VLMs, we conduct offline validation by computing metrics on end\-to\-end item knowledge production results using the unified item knowledge test set\.
After incorporating multi\-task SFT, incremental scenario adaptation, instruction\-following knowledge representation training, and model self\-evolution, the latest model attains 94\.2% precision and 82\.8% recall in end\-to\-end AI Item Library production\. Compared with the results in Section [4\.2\.3](https://arxiv.org/html/2606.28070#S4.SS2.SSS3), precision and recall exhibit increases of 2\.2% and 4\.5%, respectively\.
The evaluation results further demonstrate that only 0\.8% of attributes undergo a precision degradation of more than 5%, indicating that the model enhances overall item knowledge production quality while maintaining stable performance on the continuously growing benchmark\.
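For concreteness, the two kinds of figures reported here can be computed as follows\. This is a hedged sketch: the function names are hypothetical, and it assumes that “a precision degradation of more than 5%” means an absolute drop of more than 0\.05 in per\-attribute precision between two model versions, which the text does not state explicitly\.

```python
from typing import Dict, Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def degraded_share(prev: Dict[str, float], curr: Dict[str, float],
                   threshold: float = 0.05) -> float:
    """Share of attributes whose precision dropped by more than `threshold`.

    `prev` and `curr` map attribute name -> precision for the previous and
    current model; only attributes present in both are compared.
    """
    shared = prev.keys() & curr.keys()
    if not shared:
        return 0.0
    degraded = sum(1 for attr in shared if prev[attr] - curr[attr] > threshold)
    return degraded / len(shared)


# Illustrative values only, not data from the paper.
prev = {"color": 0.95, "brand": 0.90, "size": 0.80, "material": 0.85}
curr = {"color": 0.96, "brand": 0.84, "size": 0.79, "material": 0.85}
share = degraded_share(prev, curr)  # only "brand" drops by more than 0.05
```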
For newly added ontology entries, the end\-to\-end turnaround time from attribute mining to finalized production data is reduced from over 30 days to approximately two weeks, representing a substantial improvement in production efficiency\.
### 5\.6 Platform Capabilities
During the large\-scale deployment of Oxygen AIIC, the underlying compute platform encounters two primary technical challenges: model training and inference on Huawei Ascend NPUs, and the efficient use of compute resources\. At the model level, the compute platform offers a unified training and inference framework compatible with Huawei Ascend NPUs, ensuring the robust and efficient execution of the end\-to\-end Oxygen AIIC pipeline\. At the resource level, the compute platform incorporates elastic resource scheduling for offline NPU clusters, enabling compute resource reuse across the production workflow and improving overall compute utilization\.
## 6 Item Tunnel
To support ontology construction, AI Item Library production, and model self\-evolution, while enabling efficient consumption by downstream applications, we designed and implemented Oxygen AIIC’s unified item tunnel as a data and compute hub between production and applications\.
The system must address four engineering challenges\. First, there is an inherent trade\-off between data freshness and cost, while downstream consumers differ substantially in freshness requirements, item scope, and ontology coverage\. Second, the evolution of the AI Item Library relies on multiple hybrid stream\-batch pipelines; long cascades across heterogeneous engines may introduce distributed\-state synchronization problems and data inconsistencies\. Third, data flows covering tens of billions of items consume CPU, NPU, and distributed\-storage resources across the full stack, so even local inefficiencies can affect global stability and compute\-cost amortization\. Fourth, business scenarios share a common foundation, take multimodal inputs, and differ only in service details; ad hoc customization would fragment interfaces, undermine contract governance, waste resources, and weaken SLA guarantees\.
To address these challenges, we developed the item tunnel, a unified infrastructure for large\-scale knowledge management\. The item tunnel supports hundreds of millions of AI Item Library updates per day, delivers tiered freshness spanning seconds, minutes, and days, and maintains eventual consistency across heterogeneous service modalities with convergence achieved within minutes\. Powered by this infrastructure, Oxygen AIIC has built a dynamic AI Item Library covering tens of billions of items\. The system is now deployed across a broad spectrum of production applications, including search, advertising, marketing, shopping assistants, item operations, product listing, and platform governance\.
The following four engineering practices correspond to the four challenges above\.
- Tiered\-freshness pipelines: inference tasks are assigned based on data\-change frequency and business value to provide freshness tiers from seconds to days\. Inference is the key factor governing both freshness and cost: offline inference has high throughput and NPU utilization but longer end\-to\-end latency, whereas real\-time inference responds quickly but may repeatedly process transient intermediate states and amplify compute cost \(Kwon et al\., [2023a](https://arxiv.org/html/2606.28070#bib.bib103); Agrawal et al\., [2024](https://arxiv.org/html/2606.28070#bib.bib104)\)\. We therefore divide computation into three paths\. The offline path uses batch computation and offline inference to build a low\-cost item\-knowledge baseline, and decouples CPU processing and NPU inference into a pipeline to improve NPU utilization\. The nearline path refreshes minute\-level increments through micro\-batches\. The real\-time path is reserved for high\-value, highly time\-sensitive items and triggers online inference only after staged pruning, retrieval, and discrimination\. Together, the real\-time, nearline, and offline paths deliver second\-, minute\-, and day\-level freshness, respectively, balancing latency, freshness, and cost \(Akidau et al\., [2015](https://arxiv.org/html/2606.28070#bib.bib105)\)\.
- Eventual consistency: the system resolves data\-version divergence across cascaded production paths and controls convergence time by freshness tier, down to the minute level\. The tiered\-freshness design is essentially a hybrid stream\-batch, Lambda\-style architecture \(Marz and Warren, [2015](https://arxiv.org/html/2606.28070#bib.bib106); Akidau et al\., [2015](https://arxiv.org/html/2606.28070#bib.bib105)\), so it must prevent incorrect version overwrites and data loss caused by distributed production, replay, or delayed data\. At the same time, it must guarantee convergence under the eventual\-consistency model used in large distributed stores \(Vogels, [2009](https://arxiv.org/html/2606.28070#bib.bib107); DeCandia et al\., [2007](https://arxiv.org/html/2606.28070#bib.bib108)\)\. We achieve consistency through three mechanisms\. For merge\-conflict resolution, we introduce monotonically increasing version markers: streaming results use event time and batch results use snapshot time\. Based on Hudi’s MVCC and ACID capabilities, the system applies last\-write\-wins semantics at the item\-attribute granularity, so higher versions overwrite lower versions \(Apache Hudi, [2026a](https://arxiv.org/html/2606.28070#bib.bib110); [b](https://arxiv.org/html/2606.28070#bib.bib111); Amazon Web Services, [2026](https://arxiv.org/html/2606.28070#bib.bib109)\)\. For computational consistency, operators are separated into stateless and stateful classes; stateful operators always read the latest item state, and Spark partition\-level atomic commit, together with Flink checkpoint persistence and alignment, provides at\-least\-once recovery from persisted checkpoints \(Zaharia et al\., [2012](https://arxiv.org/html/2606.28070#bib.bib112); Carbone et al\., [2017](https://arxiv.org/html/2606.28070#bib.bib113)\)\. For concurrency control, compute and storage partitions are aligned by item ID, and read\-write locks plus scheduler mutual exclusion prevent read\-write conflicts and duplicate processing\. High\-freshness scenarios therefore converge within minutes, while the remaining data is exposed through daily fully consistent snapshots\.
- Storage\-compute efficiency: physical decoupling and elastic scheduling support stable production over data flows covering tens of billions of items\. Heterogeneous cross\-cluster coordination between Spark/Flink and the vLLM inference cluster depends on three mechanisms\. First, storage and computation are aligned via a unified data contract: offline production is anchored on Parquet\-formatted shards in HDFS, real\-time production is anchored on Kafka partitions, and upstream computation pre\-aggregates by category and SPU to produce strict\-schema shards \(Apache Parquet, [2026](https://arxiv.org/html/2606.28070#bib.bib114); Kreps et al\., [2011](https://arxiv.org/html/2606.28070#bib.bib115)\)\. Second, vLLM workers dynamically claim work shards and reuse the prefix cache, turning fixed assignment into a dynamic pipeline and reducing the impact of straggler nodes \(Kwon et al\., [2023a](https://arxiv.org/html/2606.28070#bib.bib103)\)\. Third, checkpointed operator state is keyed by item ID, so after a failure the system retries only unfinished shards at fine granularity\. This design preserves compute efficiency while preventing data loss\. Overall, this yields a production system that provides on\-demand resource allocation, retryable tasks, and end\-to\-end monitoring\.
- Unified services: a standard service matrix replaces fragmented integrations and supports scalable reuse across business scenarios\. On top of algorithmic and engineering infrastructure, the tunnel unifies offline storage, containerization, inference engines, and service frameworks, and exposes two standard service classes\. Item data services provide online lookup of ontology and item instances, large\-scale relational tables for high\-throughput consumption, and event\-driven full or incremental change streams, covering both static retrieval and dynamic subscription\. Item algorithm services use the item understanding LLMs/VLMs to embed algorithmic capabilities into item\-lifecycle stages such as product listing validation and compliance governance, and expose high\-precision category prediction, fine\-grained attribute recognition, and long\- and short\-title generation as online AI services\. This turns scenario\-specific model use into platform\-level reuse\.
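The merge\-conflict rule in the eventual\-consistency practice above \(version markers plus last\-write\-wins at item\-attribute granularity\) can be illustrated with a small in\-memory sketch\. It is not the Hudi\-based implementation and uses no Hudi API: the `Record` type, the `merge` helper, the integer version encoding, and the choice to ignore equal versions as replays are all assumptions made for illustration\.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Key = Tuple[str, str]  # (item_id, attribute)


@dataclass(frozen=True)
class Record:
    value: str
    # Monotonically increasing version marker: event time for streaming
    # results, snapshot time for batch results (e.g. epoch milliseconds).
    version: int


def merge(store: Dict[Key, Record], item_id: str, attribute: str,
          incoming: Record) -> bool:
    """Apply last-write-wins for one item attribute.

    Returns True if the incoming record was written, False if it was
    discarded as stale or replayed (assumption: equal versions are ignored).
    """
    key = (item_id, attribute)
    current = store.get(key)
    if current is not None and incoming.version <= current.version:
        return False
    store[key] = incoming
    return True


store: Dict[Key, Record] = {}
merge(store, "sku1", "color", Record("red", 100))    # written
merge(store, "sku1", "color", Record("blue", 90))    # delayed data, discarded
merge(store, "sku1", "color", Record("green", 120))  # newer version overwrites
```

Because the comparison is per \(item, attribute\) key, a late batch snapshot for one attribute cannot overwrite a fresher streaming result for the same attribute, while updates to other attributes of the same item are unaffected\.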
## 7 Applications
Oxygen AIIC has constructed millions of ontology entries and hundreds of billions of high\-quality item\-knowledge assets, covering all major JD categories\. To make these core knowledge assets broadly usable across diverse business scenarios, Oxygen AIIC standardizes and productizes its foundational resources through the item tunnel’s unified delivery layer, exposing them as a portfolio of reusable capabilities\. This design closes the loop across scenarios, roles, and feedback channels, making Oxygen AIIC a strategic digital knowledge infrastructure for high\-quality business growth across the group\.
### 7\.1 Consumer\-facing Applications
By improving the quality and richness of item information at the source, Oxygen AIIC has been integrated into core consumer shopping flows\. It supports more precise traffic distribution, improves the shopping experience, and better serves users’ diverse and personalized needs as well as emerging forms of intelligent shopping interaction\.
Search\. Oxygen AIIC has been integrated into core search traffic\-allocation stages, including recall, relevance ranking, and query understanding\. It improves the quality of item information and raises its richness to 3\.35× the previous level, reducing the share of item\-information defects in search by 37% and thereby lowering the overall search bad\-case rate\. The AI Item Library and ontology also support search guidance and faceted filtering, helping users find desired items more quickly and accurately\.
Item detail pages\. Oxygen AIIC improves the content displayed on product detail pages by extracting the core structured information that users care about and presenting it in a form that is easy to read and understand\. Traditional detail pages often contain inconsistent information across touchpoints, repeated content, and overly technical specifications, all of which increase users’ decision\-making cost\. Oxygen AIIC mitigates information conflicts at the source, surfaces intelligent short titles, core selling points, and core attributes, and adds AI explanations, such as product\-function summaries and parameter comparisons\. These capabilities turn dense technical specifications into plain\-language descriptions, helping users quickly grasp an item’s core value and improving both user experience and conversion efficiency\.
AI shopping assistants\. With Oxygen AIIC as the item\-knowledge foundation, AI shopping assistants can access structured and standardized knowledge across categories, which enables more accurate understanding of both items and user intent and drives a smarter shopping experience\. Representative applications include:
- Conversational e\-commerce\. Moving beyond keyword search and static shopping pages, conversational e\-commerce uses large\-scale structured item knowledge to handle fragmented, scenario\-based, and long\-tail shopping needs through natural\-language dialogue\. Users can find items, check specifications, and ask about matching or usage scenarios through casual conversation\.
- AI comparison\. The platform automatically identifies users’ comparison needs and initiates an AI\-powered comparison flow\. It generates structured reports covering product highlights, core parameters, and user reviews, helping users compare candidate items efficiently and make decisions with less effort\.
### 7\.2 Merchant and Business Operations
For item management across JD’s domestic and international businesses, self\-operated and platform sellers, and B2C/B2B/O2O formats, Oxygen AIIC combines the AI Item Library with AI\-based item understanding and generation capabilities to form an end\-to\-end operational loop spanning category planning, product listing, and product operations\. This loop improves operational efficiency, raises data quality, and supports more precise traffic acquisition\.
Category planning\. Through intelligent analysis of platform\-wide data, Oxygen AIIC shortens the decision cycle from two or three weeks to a few days and helps identify category growth opportunities more precisely\. Built on the standardized item\-knowledge system in the AI Item Library, and combined with user behavior, industry trends, and on\- and off\-site market\-performance data, this capability is delivered to merchants and category buyers as a productized tool\. Integrated AI explanations and automated reports surface supply\-demand gaps, category competition, and user\-demand preferences, helping operators identify growth opportunities and design differentiated operating strategies\.
Product listing\. The automated fill rate of core attributes exceeds 80%, improving item\-data quality at the source while reducing operating costs for category buyers and merchants\. With the item understanding LLMs/VLMs as the model foundation and the AI Item Library and ontology as the knowledge foundation, Oxygen AIIC enables AI\-assisted product listing with end\-to\-end information pre\-filling\. Merchants and category buyers only need to upload a main image or a title, and the system automatically performs category recognition, brand recognition, and attribute filling\.
Product operations\. Based on category expertise, Oxygen AIIC optimizes item creatives at scale and increases click\-through rate by about 9%\. Through multimodal learning, the system distills optimal visual standards, including resolution, color, composition, and information hierarchy, from high\-conversion item images\. It then standardizes and enhances image creatives, reorganizes shopping\-guide information, and iterates through A/B testing, achieving click\-through gains at very low compute cost\. Copy optimization reuses the same standards to accumulate industry best practices and support merchant self\-service optimization\. In addition, a fully managed AI service allows category buyers and merchants to delegate bulk creative optimization\.
### 7\.3 Platform Operations
For platform\-side scenarios involving ecosystem governance and platform\-wide resource orchestration, Oxygen AIIC drives a shift from experience\-driven operations to data\-driven and AI\-assisted decision\-making\. It covers core scenarios such as merchant recruitment and assortment review, campaign page construction, audience operations, targeted advertising, product information governance, and price governance\. By linking supply and demand, Oxygen AIIC increases the utilization and value of item assets across the platform, improves operational efficiency and marketplace governance, and supports the long\-term healthy growth of the platform ecosystem\.
Marketing operations\.
- Assortment selection\. The assortment selection platform improves product information quality and richness across all categories, supporting more efficient marketing operations\. Traditional assortment selection is often constrained by poor product information quality and fragmented information systems, leading to incomplete or inaccurate selection results\. The AI Item Library enriches product information across categories and is integrated with the assortment selection platform\. During rule configuration, it recommends categories and attributes to avoid manual omissions, reduce configuration cost, and improve both the richness and precision of assortment results\.
- Audience profiling\. By combining product information with user behavior, Oxygen AIIC supports fine\-grained audience profiling and attribute\-based identification of high\-potential users, improving advertising conversion and lowering customer acquisition cost\. Item attributes accumulated in the AI Item Library are linked with user\-behavior data to characterize segmented customer groups\. These attributes can also support reverse audience discovery, producing precise audience segments, such as high\-potential interest groups and competitor\-intent groups, for on\-site and off\-site advertising, push notifications, and other targeted operations\.
Platform\-ecosystem development\.
- Product information governance\. Through an end\-to\-end item\-information governance loop, we substantially reduce the share of item\-information\-related negative exposure, improving platform\-wide item\-information compliance and quality\. The system establishes full\-lifecycle product information governance, proactively intercepts non\-compliant submissions during product listing, and, after products are listed, uses intelligent inspection tools to automatically verify items across the platform, identify category and attribute errors in bulk, and prompt category buyers and merchants to correct them\.
- Price governance\. Oxygen AIIC provides identical\-item recognition with over 90% accuracy across all physical categories in JD Retail, helping maintain a fair and orderly pricing environment\. Category buyers need to compare prices during sourcing, procurement, and selling\-price decisions to keep prices competitive while protecting profit margins\. At the same time, the platform uses a price\-rating system to guide merchants and category buyers toward more competitive and standardized pricing\. Built on high\-dimensional product information and image\-text information in the AI Item Library, the identical\-item recognition capability supports price governance, identical\-item price comparison, price control during major promotions, and abnormal\-price detection\.
## 8 Related Work
Research on e\-commerce item knowledge falls into four lines: e\-commerce knowledge graphs, ontology expansion, item knowledge production, and e\-commerce domain foundation models\. Existing works have made notable progress, but most focus on isolated point solutions rather than providing a top\-level, systematic solution to the integrated infrastructure required for the production, quality inspection, consumption, and feedback of large\-scale item knowledge\. Without that closed\-loop foundation, knowledge assets cannot be reused and improved efficiently across large\-scale business scenarios\.
### 8\.1 E\-commerce Knowledge Graphs
Early e\-commerce knowledge graph research mainly focused on semantic relations among items, concepts, and user needs\. AliCoCo pointed out that the traditional category–property–value \(CPV\) system is insufficient to express real shopping needs\. Therefore, it introduced “e\-commerce concepts” as intermediate semantic entries to connect high\-level intents such as “outdoor barbecue” and “gifts for the elderly” with item organization \(Luo et al\., [2020](https://arxiv.org/html/2606.28070#bib.bib11)\)\. AliCoCo2 further extended this to e\-commerce commonsense relation modeling, characterizing commonsense mappings among scenarios, concepts, and item features so that graphs can better support search rewriting, recommendation expansion, and semantic matching \(Luo et al\., [2021](https://arxiv.org/html/2606.28070#bib.bib12)\)\.
Subsequently, research began to mine implicit intent knowledge from user behavior\. FolkScope combines the generative capabilities of LLMs with a human\-in\-the\-loop process to distill purchase intents from co\-purchase behavior and build an intent knowledge graph \(Yu et al\., [2023](https://arxiv.org/html/2606.28070#bib.bib66)\)\. COSMO scales e\-commerce commonsense knowledge production to industrial settings through a pipeline of “LLM generation–human\-feedback\-trained critic/classifier–small\-model scalable generation” \(Yu et al\., [2024](https://arxiv.org/html/2606.28070#bib.bib65)\)\. In addition, item knowledge graphs have been applied to item relation modeling, recommendation, and explainable ranking\. Representative studies include systematic graph construction pipelines with taxonomy enrichment, knowledge extraction, and quality control \(Zalmout et al\., [2021](https://arxiv.org/html/2606.28070#bib.bib13)\), as well as relation learning for complements, substitutes, and co\-viewed items based on user behavior and multimodal product information \(Xu et al\., [2020](https://arxiv.org/html/2606.28070#bib.bib14)\)\. Recently, LLM\-PKG further explored distilling recommendation relations and explanations inferred by LLMs into item knowledge graphs to reduce the risk of hallucination in generative models \(Wang et al\., [2024](https://arxiv.org/html/2606.28070#bib.bib15)\)\.
Overall, these works improve the structured representation of item relations, user intents, and e\-commerce commonsense\. However, they focus on individual stages of graph construction or application, leaving closed\-loop coordination among knowledge production, validation, deployment, and feedback insufficiently explored\.
### 8\.2 Ontology Expansion
Ontology and category taxonomy expansion form another important direction in e\-commerce knowledge construction\. Since e\-commerce catalogs involve rapidly changing item types, complex category hierarchies, and dynamically evolving attribute systems, statically and manually maintained taxonomies are difficult to sustain for large\-scale item understanding\. General taxonomy expansion research has proposed various automated methods, such as taxonomy expansion based on hierarchical topic models \(Zhang et al\., [2018](https://arxiv.org/html/2606.28070#bib.bib16)\), HiExpan’s task\-guided taxonomy construction \(Shen et al\., [2018](https://arxiv.org/html/2606.28070#bib.bib17)\), TaxoExpan’s framework for predicting parent\-child nodes when inserting new concepts into an existing taxonomy \(Shen et al\., [2020](https://arxiv.org/html/2606.28070#bib.bib5)\), and STEAM’s mini\-path\-based self\-supervised taxonomy expansion \(Yu et al\., [2020](https://arxiv.org/html/2606.28070#bib.bib18)\)\.
In e\-commerce scenarios, Octet formulates taxonomy expansion as a taxonomy enrichment problem for online item catalogs and uses heterogeneous relations among queries, items, and categories for self\-supervised training to adapt to continuously emerging new item types \(Mao et al\., [2020](https://arxiv.org/html/2606.28070#bib.bib6)\)\. KATIE further focuses on category\-attribute relation discovery, attribute importance modeling, and attribute synonym merging, showing that e\-commerce ontology expansion has shifted from simple entry expansion to fine\-grained schema learning for search, recommendation, and item understanding \(Er\-Rahmadi et al\., [2023](https://arxiv.org/html/2606.28070#bib.bib7)\)\.
Overall, related studies mainly address how to expand an ontology and which attributes are applicable to a given category\. However, these methods model ontology evolution, attribute schema recognition, and SKU\-level knowledge completion separately, making it difficult to achieve efficient system\-level coordination\.
### 8\.3 Item Knowledge Production
One core task of item knowledge production is to automatically extract attribute values from titles, descriptions, parameter tables, and images\. OpenTag formulates attribute value extraction as an open\-world extraction problem, eliminating the dependence on closed\-set dictionaries \(Zheng et al\., [2018](https://arxiv.org/html/2606.28070#bib.bib8)\)\. SUOpenTag extends extraction tasks to thousands of attributes for industrial\-scale scenarios \(Xu et al\., [2019](https://arxiv.org/html/2606.28070#bib.bib19)\)\. AVEQA formalizes attribute value extraction as a question\-answering task, treating attributes as questions and extracting answer spans from item contexts, thereby improving adaptability to large\-scale and unseen attributes \(Wang et al\., [2020a](https://arxiv.org/html/2606.28070#bib.bib9)\)\.
As item detail pages contain increasingly rich sources of information, research has further shifted toward multi\-source and multimodal attribute extraction\. MAVE constructs a large\-scale multi\-source item attribute value extraction dataset \(Yang et al\., [2022](https://arxiv.org/html/2606.28070#bib.bib20)\)\. Subsequent studies jointly model text and image signals for attribute prediction and attribute value extraction \(Zhu et al\., [2020](https://arxiv.org/html/2606.28070#bib.bib21)\), and further explore cross\-category visual attribute extraction \(Lin et al\., [2021](https://arxiv.org/html/2606.28070#bib.bib22)\), structured multimodal Transformer encoding \(Wang et al\., [2022](https://arxiv.org/html/2606.28070#bib.bib23)\), and large\-scale multimodal attribute extraction based on generative question answering \(Khandelwal et al\., [2023](https://arxiv.org/html/2606.28070#bib.bib24)\)\. These works show that item knowledge completion has gradually evolved from text sequence labeling into a unified extraction problem involving open attributes, multi\-source inputs, and multimodal fusion\.
Beyond extraction, industrial\-scale item knowledge systems must also determine whether extraction results are trustworthy\. Existing work formulates attribute value verification as a low\-resource task to identify whether attribute values are consistent with item descriptions \(Wang et al\., [2020c](https://arxiv.org/html/2606.28070#bib.bib25)\)\. In recent years, LLMs have also been used for attribute extraction, normalization, and catalog quality governance\. Representative studies include evaluations of GPT\-series models for item attribute extraction and normalization \(Brinkmann et al\., [2024](https://arxiv.org/html/2606.28070#bib.bib26)\), CatalogRAG for retrieval\-augmented LLM\-based attribute completion \(Zhang et al\., [2025](https://arxiv.org/html/2606.28070#bib.bib10)\), combining brand knowledge bases with LLM agents for attribute repair and item matching \(Ceker et al\., [2026](https://arxiv.org/html/2606.28070#bib.bib27)\), and multimodal self\-correcting instruction tuning for open attribute discovery \(Li et al\., [2025](https://arxiv.org/html/2606.28070#bib.bib28)\)\.
Overall, attribute extraction and knowledge completion have progressed from open extraction and large\-scale label expansion to multi\-source modeling, multimodal fusion, and automatic validation\. However, most methods devote limited attention to production and management efficiency at massive scale, which limits their feasibility in industrial deployments\.
### 8\.4 E\-commerce Domain Foundation Models
As large language models and large multimodal models have advanced, researchers have begun to build e\-commerce domain foundation models to reduce fragmentation across task\-specific models\. LLaMA\-E introduces instruction tuning into e\-commerce scenarios, covering tasks such as ad generation, title rewriting, item classification, purchase intent inference, and question answering \(Shi et al\., [2025](https://arxiv.org/html/2606.28070#bib.bib2)\)\. eCeLLM further constructs the ECInstruct dataset, covering attribute extraction, item relation prediction, item matching, sentiment analysis, sequential recommendation, query\-item ranking, item question answering, and other tasks, demonstrating the generalization capability of instruction\-tuned LLMs in e\-commerce scenarios \(Peng et al\., [2024](https://arxiv.org/html/2606.28070#bib.bib3)\)\.
At the same time, research has also begun to introduce e\-commerce knowledge during pretraining\. LiLiuM adapts the tokenizer, training data, and multilingual capabilities for eBay’s e\-commerce setting \(Herold et al\., [2024](https://arxiv.org/html/2606.28070#bib.bib4)\)\. e\-Llama adapts a general foundation model to the e\-commerce domain through continued pretraining \(Herold et al\., [2025](https://arxiv.org/html/2606.28070#bib.bib29)\)\. Compass\-v3 trains an MoE domain model for multilingual e\-commerce scenarios in Southeast Asia, reflecting the development trend toward multilingual, domain\-specific, and scalable deployment \(Maria, [2025](https://arxiv.org/html/2606.28070#bib.bib30)\)\.
For multimodal item understanding, MOON proposes a generative multimodal representation learning framework for e\-commerce to handle multiple images paired with a single text description, background noise, and multi\-task transfer \(Zhang et al\., [2026a](https://arxiv.org/html/2606.28070#bib.bib31)\)\. MOON2\.0 further improves item multimodal representation learning through dynamic modality balancing, multi\-granularity image\-text alignment, and collaborative image\-text enhancement \(Nie et al\., [2025](https://arxiv.org/html/2606.28070#bib.bib32)\)\. These works show that e\-commerce domain models are moving from single\-task fine\-tuning toward multi\-task, multilingual, multimodal, and scalable foundation models\.
Although e\-commerce domain foundation models have significantly improved item semantic understanding, cross\-task transfer, and cold\-start generalization, they still focus primarily on model capabilities\. In contrast, industrial\-scale item knowledge infrastructure must also address dynamic ontology evolution, knowledge production costs, result verifiability, hybrid online\-offline serving, and downstream feedback loops\. Therefore, integrating domain foundation models, ontology, and the AI Item Library into a sustainably evolving item knowledge system remains a key challenge for both research and industrial deployment\.
## 9 Conclusion
This paper presents the JD Oxygen AI Item Center \(Oxygen AIIC\), an industrial\-scale, LLM/VLM\-centric infrastructure for item knowledge production and serving in large\-scale e\-commerce scenarios\. Oxygen AIIC establishes an end\-to\-end system spanning ontology construction, item knowledge production, model evolution, item tunnel, and downstream business applications\.
For ontology engineering, Oxygen AIIC combines expert domain knowledge with the generalization and reasoning capabilities of large models through efficient human–AI collaboration\. This enables the dynamic discovery, fusion, validation, and continuous expansion of the ontology, forming a high\-quality, comprehensive, and timely updated item knowledge backbone\.
For the AI Item Library, Oxygen AIIC adopts a “semantic search then discrimination” architecture, decoupling the dynamic ontology from model parameters so that newly added ontology entries can be quickly incorporated into the production pipeline\. With computational load reduction, cache reuse, and asynchronous pipeline parallelism, Oxygen AIIC achieves high\-throughput and low\-cost item knowledge production at the scale of tens of billions of SKUs\.
For the model system, Oxygen AIIC builds a multi\-task large model for item understanding and combines incremental learning, instruction\-following representations, and model self\-evolution\. This allows the model to steadily improve in a constantly changing item ecosystem, repair long\-tail defects, and avoid systemic degradation\.
On the engineering side, the item tunnel combines tiered freshness pipelines, eventual consistency guarantees, and a unified service framework to deliver AI\-generated item knowledge efficiently, securely, and reliably to a wide range of business scenarios, including search, recommendation, product listing, governance, operations, merchant recruitment and assortment review, and identical\-item recognition\.
Oxygen AIIC currently supports knowledge production across tens of thousands of categories and tens of billions of SKUs at JD, accumulating hundreds of billions of item\-knowledge assets and delivering significant business impact in item management, traffic distribution, and platform operations\. Our practice shows that, for industrial\-scale e\-commerce knowledge construction, relying solely on isolated model capabilities is insufficient for scalable deployment\. Only by systematically coordinating large\-model capabilities, ontology engineering, knowledge production, engineering systems, and business feedback loops can we build a scalable and continuously evolving item knowledge infrastructure\. The development of Oxygen AIIC validates the industrial feasibility of LLMs/VLMs for large\-scale e\-commerce item understanding and provides a reusable technical path for future AI\-driven item management, intelligent operations, and e\-commerce infrastructure upgrades\.
## 10 Limitations and Future Work
Although Oxygen AIIC V1 has delivered significant results at its current stage of development, ultra\-large\-scale industrial deployment still poses several fundamental challenges\. \(1\) Ontology engineering: the current version has substantially expanded the ontology’s conceptual coverage\. However, relation modeling within the ontology remains preliminary and has not yet reached scale\. This limits Oxygen AIIC’s potential to empower applications and makes it difficult to store and use JD’s accumulated industry knowledge efficiently\. \(2\) AI Item Library: although the generated knowledge is already of high quality, the scale of tens of billions of items inevitably leads to numerous online failure cases that affect user experience\. Making these defects detectable and correctable in a timely manner therefore remains the central challenge\. \(3\) Item understanding LLMs/VLMs: supervised fine\-tuning can lead to catastrophic forgetting\. The key challenge is to improve domain\-specific capabilities without degrading the model’s general capabilities\.
In future work, we will address these challenges along three directions\. First, we will strengthen ontology engineering by expanding relation modeling, enriching the ontology, and enabling graph\-based reasoning\. We will further incorporate expert e\-commerce knowledge and industry know\-how into Oxygen AIIC, so that domain experience can be reused efficiently across business scenarios\. Second, we will develop an online mechanism for bad\-case discovery\. Coupled with a data flywheel and model self\-evolution, this mechanism will allow knowledge consumption to feed back into and drive knowledge production\. Finally, we will investigate more effective ways to mitigate catastrophic forgetting during domain adaptation\. Potential directions include scaling the number of experts in MoE architectures, simplifying task complexity through improved problem formulation, and enhancing prompts with explicit knowledge injection\.
## References
- A\. Agrawal, N\. Kedia, A\. Panwar, J\. Mohan, N\. Kwatra, B\. Gulavani, A\. Tumanov, and R\. Ramjee \(2024\) Taming Throughput\-Latency tradeoff in LLM inference with Sarathi\-Serve\. In 18th USENIX Symposium on Operating Systems Design and Implementation \(OSDI 24\), Santa Clara, CA, pp\. 117–134\. External Links: ISBN 978\-1\-939133\-40\-3, [Link](https://www.usenix.org/conference/osdi24/presentation/agrawal)\. Cited by: [1st item](https://arxiv.org/html/2606.28070#S6.I1.i1.p1.1)\.
- T\. Akidau, R\. Bradshaw, C\. Chambers, S\. Chernyak, R\. J\. Fernández\-Moctezuma, R\. Lax, S\. McVeety, D\. Mills, F\. Perry, E\. Schmidt, and S\. Whittle \(2015\) The dataflow model: a practical approach to balancing correctness, latency, and cost in massive\-scale, unbounded, out\-of\-order data processing\. Proceedings of the VLDB Endowment 8\(12\), pp\. 1792–1803\. External Links: [Document](https://dx.doi.org/10.14778/2824032.2824076), [Link](https://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf)\. Cited by: [1st item](https://arxiv.org/html/2606.28070#S6.I1.i1.p1.1), [2nd item](https://arxiv.org/html/2606.28070#S6.I1.i2.p1.1)\.
- Amazon Web Services \(2026\) Global tables: how it works \- Amazon DynamoDB\. Note: [https://docs\.aws\.amazon\.com/amazondynamodb/latest/developerguide/globaltables\_HowItWorks\.html](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/globaltables_HowItWorks.html)\. Accessed 2026\-06\-10\. Cited by: [2nd item](https://arxiv.org/html/2606.28070#S6.I1.i2.p1.1)\.
- Apache Hudi \(2026a\) Concurrency control\. Note: [https://hudi\.apache\.org/docs/concurrency\_control/](https://hudi.apache.org/docs/concurrency_control/)\. Version 1\.2\.0; accessed 2026\-06\-10\. Cited by: [2nd item](https://arxiv.org/html/2606.28070#S6.I1.i2.p1.1)\.
- Apache Hudi \(2026b\) Record merger\. Note: [https://hudi\.apache\.org/docs/record\_merger/](https://hudi.apache.org/docs/record_merger/)\. Version 1\.2\.0; accessed 2026\-06\-10\. Cited by: [2nd item](https://arxiv.org/html/2606.28070#S6.I1.i2.p1.1)\.
- Apache Parquet \(2026\) Overview\. Note: [https://parquet\.apache\.org/docs/overview/](https://parquet.apache.org/docs/overview/)\. Accessed 2026\-06\-10\. Cited by: [3rd item](https://arxiv.org/html/2606.28070#S6.I1.i3.p1.1)\.
- H\. Babaei Giglou, J\. D’Souza, and S\. Auer \(2023\) LLMs4OL: large language models for ontology learning\. In International semantic web conference, pp\. 408–427\. Cited by: [§3\.2\.1](https://arxiv.org/html/2606.28070#S3.SS2.SSS1.p2.1)\.
- J\. Bai, W\. Fan, Q\. Hu, Q\. Zong, C\. Li, H\. T\. Tsang, H\. Luo, Y\. Yim, H\. Huang, X\. Zhou, et al\. \(2025\) Autoschemakg: autonomous knowledge graph construction through dynamic schema induction from web\-scale corpora\. arXiv preprint arXiv:2505\.23628\. Cited by: [§3\.2\.2](https://arxiv.org/html/2606.28070#S3.SS2.SSS2.p1.1)\.
- M\. Biesialska, K\. Biesialska, and M\. R\. Costa\-Jussa \(2020\) Continual lifelong learning in natural language processing: a survey\. In Proceedings of the 28th international conference on computational linguistics, pp\. 6523–6541\. Cited by: [§5\.2](https://arxiv.org/html/2606.28070#S5.SS2.p2.1)\.
- A\. Brinkmann, N\. Baumann, and C\. Bizer \(2024\) Using llms for the extraction and normalization of product attribute values\. In Advances in Databases and Information Systems, Lecture Notes in Computer Science, pp\. 217–230\. External Links: [Document](https://dx.doi.org/10.1007/978-3-031-70626-4%5F15)\. Cited by: [§8\.3](https://arxiv.org/html/2606.28070#S8.SS3.p3.1)\.
- T\. Brown, B\. Mann, N\. Ryder, M\. Subbiah, J\. D\. Kaplan, P\. Dhariwal, A\. Neelakantan, P\. Shyam, G\. Sastry, A\. Askell, S\. Agarwal, A\. Herbert\-Voss, G\. Krueger, T\. Henighan, R\. Child, A\. Ramesh, D\. Ziegler, J\. Wu, C\. Winter, C\. Hesse, M\. Chen, E\. Sigler, M\. Litwin, S\. Gray, B\. Chess, J\. Clark, C\. Berner, S\. McCandlish, A\. Radford, I\. Sutskever, and D\. Amodei \(2020\) Language models are few\-shot learners\. In Advances in Neural Information Processing Systems, H\. Larochelle, M\. Ranzato, R\. Hadsell, M\.F\. Balcan, and H\. Lin \(Eds\.\), Vol\. 33, pp\. 1877–1901\. Cited by: [§1](https://arxiv.org/html/2606.28070#S1.p4.1)\.
- P\. Carbone, S\. Ewen, G\. Fóra, S\. Haridi, S\. Richter, and K\. Tzoumas \(2017\) State management in apache Flink: consistent stateful distributed stream processing\. Proceedings of the VLDB Endowment 10\(12\), pp\. 1718–1729\. External Links: [Document](https://dx.doi.org/10.14778/3137765.3137777), [Link](https://www.vldb.org/pvldb/vol10/p1718-carbone.pdf)\. Cited by: [2nd item](https://arxiv.org/html/2606.28070#S6.I1.i2.p1.1)\.
- H\. Ceker, G\. Luo, K\. K\. Koo, P\. Mathur, W\. You, A\. Amdekar, R\. Barton, N\. KL, V\. Bansal, and K\. Bouyarmane \(2026\) Using brand knowledge bases and llm agents to enhance e\-commerce retailers’ catalog quality\. In Proceedings of the Nineteenth ACM International Conference on Web Search and Data Mining, pp\. 1343–1344\. External Links: [Document](https://dx.doi.org/10.1145/3773966.3784969)\. Cited by: [§8\.3](https://arxiv.org/html/2606.28070#S8.SS3.p3.1)\.
- W\. Chen, K\. Shinzato, N\. Yoshinaga, and Y\. Xia \(2023\) Does named entity recognition truly not scale up to real\-world product attribute extraction?\. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track, pp\. 152–159\. Cited by: [§4\.1](https://arxiv.org/html/2606.28070#S4.SS1.p1.1)\.
- W\. Chen, Y\. Xia, and K\. Shinzato \(2022\) Extreme multi\-label classification with label masking for product attribute value extraction\. In Proceedings of the Fifth Workshop on e\-Commerce and NLP \(ECNLP 5\), pp\. 134–140\. Cited by: [§4\.1](https://arxiv.org/html/2606.28070#S4.SS1.p1.1)\.
- H\. W\. Chung, L\. Hou, S\. Longpre, B\. Zoph, Y\. Tay, W\. Fedus, Y\. Li, X\. Wang, M\. Dehghani, S\. Brahma, et al\. \(2024\) Scaling instruction\-finetuned language models\. Journal of Machine Learning Research 25\(70\), pp\. 1–53\. Cited by: [§5\.1](https://arxiv.org/html/2606.28070#S5.SS1.p2.1)\.
- T\. Dao, D\. Fu, S\. Ermon, A\. Rudra, and C\. Ré \(2022\) Flashattention: fast and memory\-efficient exact attention with io\-awareness\. Advances in neural information processing systems 35, pp\. 16344–16359\. Cited by: [2nd item](https://arxiv.org/html/2606.28070#S1.I2.i2.p1.1)\.
- G\. DeCandia, D\. Hastorun, M\. Jampani, G\. Kakulapati, A\. Lakshman, A\. Pilchin, S\. Sivasubramanian, P\. Vosshall, and W\. Vogels \(2007\) Dynamo: amazon’s highly available key\-value store\. In Proceedings of Twenty\-First ACM SIGOPS Symposium on Operating Systems Principles \(SOSP ’07\), New York, NY, USA, pp\. 205–220\. External Links: [Document](https://dx.doi.org/10.1145/1294261.1294281), [Link](https://doi.org/10.1145/1294261.1294281)\. Cited by: [2nd item](https://arxiv.org/html/2606.28070#S6.I1.i2.p1.1)\.
- Y\. Deng, K\. Prasad, R\. Fernandez, P\. Smolensky, V\. Chaudhary, and S\. Shieber \(2023\) Implicit chain of thought reasoning via knowledge distillation\. arXiv preprint arXiv:2311\.01460\. Cited by: [§5\.3](https://arxiv.org/html/2606.28070#S5.SS3.p4.1)\.
- X\. L\. Dong, X\. He, A\. Kan, X\. Li, Y\. Liang, J\. Ma, Y\. E\. Xu, C\. Zhang, T\. Zhao, G\. Blanco Saldana, et al\. \(2020\) Autoknow: self\-driving knowledge collection for products of thousands of types\. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp\. 2724–2734\. Cited by: [§3](https://arxiv.org/html/2606.28070#S3.p1.1)\.
- D\. Edge, H\. Trinh, N\. Cheng, J\. Bradley, A\. Chao, A\. Mody, S\. Truitt, D\. Metropolitansky, R\. O\. Ness, and J\. Larson \(2024\) From local to global: a graph rag approach to query\-focused summarization\. arXiv preprint arXiv:2404\.16130\. Cited by: [§3\.2\.2](https://arxiv.org/html/2606.28070#S3.SS2.SSS2.p1.1)\.
- B\. Er\-Rahmadi, A\. Oncevay, Y\. Ji, and J\. Z\. Pan \(2023\) KATIE: a system for key attributes identification in product knowledge graph construction\. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp\. 3320–3324\. Cited by: [§1](https://arxiv.org/html/2606.28070#S1.p5.1), [§8\.2](https://arxiv.org/html/2606.28070#S8.SS2.p2.1)\.
- N\. Fathallah, A\. Das, S\. D\. Giorgis, A\. Poltronieri, P\. Haase, and L\. Kovriguina \(2024a\) Neon\-gpt: a large language model\-powered pipeline for ontology learning\. In European Semantic Web Conference, pp\. 36–50\. Cited by: [§3\.2\.1](https://arxiv.org/html/2606.28070#S3.SS2.SSS1.p2.1)\.
- N\. Fathallah, S\. Staab, and A\. Algergawy \(2024b\) Llms4life: large language models for ontology learning in life sciences\. arXiv preprint arXiv:2412\.02035\. Cited by: [§3\.2\.1](https://arxiv.org/html/2606.28070#S3.SS2.SSS1.p2.1)\.
- T\. Gao, J\. Fang, X\. Zhang, Z\. Liu, C\. Liu, P\. Liu, and Q\. Jiang \(2026\) InstEmb: instruction\-following embeddings through glimpses of the future\. In Forty\-third International Conference on Machine Learning\. External Links: [Link](https://openreview.net/forum?id=fwWvqAOXMT)\. Cited by: [§5\.3](https://arxiv.org/html/2606.28070#S5.SS3.SSS0.Px1.p1.1)\.
- H\. Gui, J\. Zhang, H\. Ye, and N\. Zhang \(2023\) InstructIE: a Chinese instruction\-based information extraction dataset\. arXiv preprint arXiv:2305\.11527\. Cited by: [§3\.2\.2](https://arxiv.org/html/2606.28070#S3.SS2.SSS2.Px1.p2.4)\.
- J\. Guo, Y\. Li, Z\. Huang, J\. Fang, Z\. Liu, C\. Liu, P\. Liu, and Q\. Jiang \(2026\) Spectral disentanglement and enhancement: a dual\-domain contrastive framework for representation learning\. In Proceedings of the ACM Web Conference 2026, pp\. 4127–4136\. Cited by: [§5\.3](https://arxiv.org/html/2606.28070#S5.SS3.p4.1)\.
- S\. Gururangan, A\. Marasović, S\. Swayamdipta, K\. Lo, I\. Beltagy, D\. Downey, and N\. A\. Smith \(2020\) Don’t stop pretraining: adapt language models to domains and tasks\. In Proceedings of the 58th annual meeting of the association for computational linguistics, pp\. 8342–8360\. Cited by: [1st item](https://arxiv.org/html/2606.28070#S5.I1.i1.p1.1)\.
- R\. Han, C\. Yang, T\. Peng, P\. Tiwari, X\. Wan, L\. Liu, and B\. Wang \(2023\) An empirical study on information extraction using large language models\. arXiv preprint arXiv:2305\.14450\. Cited by: [§3\.2\.2](https://arxiv.org/html/2606.28070#S3.SS2.SSS2.Px1.p2.4)\.
- C\. Herold, M\. Kozielski, T\. Bazazo, P\. Petrushkov, Y\. Versley, S\. H\. Hashemi, P\. Cieplicka, D\. Basaj, and S\. Khadivi \(2025\) Domain adaptation of foundation llms for e\-commerce\. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics \(Volume 6: Industry Track\), pp\. 1039–1049\. External Links: [Document](https://dx.doi.org/10.18653/v1/2025.acl-industry.74)\. Cited by: [§8\.4](https://arxiv.org/html/2606.28070#S8.SS4.p2.1)\.
- C\. Herold, M\. Kozielski, L\. Ekimov, P\. Petrushkov, P\. Vandenbussche, and S\. Khadivi \(2024\) LiLiuM: ebay’s large language models for e\-commerce\. arXiv preprint arXiv:2406\.12023\. Cited by: [§1](https://arxiv.org/html/2606.28070#S1.p5.1), [§8\.4](https://arxiv.org/html/2606.28070#S8.SS4.p2.1)\.
- G\. Hinton, O\. Vinyals, and J\. Dean \(2015\) Distilling the knowledge in a neural network\. arXiv preprint arXiv:1503\.02531\. Cited by: [§5\.3](https://arxiv.org/html/2606.28070#S5.SS3.SSS0.Px1.p2.3)\.
- N\. Houlsby, A\. Giurgiu, S\. Jastrzebski, B\. Morrone, Q\. De Laroussilhe, A\. Gesmundo, M\. Attariyan, and S\. Gelly \(2019\) Parameter\-efficient transfer learning for nlp\. In International conference on machine learning, pp\. 2790–2799\. Cited by: [§5\.2](https://arxiv.org/html/2606.28070#S5.SS2.p2.1)\.
- E\. J\. Hu, Y\. Shen, P\. Wallis, Z\. Allen\-Zhu, Y\. Li, S\. Wang, L\. Wang, and W\. Chen \(2022\) LoRA: low\-rank adaptation of large language models\. In International Conference on Learning Representations\. External Links: [Link](https://openreview.net/forum?id=nZeVKeeFYf9)\. Cited by: [§5\.2](https://arxiv.org/html/2606.28070#S5.SS2.SSS0.Px1.p2.5)\.
- H\. Huang, X\. Bu, H\. Zhou, Y\. Qu, J\. Liu, M\. Yang, B\. Xu, and T\. Zhao \(2025a\) An empirical study of llm\-as\-a\-judge for llm evaluation: fine\-tuned judge model is not a general substitute for gpt\-4\. In Findings of the Association for Computational Linguistics: ACL 2025, pp\. 5880–5895\. Cited by: [§3\.2\.2](https://arxiv.org/html/2606.28070#S3.SS2.SSS2.Px3.p1.1)\.
- Y\. Huang, K\. Ramo, A\. Iovine, M\. Monteiro, S\. Gokalp, A\. Bakshi, H\. Turalic, A\. Kumar, J\. Neumeier, R\. Yates, et al\. \(2025b\) AttributeForge: an agentic llm framework for automated product schema modeling\. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, pp\. 2106–2121\. Cited by: [§3](https://arxiv.org/html/2606.28070#S3.p1.1)\.
- A\. Khandelwal, H\. Mittal, S\. Kulkarni, and D\. Gupta \(2023\) Large scale generative multimodal attribute extraction for e\-commerce attributes\. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics \(Volume 5: Industry Track\), pp\. 305–312\. External Links: [Document](https://dx.doi.org/10.18653/v1/2023.acl-industry.29)\. Cited by: [§8\.3](https://arxiv.org/html/2606.28070#S8.SS3.p2.1)\.
- J\. Kirkpatrick, R\. Pascanu, N\. Rabinowitz, J\. Veness, G\. Desjardins, A\. A\. Rusu, K\. Milan, J\. Quan, T\. Ramalho, A\. Grabska\-Barwinska, et al\. \(2017\) Overcoming catastrophic forgetting in neural networks\. Proceedings of the national academy of sciences 114\(13\), pp\. 3521–3526\. Cited by: [2nd item](https://arxiv.org/html/2606.28070#S5.I1.i2.p1.1)\.
- J\. Kreps, N\. Narkhede, and J\. Rao \(2011\) Kafka: a distributed messaging system for log processing\. In Proceedings of the NetDB 2011 Workshop, pp\. 1–7\. Cited by: [3rd item](https://arxiv.org/html/2606.28070#S6.I1.i3.p1.1)\.
- W\. Kwon, Z\. Li, S\. Zhuang, Y\. Sheng, L\. Zheng, C\. H\. Yu, J\. E\. Gonzalez, H\. Zhang, and I\. Stoica \(2023a\) Efficient memory management for large language model serving with PagedAttention\. In Proceedings of the 29th ACM Symposium on Operating Systems Principles \(SOSP ’23\), New York, NY, USA, pp\. 611–626\. External Links: [Document](https://dx.doi.org/10.1145/3600006.3613165), [Link](https://doi.org/10.1145/3600006.3613165)\. Cited by: [1st item](https://arxiv.org/html/2606.28070#S6.I1.i1.p1.1), [3rd item](https://arxiv.org/html/2606.28070#S6.I1.i3.p1.1)\.
- W\. Kwon, Z\. Li, S\. Zhuang, Y\. Sheng, L\. Zheng, C\. H\. Yu, J\. Gonzalez, H\. Zhang, and I\. Stoica \(2023b\) Efficient memory management for large language model serving with pagedattention\. In Proceedings of the 29th symposium on operating systems principles, pp\. 611–626\. Cited by: [2nd item](https://arxiv.org/html/2606.28070#S1.I2.i2.p1.1), [3rd item](https://arxiv.org/html/2606.28070#S4.I5.i3.p1.1)\.
- J\. Li, J\. Fang, T\. Gao, X\. Zhang, Z\. Liu, C\. Liu, P\. Liu, and Q\. Jiang \(2026\) FANoise: singular value\-adaptive noise modulation for robust multimodal representation learning\. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol\. 40, pp\. 6199–6207\. Cited by: [§5\.3](https://arxiv.org/html/2606.28070#S5.SS3.p4.1)\.
- J\. Li, Y\. Li, X\. Shen, C\. Zhang, G\. Qi, and S\. Bi \(2025\) Open\-world attribute mining for e\-commerce products with multimodal self\-correction instruction tuning\. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\), pp\. 1702–1714\. External Links: [Document](https://dx.doi.org/10.18653/v1/2025.acl-long.85)\. Cited by: [§8\.3](https://arxiv.org/html/2606.28070#S8.SS3.p3.1)\.
- Y\. Liao, W\. Lai, J\. Fang, J\. Guo, X\. Zhang, Z\. Liu, C\. Liu, P\. Liu, and Q\. Jiang \(2026\) GROLE: instance\-level group relative optimization for LoRA experts in incremental learning\. In Findings of the Association for Computational Linguistics: ACL 2026, pp\. 39170–39182\. Cited by: [§5\.2](https://arxiv.org/html/2606.28070#S5.SS2.SSS0.Px2.p2.5)\.
- R\. Lin, X\. He, J\. Feng, N\. Zalmout, Y\. Liang, L\. Xiong, and X\. L\. Dong \(2021\) PAM: understanding product images in cross product category attribute extraction\. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp\. 3262–3270\. External Links: [Document](https://dx.doi.org/10.1145/3447548.3467164)\. Cited by: [§8\.3](https://arxiv.org/html/2606.28070#S8.SS3.p2.1)\.
- A\. S\. Lippolis, M\. Ceriani, S\. Zuppiroli, and A\. G\. Nuzzolese \(2024\) Ontogenia: ontology generation with metacognitive prompting in large language models\. In European Semantic Web Conference, pp\. 259–265\. Cited by: [§3\.2\.1](https://arxiv.org/html/2606.28070#S3.SS2.SSS1.p2.1)\.
- A\. S\. Lippolis, M\. J\. Saeedizade, R\. Keskisärkkä, S\. Zuppiroli, M\. Ceriani, A\. Gangemi, E\. Blomqvist, and A\. G\. Nuzzolese \(2025\) Ontology generation using large language models\. In European Semantic Web Conference, pp\. 321–341\. Cited by: [§3\.2\.1](https://arxiv.org/html/2606.28070#S3.SS2.SSS1.p2.1)\.
- H\. Liu, C\. Li, Q\. Wu, and Y\. J\. Lee \(2023\) Visual instruction tuning\. Advances in neural information processing systems 36, pp\. 34892–34916\. Cited by: [§1](https://arxiv.org/html/2606.28070#S1.p4.1)\.
- Y\. Lu, Q\. Liu, D\. Dai, X\. Xiao, H\. Lin, X\. Han, L\. Sun, and H\. Wu \(2022\) Unified structure generation for universal information extraction\. In Proceedings of the 60th annual meeting of the association for computational linguistics \(volume 1: long papers\), pp\. 5755–5772\. Cited by: [§3\.2\.2](https://arxiv.org/html/2606.28070#S3.SS2.SSS2.Px1.p2.4)\.
- X\. Luo, L\. Bo, J\. Wu, L\. Li, Z\. Luo, Y\. Yang, and K\. Yang \(2021\) Alicoco2: commonsense knowledge extraction, representation and application in e\-commerce\. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp\. 3385–3393\. Cited by: [§1](https://arxiv.org/html/2606.28070#S1.p5.1), [§3](https://arxiv.org/html/2606.28070#S3.p1.1), [§8\.1](https://arxiv.org/html/2606.28070#S8.SS1.p1.1)\.
- X\. Luo, L\. Liu, Y\. Yang, L\. Bo, Y\. Cao, J\. Wu, Q\. Li, K\. Yang, and K\. Q\. Zhu \(2020\) Alicoco: alibaba e\-commerce cognitive concept net\. In Proceedings of the 2020 ACM SIGMOD international conference on management of data, pp\. 313–327\. Cited by: [§1](https://arxiv.org/html/2606.28070#S1.p3.1), [§1](https://arxiv.org/html/2606.28070#S1.p5.1), [§3](https://arxiv.org/html/2606.28070#S3.p1.1), [§8\.1](https://arxiv.org/html/2606.28070#S8.SS1.p1.1)\.
- A\. Madaan, N\. Tandon, P\. Gupta, S\. Hallinan, L\. Gao, S\. Wiegreffe, U\. Alon, N\. Dziri, S\. Prabhumoye, Y\. Yang, et al\. \(2023\) Self\-refine: iterative refinement with self\-feedback\. Advances in neural information processing systems 36, pp\. 46534–46594\. Cited by: [§5\.4](https://arxiv.org/html/2606.28070#S5.SS4.p2.1)\.
- Y\. Mao, T\. Zhao, A\. Kan, C\. Zhang, X\. L\. Dong, C\. Faloutsos, and J\. Han \(2020\) Octet: online catalog taxonomy enrichment with self\-supervision\. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, pp\. 2247–2257\. Cited by: [§1](https://arxiv.org/html/2606.28070#S1.p5.1), [§8\.2](https://arxiv.org/html/2606.28070#S8.SS2.p2.1)\.
- S\. Maria \(2025\) Compass\-v3: scaling domain\-specific llms for multilingual e\-commerce in southeast asia\. arXiv preprint arXiv:2509\.09121\. Cited by: [§8\.4](https://arxiv.org/html/2606.28070#S8.SS4.p2.1)\.
- N\. Marz and J\. Warren \(2015\) Big data: principles and best practices of scalable realtime data systems\. Manning Publications, Shelter Island, NY\. External Links: ISBN 9781617290343\. Cited by: [2nd item](https://arxiv.org/html/2606.28070#S6.I1.i2.p1.1)\.
- P\. Mateiu and A\. Groza \(2023\) Ontology engineering with large language models\. In 2023 25th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing \(SYNASC\), pp\. 226–229\. Cited by: [§3\.2\.1](https://arxiv.org/html/2606.28070#S3.SS2.SSS1.p2.1)\.
- M\. McCloskey and N\. J\. Cohen \(1989\) Catastrophic interference in connectionist networks: the sequential learning problem\. In Psychology of learning and motivation, Vol\. 24, pp\. 109–165\. Cited by: [2nd item](https://arxiv.org/html/2606.28070#S5.I1.i2.p1.1)\.
- F\. Murtagh and P\. Legendre \(2014\) Ward’s hierarchical agglomerative clustering method: which algorithms implement ward’s criterion?\. Journal of classification 31\(3\), pp\. 274–295\. Cited by: [2nd item](https://arxiv.org/html/2606.28070#S3.I3.i2.p1.1)\.
- Z\. Nie, C\. Fu, D\. Zhang, J\. Wu, W\. Guan, P\. Wang, J\. Xu, and B\. Zheng \(2025\) MOON2\.0: dynamic modality\-balanced multimodal representation learning for e\-commerce product understanding\. arXiv preprint arXiv:2511\.12449\. Cited by: [§8\.4](https://arxiv.org/html/2606.28070#S8.SS4.p3.1)\.
- P\. Nigam, Y\. Song, V\. Mohan, V\. Lakshman, W\. Ding, A\. Shingavi, C\. H\. Teo, H\. Gu, and B\. Yin \(2019\) Semantic product search\. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp\. 2876–2885\. Cited by: [1st item](https://arxiv.org/html/2606.28070#S1.I1.i1.p1.1)\.
- A\. N\. Nikolakopoulos, S\. Kaul, S\. K\. Gade, B\. Dubrov, U\. Batur, and S\. A\. Khan \(2023\) Sage: structured attribute value generation for billion\-scale product catalogs\. arXiv preprint arXiv:2309\.05920\. Cited by: [§4\.1](https://arxiv.org/html/2606.28070#S4.SS1.p1.1)\.
- A\. v\. d\. Oord, Y\. Li, and O\. Vinyals \(2018\) Representation learning with contrastive predictive coding\. arXiv preprint arXiv:1807\.03748\. Cited by: [§3\.2\.2](https://arxiv.org/html/2606.28070#S3.SS2.SSS2.Px2.p5.3)\.
- L\. Ouyang, J\. Wu, X\. Jiang, D\. Almeida, C\. Wainwright, P\. Mishkin, C\. Zhang, S\. Agarwal, K\. Slama, A\. Ray, et al\. \(2022\) Training language models to follow instructions with human feedback\. Advances in neural information processing systems 35, pp\. 27730–27744\. Cited by: [§1](https://arxiv.org/html/2606.28070#S1.p4.1)\.
- B\. Peng, X\. Ling, Z\. Chen, H\. Sun, and X\. Ning \(2024\) ECeLLM: generalizing large language models for e\-commerce from large\-scale, high\-quality instruction data\. In Forty\-first International Conference on Machine Learning\. External Links: [Link](https://openreview.net/forum?id=LWRI4uPG2X)\. Cited by: [§1](https://arxiv.org/html/2606.28070#S1.p5.1), [§8\.4](https://arxiv.org/html/2606.28070#S8.SS4.p1.1)\.
- A\. Radford, J\. W\. Kim, C\. Hallacy, A\. Ramesh, G\. Goh, S\. Agarwal, G\. Sastry, A\. Askell, P\. Mishkin, J\. Clark, et al\. \(2021\) Learning transferable visual models from natural language supervision\. In International conference on machine learning, pp\. 8748–8763\. Cited by: [§1](https://arxiv.org/html/2606.28070#S1.p4.1)\.
- C\. Raffel, N\. Shazeer, A\. Roberts, K\. Lee, S\. Narang, M\. Matena, Y\. Zhou, W\. Li, and P\. J\. Liu \(2020\) Exploring the limits of transfer learning with a unified text\-to\-text transformer\. Journal of machine learning research 21\(140\), pp\. 1–67\. Cited by: [§5\.1](https://arxiv.org/html/2606.28070#S5.SS1.p1.2)\.
- S\. Ruder \(2017\) An overview of multi\-task learning in deep neural networks\. arXiv preprint arXiv:1706\.05098\. Cited by: [§5\.1](https://arxiv.org/html/2606.28070#S5.SS1.p1.2)\.
- K\. Sabeh, R\. Litschko, M\. Kacimi, B\. Plank, and J\. Gamper \(2024\) An empirical comparison of generative approaches for product attribute\-value identification\. arXiv preprint arXiv:2407\.01137\. Cited by: [§4\.1](https://arxiv.org/html/2606.28070#S4.SS1.p1.1)\.
- M\. J\. Saeedizade and E\. Blomqvist \(2024\) Navigating ontology development with large language models\. In European semantic web conference, pp\. 143–161\. Cited by: [§3\.2\.1](https://arxiv.org/html/2606.28070#S3.SS2.SSS1.p2.1)\.
- P\. Schoenegger, I\. Tuminauskaite, P\. S\. Park, R\. V\. S\. Bastos, and P\. E\. Tetlock \(2024\) Wisdom of the silicon crowd: llm ensemble prediction capabilities rival human crowd accuracy\. Science Advances 10\(45\), pp\. eadp1528\. Cited by: [§3\.2\.2](https://arxiv.org/html/2606.28070#S3.SS2.SSS2.Px3.p5.2)\.
- Z\. Shao, P\. Wang, Q\. Zhu, R\. Xu, J\. Song, X\. Bi, H\. Zhang, M\. Zhang, Y\. Li, et al\. \(2024\) Deepseekmath: pushing the limits of mathematical reasoning in open language models\. arXiv preprint arXiv:2402\.03300\. Cited by: [§5\.2](https://arxiv.org/html/2606.28070#S5.SS2.SSS0.Px2.p3.3)\.
- J\. Shen, Z\. Shen, C\. Xiong, C\. Wang, K\. Wang, and J\. Han \(2020\) TaxoExpan: self\-supervised taxonomy expansion with position\-enhanced graph neural network\. In Proceedings of the web conference 2020, pp\. 486–497\. Cited by: [§1](https://arxiv.org/html/2606.28070#S1.p5.1), [§8\.2](https://arxiv.org/html/2606.28070#S8.SS2.p1.1)\.
- J\. Shen, Z\. Wu, D\. Lei, C\. Zhang, X\. Ren, M\. T\. Vanni, B\. M\. Sadler, and J\. Han \(2018\) HiExpan: task\-guided taxonomy construction by hierarchical tree expansion\. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp\. 2180–2189\. External Links: [Document](https://dx.doi.org/10.1145/3219819.3220115)\. Cited by: [§8\.2](https://arxiv.org/html/2606.28070#S8.SS2.p1.1)\.
- K\. Shi, X\. Sun, D\. Wang, Y\. Fu, G\. Xu, and Q\. Li \(2025\) LLaMA\-e: empowering e\-commerce authoring with object\-interleaved instruction following\. In Proceedings of the 31st International Conference on Computational Linguistics, pp\. 870–885\. Cited by: [§1](https://arxiv.org/html/2606.28070#S1.p5.1), [§8\.4](https://arxiv.org/html/2606.28070#S8.SS4.p1.1)\.
- K\. Shinzato, N\. Yoshinaga, Y\. Xia, and W\. Chen \(2022\) Simple and effective knowledge\-driven query expansion for qa\-based product attribute extraction\. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, pp\. 227–234\. Cited by: [§4\.1](https://arxiv.org/html/2606.28070#S4.SS1.p1.1)\.
- K\. Shinzato, N\. Yoshinaga, Y\. Xia, and W\. Chen \(2023\) A unified generative approach to product attribute\-value identification\. In Findings of the Association for Computational Linguistics: ACL 2023, pp\. 6599–6612\. Cited by: [§4\.1](https://arxiv.org/html/2606.28070#S4.SS1.p1.1)\.
- Y\. Su, H\. Zou, L\. Sun, T\. Zhang, H\. Yang, C\. L\. Yu, D\. Lo, Q\. Zhang, S\. Han, and J\. Chen \(2025\) Taclr: a scalable and efficient retrieval\-based method for industrial product attribute value identification\. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\), pp\. 31526–31538\. Cited by: [§4\.1](https://arxiv.org/html/2606.28070#S4.SS1.p1.1)\.
- J\. Sun, S\. Qian, Z\. Han, W\. Li, Z\. Qian, D\. Yang, J\. Cao, and G\. Xue \(2025\) LKD\-kgc: domain\-specific kg construction via llm\-driven knowledge dependency parsing\. arXiv preprint arXiv:2505\.24163\. Cited by: [§3\.2\.1](https://arxiv.org/html/2606.28070#S3.SS2.SSS1.p2.1)\.
- Y\. Tiwari, O\. A\. Lone, and M\. Pal \(2025\) OntoRAG: enhancing question\-answering through automated ontology derivation from unstructured knowledge bases\. arXiv preprint arXiv:2506\.00664\. Cited by: [§3\.2\.2](https://arxiv.org/html/2606.28070#S3.SS2.SSS2.p1.1)\.
- W\. Vogels \(2009\) Eventually consistent\. Communications of the ACM 52\(1\), pp\. 40–44\. External Links: [Document](https://dx.doi.org/10.1145/1435417.1435432), [Link](https://doi.org/10.1145/1435417.1435432)\. Cited by: [2nd item](https://arxiv.org/html/2606.28070#S6.I1.i2.p1.1)\.
- J\. Wang, D\. Xiang, J\. Xu, M\. Yi, G\. Gong, Z\. Zhang, H\. Li, P\. Liu, Z\. Chen, K\. Zhang, et al\. \(2026a\) TANDEM: bi\-level data mixture optimization with twin networks\. Advances in Neural Information Processing Systems 38, pp\. 144720–144752\. Cited by: [§5\.4](https://arxiv.org/html/2606.28070#S5.SS4.SSS0.Px4.p2.3)\.
- J\. Wang, D\. Xiang, J\. Xu, Z\. Z\. Zirui Liu, G\. Gong, J\. Fang, C\. Liu, P\. Liu, T\. Liu, K\. Zhang, and Q\. Jiang \(2026b\) BLADE: scalable bi\-level adaptive data selection for llm training\. arXiv preprint arXiv:2606\.18650\. Cited by: [§5\.4](https://arxiv.org/html/2606.28070#S5.SS4.SSS0.Px4.p2.3)\.
- M\. Wang, Y\. Guo, D\. Zhang, J\. Jin, M\. Li, D\. Schonfeld, and S\. Zhou \(2024\) Enabling explainable recommendation in e\-commerce with llm\-powered product knowledge graph\. arXiv preprint arXiv:2412\.01837\. Cited by: [§8\.1](https://arxiv.org/html/2606.28070#S8.SS1.p2.1)\.
- Q\. Wang, L\. Yang, B\. Kanagal, S\. Sanghai, D\. Sivakumar, B\. Shu, Z\. Yu, and J\. Elsas \(2020a\) Learning to extract attribute value from product via question answering: a multi\-task approach\. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, pp\. 47–55\. Cited by: [§1](https://arxiv.org/html/2606.28070#S1.p5.1), [§8\.3](https://arxiv.org/html/2606.28070#S8.SS3.p1.1)\.
- Q\. Wang, L\. Yang, B\. Kanagal, S\. Sanghai, D\. Sivakumar, B\. Shu, Z\. Yu, and J\. Elsas \(2020b\) Learning to extract attribute value from product via question answering: a multi\-task approach\. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, pp\. 47–55\. Cited by: [§4\.1](https://arxiv.org/html/2606.28070#S4.SS1.p1.1)\.
- Q\. Wang, L\. Yang, J\. Wang, J\. Krishnan, B\. Dai, S\. Wang, Z\. Xu, M\. Khabsa, and H\. Ma \(2022\) SMARTAVE: structured multimodal transformer for product attribute value extraction\. In Findings of the Association for Computational Linguistics: EMNLP 2022, pp\. 263–276\. External Links: [Document](https://dx.doi.org/10.18653/v1/2022.findings-emnlp.20)\. Cited by: [§8\.3](https://arxiv.org/html/2606.28070#S8.SS3.p2.1)\.
- X\. Wang, W\. Zhou, C\. Zu, H\. Xia, T\. Chen, Y\. Zhang, R\. Zheng, J\. Ye, Q\. Zhang, T\. Gui, et al\. \(2023\) Instructuie: multi\-task instruction tuning for unified information extraction\. arXiv preprint arXiv:2304\.08085\. Cited by: [§3\.2\.2](https://arxiv.org/html/2606.28070#S3.SS2.SSS2.Px1.p2.4)\.
- Y\. Wang, Y\. E\. Xu, X\. Li, X\. L\. Dong, and J\. Gao \(2020c\) Automatic validation of textual attribute values in e\-commerce catalog by learning with limited labeled data\. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp\. 2533–2541\. External Links: [Document](https://dx.doi.org/10.1145/3394486.3403303)\. Cited by: [§8\.3](https://arxiv.org/html/2606.28070#S8.SS3.p3.1)\.
- K\. Wataoka, T\. Takahashi, and R\. Ri \(2024\) Self\-preference bias in llm\-as\-a\-judge\. arXiv preprint arXiv:2410\.21819\. Cited by: [§3\.2\.2](https://arxiv.org/html/2606.28070#S3.SS2.SSS2.Px3.p1.1)\.
- J\. Wei, M\. Bosma, V\. Zhao, K\. Guu, A\. W\. Yu, B\. Lester, N\. Du, A\. M\. Dai, and Q\. V\. Le \(2022a\) Finetuned language models are zero\-shot learners\. In International Conference on Learning Representations\. External Links: [Link](https://openreview.net/forum?id=gEZrGCozdqR)\. Cited by: [§5\.1](https://arxiv.org/html/2606.28070#S5.SS1.p2.1)\.
- J\. Wei, X\. Wang, D\. Schuurmans, M\. Bosma, F\. Xia, E\. Chi, Q\. V\. Le, D\. Zhou,et al\.\(2022b\)Chain\-of\-thought prompting elicits reasoning in large language models\.Advances in neural information processing systems35,pp\. 24824–24837\.Cited by:[§5\.3](https://arxiv.org/html/2606.28070#S5.SS3.SSS0.Px1.p1.1),[§5\.3](https://arxiv.org/html/2606.28070#S5.SS3.p3.1)\.
- S\. M\. Xie, H\. Pham, X\. Dong, N\. Du, H\. Liu, Y\. Lu, P\. S\. Liang, Q\. V\. Le, T\. Ma, and A\. W\. Yu \(2023\)Doremi: optimizing data mixtures speeds up language model pretraining\.Advances in Neural Information Processing Systems36,pp\. 69798–69818\.Cited by:[§5\.4](https://arxiv.org/html/2606.28070#S5.SS4.SSS0.Px4.p2.3)\.
- D\. Xu, C\. Ruan, E\. Korpeoglu, S\. Kumar, and K\. Achan \(2020\)Product knowledge graph embedding for e\-commerce\.InProceedings of the 13th International Conference on Web Search and Data Mining,pp\. 672–680\.External Links:[Document](https://dx.doi.org/10.1145/3336191.3371778)Cited by:[§8\.1](https://arxiv.org/html/2606.28070#S8.SS1.p2.1)\.
- H\. Xu, W\. Wang, X\. Mao, X\. Jiang, and M\. Lan \(2019\)Scaling up open tagging from tens to thousands: comprehension empowered attribute value extraction from product title\.InProceedings of the 57th Annual Meeting of the Association for Computational Linguistics,pp\. 5214–5223\.External Links:[Document](https://dx.doi.org/10.18653/v1/P19-1514)Cited by:[§4\.1](https://arxiv.org/html/2606.28070#S4.SS1.p1.1),[§8\.3](https://arxiv.org/html/2606.28070#S8.SS3.p1.1)\.
- J\. Yan, N\. Zalmout, Y\. Liang, C\. Grant, X\. Ren, and X\. L\. Dong \(2021\)Adatag: multi\-attribute value extraction from product profiles with adaptive decoding\.InProceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing \(Volume 1: Long Papers\),pp\. 4694–4705\.Cited by:[§4\.1](https://arxiv.org/html/2606.28070#S4.SS1.p1.1)\.
- L\. Yang, Q\. Wang, J\. Wang, X\. Quan, F\. Feng, Y\. Chen, M\. Khabsa, S\. Wang, Z\. Xu, and D\. Liu \(2023\)MixPAVE: mix\-prompt tuning for few\-shot product attribute value extraction\.InFindings of the association for computational linguistics: ACL 2023,pp\. 9978–9991\.Cited by:[§4\.1](https://arxiv.org/html/2606.28070#S4.SS1.p1.1)\.
- L\. Yang, Q\. Wang, Z\. Yu, A\. Kulkarni, S\. Sanghai, B\. Shu, J\. Elsas, and B\. Kanagal \(2022\)MAVE: a product dataset for multi\-source attribute value extraction\.InProceedings of the Fifteenth ACM International Conference on Web Search and Data Mining,pp\. 1256–1265\.External Links:[Document](https://dx.doi.org/10.1145/3488560.3498377)Cited by:[§4\.1](https://arxiv.org/html/2606.28070#S4.SS1.p1.1),[§8\.3](https://arxiv.org/html/2606.28070#S8.SS3.p2.1)\.
- H\. Ye, H\. Gui, X\. Xu, X\. Chen, H\. Chen, and N\. Zhang \(2023\)Schema\-adaptable knowledge graph construction\.InFindings of the Association for Computational Linguistics: EMNLP 2023,pp\. 6408–6431\.Cited by:[§3\.2\.2](https://arxiv.org/html/2606.28070#S3.SS2.SSS2.p1.1)\.
- C\. Yu, X\. Liu, J\. Maia, Y\. Li, T\. Cao, Y\. Gao, Y\. Song, R\. Goutam, H\. Zhang, B\. Yin, and Z\. Li \(2024\)COSMO: a large\-scale e\-commerce common sense knowledge generation and serving system at amazon\.InCompanion of the 2024 International Conference on Management of Data,pp\. 148–160\.Cited by:[§3](https://arxiv.org/html/2606.28070#S3.p1.1),[§8\.1](https://arxiv.org/html/2606.28070#S8.SS1.p2.1)\.
- C\. Yu, W\. Wang, X\. Liu, J\. Bai, Y\. Song, Z\. Li, Y\. Gao, T\. Cao, and B\. Yin \(2023\)FolkScope: intention knowledge graph construction for e\-commerce commonsense discovery\.InFindings of the Association for Computational Linguistics: ACL 2023,pp\. 1173–1191\.Cited by:[§3](https://arxiv.org/html/2606.28070#S3.p1.1),[§8\.1](https://arxiv.org/html/2606.28070#S8.SS1.p2.1)\.
- Y\. Yu, Y\. Li, J\. Shen, H\. Feng, J\. Sun, and C\. Zhang \(2020\)STEAM: self\-supervised taxonomy expansion with mini\-paths\.InProceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining,pp\. 1026–1035\.External Links:[Document](https://dx.doi.org/10.1145/3394486.3403145)Cited by:[§8\.2](https://arxiv.org/html/2606.28070#S8.SS2.p1.1)\.
- M\. Zaharia, M\. Chowdhury, T\. Das, A\. Dave, J\. Ma, M\. McCauley, M\. J\. Franklin, S\. Shenker, and I\. Stoica \(2012\)Resilient distributed datasets: a Fault\-Tolerant abstraction for In\-Memory cluster computing\.In9th USENIX Symposium on Networked Systems Design and Implementation \(NSDI 12\),San Jose, CA,pp\. 15–28\.External Links:ISBN 978\-931971\-92\-8,[Link](https://www.usenix.org/conference/nsdi12/technical-sessions/presentation/zaharia)Cited by:[2nd item](https://arxiv.org/html/2606.28070#S6.I1.i2.p1.1)\.
- N\. Zalmout, C\. Zhang, X\. Li, Y\. Liang, and X\. L\. Dong \(2021\)All you need to know to build a product knowledge graph\.InProceedings of the 27th ACM SIGKDD Conference on knowledge discovery & data mining,pp\. 4090–4091\.Cited by:[§1](https://arxiv.org/html/2606.28070#S1.p5.1),[§8\.1](https://arxiv.org/html/2606.28070#S8.SS1.p2.1)\.
- E\. Zelikman, Y\. Wu, J\. Mu, and N\. Goodman \(2022\)Star: bootstrapping reasoning with reasoning\.Advances in Neural Information Processing Systems35,pp\. 15476–15488\.Cited by:[§5\.4](https://arxiv.org/html/2606.28070#S5.SS4.p2.1)\.
- B\. Zhang and H\. Soh \(2024\)Extract, define, canonicalize: an llm\-based framework for knowledge graph construction\.InProceedings of the 2024 conference on empirical methods in natural language processing,pp\. 9820–9836\.Cited by:[§3\.2\.2](https://arxiv.org/html/2606.28070#S3.SS2.SSS2.p1.1)\.
- B\. Zhang, S\. Khan, and S\. Walter \(2025\)CatalogRAG: retrieval\-guided llm prediction for multilingual e\-commerce product attributes\.Cited by:[§1](https://arxiv.org/html/2606.28070#S1.p5.1),[§8\.3](https://arxiv.org/html/2606.28070#S8.SS3.p3.1)\.
- C\. Zhang, F\. Tao, X\. Chen, J\. Shen, M\. Jiang, B\. Sadler, M\. Vanni, and J\. Han \(2018\)TaxoGen: unsupervised topic taxonomy construction by adaptive term embedding and clustering\.InProceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining,pp\. 2701–2709\.External Links:[Document](https://dx.doi.org/10.1145/3219819.3220064)Cited by:[§8\.2](https://arxiv.org/html/2606.28070#S8.SS2.p1.1)\.
- D\. Zhang, C\. Fu, Z\. Nie, J\. Liu, W\. Guan, Y\. Gao, J\. Song, P\. Wang, J\. Xu, and B\. Zheng \(2026a\)MOON: generative mllm\-based multimodal representation learning for e\-commerce product understanding\.InProceedings of the Nineteenth ACM International Conference on Web Search and Data Mining,pp\. 924–933\.External Links:[Document](https://dx.doi.org/10.1145/3773966.3777958)Cited by:[§8\.4](https://arxiv.org/html/2606.28070#S8.SS4.p3.1)\.
- Z\. Zhang, H\. Li, Y\. Zhang, G\. Gong, J\. Wang, J\. Hu, P\. Liu, and Q\. Jiang \(2026b\)The primacy of magnitude in low\-rank adaptation\.Advances in Neural Information Processing Systems38,pp\. 39–69\.Cited by:[§5\.2](https://arxiv.org/html/2606.28070#S5.SS2.SSS0.Px1.p3.2)\.
- G\. Zheng, S\. Mukherjee, X\. L\. Dong, and F\. Li \(2018\)Opentag: open attribute value extraction from product profiles\.InProceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining,pp\. 1049–1058\.Cited by:[§1](https://arxiv.org/html/2606.28070#S1.p5.1),[§4\.1](https://arxiv.org/html/2606.28070#S4.SS1.p1.1),[§8\.3](https://arxiv.org/html/2606.28070#S8.SS3.p1.1)\.
- L\. Zheng, W\. Chiang, Y\. Sheng, S\. Zhuang, Z\. Wu, Y\. Zhuang, Z\. Lin, Z\. Li, D\. Li, E\. Xing,et al\.\(2023\)Judging llm\-as\-a\-judge with mt\-bench and chatbot arena\.Advances in neural information processing systems36,pp\. 46595–46623\.Cited by:[§5\.4](https://arxiv.org/html/2606.28070#S5.SS4.SSS0.Px1.p2.6)\.
- T\. Zhu, Y\. Wang, H\. Li, Y\. Wu, X\. He, and B\. Zhou \(2020\)Multimodal joint attribute prediction and value extraction for e\-commerce product\.InProceedings of the 2020 Conference on Empirical Methods in Natural Language Processing \(EMNLP\),pp\. 2129–2139\.External Links:[Document](https://dx.doi.org/10.18653/v1/2020.emnlp-main.166)Cited by:[§8\.3](https://arxiv.org/html/2606.28070#S8.SS3.p2.1)\.
- H\. Zou, H\. Yang, Y\. Su, C\. L\. Yu, Q\. Xie, C\. Lian, Q\. Zhang, S\. Han, F\. Huang, and J\. Chen \(2025\)Multi\-value\-product retrieval\-augmented generation for industrial product attribute value identification\.InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track,pp\. 2096–2105\.Cited by:[§4\.1](https://arxiv.org/html/2606.28070#S4.SS1.p3.1)\.