LapidaryEngine: Fully Conversational Crystal Generation

arXiv cs.LG 06/15/26, 04:00 AM Papers
Summary
LapidaryEngine is a new AI model that enables fully conversational generation of crystal materials from free-form natural language, using a pivot representation for bidirectional translation and iterative refinement. It outperforms existing text-to-crystal systems by allowing intuitive, dialogue-like interaction.
arXiv:2606.14215v1 Announce Type: new
Cached at: 06/15/26, 09:12 AM
# LapidaryEngine: Fully Conversational Crystal Generation
Source: [https://arxiv.org/html/2606.14215](https://arxiv.org/html/2606.14215)
Yusei Ito1,2, Yuta Suzuki1, Tomoya Murata1, Masaki Adachi1

1Lattice Lab, Toyota Motor Corporation, 2The University of Osaka

The work was done while the first author was an intern at Lattice Lab, Toyota Motor Corporation\. Corresponding author: tomoya\_murata\_aa@mail\.toyota\.co\.jp

###### Abstract

The emergence of Large Language Models \(LLMs\) has inspired the vision of generating bespoke crystal materials directly from natural\-language instructions, enabling users to design materials through intuitive, conversational interaction\. Existing text\-to\-crystal generative models represent important early steps toward this goal, but they suffer from two critical limitations: \(i\) restricted input formats that require highly structured descriptions \(e\.g\., chemical formulas\), and \(ii\) one\-directional generation, where models can map text → crystal but cannot perform the inverse\. These limitations prevent fully conversational workflows and hinder alignment with users’ inherently ambiguous and evolving desiderata\. We address these challenges with LapidaryEngine, the first model to support fully conversational crystal generation\. LapidaryEngine accepts free\-form natural\-language requests and performs iterative refinement and editing in a dialogue\-like manner\. The key innovation is a pivot representation, a third, intermediate form that enables bidirectional translation between text and crystal structures despite the absence of direct paired datasets\. Leveraging this pivot allows robust interpretation of user feedback and precise structural control\. We demonstrate LapidaryEngine across diverse tasks, including insulator discovery, stability optimization, compositional modification, and structural editing, showcasing its ability to align generated materials with user intent in an interactive manner\.

## 1 Introduction

Given the remarkable success of generative models in image\[[41](https://arxiv.org/html/2606.14215#bib.bib52)\], video\[[32](https://arxiv.org/html/2606.14215#bib.bib53)\], and music synthesis\[[2](https://arxiv.org/html/2606.14215#bib.bib54)\], it is natural to expect the recent breakthroughs in generative modeling to extend to materials design—indeed, the number of AI\-for\-materials papers has rocketed dramatically\[[46](https://arxiv.org/html/2606.14215#bib.bib67),[42](https://arxiv.org/html/2606.14215#bib.bib68),[25](https://arxiv.org/html/2606.14215#bib.bib66),[35](https://arxiv.org/html/2606.14215#bib.bib65),[52](https://arxiv.org/html/2606.14215#bib.bib20)\]\. In particular, with the emergence of human\-level performance in Large Language Models \(LLMs\)\[[39](https://arxiv.org/html/2606.14215#bib.bib13),[14](https://arxiv.org/html/2606.14215#bib.bib14),[50](https://arxiv.org/html/2606.14215#bib.bib15)\], the community is now racing to apply these capabilities to core challenges in scientific fields, including hypothesis generation, experimental planning, and automated scientific reasoning\[[31](https://arxiv.org/html/2606.14215#bib.bib72),[33](https://arxiv.org/html/2606.14215#bib.bib70),[1](https://arxiv.org/html/2606.14215#bib.bib69),[43](https://arxiv.org/html/2606.14215#bib.bib71)\]\. One of the most transformative aspects of LLMs is the ability to formulate scientific problems directly in natural language—tasks that previously required carefully crafted, domain\-specific formalisms for simulations and experiments\. Among these new possibilities, text\-to\-crystal generation stands out: it offers an interface through which users can specify desired materials simply by writing natural\-language descriptions\. Because traditional materials\-science tools—such as atomistic simulations or quantum\-chemical analysis—have long been inaccessible to non\-experts, text\-to\-crystal systems hold the promise of democratizing expert knowledge\. 
Much like how non\-experts can now write novels or create illustrations with generative models, engineers and designers could soon generate bespoke materials tailored to their needs\.

Two early works, Chemeleon and GenMS, represent pioneering steps toward text\-to\-crystal models, demonstrating promising results in their respective applications\[[51](https://arxiv.org/html/2606.14215#bib.bib3),[40](https://arxiv.org/html/2606.14215#bib.bib2)\]\. Chemeleon shows that crystals can be generated from only a vague textual prompt describing the composition, and in the Zn\-Ti\-O system it explores the chemical space to propose new candidate crystal structures beyond the known phases\. GenMS demonstrates that natural language descriptions specifying a crystal family, such as perovskites, can be used to correctly generate crystal structures consistent with that family\. However, these approaches do not yet achieve the goal of democratization\. Why? Because crafting precise instructions itself requires expert\-level or even oracle\-level knowledge\. Natural\-language descriptions can be inherently ambiguous\. For example, consider the prompt: “Generate an insulating material that has not yet been reported in the literature\.” The space of possible answers is vast, leaving open questions about the material family, the required degree of insulation \(e\.g\., bandgap\), and various physical or chemical constraints\. To specify these details, users must already possess a clear and technically grounded design target\. Moreover, at the scientific frontier, even experts operate under uncertainty: they often do not know whether a hypothesized structure can be synthesized or whether it will exhibit the desired properties\. Their design goals typically evolve through iterative trial and error\. Requiring users to provide well\-specified, error\-free instructions a priori therefore conflicts with how real materials discovery proceeds and severely limits applicability\.

A more natural and user\-friendly interface for materials design is iterative refinement, exemplified by the conversational paradigm popularized by ChatGPT\. Users should be able to begin with a vague idea, progressively clarify their requirements as they learn more, and ultimately converge to a well\-specified target through a dialogue\-like interaction\. However, enabling *conversational* crystal generation is far from trivial\. The two existing text\-to\-crystal systems \[[51](https://arxiv.org/html/2606.14215#bib.bib3),[40](https://arxiv.org/html/2606.14215#bib.bib2)\] cannot support such interactive refinement\. This stems from two fundamental limitations:

1. \(a\) Restricted instruction formats\. Existing approaches typically constrain textual inputs to highly structured descriptions, such as chemical formulas, space groups, or symmetry tokens \[[40](https://arxiv.org/html/2606.14215#bib.bib2)\]\. These rigid formats deviate substantially from natural language, requiring expert\-level knowledge to construct valid prompts\. Moreover, they exclude intentionally ambiguous or partially specified requests, which are essential for true democratization and early\-stage ideation\.
2. \(b\) One\-directional modeling\. Current methods map *text* → *crystal*, but not *crystal* → *text* \[[51](https://arxiv.org/html/2606.14215#bib.bib3),[40](https://arxiv.org/html/2606.14215#bib.bib2)\]\. Without a reverse pathway, the model cannot interpret the previously generated structure, assess how it aligns with the user’s evolving intent, or incorporate feedback from earlier rounds\. As a result, iterative refinement is impossible: the system has no mechanism to update or adjust the design based on the output of prior iterations\.

![Refer to caption](https://arxiv.org/html/2606.14215v1/x1.png)
Figure 1: Key idea and example of LapidaryEngine\. \(a\) As there is no dataset that directly links textual descriptions to crystal structures, our approach introduces a pivot representation bridging linguistic and structural modalities\. \(b\) Examples of crystal structures generated by our framework\. Starting from a prompt requesting an insulator with a large bandgap, the model produces an initial structure and iteratively refines it through natural language feedback\.

In response, we propose LapidaryEngine, the *first\-of\-its\-kind* model to enable *fully* conversational, multi\-round refinement of crystal structures\. Our key innovation is the introduction of *a pivot representation*, inspired by classical pivot\-based machine translation\. Just as two languages without parallel data \(e\.g\., Kiswahili and Japanese\) can communicate via a shared third language \(English\), we establish a pivot representation that bridges text and crystal structure\. Crucially, while direct datasets of \(property, crystal\) pairs do not exist, both modalities can be bidirectionally mapped to a structural description, which serves as our pivot representation\. As illustrated in Fig\.[1](https://arxiv.org/html/2606.14215#S1.F1)\(a\), this pivot provides a common semantic ground through which text and crystal structures become mutually interpretable, enabling stable, iterative, conversational refinement for the first time\.

We demonstrated LapidaryEngine on diverse tasks\. Figure[1](https://arxiv.org/html/2606.14215#S1.F1)\(b\) shows the main result\. Starting from a prompt that requests an insulating material \(i\.e\., one with a large bandgap\), an initial structure is generated as a rough hypothesis\. The system then refines this structure through iterative user feedback, beginning with coarse guidance and progressively incorporating the constraints that naturally arise during crystal structure design\. This framework is not posed as a conventional optimization task\. Instead, it allows the user’s desiderata, including preferences, design intentions, and domain\-specific considerations, to be continuously injected and reflected in the evolving structure\. In addition, we conducted two tasks aimed at improving verifiable physical properties\. Each task was repeated 1,000 times, and statistical analysis confirmed that the targeted physical properties were improved\. We open\-source our code and model to the community\.

## 2 Results

Toward fully conversational crystal generation, we show that a pivot representation based on structural descriptions resolves both flexibility and bidirectionality challenges\. The key idea is simple: instead of directly editing the crystal structure—which is discrete, highly constrained, and difficult to manipulate—we edit a textual description that represents the structural information\. This allows all refinement steps to remain entirely within the text domain, where LLMs excel at controlled editing and iterative improvement\. By shifting the problem from the joint \(text, crystal\) space to the text\-only pivot space, we can fully exploit the strengths of LLMs while maintaining precise control over the generated structures\. We demonstrate that this design solves the limitations that existing methods fail to overcome\.

### 2\.1 Pivot representation

Although no generative model currently supports *crystal* → *text*, there exists a *rule\-based* text generator for crystal structures: Robocrystallographer \[[13](https://arxiv.org/html/2606.14215#bib.bib16)\]\. Crucially, because it is rule\-based, its mapping is essentially bijective: a crystal has one corresponding textual description, and vice versa\. This property makes the Robocrystallographer\-style output an ideal pivot representation, enabling unambiguous, bidirectional translation between text and crystal\. Editing the pivot therefore directly controls the crystal structure, addressing both challenges described above\. As illustrated in Fig\.[1](https://arxiv.org/html/2606.14215#S1.F1)\(a\), our workflow proceeds as follows\. We first map the user’s imprecise natural\-language prompt to a precise pivot description using an LLM\. We then employ a GNN\-based diffusion model \[[40](https://arxiv.org/html/2606.14215#bib.bib2)\] trained on paired \(pivot, crystal\) data to generate candidate structures\. Through the pivot, we can interpret ambiguous or high\-level requests, overcoming the limitation of restricted instruction formats \(Issue \(a\)\)\. Moreover, the pivot enables true bidirectional refinement\. After generating a candidate structure, we convert the crystal back into its pivot description and update it based on user feedback\. The refined pivot is then decoded again into a new crystal structure\. This closed\-loop pipeline resolves the second limitation, one\-directional text\-only generation \(Issue \(b\)\), and makes iterative, conversation\-style crystal design possible\.

The full workflow is illustrated in Fig\.[2](https://arxiv.org/html/2606.14215#S2.F2)\. To maximize generative quality, we adopt a Best\-of\-$N$ sampling strategy \[[8](https://arxiv.org/html/2606.14215#bib.bib74),[37](https://arxiv.org/html/2606.14215#bib.bib73)\]: for each pivot description, the model generates $N$ candidate structures, verifies their physical plausibility \(e\.g\., stability indicators, valid compositions\), and selects the candidate most aligned with the input description\. After a structure is produced, it is presented to the user for feedback \(*e\.g\.*, comments like “too distorted,” “replace zirconia with titanium”\) or quantitative metrics \(*e\.g\.*, density, conductivity\)\. The LLM receives this feedback alongside the pivot representation of the previously generated crystal and refines the pivot accordingly\. A new crystal is then generated from the updated pivot, and the whole process repeats until the user is satisfied\.
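The Best-of-$N$ step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Candidate` container, the `best_of_n` and `make_toy_generator` helpers, the scalar `alignment` score, and the "formation energy below zero" plausibility filter are all assumptions made for the example, since the text does not specify how alignment is scored.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    """Hypothetical container for one generated structure."""
    formula: str
    formation_energy: float  # eV/atom, e.g. from a surrogate predictor
    alignment: float         # assumed score: higher = closer to the pivot


def best_of_n(
    generate: Callable[[str], Candidate],
    is_plausible: Callable[[Candidate], bool],
    pivot: str,
    n: int,
) -> Optional[Candidate]:
    """Draw n candidates, drop implausible ones, return the best aligned."""
    candidates: List[Candidate] = [generate(pivot) for _ in range(n)]
    plausible = [c for c in candidates if is_plausible(c)]
    if not plausible:
        return None  # the caller may resample or relax the filter
    return max(plausible, key=lambda c: c.alignment)


def make_toy_generator(seed: int) -> Callable[[str], Candidate]:
    """Toy stand-in for the generative model: returns random scores."""
    rng = random.Random(seed)

    def generate(pivot: str) -> Candidate:
        return Candidate("ABX3", rng.uniform(-2.0, 1.0), rng.random())

    return generate


if __name__ == "__main__":
    chosen = best_of_n(
        make_toy_generator(0),
        lambda c: c.formation_energy < 0.0,  # assumed plausibility check
        "hypothetical pivot description",
        n=16,
    )
    print(chosen)
```

Filtering before ranking keeps an implausible but well-aligned candidate from being selected; returning `None` leaves the resampling policy to the caller.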

Through this iterative loop, the system progressively aligns the generated structures with the user’s evolving desiderata\. In this way, our framework mirrors—and directly augments—the traditional materials discovery workflow, which has long relied on repeated feedback and trial\-and\-error to refine candidate structures\. We provide details of the algorithm and its explanation in Sec\.[4\.1](https://arxiv.org/html/2606.14215#S4.SS1)\.
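The closed loop described in this section can be sketched as a small driver that only ever edits the pivot text. Everything here is an assumption for illustration: `refine_loop` and the five callables are hypothetical stand-ins for the LLM, the generative model, the rule-based describer, and the feedback source, and the toy versions operate on a bare element list rather than a real crystal.

```python
from typing import Callable, List, Tuple


def refine_loop(
    prompt: str,
    text_to_pivot: Callable[[str], str],      # LLM: free-form prompt -> pivot
    pivot_to_crystal: Callable[[str], dict],  # generator: pivot -> crystal
    crystal_to_pivot: Callable[[dict], str],  # rule-based: crystal -> pivot
    get_feedback: Callable[[dict], str],      # user/evaluator; "" = satisfied
    revise_pivot: Callable[[str, str], str],  # LLM: (pivot, feedback) -> pivot
    max_rounds: int = 5,
) -> List[Tuple[str, dict]]:
    """Run generate -> describe -> revise rounds; return (pivot, crystal) history."""
    history: List[Tuple[str, dict]] = []
    pivot = text_to_pivot(prompt)
    for _ in range(max_rounds):
        crystal = pivot_to_crystal(pivot)
        history.append((pivot, crystal))
        feedback = get_feedback(crystal)
        if not feedback:
            break
        # Refinement happens purely in text: describe the crystal, then edit.
        pivot = revise_pivot(crystal_to_pivot(crystal), feedback)
    return history


# Toy stand-ins: a "crystal" is just a dict holding a list of elements.
def toy_text_to_pivot(prompt: str) -> str:
    return "elements: Sr Ti O"


def toy_pivot_to_crystal(pivot: str) -> dict:
    return {"elements": pivot.split(": ")[1].split()}


def toy_crystal_to_pivot(crystal: dict) -> str:
    return "elements: " + " ".join(crystal["elements"])


def toy_feedback(crystal: dict) -> str:
    return "replace Sr with Ca" if "Sr" in crystal["elements"] else ""


def toy_revise(pivot: str, feedback: str) -> str:
    _, old, _, new = feedback.split()  # expects "replace X with Y"
    head, elems = pivot.split(": ")
    return head + ": " + " ".join(new if e == old else e for e in elems.split())


if __name__ == "__main__":
    for step, (pivot, crystal) in enumerate(
        refine_loop("an insulating oxide", toy_text_to_pivot, toy_pivot_to_crystal,
                    toy_crystal_to_pivot, toy_feedback, toy_revise)
    ):
        print(step, pivot, crystal)
```

Passing the components in as callables keeps the loop independent of any particular LLM or generator, and makes it possible to swap human feedback for automated feedback without changing the control flow.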

![Refer to caption](https://arxiv.org/html/2606.14215v1/x2.png)
Figure 2: Overview of LapidaryEngine, a feedback\-enabled framework for text\-guided crystal structure generation\. The LLM translates the prompt provided by the user into a pivot structure description, and the GNN\-based generative model generates a crystal structure according to this description\. Based on the previously generated crystal structure and the user’s feedback, the LLM then creates a description of the structure for the next generation\. This approach enables the framework to leverage the LLM’s knowledge of materials science and the GNN\-based model’s geometric reasoning capability\.

### 2\.2 Quantitative analysis

![Refer to caption](https://arxiv.org/html/2606.14215v1/x3.png)
Figure 3: Average formation energy and bandgap with standard deviation at each feedback iteration for \(a\) stability\-focused generation and \(b\) bandgap\-focused generation\. For comparison, we also present the results from property\-only feedback and those obtained when the LLM directly generates complete crystal structure information\. The proposed method progressively improves the targeted properties across multiple iterations and outperforms the other approaches\. These results show that it makes effective use of feedback and highlight the combined strengths of the LLM and the GNN crystal generator\. Data points are connected by lines for visual clarity\.

To quantitatively evaluate the effectiveness of the feedback mechanism, we tested LapidaryEngine in two cases: \(a\) generation focusing on structural stability and \(b\) generation focusing on a large bandgap\. Two types of prompts were used for generation:

1. \(a\) Stability\-focused generation: Generate a highly stable material with low formation energy that has not yet been reported\.
2. \(b\) Bandgap\-focused generation: Generate a large bandgap material that has not yet been reported and is stable \(i\.e\., has a low formation energy\)\.

For each prompt type, we generated crystal structures and conducted five rounds of feedback\. We then analyzed the feedback trajectory, confirming that the formation energy progressively decreased in the stability\-focused case and that the bandgap increased in the bandgap\-focused case\.

To enable rapid feedback cycles required for statistically reliable performance evaluation, we calculated material properties using an ML model and automatically generated feedback with an LLM\. Specifically, we employed CrystalFramer\[[21](https://arxiv.org/html/2606.14215#bib.bib30)\], a crystal property predictor trained on the MEGNet dataset\[[7](https://arxiv.org/html/2606.14215#bib.bib28)\], instead of performing computationally expensive DFT calculations\[[19](https://arxiv.org/html/2606.14215#bib.bib60),[29](https://arxiv.org/html/2606.14215#bib.bib59)\]\. The pretrained weights provided by the authors were used for both formation energy and bandgap prediction\. The feedback processes were conducted using Alibaba’s Qwen3\-Next\-80B\-A3B\-Thinking model\[[50](https://arxiv.org/html/2606.14215#bib.bib15)\], and the prompt is provided in Appendix[A](https://arxiv.org/html/2606.14215#A1)\. The LLM was provided with the generated structure’s Robocrystallographer description and the property prediction results from the ML model, and it was then prompted to give feedback based on them\. We also adopted the same Qwen3 model for the generating process\. We used the LLMs without any fine\-tuning, and we selected Qwen3 since it runs locally, allows the environment to stay consistent for reproducible results, and is released under the Apache 2\.0 license, which makes it easy to use\. Please note that while the same model was used, the feedback and generation sessions were conducted separately, without shared conversation logs, to replicate human feedback conditions\.

Previous methods generate crystal structures in a single step without incorporating feedback\. Accordingly, the case with one feedback iteration corresponds to the baseline\. We further compare our method with two additional baseline settings\. *Property\-only feedback*: we replace the LLM feedback with only the property values predicted by CrystalFramer, to assess whether the model can leverage the linguistic feedback\. *LLM only*: we ask the LLM to directly generate complete crystal structures in the Crystallographic Information File \(CIF\) format, which is a domain\-specific text representation of the full three\-dimensional atomic configuration\. This baseline corresponds to a simple approach where the entire generation process, including atomic coordinates, is carried out only in the language space without using the GNN\-based generator, which is better suited to capturing complex three\-dimensional atomic arrangements\.

Figure [3](https://arxiv.org/html/2606.14215#S2.F3) shows the average formation energy and bandgap at each iteration for stability\-focused generation \(Panel \(a\)\) and bandgap\-focused generation \(Panel \(b\)\), where the values represent the mean and standard deviation obtained from five sets of 200\-generation runs\. In both cases, the proposed method outperformed the property\-only feedback case, indicating that it effectively leveraged feedback to generate improved structures in subsequent iterations\. The improvement observed even in the property\-only feedback case may result from the LLM producing new structures based on its internal knowledge, a behavior also noted in studies on self\-refinement in language models \[[34](https://arxiv.org/html/2606.14215#bib.bib22)\]\.
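One plausible reading of "mean and standard deviation obtained from five sets of 200-generation runs" is that each set is averaged first and the spread is then taken across the five set means. The sketch below follows that reading. It is an assumption, as is the use of the sample standard deviation, and `summarize_sets` is a hypothetical helper name.

```python
import statistics
from typing import Sequence, Tuple


def summarize_sets(values: Sequence[float], n_sets: int = 5) -> Tuple[float, float]:
    """Split values into n_sets equal blocks; return mean and sample std of block means."""
    if n_sets < 2 or len(values) % n_sets != 0:
        raise ValueError("need at least two sets of equal size")
    size = len(values) // n_sets
    set_means = [
        statistics.fmean(values[i * size:(i + 1) * size]) for i in range(n_sets)
    ]
    return statistics.fmean(set_means), statistics.stdev(set_means)


if __name__ == "__main__":
    # Hypothetical predicted formation energies (eV/atom) for one iteration:
    # 5 sets x 200 generations, here filled with constant placeholder values.
    energies = [-1.0] * 200 + [-1.1] * 200 + [-0.9] * 200 + [-1.2] * 200 + [-0.8] * 200
    mean, std = summarize_sets(energies)
    print(f"mean = {mean:.3f} eV/atom, std across sets = {std:.3f} eV/atom")
```

Under this reading the error bars reflect run-to-run variability of the set averages, not the spread of individual structures, which would be considerably larger.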

When crystal structures were generated solely by the LLM \(with CIF format information produced directly by the model\), the stability\-focused generation showed a gradual increase in formation energy over iterations, indicating decreasing stability, and the bandgap\-focused generation performed less effectively than the proposed method\. These differences highlight the complementary strengths of the LLM, which expresses structural descriptions linguistically, and the GNN, which captures the geometric characteristics of crystal structures\.

We also verified the stability of the bandgap\-focused generation using the formation energies predicted by CrystalFramer\[[21](https://arxiv.org/html/2606.14215#bib.bib30)\]\. Structures with predicted formation energies below 0 eV/atom were considered stable, and we calculated the proportion of stable structures among 1,000 generated samples\. The proposed method achieved a stability rate of 77\.2%, and the property\-only feedback setting reached 77\.8%\. In contrast, the LLM that generated CIF files directly achieved 50\.2%\. These results indicate that the proposed method satisfies both the requirement for an increased bandgap and the requirement for structural stability with high accuracy\.
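The stability rate used here is simply the fraction of samples whose predicted formation energy falls below the 0 eV/atom threshold. A minimal sketch, with `stability_rate` as a hypothetical helper and placeholder energies chosen only to reproduce a 77.2% rate over 1,000 samples:

```python
from typing import Sequence


def stability_rate(formation_energies: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of structures with predicted formation energy below threshold (eV/atom)."""
    if not formation_energies:
        raise ValueError("no structures to evaluate")
    stable = sum(1 for e in formation_energies if e < threshold)
    return stable / len(formation_energies)


if __name__ == "__main__":
    # Placeholder predictions: 772 stable and 228 unstable samples.
    energies = [-0.5] * 772 + [0.3] * 228
    print(f"stability rate = {stability_rate(energies):.1%}")
```

Because the energies come from a surrogate predictor, the rate inherits that model's error; structures close to the threshold are the ones most likely to be misclassified.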

Furthermore, we assessed the generative capability of the model\. As the structural and compositional filtering ensured the physical validity of the generated structures, we checked two additional key metrics, *uniqueness* and *novelty*\. *Uniqueness* represents the ratio of distinct structures obtained after removing duplicates using the `StructureMatcher` module in `pymatgen` \[[38](https://arxiv.org/html/2606.14215#bib.bib42)\]\. *Novelty* denotes the proportion of generated structures for which no similar structures exist among the 210,579 entries of the Materials Project database as of October 2025 \[[20](https://arxiv.org/html/2606.14215#bib.bib27)\], as determined by `StructureMatcher`\.
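The two metrics can be written against any pairwise matching predicate. The sketch below is illustrative only: `deduplicate`, `uniqueness`, and `novelty` are hypothetical helpers, the toy matcher compares a formula and a single lattice constant, and the greedy deduplication is an assumption about how duplicates are grouped. With pymatgen installed, `StructureMatcher().fit(a, b)` could be passed as `is_match` in place of the toy function.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def deduplicate(structures: Sequence[T], is_match: Callable[[T, T], bool]) -> List[T]:
    """Greedy deduplication: keep a structure only if it matches no kept one."""
    kept: List[T] = []
    for s in structures:
        if not any(is_match(s, k) for k in kept):
            kept.append(s)
    return kept


def uniqueness(structures: Sequence[T], is_match: Callable[[T, T], bool]) -> float:
    """Ratio of distinct structures to all generated structures."""
    return len(deduplicate(structures, is_match)) / len(structures)


def novelty(
    structures: Sequence[T], reference: Sequence[T], is_match: Callable[[T, T], bool]
) -> float:
    """Share of generated structures with no match in the reference set."""
    novel = sum(1 for s in structures if not any(is_match(s, r) for r in reference))
    return novel / len(structures)


# Toy representation: (formula, cubic lattice constant in angstroms).
Toy = Tuple[str, float]


def toy_match(a: Toy, b: Toy) -> bool:
    return a[0] == b[0] and abs(a[1] - b[1]) < 0.05


if __name__ == "__main__":
    generated = [("CsCl", 4.12), ("CsCl", 4.13), ("CsAu", 4.27), ("NaCl", 5.64)]
    database = [("NaCl", 5.64)]
    print("uniqueness:", uniqueness(generated, toy_match))
    print("novelty:", novelty(generated, database, toy_match))
```

A linear scan against a database of roughly 200,000 entries would be slow in practice; pre-filtering reference entries by composition before matching is a common way to cut the number of comparisons.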

Table[1](https://arxiv.org/html/2606.14215#S2.T1)summarizes the uniqueness and novelty observed in each case\. Regarding uniqueness, in the methods combined with GNN, almost all of the 1,000 generated structures were distinct, while in the case where the LLM directly generated complete crystal structure information, the rate was in the 80% range\. These differences suggest that when the LLM is allowed to output structural descriptions in natural language rather than directly producing specialized formats such as CIF, it can more fully leverage its generative capability\. As for novelty, the rate was 100% in all cases\. Since the prompts explicitly instructed the model to create new structures, it seems that the LLM followed those instructions faithfully\. Our evaluation was conducted using only the Materials Project dataset, and we acknowledge that the LLM’s internal knowledge extends beyond this source; therefore, the assessment is not entirely exhaustive\. Nevertheless, we confirmed that the generated structures were not ones that are widely known\.

Table 1: Uniqueness and novelty of generated crystal structures\.

### 2\.3 Qualitative analysis

![Refer to caption](https://arxiv.org/html/2606.14215v1/x4.png)
Figure 4: An example of structure refinement through the feedback process\. The model successfully incorporated linguistic feedback into both compositional and structural refinements\. Crystal structures are visualized by VESTA \[[36](https://arxiv.org/html/2606.14215#bib.bib36)\]\.

In addition to the quantitative evaluation presented in Sec\.[2\.2](https://arxiv.org/html/2606.14215#S2.SS2), we qualitatively examined the feedback obtained at each iteration and analyzed how the structure evolved accordingly\. Figure [4](https://arxiv.org/html/2606.14215#S2.F4) presents an example of the stability\-focused generation task described in Sec\.[2\.2](https://arxiv.org/html/2606.14215#S2.SS2), showing the structures before and after feedback together with a summary of the corresponding comments\.

In the example shown in Fig\.[4](https://arxiv.org/html/2606.14215#S2.F4), the feedback pointed out that the initially generated structure lacked anions and suggested replacing the unrealistic metal–metal bonds with metal–anion bonds\. It also recommended that metal atoms such as ytterbium and zirconium adopt an octahedral coordination rather than a linear one\. Reflecting these suggestions, the refined structure forms a halide perovskite structure in which the previously identified issues are resolved, and the predicted formation energy improved from 0\.456 eV/atom to –1\.708 eV/atom\. This result shows that the feedback was correctly reflected in the generated structure\. Notably, not only elemental substitutions but also structural modifications were performed accurately\.

We also demonstrated in Fig\.[1](https://arxiv.org/html/2606.14215#S1.F1)\(b\) that crystal structures can be designed not only through organized feedback from an LLM but also through human colloquial feedback\. Using the insulator discovery task as an example, we illustrated how the crystal structure evolved over three rounds of feedback\. In the first round, we simply provided predicted property values obtained from the crystal structure encoder\. The model then modified the structure while preserving the composition, successfully widening the bandgap\. In the second round, we instructed the model to replace strontium with a more readily available element\. Although the bandgap slightly decreased, strontium was replaced with calcium, which is abundant on Earth\. In the third round, we asked the model to increase the distance between structural units to disrupt conduction pathways\. The model responded by editing the structure while maintaining the composition, resulting in enlarged inter unit distances\. Overall, these results demonstrate that our framework supports an iterative and conversational design process, in which initially vague user desiderata are gradually refined into more concrete design constraints, including restrictions on elemental composition, rather than being treated as a fixed property optimization problem\.

### 2\.4 Crystal editing based on textual instructions

![Refer to caption](https://arxiv.org/html/2606.14215v1/x5.png)
Figure 5: Crystal editing demonstrations: \(a\) Elemental substitution in CsCl and \(b\) Structural editing in a perovskite structure\. We confirmed that the model successfully edited the crystal structures in accordance with the text instructions and adjusted structural parameters such as lattice constants to achieve more stable configurations\. Crystal structures are visualized by VESTA \[[36](https://arxiv.org/html/2606.14215#bib.bib36)\]\.

While Fig\.[2](https://arxiv.org/html/2606.14215#S2.F2) and Algorithm [1](https://arxiv.org/html/2606.14215#alg1) show a process that begins with a text prompt to generate a crystal structure from scratch, LapidaryEngine also enables crystal editing, creating new structures from existing ones under textual guidance\. This approach parallels image\-to\-image generation, where an image is modified according to text prompts \[[9](https://arxiv.org/html/2606.14215#bib.bib10),[6](https://arxiv.org/html/2606.14215#bib.bib11),[26](https://arxiv.org/html/2606.14215#bib.bib12)\]\. In this setting, the generation process is initialized with an existing crystal structure and a feedback prompt, and the procedure then starts from the upper\-right part of the feedback loop in Fig\.[2](https://arxiv.org/html/2606.14215#S2.F2) \(i\.e\., line 10 of Algorithm [1](https://arxiv.org/html/2606.14215#alg1)\)\.

We conducted the crystal editing experiment in two cases: \(a\) a relatively simple element\-substitution case and \(b\) a structural\-modification case\. In the former, we instructed the model to replace the chlorine atoms in the CsCl structure with gold atoms\. In the latter, we performed an editing task to transform the distorted perovskite structure of CaTiO$_3$ into an undistorted perovskite structure\.

In the compositional editing shown in Fig\.[5](https://arxiv.org/html/2606.14215#S2.F5)\(a\), we confirmed that the chlorine atoms are correctly substituted with gold atoms\. Interestingly, this substitution results in a change in the lattice constant from 4\.12 Å to approximately 4\.27 Å, with the generated CsAu exhibiting a larger value than CsCl\. This trend is consistent with previous experimental reports, which give lattice constants of 4\.12 Å for CsCl and 4\.26 Å for CsAu\[[47](https://arxiv.org/html/2606.14215#bib.bib44),[44](https://arxiv.org/html/2606.14215#bib.bib43)\], and suggests that lattice expansion accompanies structural stabilization\.
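The size of the lattice expansion can be checked with a few lines of arithmetic using only the lattice constants quoted above; `relative_change` is a hypothetical helper introduced for the example.

```python
def relative_change(new: float, old: float) -> float:
    """Fractional change from old to new."""
    return (new - old) / old


# Lattice constants in angstroms, as quoted in the text.
A_CSCL = 4.12           # CsCl (generated input and experimental value)
A_CSAU_GENERATED = 4.27
A_CSAU_REPORTED = 4.26

generated = relative_change(A_CSAU_GENERATED, A_CSCL)
reported = relative_change(A_CSAU_REPORTED, A_CSCL)
deviation = relative_change(A_CSAU_GENERATED, A_CSAU_REPORTED)

if __name__ == "__main__":
    print(f"generated expansion: {generated:.2%}")
    print(f"reported expansion:  {reported:.2%}")
    print(f"generated vs reported CsAu: {deviation:.2%}")
```

The generated and reported expansions come out at roughly 3.6% and 3.4%, so the generated CsAu lattice constant differs from the experimental one by well under 1%.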

As for the structural editing results shown in Fig\.[5](https://arxiv.org/html/2606.14215#S2.F5)\(b\), the distorted perovskite structure of CaTiO$_3$ was successfully transformed into a distortion\-free, regular perovskite structure, which indicates that our model is capable of capturing and correcting geometric structural information\.

From these results, we verified that LapidaryEngine can intuitively modify crystal structures based on textual instructions given for the original structure\. It was also confirmed that the model not only follows the instructions but also adjusts structures such as lattice constants to achieve a more stable crystal configuration\.

## 3 Discussion

In this study, we demonstrated that introducing a linguistic description of structure as an intermediate representation makes it possible to connect text and crystal structures in a feedback\-capable manner\. We adopted the notation of Robocrystallographer as the pivot representation, as it provides the most intuitive linguistic expression \[[13](https://arxiv.org/html/2606.14215#bib.bib16)\]\. However, other approaches such as SLICES \[[48](https://arxiv.org/html/2606.14215#bib.bib58)\] have also been developed to describe crystal structures linguistically \[[22](https://arxiv.org/html/2606.14215#bib.bib57)\]\. A comparative investigation of these representations will be left for future work\.

The crystal structure generation module of our proposed framework is based on Chemeleon \[[40](https://arxiv.org/html/2606.14215#bib.bib2)\] and does not impose any symmetry constraints\. As a result, most of the generated structures are classified in the lowest\-symmetry space group *P*1, although they often roughly satisfy higher crystallographic symmetries\. The remaining deviations from exact symmetry reflect the weak sensitivity of the loss function to strict enforcement of these symmetries\. For instance, in the generated structure shown in Fig\.[5](https://arxiv.org/html/2606.14215#S2.F5)\(a\), the lattice parameters are approximately equal in length but not identical\. Recently, considerable progress has been made in crystal generation methods that explicitly incorporate symmetry \[[24](https://arxiv.org/html/2606.14215#bib.bib19),[53](https://arxiv.org/html/2606.14215#bib.bib51),[30](https://arxiv.org/html/2606.14215#bib.bib35),[11](https://arxiv.org/html/2606.14215#bib.bib45),[27](https://arxiv.org/html/2606.14215#bib.bib50)\]\. Extending these advances toward symmetry\-aware text\-to\-crystal structure generation remains an area for future work\. Another promising approach is to incorporate postprocessing methods, such as DFT relaxation and symmetry detection \[[3](https://arxiv.org/html/2606.14215#bib.bib64)\] with a large tolerance, to correct subtle discrepancies\. Additionally, symmetry itself has a hierarchical nature, and just as symmetry decreases during phase transitions, natural structural variations in crystals should also follow this hierarchy \[[12](https://arxiv.org/html/2606.14215#bib.bib46)\]\. Integrating this hierarchical aspect of symmetry into the iterative generation process \(*e\.g\.*, allowing the next generation step to adopt symmetry one level lower or higher than that of the original structure\) will be an important direction for future research\.
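The effect of tolerance on symmetry assignment can be illustrated with a deliberately simplified check on the unit cell alone. This is a sketch under stated assumptions: `is_cubic_within` is a hypothetical helper, the cell values are invented for the example, and a real symmetry finder also examines atomic positions, which this check ignores.

```python
from typing import Sequence


def is_cubic_within(
    lengths: Sequence[float],
    angles: Sequence[float],
    length_tol: float = 0.01,  # allowed relative deviation from the mean length
    angle_tol: float = 0.5,    # allowed absolute deviation from 90 degrees
) -> bool:
    """Return True if the cell metric is cubic within the given tolerances."""
    mean = sum(lengths) / len(lengths)
    lengths_ok = all(abs(x - mean) / mean <= length_tol for x in lengths)
    angles_ok = all(abs(a - 90.0) <= angle_tol for a in angles)
    return lengths_ok and angles_ok


if __name__ == "__main__":
    # Invented cell: lengths nearly equal, angles nearly 90 degrees.
    lengths = (4.27, 4.26, 4.28)
    angles = (90.1, 89.9, 90.0)
    print("strict:", is_cubic_within(lengths, angles, length_tol=1e-6, angle_tol=1e-6))
    print("loose: ", is_cubic_within(lengths, angles))
```

With tight tolerances the cell is rejected as cubic, mirroring how slightly unequal lattice parameters push a structure toward a low-symmetry classification, while a looser tolerance recovers the intended higher-symmetry cell.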

In the quantitative evaluation in Sec\. [2\.2](https://arxiv.org/html/2606.14215#S2.SS2), although the feedback came from an LLM rather than human experts, the predicted physical properties shifted toward more favorable values\. Similar performance improvements through LLM feedback have been reported in other domains\[[34](https://arxiv.org/html/2606.14215#bib.bib22)\], and it is intriguing that the same effect was also observed in the highly complex task of crystal generation\. This ability to autonomously explore high\-quality crystal structures indicates that the proposed approach can serve as a component within an agent\-based workflow for materials discovery\. When combined with the rapidly growing field of autonomous experimentation, it could become a powerful engine for progress in crystal design, marking an important step toward a self\-driving laboratory that can independently iterate design, synthesis, and evaluation\[[5](https://arxiv.org/html/2606.14215#bib.bib55),[45](https://arxiv.org/html/2606.14215#bib.bib56)\]\.

In this study, we focused on quantifiable properties such as formation energy and bandgap, which can be evaluated through DFT calculations or their surrogate machine learning models\. Meanwhile, the performance required in practical materials development is often not something that can be directly and quantitatively assessed; rather, it tends to involve a combination of diverse and sometimes qualitative requirements\. Demonstrating our method in an actual materials development context would demand advanced expertise and practical knowledge specific to the field, and thus was not performed in this work\. The potential of this approach to extend to complex materials development remains an interesting open question\.

## 4 Methods

### 4\.1 Details of LapidaryEngine

Algorithm [1](https://arxiv.org/html/2606.14215#alg1) summarizes the proposed framework described in Sec\. [2\.1](https://arxiv.org/html/2606.14215#S2.SS1)\. The initially provided target property description $p_1$ is converted into a Robocrystallographer\-format structural description $d_1$ by an LLM, and a pure noise state is prepared for crystal generation via a GNN\-based diffusion model\. The loop then begins, consisting of two main stages\. The first stage \(lines 4–7\) generates a crystal structure from the structural description, while the second stage \(lines 8–13\) receives feedback on the generated crystal structure and, based on the result, produces the next structural description and the noise state to be denoised in the next generation step\. The two stages are described in detail in Sec\. [4\.1\.1](https://arxiv.org/html/2606.14215#S4.SS1.SSS1) and Sec\. [4\.1\.2](https://arxiv.org/html/2606.14215#S4.SS1.SSS2)\.

#### 4\.1\.1 Crystal structure generation from structural descriptions

Algorithm 1 LapidaryEngine

Input:

- Target property text $p_1$ \(*e\.g\.*, “high electrical conductivity”\)
- Structural description\-conditioned crystal diffusion model $\mathcal{G}$
- Number of feedback iterations $K$
- Number of generations per iteration $N$ \(default: 10\)
- Denoising strength $\alpha \in (0, 1]$ \(default: 0\.1\)

Output: Optimized crystal structure $c^{*}$

1: $d_1 \leftarrow \texttt{LLM\_interpret}(p_1)$ ▷ Details in Appendix [B](https://arxiv.org/html/2606.14215#A2)

2: Initialize $\{\mathcal{Z}_i\}_{i=1}^{N}$ with a pure noise state for the diffusion model

3: for $k = 1$ to $K$ do

4: Generate structure $c_k$ from structural description: ▷ Details in Sec\. [4\.1\.1](https://arxiv.org/html/2606.14215#S4.SS1.SSS1)

5: Sample $N$ candidate structures $\{c_k^{(i)}\}_{i=1}^{N}$ using the diffusion model $\mathcal{G}$: $c_k^{(i)} = \mathcal{G}(\mathcal{Z}_i, \text{condition} = d_k)$

6: Evaluate structural and compositional validity, and filter out invalid samples: $\{c_k^{(i)}\} \leftarrow \{c_k^{(i)} \mid \texttt{isValid}(c_k^{(i)}) = \texttt{True}\}$

7: Compute text–structure alignment scores and pick best: $c_k \leftarrow \arg\max_{i} \texttt{AlignmentScore}(c_k^{(i)}, d_k)$

8: Feedback and refinement: ▷ Details in Sec\. [4\.1\.2](https://arxiv.org/html/2606.14215#S4.SS1.SSS2)

9: Provide feedback based on the generated structure: $p_k^{\text{fb}} \leftarrow \texttt{Feedback}(c_k)$

10: Convert $c_k$ to structural description: $d_k^{\text{gen}} \leftarrow \texttt{Robocrystallographer}(c_k)$

11: Update structural description with feedback $p_k^{\text{fb}}$: $d_{k+1} \leftarrow \texttt{LLM\_refine}(p_k^{\text{fb}}, d_k^{\text{gen}})$

12: Initialize next step from partially noised state: $\{\mathcal{Z}_i\}_{i=1}^{N} \leftarrow \texttt{Add\_noise}(c_k, \text{strength} = \alpha)$

13: Denoise from $\{\mathcal{Z}_i\}_{i=1}^{N}$ in the next iteration

14: end for

15: return $c^{*} \leftarrow c_K$

We employed Chemeleon\[[40](https://arxiv.org/html/2606.14215#bib.bib2)\] to generate crystal structures from Robocrystallographer\-format descriptions\[[13](https://arxiv.org/html/2606.14215#bib.bib16)\]\. Details of the model are provided in Sec\. [4\.2](https://arxiv.org/html/2606.14215#S4.SS2)\. To encourage the model to generate crystals that are physically plausible and more faithful to the given text, we generated $N = 10$ candidate samples \(*i\.e\.*, each starting from a different pure noise state\)\. Among them, only the samples satisfying both structural and compositional validity were retained, following the validity criteria commonly used in previous studies\[[49](https://arxiv.org/html/2606.14215#bib.bib21),[24](https://arxiv.org/html/2606.14215#bib.bib19),[30](https://arxiv.org/html/2606.14215#bib.bib35)\]\. Specifically, structural validity was determined by ensuring that no pair of atoms was closer than 0\.5 Å, while compositional validity was checked by confirming overall charge neutrality using the SMACT library\[[10](https://arxiv.org/html/2606.14215#bib.bib41)\]\. Finally, the structure with the highest alignment score was selected as the final output\. This score is analogous to the CLIP score\[[16](https://arxiv.org/html/2606.14215#bib.bib29)\] and indicates the degree of semantic consistency between the generated structure and the text description\. It was computed by encoding the crystal structures and the text with the encoders trained during the contrastive learning stage of Chemeleon \(see Sec\. [4\.2](https://arxiv.org/html/2606.14215#S4.SS2)\), and then measuring the cosine similarity between their embeddings\. Note that if none of the structures passes the validity check, the iteration is reset and repeated\.
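The structural validity check and the alignment-based selection described above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function names are hypothetical, the periodic distance search only looks at the 27 neighboring cell images (adequate for reasonably shaped cells, not for strongly skewed ones), lattice vectors are assumed to be the rows of a 3×3 matrix in Å, and the charge-neutrality check performed with SMACT is omitted.

```python
import itertools
import numpy as np

def min_interatomic_distance(lattice, frac_coords):
    """Smallest distance (in the lattice's units) between atoms, including periodic images."""
    lattice = np.asarray(lattice, dtype=float)
    frac = np.asarray(frac_coords, dtype=float) % 1.0
    best = np.inf
    for shift in itertools.product((-1, 0, 1), repeat=3):
        shift = np.array(shift, dtype=float)
        diff = frac[:, None, :] - (frac[None, :, :] + shift)   # fractional differences
        dist = np.linalg.norm(diff @ lattice, axis=-1)          # Cartesian distances
        if not shift.any():
            np.fill_diagonal(dist, np.inf)   # ignore an atom paired with itself
        best = min(best, float(dist.min()))
    return best

def is_structurally_valid(lattice, frac_coords, cutoff=0.5):
    """True if no pair of atoms is closer than `cutoff` (0.5 Å in the text)."""
    return min_interatomic_distance(lattice, frac_coords) >= cutoff

def pick_best(structure_embeddings, text_embedding, valid_mask):
    """Index of the valid candidate with the highest cosine similarity, or None."""
    s = np.asarray(structure_embeddings, dtype=float)
    t = np.asarray(text_embedding, dtype=float)
    mask = np.asarray(valid_mask, dtype=bool)
    if not mask.any():
        return None   # caller would reset and repeat the iteration
    sims = (s @ t) / (np.linalg.norm(s, axis=1) * np.linalg.norm(t))
    return int(np.argmax(np.where(mask, sims, -np.inf)))
```

Returning `None` when every candidate fails mirrors the reset condition in the text and leaves the retry policy to the caller.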

#### 4\.1\.2 Crystal structure refinement

To design the next crystal structure, feedback $p_k^{\text{fb}} = \texttt{Feedback}(c_k)$ is provided based on the generated crystal structure $c_k$\. Although only $c_k$ appears as an argument, the feedback is not derived solely from $c_k$ itself but also from various results \(*e\.g\.*, simulation and experimental results\)\. We then obtain the Robocrystallographer representation of the previously generated crystal structure and supply it to the LLM along with the feedback, as illustrated in the prompt shown in Appendix [C](https://arxiv.org/html/2606.14215#A3)\. Rather than initiating the next crystal structure generation from a pure noise state, the denoising process starts from a partially noised version of the previously generated structure, ensuring that the refinement stays guided by that configuration\. We control the level of noise through the denoising strength parameter $\alpha \in (0, 1]$, a technique commonly used in image\-to\-image tasks\[[9](https://arxiv.org/html/2606.14215#bib.bib10)\]\.

Concretely, the previously generated crystal structure is noised up to the diffusion time step $T \times \alpha$, where $T$ is the time step corresponding to the fully noisy state and $\alpha$ controls the noise strength, and generation from feedback begins from this partially noisy state\. This partial noising process enables the model to refine or adjust the output while preserving its overall character\.
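To make the partial noising step concrete, the sketch below applies the standard closed-form DDPM forward process to a continuous variable (for example, lattice parameters) up to step $T \times \alpha$. Several things here are assumptions for illustration only: the function name `partial_noise`, the linear beta schedule, and $T = 1000$ are not taken from this work, and the actual model may treat fractional coordinates and atom types differently. The code variable `signal_level` is the cumulative product usually written $\bar{\alpha}_t$ in the DDPM literature, which is unrelated to the strength $\alpha$ used here.

```python
import numpy as np

def partial_noise(x0, strength, num_steps=1000,
                  beta_start=1e-4, beta_end=2e-2, rng=None):
    """Noise x0 up to step round(strength * num_steps); returns (x_t, t_start)."""
    if not 0.0 < strength <= 1.0:
        raise ValueError("strength must lie in (0, 1]")
    rng = np.random.default_rng() if rng is None else rng
    betas = np.linspace(beta_start, beta_end, num_steps)   # assumed linear schedule
    cumulative = np.cumprod(1.0 - betas)                   # signal retained after each step
    t_start = max(1, int(round(strength * num_steps)))     # strength=1 gives the fully noisy step
    signal_level = cumulative[t_start - 1]
    x0 = np.asarray(x0, dtype=float)
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(signal_level) * x0 + np.sqrt(1.0 - signal_level) * noise
    return x_t, t_start
```

Denoising would then run from `t_start` down to zero instead of from `num_steps`, so a small strength keeps the result close to the previous structure while a strength of 1 discards it entirely.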

We examined the effect of the strength parameter $\alpha$ in Appendix [D](https://arxiv.org/html/2606.14215#A4)\.

### 4\.2 Crystal structure generation model

We strictly followed Chemeleon for the model that generates crystal structures from linguistic structural descriptions\. After briefly describing the model architecture, we provide a detailed explanation of the dataset and training details\. For a more comprehensive description of the architecture, please refer to the original paper\[[40](https://arxiv.org/html/2606.14215#bib.bib2)\]\.

#### 4\.2\.1 Model architecture

Chemeleon is a text\-guided generative model that generates crystal structures through a denoising diffusion process conditioned on text embeddings from a pretrained text encoder\. During training, Gaussian noise is added to crystal structures, and the model is trained to predict the added noise\. During inference, the model starts from a pure noise state and progressively denoises it to generate complete crystal structures\. For the lattice constants and atomic positions, it follows the framework of Denoising Diffusion Probabilistic Models \(DDPM\)\[[17](https://arxiv.org/html/2606.14215#bib.bib32)\], while for the atomic species, it adopts the Discrete Denoising Diffusion Probabilistic Models \(D3PM\) framework\[[4](https://arxiv.org/html/2606.14215#bib.bib33)\]\.

Chemeleon comprises two key elements\. The first is Crystal CLIP, a cross\-modal contrastive learning module for pretraining the text encoder MatSciBERT\[[15](https://arxiv.org/html/2606.14215#bib.bib26)\] by aligning its embeddings with the corresponding crystal structure embeddings produced by the GNN\. By bringing positive text–crystal pairs closer together and pushing negative pairs farther apart, Crystal CLIP learns a shared latent space where textual representations reflect structural geometric information\.

The second element is a classifier\-free guided denoising diffusion model\[[18](https://arxiv.org/html/2606.14215#bib.bib17)\]that predicts the noise added to each variable of the crystal structure \(*i\.e\.*, lattice matrices, atomic coordinates, and atom types\), conditioned on the text embeddings produced by the text encoder of Crystal CLIP\. The denoising network builds upon the DiffCSP framework\[[23](https://arxiv.org/html/2606.14215#bib.bib18)\], which was originally developed for crystal structure prediction tasks\.
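For reference, classifier-free guidance as formulated by Ho and Salimans combines a conditional and an unconditional noise prediction at sampling time. The notation below is introduced here for illustration: $z_t$ is the noisy state, $c$ would be the Crystal CLIP text embedding, and $w \ge 0$ is the guidance weight, whose value is not stated in this section.

$$
\tilde{\epsilon}_\theta(z_t, c) = (1 + w)\,\epsilon_\theta(z_t, c) - w\,\epsilon_\theta(z_t)
$$

Setting $w = 0$ recovers the plain conditional prediction, while larger $w$ pushes samples more strongly toward the text condition.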

By aligning linguistic embeddings with the geometric information of crystal structures through CLIP and training the denoising model conditioned on these embeddings, the model can generate crystal structures that follow textual instructions\.

#### 4\.2\.2 Dataset and training details

We used the MEGNet dataset\[[7](https://arxiv.org/html/2606.14215#bib.bib28)\], which is a snapshot of the Materials Project database\[[20](https://arxiv.org/html/2606.14215#bib.bib27)\]\. Following the official split, the dataset was divided into 60,000, 5,000, and 4,239 samples for training, validation, and testing, respectively\. After generating textual descriptions using Robocrystallographer, we trained the model on this dataset\.

The training of Chemeleon consists of two stages: \(1\) contrastive pretraining of the Crystal CLIP module, and \(2\) text\-conditioned diffusion model training for crystal generation\.

In the contrastive learning stage, the text encoder and the GNN\-based crystal encoder are trained together so that their embeddings align within a shared latent space, which helps the text embedding capture geometric information\. The text embeddings are obtained from the \[CLS\] token of the text encoder output, and the crystal embeddings are produced by averaging the node features from the GNN\-based crystal structure encoder\. The training objective is a symmetric contrastive loss that combines text\-to\-graph and graph\-to\-text cross\-entropy losses\. A batch size of 128 is used with the Adam optimizer\[[28](https://arxiv.org/html/2606.14215#bib.bib31)\], where the learning rates for the text and graph encoders are set to $1 \times 10^{-5}$ and $1 \times 10^{-4}$, respectively\. Training proceeds for up to 1,000 epochs, with early stopping if the validation loss does not improve for 300 epochs\. A ReduceLROnPlateau learning rate scheduler \(patience = 200 epochs\) is applied for stability\.
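A symmetric contrastive objective of this kind can be sketched in NumPy as shown below. This is a generic CLIP-style formulation rather than the exact Chemeleon loss: the function names are hypothetical, and the temperature value of 0.07 is an assumption, since the temperature is not specified in this section. Row $i$ of each embedding matrix is assumed to belong to the same text–crystal pair.

```python
import numpy as np

def _cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def symmetric_contrastive_loss(text_emb, graph_emb, temperature=0.07):
    """Average of text-to-graph and graph-to-text cross-entropy losses."""
    t = np.asarray(text_emb, dtype=float)
    g = np.asarray(graph_emb, dtype=float)
    t = t / np.linalg.norm(t, axis=1, keepdims=True)   # unit-normalize both modalities
    g = g / np.linalg.norm(g, axis=1, keepdims=True)
    logits = t @ g.T / temperature                     # pairwise cosine similarities
    return 0.5 * (_cross_entropy_diagonal(logits) + _cross_entropy_diagonal(logits.T))
```

Matching pairs sit on the diagonal of `logits`, so minimizing the loss pulls each text embedding toward its own crystal embedding and away from the other crystals in the batch, and vice versa.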

During diffusion model training, the text encoder is kept frozen, and the pretrained Crystal CLIP embeddings are used as conditional inputs\. The denoising network is optimized using the Adam optimizer with a learning rate of $1 \times 10^{-3}$, maintaining the same batch size and scheduling settings as in the contrastive learning stage\. The loss function consists of three components: atom species, lattice, and coordinate denoising losses\. Both training stages were performed on four NVIDIA H200 \(141 GB\) GPUs and took 30 hours for contrastive learning and 20 hours for diffusion model training\.

## Competing interests

All authors declare no financial or non\-financial competing interests\.

## Author contributions

Y\.I\. initiated the study, designed and implemented the method, conducted all numerical and experimental analyses, and drafted the initial manuscript\. Y\.S\. supervised the study, provided advice on the method and experimental design, and reviewed the draft manuscript\. T\.M\. provided advice on the method and experimental design and reviewed the draft manuscript\. M\.A\. revised the manuscript and provided overall project supervision, including resources, funding, and institutional support\.

## References

- \[1\]A\. Abdel\-Rehim, H\. Zenil, O\. Orhobor, M\. Fisher, R\. J\. Collins, E\. Bourne, G\. W\. Fearnley, E\. Tate, H\. X\. Smith, L\. N\. Soldatova, and R\. King\(2025\)Scientific hypothesis generation by large language models: laboratory validation in breast cancer treatment\.J\. R\. Soc\. Interface\.22\(227\),pp\. 20240674\.Cited by:[§1](https://arxiv.org/html/2606.14215#S1.p1.1)\.
- \[2\]A\. Agostinelli, T\. I\. Denk, Z\. Borsos, J\. Engel, M\. Verzetti, A\. Caillon, Q\. Huang, A\. Jansen, A\. Roberts, M\. Tagliasacchi, M\. Sharifi, N\. Zeghidour, and C\. FrankMusicLM: Generating Music From Text\.Note:Preprint at[https://doi\.org/10\.48550/arXiv\.2301\.11325](https://doi.org/10.48550/arXiv.2301.11325)\(2023\)Cited by:[§1](https://arxiv.org/html/2606.14215#S1.p1.1)\.
- \[3\]\(2024\)Spglib: a software library for crystal symmetry search\.Sci\. Technol\. Adv\. Mater\., Meth\.4\(1\),pp\. 2384822–2384836\.Cited by:[§3](https://arxiv.org/html/2606.14215#S3.p2.1)\.
- \[4\]J\. Austin, D\. D\. Johnson, J\. Ho, D\. Tarlow, and R\. Van Den Berg\(2021\)Structured denoising diffusion models in discrete state\-spaces\.Advances in Neural Information Processing Systems \(NeurIPS 2021\)34,pp\. 17981–17993\.Cited by:[§4\.2\.1](https://arxiv.org/html/2606.14215#S4.SS2.SSS1.p1.1)\.
- \[5\]B\. Burger, P\. M\. Maffettone, V\. V\. Gusev, C\. M\. Aitchison, Y\. Bai, X\. Wang, X\. Li, B\. M\. Alston, B\. Li, R\. Clowes, N\. Rankin, B\. Harris, R\. S\. Sprick, and A\. I\. Cooper\(2020\)A mobile robotic chemist\.Nature583,pp\. 237–241\.Cited by:[§3](https://arxiv.org/html/2606.14215#S3.p3.1)\.
- \[6\]H\. Chang, H\. Zhang, J\. Barber, A\. Maschinot, J\. Lezama, L\. Jiang, M\. Yang, K\. P\. Murphy, W\. T\. Freeman, M\. Rubinstein, Y\. Li, and D\. Krishnan\(2023\)Muse: text\-to\-image generation via masked generative transformers\.InProceedings of the 40th International Conference on Machine Learning \(ICML 2023\),Vol\.202,pp\. 4055–4075\.Cited by:[§2\.4](https://arxiv.org/html/2606.14215#S2.SS4.p1.1)\.
- \[7\]C\. Chen, W\. Ye, Y\. Zuo, C\. Zheng, and S\. P\. Ong\(2019\)Graph networks as a universal machine learning framework for molecules and crystals\.Chem\. Mater\.31,pp\. 9\.External Links:ISSN 1520\-5002Cited by:[§2\.2](https://arxiv.org/html/2606.14215#S2.SS2.p2.1),[§4\.2\.2](https://arxiv.org/html/2606.14215#S4.SS2.SSS2.p1.1)\.
- \[8\]K\. Cobbe, V\. Kosaraju, M\. Bavarian, M\. Chen, H\. Jun, L\. Kaiser, M\. Plappert, J\. Tworek, J\. Hilton, R\. Nakano, C\. Hesse, and J\. SchulmanTraining verifiers to solve math word problems\.Note:Preprint at[https://doi\.org/10\.48550/arXiv\.2110\.14168](https://doi.org/10.48550/arXiv.2110.14168)\(2021\)Cited by:[§2\.1](https://arxiv.org/html/2606.14215#S2.SS1.p2.2)\.
- \[9\]G\. Couairon, J\. Verbeek, H\. Schwenk, and M\. CordDiffEdit: diffusion\-based semantic image editing with mask guidance\.Note:Preprint at[https://doi\.org/10\.48550/arXiv\.2210\.11427](https://doi.org/10.48550/arXiv.2210.11427)\(2022\)Cited by:[§2\.4](https://arxiv.org/html/2606.14215#S2.SS4.p1.1),[§4\.1\.2](https://arxiv.org/html/2606.14215#S4.SS1.SSS2.p1.5)\.
- \[10\]D\. W\. Davies, K\. T\. Butler, A\. J\. Jackson, J\. M\. Skelton, K\. Morita, and A\. Walsh\(2019\)SMACT: Semiconducting materials by analogy and chemical theory\.Journal of Open Source Software4\(38\),pp\. 1361\.Cited by:[§4\.1\.1](https://arxiv.org/html/2606.14215#S4.SS1.SSS1.p1.1)\.
- \[11\]F\. Ekström Kelvinius, O\. B\. Andersson, A\. S\. Parackal, D\. Qian, R\. Armiento, and F\. Lindsten\(2025\)WyckoffDiff – a generative diffusion model for crystal symmetry\.InProceedings of the 42nd International Conference on Machine Learning \(ICML 2025\),Proceedings of Machine Learning Research\.Cited by:[§3](https://arxiv.org/html/2606.14215#S3.p2.1)\.
- \[12\]U\. Englert\(2013\)Symmetry relationships between crystal structures\. Applications of crystallographic group theory in crystal chemistry\. By Ulrich Müller\.Angew\. Chem\., Int\. Ed\.52\(46\),pp\. 11973–11973\.External Links:ISSN 1521\-3773Cited by:[§3](https://arxiv.org/html/2606.14215#S3.p2.1)\.
- \[13\]A\. M\. Ganose and A\. Jain\(2019\)Robocrystallographer: automated crystal structure text descriptions and analysis\.MRS Commun\.9\(3\),pp\. 874–881\.Cited by:[§2\.1](https://arxiv.org/html/2606.14215#S2.SS1.p1.1),[§3](https://arxiv.org/html/2606.14215#S3.p1.1),[§4\.1\.1](https://arxiv.org/html/2606.14215#S4.SS1.SSS1.p1.1)\.
- \[14\]Gemini TeamGemini: a family of highly capable multimodal models\.Note:Preprint at[https://doi\.org/10\.48550/arXiv\.2312\.11805](https://doi.org/10.48550/arXiv.2312.11805)\(2023\)Cited by:[§1](https://arxiv.org/html/2606.14215#S1.p1.1)\.
- \[15\]T\. Gupta, M\. Zaki, N\. M\. A\. Krishnan, and Mausam\(2022\)MatSciBERT: A materials domain language model for text mining and information extraction\.npj Comput\. Mater\.8,pp\. 102\.Cited by:[§4\.2\.1](https://arxiv.org/html/2606.14215#S4.SS2.SSS1.p2.1)\.
- \[16\]J\. Hessel, A\. Holtzman, M\. Forbes, R\. Le Bras, and Y\. Choi\(2021\)CLIPScore: a reference\-free evaluation metric for image captioning\.InProceedings of the 2021 Conference on Empirical Methods in Natural Language Processing \(EMNLP 2021\),pp\. 7514–7528\.Cited by:[§4\.1\.1](https://arxiv.org/html/2606.14215#S4.SS1.SSS1.p1.1)\.
- \[17\]J\. Ho, A\. Jain, and P\. Abbeel\(2020\)Denoising diffusion probabilistic models\.InAdvances in Neural Information Processing Systems \(NeurIPS 2020\),Cited by:[§4\.2\.1](https://arxiv.org/html/2606.14215#S4.SS2.SSS1.p1.1)\.
- \[18\]J\. Ho and T\. SalimansClassifier\-free diffusion guidance\.Note:Preprint at[https://doi\.org/10\.48550/arXiv\.2207\.12598](https://doi.org/10.48550/arXiv.2207.12598)\(2022\)Cited by:[§4\.2\.1](https://arxiv.org/html/2606.14215#S4.SS2.SSS1.p3.1)\.
- \[19\]P\. Hohenberg and W\. Kohn\(1964\)Inhomogeneous electron gas\.Phys\. Rev\.136,pp\. B864–B871\.Cited by:[§2\.2](https://arxiv.org/html/2606.14215#S2.SS2.p2.1)\.
- \[20\]M\. K\. Horton, P\. Huck, R\. X\. Yang, J\. M\. Munro, S\. Dwaraknath, A\. M\. Ganose, R\. S\. Kingsbury, M\. Wen, J\. X\. Shen, T\. S\. Mathis, A\. D\. Kaplan, K\. Berket, J\. Riebesell, J\. George, A\. S\. Rosen, E\. W\. C\. Spotte\-Smith, M\. J\. McDermott, O\. A\. Cohen, A\. Dunn, M\. C\. Kuner, G\. Rignanese, G\. Petretto, D\. Waroquiers, S\. M\. Griffin, J\. B\. Neaton, D\. C\. Chrzan, M\. Asta, G\. Hautier, S\. Cholia, G\. Ceder, S\. P\. Ong, A\. Jain, and K\. A\. Persson\(2025\)Accelerated data\-driven materials science with the materials project\.Nat\. Mater\.24,pp\. 1522–1532\.Cited by:[§2\.2](https://arxiv.org/html/2606.14215#S2.SS2.p7.1),[§4\.2\.2](https://arxiv.org/html/2606.14215#S4.SS2.SSS2.p1.1)\.
- \[21\]Y\. Ito, T\. Taniai, R\. Igarashi, Y\. Ushiku, and K\. Ono\(2025\)Rethinking the role of frames for SE\(3\)\-invariant crystal structure modeling\.InThe Thirteenth International Conference on Learning Representations \(ICLR 2025\),Cited by:[§2\.2](https://arxiv.org/html/2606.14215#S2.SS2.p2.1),[§2\.2](https://arxiv.org/html/2606.14215#S2.SS2.p6.1)\.
- \[22\]S\. Jia, A\. Varma, P\. Manivannan, D\. Chayapathy, and V\. Fung\(2025\)Benchmarking text representations for crystal structure generation with large language models\.InAI for Accelerated Materials Design \- ICLR 2025,Cited by:[§3](https://arxiv.org/html/2606.14215#S3.p1.1)\.
- \[23\]R\. Jiao, W\. Huang, P\. Lin, J\. Han, P\. Chen, Y\. Lu, and Y\. Liu\(2023\)Crystal structure prediction by joint equivariant diffusion\.InThirty\-seventh Conference on Neural Information Processing Systems \(NeurIPS 2023\),Cited by:[§4\.2\.1](https://arxiv.org/html/2606.14215#S4.SS2.SSS1.p3.1)\.
- \[24\]R\. Jiao, W\. Huang, Y\. Liu, D\. Zhao, and Y\. Liu\(2024\)Space group constrained crystal generation\.InThe Twelfth International Conference on Learning Representations \(ICLR 2024\),Cited by:[§3](https://arxiv.org/html/2606.14215#S3.p2.1),[§4\.1\.1](https://arxiv.org/html/2606.14215#S4.SS1.SSS1.p1.1)\.
- \[25\]J\. Jumper, R\. Evans, A\. Pritzel, T\. Green, M\. Figurnov, O\. Ronneberger, K\. Tunyasuvunakool, R\. Bates, A\. Žídek, A\. Potapenko, A\. Bridgland, C\. Meyer, S\. A\. A\. Kohl, A\. J\. Ballard, A\. Cowie, B\. Romera\-Paredes, S\. Nikolov, R\. Jain, J\. Adler, T\. Back, S\. Petersen, D\. Reiman, E\. Clancy, M\. Zielinski, M\. Steinegger, M\. Pacholska, T\. Berghammer, S\. Bodenstein, D\. Silver, O\. Vinyals, A\. W\. Senior, K\. Kavukcuoglu, P\. Kohli, and D\. Hassabis\(2021\)Highly accurate protein structure prediction with alphafold\.Nature596\(7873\),pp\. 583–589\.Cited by:[§1](https://arxiv.org/html/2606.14215#S1.p1.1)\.
- \[26\]B\. Kawar, S\. Zada, O\. Lang, O\. Tov, H\. Chang, T\. Dekel, I\. Mosseri, and M\. Irani\(2023\)Imagic: text\-based real image editing with diffusion models\.InConference on Computer Vision and Pattern Recognition 2023 \(CVPR 2023\),Cited by:[§2\.4](https://arxiv.org/html/2606.14215#S2.SS4.p1.1)\.
- \[27\]N\. Kazeev, W\. Nong, I\. Romanov, R\. Zhu, A\. Ustyuzhanin, S\. Yamazaki, and K\. HippalgaonkarWyckoff transformer: generation of symmetric crystals\.Note:Preprint at[https://doi\.org/10\.48550/arXiv\.2503\.02407](https://doi.org/10.48550/arXiv.2503.02407)\(2025\)Cited by:[§3](https://arxiv.org/html/2606.14215#S3.p2.1)\.
- \[28\]D\. P\. Kingma and J\. Ba\(2015\)Adam: A Method for Stochastic Optimization\.InThe Third International Conference on Learning Representations \(ICLR 2015\),Cited by:[§4\.2\.2](https://arxiv.org/html/2606.14215#S4.SS2.SSS2.p3.2)\.
- \[29\]W\. Kohn and L\. J\. Sham\(1965\)Self\-consistent equations including exchange and correlation effects\.Phys\. Rev\.140,pp\. A1133–A1138\.Cited by:[§2\.2](https://arxiv.org/html/2606.14215#S2.SS2.p2.1)\.
- \[30\]D\. Levy, S\. S\. Panigrahi, S\. Kaba, Q\. Zhu, K\. L\. K\. Lee, M\. Galkin, S\. Miret, and S\. Ravanbakhsh\(2025\)SymmCD: symmetry\-preserving crystal generation with diffusion models\.InThe Thirteenth International Conference on Learning Representations \(ICLR 2025\),Cited by:[§3](https://arxiv.org/html/2606.14215#S3.p2.1),[§4\.1\.1](https://arxiv.org/html/2606.14215#S4.SS1.SSS1.p1.1)\.
- \[31\]A\. Lewkowycz, A\. J\. Andreassen, D\. Dohan, E\. Dyer, H\. Michalewski, V\. V\. Ramasesh, A\. Slone, C\. Anil, I\. Schlag, T\. Gutman\-Solo, Y\. Wu, B\. Neyshabur, G\. Gur\-Ari, and V\. Misra\(2022\)Solving quantitative reasoning problems with language models\.InAdvances in Neural Information Processing Systems \(NeurIPS 2022\),Cited by:[§1](https://arxiv.org/html/2606.14215#S1.p1.1)\.
- \[32\]Y\. Liu, K\. Zhang, Y\. Li, Z\. Yan, C\. Gao, R\. Chen, Z\. Yuan, Y\. Huang, H\. Sun, J\. Gao, L\. He, and L\. SunSora: A review on background, technology, limitations, and opportunities of large vision models\.Note:Preprint at[https://doi\.org/10\.48550/arXiv\.2402\.17177](https://doi.org/10.48550/arXiv.2402.17177)\(2024\)Cited by:[§1](https://arxiv.org/html/2606.14215#S1.p1.1)\.
- \[33\]C\. Lu, C\. Lu, R\. T\. Lange, J\. Foerster, J\. Clune, and D\. HaThe ai scientist: towards fully automated open\-ended scientific discovery\.Note:Preprint at[https://doi\.org/10\.48550/arXiv\.2408\.06292](https://doi.org/10.48550/arXiv.2408.06292)\(2024\)Cited by:[§1](https://arxiv.org/html/2606.14215#S1.p1.1)\.
- \[34\]A\. Madaan, N\. Tandon, P\. Gupta, S\. Hallinan, L\. Gao, S\. Wiegreffe, U\. Alon, N\. Dziri, S\. Prabhumoye, Y\. Yang, S\. Gupta, B\. P\. Majumder, K\. Hermann, S\. Welleck, A\. Yazdanbakhsh, and P\. Clark\(2023\)Self\-refine: iterative refinement with self\-feedback\.InThirty\-seventh Conference on Neural Information Processing Systems \(NeurIPS 2023\),Cited by:[§2\.2](https://arxiv.org/html/2606.14215#S2.SS2.p4.1),[§3](https://arxiv.org/html/2606.14215#S3.p3.1)\.
- \[35\]A\. Merchant, S\. Batzner, S\. S\. Schoenholz, M\. Aykol, G\. Cheon, and E\. D\. Cubuk\(2023\)Scaling deep learning for materials discovery\.Nature624,pp\. 80–85\.Cited by:[§1](https://arxiv.org/html/2606.14215#S1.p1.1)\.
- \[36\]K\. Momma and F\. Izumi\(2011\)VESTA 3 for three\-dimensional visualization of crystal, volumetric and morphology data\.J\. Appl\. Crystallogr\.44\(6\),pp\. 1272–1276\.Cited by:[Figure 4](https://arxiv.org/html/2606.14215#S2.F4),[Figure 5](https://arxiv.org/html/2606.14215#S2.F5)\.
- \[37\]R\. Nakano, J\. Hilton, S\. Balaji, J\. Wu, L\. Ouyang, C\. Kim, C\. Hesse, S\. Jain, V\. Kosaraju, W\. Saunders, X\. Jiang, K\. Cobbe, T\. Eloundou, G\. Krueger, K\. Button, M\. Knight, B\. Chess, and J\. SchulmanWebGPT: browser\-assisted question\-answering with human feedback\.Note:Preprint at[https://doi\.org/10\.48550/arXiv\.2112\.09332](https://doi.org/10.48550/arXiv.2112.09332)\(2021\)Cited by:[§2\.1](https://arxiv.org/html/2606.14215#S2.SS1.p2.2)\.
- \[38\]S\. P\. Ong, W\. D\. Richards, A\. Jain, G\. Hautier, M\. Kocher, S\. Cholia, D\. Gunter, V\. L\. Chevrier, K\. A\. Persson, and G\. Ceder\(2013\)Python materials genomics \(pymatgen\): a robust, open\-source python library for materials analysis\.Comput\. Mater\. Sci\.68,pp\. 314–319\.Cited by:[§2\.2](https://arxiv.org/html/2606.14215#S2.SS2.p7.1)\.
- \[39\]Cited by:[§1](https://arxiv.org/html/2606.14215#S1.p1.1)\.
- \[40\]H\. Park, A\. Onwuli, and A\. Walsh\(2025\)Exploration of crystal chemical space using text\-guided generative artificial intelligence\.Nat\. Commun\.16,pp\. 4379\.Cited by:[item a](https://arxiv.org/html/2606.14215#S1.I1.i1.p1.1),[item b](https://arxiv.org/html/2606.14215#S1.I1.i2.p1.2),[§1](https://arxiv.org/html/2606.14215#S1.p2.1),[§1](https://arxiv.org/html/2606.14215#S1.p3.1),[§2\.1](https://arxiv.org/html/2606.14215#S2.SS1.p1.1),[§3](https://arxiv.org/html/2606.14215#S3.p2.1),[§4\.1\.1](https://arxiv.org/html/2606.14215#S4.SS1.SSS1.p1.1),[§4\.2](https://arxiv.org/html/2606.14215#S4.SS2.p1.1)\.
- \[41\]R\. Rombach, A\. Blattmann, D\. Lorenz, P\. Esser, and B\. OmmerHigh\-resolution image synthesis with latent diffusion models\.Note:Preprint at[https://doi\.org/10\.48550/arXiv\.2112\.10752](https://doi.org/10.48550/arXiv.2112.10752)\(2021\)Cited by:[§1](https://arxiv.org/html/2606.14215#S1.p1.1)\.
- \[42\]J\. M\. Stokes, K\. Yang, K\. Swanson, W\. Jin, A\. Cubillos\-Ruiz, N\. M\. Donghia, C\. R\. MacNair, S\. French, L\. A\. Carfrae, Z\. Bloom\-Ackermann, V\. M\. Tran, A\. Chiappino\-Pepe, A\. H\. Badran, I\. W\. Andrews, E\. J\. Chory, G\. M\. Church, E\. D\. Brown, T\. S\. Jaakkola, R\. Barzilay, and J\. J\. Collins\(2020\)A deep learning approach to antibiotic discovery\.Cell180\(4\),pp\. 688–702\.e13\.Cited by:[§1](https://arxiv.org/html/2606.14215#S1.p1.1)\.
- \[43\]K\. Swanson, W\. Wu, N\. L\. Bulaong, J\. E\. Pak, and J\. Zou\(2025\)The virtual lab of ai agents designs new sars\-cov\-2 nanobodies\.Nature646\(8085\),pp\. 716–723\.Cited by:[§1](https://arxiv.org/html/2606.14215#S1.p1.1)\.
- \[44\]G\.A\. Tinelli and D\.F\. Holcomb\(1978\)NMR and structural properties of csau and rbau\.J\. Solid State Chem\.25\(2\),pp\. 157–168\.Cited by:[§2\.4](https://arxiv.org/html/2606.14215#S2.SS4.p3.1)\.
- \[45\]G\. Tom, S\. P\. Schmid, S\. G\. Baird, Y\. Cao, K\. Darvish, H\. Hao, S\. Lo, S\. Pablo\-García, E\. M\. Rajaonson, M\. Skreta, N\. Yoshikawa, S\. Corapi, G\. D\. Akkoc, F\. Strieth\-Kalthoff, M\. Seifrid, and A\. Aspuru\-Guzik\(2024\)Self\-driving laboratories for chemistry and materials science\.Chem\. Rev\.124,pp\. 16\.Cited by:[§3](https://arxiv.org/html/2606.14215#S3.p3.1)\.
- \[46\]H\. Wang, T\. Fu, Y\. Du, W\. Gao, K\. Huang, Z\. Liu, P\. Chandak, S\. Liu, P\. Van Katwyk, A\. Deac, A\. Anandkumar, K\. Bergen, C\. P\. Gomes, S\. Ho, P\. Kohli, J\. Lasenby, J\. Leskovec, T\. Liu, A\. Manrai, D\. Marks, B\. Ramsundar, L\. Song, J\. Sun, J\. Tang, P\. Veličković, M\. Welling, L\. Zhang, C\. W\. Coley, Y\. Bengio, and M\. Zitnik\(2023\)Scientific discovery in the age of artificial intelligence\.Nature620,pp\. 47–60\.Cited by:[§1](https://arxiv.org/html/2606.14215#S1.p1.1)\.
- \[47\] I\. R\. W\. G\. Wyckoff \(1921\) The crystal structures of the alkali halides\. J\. Wash\. Acad\. Sci\. 11\(18\), pp\. 429–434\. Cited by: [§2\.4](https://arxiv.org/html/2606.14215#S2.SS4.p3.1)\.
- \[48\] H\. Xiao, R\. Li, X\. Shi, Y\. Chen, L\. Zhu, X\. Chen, and L\. Wang \(2023\) An invertible, invariant crystal representation for inverse design of solid\-state materials using generative deep learning\. Nat\. Commun\. 14, pp\. 7027\. Cited by: [§3](https://arxiv.org/html/2606.14215#S3.p1.1)\.
- \[49\] T\. Xie, X\. Fu, O\. Ganea, R\. Barzilay, and T\. Jaakkola \(2021\) Crystal diffusion variational autoencoder for periodic material generation\. Note: Preprint at [https://doi\.org/10\.48550/arXiv\.2110\.06197](https://doi.org/10.48550/arXiv.2110.06197)\. Cited by: [§4\.1\.1](https://arxiv.org/html/2606.14215#S4.SS1.SSS1.p1.1)\.
- \[50\] A\. Yang, A\. Li, B\. Yang, B\. Zhang, B\. Hui, B\. Zheng, B\. Yu, C\. Gao, C\. Huang, C\. Lv, C\. Zheng, D\. Liu, F\. Zhou, F\. Huang, F\. Hu, H\. Ge, H\. Wei, H\. Lin, J\. Tang, J\. Yang, J\. Tu, J\. Zhang, J\. Yang, J\. Yang, J\. Zhou, J\. Zhou, J\. Lin, K\. Dang, K\. Bao, K\. Yang, L\. Yu, L\. Deng, M\. Li, M\. Xue, M\. Li, P\. Zhang, P\. Wang, Q\. Zhu, R\. Men, R\. Gao, S\. Liu, S\. Luo, T\. Li, T\. Tang, W\. Yin, X\. Ren, X\. Wang, X\. Zhang, X\. Ren, Y\. Fan, Y\. Su, Y\. Zhang, Y\. Zhang, Y\. Wan, Y\. Liu, Z\. Wang, Z\. Cui, Z\. Zhang, Z\. Zhou, and Z\. Qiu \(2025\) Qwen3 technical report\. Note: Preprint at [https://doi\.org/10\.48550/arXiv\.2505\.09388](https://doi.org/10.48550/arXiv.2505.09388)\. Cited by: [§1](https://arxiv.org/html/2606.14215#S1.p1.1), [§2\.2](https://arxiv.org/html/2606.14215#S2.SS2.p2.1)\.
- \[51\] S\. Yang, S\. Batzner, R\. Gao, M\. Aykol, A\. L\. Gaunt, B\. McMorrow, D\. J\. Rezende, D\. Schuurmans, I\. Mordatch, and E\. D\. Cubuk \(2024\) Generative hierarchical materials search\. In The Thirty\-eighth Annual Conference on Neural Information Processing Systems \(NeurIPS 2024\)\. Cited by: [item b](https://arxiv.org/html/2606.14215#S1.I1.i2.p1.2), [§1](https://arxiv.org/html/2606.14215#S1.p2.1), [§1](https://arxiv.org/html/2606.14215#S1.p3.1)\.
- \[52\] C\. Zeni, R\. Pinsler, D\. Zügner, A\. Fowler, M\. Horton, X\. Fu, Z\. Wang, A\. Shysheya, J\. Crabbé, S\. Ueda, R\. Sordillo, L\. Sun, J\. Smith, B\. Nguyen, H\. Schulz, S\. Lewis, C\. Huang, Z\. Lu, Y\. Zhou, H\. Yang, H\. Hao, J\. Li, C\. Yang, W\. Li, R\. Tomioka, and T\. Xie \(2025\) A generative model for inorganic materials design\. Nature 639\(8055\), pp\. 624–632\. Cited by: [§1](https://arxiv.org/html/2606.14215#S1.p1.1)\.
- \[53\] R\. Zhu, W\. Nong, S\. Yamazaki, and K\. Hippalgaonkar \(2024\) WyCryst: Wyckoff inorganic crystal generator framework\. Matter 7\(10\), pp\. 3469–3488\. Cited by: [§3](https://arxiv.org/html/2606.14215#S3.p2.1)\.

Appendix

## Appendix A Prompt for feedback

We used the following prompt to obtain feedback\. Although this step is intended to be performed by a human expert, we used an LLM in its place to enable quantitative evaluation\.

Prompt for feedback

You are a scientist specializing in materials science with expertise in designing crystal structures\. I want to create the following material: “prompt $p_1$”\. I will provide the crystal structure and its property, so please give me advice on how to update the crystal structure\.

## Appendix B Prompt for initial structural description from property

The following prompt was adopted for LLM\_interpret, which generates structural descriptions of crystal structures from user input\. Since the crystal structure generation model requires a predefined number of atoms, we instructed the LLM to output this information using the \-\-n\_atoms tag\.

Prompt for LLM\_interpret

You are a scientist specializing in materials science with expertise in designing crystal structures\. Describe the structural features typically associated with materials that exhibit the specified physical property\. Provide detailed information on crystal symmetry, lattice type, bonding characteristics, coordination environments, and common structural motifs\. Present the output in \*\*Markdown format\*\*\.

\*\*Instructions:\*\*

\- Follow the example format exactly\.
\- Do not use general chemical formulas \(e\.g\., ABO3\)\.
\- Responses that do not follow the format will be considered incorrect\.
\- Carefully consider what type of crystal structure is necessary to realize the given property\.
\- At the end of the description, return the number of atoms in the unit cell using the tag format \-\-n\_atoms=integer\.
\- Assume that I will attempt to synthesize the proposed structure and evaluate its physical properties\.

\*\*Example\*\*

BaTiO3 adopts a cubic perovskite structure and crystallizes in the cubic space group Pm\-3m\. Ba2\+ is coordinated by twelve equivalent O2\- atoms to form BaO12 cuboctahedra, which share corners with twelve equivalent BaO12 cuboctahedra, faces with six equivalent BaO12 cuboctahedra, and faces with eight equivalent TiO6 octahedra\. All Ba–O bond lengths are 2\.83 Å\. Ti4\+ is coordinated by six equivalent O2\- atoms to form TiO6 octahedra, which share corners with six equivalent TiO6 octahedra and faces with eight equivalent BaO12 cuboctahedra\. The corner\-sharing TiO6 octahedra are not tilted\. All Ti–O bond lengths are 2\.00 Å\. Each O2\- is bonded in a distorted linear coordination to four equivalent Ba2\+ and two equivalent Ti4\+ atoms\.

\-\-n\_atoms=20
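The prompt asks the LLM to end its response with an n\_atoms tag so that the atom count can be handed to the structure generator\. One way to separate the description from the tag is sketched below\. The tag name comes from the prompt; the helper `split_description`, its regular expression, and its error handling are illustrative assumptions, not the authors' code\.

```python
import re

# Assumption: the tag may appear as "--n_atoms=20" or with a stray space
# between the dashes ("- -n_atoms=20"); both spellings are accepted here.
_N_ATOMS_TAG = re.compile(r"-\s?-n_atoms\s*=\s*(\d+)")


def split_description(llm_output: str) -> tuple[str, int]:
    """Split an LLM_interpret-style response into (description, n_atoms).

    Hypothetical helper: takes the last tag in the response and treats
    everything before it as the structural description.
    """
    matches = list(_N_ATOMS_TAG.finditer(llm_output))
    if not matches:
        raise ValueError("response has no --n_atoms=<integer> tag")
    last = matches[-1]
    description = llm_output[: last.start()].rstrip()
    return description, int(last.group(1))


if __name__ == "__main__":
    text = "BaTiO3 adopts a cubic perovskite structure.\n--n_atoms=20"
    print(split_description(text))
```

Raising an error when the tag is missing mirrors the prompt's rule that responses not following the format are considered incorrect; a caller could catch the error and query the LLM again\.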

## Appendix C Prompt for refining structural descriptions

To refine the structural description, we used the following prompt for LLM\_refine\. The italicized parts should be replaced with the corresponding descriptions of the generated crystal structures\.

Prompt for LLM\_refine

The synthesis of the crystal structure you proposed yielded “*Formula*”, whose structure is as follows: “*Robocrystallographer’s description* $d_k^{\text{gen}}$\.” – “*User Instruction* $p_k^{\text{fb}}$\.” – Can you refine the crystal structure? Please respond full crystal structure description in accordance with the format specified in the instructions\. Do not return any other things\.
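Filling the placeholders of this prompt at iteration $k$ can be sketched as a simple template substitution\. The wording is taken from the prompt above; the function name `build_refine_prompt`, the argument names, and the line layout around the “–” separators are assumptions made for illustration\.

```python
# Template text follows the LLM_refine prompt; the line breaks around the
# separators are an assumption, since the original layout is not recoverable.
REFINE_TEMPLATE = (
    "The synthesis of the crystal structure you proposed yielded "
    "\u201c{formula}\u201d, whose structure is as follows: "
    "\u201c{description}\u201d\n"
    "\u2013\n"
    "\u201c{feedback}\u201d\n"
    "\u2013\n"
    "Can you refine the crystal structure? Please respond full crystal "
    "structure description in accordance with the format specified in the "
    "instructions. Do not return any other things."
)


def build_refine_prompt(formula: str, description: str, feedback: str) -> str:
    """Fill the three placeholders of the refine prompt (hypothetical helper).

    formula:     composition of the generated structure
    description: text description of the generated structure (d_k^gen)
    feedback:    user or LLM feedback for this iteration (p_k^fb)
    """
    return REFINE_TEMPLATE.format(
        formula=formula, description=description, feedback=feedback
    )


if __name__ == "__main__":
    print(build_refine_prompt(
        "BaTiO3",
        "BaTiO3 adopts a cubic perovskite structure.",
        "Please lower the formation energy.",
    ))
```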

## Appendix D Effect of diffusion strength

We examined the effect of the diffusion strength parameter $\alpha$ introduced in Sec\. [4\.1](https://arxiv.org/html/2606.14215#S4.SS1) by measuring the *success rate*, defined as the proportion of trials in which every one of the five iterations produced at least one valid crystal structure\. The validity of each structure was evaluated using the same method described in line 6 of Algorithm [1](https://arxiv.org/html/2606.14215#alg1)\. In each iteration, 30 crystal structures were generated\. For each value of $\alpha$, we conducted 300 trials of stability\-focused generation\.
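The success rate defined above can be computed as in the following sketch\. It assumes the validity results are stored as nested booleans indexed by trial, iteration, and structure; that layout and the function names are illustrative, not taken from the paper\.

```python
from typing import Sequence


def trial_succeeded(validity: Sequence[Sequence[bool]]) -> bool:
    """Return True if every iteration of a trial has at least one valid structure.

    validity[k][j] is True when structure j generated in iteration k
    passed the validity check.
    """
    return all(any(iteration) for iteration in validity)


def success_rate(trials: Sequence[Sequence[Sequence[bool]]]) -> float:
    """Fraction of trials in which all iterations yielded a valid structure."""
    if not trials:
        raise ValueError("need at least one trial")
    return sum(trial_succeeded(trial) for trial in trials) / len(trials)


if __name__ == "__main__":
    # Two toy trials with two iterations of two structures each.
    toy = [
        [[True, False], [False, True]],   # every iteration has a valid structure
        [[False, False], [True, True]],   # first iteration has none
    ]
    print(success_rate(toy))
```

In the setting described above, each trial would hold five iterations of 30 flags, and `success_rate` would be evaluated over the 300 trials run for a given $\alpha$\.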

Figure [A\.1](https://arxiv.org/html/2606.14215#A4.F1)\(a\) shows the dependence of the success rate on the diffusion strength, and panel \(b\) shows the evolution of the average formation energy over the iterations\. The success rate clearly tends to decrease as the diffusion strength $\alpha$ increases\. This trend can be attributed to the reduced influence of structural information from the previous step at each iteration: a larger $\alpha$ leads to greater structural variation, which makes it more difficult to pass the validity check\. In contrast, the mean formation energy tended to improve with increasing $\alpha$, as shown in Fig\. [A\.1](https://arxiv.org/html/2606.14215#A4.F1)\(b\)\. These results suggest that weakening the dependence on the previous structure favors exploration of new favorable structures, while strengthening it steers the process toward exploitation\. The value of $\alpha$ should therefore be tuned to the goal: smaller values when only minor structural modifications are wanted, larger values when the aim is to discover entirely new structures\.
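The trade\-off can be illustrated numerically under a simplifying assumption\. The exact definition of $\alpha$ is given in Sec\. 4\.1 and is not reproduced here; the sketch below assumes that $\alpha$ acts like the strength of a partial\-noising scheme, in which the previous structure is noised for a fraction $\alpha$ of the forward diffusion steps before denoising\. The variance\-preserving linear noise schedule and its default values are also assumptions and may differ from the model's actual schedule\.

```python
import math


def retained_signal(alpha: float, num_steps: int = 1000,
                    beta_start: float = 1e-4, beta_end: float = 0.02) -> float:
    """Weight of the previous structure after partial noising.

    Assumed scheme (not confirmed by the paper): noise for
    t = round(alpha * num_steps) steps of a variance-preserving process,
    so the previous structure is scaled by sqrt(prod_{s<t} (1 - beta_s)).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    t = round(alpha * num_steps)
    log_abar = 0.0
    for s in range(t):
        # Linear schedule from beta_start to beta_end (assumed defaults).
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        log_abar += math.log1p(-beta)
    return math.exp(0.5 * log_abar)


if __name__ == "__main__":
    for alpha in (0.0, 0.2, 0.5, 0.8, 1.0):
        print(f"alpha={alpha:.1f}  retained weight={retained_signal(alpha):.4f}")
```

Under these assumptions the retained weight falls monotonically from 1 at $\alpha = 0$ toward nearly zero at $\alpha = 1$, which matches the qualitative picture above: small $\alpha$ keeps the process close to the previous structure, and large $\alpha$ lets it move far from it\.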

![Refer to caption](https://arxiv.org/html/2606.14215v1/x6.png)

Figure A\.1: Comparison of generation performance under different diffusion strengths\. Panels \(a\) and \(b\) show the effects of varying diffusion strength $\alpha$ on the generation success rate and mean formation energy, respectively\. \(a\) The generation success rate decreases as $\alpha$ increases, indicating that weaker influence from the previous step leads to greater structural variation and makes it more difficult to pass the validity check\. \(b\) Mean formation energy values improve with larger $\alpha$, suggesting that reduced dependence on previous structures promotes the discovery of more favorable configurations\.