Get trending papers in your email inbox once a day!
Get trending papers in your email inbox!
SubscribeTensorNet: Cartesian Tensor Representations for Efficient Learning of Molecular Potentials
The development of efficient machine learning models for molecular systems representation is becoming crucial in scientific research. We introduce TensorNet, an innovative O(3)-equivariant message-passing neural network architecture that leverages Cartesian tensor representations. By using Cartesian tensor atomic embeddings, feature mixing is simplified through matrix product operations. Furthermore, the cost-effective decomposition of these tensors into rotation group irreducible representations allows for the separate processing of scalars, vectors, and tensors when necessary. Compared to higher-rank spherical tensor models, TensorNet demonstrates state-of-the-art performance with significantly fewer parameters. For small molecule potential energies, this can be achieved even with a single interaction layer. As a result of all these properties, the model's computational cost is substantially decreased. Moreover, the accurate prediction of vector and tensor molecular quantities on top of potential energies and forces is possible. In summary, TensorNet's framework opens up a new space for the design of state-of-the-art equivariant models.
Tensor Gaussian Process with Contraction for Multi-Channel Imaging Analysis
Multi-channel imaging data is a prevalent data format in scientific fields such as astronomy and biology. The structured information and the high dimensionality of these 3-D tensor data makes the analysis an intriguing but challenging topic for statisticians and practitioners. The low-rank scalar-on-tensor regression model, in particular, has received widespread attention and has been re-formulated as a tensor Gaussian Process (Tensor-GP) model with multi-linear kernel in Yu et al. (2018). In this paper, we extend the Tensor-GP model by integrating a dimensionality reduction technique, called tensor contraction, with a Tensor-GP for a scalar-on-tensor regression task with multi-channel imaging data. This is motivated by the solar flare forecasting problem with high dimensional multi-channel imaging data. We first estimate a latent, reduced-size tensor for each data tensor and then apply a multi-linear Tensor-GP on the latent tensor data for prediction. We introduce an anisotropic total-variation regularization when conducting the tensor contraction to obtain a sparse and smooth latent tensor. We then propose an alternating proximal gradient descent algorithm for estimation. We validate our approach via extensive simulation studies and applying it to the solar flare forecasting problem.
Deep Tensor Network
In this paper, we delve into the foundational principles of tensor categories, harnessing the universal property of the tensor product to pioneer novel methodologies in deep network architectures. Our primary contribution is the introduction of the Tensor Attention and Tensor Interaction Mechanism, a groundbreaking approach that leverages the tensor category to enhance the computational efficiency and the expressiveness of deep networks, and can even be generalized into the quantum realm.
Energy-conserving equivariant GNN for elasticity of lattice architected metamaterials
Lattices are architected metamaterials whose properties strongly depend on their geometrical design. The analogy between lattices and graphs enables the use of graph neural networks (GNNs) as a faster surrogate model compared to traditional methods such as finite element modelling. In this work, we generate a big dataset of structure-property relationships for strut-based lattices. The dataset is made available to the community which can fuel the development of methods anchored in physical principles for the fitting of fourth-order tensors. In addition, we present a higher-order GNN model trained on this dataset. The key features of the model are (i) SE(3) equivariance, and (ii) consistency with the thermodynamic law of conservation of energy. We compare the model to non-equivariant models based on a number of error metrics and demonstrate its benefits in terms of predictive performance and reduced training requirements. Finally, we demonstrate an example application of the model to an architected material design task. The methods which we developed are applicable to fourth-order tensors beyond elasticity such as piezo-optical tensor etc.
Adaptive Learning of Tensor Network Structures
Tensor Networks (TN) offer a powerful framework to efficiently represent very high-dimensional objects. TN have recently shown their potential for machine learning applications and offer a unifying view of common tensor decomposition models such as Tucker, tensor train (TT) and tensor ring (TR). However, identifying the best tensor network structure from data for a given task is challenging. In this work, we leverage the TN formalism to develop a generic and efficient adaptive algorithm to jointly learn the structure and the parameters of a TN from data. Our method is based on a simple greedy approach starting from a rank one tensor and successively identifying the most promising tensor network edges for small rank increments. Our algorithm can adaptively identify TN structures with small number of parameters that effectively optimize any differentiable objective function. Experiments on tensor decomposition, tensor completion and model compression tasks demonstrate the effectiveness of the proposed algorithm. In particular, our method outperforms the state-of-the-art evolutionary topology search [Li and Sun, 2020] for tensor decomposition of images (while being orders of magnitude faster) and finds efficient tensor network structures to compress neural networks outperforming popular TT based approaches [Novikov et al., 2015].
Strivec: Sparse Tri-Vector Radiance Fields
We propose Strivec, a novel neural representation that models a 3D scene as a radiance field with sparsely distributed and compactly factorized local tensor feature grids. Our approach leverages tensor decomposition, following the recent work TensoRF, to model the tensor grids. In contrast to TensoRF which uses a global tensor and focuses on their vector-matrix decomposition, we propose to utilize a cloud of local tensors and apply the classic CANDECOMP/PARAFAC (CP) decomposition to factorize each tensor into triple vectors that express local feature distributions along spatial axes and compactly encode a local neural field. We also apply multi-scale tensor grids to discover the geometry and appearance commonalities and exploit spatial coherence with the tri-vector factorization at multiple local scales. The final radiance field properties are regressed by aggregating neural features from multiple local tensors across all scales. Our tri-vector tensors are sparsely distributed around the actual scene surface, discovered by a fast coarse reconstruction, leveraging the sparsity of a 3D scene. We demonstrate that our model can achieve better rendering quality while using significantly fewer parameters than previous methods, including TensoRF and Instant-NGP.
Stacked tensorial neural networks for reduced-order modeling of a parametric partial differential equation
Tensorial neural networks (TNNs) combine the successes of multilinear algebra with those of deep learning to enable extremely efficient reduced-order models of high-dimensional problems. Here, I describe a deep neural network architecture that fuses multiple TNNs into a larger network, intended to solve a broader class of problems than a single TNN. I evaluate this architecture, referred to as a "stacked tensorial neural network" (STNN), on a parametric PDE with three independent variables and three parameters. The three parameters correspond to one PDE coefficient and two quantities describing the domain geometry. The STNN provides an accurate reduced-order description of the solution manifold over a wide range of parameters. There is also evidence of meaningful generalization to parameter values outside its training data. Finally, while the STNN architecture is relatively simple and problem agnostic, it can be regularized to incorporate problem-specific features like symmetries and physical modeling assumptions.
In-Sensor Radio Frequency Computing for Energy-Efficient Intelligent Radar
Radio Frequency Neural Networks (RFNNs) have demonstrated advantages in realizing intelligent applications across various domains. However, as the model size of deep neural networks rapidly increases, implementing large-scale RFNN in practice requires an extensive number of RF interferometers and consumes a substantial amount of energy. To address this challenge, we propose to utilize low-rank decomposition to transform a large-scale RFNN into a compact RFNN while almost preserving its accuracy. Specifically, we develop a Tensor-Train RFNN (TT-RFNN) where each layer comprises a sequence of low-rank third-order tensors, leading to a notable reduction in parameter count, thereby optimizing RF interferometer utilization in comparison to the original large-scale RFNN. Additionally, considering the inherent physical errors when mapping TT-RFNN to RF device parameters in real-world deployment, from a general perspective, we construct the Robust TT-RFNN (RTT-RFNN) by incorporating a robustness solver on TT-RFNN to enhance its robustness. To adapt the RTT-RFNN to varying requirements of reshaping operations, we further provide a reconfigurable reshaping solution employing RF switch matrices. Empirical evaluations conducted on MNIST and CIFAR-10 datasets show the effectiveness of our proposed method.
Performance Gaps in Multi-view Clustering under the Nested Matrix-Tensor Model
We study the estimation of a planted signal hidden in a recently introduced nested matrix-tensor model, which is an extension of the classical spiked rank-one tensor model, motivated by multi-view clustering. Prior work has theoretically examined the performance of a tensor-based approach, which relies on finding a best rank-one approximation, a problem known to be computationally hard. A tractable alternative approach consists in computing instead the best rank-one (matrix) approximation of an unfolding of the observed tensor data, but its performance was hitherto unknown. We quantify here the performance gap between these two approaches, in particular by deriving the precise algorithmic threshold of the unfolding approach and demonstrating that it exhibits a BBP-type transition behavior. This work is therefore in line with recent contributions which deepen our understanding of why tensor-based methods surpass matrix-based methods in handling structured tensor data.
EN-T: Optimizing Tensor Computing Engines Performance via Encoder-Based Methodology
Tensor computations, with matrix multiplication being the primary operation, serve as the fundamental basis for data analysis, physics, machine learning, and deep learning. As the scale and complexity of data continue to grow rapidly, the demand for tensor computations has also increased significantly. To meet this demand, several research institutions have started developing dedicated hardware for tensor computations. To further improve the computational performance of tensor process units, we have reexamined the issue of computation reuse that was previously overlooked in existing architectures. As a result, we propose a novel EN-T architecture that can reduce chip area and power consumption. Furthermore, our method is compatible with existing tensor processing units. We evaluated our method on prevalent microarchitectures, the results demonstrate an average improvement in area efficiency of 8.7\%, 12.2\%, and 11.0\% for tensor computing units at computational scales of 256 GOPS, 1 TOPS, and 4 TOPS, respectively. Similarly, there were energy efficiency enhancements of 13.0\%, 17.5\%, and 15.5\%.
Approximately Optimal Core Shapes for Tensor Decompositions
This work studies the combinatorial optimization problem of finding an optimal core tensor shape, also called multilinear rank, for a size-constrained Tucker decomposition. We give an algorithm with provable approximation guarantees for its reconstruction error via connections to higher-order singular values. Specifically, we introduce a novel Tucker packing problem, which we prove is NP-hard, and give a polynomial-time approximation scheme based on a reduction to the 2-dimensional knapsack problem with a matroid constraint. We also generalize our techniques to tree tensor network decompositions. We implement our algorithm using an integer programming solver, and show that its solution quality is competitive with (and sometimes better than) the greedy algorithm that uses the true Tucker decomposition loss at each step, while also running up to 1000x faster.
Rank-adaptive spectral pruning of convolutional layers during training
The computing cost and memory demand of deep learning pipelines have grown fast in recent years and thus a variety of pruning techniques have been developed to reduce model parameters. The majority of these techniques focus on reducing inference costs by pruning the network after a pass of full training. A smaller number of methods address the reduction of training costs, mostly based on compressing the network via low-rank layer factorizations. Despite their efficiency for linear layers, these methods fail to effectively handle convolutional filters. In this work, we propose a low-parametric training method that factorizes the convolutions into tensor Tucker format and adaptively prunes the Tucker ranks of the convolutional kernel during training. Leveraging fundamental results from geometric integration theory of differential equations on tensor manifolds, we obtain a robust training algorithm that provably approximates the full baseline performance and guarantees loss descent. A variety of experiments against the full model and alternative low-rank baselines are implemented, showing that the proposed method drastically reduces the training costs, while achieving high performance, comparable to or better than the full baseline, and consistently outperforms competing low-rank approaches.
Tensorized NeuroEvolution of Augmenting Topologies for GPU Acceleration
The NeuroEvolution of Augmenting Topologies (NEAT) algorithm has received considerable recognition in the field of neuroevolution. Its effectiveness is derived from initiating with simple networks and incrementally evolving both their topologies and weights. Although its capability across various challenges is evident, the algorithm's computational efficiency remains an impediment, limiting its scalability potential. In response, this paper introduces a tensorization method for the NEAT algorithm, enabling the transformation of its diverse network topologies and associated operations into uniformly shaped tensors for computation. This advancement facilitates the execution of the NEAT algorithm in a parallelized manner across the entire population. Furthermore, we develop TensorNEAT, a library that implements the tensorized NEAT algorithm and its variants, such as CPPN and HyperNEAT. Building upon JAX, TensorNEAT promotes efficient parallel computations via automated function vectorization and hardware acceleration. Moreover, the TensorNEAT library supports various benchmark environments including Gym, Brax, and gymnax. Through evaluations across a spectrum of robotics control environments in Brax, TensorNEAT achieves up to 500x speedups compared to the existing implementations such as NEAT-Python. Source codes are available at: https://github.com/EMI-Group/tensorneat.
NeuMaDiff: Neural Material Synthesis via Hyperdiffusion
High-quality material synthesis is essential for replicating complex surface properties to create realistic digital scenes. However, existing methods often suffer from inefficiencies in time and memory, require domain expertise, or demand extensive training data, with high-dimensional material data further constraining performance. Additionally, most approaches lack multi-modal guidance capabilities and standardized evaluation metrics, limiting control and comparability in synthesis tasks. To address these limitations, we propose NeuMaDiff, a novel neural material synthesis framework utilizing hyperdiffusion. Our method employs neural fields as a low-dimensional representation and incorporates a multi-modal conditional hyperdiffusion model to learn the distribution over material weights. This enables flexible guidance through inputs such as material type, text descriptions, or reference images, providing greater control over synthesis. To support future research, we contribute two new material datasets and introduce two BRDF distributional metrics for more rigorous evaluation. We demonstrate the effectiveness of NeuMaDiff through extensive experiments, including a novel statistics-based constrained synthesis approach, which enables the generation of materials of desired categories.
Neural Relighting with Subsurface Scattering by Learning the Radiance Transfer Gradient
Reconstructing and relighting objects and scenes under varying lighting conditions is challenging: existing neural rendering methods often cannot handle the complex interactions between materials and light. Incorporating pre-computed radiance transfer techniques enables global illumination, but still struggles with materials with subsurface scattering effects. We propose a novel framework for learning the radiance transfer field via volume rendering and utilizing various appearance cues to refine geometry end-to-end. This framework extends relighting and reconstruction capabilities to handle a wider range of materials in a data-driven fashion. The resulting models produce plausible rendering results in existing and novel conditions. We will release our code and a novel light stage dataset of objects with subsurface scattering effects publicly available.
Influence-guided Data Augmentation for Neural Tensor Completion
How can we predict missing values in multi-dimensional data (or tensors) more accurately? The task of tensor completion is crucial in many applications such as personalized recommendation, image and video restoration, and link prediction in social networks. Many tensor factorization and neural network-based tensor completion algorithms have been developed to predict missing entries in partially observed tensors. However, they can produce inaccurate estimations as real-world tensors are very sparse, and these methods tend to overfit on the small amount of data. Here, we overcome these shortcomings by presenting a data augmentation technique for tensors. In this paper, we propose DAIN, a general data augmentation framework that enhances the prediction accuracy of neural tensor completion methods. Specifically, DAIN first trains a neural model and finds tensor cell importances with influence functions. After that, DAIN aggregates the cell importance to calculate the importance of each entity (i.e., an index of a dimension). Finally, DAIN augments the tensor by weighted sampling of entity importances and a value predictor. Extensive experimental results show that DAIN outperforms all data augmentation baselines in terms of enhancing imputation accuracy of neural tensor completion on four diverse real-world tensors. Ablation studies of DAIN substantiate the effectiveness of each component of DAIN. Furthermore, we show that DAIN scales near linearly to large datasets.
Thermally Averaged Magnetic Anisotropy Tensors via Machine Learning Based on Gaussian Moments
We propose a machine learning method to model molecular tensorial quantities, namely the magnetic anisotropy tensor, based on the Gaussian-moment neural-network approach. We demonstrate that the proposed methodology can achieve an accuracy of 0.3--0.4 cm^{-1} and has excellent generalization capability for out-of-sample configurations. Moreover, in combination with machine-learned interatomic potential energies based on Gaussian moments, our approach can be applied to study the dynamic behavior of magnetic anisotropy tensors and provide a unique insight into spin-phonon relaxation.
SlimmeRF: Slimmable Radiance Fields
Neural Radiance Field (NeRF) and its variants have recently emerged as successful methods for novel view synthesis and 3D scene reconstruction. However, most current NeRF models either achieve high accuracy using large model sizes, or achieve high memory-efficiency by trading off accuracy. This limits the applicable scope of any single model, since high-accuracy models might not fit in low-memory devices, and memory-efficient models might not satisfy high-quality requirements. To this end, we present SlimmeRF, a model that allows for instant test-time trade-offs between model size and accuracy through slimming, thus making the model simultaneously suitable for scenarios with different computing budgets. We achieve this through a newly proposed algorithm named Tensorial Rank Incrementation (TRaIn) which increases the rank of the model's tensorial representation gradually during training. We also observe that our model allows for more effective trade-offs in sparse-view scenarios, at times even achieving higher accuracy after being slimmed. We credit this to the fact that erroneous information such as floaters tend to be stored in components corresponding to higher ranks. Our implementation is available at https://github.com/Shiran-Yuan/SlimmeRF.
Enabling Efficient Equivariant Operations in the Fourier Basis via Gaunt Tensor Products
Developing equivariant neural networks for the E(3) group plays an important role in modeling 3D data across real-world applications. Enforcing this equivariance primarily involves the tensor products of irreducible representations (irreps). However, the computational complexity of such operations increases significantly as higher-order tensors are used. In this work, we propose a systematic approach to substantially accelerate the computation of the tensor products of irreps. We mathematically connect the commonly used Clebsch-Gordan coefficients to the Gaunt coefficients, which are integrals of products of three spherical harmonics. Through Gaunt coefficients, the tensor product of irreps becomes equivalent to the multiplication between spherical functions represented by spherical harmonics. This perspective further allows us to change the basis for the equivariant operations from spherical harmonics to a 2D Fourier basis. Consequently, the multiplication between spherical functions represented by a 2D Fourier basis can be efficiently computed via the convolution theorem and Fast Fourier Transforms. This transformation reduces the complexity of full tensor products of irreps from O(L^6) to O(L^3), where L is the max degree of irreps. Leveraging this approach, we introduce the Gaunt Tensor Product, which serves as a new method to construct efficient equivariant operations across different model architectures. Our experiments on the Open Catalyst Project and 3BPA datasets demonstrate both the increased efficiency and improved performance of our approach.
A parallel Basis Update and Galerkin Integrator for Tree Tensor Networks
Computing the numerical solution to high-dimensional tensor differential equations can lead to prohibitive computational costs and memory requirements. To reduce the memory and computational footprint, dynamical low-rank approximation (DLRA) has proven to be a promising approach. DLRA represents the solution as a low-rank tensor factorization and evolves the resulting low-rank factors in time. A central challenge in DLRA is to find time integration schemes that are robust to the arising small singular values. A robust parallel basis update & Galerkin integrator, which simultaneously evolves all low-rank factors, has recently been derived for matrix differential equations. This work extends the parallel low-rank matrix integrator to Tucker tensors and general tree tensor networks, yielding an algorithm in which all bases and connecting tensors are evolved in parallel over a time step. We formulate the algorithm, provide a robust error bound, and demonstrate the efficiency of the new integrators for problems in quantum many-body physics, uncertainty quantification, and radiative transfer.
Examples of renormalization group transformations for image sets
Using the example of configurations generated with the worm algorithm for the two-dimensional Ising model, we propose renormalization group (RG) transformations, inspired by the tensor RG, that can be applied to sets of images. We relate criticality to the logarithmic divergence of the largest principal component. We discuss the changes in link occupation under the RG transformation, suggest ways to obtain data collapse, and compare with the two state tensor RG approximation near the fixed point.
Under-Counted Tensor Completion with Neural Incorporation of Attributes
Systematic under-counting effects are observed in data collected across many disciplines, e.g., epidemiology and ecology. Under-counted tensor completion (UC-TC) is well-motivated for many data analytics tasks, e.g., inferring the case numbers of infectious diseases at unobserved locations from under-counted case numbers in neighboring regions. However, existing methods for similar problems often lack supports in theory, making it hard to understand the underlying principles and conditions beyond empirical successes. In this work, a low-rank Poisson tensor model with an expressive unknown nonlinear side information extractor is proposed for under-counted multi-aspect data. A joint low-rank tensor completion and neural network learning algorithm is designed to recover the model. Moreover, the UC-TC formulation is supported by theoretical analysis showing that the fully counted entries of the tensor and each entry's under-counting probability can be provably recovered from partial observations -- under reasonable conditions. To our best knowledge, the result is the first to offer theoretical supports for under-counted multi-aspect data completion. Simulations and real-data experiments corroborate the theoretical claims.
TorchMD-Net 2.0: Fast Neural Network Potentials for Molecular Simulations
Achieving a balance between computational speed, prediction accuracy, and universal applicability in molecular simulations has been a persistent challenge. This paper presents substantial advancements in the TorchMD-Net software, a pivotal step forward in the shift from conventional force fields to neural network-based potentials. The evolution of TorchMD-Net into a more comprehensive and versatile framework is highlighted, incorporating cutting-edge architectures such as TensorNet. This transformation is achieved through a modular design approach, encouraging customized applications within the scientific community. The most notable enhancement is a significant improvement in computational efficiency, achieving a very remarkable acceleration in the computation of energy and forces for TensorNet models, with performance gains ranging from 2-fold to 10-fold over previous iterations. Other enhancements include highly optimized neighbor search algorithms that support periodic boundary conditions and the smooth integration with existing molecular dynamics frameworks. Additionally, the updated version introduces the capability to integrate physical priors, further enriching its application spectrum and utility in research. The software is available at https://github.com/torchmd/torchmd-net.
Reducing SO(3) Convolutions to SO(2) for Efficient Equivariant GNNs
Graph neural networks that model 3D data, such as point clouds or atoms, are typically desired to be SO(3) equivariant, i.e., equivariant to 3D rotations. Unfortunately equivariant convolutions, which are a fundamental operation for equivariant networks, increase significantly in computational complexity as higher-order tensors are used. In this paper, we address this issue by reducing the SO(3) convolutions or tensor products to mathematically equivalent convolutions in SO(2) . This is accomplished by aligning the node embeddings' primary axis with the edge vectors, which sparsifies the tensor product and reduces the computational complexity from O(L^6) to O(L^3), where L is the degree of the representation. We demonstrate the potential implications of this improvement by proposing the Equivariant Spherical Channel Network (eSCN), a graph neural network utilizing our novel approach to equivariant convolutions, which achieves state-of-the-art results on the large-scale OC-20 and OC-22 datasets.
Transfer Learning Across Heterogeneous Features For Efficient Tensor Program Generation
Tuning tensor program generation involves searching for various possible program transformation combinations for a given program on target hardware to optimize the tensor program execution. It is already a complex process because of the massive search space and exponential combinations of transformations make auto-tuning tensor program generation more challenging, especially when we have a heterogeneous target. In this research, we attempt to address these problems by learning the joint neural network and hardware features and transferring them to the new target hardware. We extensively study the existing state-of-the-art dataset, TenSet, perform comparative analysis on the test split strategies and propose methodologies to prune the dataset. We adopt an attention-inspired approach for tuning the tensor programs enabling them to embed neural network and hardware-specific features. Our approach could prune the dataset up to 45\% of the baseline without compromising the Pairwise Comparison Accuracy (PCA). Further, the proposed methodology can achieve on-par or improved mean inference time with 25%-40% of the baseline tuning time across different networks and target hardware.
Causality and Renormalization in Finite-Time-Path Out-of-Equilibrium φ^3 QFT
Our aim is to contribute to quantum field theory (QFT) formalisms useful for descriptions of short time phenomena, dominant especially in heavy ion collisions. We formulate out-of-equilibrium QFT within the finite-time-path formalism (FTP) and renormalization theory (RT). The potential conflict of FTP and RT is investigated in g phi^3 QFT, by using the retarded/advanced (R/A) basis of Green functions and dimensional renormalization (DR). For example, vertices immediately after (in time) divergent self-energy loops do not conserve energy, as integrals diverge. We "repair" them, while keeping d<4, to obtain energy conservation at those vertices. Already in the S-matrix theory, the renormalized, finite part of Feynman self-energy Sigma_{F}(p_0) does not vanish when |p_0|rightarrowinfty and cannot be split to retarded and advanced parts. In the Glaser--Epstein approach, the causality is repaired in the composite object G_F(p_0)Sigma_{F}(p_0). In the FTP approach, after repairing the vertices, the corresponding composite objects are G_R(p_0)Sigma_{R}(p_0) and Sigma_{A}(p_0)G_A(p_0). In the limit drightarrow 4, one obtains causal QFT. The tadpole contribution splits into diverging and finite parts. The diverging, constant component is eliminated by the renormalization condition langle 0|phi|0rangle =0 of the S-matrix theory. The finite, oscillating energy-nonconserving tadpole contributions vanish in the limit trightarrow infty .
Multimodal Learning for Materials
Artificial intelligence is transforming computational materials science, improving the prediction of material properties, and accelerating the discovery of novel materials. Recently, publicly available material data repositories have grown rapidly. This growth encompasses not only more materials, but also a greater variety and quantity of their associated properties. Existing machine learning efforts in materials science focus primarily on single-modality tasks, i.e., relationships between materials and a single physical property, thus not taking advantage of the rich and multimodal set of material properties. Here, we introduce Multimodal Learning for Materials (MultiMat), which enables self-supervised multi-modality training of foundation models for materials. We demonstrate our framework's potential using data from the Materials Project database on multiple axes: (i) MultiMat achieves state-of-the-art performance for challenging material property prediction tasks; (ii) MultiMat enables novel and accurate material discovery via latent space similarity, enabling screening for stable materials with desired properties; and (iii) MultiMat encodes interpretable emergent features that may provide novel scientific insights.
Predicting 3D Rigid Body Dynamics with Deep Residual Network
This study investigates the application of deep residual networks for predicting the dynamics of interacting three-dimensional rigid bodies. We present a framework combining a 3D physics simulator implemented in C++ with a deep learning model constructed using PyTorch. The simulator generates training data encompassing linear and angular motion, elastic collisions, fluid friction, gravitational effects, and damping. Our deep residual network, consisting of an input layer, multiple residual blocks, and an output layer, is designed to handle the complexities of 3D dynamics. We evaluate the network's performance using a datasetof 10,000 simulated scenarios, each involving 3-5 interacting rigid bodies. The model achieves a mean squared error of 0.015 for position predictions and 0.022 for orientation predictions, representing a 25% improvement over baseline methods. Our results demonstrate the network's ability to capture intricate physical interactions, with particular success in predicting elastic collisions and rotational dynamics. This work significantly contributes to physics-informed machine learning by showcasing the immense potential of deep residual networks in modeling complex 3D physical systems. We discuss our approach's limitations and propose future directions for improving generalization to more diverse object shapes and materials.
Evolving Normalization-Activation Layers
Normalization layers and activation functions are fundamental components in deep networks and typically co-locate with each other. Here we propose to design them using an automated approach. Instead of designing them separately, we unify them into a single tensor-to-tensor computation graph, and evolve its structure starting from basic mathematical functions. Examples of such mathematical functions are addition, multiplication and statistical moments. The use of low-level mathematical functions, in contrast to the use of high-level modules in mainstream NAS, leads to a highly sparse and large search space which can be challenging for search methods. To address the challenge, we develop efficient rejection protocols to quickly filter out candidate layers that do not work well. We also use multi-objective evolution to optimize each layer's performance across many architectures to prevent overfitting. Our method leads to the discovery of EvoNorms, a set of new normalization-activation layers with novel, and sometimes surprising structures that go beyond existing design patterns. For example, some EvoNorms do not assume that normalization and activation functions must be applied sequentially, nor need to center the feature maps, nor require explicit activation functions. Our experiments show that EvoNorms work well on image classification models including ResNets, MobileNets and EfficientNets but also transfer well to Mask R-CNN with FPN/SpineNet for instance segmentation and to BigGAN for image synthesis, outperforming BatchNorm and GroupNorm based layers in many cases.
Determination of Latent Dimensionality in International Trade Flow
Currently, high-dimensional data is ubiquitous in data science, which necessitates the development of techniques to decompose and interpret such multidimensional (aka tensor) datasets. Finding a low dimensional representation of the data, that is, its inherent structure, is one of the approaches that can serve to understand the dynamics of low dimensional latent features hidden in the data. Nonnegative RESCAL is one such technique, particularly well suited to analyze self-relational data, such as dynamic networks found in international trade flows. Nonnegative RESCAL computes a low dimensional tensor representation by finding the latent space containing multiple modalities. Estimating the dimensionality of this latent space is crucial for extracting meaningful latent features. Here, to determine the dimensionality of the latent space with nonnegative RESCAL, we propose a latent dimension determination method which is based on clustering of the solutions of multiple realizations of nonnegative RESCAL decompositions. We demonstrate the performance of our model selection method on synthetic data and then we apply our method to decompose a network of international trade flows data from International Monetary Fund and validate the resulting features against empirical facts from economic literature.
Rethinking Model Re-Basin and Linear Mode Connectivity
Recent studies suggest that with sufficiently wide models, most SGD solutions can, up to permutation, converge into the same basin. This phenomenon, known as the model re-basin regime, has significant implications for model averaging by ensuring the linear mode connectivity. However, current re-basin strategies are ineffective in many scenarios due to a lack of comprehensive understanding of underlying mechanisms. Addressing this gap, this paper provides novel insights into understanding and improving the standard practice. Firstly, we decompose re-normalization into rescaling and reshift, uncovering that rescaling plays a crucial role in re-normalization while re-basin performance is sensitive to shifts in model activation. The finding calls for a more nuanced handling of the activation shift. Secondly, we identify that the merged model suffers from the issue of activation collapse and magnitude collapse. Varying the learning rate, weight decay, and initialization method can mitigate the issues and improve model performance. Lastly, we propose a new perspective to unify the re-basin and pruning, under which a lightweight yet effective post-pruning technique is derived, which can significantly improve the model performance after pruning. Our implementation is available at https://github.com/XingyuQu/rethink-re-basin.
REACTO: Reconstructing Articulated Objects from a Single Video
In this paper, we address the challenge of reconstructing general articulated 3D objects from a single video. Existing works employing dynamic neural radiance fields have advanced the modeling of articulated objects like humans and animals from videos, but face challenges with piece-wise rigid general articulated objects due to limitations in their deformation models. To tackle this, we propose Quasi-Rigid Blend Skinning, a novel deformation model that enhances the rigidity of each part while maintaining flexible deformation of the joints. Our primary insight combines three distinct approaches: 1) an enhanced bone rigging system for improved component modeling, 2) the use of quasi-sparse skinning weights to boost part rigidity and reconstruction fidelity, and 3) the application of geodesic point assignment for precise motion and seamless deformation. Our method outperforms previous works in producing higher-fidelity 3D reconstructions of general articulated objects, as demonstrated on both real and synthetic datasets. Project page: https://chaoyuesong.github.io/REACTO.
Functional Bayesian Tucker Decomposition for Continuous-indexed Tensor Data
Tucker decomposition is a powerful tensor model to handle multi-aspect data. It demonstrates the low-rank property by decomposing the grid-structured data as interactions between a core tensor and a set of object representations (factors). A fundamental assumption of such decomposition is that there are finite objects in each aspect or mode, corresponding to discrete indexes of data entries. However, real-world data is often not naturally posed in this setting. For example, geographic data is represented as continuous indexes of latitude and longitude coordinates, and cannot fit tensor models directly. To generalize Tucker decomposition to such scenarios, we propose Functional Bayesian Tucker Decomposition (FunBaT). We treat the continuous-indexed data as the interaction between the Tucker core and a group of latent functions. We use Gaussian processes (GP) as functional priors to model the latent functions. Then, we convert each GP into a state-space prior by constructing an equivalent stochastic differential equation (SDE) to reduce computational cost. An efficient inference algorithm is developed for scalable posterior approximation based on advanced message-passing techniques. The advantage of our method is shown in both synthetic data and several real-world applications. We release the code of FunBaT at https://github.com/xuangu-fang/Functional-Bayesian-Tucker-Decomposition.