Introduction
Large-scale neural networks have driven rapid progress in generative modelling, in particular in the ability to capture complex relationships among many variables. Autoregressive models, flow-based models, deep VAEs, and diffusion models have all contributed to this progress by breaking the joint distribution down into a series of steps, so that the interactions among the variables never have to be modelled all at once. These models can be viewed as an exchange of messages between a sender (Alice) and a receiver (Bob), in which Alice reveals something about the data at each step and Bob uses this information to improve his guess for the next message. Autoregressive models work well for language modelling but face challenges in domains such as image generation. Diffusion models, on the other hand, have proven effective for image generation by progressing from coarse to fine image details, yet they have not matched autoregression on discrete data. Bayesian Flow Networks (BFNs) offer a continuous and differentiable generative process even for discrete data, making them a promising alternative to diffusion models. The rest of the paper gives a detailed explanation of BFNs, their loss functions, and their application to various types of data.
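As a concrete illustration of this step-by-step decomposition, writing x = (x_1, ..., x_D) for a D-dimensional data point (notation introduced here for illustration), an autoregressive model factorises the joint distribution with the chain rule:

p(x_1, ..., x_D) = \prod_{d=1}^{D} p(x_d \mid x_1, ..., x_{d-1}),

so each message in Alice and Bob's exchange reveals one more variable, conditioned on everything already transmitted.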
Related Work
The related work section discusses the similarities and differences between Bayesian Flow Networks (BFNs) and existing methods, primarily diffusion models. BFNs differ from diffusion models in that the network maps from one distribution to another rather than from data to a distribution: it operates on the parameters of a data distribution instead of on noisy samples of the data. As a result the network inputs are continuous even when the data is discrete, whereas diffusion models for discrete data typically pass discrete samples into the network. BFNs also have an inherent continuity property, as the network inputs automatically lie on the probability simplex, the parameter space of a categorical distribution. Furthermore, BFNs directly optimise the negative log-likelihood of discrete data, unlike other continuous diffusion methods that require simplified or auxiliary loss functions to stabilise learning. For continuous data, BFNs are most closely related to variational diffusion models, but with less noisy network inputs: BFNs start from the parameters of a fixed prior, whereas diffusion models start from pure noise. A further advantage of BFNs is their flexibility: they adapt to different distributions and data types without the need to define and invert a forward process, so they can be applied to continuous, discretised, and discrete data with minimal changes to the training procedure.
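To make the simplex property concrete: for discrete data with K classes, each network input is a categorical parameter vector

\theta = (\theta_1, ..., \theta_K), with \theta_k \ge 0 and \sum_{k=1}^{K} \theta_k = 1,

for example the uniform distribution (1/K, ..., 1/K), so the inputs vary continuously even though the data itself can only take K distinct values.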
Bayesian Flow Networks
The section on Bayesian Flow Networks sets out the mathematical framework and structure of the model. It introduces the input distribution, a factorised distribution over the data governed by a set of parameters, and the sender distribution, a factorised distribution over noisy samples of the data whose noise level is controlled by an accuracy parameter. The input parameters and the process time are passed through a neural network to obtain an output vector that parameterises the output distribution. The key distinction between the input and output distributions is that the latter can exploit context information, since the network processes all the variables together. The Bayesian update function follows from standard Bayesian inference, and the update distribution is obtained by marginalising out the sender samples. An accuracy schedule is then introduced so that the updates can be taken to a continuous-time limit. The section also defines the discrete-time and continuous-time loss functions used to train the model; the continuous-time loss is obtained as the limit of the discrete-time loss as the number of steps goes to infinity, which simplifies computation and avoids having to fix the number of steps during training.
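A minimal sketch in Python of the n-step generative loop this section describes; the callables net, sample_output, sample_sender, and bayesian_update are hypothetical stand-ins for the model-specific pieces defined in the paper, and the signatures are assumptions rather than the paper's notation.

import numpy as np

def bfn_generate(net, sample_output, sample_sender, bayesian_update,
                 prior_params, alphas, rng=None):
    """Run the iterative generative procedure: at each step the network maps
    the current input parameters and the process time to an output
    distribution, a noisy 'sender' sample is drawn with accuracy alpha, and a
    Bayesian update folds that sample back into the input parameters before
    the next step."""
    rng = rng or np.random.default_rng()
    theta = prior_params                        # parameters of the input distribution
    n = len(alphas)
    for i, alpha in enumerate(alphas):
        t = i / n                               # process time in [0, 1)
        out = net(theta, t)                     # output-distribution parameters
        x_hat = sample_output(out, rng)         # draw a guess from the output distribution
        y = sample_sender(x_hat, alpha, rng)    # noisy sample of that guess, accuracy alpha
        theta = bayesian_update(theta, y, alpha)
    return sample_output(net(theta, 1.0), rng)  # final sample at t = 1

Roughly speaking, training reuses the same parameter-update machinery, except that the sender samples are drawn around the true data rather than around the model's own guess, and the loss compares the sender distribution with the distribution the receiver predicts from the network output, either step by step or in the continuous-time limit.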
Continuous Data
This section discusses the handling of continuous data in Bayesian Flow Networks. The authors normalise the data so that it falls within a reasonable range, but note that this is not strictly required by the mathematical framework. The input distribution for continuous data is a diagonal normal distribution, and its prior parameters are a zero vector for the mean and unit precision in every dimension, i.e. a standard normal prior, which the authors found preferable in practice. These input distributions are used to inform the network's predictions rather than to make predictions directly. The authors derive a Bayesian update function for the parameters and show that sender accuracies are additive, which they use to set the precision of the input distribution. The section explains how the network outputs are transformed into an estimate of the continuous data and defines the corresponding reconstruction loss. Pseudocode for evaluating the loss and a sample-generation procedure are provided.
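The Bayesian update for a diagonal normal input distribution takes the standard conjugate form: precisions add, and the new mean is a precision-weighted average of the old mean and the sender sample. A minimal sketch in Python, with illustrative variable names (mu, rho, y, alpha) rather than the paper's notation:

import numpy as np

def normal_bayesian_update(mu, rho, y, alpha):
    """Conjugate update of a diagonal normal with mean mu and precision rho,
    given a noisy sender sample y drawn with accuracy (precision) alpha."""
    rho_new = rho + alpha                       # precisions add
    mu_new = (rho * mu + alpha * y) / rho_new   # precision-weighted average
    return mu_new, rho_new

# Starting from the prior (mean 0, precision 1), two updates with accuracies
# 2.0 and 4.0 leave the precision at 1 + 2 + 4 = 7 in every dimension,
# illustrating the additive nature of the sender accuracies.
mu, rho = np.zeros(3), np.ones(3)
mu, rho = normal_bayesian_update(mu, rho, y=np.array([0.5, -0.2, 0.1]), alpha=2.0)
mu, rho = normal_bayesian_update(mu, rho, y=np.array([0.4, -0.1, 0.0]), alpha=4.0)
print(rho)  # [7. 7. 7.]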
Discretised Data
In this section, the researchers discuss data that has been discretised into K bins, for example 8-bit images with 256 bins or 16-bit audio with 65,536 bins. The data is represented by splitting the interval [-1, 1] into K bins, each of length 2/K. They write k_l, k_c, and k_r for the left edge, centre, and right edge of bin k, with k ranging over the integers from 1 to K, and define a vector k(x) giving the indices of the bins occupied by the elements of a data point x. If the data has not already been discretised, each element of x is set to the centre of the bin it falls in. As an example, they show how an 8-bit RGB intensity corresponds to a bin index and a value in the discretised representation.
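A small numerical sketch of this binning in Python; the mapping from an 8-bit intensity to a bin index (intensity v placed in bin v + 1) is an illustrative assumption rather than a detail quoted from the paper:

import numpy as np

K = 256  # e.g. 8-bit image data: 256 bins over [-1, 1], each of length 2/K

def bin_index(x, K):
    """Index k in {1, ..., K} of the bin containing x in [-1, 1]."""
    return np.minimum(K, np.floor((x + 1) * K / 2).astype(int) + 1)

def bin_left(k, K):    return -1 + (k - 1) * 2 / K    # k_l
def bin_centre(k, K):  return -1 + (k - 0.5) * 2 / K  # k_c
def bin_right(k, K):   return -1 + k * 2 / K          # k_r

v = 110                       # an 8-bit intensity
k = v + 1                     # assumed bin index, for illustration
x = bin_centre(k, K)          # value used in the discretised representation
print(k, x, bin_index(x, K))  # 111 -0.13671875 111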
Experiments
The experiments section evaluates Bayesian Flow Networks (BFNs) on three benchmarks: CIFAR-10, MNIST, and text8. The continuous and discretised versions of the system were compared on CIFAR-10, while the discrete version was applied to the other two datasets. The networks were trained with the continuous-time loss and evaluated with the discrete-time loss. The experiments used standard network architectures and training algorithms to allow direct comparison with existing methods. The BFN performed well across the benchmarks, achieving results close to the state of the art, and training with the discretised loss proved beneficial when the number of bins was low. On text8 the BFN also yielded strong results, close to the best models in the literature. Overall, the experiments demonstrate the effectiveness of BFNs on generative modelling tasks.
Conclusion
The researchers introduce Bayesian Flow Networks (BFNs), a novel generative model that combines Bayesian inference with neural networks in an iterative modelling process. They develop discrete- and continuous-time loss functions and sampling procedures for BFNs, and successfully apply the model to various types of data. The study aims to inspire new perspectives and directions for future research in generative modelling.