1. Introduction
Low-level vision (LLV) refers to fundamental tasks that operate directly on pixel-
or signal-level data to restore, enhance, or reconstruct degraded visual inputs. Typical
examples include image super-resolution [1, 2], denoising [3], deblurring [4], and inpainting [5].
These tasks form the foundation of many high-level vision (HLV) applications: they are
crucial for enhancing perceptual quality and frequently serve as essential preprocessing
steps for downstream objectives such as object detection [6, 7], semantic segmentation [8, 9],
and scene understanding [10].
Alongside these developments, the landscape of LLV tasks has expanded significantly
with rapid advances in 3D vision methods, now encompassing areas such as 3D scene
reconstruction [11-13], novel view synthesis [14], and occupancy field estimation [26].
These emerging tasks are increasingly vital in fields like robotics [16], autonomous driving [17],
augmented reality [18, 19], and digital twin systems [20], all of which demand high-fidelity,
continuous representations of complex spatial structures that traditional grid-based
approaches [21] struggle to provide.
Against this backdrop, Implicit Neural Representations (INRs) have emerged as a unified
and powerful paradigm for addressing these challenges. Traditional neural representation
methods such as grid- [21], voxel- [22], or point-based models [23], which explicitly store sampled values in discrete structures, inevitably suffer
from discretization artifacts, prohibitive memory consumption due to cubic scaling,
and poor scalability across resolutions. In contrast, INRs model data as continuous
signals parameterized by coordinate-based neural networks (typically multilayer perceptrons,
MLPs), enabling resolution-independent and compact representations. Moreover, the
universality of INR frameworks allows the same architecture to be seamlessly applied
across diverse tasks and modalities, including audio [25], image restoration [24], and 3D scene reconstruction [26], all without architectural modification.
Despite rapid advances in INRs and their remarkable versatility across domains,
the field still lacks a unified and systematic overview. Most existing studies tend
to focus on specific tasks or isolated architectural choices, which can obscure the
broader methodological landscape and impede cross-domain understanding. To provide
an overview of the methodological landscape and the organization of this survey, we
illustrate in Fig. 1 a taxonomy of INR-based approaches. This taxonomy outlines the key methodological
categories, challenges, datasets, and applications that are discussed in the subsequent
sections. In this work, we provide a comprehensive survey of the foundations, design
principles, and practical applications of INRs. We classify INRs according to their
core architectural paradigms, including activation functions, positional encodings,
Fourier-based reparameterization, hybrid strategies, and conditioning mechanisms. Additionally,
we discuss key challenges such as spectral bias [27] and high-frequency reconstruction, introduce benchmark datasets and evaluation metrics,
and summarize prominent real-world use cases.
Fig. 1. Overview of the taxonomy of INRs and survey structure.
2. METHODOLOGY
2.1. Activation Functions towards Spectral and Spatial Inductive Bias Control
In implicit neural representations (INRs), the design of activation functions plays
a pivotal role in shaping the model’s inductive biases. While standard activations
like ReLU [28] exhibit strong spectral bias toward low-frequency signals, recent methods propose
periodic and localized activations to enable richer signal modeling. This subsection
introduces representative activation functions (namely, SIREN [29], Sinc [30], HOSC [31],
FINER [32], and WIRE [33]), each designed to control either the spectral domain, the
spatial domain, or both.
(a) SIREN: Sinusoidal representation networks
SIREN [29] adopts a sinusoidal activation of the form
$$\sigma(x) = \sin(\omega x),$$
where $x$ denotes the input coordinate and $\omega$ is a scaling factor that controls
the frequency range. This periodic non-linearity enables the network to effectively
represent high-frequency signals.
(b) FINER: Variable-periodic activation for flexible spectral bias tuning
FINER [32] addresses the frequency range limitation of fixed-periodic activations by employing
a variable-periodic function:
$$\sigma(x) = \sin\big(\omega\,(|x| + 1)\,x\big),$$
where $x$ denotes the input coordinate. Unlike SIREN’s fixed-periodic activation $\sin(\omega x)$,
this formulation introduces frequency variation that depends on the input magnitude,
thereby enabling coverage of higher-frequency components. As a result, FINER provides
a simple and architecture-agnostic means to mitigate spectral bias [34].
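To make the contrast with SIREN concrete, the following NumPy sketch evaluates both activations and the local frequency of the variable-periodic one. It assumes the FINER form $\sin(\omega(|x|+1)x)$; the function names and the default `omega=30.0` are illustrative choices rather than values taken from this survey.

```python
import numpy as np

def siren_activation(x, omega=30.0):
    """Fixed-periodic activation sin(omega * x)."""
    return np.sin(omega * np.asarray(x, dtype=float))

def finer_activation(x, omega=30.0):
    """Variable-periodic activation sin(omega * (|x| + 1) * x) (assumed form)."""
    x = np.asarray(x, dtype=float)
    return np.sin(omega * (np.abs(x) + 1.0) * x)

def finer_local_frequency(x, omega=30.0):
    """Derivative of the phase omega * (|x| + 1) * x, i.e. omega * (2|x| + 1)."""
    return omega * (2.0 * np.abs(np.asarray(x, dtype=float)) + 1.0)

# SIREN oscillates at the constant rate omega everywhere, whereas the
# variable-periodic phase speeds up as |x| grows.
xs = np.array([0.0, 0.5, 1.0])
local_freqs = finer_local_frequency(xs)  # 30, 60, 90 for omega = 30
```

Under this assumption, larger pre-activation magnitudes reach proportionally higher frequencies, which is the mechanism the text credits for the wider frequency coverage.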
(c) HOSC: Preserving sharp features with tunable periodic activations
HOSC [31] modifies the sine activation by introducing a sharpness parameter:
$$\sigma(x) = \tanh(\beta \sin x),$$
where $x$ denotes the input coordinate and $\beta$ is a tunable parameter controlling
the sharpness of oscillations. As $\beta$ increases, $\tanh(\beta \sin x)$ approaches
a square wave, i.e., $\text{sign}(\sin x)$. Smaller values of $\beta$ result in smooth
oscillations, while larger values produce sharper transitions. This tunability enables
the model to flexibly adapt between smooth and piecewise-constant representations,
making HOSC particularly effective for tasks that require sharp boundary preservation.
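The limiting behavior described above can be checked numerically. The sketch below measures how far $\tanh(\beta \sin x)$ is from the square wave $\text{sign}(\sin x)$ for a few values of $\beta$; the helper names, the evaluation grid, and the chosen $\beta$ values are illustrative assumptions.

```python
import numpy as np

def hosc_activation(x, beta=8.0):
    """HOSC activation tanh(beta * sin(x))."""
    return np.tanh(beta * np.sin(np.asarray(x, dtype=float)))

def square_wave_gap(beta, num_points=200):
    """Largest deviation from sign(sin(x)) on a grid that avoids the zero crossings."""
    x = np.linspace(0.1, np.pi - 0.1, num_points)  # sin(x) > 0 on this interval
    return float(np.max(np.abs(hosc_activation(x, beta) - np.sign(np.sin(x)))))

# The gap shrinks as beta grows, i.e. the activation sharpens toward a square wave.
gaps = [square_wave_gap(beta) for beta in (1.0, 8.0, 64.0)]
```

The grid deliberately stays away from the zero crossings of $\sin x$, where the two functions differ for any finite $\beta$.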
(d) Sinc: Bandlimited activation with ideal frequency selectivity
The sinc activation is defined as
$$\text{sinc}(x) = \frac{\sin(\pi x)}{\pi x},$$
where $x$ denotes the input coordinate and $\pi$ determines the normalized cutoff
frequency of the ideal low-pass filter. This formulation corresponds to the impulse
response of an ideal low-pass filter. In the frequency domain, the sinc function transforms
into a rectangular pulse, providing ideal frequency selectivity by uniformly passing
components within a specific bandwidth while rejecting those outside. This property
is advantageous for reconstructing band-limited signals and suppressing aliasing artifacts
in INRs. However, sinc has infinite support in the spatial domain, which makes it
less suitable for modeling spatially localized structures and can introduce ringing
artifacts when truncated.
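A short numerical illustration of the infinite-support issue, using NumPy's normalized `np.sinc`, which computes $\sin(\pi x)/(\pi x)$. The wrapper name and the sample points are illustrative assumptions.

```python
import numpy as np

def sinc_activation(x):
    """Normalized sinc, sin(pi x) / (pi x), equal to 1 at x = 0."""
    return np.sinc(np.asarray(x, dtype=float))

# Zeros fall on the nonzero integers ...
zeros = sinc_activation(np.array([-2.0, 1.0, 3.0]))
# ... while the peaks between them decay only like 1 / (pi |x|).
tail = np.abs(sinc_activation(np.array([10.5, 100.5])))
```

Because the tails fall off only as $1/|x|$, a response far from the origin is still non-negligible, which is why truncating it produces the ringing mentioned above.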
(e) WIRE: Wavelet-based activation for space-frequency localization
WIRE [33] employs a Gabor wavelet as its activation function:
$$\sigma(x) = \cos(\omega_0 x)\, e^{-(s_0 x)^2},$$
where $x$ denotes the input coordinate, $\omega_0$ is the frequency parameter of the
cosine carrier, and $s_0$ controls the Gaussian envelope that provides spatial localization.
The cosine term enables the modeling of high-frequency details, while the Gaussian
attenuation confines the response spatially and mitigates abrupt truncation. This
combination reduces ringing artifacts, enhances robustness to weight initialization,
and improves performance in real-world settings by effectively capturing both localized
and high-frequency patterns.
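The space-frequency localization can be sketched with a real-valued Gabor function, i.e. a cosine carrier multiplied by a Gaussian envelope as described above. This is a simplification: to our knowledge the published WIRE activation is complex-valued, and the function name and the defaults `omega0=10.0`, `s0=10.0` are assumptions made for illustration.

```python
import numpy as np

def gabor_activation(x, omega0=10.0, s0=10.0):
    """Real-valued Gabor wavelet: cos(omega0 * x) * exp(-(s0 * x)^2)."""
    x = np.asarray(x, dtype=float)
    return np.cos(omega0 * x) * np.exp(-(s0 * x) ** 2)

x = np.linspace(-1.0, 1.0, 2001)
response = gabor_activation(x)

# Fraction of the total absolute response that lies within |x| <= 0.3.
inside = np.abs(x) <= 0.3
concentration = np.abs(response[inside]).sum() / np.abs(response).sum()
```

With these settings nearly all of the response is concentrated near the origin, in contrast to the slowly decaying tails of sinc.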
Recently, beyond categories (a)–(e), there have been attempts to leverage concepts
from classical signal processing as well as to apply diverse activation functions
from deep learning. For example, FLAIR [35] is designed under the theoretical constraint of the time–frequency uncertainty principle,
enabling the model to learn both temporal localization and frequency selectivity
jointly. Unlike the periodic function–based approaches in (a) and (b),
this method focuses on modeling only the essential hidden features required for representation,
similar to (d) and (e). As a result, it achieves more efficient and sparse representations.
2.2. Positional Encodings for Frequency and Spatial Localization
While activation functions determine the network’s ability to model non-linear signals,
the way input coordinates are encoded before entering the network plays an equally
crucial role in shaping spectral and spatial inductive biases. Positional encodings
aim to embed the input coordinates $\mathbf{x}$ into a higher-dimensional space $\gamma(\mathbf{x})$
to enrich the network’s capacity to capture high-frequency content and spatial structures.
(a) Basic Fourier encoding: The simplest form of positional encoding applies a single frequency sinusoid:
This encoding projects the input onto two oscillatory bases, enabling the model to
incorporate periodicity. However, it lacks the ability to capture a broad range of
frequencies, which limits its expressiveness in complex signal reconstruction.
(b) Fixed-frequency positional encoding: To address the limited frequency coverage of the Basic Fourier Encoding, a multiscale
extension is often adopted:
where $x$ denotes the input coordinate, $\omega_j$ is the $j$-th element of a predefined
frequency set, $m$ is the embedding dimension, and $^\top$ indicates that the resulting
vector is represented as a column vector. This encoding introduces a spectrum of fixed
frequencies, thereby enhancing the model’s ability to represent both fine- and coarse-grained
spatial details compared to the Basic Fourier Encoding. Such encodings have been widely
adopted in tasks involving spatially dense predictions, such as NeRF [14] and image restoration [36].
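A minimal sketch of a multi-frequency sinusoidal encoding follows. The function name, the NeRF-style frequency set $\omega_j = 2^j \pi$, and the output layout (all sine terms followed by all cosine terms) are assumptions for illustration; the survey's exact ordering and frequency set may differ.

```python
import numpy as np

def fixed_frequency_encoding(x, frequencies):
    """Encode scalar coordinates x (N values) with sin/cos at each given frequency.

    Returns an array of shape (N, 2 * m), where m = len(frequencies).
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)                 # (N, 1)
    freqs = np.asarray(frequencies, dtype=float).reshape(1, -1)   # (1, m)
    angles = x * freqs                                            # (N, m)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

# Assumed frequency set: pi, 2*pi, 4*pi, 8*pi.
frequencies = np.pi * 2.0 ** np.arange(4)
features = fixed_frequency_encoding([0.0, 0.25, 0.5], frequencies)  # shape (3, 8)

# A single frequency recovers a two-component encoding like the basic one in (a).
basic = fixed_frequency_encoding([0.5], [1.0])                      # shape (1, 2)
```

The embedding width grows linearly with the number of frequencies, which is the source of the extra compute and memory cost of this encoding relative to the basic one.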
(c) Random Fourier features (RFF): Random Fourier Features [37] further extend this idea by injecting stochasticity into the frequency components:
$$\gamma(x) = [\cos(2\pi B x), \sin(2\pi B x)]^\top,$$
where $x$ denotes the input coordinate, $B$ is a Gaussian random matrix with entries
sampled from a normal distribution $\mathcal{N}(0, \omega^2)$, $\omega$ controls the
variance of frequency sampling, and $^\top$ indicates a column vector representation.
This stochastic basis sampling provides a Monte Carlo approximation of shift-invariant
kernels, thereby improving generalization to unseen signals. RFFs are particularly
effective in applications requiring robustness and uncertainty modeling [37].
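The Monte Carlo kernel approximation can be illustrated as follows. The sketch assumes the common mapping $[\cos(2\pi Bx), \sin(2\pi Bx)]$ with entries of $B$ drawn from $\mathcal{N}(0, \omega^2)$; under that assumption the normalized inner product of two feature vectors approximates the Gaussian kernel $\exp(-2\pi^2\omega^2\|x-y\|^2)$. The function name, seed, and sizes are illustrative.

```python
import numpy as np

def random_fourier_features(x, B):
    """Map coordinates x of shape (N, d) to features of shape (N, 2 * m)."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    projection = 2.0 * np.pi * x @ B.T                        # (N, m)
    return np.concatenate([np.cos(projection), np.sin(projection)], axis=1)

rng = np.random.default_rng(0)
omega, m, d = 1.0, 4096, 2
B = rng.normal(loc=0.0, scale=omega, size=(m, d))             # entries ~ N(0, omega^2)

x = np.array([0.3, 0.2])
y = np.array([0.3, 0.3])

# The inner product of the features averages cos(2 pi b . (x - y)) over the m draws.
approx = (random_fourier_features(x, B) @ random_fourier_features(y, B).T).item() / m
exact = np.exp(-2.0 * np.pi**2 * omega**2 * np.sum((x - y) ** 2))
```

Increasing `omega` narrows the implied kernel, so the encoding emphasizes higher frequencies; the approximation error shrinks roughly as $1/\sqrt{m}$.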
(d) Wavelet positional encoding (WPE): While the previous encodings are global in nature, Wavelet Positional Encoding [38] introduces spatial locality by combining sinusoidal basis functions with Gaussian envelopes:
where $p$ denotes the input coordinate, $w_c^i$ is the center of the $i$-th Gaussian
window, $w_s^i$ is its scale parameter, and $M$ is the number of frequency pairs.
Each pair of sine and cosine terms is modulated by a Gaussian envelope, producing
localized frequency bases. This localization enables the model to capture spatially
compact and high-frequency structures. The Gaussian attenuation further suppresses
ringing artifacts and sharp transitions, making WPE especially suitable for real-world
signals with non-uniform frequency distributions.
While different positional encoding schemes enrich the representation capacity of
INRs, their computational and memory trade-offs also need to be considered. (a) Basic
Fourier Encoding requires minimal computation and memory as it applies a single sinusoidal
mapping, but it remains limited in expressiveness. (b) Fixed-Frequency Positional
Encoding extends this by projecting inputs onto multiple predefined frequencies, which
improves representation power but increases embedding dimensionality, leading to higher
computational cost and memory usage. (c) Random Fourier Features (RFF) introduce stochastic
sampling of frequency bases, enhancing generalization and robustness at the expense
of additional overhead from random matrix multiplications. (d) Wavelet Positional
Encoding (WPE) further incorporates Gaussian envelopes to provide spatial locality
and compact high-frequency modeling, yet this requires more complex operations and
additional storage. Overall, these encoding strategies illustrate a fundamental trade-off
between representation expressiveness, computation time, and memory efficiency, which
should be carefully balanced depending on the target application.
2.3. Fourier-based Reparameterization
Existing INR methods commonly suffer from the spectral bias problem [27], where networks tend to learn low-frequency components first while struggling to
capture high-frequency details. A multilayer perceptron (MLP), which underlies most
INR formulations, can be expressed as
$$\mathbf{y}^{(n)} = \sigma\big(\mathbf{W}^{(n)} \mathbf{y}^{(n-1)} + \mathbf{b}^{(n)}\big),$$
where $\mathbf{y}^{(n-1)}$ denotes the output from the previous layer, $\mathbf{W}^{(n)}$
and $\mathbf{b}^{(n)}$ represent the learnable weight matrix and bias vector of the
$n$-th layer, respectively, and $\sigma(\cdot)$ is the nonlinear activation function.
To mitigate spectral bias, two representative strategies have been studied. In the
activation-based approach, the design of $\sigma(\cdot)$ is modified so that the network
can better capture high-frequency signals. In the PE-based approach, the input $\mathbf{y}^{(n-1)}$
is transformed into a higher-dimensional representation, thereby reducing the low-frequency
preference.
Beyond these strategies, Shi et al. [46] proposed a Fourier-based reparameterization of the weight matrix. Instead of learning
$\mathbf{W}^{(n)}$ directly, it is expressed as the product of two components: a trainable
coefficient matrix $\mathbf{A}^{(n)} \in \mathbb{R}^{d_n \times M}$ and a fixed set
of $M$ Fourier bases $\mathbf{B}^{(n)} \in \mathbb{R}^{M \times d_{n-1}}$. Each Fourier
basis is defined by varying frequency $\omega$ and phase $\phi$ of a cosine function,
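such that the $(i, j)$-th element of $\mathbf{B}^{(n)}$ is
$$B^{(n)}_{ij} = \cos(\omega_i z_j + \phi_i),$$
where $\mathbf{z} = \{z_j\}_{j=1}^{d_{n-1}}$ denotes the sampling positions. The layer
output then becomes
$$\mathbf{y}^{(n)} = \sigma\big(\mathbf{A}^{(n)} \mathbf{B}^{(n)} \mathbf{y}^{(n-1)} + \mathbf{b}^{(n)}\big).$$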
By constraining $\mathbf{W}^{(n)}$ to lie in the span of Fourier components, this
formulation embeds frequency priors directly into the parameterization and offers
an alternative way of alleviating spectral bias, complementing activation- and PE-based
designs.
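A minimal sketch of one reparameterized layer, assuming the factorization $\mathbf{W}^{(n)} = \mathbf{A}^{(n)}\mathbf{B}^{(n)}$ with cosine bases sampled at positions $\mathbf{z}$. The integer frequencies, zero phases, sampling interval $[-\pi, \pi]$, sine activation, and all function names are illustrative assumptions, not the settings of [46].

```python
import numpy as np

def fourier_bases(omegas, phis, z):
    """Fixed basis matrix with B[i, j] = cos(omegas[i] * z[j] + phis[i]); shape (M, d_in)."""
    return np.cos(np.outer(omegas, z) + phis[:, None])

def reparam_layer(y_prev, A, B, b, activation=np.sin):
    """Layer output sigma(A @ B @ y_prev + b); only A and b would be trained."""
    W = A @ B                                  # (d_out, d_in), lies in the span of the bases
    return activation(W @ y_prev + b)

rng = np.random.default_rng(0)
d_in, d_out, M = 8, 4, 6

z = np.linspace(-np.pi, np.pi, d_in)           # assumed sampling positions
omegas = np.arange(1, M + 1, dtype=float)      # assumed frequencies 1..M
phis = np.zeros(M)                             # assumed zero phases

B = fourier_bases(omegas, phis, z)             # fixed, not trained
A = rng.normal(size=(d_out, M)) / np.sqrt(M)   # trainable coefficients
b = np.zeros(d_out)

y = reparam_layer(rng.normal(size=d_in), A, B, b)
```

Since $\mathbf{B}^{(n)}$ is fixed, gradients flow only into the coefficients $\mathbf{A}^{(n)}$ and the bias, so each row of the effective weight matrix is always a combination of the chosen Fourier bases.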
2.4. A Combined Strategy for Non-linear Compactness
Beyond the separate use of activation functions and positional encodings, recent advancements
have led to combined strategies that unify both spectral and spatial inductive biases within a single functional
design. TRIDENT [39] exemplifies this integrated approach by merging frequency-aware encoding, spatial
localization, and non-linear transformation into a cohesive representation. Rather
than treating sinusoidal encoding and activation as distinct components, TRIDENT incorporates
both through a radial basis function (RBF)-like formulation with exponential non-linearity,
enabling compact and expressive modeling within implicit neural representations (INRs).
At its core, TRIDENT encodes the input coordinate $x$ using a combination of sinusoidal
basis functions modulated by a Gaussian envelope. The function is defined as:
where $s_0$ is a scaling parameter controlling spatial concentration, and $\sigma$
determines the geometric progression of frequency components. This formulation jointly
models local and global structures while maintaining numerical stability through soft
spatial weighting.
The design of TRIDENT induces three core properties:
(a) Order compactness: By embedding sinusoidal terms within an exponential function, TRIDENT implicitly
encodes high-order polynomial behaviors via power-series expansion. This allows rich
structural details to be modeled without explicitly deep or wide networks.
(b) Frequency compactness: The inclusion of multiple harmonics, spaced in a log-linear manner, allows the model
to efficiently capture low- and high-frequency components. The dual use of $\cos$
and $\sin$ further ensures balanced representation of even and odd frequency modes.
(c) Spatial compactness: The Gaussian envelope localizes the response of the function, concentrating representational
energy within a compact region of the input space. This spatial attenuation mitigates
ringing artifacts and improves generalization in local-detail-sensitive tasks.
Together, these characteristics form the non-linear trilogy of TRIDENT. As a combined strategy, it serves as an effective drop-in replacement
for conventional PE + activation stacks, enhancing the expressiveness, compactness,
and task generalization of architectures for INRs.
2.5. Implicit Neural Conditioning with Prior Knowledge Embeddings
Implicit Neural Conditioning (INCODE) proposes a conditional architecture for INRs
that stabilizes training and enhances expressiveness by embedding prior knowledge
into the network’s activation modulation. At the core of INCODE [40] lies a composer network, which replaces fixed sinusoidal activations with a generalized adaptive form:
$$\sigma(x) = a \sin(b\,\omega x + c) + d,$$
where $\omega$ is the base frequency and the parameters $(a, b, c, d)$ are dynamically predicted by a harmonizer network,
conditioned on a latent embedding extracted from a pre-trained feature encoder. This
design draws motivation from the observation that sinusoidal activations, while effective
for capturing fine details, are highly sensitive to initialization and task-dependent
frequency priors. By leveraging pre-trained representations, such as a ResNet encoder,
as a source of learned signal statistics, the harmonizer provides an informed initialization
strategy that guides the activation’s shape and phase. This approach effectively decouples
the network from manual hyperparameter tuning and stabilizes convergence, particularly
in early training. INCODE has shown robust performance across diverse modalities,
including image, audio, and 3D scene domains, and demonstrates strong generalization
in tasks such as super-resolution, inpainting, and denoising. It outperforms conventional
INRs with faster and more stable training dynamics.
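The role of the four modulation parameters can be sketched with a hand-parameterized adaptive sinusoid of the assumed form $a \sin(b\,\omega x + c) + d$. The harmonizer network and the pre-trained encoder are not modeled here: the values of $(a, b, c, d)$ are passed in directly, and the function name and the default `omega=30.0` are illustrative assumptions.

```python
import numpy as np

def incode_activation(x, a, b, c, d, omega=30.0):
    """Adaptive sinusoid a * sin(b * omega * x + c) + d (assumed form)."""
    return a * np.sin(b * omega * np.asarray(x, dtype=float) + c) + d

x = np.linspace(-1.0, 1.0, 5)

# a = b = 1 and c = d = 0 recover a fixed sinusoidal activation.
plain = incode_activation(x, a=1.0, b=1.0, c=0.0, d=0.0)

# a scales the amplitude, b the frequency, c shifts the phase, d offsets the output.
modulated = incode_activation(x, a=0.5, b=2.0, c=np.pi / 2, d=0.1)
```

In the full method these four values would come from the harmonizer rather than being fixed by hand, which is what removes the need to tune them manually.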
2.6. Function Decomposition with Learnable Operators
The Kolmogorov–Arnold Network (KAN) draws inspiration from the Kolmogorov–Arnold representation
theorem, which states that any multivariate continuous function on a bounded domain can be
written as a finite composition of continuous univariate functions and addition. KAN [41] operationalizes this idea by introducing learnable univariate functions along the network edges, replacing conventional scalar weights. Specifically, each
edge implements a spline-based transformation $f_w(x)$, where $f_w$ is a parameterized
function rather than a fixed weight. Neurons in a KAN simply perform summation without
additional non-linearities, as the non-linearity is embedded in the edge functions
themselves. This architecture grants precise control over inductive biases, enabling
efficient approximation of complex, high-dimensional mappings with fewer parameters.
KANs [42] have demonstrated superior performance and representation efficiency on tasks such
as PDE solving, where they outperform traditional MLPs of similar or even larger size.
Their compositional design not only alleviates the curse of dimensionality but also
enhances interpretability, making them particularly well-suited for applications requiring
structured generalization and robust approximation behavior [43].
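The edge-function idea can be sketched with a single layer in which every edge carries its own univariate function and each output node only sums its incoming edges. For brevity this sketch swaps the spline parameterization for piecewise-linear interpolation via `np.interp`; the class name, knot range, and initialization are hypothetical.

```python
import numpy as np

class PiecewiseLinearKANLayer:
    """One KAN-style layer: a learnable univariate function per edge, summed at each node."""

    def __init__(self, d_in, d_out, num_knots=8, x_min=-1.0, x_max=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.knots = np.linspace(x_min, x_max, num_knots)
        # values[q, p] holds the knot values of the edge function from input p to output q.
        self.values = rng.normal(scale=0.1, size=(d_out, d_in, num_knots))

    def __call__(self, x):
        x = np.asarray(x, dtype=float)
        d_out, d_in, _ = self.values.shape
        out = np.zeros(d_out)
        for q in range(d_out):
            for p in range(d_in):
                # Edge function applied to one input; no extra non-linearity at the node.
                out[q] += np.interp(x[p], self.knots, self.values[q, p])
        return out

layer = PiecewiseLinearKANLayer(d_in=3, d_out=2)
y = layer([0.2, -0.5, 0.1])
```

Training would adjust `values` instead of scalar weights; inspecting a single `values[q, p]` curve shows how one input influences one output, which is the interpretability benefit noted above.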
3. CHALLENGES
Implicit Neural Representations (INRs) have demonstrated remarkable advantages, offering
full differentiability, smoothness, compactness, and adaptability to arbitrary resolution
in representing data. These properties mean that an INR models a signal as a continuous,
differentiable function (enabling gradient-based optimization and integration into
physics-inspired tasks), stores information efficiently in network weights rather
than dense grids, and can be queried at any coordinate resolution. In principle, such
traits allow INRs to capture fine-grained details without huge memory requirements
and to generalize beyond fixed grids. However, realizing these ideals in practice
presents significant open challenges. Key desired properties often conflict with one
another, and current models of INRs face limitations that impact their performance
and generalization capability on complex real-world signals.
3.1. Spectral Bias
One fundamental challenge is the spectral bias inherent in standard MLP-based INRs,
which causes a predisposition toward learning low-frequency (smooth) components of
signals at the expense of high-frequency details. Networks with conventional activations
like ReLU or $\tanh$ struggle to faithfully represent signals with rich high-frequency
content and fine details, instead favoring coarse approximations. This bias is problematic
because many signals (textures in images, sharp edges, high-pitch variations in audio,
etc.) contain critical high-frequency information. As a result, INRs with spectral
bias tend to produce overly smooth reconstructions that miss small structures or rapid
variations. The loss of detail directly degrades performance in tasks such as image
super-resolution [1], detailed 3D shape modeling, or audio synthesis [44]. Moreover, this bias can hinder generalization: a model that only learns low-frequency
structure may not adapt well when finer-scale patterns are required. Overcoming spectral
bias is thus crucial for improving INRs’ accuracy and their ability to generalize
to signals with diverse frequency content.
3.2. Mitigating Spectral Bias: Limitations and Trade-offs
Addressing this low-frequency bias and achieving faithful high-frequency representation
remain largely unsolved problems. Recent research has made progress by introducing
specialized activation functions and encoding schemes to expand the frequency response
of INRs. For instance, sinusoidal activations (as in SIREN) and positional encoding
with high-frequency Fourier features have been used to enable the network to learn
more high-frequency content than a vanilla MLP. These approaches indeed mitigate the
bias, allowing INRs to capture finer details than before. However, significant limitations
persist. Even SIREN, which leverages a periodic activation, can struggle with very
complex or higher-frequency details when those exceed the single-scale periodic basis
it provides. Many enhanced INR models require carefully chosen hyperparameters
or initialization schemes to balance frequency components, and they may still exhibit
trade-offs between smoothness and detail. Increasing a network’s capacity for high
frequencies (through deeper networks, Gabor wavelet activations, or extreme positional
encodings) can introduce challenges like higher computational cost and risk of overfitting
to noise. In practice, models designed to capture very fine details sometimes become
overly sensitive to minor signal variations, which harms their generalization to new
or noisier inputs. Thus, current solutions only partially address the high-frequency
challenge, and designing INRs that robustly represent both low- and high-frequency
content without adverse side-effects remains an active area of research.
3.3. Balancing Compactness and Expressiveness
Another open challenge lies in balancing the compactness and expressiveness of INRs.
A hallmark of implicit representations is their memory efficiency: complex signals
are encoded in relatively few neural parameters, with memory scaling primarily according
to model size rather than output resolution. This compactness is crucial for scalability
to high-dimensional data [74]. However, there is an inherent tension between maintaining compactness and ensuring
sufficient representational capacity to capture highly detailed or large-scale structures.
In theory, the memory usage of INRs grows with the complexity of the represented function
rather than directly with output resolution, which suggests excellent scalability.
In practice, however, representing extremely high-resolution or highly complex signals
often requires larger networks or dense sampling during training, thereby diminishing
the efficiency advantage. For example, to capture fine textures or geometric details,
one may increase the network width or depth, or employ extensive multi-frequency encodings,
which increases both the parameter count and the training cost. As a result, scalability
remains a bottleneck: many INR methods struggle or become impractically slow when
extended to ultra-high-resolution images or fine-grained 3D geometries.
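The scaling argument above, that memory follows model size and not output resolution, can be checked with simple arithmetic. The MLP shape below (2 inputs, 4 hidden layers of width 256, 3 outputs) and the helper name are hypothetical choices for illustration.

```python
def mlp_parameter_count(d_in, hidden, depth, d_out):
    """Weights plus biases of an MLP with `depth` hidden layers of width `hidden`."""
    sizes = [d_in] + [hidden] * depth + [d_out]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

# Coordinate MLP mapping (x, y) to (r, g, b): 198,915 parameters.
inr_params = mlp_parameter_count(d_in=2, hidden=256, depth=4, d_out=3)

# Stored values of a dense RGB grid at two resolutions.
grid_512 = 512 * 512 * 3        # 786,432 values
grid_4096 = 4096 * 4096 * 3     # 50,331,648 values
```

The parameter count is the same at both resolutions, while the dense grid grows 64-fold. Whether a network of this size can actually fit a 4096 x 4096 image is a separate question, and that is the tension this subsection describes.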
The key challenge, therefore, is to retain a compact representation while scaling
to these complexities. Recent approaches have explored spatially adaptive designs,
such as local subnetworks and multi-grid structures [48], which allocate higher representational capacity only where needed. However, integrating
such spatial adaptivity while maintaining full differentiability and architectural
simplicity remains non-trivial.
3.4. Resolution Adaptability
The adaptability to resolution (or resolution-independence) of INRs is a double-edged
sword. On one hand, because INRs are defined as continuous functions, a single model
can be queried at any resolution, offering built-in super-resolution and continuous
zoom capabilities. This is a major advantage over grid-based representations and contributes
to the generalization of INRs beyond fixed discretizations.
On the other hand, guaranteeing consistent fidelity across different query resolutions
is challenging. If INRs are trained on data at a certain scale or sampling density,
querying them at a much finer resolution might reveal interpolation artifacts or missing
high-frequency detail that the models never learned. The models might either over-smooth
those upsampled regions (due to spectral bias) or, if forced to fit every training
sample exactly, they could reproduce high-frequency noise or aliasing when queried
off-grid. Current training regimes for INRs often implicitly assume a certain target
resolution or distribution of sample points; using these models far outside this range
can lead to degraded quality. Techniques like multi-scale supervision and anti-aliasing
filters are being explored to ensure that the continuous representations remain faithful
when extrapolating to new scales.
Yet, achieving truly resolution-robust representations with INRs is still an open
issue. Recent advanced models (e.g., FINER networks with dynamic frequency scaling)
explicitly attempt to adapt to varying levels of detail, yielding more stable results
across scales. Nonetheless, a general solution for making INRs reliably handle arbitrary
resolution queries without retraining or quality loss has not been fully established.
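Querying one continuous model at several resolutions, including a non-integer scale, amounts to changing the coordinate grid only. In the sketch below a closed-form function stands in for a trained INR; `make_grid`, `toy_inr`, the $[-1, 1]$ coordinate range, and the scale factor 4.7 are illustrative assumptions.

```python
import numpy as np

def make_grid(height, width):
    """Pixel-center-free coordinate grid on [-1, 1]^2, returned as (height * width, 2)."""
    ys = np.linspace(-1.0, 1.0, height)
    xs = np.linspace(-1.0, 1.0, width)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([xx, yy], axis=-1).reshape(-1, 2)

def toy_inr(coords):
    """Stand-in for a trained INR: any function of continuous coordinates."""
    return np.sin(3.0 * coords[:, 0]) * np.cos(2.0 * coords[:, 1])

# Resolution used for "training".
low = toy_inr(make_grid(32, 32)).reshape(32, 32)

# Arbitrary-scale query: 32 * 4.7 rounds to 150 samples per side.
scale = 4.7
side = int(round(32 * scale))
high = toy_inr(make_grid(side, side)).reshape(side, side)
```

The stand-in function is exact at every scale; a trained network is only constrained at its training coordinates, so the finer grid is where over-smoothing or aliasing would become visible.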
5. Applications: Basic Guideline
5.1. Overview
INRs provide a powerful alternative to traditional representations across a wide range
of tasks. Their ability to represent signals as continuous functions enables applications
that require flexible resolution, memory efficiency, and smooth signal interpolation.
5.2. Detailed Examples
Signal representation: Implicit neural representations can serve as a flexible paradigm for modeling continuous
signals across different domains. As illustrated in Fig. 2, one canonical task is 2D image fitting [24],
where INRs directly regress pixel intensities of natural images, providing a fundamental
benchmark to assess their ability to approximate continuous functions in low-level
vision. Another important example is 3D occupancy prediction [26], where INRs capture
volumetric occupancy fields that encode the geometry of objects,
highlighting their capacity to model complex spatial structures in a compact and continuous
form.
Image restoration: INRs can be applied to a wide range of low-level downstream tasks, as shown in Fig. 3. The top row illustrates image denoising [3],
where INRs remove noise from natural images and reconstruct clean signals without
relying on handcrafted filtering priors. The second row shows CT reconstruction [47]
from limited projections, in which INRs model the underlying continuous attenuation
field, enabling faithful reconstructions even from sparse-view measurements. The third
row depicts $4\times$ super-resolution (SR), where INRs upscale low-resolution inputs to recover fine-grained details. Thanks
to their capability for continuous representation, INRs naturally extend beyond fixed
integer ratios and enable arbitrary-scale SR (e.g., $\times 4.7$, $\times 8.6$, or $\times 11.4$) [1], underscoring their flexibility compared to discrete upsampling approaches.
Fig. 3. Image restoration tasks using implicit neural representations (results from
WIRE [33]).
Neural rendering: Recent advances in Neural Radiance Fields (NeRF) [14] have sparked tremendous research activity in neural rendering. Importantly, Implicit
Neural Representations (INRs) have been adopted within NeRF’s MLP backbone, enabling
richer function approximation compared to conventional ReLU activations. Notably,
INR models with carefully designed non-linear activations can surpass traditional
ReLU + positional encoding (P.E.) [36] baselines, as demonstrated in Fig. 4. Furthermore, Fig. 5 shows that such activation-based INRs exhibit stronger robustness when training views
are reduced from 100 to as few as 25, significantly outperforming P.E.-based NeRF
formulations under sparse supervision.
Fig. 4. Qualitative and quantitative results on the Lego dataset. INR-based activations
(e.g., WIRE, SIREN) outperform ReLU + positional encoding both visually and in PSNR
(reported in WIRE [33]).
Fig. 5. Qualitative results with varying numbers of training views on the Drums dataset
(results from WIRE [33]).
5.3. Discussion and Future Work
By leveraging the inherent compactness of implicit neural representations (INRs),
these models can serve as competitive alternatives to transformer-based [70] or CNN-based [71] architectures in low-level vision tasks, while being directly trainable and deployable in real-world
domains. Generalised Implicit Neural Representations [72] have already shown potential in handling non-Euclidean domains, such as generating
texture signals via Gray-Scott reaction-diffusion simulations on the Stanford bunny
mesh, modeling protein solvent-excluded surface graphs, and capturing social dynamics
in the US Election through county-level Facebook Social Connectedness networks. We
expect future research to further exploit such generalization, paving the way for
practical and domain-adaptive INR applications.
Furthermore, INR research should not remain limited to per-scene optimization but
expand towards multi-scene generalizable INRs, which would ensure broader applicability
beyond synthetic benchmarks. In parallel, efficiency and compression perspectives
open another promising direction. For instance, SINR [73] introduces a sparsity-driven compression framework that leverages high-dimensional
dictionary-based sparse codes, achieving substantial reductions in storage requirements
while preserving high-quality decoding across diverse modalities. Complementarily,
the bit-plane decomposition approach [74] targets the digital precision bottleneck by predicting bit-planes, enabling lossless
representation even for high bit-depth signals and facilitating faster convergence
with constant model size. Together, these lines of research highlight the importance
of advancing INR efficiency and compression to enable practical deployment across
large-scale, real-world scenarios.