PVeRA

Probabilistic Vector-Based Random Matrix Adaptation

WACV 2026

1 MICS Laboratory, CentraleSupélec, Université Paris-Saclay
2 IHU PRISM, National Center for Precision Medicine in Oncology, Gustave Roussy
3 Institute of Computer Sciences, CONICET, Universidad de Buenos Aires

TL;DR

We propose PVeRA, a probabilistic adapter that learns a distribution over weight adaptations rather than a fixed one. Built on top of VeRA's frozen random matrices, PVeRA uses a reparameterization trick to sample latent adaptations at each forward pass, adding calibrated uncertainty estimates at virtually no extra parameter cost. On the 19-dataset VTAB-1k benchmark with a DINOv2 backbone, PVeRA outperforms VeRA and six other adapters (avg. 71.4% vs. 69.9%), stays well-calibrated, enables confidence interval estimation, and supports out-of-distribution detection.

Hugging Face integration

PVeRA is integrated into the 🤗 PEFT library. Install PEFT and use PveraConfig as a drop-in replacement for other adapter configs. See the full documentation for all configuration options, including pvera_dropout, target_modules, layers_to_transform, and sample_at_inference.

from transformers import AutoModel
from peft import PveraConfig, get_peft_model

base_model = AutoModel.from_pretrained("facebook/dinov2-base")

config = PveraConfig(r=256, target_modules=["query", "value"], pvera_dropout=0.0, sample_at_inference=False)
model = get_peft_model(base_model, config)

We release 19 pretrained PVeRA adapters based on DINOv2-B, one per VTAB-1k dataset. All adapters are available in the 🤗 Hugging Face collection.

from transformers import AutoModel
from peft import PeftModel

# Available datasets: caltech101, cifar, dtd, flowers102, pets, sun397, svhn, camelyon,
#                     eurosat, resisc45, retinopathy, clevrcount, clevrdist, dmlab,
#                     dspritesloc, dspritesori, kittidist, smallnorbazi, smallnorbele
dataset = "caltech101"

base_model = AutoModel.from_pretrained("facebook/dinov2-base")
model = PeftModel.from_pretrained(base_model, f"leoflx/pvera_dinov2_b_{dataset}")
model.eval()

Methods

Preliminaries

Let \(\mathbf{x} \in \mathbb{R}^{l \times d}\) be the input to the attention mechanism, with \(l \in \mathbb{N}_+^*\) the sequence length and \(d \in \mathbb{N}_+^*\) the feature space dimensionality. Self-attention is obtained using linear layers with weights \(\mathbf{W}_q, \mathbf{W}_k, \mathbf{W}_v \in \mathbb{R}^{d \times d}\) and biases \(\mathbf{b}_{W_q}, \mathbf{b}_{W_k}, \mathbf{b}_{W_v} \in \mathbb{R}^d\):

\[\mathbf{x}_{\{q,k,v\}} = \mathbf{x}\mathbf{W}^T_{\{q,k,v\}} + \mathbf{b}_{W_{\{q,k,v\}}}\]
\[\text{Attention}(\mathbf{x}_q, \mathbf{x}_k, \mathbf{x}_v) = \text{softmax}\!\left(\frac{\mathbf{x}_q \mathbf{x}_k^T}{\sqrt{d}}\right)\mathbf{x}_v\]
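
As a minimal sketch of the two equations above, the following NumPy snippet computes single-head self-attention on toy data. The function name `self_attention`, the toy sizes, and the random initialization are illustrative assumptions, not part of the original work.

```python
import numpy as np

def self_attention(x, W_q, W_k, W_v, b_q, b_k, b_v):
    """Single-head self-attention following the equations above."""
    x_q = x @ W_q.T + b_q                      # (l, d) queries
    x_k = x @ W_k.T + b_k                      # (l, d) keys
    x_v = x @ W_v.T + b_v                      # (l, d) values
    scores = (x_q @ x_k.T) / np.sqrt(x.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ x_v, weights

# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
l, d = 4, 8
x = rng.normal(size=(l, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
b_q, b_k, b_v = (rng.normal(size=d) for _ in range(3))

attn_out, attn_weights = self_attention(x, W_q, W_k, W_v, b_q, b_k, b_v)
```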

LoRA is applied to the query and value branches. Given a rank \(r \in \mathbb{N}_+^*\) with \(r < d\) and a scaling factor \(\alpha \in \mathbb{R}_+^*\), its only trainable parameters are \(\mathbf{A} \in \mathbb{R}^{d \times r}\) and \(\mathbf{B} \in \mathbb{R}^{r \times d}\):

\[\mathbf{x}_{\text{LoRA}\{q,v\}} = \left(\mathbf{x}\mathbf{W}^T_{\{q,v\}} + \mathbf{b}_{W_{\{q,v\}}}\right) + \frac{\alpha}{r}\!\left(\mathbf{x}\mathbf{A}_{\{q,v\}}\right)\mathbf{B}_{\{q,v\}}\]
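
The LoRA equation above can be sketched for a single branch as follows. The name `lora_forward` and the toy sizes are hypothetical; initializing \(\mathbf{B}\) to zero, so that the adapter starts as a no-op, follows the usual LoRA convention rather than anything stated on this page.

```python
import numpy as np

def lora_forward(x, W, b_W, A, B, alpha):
    """Frozen linear layer plus the low-rank LoRA update, scaled by alpha / r."""
    r = A.shape[1]
    return (x @ W.T + b_W) + (alpha / r) * ((x @ A) @ B)

# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
l, d, r, alpha = 4, 8, 2, 4.0
x = rng.normal(size=(l, d))
W = rng.normal(size=(d, d))    # frozen pretrained weight
b_W = rng.normal(size=d)       # frozen pretrained bias
A = rng.normal(size=(d, r))    # trainable
B = np.zeros((r, d))           # trainable, zero at initialization

lora_out = lora_forward(x, W, b_W, A, B, alpha)
lora_trainable = A.size + B.size   # 2 * d * r parameters per adapted branch
```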

VeRA shares frozen random matrices \(\mathbf{A}\) and \(\mathbf{B}\) across all layers. Its only trainable parameters are \(\mathbf{d} \in \mathbb{R}^r\) and \(\mathbf{b} \in \mathbb{R}^d\):

\[\mathbf{x}_{\text{VeRA}\{q,v\}} = \left(\mathbf{x}\mathbf{W}^T_{\{q,v\}} + \mathbf{b}_{W_{\{q,v\}}}\right) + \alpha\!\left(\!\left(\mathbf{x}\mathbf{A}_{\{q,v\}} \odot \mathbf{d}_{\{q,v\}}\right)\mathbf{B}_{\{q,v\}} \odot \mathbf{b}_{\{q,v\}}\right)\]
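
A minimal NumPy sketch of the VeRA equation above, for one branch. The function name `vera_forward`, the toy sizes, and the initial values of the two scaling vectors are assumptions made for illustration only.

```python
import numpy as np

def vera_forward(x, W, b_W, A, B, d_vec, b_vec, alpha):
    """Frozen linear layer plus the VeRA update: only d_vec and b_vec are trained."""
    low_rank = (x @ A) * d_vec           # (l, r), scaled element-wise by d
    update = (low_rank @ B) * b_vec      # (l, d), scaled element-wise by b
    return (x @ W.T + b_W) + alpha * update

# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
l, d, r, alpha = 4, 8, 2, 1.0
x = rng.normal(size=(l, d))
W = rng.normal(size=(d, d))     # frozen pretrained weight
b_W = rng.normal(size=d)        # frozen pretrained bias
A = rng.normal(size=(d, r))     # frozen random matrix, shared across layers
B = rng.normal(size=(r, d))     # frozen random matrix, shared across layers
d_vec = 0.1 * np.ones(r)        # trainable (illustrative initial value)
b_vec = np.zeros(d)             # trainable (illustrative initial value)

vera_out = vera_forward(x, W, b_W, A, B, d_vec, b_vec, alpha)
vera_trainable = d_vec.size + b_vec.size   # r + d parameters per adapted branch
```

With these toy sizes the adapter trains \(r + d = 10\) values per branch, against \(2dr = 32\) for LoRA at the same rank.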

Probabilistic Adaptation

Our proposed adapter, PVeRA, is a probabilistic adaptation of VeRA. \(\mathbf{A}_{\{q,v\}} \in \mathbb{R}^{d \times 2r}\) and \(\mathbf{d}_{\{q,v\}} \in \mathbb{R}^{2r}\) are used to generate \(\boldsymbol{\mu}_{\{q,v\}} \in \mathbb{R}^r\) and \(\boldsymbol{\sigma}_{\{q,v\}} \in \mathbb{R}^r\), representing the mean and standard deviation of a multivariate normal distribution. Using the reparameterization trick, we sample from the learned distribution of the latent space \(\mathbf{z}_{\{q,v\}} \sim \mathcal{N}(\boldsymbol{\mu}_{\{q,v\}}, \boldsymbol{\sigma}^2_{\{q,v\}})\):

\[\boldsymbol{\mu}_{\{q,v\}},\, \boldsymbol{\sigma}_{\{q,v\}} = \mathbf{x}\mathbf{A}_{\{q,v\}} \odot \mathbf{d}_{\{q,v\}}\]
\[\mathbf{z}_{\{q,v\}} = \boldsymbol{\epsilon} \odot \boldsymbol{\sigma}_{\{q,v\}} + \boldsymbol{\mu}_{\{q,v\}}, \quad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})\]
\[\mathbf{x}_{\text{PVeRA}\{q,v\}} = \left(\mathbf{x}\mathbf{W}^T_{\{q,v\}} + \mathbf{b}_{W_{\{q,v\}}}\right) + \alpha\!\left(\mathbf{z}_{\{q,v\}}\mathbf{B}_{\{q,v\}} \odot \mathbf{b}_{\{q,v\}}\right)\]
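
The three equations above can be sketched as one forward function. This is not the reference implementation: the name `pvera_forward`, the toy sizes, the order in which the \(2r\) outputs are split, and the choice to read the second half as a log-variance (so that \(\boldsymbol{\sigma}\) stays positive) are all assumptions.

```python
import numpy as np

def pvera_forward(x, W, b_W, A, B, d_vec, b_vec, alpha, rng=None):
    """Frozen linear layer plus a PVeRA-style update.

    If rng is given, z is sampled with the reparameterization trick;
    otherwise the mean is used (deterministic mode).
    """
    r = B.shape[0]
    h = (x @ A) * d_vec                     # (l, 2r)
    # ASSUMPTION: first half is the mean, second half is a log-variance.
    mu, log_var = h[:, :r], h[:, r:]
    if rng is None:
        z = mu
    else:
        sigma = np.exp(0.5 * log_var)
        eps = rng.standard_normal(mu.shape)  # eps ~ N(0, I)
        z = eps * sigma + mu
    return (x @ W.T + b_W) + alpha * ((z @ B) * b_vec)

# Toy sizes and values, chosen only for illustration.
init = np.random.default_rng(0)
l, d, r, alpha = 4, 8, 2, 1.0
x = init.normal(size=(l, d))
W, b_W = init.normal(size=(d, d)), init.normal(size=d)
A = init.normal(size=(d, 2 * r))            # frozen, twice as wide as in VeRA
B = init.normal(size=(r, d))                # frozen
d_vec, b_vec = 0.1 * np.ones(2 * r), np.ones(d)

deterministic = pvera_forward(x, W, b_W, A, B, d_vec, b_vec, alpha)
sample_1 = pvera_forward(x, W, b_W, A, B, d_vec, b_vec, alpha, rng=np.random.default_rng(1))
sample_2 = pvera_forward(x, W, b_W, A, B, d_vec, b_vec, alpha, rng=np.random.default_rng(2))
```

Because the noise enters only through \(\boldsymbol{\epsilon}\), gradients can flow to \(\mathbf{d}\) and \(\mathbf{b}\) through both \(\boldsymbol{\mu}\) and \(\boldsymbol{\sigma}\) in an autodiff framework.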

We use a KL divergence loss to enforce a standard normal prior on each PVeRA adapter. The KL loss scaling factor \(\beta \in \mathbb{R}_+^*\) is chosen per dataset by grid search on the validation loss:

\[\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{classification}} + \beta \sum_{\text{layer}\in\text{ViT}} \mathcal{L}_{\text{KL,layer}}\]
\[\mathcal{L}_{\text{KL,layer}} = \frac{1}{2}\sum_{i\in\{q,v\}} D_{\text{KL}}\!\left(\mathcal{N}\!\left(\boldsymbol{\mu}_{i,\text{layer}},\, \boldsymbol{\sigma}^2_{i,\text{layer}}\right) \;\Big\|\; \mathcal{N}(\mathbf{0}, \mathbf{I})\right)\]
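
For a diagonal Gaussian, the KL term above has the standard closed form \(D_{\text{KL}} = \frac{1}{2}\sum_j \left(\mu_j^2 + \sigma_j^2 - 1 - \log \sigma_j^2\right)\). The sketch below evaluates it and assembles the total loss; the function names are hypothetical, and how the per-token values are reduced (sum or mean) is left open because the page does not specify it.

```python
import numpy as np

def kl_to_standard_normal(mu, sigma):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over the last axis."""
    var = sigma ** 2
    return 0.5 * np.sum(mu ** 2 + var - 1.0 - np.log(var), axis=-1)

def layer_kl(mu_q, sigma_q, mu_v, sigma_v):
    """Per-layer term: half the sum of the query and value KL divergences."""
    return 0.5 * (kl_to_standard_normal(mu_q, sigma_q) + kl_to_standard_normal(mu_v, sigma_v))

def total_loss(classification_loss, kl_per_layer, beta):
    """Classification loss plus the beta-weighted sum of per-layer KL terms."""
    return classification_loss + beta * sum(kl_per_layer)

# A distribution that already matches the prior incurs no penalty.
kl_at_prior = kl_to_standard_normal(np.zeros(2), np.ones(2))
# Shifting one mean coordinate by 1 costs 0.5 * 1^2 = 0.5.
kl_shifted = kl_to_standard_normal(np.array([1.0, 0.0]), np.ones(2))
```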

Inference

During inference, the adaptation can be performed in a deterministic or probabilistic fashion. For deterministic inference, we use \(\mathbf{z}_{\{q,v\}} = \boldsymbol{\mu}_{\{q,v\}}\). As with VeRA, the weights can then be merged into the original model weights, resulting in no additional inference time:

\[\mathbf{W}_{\{q,v\}} \leftarrow \mathbf{W}_{\{q,v\}} + \alpha\,\mathbf{A}_{\boldsymbol{\mu}_{\{q,v\}}}\mathbf{B}_{\{q,v\}} \odot \mathbf{b}_{\{q,v\}}\]
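
The merge can be checked numerically. In the sketch below, the merged update is written out in full as \(\alpha\,(\mathbf{A}_{\boldsymbol{\mu}}\,\text{diag}(\mathbf{d}_{\boldsymbol{\mu}})\,\mathbf{B}\,\text{diag}(\mathbf{b}))^T\), so that it reproduces the deterministic forward pass under the \(\mathbf{x}\mathbf{W}^T\) convention used above. Taking the first \(r\) columns of \(\mathbf{A}\) and the first \(r\) entries of \(\mathbf{d}\) as the mean-producing half is an assumption, as are all variable names.

```python
import numpy as np

# Toy sizes and random values, chosen only for illustration.
rng = np.random.default_rng(0)
l, d, r, alpha = 4, 8, 2, 0.5
x = rng.normal(size=(l, d))
W, b_W = rng.normal(size=(d, d)), rng.normal(size=d)
A = rng.normal(size=(d, 2 * r))
B = rng.normal(size=(r, d))
d_vec, b_vec = rng.normal(size=2 * r), rng.normal(size=d)

# ASSUMPTION: the first r columns / entries produce the mean.
A_mu, d_mu = A[:, :r], d_vec[:r]

# Unmerged deterministic forward pass with z = mu.
mu = (x @ A_mu) * d_mu
unmerged = (x @ W.T + b_W) + alpha * ((mu @ B) * b_vec)

# Fold the adapter into the frozen weight once, then run a plain linear layer.
delta = ((A_mu * d_mu) @ B) * b_vec     # (d_in, d_out), equals A_mu diag(d_mu) B diag(b)
W_merged = W + alpha * delta.T          # transpose because the layer computes x @ W.T
merged = x @ W_merged.T + b_W
```

After merging, inference costs one matrix product per branch, the same as the unadapted model.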

Alternatively, a probabilistic adaptation can be used such that adaptations are randomly drawn from the learned distribution, enabling Monte Carlo confidence interval estimation. Unless otherwise stated, we use deterministic inference in our experiments.

Results

We benchmark PVeRA against seven adapters on the 19 datasets of VTAB-1k using three sizes of DINOv2 as backbone. Results are average accuracy (%) across three random seeds. Bold indicates the best result. Differences for PVeRA are reported with respect to VeRA.

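| Model | Linear | Bottleneck | (IA)³ | AdaptFormer | DoRA | LoRA | VeRA | PVeRA |
|---|---|---|---|---|---|---|---|---|
| DINOv2-S | 53.6 | 51.5 | 64.0 | 60.1 | 66.1 | 66.1 | 65.9 | **67.5** (+1.6) |
| DINOv2-B | 57.0 | 68.2 | 66.6 | 69.6 | 71.0 | 70.5 | 69.9 | **71.4** (+1.5) |
| DINOv2-L | 56.6 | 70.2 | 63.9 | 51.3 | 71.2 | 73.1 | 71.9 | **73.3** (+1.4) |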

Results on VTAB-1k across 3 models.

Uncertainty Estimation

At inference time, PVeRA can perform multiple stochastic forward passes by sampling from the learned adapter distributions. Accumulating the softmax scores across passes yields a Monte Carlo confidence interval for the predicted class.
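
A rough sketch of this procedure is shown below. A toy noisy classifier stands in for a PVeRA-adapted model with sampling enabled, and the interval is taken as an empirical percentile range over the passes; both choices, along with the function names, are assumptions, since the page does not say how the interval is constructed.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def mc_confidence_interval(stochastic_logits, n_passes=16, level=0.95):
    """Run n_passes stochastic forward passes and summarize the predicted class.

    stochastic_logits: zero-argument callable returning one (C,) logit vector per call.
    Returns (predicted class, mean probability, (lower, upper) percentile bounds).
    """
    probs = np.stack([softmax(stochastic_logits()) for _ in range(n_passes)])  # (T, C)
    mean_probs = probs.mean(axis=0)
    pred = int(mean_probs.argmax())
    tail = 100.0 * (1.0 - level) / 2.0
    low, high = np.percentile(probs[:, pred], [tail, 100.0 - tail])
    return pred, float(mean_probs[pred]), (float(low), float(high))

# Hypothetical stand-in for a model that samples its adapters on every call.
rng = np.random.default_rng(0)
base_logits = np.array([2.0, 0.5, -1.0])

def toy_model():
    return base_logits + 0.3 * rng.standard_normal(base_logits.shape)

pred, mean_p, (low, high) = mc_confidence_interval(toy_model, n_passes=16)
```

A wide interval signals that the sampled adaptations disagree on the input, which is the signal that can be used to flag uncertain or out-of-distribution samples.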

[Interactive demo: an aerial input image and the PVeRA prediction accumulated over 16 stochastic passes]

BibTeX

@inproceedings{fillioux2025pvera,
  title={{PVeRA}: Probabilistic Vector-Based Random Matrix Adaptation},
  author={Fillioux, Leo and Ferrante, Enzo and Cournède, Paul-Henry and Vakalopoulou, Maria and Christodoulidis, Stergios},
  booktitle={Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  year={2026}
}
