NAME
Lugh::LoRA - Low-Rank Adaptation (LoRA) adapter support for Lugh
SYNOPSIS
use Lugh;
# Load base model first
my $model = Lugh::Model->new(file => 'base-model.gguf');
# Load a LoRA adapter (GGUF format)
my $lora = Lugh::LoRA->new(
adapter => 'adapter.gguf',
model => $model, # Required: validates architecture match
);
# Load a LoRA adapter (SafeTensors format)
my $lora = Lugh::LoRA->new(
adapter => 'adapter.safetensors',
model => $model,
);
# Check adapter properties
say "Alpha: ", $lora->alpha;
say "Scale: ", $lora->scale;
say "Weights: ", $lora->n_weights;
say "Format: ", $lora->format;
# Adjust the LoRA scaling factor
$lora->scale(0.5); # Half strength
# Get weight names
my @names = $lora->weight_names;
# Use with inference
my $inference = Lugh::Inference->new(model => $model);
my @logits = $inference->forward(tokens => \@tokens, lora => $lora);
DESCRIPTION
Lugh::LoRA provides support for loading and applying Low-Rank Adaptation (LoRA) adapters to base models. LoRA is an efficient fine-tuning technique that adds small rank-decomposition weight matrices to frozen pre-trained models.
The modified output is computed as:
output = original_output + (alpha / rank) * scale * B @ A @ x
Where:
A and B are the low-rank LoRA matrices
alpha is the scaling factor from the adapter metadata
rank is the inner dimension of the decomposition
scale is a user-adjustable multiplier (default 1.0)
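As a worked example (values chosen for illustration), an adapter trained with alpha = 32 and rank = 16 at the default scale of 1.0 multiplies the low-rank update by 2.0:
# Effective multiplier for alpha = 32, rank = 16, scale = 1.0
my ($alpha, $rank, $scale) = (32, 16, 1.0);
my $multiplier = ($alpha / $rank) * $scale;   # 2.0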
SUPPORTED FORMATS
GGUF Format
The native format for llama.cpp and Lugh. GGUF LoRA files contain:
Metadata indicating this is a LoRA adapter (general.type = "adapter")
The LoRA alpha value (adapter.lora.alpha)
Architecture information for validation
Paired LoRA tensors (*.lora_a, *.lora_b)
GGUF adapters provide architecture validation: the adapter's architecture must match the base model's to ensure compatibility.
SafeTensors Format
The HuggingFace format for storing tensor data. SafeTensors files contain:
A JSON header with tensor metadata (names, shapes, offsets)
Raw tensor data in little-endian format
SafeTensors LoRA files typically use HuggingFace naming conventions, which are automatically translated to the internal format:
base_model.model.layers.0.self_attn.q_proj.lora_A.weight
-> blk.0.attn_q.weight
Note: SafeTensors format does not include alpha metadata, so you may need to set it manually via the alpha accessor or by checking the adapter's original config.json.
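A minimal sketch of recovering alpha for a SafeTensors adapter, assuming a PEFT-style adapter_config.json with a lora_alpha key sits alongside the adapter file (the path and key are illustrative, not part of this module's API):
use JSON::PP qw(decode_json);
# Read the adapter's training config (hypothetical path; PEFT writes
# adapter_config.json next to the SafeTensors file)
open my $fh, '<', 'lora_adapter/adapter_config.json'
    or die "Cannot open config: $!";
my $config = decode_json(do { local $/; <$fh> });
close $fh;
# Apply the recorded alpha to the loaded adapter
$lora->alpha($config->{lora_alpha}) if defined $config->{lora_alpha};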
METHODS
new
my $lora = Lugh::LoRA->new(
adapter => $path, # Required: path to LoRA file
model => $model, # Required: Lugh::Model for architecture validation
scale => $scale, # Optional: scaling factor (default 1.0)
);
Creates a new LoRA adapter from a GGUF or SafeTensors file. The file format is automatically detected from the file extension.
The model parameter is required because LoRA adapters must be validated against the base model's architecture to ensure compatibility. The adapter's layer names and tensor shapes must match the base model.
Note: The file parameter is also accepted as an alias for adapter.
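If loading fails, for example on an architecture mismatch or an unreadable file, you can trap the error with eval (a sketch that assumes the constructor dies on failure):
# Defensive load: trap a failed adapter load instead of crashing
my $lora = eval {
    Lugh::LoRA->new(adapter => 'adapter.gguf', model => $model);
};
die "Could not load LoRA adapter: $@" unless $lora;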
alpha
my $alpha = $lora->alpha;
$lora->alpha(32.0);
Get or set the alpha scaling factor. This is typically set from the adapter metadata but can be overridden for SafeTensors files or experimentation.
scale
my $scale = $lora->scale;
$lora->scale(0.5);
Get or set the user scale multiplier. The effective scaling is alpha * scale / rank. Use this to adjust LoRA influence without changing alpha.
Common values:
1.0 - Full LoRA effect (default)
0.5 - Half LoRA effect
0.0 - Disable LoRA (base model only)
2.0 - Double LoRA effect (may cause instability)
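A quick way to feel the effect is to sweep the scale and rerun the same prompt (a sketch; @tokens and $inference are assumed to be set up as in the SYNOPSIS):
# Compare outputs at several LoRA strengths
for my $s (0.0, 0.5, 1.0) {
    $lora->scale($s);
    my @logits = $inference->forward(tokens => \@tokens, lora => $lora);
    my ($top) = sort { $logits[$b] <=> $logits[$a] } 0 .. $#logits;   # argmax
    printf "scale %.1f -> top token id %d\n", $s, $top;
}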
n_weights
my $count = $lora->n_weights;
Returns the number of LoRA weight pairs in the adapter. Each weight pair consists of an A matrix and a B matrix.
format
my $fmt = $lora->format;
Returns the source format of the adapter: "gguf" or "safetensors".
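One use is deciding when to supply alpha yourself, since only GGUF adapters carry it in metadata:
# SafeTensors adapters carry no alpha metadata; set it explicitly
# (32.0 here is illustrative - use your adapter's training value)
$lora->alpha(32.0) if $lora->format eq 'safetensors';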
weight_names
my @names = $lora->weight_names;
Returns the list of tensor names that have LoRA adaptations. Names are in the internal format (e.g., "blk.0.attn_q.weight").
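For example, you can summarize which transformer blocks the adapter touches (a sketch that assumes the internal blk.N.* naming shown above):
# Count adapted weights per transformer block
my %per_block;
for my $name ($lora->weight_names) {
    $per_block{$1}++ if $name =~ /^blk\.(\d+)\./;
}
printf "%d weight pairs across %d blocks\n",
    $lora->n_weights, scalar keys %per_block;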
CREATING LORA ADAPTERS
LoRA adapters can be created from various sources:
From PEFT (HuggingFace)
from peft import LoraConfig, get_peft_model
config = LoraConfig(
r=16, # LoRA rank
lora_alpha=32, # Scaling factor
target_modules=["q_proj", "v_proj"],
)
peft_model = get_peft_model(base_model, config)
peft_model.save_pretrained("lora_adapter/")
Convert to GGUF:
python convert_lora_to_gguf.py lora_adapter/
From llama.cpp Fine-tuning
./llama-finetune \
--model-base base.gguf \
--train-data train.txt \
--lora-out adapter.gguf \
--lora-r 16 \
--lora-alpha 32
USING LORA WITH INFERENCE
LoRA adapters integrate with all forward methods:
Basic Forward
my @logits = $inference->forward(
tokens => \@tokens,
lora => $lora,
);
With KV Cache
my $cache = $inference->create_kv_cache();
my @logits = $inference->forward_with_cache(
cache => $cache,
tokens => \@tokens,
lora => $lora,
);
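Building on the cache created above, a greedy decoding loop might look like this (a sketch: it assumes the returned logits cover the vocabulary for the last position and that subsequent calls only need to pass the newly generated token):
# Continue generating greedily from the logits returned above
for (1 .. 16) {
    my ($next) = sort { $logits[$b] <=> $logits[$a] } 0 .. $#logits;   # argmax
    push @tokens, $next;
    @logits = $inference->forward_with_cache(
        cache  => $cache,
        tokens => [$next],   # only the new token; the cache holds the rest
        lora   => $lora,
    );
}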
With Memory Pool
my $pool = $inference->create_memory_pool();
my @logits = $inference->forward_with_pool(
pool => $pool,
tokens => \@tokens,
lora => $lora,
);
Batch Processing
my $results = $inference->forward_batch(
sequences => [\@seq1, \@seq2, \@seq3],
lora => $lora,
);
Adjusting LoRA Strength
Use the scale property to adjust LoRA influence:
$lora->scale(0.0); # Disable LoRA (base model only)
$lora->scale(0.5); # Half LoRA effect
$lora->scale(1.0); # Full LoRA effect (default)
$lora->scale(2.0); # Double LoRA effect
SEE ALSO
Lugh, Lugh::Inference, Lugh::KVCache
AUTHOR
Your Name Here
LICENSE
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.