NAME
Lugh::RoPE - RoPE (Rotary Position Embedding) Scaling Configuration
SYNOPSIS
use Lugh::RoPE;
# Create a default (no scaling) config
my $rope = Lugh::RoPE->new();
# Linear scaling: extend 4K context to 16K
my $rope = Lugh::RoPE->linear(4096, 16384);
# YaRN scaling: extend 4K context to 32K
my $rope = Lugh::RoPE->yarn(4096, 32768);
# Use presets
my $rope = Lugh::RoPE->linear_2x(4096); # 4K -> 8K
my $rope = Lugh::RoPE->linear_4x(4096); # 4K -> 16K
my $rope = Lugh::RoPE->yarn_32k(4096); # 4K -> 32K
my $rope = Lugh::RoPE->yarn_64k(4096); # 4K -> 64K
my $rope = Lugh::RoPE->yarn_128k(4096); # 4K -> 128K
# Manual configuration with all parameters
my $rope = Lugh::RoPE->new(
scaling_type => 'yarn',
n_ctx_orig => 4096,
target_ctx => 32768,
freq_base => 10000.0,
ext_factor => -1.0, # -1 = auto-compute
attn_factor => 1.0,
beta_fast => 32.0,
beta_slow => 1.0,
);
# Use with inference
$inference->forward($model, $tokens, { rope => $rope });
# Query configuration
say $rope->scaling_type_name; # "yarn"
say $rope->freq_scale; # 0.125 (4096/32768)
DESCRIPTION
Lugh::RoPE provides configuration for RoPE (Rotary Position Embedding) scaling, enabling models to handle context lengths beyond their training limit.
Scaling Methods
none - No scaling, use original context length
linear - Simple frequency interpolation. Works well for 2-4x extensions.
yarn - YaRN (Yet another RoPE extensioN). Combines NTK interpolation with attention temperature scaling. Better for larger extensions (4-16x+).
longrope - LongRoPE method (experimental)
CONSTRUCTORS
new
my $rope = Lugh::RoPE->new(%options);
Create a new RoPE configuration with explicit parameters.
Options:
- scaling_type - Type of scaling: 'none', 'linear', 'yarn', or 'longrope'. Constants may also be used: ROPE_SCALING_NONE, ROPE_SCALING_LINEAR, etc.
- n_ctx_orig - Original training context length.
- target_ctx - Target extended context length.
- freq_base - Base frequency for RoPE. Default 0 (use the model's value).
- freq_scale - Frequency scaling factor. Auto-computed as n_ctx_orig/target_ctx if not set.
- ext_factor - YaRN extension factor. -1.0 = auto-compute.
- attn_factor - YaRN attention temperature factor. Default 1.0.
- beta_fast - YaRN high-frequency boundary. Default 32.0.
- beta_slow - YaRN low-frequency boundary. Default 1.0.
none
my $rope = Lugh::RoPE->none();
Create a no-scaling configuration. Uses model's original context.
linear
my $rope = Lugh::RoPE->linear($n_ctx_orig, $target_ctx);
Create linear scaling configuration.
yarn
my $rope = Lugh::RoPE->yarn($n_ctx_orig, $target_ctx, %yarn_opts);
Create YaRN scaling configuration. Optional YaRN parameters:
my $rope = Lugh::RoPE->yarn(4096, 32768,
beta_fast => 16.0,
beta_slow => 2.0,
);
PRESETS
Convenient methods for common configurations:
linear_2x
my $rope = Lugh::RoPE->linear_2x($n_ctx_orig);
Linear scaling to 2x original context.
linear_4x
my $rope = Lugh::RoPE->linear_4x($n_ctx_orig);
Linear scaling to 4x original context.
yarn_32k
my $rope = Lugh::RoPE->yarn_32k($n_ctx_orig);
YaRN scaling to 32K context.
yarn_64k
my $rope = Lugh::RoPE->yarn_64k($n_ctx_orig);
YaRN scaling to 64K context.
yarn_128k
my $rope = Lugh::RoPE->yarn_128k($n_ctx_orig);
YaRN scaling to 128K context.
from_model
my $rope = Lugh::RoPE->from_model($model);
Extract RoPE configuration from a model's GGUF metadata. This reads all RoPE-related parameters that were stored when the model was created, including:
- Scaling type (none, linear, yarn, longrope)
- Original and target context lengths
- Frequency base and scale
- YaRN parameters (ext_factor, attn_factor, beta_fast, beta_slow)
Example:
use Lugh::Model;
use Lugh::RoPE;
my $model = Lugh::Model->new(file => 'model.gguf');
my $rope = Lugh::RoPE->from_model($model);
say "Model uses ", $rope->scaling_type_name, " scaling";
say "Original context: ", $rope->n_ctx_orig;
# Use the extracted config (or override it via the rope option)
$inference->forward($model, \@tokens, { rope => $rope });
ACCESSORS
All accessors are read-only:
- scaling_type - Returns integer constant
- scaling_type_name - Returns string: 'none', 'linear', 'yarn', 'longrope'
- n_ctx_orig - Original context length
- target_ctx - Target context length
- freq_base - Base frequency
- freq_scale - Frequency scale factor
- ext_factor - YaRN extension factor
- attn_factor - YaRN attention factor
- beta_fast - YaRN beta fast
- beta_slow - YaRN beta slow
CONSTANTS
use Lugh::RoPE;
Lugh::RoPE::ROPE_SCALING_NONE() # 0
Lugh::RoPE::ROPE_SCALING_LINEAR() # 1
Lugh::RoPE::ROPE_SCALING_YARN() # 2
Lugh::RoPE::ROPE_SCALING_LONGROPE() # 3
TECHNICAL DETAILS
Linear Scaling
Simple frequency interpolation:
freq_scale = n_ctx_orig / target_ctx
The RoPE frequencies are multiplied by freq_scale, so positions beyond the training length map back into the frequency range seen during training.
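As a worked sketch of the formula above (the model dimensions and base are assumed example values, not read from any model):

```perl
use strict;
use warnings;

# Assumed example: a model trained at 4096 tokens, extended to
# 16384 with linear scaling.
my ($n_ctx_orig, $target_ctx) = (4096, 16384);
my $freq_scale = $n_ctx_orig / $target_ctx;    # 0.25

# Rotation angle for position $pos at rotary dimension pair $i:
#   theta = pos * freq_scale * base^(-2i/dim)
my ($base, $dim) = (10000.0, 128);
sub angle {
    my ($pos, $i) = @_;
    return $pos * $freq_scale * $base ** (-2 * $i / $dim);
}

# Position 16000 (beyond the training length) gets the same angle
# as position 4000 would without scaling:
printf "%.6f == %.6f\n", angle(16000, 4),
                         4000 * $base ** (-8 / $dim);
```

Every position is effectively compressed by freq_scale, which is why quality degrades for large extension ratios: the model must distinguish positions packed four times closer together than it saw in training.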
YaRN Scaling
YaRN (Yet another RoPE extensioN) uses a more sophisticated approach:
1. NTK-aware interpolation that preserves high-frequency components
2. Attention temperature scaling to compensate for entropy changes
3. Smooth blending between scaled and unscaled frequencies
Parameters:
ext_factor: Controls interpolation strength. -1 = auto-compute based on scale ratio.
attn_factor: Attention temperature scaling. 1.0 = no scaling.
beta_fast: High-frequency boundary (above this, minimal scaling)
beta_slow: Low-frequency boundary (below this, full scaling)
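The blending can be sketched in pure Perl. This follows the formulas in the YaRN paper (the correction-dim formula, linear ramp, and the 0.1*ln(s)+1 attention temperature); the head dimension and context values are assumed examples, and the real computation happens inside the library:

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);

my ($base, $n_dims)           = (10000.0, 128);    # head dimension
my ($n_ctx_orig, $target_ctx) = (4096, 32768);
my ($beta_fast, $beta_slow)   = (32.0, 1.0);
my $scale = $target_ctx / $n_ctx_orig;             # 8x extension
my $pi    = 4 * atan2(1, 1);

# Pair index at which a rotary frequency completes $rot full
# rotations over the original context (YaRN paper, eq. for the
# correction range).
sub correction_dim {
    my ($rot) = @_;
    return $n_dims * log($n_ctx_orig / ($rot * 2 * $pi))
                   / (2 * log($base));
}

my $low  = floor(correction_dim($beta_fast));      # high-freq edge
my $high = ceil(correction_dim($beta_slow));       # low-freq edge
$low  = 0 if $low < 0;
$high = $n_dims / 2 - 1 if $high > $n_dims / 2 - 1;

# Pairs below $low stay unscaled (extrapolation); pairs above
# $high are fully interpolated; in between, a linear ramp blends.
sub inv_freq {
    my ($i) = @_;                                  # pair index
    my $ramp = ($i - $low) / ($high - $low);
    $ramp = 0 if $ramp < 0;
    $ramp = 1 if $ramp > 1;
    my $extrap = $base ** (-2 * $i / $n_dims);     # original freq
    my $interp = $extrap / $scale;                 # linearly scaled
    return $interp * $ramp + $extrap * (1 - $ramp);
}

# Attention temperature factor from the paper: 0.1 * ln(s) + 1
my $mscale = 0.1 * log($scale) + 1.0;

printf "low=%d high=%d mscale=%.4f\n", $low, $high, $mscale;
printf "pair %2d: inv_freq=%.6g\n", $_, inv_freq($_)
    for 0, $low, $high, $n_dims / 2 - 1;
```

This is why YaRN tolerates large extensions better than linear scaling: the high-frequency dimensions that encode local token order are left untouched, and only the low-frequency dimensions are compressed.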
SEE ALSO
Lugh, Lugh::Inference, Lugh::Model
Paper references:
Position Interpolation: https://arxiv.org/abs/2306.15595
AUTHOR
Lugh Authors
LICENSE
Same as Perl itself.