SAMoR: Motion Modelling for Articulated Objects of Any Skeleton and Topology

1 Imperial College London  ·  2 University of Tübingen, Tübingen AI Center
[Teaser figure: one source motion, shown as mesh and skeleton, next to six targets: Chicken, Coyote, Eagle, Fire Ant, Rabbit, Robot.]
One source motion (left) is encoded into SAMoR part tokens and decoded onto six diverse target topologies without retraining. Mesh on top, skeleton below.

Abstract

We introduce SAMoR (Skeleton-Aware Motion Representation for Articulated Objects), a cross-topology motion representation that encodes each motion segment as a small fixed number (K = 8) of part tokens shared across arbitrary skeletons. SAMoR processes per-joint motion features, kinematic graph structure, and joint-name embeddings with a graph-transformer encoder, then compresses the resulting heterogeneous per-joint features into part-level tokens via cross-attention pooling and residual vector quantization, yielding a discrete motion codebook shared across rigs. To prevent the part queries from collapsing into redundant global representations, we introduce a topology-agnostic attention supervision loss, combined with random joint-name dropout to prevent over-reliance on text labels; together these encourage the part tokens to cluster joints into functional groups from names, structure, and motion jointly.
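
To illustrate the residual vector quantization step mentioned above, here is a minimal NumPy sketch. It is not the SAMoR implementation: the codebooks are random rather than learned, and the codebook size (32), the number of stages (3), and the function name `residual_vq` are assumptions made for the example. Only K = 8 comes from the text.

```python
import numpy as np

def residual_vq(x, codebooks):
    """Quantize vectors x (N, D) with a stack of codebooks, each (C, D).

    Each stage picks the nearest code for the residual left by earlier stages.
    Returns code indices (N, num_stages) and the reconstruction (N, D).
    """
    residual = x.copy()
    recon = np.zeros_like(x)
    indices = []
    for cb in codebooks:
        # Squared distance from every residual to every code: (N, C)
        d2 = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
        idx = d2.argmin(axis=1)
        chosen = cb[idx]
        recon += chosen      # accumulate the selected codes
        residual -= chosen   # the next stage sees what is still unexplained
        indices.append(idx)
    return np.stack(indices, axis=1), recon

rng = np.random.default_rng(0)
T, K, D = 4, 8, 16  # K = 8 part tokens per frame, as in the text; T and D are arbitrary
tokens = rng.normal(size=(T * K, D))
# Hypothetical codebooks: 3 stages of 32 codes each, with shrinking scale
codebooks = [rng.normal(size=(32, D)) * 0.5 ** s for s in range(3)]
codes, recon = residual_vq(tokens, codebooks)
```

Each part token ends up as a short tuple of integer codes, one per stage, which is the kind of discrete representation a token-based generator can model.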

SAMoR architecture

SAMoR architecture. Top: the motion encoder block takes per-joint motion features (T, J, D) with skeleton conditioning and produces K = 8 part-motion tokens (T, K, D), which are quantized by residual VQ. Bottom: the decoder block reverses the path with inverse cross-attention conditioned on a possibly-different target skeleton, enabling cross-topology motion transfer at decode time.
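
The shape flow in the caption can be sketched with plain dot-product attention. This is a simplified illustration under assumptions, not the actual architecture: learned query, key, and value projections are omitted, the part queries and target-joint queries are random stand-ins for learned or skeleton-conditioned embeddings, and the function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pool_to_parts(joint_feats, part_queries):
    """(T, J, D) joint features -> (T, K, D) part tokens via cross-attention."""
    d = joint_feats.shape[-1]
    scores = np.einsum("kd,tjd->tkj", part_queries, joint_feats) / np.sqrt(d)
    attn = softmax(scores, axis=-1)  # each part query attends over the J joints
    return np.einsum("tkj,tjd->tkd", attn, joint_feats), attn

def unpool_to_joints(part_tokens, target_joint_queries):
    """(T, K, D) part tokens -> (T, J', D) features for a target skeleton."""
    d = part_tokens.shape[-1]
    scores = np.einsum("jd,tkd->tjk", target_joint_queries, part_tokens) / np.sqrt(d)
    attn = softmax(scores, axis=-1)  # each target joint attends over the K parts
    return np.einsum("tjk,tkd->tjd", attn, part_tokens)

rng = np.random.default_rng(0)
T, J, J_target, K, D = 5, 24, 13, 8, 16  # only K = 8 comes from the text
joint_feats = rng.normal(size=(T, J, D))
part_queries = rng.normal(size=(K, D))
target_joint_queries = rng.normal(size=(J_target, D))

parts, attn = pool_to_parts(joint_feats, part_queries)
target_feats = unpool_to_joints(parts, target_joint_queries)
```

Because the token tensor has shape (T, K, D) regardless of J, the decoder side can use a different joint count J' than the encoder side.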

Mesh Animation Pipeline

End-to-end pipeline: an AI-generated photo is first lifted into an AI-generated mesh, then processed by an auto-rigging system to attach a skinned skeleton; SAMoR generates or transfers motion onto that rig; and linear blend skinning (LBS) produces the final animation. Monster and Girl use text-conditioned generation; Octopus and Robot use cross-topology motion transfer from a humanoid source motion.

Monster Generation: "The character is walking"
Girl Generation: "The character does a round kick with left leg"
Octopus Transfer: cross-topology transfer from a Humanoid source skeleton motion
Robot Transfer: cross-topology transfer from a Humanoid source skeleton motion
[Figure panels for each example: AI-generated photo, AI-generated mesh, auto-rig + skeleton, SAMoR + LBS (front), SAMoR + LBS (side). The two transfer examples also show the source motion.]
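
The final LBS stage of the pipeline is standard linear blend skinning. The sketch below shows the textbook formulation, assuming each bone transform is a 4×4 matrix mapping rest-pose space to posed space and that skinning weights sum to one per vertex. The function name and the toy data are illustrative only.

```python
import numpy as np

def linear_blend_skinning(rest_vertices, weights, bone_transforms):
    """Deform a mesh by blending per-bone rigid transforms.

    rest_vertices:   (V, 3) rest-pose vertex positions
    weights:         (V, B) skinning weights, each row summing to 1
    bone_transforms: (B, 4, 4) rest-to-posed transform for each bone
    """
    num_vertices = rest_vertices.shape[0]
    homo = np.concatenate([rest_vertices, np.ones((num_vertices, 1))], axis=1)
    # Transform every vertex by every bone: (V, B, 4)
    per_bone = np.einsum("bij,vj->vbi", bone_transforms, homo)
    # Weighted average over bones: (V, 4)
    blended = np.einsum("vb,vbi->vi", weights, per_bone)
    return blended[:, :3]

# Toy example: two bones, the second translated by +1 along x
rest = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
weights = np.array([[1.0, 0.0], [0.5, 0.5]])
transforms = np.stack([np.eye(4), np.eye(4)])
transforms[1, 0, 3] = 1.0
posed = linear_blend_skinning(rest, weights, transforms)
```

The first vertex follows only the static bone and stays put, while the second is split evenly between the two bones and moves half the translation.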

Cross-Topology Motion Transfer

Source skeleton motion (left) decoded onto target skeleton (right) across diverse topology gaps. The one-source-to-many demo is shown above as the teaser.

Humanoid → Chicken
Humanoid → OXL Cartoon Character
Eagle → Humanoid
Crocodile → Lion
[Figure panels for each pair: source mesh and skeleton, then target mesh and skeleton.]

Text-Conditioned Motion Generation

SAMoR part tokens are decoded from a text-conditioned MaskGIT generator and rendered on the target skeleton. Each card shows one character in a 2×2 grid: top row is the mesh (rest pose | generated motion via LBS), bottom row is the skeleton (rest | generated motion).

Brown Bear: "The bear is attacking with front body"
Hulk: "The character stepped to perform a powerful punch"
[Figure panels for each character: rest mesh (input, static), generated mesh (LBS, motion), rest skeleton (static), generated skeleton (motion).]
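
For readers unfamiliar with MaskGIT-style generation, the following sketch shows the general idea of iterative confidence-based unmasking over a token sequence. It is a generic illustration, not the SAMoR generator: the predictor is a hypothetical callable returning fixed logits (a real model would condition on the text prompt and the current tokens), decoding is greedy rather than sampled, and the cosine schedule, step count, and vocabulary size are assumptions.

```python
import numpy as np

MASK = -1  # hypothetical id marking a slot that has not been generated yet

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def maskgit_decode(predict_logits, num_tokens, steps=4):
    """Start fully masked and reveal the most confident predictions each step."""
    tokens = np.full(num_tokens, MASK)
    for step in range(steps):
        masked = tokens == MASK
        if not masked.any():
            break
        probs = softmax(predict_logits(tokens))  # (num_tokens, vocab)
        pred, conf = probs.argmax(-1), probs.max(-1)
        # Cosine schedule: how many tokens should still be masked after this step
        keep_masked = int(np.floor(num_tokens * np.cos(np.pi / 2 * (step + 1) / steps)))
        n_reveal = max(1, int(masked.sum()) - keep_masked)
        # Only masked slots compete; already revealed slots are never overwritten
        conf = np.where(masked, conf, -np.inf)
        reveal = np.argsort(-conf)[:n_reveal]
        tokens[reveal] = pred[reveal]
    return tokens

rng = np.random.default_rng(0)
fixed_logits = rng.normal(size=(16, 32))  # stand-in for a text-conditioned model
result = maskgit_decode(lambda toks: fixed_logits, num_tokens=16)
```

The generated codes would then be passed through the decoder for the chosen target skeleton.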

Part-wise Motion Editing

The original motion is encoded into SAMoR tokens. A subset of part slots (lower body) is masked and re-generated with a new prompt, while upper-body tokens remain fixed. Each column shows the skeleton motion on top and the LBS-driven textured mesh below.

Original motion (pre-edit): full body, unchanged tokens
After edit, lower-body tokens replaced: "The women performs a rhythmic sidestep"
[Figure panels for each column: skeleton motion and LBS-driven mesh.]
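
The editing procedure above amounts to masking selected part slots and keeping the rest. A minimal sketch follows, assuming tokens are stored as an integer array of shape (T, K). Which slot indices correspond to the lower body is not stated on this page, so `lower_body_slots` is a hypothetical choice, and `regenerate` is a placeholder for the prompt-conditioned generator.

```python
import numpy as np

MASK = -1  # hypothetical id for a masked slot

def edit_parts(tokens, edit_slots, regenerate):
    """Regenerate only the chosen part slots; all other tokens stay fixed.

    tokens:     (T, K) integer part-token codes
    edit_slots: list of part-slot indices to replace
    regenerate: callable(masked_tokens, mask) -> (T, K) proposed codes
    """
    mask = np.zeros(tokens.shape, dtype=bool)
    mask[:, edit_slots] = True
    masked_tokens = np.where(mask, MASK, tokens)
    proposal = regenerate(masked_tokens, mask)
    # Accept the proposal only where the mask is set
    return np.where(mask, proposal, tokens)

rng = np.random.default_rng(0)
tokens = rng.integers(0, 32, size=(6, 8))  # T = 6 frames, K = 8 part slots
lower_body_slots = [4, 5, 6, 7]            # assumed slot assignment
dummy_generator = lambda masked, mask: np.full(masked.shape, 7)
edited = edit_parts(tokens, lower_body_slots, dummy_generator)
```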

Part-Token Motion Composition

Lower-body part tokens from a human walking motion and upper-body part tokens from an eagle take-off motion are combined and decoded together on a chicken skeleton. The top card shows skeleton motions; the bottom card shows the corresponding LBS-driven textured meshes.

Human (lower body), walking motion + Eagle (upper body), take-off motion, decoded on Chicken = Composed on Chicken: Upper (Eagle) + Lower (Human)
[Figure: the composition is shown once with textured meshes and once with skeletons.]
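
Composition can be pictured as a slot-wise merge of two token arrays before decoding. The sketch below assumes both motions have the same number of frames and that part slots are aligned across sources. The upper-body slot indices are hypothetical, since the page does not list the slot assignment.

```python
import numpy as np

def compose_parts(base_tokens, other_tokens, slots_from_other):
    """Take the listed part slots from other_tokens and the rest from base_tokens."""
    if base_tokens.shape != other_tokens.shape:
        raise ValueError("both motions must have the same token shape")
    out = base_tokens.copy()
    out[:, slots_from_other] = other_tokens[:, slots_from_other]
    return out

rng = np.random.default_rng(0)
human_walk = rng.integers(0, 32, size=(6, 8))     # (T, K) codes from the human motion
eagle_takeoff = rng.integers(0, 32, size=(6, 8))  # (T, K) codes from the eagle motion
upper_body_slots = [0, 1, 2, 3]                   # assumed slot assignment
composed = compose_parts(human_walk, eagle_takeoff, upper_body_slots)
```

The composed array would then be decoded with the chicken skeleton as the target.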

Limitations & future work

Limitation 1

Positions, not rotations

SAMoR operates on joint positions rather than rotations in the cross-topology setting, because heterogeneous assets lack a unified rest-pose reference that makes local rotations comparable across rigs. Recovering bone rotations for LBS mesh animation therefore requires an IK post-process, which introduces twist ambiguity; accurate twist recovery from positions alone remains an open problem.
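
The twist ambiguity can be seen in a small numerical example. The sketch below recovers a bone rotation from positions using the shortest-arc rotation between the rest and posed bone directions, then shows that adding any twist about the posed bone axis matches the same joint positions. This is a generic illustration, not the IK post-process used by SAMoR, and all names are hypothetical.

```python
import numpy as np

def rotation_about_axis(axis, angle):
    """Rodrigues' formula: rotation matrix for a given axis and angle."""
    x, y, z = axis / np.linalg.norm(axis)
    k = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * k + (1.0 - np.cos(angle)) * (k @ k)

def shortest_arc(u, v):
    """Smallest rotation taking direction u to direction v."""
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    axis = np.cross(u, v)
    s, c = np.linalg.norm(axis), float(u @ v)
    if s < 1e-8:
        if c > 0:
            return np.eye(3)
        # Antiparallel: rotate by pi about any axis perpendicular to u
        helper = np.array([1.0, 0.0, 0.0]) if abs(u[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
        return rotation_about_axis(np.cross(u, helper), np.pi)
    return rotation_about_axis(axis, np.arctan2(s, c))

rest_dir = np.array([0.0, 1.0, 0.0])   # bone direction in the rest pose
posed_dir = np.array([2.0, 0.0, 0.0])  # child minus parent position in the posed frame

R = shortest_arc(rest_dir, posed_dir)
twisted = rotation_about_axis(posed_dir, 0.7) @ R  # same bone direction, extra twist
```

Both `R` and `twisted` map the rest direction onto the posed direction, so joint positions alone cannot distinguish them, yet they would deform a skinned mesh differently.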

Limitation 2

Fixed number of part groups

The K = 8 functional grouping is a fixed design choice. Learning the number and structure of part groups automatically, and adapting them to the topology of the target rig, is a natural direction for future work.

BibTeX

@misc{samor2026,
  author = {TBD},
  title  = {TBD},
  year   = {TBD},
  eprint = {TBD},
  url    = {TBD}
}