SAMoR — Project Page

Abstract

We introduce SAMoR (Skeleton-Aware Motion Representation for Articulated Objects), a cross-topology motion representation that encodes each motion segment as a small fixed number (K = 8) of part tokens shared across arbitrary skeletons. SAMoR processes per-joint motion features, kinematic graph structure, and joint-name embeddings with a graph-transformer encoder, then compresses the resulting heterogeneous per-joint features into part-level tokens via cross-attention pooling and residual vector quantization, yielding a discrete motion codebook shared across rigs. To prevent the part queries from collapsing into redundant global representations, we introduce a topology-agnostic attention supervision loss, combined with random joint-name dropout to prevent over-reliance on text labels; together these encourage the part tokens to cluster joints into functional groups from names, structure, and motion jointly.

SAMoR architecture

Mesh Animation Pipeline

End-to-end pipeline: an AI-generated photo is first lifted into an AI-generated mesh, then processed by an auto-rigging system to attach a skinned skeleton; SAMoR generates or transfers motion onto that rig; and LBS produces the final animation. Monster and Girl use text-conditioned generation; Octopus uses cross-topology motion transfer.

Monster Generation

"The character is walking"

AI-generated photo

→

AI-generated mesh

→

Auto-rig + skeleton

→

SAMoR + LBS (front)

·

SAMoR + LBS (side)

Girl Generation

"The character does a round kick with left leg"

AI-generated photo

→

AI-generated mesh

→

Auto-rig + skeleton

→

SAMoR + LBS (front)

·

SAMoR + LBS (side)

Octopus Transfer

AI-generated photo

→

AI-generated mesh

→

Auto-rig + skeleton

→

SAMoR + LBS (front)

·

SAMoR + LBS (side)

Source motion

Humanoid
cross-topology transfer

Robot Transfer

AI-generated photo

→

AI-generated mesh

→

Auto-rig + skeleton

→

SAMoR + LBS (front)

·

SAMoR + LBS (side)

Source motion

Humanoid
cross-topology transfer

Cross-Topology Motion Transfer

Source skeleton motion (left) decoded onto target skeleton (right) across diverse topology gaps. The one-source-to-many demo is shown above as the teaser.

Humanoid → Chicken

mesh

skeleton

→

mesh

skeleton

Humanoid → OXL Cartoon Character

mesh

skeleton

→

mesh

skeleton

Eagle → Humanoid

mesh

skeleton

→

mesh

skeleton

Crocodile → Lion

mesh

skeleton

→

mesh

skeleton

Text-Conditioned Motion Generation

SAMoR part tokens are decoded from a text-conditioned MaskGIT generator and rendered on the target skeleton. Each card shows one character in a 2×2 grid: top row is the mesh (rest pose | generated motion via LBS), bottom row is the skeleton (rest | generated motion).

Brown Bear

"The bear is attacking with front body"

Rest mesh (input)

static

Generated mesh (LBS)

motion

Rest skeleton

static

Generated skeleton

motion

Hulk

"The character stepped to perform a powerful punch"

Rest mesh (input)

static

Generated mesh (LBS)

motion

Rest skeleton

static

Generated skeleton

motion

Part-wise Motion Editing

The original motion is encoded into SAMoR tokens. A subset of part slots (lower body) is masked and re-generated with a new prompt, while upper-body tokens remain fixed. Each column shows the skeleton motion on top and the LBS-driven textured mesh below.

Original motion (pre-edit)

Input — full body, unchanged tokens

"The women performs a rhythmic sidestep"

After edit — lower-body tokens replaced

Part-Token Motion Composition

Lower-body part tokens from a human walking motion and upper-body part tokens from an eagle take-off motion are combined and decoded together on a chicken skeleton. The top card shows skeleton motions; the bottom card shows the corresponding LBS-driven textured meshes.

Human (lower body)

Walking motion

+

Eagle (upper body)

Take-off motion

decode on

→

Chicken

=

Composed on Chicken

Upper (Eagle) + Lower (Human)

Human (lower body)

Walking motion

+

Eagle (upper body)

Take-off motion

decode on

→

Chicken

=

Composed on Chicken

Upper (Eagle) + Lower (Human)

Limitations & future work

Limitation 1

Positions, not rotations

SAMoR operates on joint positions rather than rotations in the cross-topology setting, because heterogeneous assets lack a unified rest-pose reference that makes local rotations comparable across rigs. Recovering bone rotations for LBS mesh animation therefore requires an IK post-process, which introduces twist ambiguity; accurate twist recovery from positions alone remains an open problem.

Limitation 2

Fixed number of part groups

The K = 8 functional grouping is a fixed design choice. Learning the number and structure of part groups automatically — adapting it to the topology of the target rig — is a natural direction for future work.

BibTeX

@misc{samor2026,
  author = {TBD},
  title  = {TBD},
  year   = {TBD},
  eprint = {TBD},
  url    = {TBD}
}