# Official models of "MoChat: Joints-Grouped Spatio-Temporal Grounding LLM for Multi-Turn Motion Comprehension and Description"

## Overview
MoChat is a Multimodal Large Language Model (MLLM) for human motion understanding with precise spatio-temporal grounding. Unlike conventional motion analysis systems, MoChat integrates:
- Motion Understanding: Performs fundamental motion comprehension and summarization.
- Spatial Limb Grounding: Accurately locates body parts involved in described movements.
- Temporal Action Grounding: Precisely identifies time boundaries corresponding to specific motion descriptions.
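
The sketch below illustrates how a temporal-grounding query to a model like this might look. It is a minimal, hypothetical example: the repository id, loading classes, and prompt format are assumptions for illustration, not the official MoChat API, and the motion-sequence input (which the model consumes via its skeleton encoder) is omitted because its exact interface is model-specific.

```python
# Hypothetical usage sketch -- repo id, loading path, and prompt format are
# illustrative assumptions, NOT the official MoChat interface.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "path/to/mochat-checkpoint"  # placeholder: substitute the real repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

# A temporal-grounding style query: ask for the frame interval of a sub-action.
# In practice the encoded skeleton sequence would also be passed to the model;
# that step is omitted here because its interface is checkpoint-specific.
prompt = "During which frames does the person raise their left arm?"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```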
## Models
We provide the following trained models for download (see the download sketch after this list):
- Joints-Grouped Skeleton Encoder for motion sequence representation.
- Two variants of the motion comprehension model.
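
A minimal download sketch using the Hugging Face Hub client. The repository id below is a placeholder assumption; substitute the id shown on this model page.

```python
# Fetch all checkpoint files from the Hub; "ORG/MoChat" is a placeholder id.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="ORG/MoChat")
print(f"Checkpoints downloaded to: {local_dir}")
```

`snapshot_download` returns the local directory containing the files, which can then be passed to the model-loading code in place of a repo id.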
## Resources