Towards Suturing World Models (LTX-Video, i2v)

This repository hosts a fine-tuned LTX-Video image-to-video (i2v) diffusion model specialized for generating realistic robotic surgical suturing videos. It captures fine-grained sub-stitch actions, including needle positioning, targeting, driving, and withdrawal, and can differentiate between ideal and non-ideal surgical techniques. This makes it suitable for applications in surgical training, skill evaluation, and autonomous surgical system development.

Model Details

  • Base Model: LTX-Video
  • Resolution: 768×512 pixels (Adjustable)
  • Frame Length: 49 frames per generated video (Adjustable)
  • Fine-tuning Method: Low-Rank Adaptation (LoRA)
  • Data Source: Annotated laparoscopic surgery exercise videos (~2,000 clips)
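
Resolution and frame length are adjustable, but not every value is valid. The upstream LTX-Video documentation asks for a width and height divisible by 32 and a frame count of the form 8k + 1; the defaults above (768×512, 49 frames) satisfy both. The following is a minimal sketch of snapping requested values to the nearest valid ones. The helper names `snap_resolution` and `snap_num_frames` are hypothetical and not part of this repository.

```python
def snap_resolution(value, multiple=32):
    """Round a width or height to the nearest positive multiple of `multiple`."""
    return max(multiple, round(value / multiple) * multiple)


def snap_num_frames(num_frames, stride=8):
    """Round a frame count to the nearest value of the form stride * k + 1 (k >= 1)."""
    k = max(1, round((num_frames - 1) / stride))
    return stride * k + 1


# Hypothetical request that is slightly off the valid grid
width = snap_resolution(760)
height = snap_resolution(512)
num_frames = snap_num_frames(60)
print(width, height, num_frames)
```

The snapped values can then be passed as `width`, `height`, and `num_frames` to the usage example below.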

Usage Example

import os

import torch
from diffusers.utils import export_to_video, load_image

# Local module, not part of diffusers: stg_ltx_i2v_pipeline.py must be on the import path
from stg_ltx_i2v_pipeline import LTXImageToVideoSTGPipeline

def generate_video_from_image(
    image_path,
    prompt,
    output_dir="outputs",
    width=768,
    height=512,
    num_frames=49,
    lora_path="mehmetkeremturkcan/Suturing-LTX-I2V",
    lora_weight=1.0,
    prefix="suturingmodel, ",
    negative_prompt="worst quality, inconsistent motion, blurry, jittery, distorted",
    stg_mode="STG-A",
    stg_applied_layers_idx=None,
    stg_scale=1.0,
    do_rescaling=True,
):
    # Avoid a mutable default argument; fall back to layer 19 when none is given
    if stg_applied_layers_idx is None:
        stg_applied_layers_idx = [19]
    # Create the output directory if it doesn't exist
    os.makedirs(output_dir, exist_ok=True)
    # Load the model
    pipe = LTXImageToVideoSTGPipeline.from_pretrained(
        "a-r-r-o-w/LTX-Video-0.9.1-diffusers", 
        torch_dtype=torch.bfloat16, 
        local_files_only=False
    )
    # Apply LoRA weights
    pipe.load_lora_weights(
        lora_path, 
        weight_name="pytorch_lora_weights.safetensors", 
        adapter_name="suturing"
    )
    pipe.set_adapters("suturing", lora_weight)
    pipe.to("cuda")
    # Prepare the image and prompt
    image = load_image(image_path).resize((width, height))
    full_prompt = prefix + prompt if prefix else prompt
    # Generate output filename
    basename = os.path.splitext(os.path.basename(image_path))[0]
    output_filename = f"{basename}_i2v.mp4"
    output_path = os.path.join(output_dir, output_filename)
    # Generate the video
    print(f"Generating video with prompt: {full_prompt}")
    video = pipe(
        image=image,
        prompt=full_prompt,
        negative_prompt=negative_prompt,
        width=width,
        height=height,
        num_frames=num_frames,
        num_inference_steps=50,
        decode_timestep=0.03,
        decode_noise_scale=0.025,
        generator=None,
        stg_mode=stg_mode,
        stg_applied_layers_idx=stg_applied_layers_idx,
        stg_scale=stg_scale,
        do_rescaling=do_rescaling
    ).frames[0]
    
    # Export the video
    export_to_video(video, output_path, fps=24)
    print(f"Video saved to: {output_path}")
    return output_path

generate_video_from_image(
    image_path="../suturing_datasetv2/images/9_railroad_final_8487-8570_NeedleWithdrawalNonIdeal.png",
    prompt="A needlewithdrawalnonideal clip, generated from a backhand task."
)
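
The prompt in the call above appears to combine a lowercased clip label (`needlewithdrawalnonideal`, matching the `NeedleWithdrawalNonIdeal` suffix of the image filename) with the exercise name (`backhand`). Under that assumption, prompts for other clips could be assembled as sketched below. `build_prompt` is a hypothetical helper, and only the needle-withdrawal, non-ideal, backhand combination is confirmed by the example; other action and task names are guesses about the training labels.

```python
def build_prompt(action, ideal, task):
    """Build a prompt in the format of the usage example.

    action: sub-stitch action name, e.g. "NeedleWithdrawal"
    ideal:  True for the Ideal label suffix, False for NonIdeal
    task:   exercise name, e.g. "backhand"
    """
    # The label is lowercased with no separators, as in the example prompt
    label = f"{action}{'Ideal' if ideal else 'NonIdeal'}".lower()
    return f"A {label} clip, generated from a {task} task."


print(build_prompt("NeedleWithdrawal", ideal=False, task="backhand"))
```

The returned string is passed as `prompt`; `generate_video_from_image` prepends the `suturingmodel, ` prefix on its own.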

Applications

  • Surgical Training: Generate demonstrations of both ideal and non-ideal surgical techniques for training purposes.
  • Skill Evaluation: Assess surgical skills by comparing actual procedures against model-generated standards.
  • Robotic Automation: Inform autonomous surgical robotic systems for real-time guidance and procedure automation.

Quantitative Performance

Metric                    Performance
L2 Reconstruction Loss    0.24501
Inference Time            ~18.7 seconds per video
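
The card does not state how these two numbers were measured. As a rough sketch only, the code below assumes that the L2 reconstruction loss is a mean squared error between corresponding values of a generated and a reference clip, and that inference time is the average wall-clock duration of one generation call. Both function names are hypothetical, and the space (pixel or latent) and normalization used for the reported loss are unknown.

```python
import time


def l2_reconstruction_loss(generated, reference):
    """Mean squared error between two equal-length sequences of floats.

    Assumption: both inputs are flattened clip values on the same scale.
    """
    if len(generated) != len(reference):
        raise ValueError("sequences must have the same length")
    if not generated:
        raise ValueError("sequences must be non-empty")
    return sum((g - r) ** 2 for g, r in zip(generated, reference)) / len(generated)


def mean_wall_time(fn, repeats=3):
    """Average wall-clock seconds taken by fn() over `repeats` runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


# Toy values standing in for flattened frames
print(l2_reconstruction_loss([0.0, 1.0], [0.0, 0.0]))
```

When timing a GPU pipeline this way, it is usual to call `torch.cuda.synchronize()` before reading the clock so that queued kernels are included in the measurement.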

Future Directions

Further improvements will focus on increasing model robustness, expanding the dataset diversity, and enhancing real-time applicability to robotic surgical scenarios.
