Commit 1bfbe46 by soupstick
Parent(s): 8052866
Add Qwen2-VL Amazon listing generator files
- README.md +71 -12
- app.py +197 -0
- requirements.txt +10 -0
README.md
CHANGED
@@ -1,14 +1,73 @@
-
-
-
-
-
-
-
-
-
-
-
+# Qwen2-VL Amazon Listing Generator (LoRA)
+
+This Hugging Face Space showcases a **fine-tuned Qwen2-VL-7B model with LoRA adapter** trained to generate **Amazon-style product listings** from product images.
+
+## Features
+
+- **Vision-Language Model**: Qwen2-VL-7B-Instruct with custom LoRA adapter
+- **Amazon Listing Generation**: Creates structured product listings with:
+  - Product title
+  - Bullet points (key features)
+  - Product description
+  - Keywords
+  - Product category
+- **CPU Optimized**: Runs on free CPU hardware (may take 1-2 minutes per generation)
+
+## Model Details
+
+- **Base Model**: [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)
+- **LoRA Adapter**: [soupstick/qwen2vl-amazon-ft-lora](https://huggingface.co/soupstick/qwen2vl-amazon-ft-lora)
+- **Fine-tuning**: Specialized for e-commerce product listing generation
+
+## How to Use
+
+1. **Upload Image**: Click on the image upload area and select a product photo
+2. **Optional Prompt**: Modify the instruction if needed (default works well)
+3. **Generate**: Click "Generate Listing" and wait for results
+4. **Review Output**: Get structured Amazon-style listing in JSON format
+
+## Expected Output Format
+
+```json
+{
+  "title": "Product Title Here",
+  "bullet_points": [
+    "• Key feature 1",
+    "• Key feature 2",
+    "• Key feature 3"
+  ],
+  "description": "Detailed product description...",
+  "keywords": "relevant, product, keywords",
+  "category": "Product > Category > Subcategory"
+}
+```
+
+## ⚡ Performance Notes
+
+- **CPU Mode**: This demo runs on CPU hardware for free access
+- **Processing Time**: 1-2 minutes per generation due to CPU limitations
+- **Image Size**: Automatically resized to 512px for optimal performance
+- **Memory Optimized**: Uses float32 and low memory settings
+
+## Links
+
+- [Model Repository](https://huggingface.co/soupstick/qwen2vl-amazon-ft-lora)
+- [Base Model](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)
+- [Qwen2-VL Paper](https://arxiv.org/abs/2409.12191)
+
+## ⚠️ Limitations
+
+- **Demo Purpose**: This is a prototype for concept demonstration
+- **Accuracy**: Results depend on training data quality and model size
+- **Speed**: CPU inference is slower than GPU (upgrade hardware for faster results)
+- **Languages**: Primarily trained on English product descriptions
+
+## 🛠️ Technical Stack
+
+- **Framework**: Transformers, PEFT (LoRA), Gradio
+- **Model**: Qwen2-VL-7B with custom LoRA adapter on Unsloth-AI
+- **Hardware**: CPU-optimized for Hugging Face Spaces free tier
+
 ---
 
-
+*Built with ❤️ using Hugging Face Spaces*
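The README's performance notes say the model is loaded in float32 on CPU. As a rough sketch of what that implies for memory, the arithmetic below counts weights only and assumes a nominal 7 billion parameters (read off the "7B" in the model name; the real parameter count and runtime overhead will differ). `PARAMS`, `BYTES_PER_PARAM`, and `weight_gib` are illustrative names and are not part of this commit.

```python
# Rough weight-memory estimate. PARAMS is a nominal figure taken from the
# "7B" in the model name, not a measured parameter count (assumption).
PARAMS = 7_000_000_000

# Storage size per parameter for a few common dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_gib(params: int, dtype: str) -> float:
    """Return the approximate size of the weights in GiB for the given dtype."""
    return params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>8}: {weight_gib(PARAMS, dtype):.1f} GiB")
```

Under these assumptions the float32 weights alone come to roughly 26 GiB, before activations or the LoRA adapter, so the figure is worth comparing against the RAM of whichever Space hardware is selected.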
app.py
ADDED
@@ -0,0 +1,197 @@
+import gradio as gr
+import torch
+import json
+from PIL import Image
+from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
+from peft import PeftModel
+import warnings
+warnings.filterwarnings("ignore")
+
+# Model configuration
+BASE_MODEL = "Qwen/Qwen2-VL-7B-Instruct"
+ADAPTER = "soupstick/qwen2vl-amazon-ft-lora"
+
+# Global variables for lazy loading
+model = None
+processor = None
+
+def load_model():
+    """Load model and processor with CPU optimization"""
+    global model, processor
+
+    if model is None:
+        print("⏳ Loading model (CPU mode)...")
+        try:
+            # Force CPU usage and optimize for memory
+            model = Qwen2VLForConditionalGeneration.from_pretrained(
+                BASE_MODEL,
+                device_map="cpu",
+                torch_dtype=torch.float32,  # Use float32 for CPU
+                trust_remote_code=True,
+                low_cpu_mem_usage=True,
+                use_cache=True
+            )
+
+            # Load LoRA adapter
+            print("⏳ Loading LoRA adapter...")
+            model = PeftModel.from_pretrained(model, ADAPTER)
+
+            # Load processor
+            processor = AutoProcessor.from_pretrained(
+                BASE_MODEL,
+                trust_remote_code=True
+            )
+
+            print("✅ Model loaded successfully!")
+
+        except Exception as e:
+            print(f"Error loading model: {e}")
+            return False
+
+    return True
+
+def generate_listing(image, prompt="Generate Amazon listing."):
+    """Generate Amazon listing from image"""
+
+    if image is None:
+        return "⚠️ Please upload an image."
+
+    # Load model if not already loaded
+    if not load_model():
+        return "Error: Could not load model. Please try again."
+
+    try:
+        # Resize image to reduce memory usage
+        if image.size[0] > 512 or image.size[1] > 512:
+            image.thumbnail((512, 512), Image.Resampling.LANCZOS)
+
+        # Prepare chat messages
+        messages = [{
+            "role": "user",
+            "content": [
+                {"type": "image", "image": image},
+                {"type": "text", "text": prompt}
+            ],
+        }]
+
+        # Apply chat template
+        text = processor.apply_chat_template(
+            messages,
+            tokenize=False,
+            add_generation_prompt=True
+        )
+
+        # Process inputs
+        inputs = processor(
+            text=text,
+            images=image,
+            return_tensors="pt"
+        )
+
+        # Generate with conservative settings for CPU
+        print("⏳ Generating listing...")
+        with torch.no_grad():
+            generated_ids = model.generate(
+                **inputs,
+                max_new_tokens=256,  # Reduced for CPU
+                do_sample=True,
+                temperature=0.7,
+                top_p=0.8,
+                pad_token_id=processor.tokenizer.eos_token_id
+            )
+
+        # Decode output
+        generated_ids_trimmed = [
+            out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
+        ]
+
+        output_text = processor.batch_decode(
+            generated_ids_trimmed,
+            skip_special_tokens=True,
+            clean_up_tokenization_spaces=False
+        )[0]
+
+        return output_text
+
+    except Exception as e:
+        return f"Error generating listing: {str(e)}"
+
+def format_example_output():
+    """Show example of expected output format"""
+    example = {
+        "title": "Premium Wireless Bluetooth Headphones with Noise Cancellation",
+        "bullet_points": [
+            "• Advanced noise cancellation technology for immersive audio experience",
+            "• 30-hour battery life with quick charge feature",
+            "• Premium comfort design with soft ear cushions",
+            "• Universal compatibility with all Bluetooth devices",
+            "• Built-in microphone for crystal clear calls"
+        ],
+        "description": "Experience premium audio quality with these advanced wireless headphones...",
+        "keywords": "wireless headphones, bluetooth, noise cancelling, premium audio",
+        "category": "Electronics > Audio > Headphones"
+    }
+    return json.dumps(example, indent=2)
+
+# Gradio Interface
+with gr.Blocks(theme=gr.themes.Soft(), title="Amazon Listing Generator") as demo:
+    gr.Markdown("""
+    # Qwen2-VL Amazon Listing Generator (LoRA)
+
+    Upload a product image and generate an Amazon-style listing with title, bullet points, description, keywords, and category.
+
+    **Model**: [soupstick/qwen2vl-amazon-ft-lora](https://huggingface.co/soupstick/qwen2vl-amazon-ft-lora) (Qwen2-VL-7B + LoRA)
+    """)
+
+    with gr.Row():
+        with gr.Column():
+            image_input = gr.Image(
+                type="pil",
+                label="Upload Product Image",
+                height=300
+            )
+            prompt_input = gr.Textbox(
+                label="Instruction (Optional)",
+                value="Generate Amazon listing.",
+                placeholder="Enter custom instruction or use default",
+                lines=2
+            )
+            generate_btn = gr.Button(
+                "Generate Listing",
+                variant="primary",
+                size="lg"
+            )
+
+        with gr.Column():
+            output_text = gr.Textbox(
+                label="Generated Listing",
+                lines=15,
+                placeholder="Upload an image and click 'Generate Listing' to see results..."
+            )
+
+    # Example section
+    with gr.Accordion("Expected Output Format", open=False):
+        gr.Code(
+            format_example_output(),
+            language="json",
+            label="Example JSON Structure"
+        )
+
+    # Event handler
+    generate_btn.click(
+        fn=generate_listing,
+        inputs=[image_input, prompt_input],
+        outputs=output_text
+    )
+
+    # Footer
+    gr.Markdown("""
+    ---
+    **⚠️ Note**: This demo runs on CPU which may take 1-2 minutes per generation.
+    For faster inference, consider upgrading to GPU hardware.
+
+    **Links**: [Model Card](https://huggingface.co/soupstick/qwen2vl-amazon-ft-lora) | [Base Model](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)
+    """)
+
+if __name__ == "__main__":
+    demo.queue().launch()
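The app returns the decoded model text as-is, while the README describes a JSON object with five keys. A minimal sketch of how a caller might pull that object out of free-form output and check its shape is given below. `EXPECTED_KEYS` and `extract_listing` are hypothetical helpers, not part of this commit, and the brace-matching heuristic (first `{` to last `}`) is an assumption about how the model frames its answer.

```python
import json

# Keys taken from the README's "Expected Output Format" section.
EXPECTED_KEYS = ("title", "bullet_points", "description", "keywords", "category")


def extract_listing(output_text: str):
    """Return the listing dict found in output_text, or None.

    Looks for the span from the first '{' to the last '}', parses it as
    JSON, and accepts it only if it is an object with every expected key.
    """
    start = output_text.find("{")
    end = output_text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(output_text[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    if any(key not in data for key in EXPECTED_KEYS):
        return None
    return data


if __name__ == "__main__":
    sample = (
        'Here is the listing:\n'
        '{"title": "T", "bullet_points": ["a"], "description": "d", '
        '"keywords": "k", "category": "c"}'
    )
    print(extract_listing(sample))
```

Returning `None` rather than raising keeps the check easy to call from a UI handler, which could then fall back to showing the raw text.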
requirements.txt
ADDED
@@ -0,0 +1,10 @@
+transformers>=4.44.0
+peft>=0.10.0
+accelerate>=0.24.0
+gradio>=4.0.0
+torch>=2.0.0
+torchvision>=0.15.0
+Pillow>=9.0.0
+numpy>=1.21.0
+requests>=2.25.0
+huggingface-hub>=0.17.0