Spaces: alexandrospopov/scene_rewind

Alexandros Popov committed on Aug 15

Commit

23afe01

1 Parent(s): b1d5d84

changed git histories.

Files changed (7)

.gitignore +14 -0
.python-version +1 -0
main.py +328 -0
pyproject.toml +20 -0
utils.py +5 -0
uv.lock +0 -0
visual_search_agent.py +120 -0

.gitignore ADDED

	@@ -0,0 +1,14 @@

+# Python-generated files
+__pycache__/
+*.py[oc]
+build/
+dist/
+wheels/
+*.egg-info
+# Virtual environments
+.venv
+.env
+*jpg
+todo
+*png

.python-version ADDED

	@@ -0,0 +1 @@


+3.10

main.py ADDED

	@@ -0,0 +1,328 @@

+import os
+import imageio
+import replicate
+import gradio as gr
+from dotenv import load_dotenv
+from langfuse import Langfuse, observe, get_client
+import numpy as np
+import openai
+import json
+from pydantic import BaseModel, conint
+from utils import encode_image
+import asyncio
+import logfire
+from agents import (
+    Agent,
+    ItemHelpers,
+    Runner,
+    TResponseInputItem,
+    WebSearchTool,
+    function_tool,
+    trace,
+)
+import tempfile
+from functools import partial, update_wrapper
+from visual_search_agent import seach_google_for_images
+load_dotenv()
+class ImageEvaluation(BaseModel):
+    score: conint(ge=1, le=10)
+    feedback: str
+class HistoricalGrounding(BaseModel):
+    general_description: str
+    building_architecture: str
+    roads: str
+    transportation: str
+    people: str
+    people_clothing: str
+image_description = """
+This image shows a stunning coastal view of a Mediterranean city, likely Nice on the French Riviera. The scene features:
+A wide, curving bay with striking turquoise and deep blue waters transitioning beautifully from the shoreline outward.
+A long pebble beach with scattered people relaxing, sunbathing, and strolling near the surf.
+Palm-lined promenades running parallel to the shore, adding a tropical feel.
+A wide coastal road with smooth curves following the bay, flanked by pedestrian and cycling lanes.
+A cityscape in the background, with mid-rise buildings painted in warm tones (yellows, reds, and creams), stretching up into the hills.
+Clear blue skies that suggest warm, sunny weather.
+It’s a postcard-perfect scene capturing both the relaxing beach atmosphere and the vibrant urban energy of the French Riviera."""
+historic_description = """
+In the 2nd century AD, the crescent-shaped bay that today we know as the Baie des Anges would have looked remarkably similar in outline—its wide sweep of pebbly shore backed by clear, turquoise waters—yet almost entirely undeveloped by permanent seaside structures. Instead of hotels and promenades, the littoral zone would have been bordered by low cliffs and patches of maquis scrub, with fishermen’s shingle huts and simple wooden docks marking the only human presence along the shore.
+Along the coast, the principal artery was the Via Julia Augusta, a paved Roman road laid out in the late 1st century BC to link Italia to Hispania. It hugged the contours of the bay, its broad stone slabs worn smooth by the passage of carts and marching legions. The road would have run just above the high-tide line, with occasional way-stations (mansiones) where travelers could rest and change horses ([fr.wikipedia.org](https://fr.wikipedia.org/wiki/Cemenelum?utm_source=chatgpt.com)).
+Instead of the elegant, palm-lined Promenade des Anglais, one would have seen groves of olive and fig trees and stands of umbrella pines and cypresses—common Mediterranean species carefully tended for their oil, fruit, and timber. Beyond these, terraced plots of vineyards stretched partway up the hills, their produce carried down to small coastal landing places in flat-bottomed caiques (fishing boats) moored in shallow coves ([en.wikipedia.org](https://en.wikipedia.org/wiki/Nice?utm_source=chatgpt.com)).
+Perched on the heights above the bay lay two distinct settlements. On the immediate seafront was Nikaia (Νίκαια), the older Greek foundation of c. 350 BC, which by the 2nd century AD had become a modest fishing and trading port, its stone quay dotted with amphorae-laden ships from Massalia (modern Marseille) or Sardinia. A small temple of Artemis likely crowned its acropolis, overlooking the masts and nets of local mariners ([en.wikipedia.org](https://en.wikipedia.org/wiki/Nice?utm_source=chatgpt.com)).
+Further inland, on the hill now known as Cimiez, stood Cemenelum—the Roman civitas capital of the Alpes Maritimae province. Founded in 14 BC by Augustus as a military and administrative center, by the 2nd century it boasted a forum, elaborate public baths, and an amphitheater seating some 5,000 spectators ([intltravelnews.com](https://www.intltravelnews.com/2019/cemenelum-roman-city-french-riviera.html?utm_source=chatgpt.com), [bestofniceblog.com](https://www.bestofniceblog.com/what-to-see-in-nice/history-and-science-museums/roman-baths-ruins-cimiez-archeology-mueum/?utm_source=chatgpt.com)). From the beach one would see the white limestone walls of the bath complex rising among olive trees, steam curling from the caldarium’s hypocausts, while the rounded arches of the arena peered above nearby rooftops ([historytools.org](https://www.historytools.org/stories/journey-to-the-ancient-roman-heart-of-the-french-riviera-the-nice-cimiez-archaeological-museum?utm_source=chatgpt.com)).
+Daily life on the beach would have contrasted sharply with today’s sunbathers and tourists. Instead, local Ligurian-Roman families in woolen tunics and leather sandals might gather at the water’s edge to wash wool or salt fish, while merchants in linen garments bartered amphorae of olive oil and imported wine on the shore. Slender fishermen’s boats plied the calm waters at dawn, their crews casting trammel nets into the blue-green depths.
+Above all, the scene would have been one of a working Riviera: a blend of agricultural terraces, coastal trade, and provincial administration set within the timeless curve of sand and sea—far removed from the 21st-century bustle, yet already a crossroads of Mediterranean cultures under the Pax Romana.
+"""
+logfire.configure(
+    service_name='my_agent_service',
+    send_to_logfire=False,
+)
+logfire.instrument_openai_agents()
+langfuse = get_client()
+twoBC_prompt = """Imagine what this image would look like in the 2nd century BC."""
+client = openai.OpenAI()  # uses OPENAI_API_KEY
+JUDGE_PROMPT = """
+You are a historical critic.
+You are provided with the description of a scene, a location and a year.
+Your job is to judge how plausibly the items described belong to that place in that era.
+You must penalize items that are out of their time.
+Do not appraise the framing, the camera position or the camera technology.
+You must rate this "truthfulness" on a scale of 1 to 10
+and precisely point out the items that are out of their time.
+"""
+@observe(name="image_captionning", capture_input=False, as_type="generation")
+def image_caption(image_path):
+    response = client.responses.create(
+        model="o4-mini-2025-04-16",
+        input=[{
+            "role": "user",
+            "content": [
+                {"type": "input_text", "text": "Describe this image, focusing on the human-made items in the picture: buildings, roads, clothes, etc."},
+                {
+                    "type": "input_image",
+                    "image_url": f"data:image/jpeg;base64,{encode_image(image_path)}",
+                },
+            ],
+        }],
+    )
+    return response.output_text
+@observe(name="llm_judge", capture_input=False, as_type="generation")  # creates a generation observation; input capture is disabled
+def judge_answer(image_description, location, year):
+    response = client.responses.parse(
+        model="gpt-4o-mini",
+        input=[
+            {
+            "role": "system",
+            "content": [
+                {"type": "input_text", "text": JUDGE_PROMPT},
+            ],
+            },
+            {
+            "role": "user",
+            "content": [
+                {"type": "input_text", "text": f" image description : {image_description} . location : {location} . year : {year}"}
+            ],
+        }],
+        text_format=ImageEvaluation
+    )
+    return json.loads(response.output_text)
+@observe(name="image-generation", as_type="generation")
+def generate_image(picture_design, input_image, working_directory):
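+    """
+    Calls the Replicate API to generate an image based on the input image.
+    Args:
+        picture_design (str): Text describing the changes to apply to the image.
+        input_image (str | np.ndarray): Path to the input image, or the image as a numpy array.
+        working_directory (str): Directory where the generated image is written.
+    Returns:
+        str: Path to the generated image.
+    """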
+    # Gradio provides the image as a numpy array, but the replicate library expects a file path
+    # So we save the numpy array as a temporary image file
+    prompt = f"""
+    You are an expert photoshop user.
+    You are given a photo and you must transform it as to what it would have looked like at a certain time period.
+    You must apply the changes described in: {picture_design}
+    """
+    if isinstance(input_image, np.ndarray):
+        temp_image_path = "temp_input_image.png"
+        imageio.imwrite(temp_image_path, input_image)
+        input_image = temp_image_path
+    with open(input_image, "rb") as image_file:
+        output = replicate.run(
+            "black-forest-labs/flux-kontext-pro",
+            input={
+                "prompt": prompt,
+                "input_image": image_file,
+                "aspect_ratio": "match_input_image",
+                "output_format": "jpg",
+                "safety_tolerance": 2,
+                "prompt_upsampling": False
+            }
+        )
+    num_images = len(os.listdir(working_directory))
+    output_image_path = os.path.join(working_directory, f"output_{num_images}.jpg")
+    print(f"Writing image in {output_image_path}")
+    with open(output_image_path, "wb") as f:
+        for chunk in output:
+            f.write(chunk)
+    return output_image_path
+def create_rewind(image, text, date):
+    """
+    Processes the inputs from Gradio and generates an image and text.
+    """
+    prompt = f"{text} The scene is captured in the year {date}."
+    # generate_image needs the source image and a directory for its outputs
+    working_directory = tempfile.mkdtemp("_scene_rewind")
+    generated_image_path = generate_image(prompt, image, working_directory)
+    output_text = f"This is the scene as it might have appeared in the year {date}."
+    return generated_image_path, output_text
+@observe(name="historical_grounding", as_type="generation")
+def get_historical_grounding(image_description, location, year):
+    instructions = """You are a historian. You are given the description of an image, a location and a time period.
+    You must reflect on what the scenery would look like at that period.
+    The nature of the scenery must remain unchanged: a seaside scenery remains a seaside scenery, a town center must remain a town center.
+    Be as historically accurate as possible about the items present in the image. Use the tools provided to search for images and look up information on the internet.
+    For the visual description of the items, you can use the tools you are provided as well.
+    """
+    response = client.responses.parse(
+        model="gpt-5-mini-2025-08-07",
+        input=[
+            {"role": "system", "content": instructions},
+            {
+                "role": "user",
+                "content": f"image description: {image_description}, location: {location}, year: {year}",
+            },
+        ],
+        text_format=HistoricalGrounding,
+        tools=[{"type": "web_search_preview"}],
+    )
+    return response.output_text
+def define_visual_cues_agent():
+    visual_cues_agent = Agent(
+        name="Visual Search Agent",
+        instructions="""You search for visual cues to illustrate specific items within a specific time period.
+    You provide a precise description of those items.
+    You can use the tools provided to search for specific images and look up information on the internet.""",
+        model="gpt-5-mini-2025-08-07",
+        tools=[seach_google_for_images, WebSearchTool()],
+    )
+    return visual_cues_agent
+def define_picture_designer_agent(image_path, working_directory):
+    picture_designer_agent = Agent(
+        name="Picture designer agent",
+        instructions="""
+        You are a "picture designer": you produce a text that will be used by an image generation tool.
+        You receive as input the description of an image and a historical grounding of what that scene would look like at a certain period of time.
+        Your goal is to modify the items visible in the image so that it is plausible that the image was taken in that period.
+        For instance, if the image is a picture of the Eiffel Tower in Paris and the period is the 3rd century BC, obviously the Eiffel Tower should be replaced by something else.
+        To help you in your task, you have access to the historical visual cue helper: this tool will provide you with precise descriptions of specific items, so that you can precisely describe what to generate.
+        This text is to be interpreted by the image generation tool Replicate.
+        Some items in the description might be flagged as violent (butcher's cleaver), sexual (prostitutes), or unsanitary (waste).
+        Eliminate any items that might not comply with the Replicate policy.
+        """,
+        model="o4-mini-2025-04-16",
+        tools=[WebSearchTool()],
+        handoffs=[define_visual_cues_agent()],
+    )
+    return picture_designer_agent
+async def main(image_path, location, year) -> None:
+    working_directory = tempfile.mkdtemp("_scene_rewind")
+    print(f"Working in {working_directory}")
+    with trace("LLM as a judge"):
+        picture_description = image_caption(image_path)
+        historical_grounding = get_historical_grounding(picture_description, location, year)
+        picture_designer_agent = define_picture_designer_agent(image_path, working_directory)
+        input_items: list[TResponseInputItem] = [
+            {
+                "content": f"Description:{picture_description}\n Historical grounding: {historical_grounding}",
+                "role": "user"
+            }]
+        latest_outline: str | None = None
+        while True:
+            picture_design = await Runner.run(
+                picture_designer_agent,
+                input_items,
+            )
+            input_items = picture_design.to_input_list()
+            latest_outline = ItemHelpers.text_message_outputs(picture_design.new_items)
+            print("Story outline generated")
+            try:
+                output_path = generate_image(latest_outline, image_path, working_directory)
+                created_image_caption = image_caption(output_path)
+                judgment = judge_answer(created_image_caption, location, year)
+            except Exception:
+                judgment = {
+                    "score": 0,
+                    "feedback": "The image could not be produced as the content of the prompt has been flagged as sensitive by Replicate"
+                }
+            print(f"Evaluator score: {judgment['score']}")
+            if judgment["score"] > 6:
+                print("Story outline is good enough, exiting.")
+                break
+            print("Re-running with feedback")
+            input_items.append({"content": f"Feedback: {judgment['feedback']}", "role": "user"})
+    print(f"Final story outline: {latest_outline}")
+if __name__ == "__main__":
+    image_path = "images/paris.png"
+    asyncio.run(main(image_path, "Paris", 1700))
+    # iface = gr.Interface(
+    #     fn=create_rewind,
+    #     inputs=[
+    #         gr.Image(type="numpy", label="Input Image"),
+    #         gr.Textbox(label="Prompt"),
+    #         gr.Slider(minimum=-2000, maximum=2000, value=1900, label="Year")
+    #     ],
+    #     outputs=[
+    #         gr.Image(type="filepath", label="Generated Image"),
+    #         gr.Textbox(label="Generated Text")
+    #     ],
+    #     title="Scene Rewind",
+    #     description="Upload an image, provide a prompt, and select a year to travel back in time!"
+    # )
+    # iface.launch()
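
Review note on the `while True` loop in `main`: it only exits once the judge returns a score above 6, so a run that never reaches that score keeps calling the paid APIs. The following is a minimal sketch of a bounded variant, not the author's implementation. The names `refine_until_accepted`, `propose`, `evaluate` and `max_attempts` are hypothetical; `propose` stands in for the designer agent call and `evaluate` for the generate, caption and judge steps.

```python
from typing import Callable, Optional


def refine_until_accepted(
    propose: Callable[[Optional[str]], str],
    evaluate: Callable[[str], dict],
    threshold: int = 6,
    max_attempts: int = 5,
) -> tuple[str, dict, int]:
    """Run a propose/evaluate loop until the score exceeds `threshold`
    or `max_attempts` is reached. Returns (candidate, judgment, attempts)."""
    feedback: Optional[str] = None
    best: Optional[tuple[str, dict]] = None
    for attempt in range(1, max_attempts + 1):
        candidate = propose(feedback)
        try:
            judgment = evaluate(candidate)
        except Exception as exc:
            # Keep the real error in the feedback instead of assuming a cause.
            judgment = {"score": 0, "feedback": f"Evaluation failed: {exc!r}"}
        # Remember the best-scoring candidate seen so far.
        if best is None or judgment["score"] > best[1]["score"]:
            best = (candidate, judgment)
        if judgment["score"] > threshold:
            return candidate, judgment, attempt
        feedback = judgment["feedback"]
    # Budget exhausted: fall back to the best candidate.
    return best[0], best[1], max_attempts
```

With this shape the caller always gets a result back after at most `max_attempts` rounds, and can decide whether a below-threshold best candidate is still worth showing.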

pyproject.toml ADDED

	@@ -0,0 +1,20 @@

+[project]
+name = "scene-rewind"
+version = "0.1.0"
+description = "Add your description here"
+readme = "README.md"
+requires-python = ">=3.10"
+dependencies = [
+    "dotenv>=0.9.9",
+    "google-search-results>=2.4.2",
+    "gradio>=5.42.0",
+    "imageio>=2.37.0",
+    "langfuse>=3.2.3",
+    "nest-asyncio>=1.6.0",
+    "openai>=1.99.6",
+    "openai-agents>=0.2.5",
+    "pydantic>=2.11.7",
+    "pydantic-ai[logfire]>=0.6.2",
+    "replicate>=1.0.7",
+    "serpapi>=0.1.5",
+]

utils.py ADDED

	@@ -0,0 +1,5 @@

+import base64
+def encode_image(image_path):
+    with open(image_path, "rb") as image_file:
+        return base64.b64encode(image_file.read()).decode("utf-8")
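
Review note: `image_caption` in main.py wraps the output of `encode_image` in a hard-coded `data:image/jpeg` URL, while the sample input is `images/paris.png`. One possible way to keep the declared MIME type in line with the file is sketched below using only the standard library. The helper name `encode_image_as_data_url` and its `default_mime` parameter are hypothetical and not part of this commit.

```python
import base64
import mimetypes


def encode_image_as_data_url(image_path: str, default_mime: str = "image/jpeg") -> str:
    """Return the file at `image_path` as a base64 data URL.

    The MIME type is guessed from the file extension; `default_mime`
    is used when the extension is unknown or not an image type.
    """
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        mime_type = default_mime
    with open(image_path, "rb") as image_file:
        payload = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{payload}"
```

The guess is based on the extension only, so a mislabeled file would still get the wrong type; inspecting the file header would be a stricter alternative.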

uv.lock ADDED

The diff for this file is too large to render. See raw diff

visual_search_agent.py ADDED

	@@ -0,0 +1,120 @@

+from serpapi.google_search import GoogleSearch
+from dotenv import load_dotenv
+import os
+from openai import OpenAI
+import asyncio
+import requests
+from agents import enable_verbose_stdout_logging
+enable_verbose_stdout_logging()
+from agents import (
+    Agent,
+    function_tool,
+    Runner,
+    WebSearchTool,
+)
+load_dotenv()
+SERP_API_KEY = os.getenv("SERP_API_KEY")
+openai_client = OpenAI()
+def describe_all_images(images_results):
+    descriptions = []
+    for image in images_results:
+        description = describe_thumbnail(image["thumbnail"])
+        descriptions.append({
+            "title": image["title"],
+            "link": image["link"],
+            "description": description
+        })
+        # Only the first result is described; remove this break to describe every result.
+        break
+    return descriptions
+def describe_thumbnail(image_url):
+    response = openai_client.responses.create(
+        model="gpt-4.1",
+        input=[{
+            "role": "user",
+            "content": [
+                {"type": "input_text", "text": "what's in this image? To what era/year does it belong?"},
+                {
+                    "type": "input_image",
+                    "image_url": image_url,
+                },
+            ],
+        }],
+    )
+    return response.output_text + "\n"
+def search_google(item_to_find: str):
+    params = {
+        "engine": "google_images_light",
+        "q": item_to_find,
+        "api_key": SERP_API_KEY,
+    }
+    search = GoogleSearch(params)
+    raw_results = search.get_dict()["images_results"]
+    # Keep only the fields used downstream by describe_all_images
+    images_results = [
+        {
+            "thumbnail": result["thumbnail"],
+            "link": result["link"],
+            "title": result["title"]
+        }
+        for result in raw_results
+    ]
+    return images_results
+@function_tool
+def seach_google_for_images(item_to_find: str) -> list:
+    """
+    Search Google Images for a given item and return a list of image results, with links and descriptions.
+    Args:
+        item_to_find (str): The item or query to search for in Google Images.
+    Returns:
+        list: A list of dictionaries, each containing the image title, link, and a description generated by the image analysis.
+    """
+    raw_results = search_google(item_to_find)
+    results_w_description = describe_all_images(raw_results)
+    return results_w_description
+@function_tool
+def wikipedia_lookup(query: str) -> str:
+    """
+    Look up a query on Wikipedia and return the summary extract of the most relevant page.
+    Args:
+        query (str): The search term to look up on Wikipedia.
+    Returns:
+        str: The summary extract of the most relevant Wikipedia page, or a message if no page is found.
+    """
+    # Step 1: Search
+    search_url = "https://en.wikipedia.org/w/rest.php/v1/search/title"
+    params = {"q": query, "limit": 1}
+    search_resp = requests.get(search_url, params=params, timeout=10)
+    search_resp.raise_for_status()
+    search_data = search_resp.json()
+    if not search_data.get("pages"):
+        return f"No Wikipedia page found for '{query}'"
+    page_key = search_data["pages"][0]["key"]
+    # Step 2: Fetch summary
+    summary_url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{page_key}"
+    summary_resp = requests.get(summary_url, timeout=10)
+    summary_resp.raise_for_status()
+    return summary_resp.json().get("extract")
+# if __name__ == "__main__":
+#     item_to_find = "mediterranean houses in the 16th century"
+#     results = search_google(item_to_find)
+#     desc = describe_all_images(results)
+#     print(desc[0])
+#     # print(wikipedia_lookup("United States Constitution"))
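
Review note: `search_google` indexes `thumbnail`, `link` and `title` directly on every result, and `describe_all_images` stops after the first one via `break`. A minimal sketch of a helper that makes both choices explicit is shown below. It assumes the results are plain dictionaries with the same three keys used in this file; the function name `trim_image_results` and the `limit` parameter are hypothetical.

```python
from typing import Any


def trim_image_results(raw_results: list[dict[str, Any]], limit: int = 1) -> list[dict[str, Any]]:
    """Keep only thumbnail, link and title for at most `limit` results.

    Entries without a thumbnail are skipped, since there would be
    nothing to describe for them.
    """
    trimmed: list[dict[str, Any]] = []
    for result in raw_results:
        if len(trimmed) >= limit:
            break
        thumbnail = result.get("thumbnail")
        if not thumbnail:
            continue
        trimmed.append(
            {
                "thumbnail": thumbnail,
                "link": result.get("link"),
                "title": result.get("title", ""),
            }
        )
    return trimmed
```

Passing a larger `limit` would describe more images per query, at the cost of one vision-model call per extra thumbnail.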