Spaces: jbilcke-hf/tikslop

jbilcke-hf (HF Staff) committed 29 days ago

Commit e305906

1 Parent(s): ede808f

improve prompt

Files changed (3)

PROMPT_CONTEXT.md +0 -14
WEBSOCKET_FIXES.md +0 -84
server/llm_utils.py +7 -7

PROMPT_CONTEXT.md DELETED

@@ -1,14 +0,0 @@
-GENERAL CONTEXT:
-TikSlop is an app where users can generate videos using AI. What is interesting is that both search results are generated (so there is no actual search in a DB, instead a LLM hallucinate search result items, simulation a video platform à la YouTube), but also the video streams (a video is composed of an infinite stream of a few seconds long MP4 clips, that are also generated using AI, using a fast generative model that works in nearly real-time, eg it takes 4s to generate 2s of footage).
-The architecture is simple: a Flutter frontend UI with two main view (home_screen.dart for search, video_screen.dart for the ifinite video stream player). The frontend UI talks to a Python API (see api.py) using WebSockets, as we have various real-time communication needs (chat, streaming of MP4 chunks etc). This Python API is responsible for performing the actual calls to the generative video model and the LLM model (those are external servers hosted on Hugging Face, but explaining how they work is outside the scope of this documentation).
-There is a simulator integrated, which evolves a description (video prompt) over time, using a LLM.
-Users can be anonymous, but if they connect using a Hugging Face API key, they get some extra perks.
-TASK:
-Note: For the task to be validated, running the shell command "flutter build web" must succeeed.

WEBSOCKET_FIXES.md DELETED

@@ -1,84 +0,0 @@
-# WebSocket Services Fix Guide
-This document provides guidance on how to fix the WebSocket services implementation in the codebase to resolve the compilation errors.
-## Issues Identified
-1. The mixin classes (`WebSocketChatService`, `WebSocketSearchService`, `WebSocketContentGenerationService`, `WebSocketConnectionService`) access fields and methods from the `WebSocketCoreService` base class that are not actually available through the mixin mechanism.
-2. The `ClipQueueManager` had a duplicate `activeGenerations` property.
-3. The `VideoPlaybackController` was using a private field `_activeGenerations` from `ClipQueueManager`.
-## Changes Made
-1. Fixed duplicate `activeGenerations` in `ClipQueueManager`:
-   - Renamed the int getter to `activeGenerationsCount`
-   - Added a `Set<String> get activeGenerations` getter to expose the private field
-2. Updated `printQueueState` in `ClipQueueStats` to accept dynamic type for the `activeGenerations` parameter.
-3. Fixed imports for WebSocketCoreService in all mixin files.
-4. Updated VideoPlaybackController to use the public getter for activeGenerations.
-## Remaining Issues
-The main issue is with the mixin inheritance. Mixins in Dart can only access methods and fields they declare themselves or that are available in the class they are applied to. Mixins don't have visibility into private fields of the class they're "on".
-### Option 1: Refactor to use composition instead of mixins
-Instead of using mixins, refactor to use composition:
-```dart
-class WebSocketApiService {
-  final ChatService _chatService;
-  final SearchService _searchService;
-  final ConnectionService _connectionService;
-  final ContentGenerationService _contentGenerationService;
-  WebSocketApiService() :
-    _chatService = ChatService(),
-    _searchService = SearchService(),
-    _connectionService = ConnectionService(),
-    _contentGenerationService = ContentGenerationService();
-  // Forward methods to the appropriate service
-}
-```
-### Option 2: Make private fields protected
-Make the necessary fields and methods protected (rename from `_fieldName` to `fieldName` or create protected getters/setters).
-### Option 3: Implement the WebSocketCore interface in each mixin
-Define an interface that all the mixins implement, rather than using "on WebSocketCoreService":
-```dart
-abstract class WebSocketCoreInterface {
-  bool get isConnected;
-  bool get isInMaintenance;
-  ConnectionStatus get status;
-  // Add all methods and properties needed by the mixins
-}
-class WebSocketCoreService implements WebSocketCoreInterface {
-  // Implementation
-}
-mixin WebSocketChatService implements WebSocketCoreInterface {
-  // Implementation that uses the interface methods
-}
-```
-## Steps to Fix
-1. Define a shared interface/abstract class that includes all the methods and properties needed by the mixins
-2. Update WebSocketCoreService to implement this interface
-3. Update all mixins to implement this interface rather than using "on WebSocketCoreService"
-4. In the final WebSocketApiService class, implement the interface and have it delegate to the core service
-## For Now
-As a temporary solution, a simplified version of main.dart has been created that forces the app into maintenance mode, bypassing the WebSocket initialization and connection issues.

server/llm_utils.py CHANGED

@@ -20,9 +20,9 @@ For the style, be creative, for instance you can use anything like a "documentar
 If the user ask for something specific eg "movie screencap", "movie scene", "documentary footage" "animation" as a style etc.
 Keep it minimalist but still descriptive, don't use bullets points, use simple words, go to the essential to describe style (cinematic, documentary footage, 3D rendering..), camera modes and angles, characters, age, gender, action, location, lighting, country, costume, time, weather, textures, color palette.. etc). Write about 80 words, and use between 2 and 3 sentences.
 The most import part is to describe the actions and movements in the scene, so don't forget that!
-Don't describe sound, so ever say things like "atmospheric music playing in the background".
-Instead describe the visual elements we can see in the background, be precise, (if there are anything, cars, objects, people, bricks, birds, clouds, trees, leaves or grass then say it so etc).
-Make the result unique and different from previous search results. ONLY RETURN YAML AND WITH ENGLISH CONTENT, NOT CHINESE - DO NOT ADD ANY OTHER COMMENT!
+Don't describe sound, never say things like "atmospheric music playing in the background".
+Only describe the visual elements, be precise, (if there are anything, cars, objects, people, bricks, birds, clouds, trees, leaves or grass then make sure to include it in your caption).
+Make the result unique and different from previous search results. ONLY RETURN YAML AND WITH ENGLISH CONTENT, NOT CHINESE - DO NOT ADD YOU OWN OBSERVATIONS, INTERPREATIONS OR PERSONAL COMMENT!
 # Context
 This is attempt {current_attempt}.
@@ -53,11 +53,11 @@ Instructions:
 3. Create a natural progression from previous clips
 4. Take into account user suggestions (chat messages) into the scene
 5. IMPORTANT: viewers have shared messages, consider their input in priority to guide your story, and incorporate relevant suggestions or reactions into your narrative evolution.
-6. Keep visual consistency with previous clips (in most cases you should repeat the same exact description of the location, characters etc but only change a few elements. If this is a webcam scenario, don't touch the camera orientation or focus)
+6. Keep visual consistency with previous clips (in most cases you should repeat the same exact and detailed description of the location, characters etc but only change a few elements. If this is a webcam scenario, don't touch the camera orientation or focus)
 7. Return ONLY the caption text, no additional formatting or explanation
 8. Write in English, about 200 words.
 9. Keep the visual style consistant, but content as well (repeat the style, character, locations, appearance etc..from the previous description, when it makes sense).
-10. Your caption must describe visual elements of the scene in details, including: camera angle and focus, people's appearance, age, look, costumes, clothes, the location visual characteristics and geometry, lighting, action, objects, weather, textures, lighting.
+10. Your caption must describe visual elements of the scene in extreme details, including: camera angle and focus, people's appearance, age, look, costumes, clothes, the location visual characteristics and geometry, lighting, action, objects, weather, textures, lighting.
 11. Please write in the same style as the original description, by keeping things brief etc.
 Remember to obey to what users said in the chat history!!
@@ -85,8 +85,8 @@ Instructions:
 6. Keep visual consistency with previous clips (in most cases you should repeat the same exact description of the location, characters etc but only change a few elements. If this is a webcam scenario, don't touch the camera orientation or focus)
 7. Return ONLY the caption text, no additional formatting or explanation
 8. Write in English, about 200 words.
-9. Keep the visual style consistant, but content as well (repeat the style, character, locations, appearance etc..from the previous description, when it makes sense).
-10. Your caption must describe visual elements of the scene in details, including: camera angle and focus, people's appearance, age, look, costumes, clothes, the location visual characteristics and geometry, lighting, action, objects, weather, textures, lighting.
+9. Keep the visual style consistant, descriptive, detailed, but content as well (repeat the style, character, locations, appearance etc..from the previous description, when it makes sense).
+10. Your caption must describe visual elements of the scene in extreme details, including: camera angle and focus, people's appearance, age, look, costumes, clothes, the location visual characteristics and geometry, lighting, action, objects, weather, textures, lighting.
 11. Please write in the same style as the original description, by keeping things brief etc.
 Remember to obey to what users said in the chat history!!
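
Note on the `{current_attempt}` placeholder: the context lines of the first hunk suggest that the prompt text is a template filled in at call time. The diff does not show how server/llm_utils.py performs the substitution, so the following is only a minimal sketch that assumes Python `str.format`-style templating. The names `SEARCH_PROMPT_TEMPLATE` and `build_search_prompt` are hypothetical, and the template is trimmed to the two context lines visible in the diff.

```python
# Hypothetical, trimmed-down template: only the two context lines visible in
# the diff are reproduced here; the real prompt is much longer.
SEARCH_PROMPT_TEMPLATE = (
    "# Context\n"
    "This is attempt {current_attempt}.\n"
)


def build_search_prompt(current_attempt: int) -> str:
    """Fill the {current_attempt} placeholder (hypothetical helper).

    Assumes str.format-style substitution; the actual mechanism used by
    server/llm_utils.py is not shown in the diff.
    """
    return SEARCH_PROMPT_TEMPLATE.format(current_attempt=current_attempt)


if __name__ == "__main__":
    print(build_search_prompt(2))
```

If `str.format` is indeed what is used, any literal `{` or `}` added to the prompt text would have to be doubled (`{{`, `}}`), which is worth keeping in mind when editing prompt wording as this commit does.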