Commit e305906 · jbilcke-hf (HF Staff) · 1 Parent(s): ede808f

improve prompt

Files changed (3)
  1. PROMPT_CONTEXT.md +0 -14
  2. WEBSOCKET_FIXES.md +0 -84
  3. server/llm_utils.py +7 -7
PROMPT_CONTEXT.md DELETED
@@ -1,14 +0,0 @@
- GENERAL CONTEXT:
-
- TikSlop is an app where users can generate videos using AI. Notably, both the search results and the video streams are generated: there is no actual search in a DB; instead, an LLM hallucinates search result items, simulating a video platform à la YouTube. A video is composed of an infinite stream of MP4 clips a few seconds long, also generated by AI, using a fast generative model that works in nearly real time (e.g. it takes 4s to generate 2s of footage).
-
- The architecture is simple: a Flutter frontend UI with two main views (home_screen.dart for search, video_screen.dart for the infinite video stream player). The frontend UI talks to a Python API (see api.py) using WebSockets, as we have various real-time communication needs (chat, streaming of MP4 chunks, etc.). This Python API is responsible for performing the actual calls to the generative video model and the LLM (those are external servers hosted on Hugging Face, but explaining how they work is outside the scope of this documentation).
-
- There is an integrated simulator, which evolves a description (video prompt) over time using an LLM.
-
- Users can be anonymous, but if they connect using a Hugging Face API key, they get some extra perks.
-
- TASK:
-
-
- Note: For the task to be validated, running the shell command "flutter build web" must succeed.
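The deleted context file quotes a generation rate of about 4s of compute per 2s of footage. As a rough illustration of what that ratio implies for an uninterrupted stream, the sketch below computes how many clips would need to be generated concurrently. The function name and the steady-state model (every job takes the same time, no network or queueing overhead) are assumptions made for this example and are not taken from the repository.

```python
import math


def min_parallel_workers(gen_seconds: float, clip_seconds: float) -> int:
    """Smallest number of concurrent generation jobs that keeps up with playback.

    Assumes a steady state in which every job takes `gen_seconds` to produce
    `clip_seconds` of footage; network and queueing overhead are ignored.
    """
    if gen_seconds <= 0 or clip_seconds <= 0:
        raise ValueError("durations must be positive")
    # n workers yield n * clip_seconds of footage every gen_seconds, so
    # playback keeps up once n * clip_seconds >= gen_seconds.
    return math.ceil(gen_seconds / clip_seconds)


# Figures quoted in the deleted PROMPT_CONTEXT.md: 4s to generate 2s of footage.
print(min_parallel_workers(gen_seconds=4.0, clip_seconds=2.0))  # prints 2
```

Under these assumptions, two concurrent generations would be the minimum; a real deployment would likely want headroom beyond that.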
WEBSOCKET_FIXES.md DELETED
@@ -1,84 +0,0 @@
- # WebSocket Services Fix Guide
-
- This document provides guidance on how to fix the WebSocket services implementation in the codebase to resolve the compilation errors.
-
- ## Issues Identified
-
- 1. The mixin classes (`WebSocketChatService`, `WebSocketSearchService`, `WebSocketContentGenerationService`, `WebSocketConnectionService`) access fields and methods from the `WebSocketCoreService` base class that are not actually available through the mixin mechanism.
-
- 2. The `ClipQueueManager` had a duplicate `activeGenerations` property.
-
- 3. The `VideoPlaybackController` was using a private field `_activeGenerations` from `ClipQueueManager`.
-
- ## Changes Made
-
- 1. Fixed duplicate `activeGenerations` in `ClipQueueManager`:
-    - Renamed the int getter to `activeGenerationsCount`
-    - Added a `Set<String> get activeGenerations` getter to expose the private field
-
- 2. Updated `printQueueState` in `ClipQueueStats` to accept a dynamic type for the `activeGenerations` parameter.
-
- 3. Fixed imports for `WebSocketCoreService` in all mixin files.
-
- 4. Updated `VideoPlaybackController` to use the public getter for `activeGenerations`.
-
- ## Remaining Issues
-
- The main issue is with the mixin inheritance. Mixins in Dart can only access methods and fields they declare themselves or that are declared by the type in their `on` clause. Because privacy in Dart is scoped to the library, a mixin defined in a separate library (by default, a separate file) cannot see the private (`_`-prefixed) members of the class it is declared `on`.
-
- ### Option 1: Refactor to use composition instead of mixins
-
- Instead of using mixins, refactor to use composition:
-
- ```dart
- class WebSocketApiService {
-   final ChatService _chatService;
-   final SearchService _searchService;
-   final ConnectionService _connectionService;
-   final ContentGenerationService _contentGenerationService;
-
-   WebSocketApiService() :
-     _chatService = ChatService(),
-     _searchService = SearchService(),
-     _connectionService = ConnectionService(),
-     _contentGenerationService = ContentGenerationService();
-
-   // Forward methods to the appropriate service
- }
- ```
-
- ### Option 2: Make private fields protected
-
- Make the necessary fields and methods accessible to the mixins (rename from `_fieldName` to `fieldName`, or add getters/setters). Note that Dart has no `protected` keyword, only the `@protected` annotation from `package:meta`.
-
- ### Option 3: Implement the WebSocketCore interface in each mixin
-
- Define an interface that all the mixins implement, rather than using "on WebSocketCoreService":
-
- ```dart
- abstract class WebSocketCoreInterface {
-   bool get isConnected;
-   bool get isInMaintenance;
-   ConnectionStatus get status;
-   // Add all methods and properties needed by the mixins
- }
-
- class WebSocketCoreService implements WebSocketCoreInterface {
-   // Implementation
- }
-
- mixin WebSocketChatService implements WebSocketCoreInterface {
-   // Implementation that uses the interface methods
- }
- ```
-
- ## Steps to Fix
-
- 1. Define a shared interface/abstract class that includes all the methods and properties needed by the mixins
- 2. Update `WebSocketCoreService` to implement this interface
- 3. Update all mixins to implement this interface rather than using "on WebSocketCoreService"
- 4. In the final `WebSocketApiService` class, implement the interface and have it delegate to the core service
-
- ## For Now
-
- As a temporary solution, a simplified version of main.dart has been created that forces the app into maintenance mode, bypassing the WebSocket initialization and connection issues.
server/llm_utils.py CHANGED
@@ -20,9 +20,9 @@ For the style, be creative, for instance you can use anything like a "documentar
  If the user ask for something specific eg "movie screencap", "movie scene", "documentary footage" "animation" as a style etc.
  Keep it minimalist but still descriptive, don't use bullets points, use simple words, go to the essential to describe style (cinematic, documentary footage, 3D rendering..), camera modes and angles, characters, age, gender, action, location, lighting, country, costume, time, weather, textures, color palette.. etc). Write about 80 words, and use between 2 and 3 sentences.
  The most import part is to describe the actions and movements in the scene, so don't forget that!
- Don't describe sound, so ever say things like "atmospheric music playing in the background".
- Instead describe the visual elements we can see in the background, be precise, (if there are anything, cars, objects, people, bricks, birds, clouds, trees, leaves or grass then say it so etc).
- Make the result unique and different from previous search results. ONLY RETURN YAML AND WITH ENGLISH CONTENT, NOT CHINESE - DO NOT ADD ANY OTHER COMMENT!
+ Don't describe sound, never say things like "atmospheric music playing in the background".
+ Only describe the visual elements, be precise, (if there are anything, cars, objects, people, bricks, birds, clouds, trees, leaves or grass then make sure to include it in your caption).
+ Make the result unique and different from previous search results. ONLY RETURN YAML AND WITH ENGLISH CONTENT, NOT CHINESE - DO NOT ADD YOU OWN OBSERVATIONS, INTERPREATIONS OR PERSONAL COMMENT!
 
  # Context
  This is attempt {current_attempt}.
@@ -53,11 +53,11 @@ Instructions:
  3. Create a natural progression from previous clips
  4. Take into account user suggestions (chat messages) into the scene
  5. IMPORTANT: viewers have shared messages, consider their input in priority to guide your story, and incorporate relevant suggestions or reactions into your narrative evolution.
- 6. Keep visual consistency with previous clips (in most cases you should repeat the same exact description of the location, characters etc but only change a few elements. If this is a webcam scenario, don't touch the camera orientation or focus)
+ 6. Keep visual consistency with previous clips (in most cases you should repeat the same exact and detailed description of the location, characters etc but only change a few elements. If this is a webcam scenario, don't touch the camera orientation or focus)
  7. Return ONLY the caption text, no additional formatting or explanation
  8. Write in English, about 200 words.
  9. Keep the visual style consistant, but content as well (repeat the style, character, locations, appearance etc..from the previous description, when it makes sense).
- 10. Your caption must describe visual elements of the scene in details, including: camera angle and focus, people's appearance, age, look, costumes, clothes, the location visual characteristics and geometry, lighting, action, objects, weather, textures, lighting.
+ 10. Your caption must describe visual elements of the scene in extreme details, including: camera angle and focus, people's appearance, age, look, costumes, clothes, the location visual characteristics and geometry, lighting, action, objects, weather, textures, lighting.
  11. Please write in the same style as the original description, by keeping things brief etc.
 
  Remember to obey to what users said in the chat history!!
@@ -85,8 +85,8 @@ Instructions:
  6. Keep visual consistency with previous clips (in most cases you should repeat the same exact description of the location, characters etc but only change a few elements. If this is a webcam scenario, don't touch the camera orientation or focus)
  7. Return ONLY the caption text, no additional formatting or explanation
  8. Write in English, about 200 words.
- 9. Keep the visual style consistant, but content as well (repeat the style, character, locations, appearance etc..from the previous description, when it makes sense).
- 10. Your caption must describe visual elements of the scene in details, including: camera angle and focus, people's appearance, age, look, costumes, clothes, the location visual characteristics and geometry, lighting, action, objects, weather, textures, lighting.
+ 9. Keep the visual style consistant, descriptive, detailed, but content as well (repeat the style, character, locations, appearance etc..from the previous description, when it makes sense).
+ 10. Your caption must describe visual elements of the scene in extreme details, including: camera angle and focus, people's appearance, age, look, costumes, clothes, the location visual characteristics and geometry, lighting, action, objects, weather, textures, lighting.
  11. Please write in the same style as the original description, by keeping things brief etc.
 
  Remember to obey to what users said in the chat history!!
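
The first hunk shows a `{current_attempt}` placeholder inside the prompt text, but the diff does not show how it gets substituted. The following is a minimal sketch of one plausible approach using `str.format`. The names `PROMPT_TEMPLATE` and `build_prompt` are hypothetical, and the template body is abbreviated for illustration rather than copied from `server/llm_utils.py`.

```python
# Hypothetical, abbreviated template; the real prompt in server/llm_utils.py is longer.
PROMPT_TEMPLATE = """Don't describe sound, only describe the visual elements.
ONLY RETURN YAML.

# Context
This is attempt {current_attempt}.
"""


def build_prompt(current_attempt: int) -> str:
    """Fill the placeholder with the retry counter (an assumed usage)."""
    # Any literal braces in such a template would have to be doubled
    # ({{ and }}) so that str.format does not treat them as fields.
    return PROMPT_TEMPLATE.format(current_attempt=current_attempt)


print(build_prompt(3))
```

Passing the attempt number into the prompt is one way to nudge the model toward a different result on each retry, which fits the instruction to make results "unique and different from previous search results".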