Commit 293af3a (parent 8c64a35): Add system map and worker architecture details
system_map.md (ADDED, +27 -0):

1. The system is designed to mitigate online toxicity by using Large Language Models (LLMs) to transform text inputs into less provocative forms; the LLMs are central to analysing and refining the text.
2. Several workers, or LLM interfaces, are defined, each suited to a specific operational environment.
3. The HTTP server worker is optimised for development, facilitating dynamic updates without requiring server restarts; it can work offline, with or without a GPU, using the `llama-cpp-python` library, provided a model has been downloaded.
4. An in-memory worker is used by the serverless worker.
5. For on-demand, scalable processing, the system includes a RunPod API worker that leverages serverless GPU functions.
6. Additionally, the Mistral API worker offers a paid-service alternative for text processing tasks.
7. A set of environment variables is predefined to configure the LLM workers' functionality.
8. The `LLM_WORKER` environment variable sets the active LLM worker.
9. The `N_GPU_LAYERS` environment variable specifies how many layers are offloaded to the GPU, defaulting to the maximum available; it is used when the LLM worker is run with a GPU.
10. `CONTEXT_SIZE` is an adjustable parameter that defines how much text the LLM can process at once.
11. The `LLM_MODEL_PATH` environment variable indicates the LLM model's storage location, which can be either local or sourced from the HuggingFace Hub.
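Taken together, a worker might read this configuration roughly as follows; the defaults shown here are illustrative assumptions, not values confirmed from the codebase:

```python
import os

# Illustrative configuration loading; every default below is an assumption.
LLM_WORKER = os.environ.get("LLM_WORKER", "http_server")    # which worker implementation to activate
N_GPU_LAYERS = int(os.environ.get("N_GPU_LAYERS", "-1"))    # -1 as an "offload all layers" convention
CONTEXT_SIZE = int(os.environ.get("CONTEXT_SIZE", "2048"))  # how much text the model attends to at once
LLM_MODEL_PATH = os.environ.get("LLM_MODEL_PATH", "")       # local path or HuggingFace Hub reference
```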
12. The system enforces rate limiting on some paths to maintain service integrity and equitable resource distribution.
13. The `LAST_REQUEST_TIME` and `REQUEST_INTERVAL` global variables implement rate limiting for the Mistral API worker.
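A minimal sketch of how two such globals can throttle outbound calls; the one-second interval and the helper name are assumptions for illustration:

```python
import time

LAST_REQUEST_TIME = 0.0  # monotonic timestamp of the previous API call
REQUEST_INTERVAL = 1.0   # assumed minimum gap, in seconds, between calls

def wait_for_request_slot() -> None:
    """Block until at least REQUEST_INTERVAL seconds have passed since the last call."""
    global LAST_REQUEST_TIME
    elapsed = time.monotonic() - LAST_REQUEST_TIME
    if elapsed < REQUEST_INTERVAL:
        time.sleep(REQUEST_INTERVAL - elapsed)
    LAST_REQUEST_TIME = time.monotonic()
```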
14. The system's worker architecture is somewhat modular, enabling easy integration or replacement of components such as LLM workers.
15. The system can stream responses in some modes, allowing real-time interaction with the LLM.
16. The `llm_streaming` function handles communication with the LLM via HTTP streaming when the server worker is active.
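The general shape such a function might take, sketched with `requests`; the endpoint URL and response schema are assumptions modelled on llama.cpp-style servers, not the project's confirmed contract:

```python
import json
import requests

def llm_streaming_sketch(prompt: str, url: str = "http://localhost:8000/v1/completions"):
    """Yield text chunks from a llama.cpp-style HTTP server (illustrative only)."""
    with requests.post(url, json={"prompt": prompt, "stream": True},
                       stream=True, timeout=120) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            data = line.decode("utf-8").removeprefix("data: ")  # strip SSE prefix
            if data == "[DONE]":
                break
            yield json.loads(data)["choices"][0]["text"]
```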
17. The `llm_stream_sans_network` function provides an alternative for local LLM inference without network dependency.
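A hedged sketch of the offline path using `llama-cpp-python`; the model path and generation parameters are placeholders:

```python
from llama_cpp import Llama

# n_gpu_layers=-1 offloads all layers when a GPU build is available; model.gguf is a placeholder.
llm = Llama(model_path="model.gguf", n_ctx=2048, n_gpu_layers=-1)

def llm_stream_sans_network_sketch(prompt: str):
    """Yield completion chunks locally, with no network dependency (illustrative only)."""
    for chunk in llm.create_completion(prompt, max_tokens=256, stream=True):
        yield chunk["choices"][0]["text"]
```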
18. For serverless deployment, the `llm_stream_serverless` function interfaces with the RunPod API.
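A rough sketch of a synchronous RunPod serverless call; the environment variable names, payload shape, and response field follow RunPod's generic pattern and are assumptions here:

```python
import os
import requests

def llm_stream_serverless_sketch(prompt: str) -> str:
    """Call a RunPod serverless endpoint synchronously (illustrative only)."""
    endpoint_id = os.environ["RUNPOD_ENDPOINT_ID"]  # hypothetical variable name
    resp = requests.post(
        f"https://api.runpod.ai/v2/{endpoint_id}/runsync",
        headers={"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"},
        json={"input": {"prompt": prompt}},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["output"]
```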
19. The `llm_stream_mistral_api` function facilitates interaction with the Mistral API for text processing.
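Similarly, a hedged sketch of a Mistral chat-completions request; the model name is an assumption and streaming is omitted for brevity:

```python
import os
import requests

def llm_stream_mistral_api_sketch(prompt: str) -> str:
    """Send a prompt to Mistral's chat-completions endpoint (illustrative, non-streaming)."""
    resp = requests.post(
        "https://api.mistral.ai/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
        json={"model": "mistral-small-latest",  # assumed model choice
              "messages": [{"role": "user", "content": prompt}]},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```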
20. The system includes a utility function, `replace_text`, for template-based text replacement operations.
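A plausible minimal form of such a utility; the `{name}` placeholder convention is an assumption:

```python
def replace_text_sketch(template: str, replacements: dict[str, str]) -> str:
    """Fill {name}-style slots in a prompt template (illustrative convention)."""
    for name, value in replacements.items():
        template = template.replace("{" + name + "}", value)
    return template

# e.g. replace_text_sketch("Rewrite this calmly: {original}", {"original": "you are WRONG"})
```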
21. A scoring function, `calculate_overall_score`, combines several metrics into a single measure of how effective a text transformation was.
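One way such a combination could look; the metric names and equal weighting are assumptions, not the project's actual formula:

```python
def calculate_overall_score_sketch(faithfulness: float, calmness: float) -> float:
    """Blend two 0-1 metrics into a single effectiveness score (illustrative weights)."""
    return 0.5 * faithfulness + 0.5 * calmness
```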
22. The `query_ai_prompt` function serves as a dispatcher, directing text processing requests to the chosen LLM worker.
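A hedged sketch of the dispatch pattern, keyed on the `LLM_WORKER` variable and reusing the sketch functions above; the worker names are assumptions:

```python
import os

def query_ai_prompt_sketch(prompt: str):
    """Route a prompt to the configured worker (keys are illustrative)."""
    dispatch = {
        "http_server": llm_streaming_sketch,
        "in_memory": llm_stream_sans_network_sketch,
        "serverless": llm_stream_serverless_sketch,
        "mistral": llm_stream_mistral_api_sketch,
    }
    worker = os.environ.get("LLM_WORKER", "http_server")
    return dispatch[worker](prompt)
```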
23. The `inference_binary_check` function within `app.py` ensures compatibility with the available hardware, particularly GPU presence.
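The hardware probe could be as simple as the following; the real `inference_binary_check` may well use a different mechanism:

```python
import subprocess

def gpu_present_sketch() -> bool:
    """Best-effort NVIDIA GPU detection via nvidia-smi (illustrative only)."""
    try:
        subprocess.run(["nvidia-smi"], capture_output=True, check=True)
        return True
    except (OSError, subprocess.CalledProcessError):
        return False
```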
24. The system provides a user interface through Gradio, enabling end users to interact with the text transformation service.
25. The `chill_out` function in `app.py` is the entry point for processing user inputs through the Gradio interface.
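A minimal sketch of the wiring between Gradio and the entry point; the labels are assumptions, and `chill_out` is stubbed here:

```python
import gradio as gr

def chill_out(text: str) -> str:
    # Stub standing in for the real pipeline described above.
    return text

demo = gr.Interface(
    fn=chill_out,
    inputs=gr.Textbox(label="Original text"),
    outputs=gr.Textbox(label="Calmer rewrite"),
)

if __name__ == "__main__":
    demo.launch()
```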
26. The `improvement_loop` function in `chill.py` controls the iterative process of text refinement using the LLM.
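The control flow could plausibly look like this; the stopping criteria and parameter names are assumptions:

```python
from typing import Callable

def improvement_loop_sketch(
    text: str,
    rewrite: Callable[[str], str],       # e.g. a wrapper around the chosen LLM worker
    score: Callable[[str, str], float],  # e.g. calculate_overall_score
    max_iterations: int = 3,
    target: float = 0.85,
) -> str:
    """Iteratively refine text until it scores well enough (illustrative control flow)."""
    current = text
    for _ in range(max_iterations):
        candidate = rewrite(current)
        if score(text, candidate) >= target:
            return candidate
        current = candidate
    return current
```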