Pylint & CVE fix #1
Opened by barunsaha
This view is limited to 50 files because it contains too many changes.
See the raw diff here.
- .gitattributes +0 -2
- .streamlit/config.toml +0 -10
- README.md +14 -73
- app.py +215 -398
- clarifai_grpc_helper.py +71 -0
- examples/example_04.json +0 -3
- file_embeddings/embeddings.npy +0 -3
- file_embeddings/icons.npy +0 -3
- global_config.py +12 -138
- helpers/__init__.py +0 -0
- helpers/icons_embeddings.py +0 -166
- helpers/image_search.py +0 -148
- helpers/llm_helper.py +0 -201
- helpers/pptx_helper.py +0 -987
- helpers/text_helper.py +0 -83
- icons/png128/0-circle.png +0 -0
- icons/png128/1-circle.png +0 -0
- icons/png128/123.png +0 -0
- icons/png128/2-circle.png +0 -0
- icons/png128/3-circle.png +0 -0
- icons/png128/4-circle.png +0 -0
- icons/png128/5-circle.png +0 -0
- icons/png128/6-circle.png +0 -0
- icons/png128/7-circle.png +0 -0
- icons/png128/8-circle.png +0 -0
- icons/png128/9-circle.png +0 -0
- icons/png128/activity.png +0 -0
- icons/png128/airplane.png +0 -0
- icons/png128/alarm.png +0 -0
- icons/png128/alien-head.png +0 -0
- icons/png128/alphabet.png +0 -0
- icons/png128/amazon.png +0 -0
- icons/png128/amritsar-golden-temple.png +0 -0
- icons/png128/amsterdam-canal.png +0 -0
- icons/png128/amsterdam-windmill.png +0 -0
- icons/png128/android.png +0 -0
- icons/png128/angkor-wat.png +0 -0
- icons/png128/apple.png +0 -0
- icons/png128/archive.png +0 -0
- icons/png128/argentina-obelisk.png +0 -0
- icons/png128/artificial-intelligence-brain.png +0 -0
- icons/png128/atlanta.png +0 -0
- icons/png128/austin.png +0 -0
- icons/png128/automation-decision.png +0 -0
- icons/png128/award.png +0 -0
- icons/png128/balloon.png +0 -0
- icons/png128/ban.png +0 -0
- icons/png128/bandaid.png +0 -0
- icons/png128/bangalore.png +0 -0
- icons/png128/bank.png +0 -0
.gitattributes
CHANGED
@@ -33,5 +33,3 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
-*.pptx filter=lfs diff=lfs merge=lfs -text
-pptx_templates/Minimalist_sales_pitch.pptx filter=lfs diff=lfs merge=lfs -text
.streamlit/config.toml
DELETED
@@ -1,10 +0,0 @@
-[server]
-runOnSave = true
-headless = false
-maxUploadSize = 0
-
-[browser]
-gatherUsageStats = false
-
-[theme]
-base = "dark"
README.md
CHANGED
@@ -4,7 +4,7 @@ emoji: 🏢
 colorFrom: yellow
 colorTo: green
 sdk: streamlit
-sdk_version: 1.
+sdk_version: 1.26.0
 app_file: app.py
 pinned: false
 license: mit
@@ -16,95 +16,36 @@ We spend a lot of time on creating the slides and organizing our thoughts for an
 With SlideDeck AI, co-create slide decks on any topic with Generative Artificial Intelligence.
 Describe your topic and let SlideDeck AI generate a PowerPoint slide deck for you—it's as simple as that!
 
+SlideDeck AI is powered by [Mistral 7B Instruct](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1).
+Originally, it was built using the Llama 2 API provided by Clarifai.
 
 # Process
 
 SlideDeck AI works in the following way:
 
-1. Given a topic description, it uses
+1. Given a topic description, it uses Mistral 7B Instruct to generate the outline/contents of the slides.
 The output is generated as structured JSON data based on a pre-defined schema.
-2.
-3. Subsequently, it uses the `python-pptx` library to generate the slides,
+2. Subsequently, it uses the `python-pptx` library to generate the slides,
 based on the JSON data from the previous step.
-
-For example, one can ask to add another slide or modify an existing slide.
-A history of instructions is maintained.
-5. Every time SlideDeck AI generates a PowerPoint presentation, a download button is provided.
-Clicking on the button will download the file.
+Here, a user can choose from a set of three pre-defined presentation templates.
+3. In addition, it uses Metaphor to fetch Web pages related to the topic.
 
-
-# Summary of the LLMs
-
-SlideDeck AI allows the use of different LLMs from four online providers—Hugging Face, Google, Cohere, and Together AI. These service providers—even the latter three—offer generous free usage of relevant LLMs without requiring any billing information.
-
-Based on several experiments, SlideDeck AI generally recommends the use of Mistral NeMo and Gemini Flash to generate the slide decks.
-
-The supported LLMs offer different styles of content generation. Use one of the following LLMs along with relevant API keys/access tokens, as appropriate, to create the content of the slide deck:
-
-| LLM | Provider (code) | Requires API key | Characteristics |
-|:---------------------------------| :------- |:----------------------------------------------------------------------------|:-------------------------|
-| Mistral 7B Instruct v0.2 | Hugging Face (`hf`) | Optional but encouraged; [get here](https://huggingface.co/settings/tokens) | Faster, shorter content |
-| Mistral NeMo Instruct 2407 | Hugging Face (`hf`) | Optional but encouraged; [get here](https://huggingface.co/settings/tokens) | Slower, longer content |
-| Gemini 1.5 Flash | Google Gemini API (`gg`) | Mandatory; [get here](https://aistudio.google.com/apikey) | Faster, longer content |
-| Gemini 2.0 Flash | Google Gemini API (`gg`) | Mandatory; [get here](https://aistudio.google.com/apikey) | Faster, longer content |
-| Command R+ | Cohere (`co`) | Mandatory; [get here](https://dashboard.cohere.com/api-keys) | Shorter, simpler content |
-| Llama 3.3 70B Instruct Turbo | Together AI (`to`) | Mandatory; [get here](https://api.together.ai/settings/api-keys) | Detailed, slower |
-| Llama 3.1 8B Instruct Turbo 128K | Together AI (`to`) | Mandatory; [get here](https://api.together.ai/settings/api-keys) | Shorter |
-
-The Mistral models (via Hugging Face) do not mandatorily require an access token. However, you are encouraged to get and use your own Hugging Face access token.
-
-In addition, offline LLMs provided by Ollama can be used. Read below to know more.
-
-
-# Icons
-
-SlideDeck AI uses a subset of icons from [bootstrap-icons-1.11.3](https://github.com/twbs/icons)
-(MIT license) in the slides. A few icons from [SVG Repo](https://www.svgrepo.com/)
-(CC0, MIT, and Apache licenses) are also used.
+4. ~~Finally, it uses Stable Diffusion 2 to generate an image, based on the title and each slide heading.~~
 
 
 # Local Development
 
-SlideDeck AI uses
-
-Visit the respective websites to obtain the
-
-## Offline LLMs Using Ollama
-
-SlideDeck AI allows the use of offline LLMs to generate the contents of the slide decks. This is typically suitable for individuals or organizations who would like to use self-hosted LLMs for privacy concerns, for example.
-
-Offline LLMs are made available via Ollama. Therefore, a pre-requisite here is to have [Ollama installed](https://ollama.com/download) on the system and the desired [LLM](https://ollama.com/search) pulled locally.
-
-In addition, the `RUN_IN_OFFLINE_MODE` environment variable needs to be set to `True` to enable the offline mode. This, for example, can be done using a `.env` file or from the terminal. The typical steps to use SlideDeck AI in offline mode (in a `bash` shell) are as follows:
-
-```bash
-ollama list  # View locally available LLMs
-export RUN_IN_OFFLINE_MODE=True  # Enable the offline mode to use Ollama
-git clone https://github.com/barun-saha/slide-deck-ai.git
-cd slide-deck-ai
-python -m venv venv  # Create a virtual environment
-source venv/bin/activate  # On a Linux system
-pip install -r requirements.txt
-streamlit run ./app.py  # Run the application
-```
-
-The `.env` file should be created inside the `slide-deck-ai` directory.
-
-The UI is similar to the online mode. However, rather than selecting an LLM from a list, one has to write the name of the Ollama model to be used in a textbox. There is no API key asked here.
-
-The online and offline modes are mutually exclusive. So, setting `RUN_IN_OFFLINE_MODE` to `False` will make SlideDeck AI use the online LLMs (i.e., the "original mode."). By default, `RUN_IN_OFFLINE_MODE` is set to `False`.
-
-Finally, the focus is on using offline LLMs, not going completely offline. So, Internet connectivity would still be required to fetch the images from Pexels.
+SlideDeck AI uses [Mistral 7B Instruct](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)
+via the Hugging Face Inference API.
+To run this project by yourself, you need to provide the `HUGGINGFACEHUB_API_TOKEN` and `METAPHOR_API_KEY` API keys,
+for example, in a `.env` file. Visit the respective websites to obtain the keys.
 
 
 # Live Demo
 
-- [Demo video](https://youtu.be/QvAKzNKtk9k) of the chat interface on YouTube
+[SlideDeck AI](https://huggingface.co/spaces/barunsaha/slide-deck-ai)
 
 
 # Award
 
-SlideDeck AI has won the 3rd Place in the [Llama 2 Hackathon with Clarifai](https://lablab.ai/event/llama-2-hackathon-with-clarifai)
+SlideDeck AI has won the 3rd Place in the [Llama 2 Hackathon with Clarifai](https://lablab.ai/event/llama-2-hackathon-with-clarifai).
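The README above describes a two-step flow: an LLM produces a structured outline, and `python-pptx` turns that outline into slides. As a point of reference, the following is a minimal, hypothetical sketch of that second step; the outline schema, layout indices, and file name below are illustrative assumptions, not the schema or code used by SlideDeck AI.

```python
# Sketch only: turn an already-generated outline (assumed structure) into a .pptx file.
from pptx import Presentation

outline = {
    'title': 'Introduction to AI',
    'slides': [
        {'heading': 'What is AI?', 'bullets': ['Definition', 'Brief history']},
        {'heading': 'How it works', 'bullets': ['Data', 'Models', 'Inference']},
    ],
}

prs = Presentation()  # or Presentation('pptx_templates/Blank.pptx') to start from a template
title_slide = prs.slides.add_slide(prs.slide_layouts[0])  # title layout
title_slide.shapes.title.text = outline['title']

for slide_spec in outline['slides']:
    slide = prs.slides.add_slide(prs.slide_layouts[1])  # title-and-content layout
    slide.shapes.title.text = slide_spec['heading']
    body = slide.placeholders[1].text_frame
    for idx, bullet in enumerate(slide_spec['bullets']):
        para = body.paragraphs[0] if idx == 0 else body.add_paragraph()
        para.text = bullet

prs.save('Presentation.pptx')
```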
app.py
CHANGED
@@ -1,493 +1,310 @@
-"""
-Streamlit app containing the UI and the application logic.
-"""
-import datetime
-import logging
-import os
 import pathlib
-import
 import tempfile
-from typing import List,
-import httpx
-import huggingface_hub
 import json5
-import
-import requests
 import streamlit as st
-from dotenv import load_dotenv
-from langchain_community.chat_message_histories import StreamlitChatMessageHistory
-from langchain_core.messages import HumanMessage
-from langchain_core.prompts import ChatPromptTemplate
-import
 from global_config import GlobalConfig
-from helpers import llm_helper, pptx_helper, text_helper
-load_dotenv()
-RUN_IN_OFFLINE_MODE = os.getenv('RUN_IN_OFFLINE_MODE', 'False').lower() == 'true'
-@st.cache_data
-def _load_strings() -> dict:
-"""
-Load various strings to be displayed in the app.
-:return: The dictionary of strings.
-"""
 @st.cache_data
-def
-:param
-:return: The
-template = in_file.read()
-else:
-with open(GlobalConfig.INITIAL_PROMPT_TEMPLATE, 'r', encoding='utf-8') as in_file:
-template = in_file.read()
-return template
-selected_provider: str,
-selected_model: str,
-user_key: str,
-) -> bool:
-:param selected_provider: The LLM provider.
-:param selected_model: Name of the model.
-:param user_key: User-provided API key.
-:return: `True` if all inputs "look" OK; `False` otherwise.
-handle_error(
-'Not enough information provided!'
-' Please be a little more descriptive and type a few words'
-' with a few characters :)',
-False
-)
-return False
-if not selected_provider or not selected_model:
-handle_error('No valid LLM provider and/or model name found!', False)
-return False
-if not llm_helper.is_valid_llm_provider_model(selected_provider, selected_model, user_key):
-handle_error(
-'The LLM settings do not look correct. Make sure that an API key/access token'
-' is provided if the selected LLM requires it. An API key should be 6-64 characters'
-' long, only containing alphanumeric characters, hyphens, and underscores.',
-False
-)
-return False
-return True
-Display an error message in the app.
-:param error_msg: The error message to be displayed.
-:param should_log: If `True`, log the message.
-st.error(error_msg)
-def reset_api_key():
-"""
-Clear API key input when a different LLM is selected from the dropdown list.
-# Session variables
-CHAT_MESSAGES = 'chat_messages'
-DOWNLOAD_FILE_KEY = 'download_file_name'
-IS_IT_REFINEMENT = 'is_it_refinement'
-logger = logging.getLogger(__name__)
-texts = list(GlobalConfig.PPTX_TEMPLATE_FILES.keys())
-captions = [GlobalConfig.PPTX_TEMPLATE_FILES[x]['caption'] for x in texts]
-with st.sidebar:
-# The PPT templates
-pptx_template = st.sidebar.radio(
-'1: Select a presentation template:',
-texts,
-captions=captions,
-horizontal=True
 )
-),
-type='password',
-key='api_key_input'
-)
 def build_ui():
 """
-Display the input elements for content generation.
 """
 st.title(APP_TEXT['app_name'])
 st.subheader(APP_TEXT['caption'])
 st.markdown(
-'
 )
-# Since Streamlit app reloads at every interaction, display the chat history
-# from the save session state
-for msg in history.messages:
-st.chat_message(msg.type).code(msg.content, language='json')
-if prompt := st.chat_input(
-placeholder=APP_TEXT['chat_placeholder'],
-max_chars=GlobalConfig.LLM_MODEL_MAX_INPUT_LENGTH
-):
-provider, llm_name = llm_helper.get_provider_model(
-llm_provider_to_use,
-use_ollama=RUN_IN_OFFLINE_MODE
-)
-formatted_template = prompt_template.format(
-**{
-'instructions': '\n'.join(list_of_msgs),
-'previous_content': _get_last_response(),
-}
-)
-else:
-formatted_template = prompt_template.format(**{'question': prompt})
-progress_bar = st.progress(0, 'Preparing to call LLM...')
-response = ''
 try:
-'
 )
 return
-response += _
-# Update the progress bar with an approx progress percentage
-progress_bar.progress(
-min(
-len(response) / gcfg.get_max_output_tokens(llm_provider_to_use),
-0.95
-),
-text='Streaming content...this might take a while...'
-)
-except (httpx.ConnectError, requests.exceptions.ConnectionError):
-handle_error(
-'A connection error occurred while streaming content from the LLM endpoint.'
-' Unfortunately, the slide deck cannot be generated. Please try again later.'
-' Alternatively, try selecting a different LLM from the dropdown list. If you are'
-' using Ollama, make sure that Ollama is already running on your system.',
-True
-)
-return
-except huggingface_hub.errors.ValidationError as ve:
-handle_error(
-f'An error occurred while trying to generate the content: {ve}'
-'\nPlease try again with a significantly shorter input text.',
-True
-)
-return
-except ollama.ResponseError:
-handle_error(
-f'The model `{llm_name}` is unavailable with Ollama on your system.'
-f' Make sure that you have provided the correct LLM name or pull it using'
-f' `ollama pull {llm_name}`. View LLMs available locally by running `ollama list`.',
-True
-)
-return
-except Exception as ex:
-handle_error(
-f'An unexpected error occurred while generating the content: {ex}'
-'\nPlease try again later, possibly with different inputs.'
-' Alternatively, try selecting a different LLM from the dropdown list.'
-' If you are using Cohere, Gemini, or Together AI models, make sure that you have'
-' provided a correct API key.',
-True
-)
-return
-history.add_user_message(prompt)
-history.add_ai_message(response)
-# The content has been generated as JSON
-# There maybe trailing ``` at the end of the response -- remove them
-# To be careful: ``` may be part of the content as well when code is generated
-response = text_helper.get_clean_json(response)
-logger.info(
-'Cleaned JSON length: %d', len(response)
-)
-GlobalConfig.LLM_PROGRESS_MAX,
-text='Finding photos online and generating the slide deck...'
-)
-progress_bar.progress(1.0, text='Done!')
-st.chat_message('ai').code(response, language='json')
-len(st.session_state[CHAT_MESSAGES]) / 2
-)
-def
-deck, the path may be to an empty file.
-:param
-parsed_data = json5.loads(json_str)
-except ValueError:
-handle_error(
-'Encountered error while parsing JSON...will fix it and retry',
-True
-)
-try:
-parsed_data = json5.loads(text_helper.fix_malformed_json(json_str))
-except ValueError:
-handle_error(
-'Encountered an error again while fixing JSON...'
-'the slide deck cannot be created, unfortunately ☹'
-'\nPlease try again later.',
-True
-)
-return None
-except RecursionError:
-handle_error(
-'Encountered a recursion error while parsing JSON...'
-'the slide deck cannot be created, unfortunately ☹'
-'\nPlease try again later.',
-True
-)
-return None
-except Exception:
-handle_error(
-'Encountered an error while parsing JSON...'
-'the slide deck cannot be created, unfortunately ☹'
-'\nPlease try again later.',
-True
-)
-return None
-if DOWNLOAD_FILE_KEY in st.session_state:
-path = pathlib.Path(st.session_state[DOWNLOAD_FILE_KEY])
-else:
-temp = tempfile.NamedTemporaryFile(delete=False, suffix='.pptx')
-path = pathlib.Path(temp.name)
-st.session_state[DOWNLOAD_FILE_KEY] = str(path)
-if temp:
-temp.close()
 try:
-parsed_data,
-slides_template=pptx_template,
-output_file_path=path
-)
 except Exception as ex:
-st.error(
-def _is_it_refinement() -> bool:
-"""
-Whether it is the initial prompt or a refinement.
-:return: True if it is the initial prompt; False otherwise.
-"""
-if IS_IT_REFINEMENT in st.session_state:
-return True
-# Prepare for the next call
-st.session_state[IS_IT_REFINEMENT] = True
-return True
-def _get_user_messages() -> List[str]:
-"""
-Get a list of user messages submitted until now from the session state.
-def _get_last_response() -> str:
-Get the last response generated by AI.
-view_messages.json(st.session_state[CHAT_MESSAGES])
-def
-:param
 def main():
 import pathlib
+import logging
 import tempfile
+from typing import List, Tuple
 
 import json5
+import metaphor_python as metaphor
 import streamlit as st
 
+import llm_helper
+import pptx_helper
 from global_config import GlobalConfig
 
 
+APP_TEXT = json5.loads(open(GlobalConfig.APP_STRINGS_FILE, 'r', encoding='utf-8').read())
+GB_CONVERTER = 2 ** 30
 
 
+logging.basicConfig(
+    level=GlobalConfig.LOG_LEVEL,
+    format='%(asctime)s - %(message)s',
+)
 
 
 @st.cache_data
+def get_contents_wrapper(text: str) -> str:
     """
+    Fetch and cache the slide deck contents on a topic by calling an external API.
 
+    :param text: The presentation topic
+    :return: The slide deck contents or outline in JSON format
     """
 
+    logging.info('LLM call because of cache miss...')
+    return llm_helper.generate_slides_content(text).strip()
 
 
+@st.cache_resource
+def get_metaphor_client_wrapper() -> metaphor.Metaphor:
     """
+    Create a Metaphor client for semantic Web search.
 
+    :return: Metaphor instance
     """
 
+    return metaphor.Metaphor(api_key=GlobalConfig.METAPHOR_API_KEY)
 
 
+@st.cache_data
+def get_web_search_results_wrapper(text: str) -> List[Tuple[str, str]]:
     """
+    Fetch and cache the Web search results on a given topic.
 
+    :param text: The topic
+    :return: A list of (title, link) tuples
     """
 
+    results = []
+    search_results = get_metaphor_client_wrapper().search(
+        text,
+        use_autoprompt=True,
+        num_results=5
     )
 
+    for a_result in search_results.results:
+        results.append((a_result.title, a_result.url))
+
+    return results
+
+
+# def get_disk_used_percentage() -> float:
+#     """
+#     Compute the disk usage.
+#
+#     :return: Percentage of the disk space currently used
+#     """
+#
+#     total, used, free = shutil.disk_usage(__file__)
+#     total = total // GB_CONVERTER
+#     used = used // GB_CONVERTER
+#     free = free // GB_CONVERTER
+#     used_perc = 100.0 * used / total
+#
+#     logging.debug(f'Total: {total} GB\n'
+#                   f'Used: {used} GB\n'
+#                   f'Free: {free} GB')
+#
+#     logging.debug('\n'.join(os.listdir()))
+#
+#     return used_perc
 
 
 def build_ui():
     """
+    Display the input elements for content generation. Only covers the first step.
     """
 
+    # get_disk_used_percentage()
+
     st.title(APP_TEXT['app_name'])
     st.subheader(APP_TEXT['caption'])
     st.markdown(
+        'Powered by'
+        ' [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2).'
+    )
+    st.markdown(
+        '*If the JSON is generated or parsed incorrectly, try again later by making minor changes'
+        ' to the input text.*'
    )
 
+    with st.form('my_form'):
+        # Topic input
+        try:
+            with open(GlobalConfig.PRELOAD_DATA_FILE, 'r', encoding='utf-8') as in_file:
+                preload_data = json5.loads(in_file.read())
+        except (FileExistsError, FileNotFoundError):
+            preload_data = {'topic': '', 'audience': ''}
+
+        topic = st.text_area(
+            APP_TEXT['input_labels'][0],
+            value=preload_data['topic']
        )
 
+        texts = list(GlobalConfig.PPTX_TEMPLATE_FILES.keys())
+        captions = [GlobalConfig.PPTX_TEMPLATE_FILES[x]['caption'] for x in texts]
 
+        pptx_template = st.radio(
+            'Select a presentation template:',
+            texts,
+            captions=captions,
+            horizontal=True
+        )
 
+        st.divider()
+        submit = st.form_submit_button('Generate slide deck')
 
+        if submit:
+            # st.write(f'Clicked {time.time()}')
+            st.session_state.submitted = True
 
+    # https://github.com/streamlit/streamlit/issues/3832#issuecomment-1138994421
+    if 'submitted' in st.session_state:
+        progress_text = 'Generating the slides...give it a moment'
+        progress_bar = st.progress(0, text=progress_text)
 
+        topic_txt = topic.strip()
+        generate_presentation(topic_txt, pptx_template, progress_bar)
 
+    st.divider()
+    st.text(APP_TEXT['tos'])
+    st.text(APP_TEXT['tos2'])
+
+    st.markdown(
+        '![Visitors]'
+        '(https://api.visitorbadge.io/api/visitors?path=https%3A%2F%2Fhuggingface.co%2Fspaces%2Fbarunsaha%2Fslide-deck-ai&countColor=%23263759)'
    )
 
 
+def generate_presentation(topic: str, pptx_template: str, progress_bar):
+    """
+    Process the inputs to generate the slides.
 
+    :param topic: The presentation topic based on which contents are to be generated
+    :param pptx_template: The PowerPoint template name to be used
+    :param progress_bar: Progress bar from the page
+    :return:
+    """
+
+    topic_length = len(topic)
+    logging.debug('Input length:: topic: %s', topic_length)
+
+    if topic_length >= 10:
+        logging.debug('Topic: %s', topic)
+        target_length = min(topic_length, GlobalConfig.LLM_MODEL_MAX_INPUT_LENGTH)
 
        try:
+            # Step 1: Generate the contents in JSON format using an LLM
+            json_str = process_slides_contents(topic[:target_length], progress_bar)
+            logging.debug('Truncated topic: %s', topic[:target_length])
+            logging.debug('Length of JSON: %d', len(json_str))
+
+            # Step 2: Generate the slide deck based on the template specified
+            if len(json_str) > 0:
+                st.info(
+                    'Tip: The generated content doesn\'t look so great?'
+                    ' Need alternatives? Just change your description text and try again.',
+                    icon="💡️"
+                )
+            else:
+                st.error(
+                    'Unfortunately, JSON generation failed, so the next steps would lead'
+                    ' to nowhere. Try again or come back later.'
                )
                return
 
+            all_headers = generate_slide_deck(json_str, pptx_template, progress_bar)
 
+            # Step 3: Bonus stuff: Web references and AI art
+            show_bonus_stuff(all_headers)
 
+        except ValueError as ve:
+            st.error(f'Unfortunately, an error occurred: {ve}! '
+                     f'Please change the text, try again later, or report it, sharing your inputs.')
 
+    else:
+        st.error('Not enough information provided! Please be little more descriptive :)')
 
 
+def process_slides_contents(text: str, progress_bar: st.progress) -> str:
    """
+    Convert given text into structured data and display. Update the UI.
 
+    :param text: The topic description for the presentation
+    :param progress_bar: Progress bar for this step
+    :return: The contents as a JSON-formatted string
    """
 
+    json_str = ''
 
    try:
+        logging.info('Calling LLM for content generation on the topic: %s', text)
+        json_str = get_contents_wrapper(text)
    except Exception as ex:
+        st.error(
+            f'An exception occurred while trying to convert to JSON. It could be because of heavy'
+            f' traffic or something else. Try doing it again or try again later.'
+            f'\nError message: {ex}'
+        )
 
+    progress_bar.progress(50, text='Contents generated')
 
+    with st.expander('The generated contents (in JSON format)'):
+        st.code(json_str, language='json')
 
+    return json_str
 
 
+def generate_slide_deck(json_str: str, pptx_template: str, progress_bar) -> List:
    """
+    Create a slide deck.
 
+    :param json_str: The contents in JSON format
+    :param pptx_template: The PPTX template name
+    :param progress_bar: Progress bar
+    :return: A list of all slide headers and the title
    """
 
+    progress_text = 'Creating the slide deck...give it a moment'
+    progress_bar.progress(75, text=progress_text)
 
+    # # Get a unique name for the file to save -- use the session ID
+    # ctx = st_sr.get_script_run_ctx()
+    # session_id = ctx.session_id
+    # timestamp = time.time()
+    # output_file_name = f'{session_id}_{timestamp}.pptx'
 
+    temp = tempfile.NamedTemporaryFile(delete=False, suffix='.pptx')
+    path = pathlib.Path(temp.name)
 
+    logging.info('Creating PPTX file...')
+    all_headers = pptx_helper.generate_powerpoint_presentation(
+        json_str,
+        as_yaml=False,
+        slides_template=pptx_template,
+        output_file_path=path
+    )
+    progress_bar.progress(100, text='Done!')
 
+    with open(path, 'rb') as f:
+        st.download_button('Download PPTX file', f, file_name='Presentation.pptx')
 
+    return all_headers
 
 
+def show_bonus_stuff(ppt_headers: List[str]):
    """
+    Show bonus stuff for the presentation.
 
+    :param ppt_headers: A list of the slide headings.
    """
 
+    # Use the presentation title and the slide headers to find relevant info online
+    logging.info('Calling Metaphor search...')
+    ppt_text = ' '.join(ppt_headers)
+    search_results = get_web_search_results_wrapper(ppt_text)
+    md_text_items = []
+
+    for (title, link) in search_results:
+        md_text_items.append(f'[{title}]({link})')
+
+    with st.expander('Related Web references'):
+        st.markdown('\n\n'.join(md_text_items))
+
+    logging.info('Done!')
+
+    # # Avoid image generation. It costs time and an API call, so just limit to the text generation.
+    # with st.expander('AI-generated image on the presentation topic'):
+    #     logging.info('Calling SDXL for image generation...')
+    #     # img_empty.write('')
+    #     # img_text.write(APP_TEXT['image_info'])
+    #     image = get_ai_image_wrapper(ppt_text)
+    #
+    #     if len(image) > 0:
+    #         image = base64.b64decode(image)
+    #         st.image(image, caption=ppt_text)
+    #         st.info('Tip: Right-click on the image to save it.', icon="💡️")
+    #         logging.info('Image added')
 
 
 def main():
clarifai_grpc_helper.py
ADDED
@@ -0,0 +1,71 @@
from clarifai_grpc.channel.clarifai_channel import ClarifaiChannel
from clarifai_grpc.grpc.api import resources_pb2, service_pb2, service_pb2_grpc
from clarifai_grpc.grpc.api.status import status_code_pb2

from global_config import GlobalConfig


CHANNEL = ClarifaiChannel.get_grpc_channel()
STUB = service_pb2_grpc.V2Stub(CHANNEL)

METADATA = (
    ('authorization', 'Key ' + GlobalConfig.CLARIFAI_PAT),
)

USER_DATA_OBJECT = resources_pb2.UserAppIDSet(
    user_id=GlobalConfig.CLARIFAI_USER_ID,
    app_id=GlobalConfig.CLARIFAI_APP_ID
)

RAW_TEXT = '''You are a helpful, intelligent chatbot. Create the slides for a presentation on the given topic. Include main headings for each slide, detailed bullet points for each slide. Add relevant content to each slide. Do not output any blank line.

Topic:
Talk about AI, covering what it is and how it works. Add its pros, cons, and future prospects. Also, cover its job prospects.
'''


def get_text_from_llm(prompt: str) -> str:
    post_model_outputs_response = STUB.PostModelOutputs(
        service_pb2.PostModelOutputsRequest(
            user_app_id=USER_DATA_OBJECT,  # The userDataObject is created in the overview and is required when using a PAT
            model_id=GlobalConfig.CLARIFAI_MODEL_ID,
            # version_id=MODEL_VERSION_ID,  # This is optional. Defaults to the latest model version
            inputs=[
                resources_pb2.Input(
                    data=resources_pb2.Data(
                        text=resources_pb2.Text(
                            raw=prompt
                        )
                    )
                )
            ]
        ),
        metadata=METADATA
    )

    if post_model_outputs_response.status.code != status_code_pb2.SUCCESS:
        print(post_model_outputs_response.status)
        raise Exception(f"Post model outputs failed, status: {post_model_outputs_response.status.description}")

    # Since we have one input, one output will exist here
    output = post_model_outputs_response.outputs[0]

    # print("Completion:\n")
    # print(output.data.text.raw)

    return output.data.text.raw


if __name__ == '__main__':
    topic = ('Talk about AI, covering what it is and how it works.'
             ' Add its pros, cons, and future prospects.'
             ' Also, cover its job prospects.'
             )
    print(topic)

    with open(GlobalConfig.SLIDES_TEMPLATE_FILE, 'r') as in_file:
        prompt_txt = in_file.read()
    prompt_txt = prompt_txt.replace('{topic}', topic)
    response_txt = get_text_from_llm(prompt_txt)

    print('Output:\n', response_txt)
examples/example_04.json
DELETED
@@ -1,3 +0,0 @@
-{
-    "topic": "12 slides on a basic tutorial on Python along with examples"
-}
file_embeddings/embeddings.npy
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:64a1ba79b20c81ba7ed6604468736f74ae89813fe378191af1d8574c008b3ab5
-size 326784
file_embeddings/icons.npy
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:ce5ce4c86bb213915606921084b3516464154edcae12f4bc708d62c6bd7acebb
-size 51168
global_config.py
CHANGED
@@ -1,7 +1,3 @@
-"""
-A set of configurations used by the app.
-"""
-import logging
 import os
 
 from dataclasses import dataclass
@@ -13,154 +9,32 @@ load_dotenv()
 
 @dataclass(frozen=True)
 class GlobalConfig:
-    PROVIDER_GOOGLE_GEMINI = 'gg'
-    PROVIDER_HUGGING_FACE = 'hf'
-    PROVIDER_OLLAMA = 'ol'
-    PROVIDER_TOGETHER_AI = 'to'
-    VALID_PROVIDERS = {
-        PROVIDER_COHERE,
-        PROVIDER_GOOGLE_GEMINI,
-        PROVIDER_HUGGING_FACE,
-        PROVIDER_OLLAMA,
-        PROVIDER_TOGETHER_AI
-    }
-    VALID_MODELS = {
-        '[co]command-r-08-2024': {
-            'description': 'simpler, slower',
-            'max_new_tokens': 4096,
-            'paid': True,
-        },
-        '[gg]gemini-1.5-flash-002': {
-            'description': 'faster, detailed',
-            'max_new_tokens': 8192,
-            'paid': True,
-        },
-        '[gg]gemini-2.0-flash-exp': {
-            'description': 'fast, detailed',
-            'max_new_tokens': 8192,
-            'paid': True,
-        },
-        '[hf]mistralai/Mistral-7B-Instruct-v0.2': {
-            'description': 'faster, shorter',
-            'max_new_tokens': 8192,
-            'paid': False,
-        },
-        '[hf]mistralai/Mistral-Nemo-Instruct-2407': {
-            'description': 'longer response',
-            'max_new_tokens': 10240,
-            'paid': False,
-        },
-        '[to]meta-llama/Llama-3.3-70B-Instruct-Turbo': {
-            'description': 'detailed, slower',
-            'max_new_tokens': 4096,
-            'paid': True,
-        },
-        '[to]meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo-128K': {
-            'description': 'shorter, faster',
-            'max_new_tokens': 4096,
-            'paid': True,
-        },
-    }
-    LLM_PROVIDER_HELP = (
-        'LLM provider codes:\n\n'
-        '- **[co]**: Cohere\n'
-        '- **[gg]**: Google Gemini API\n'
-        '- **[hf]**: Hugging Face Inference API\n'
-        '- **[to]**: Together AI\n\n'
-        '[Find out more](https://github.com/barun-saha/slide-deck-ai?tab=readme-ov-file#summary-of-the-llms)'
-    )
-    DEFAULT_MODEL_INDEX = 3
-    LLM_MODEL_TEMPERATURE = 0.2
-    LLM_MODEL_MIN_OUTPUT_LENGTH = 100
-    LLM_MODEL_MAX_INPUT_LENGTH = 400  # characters
 
     HUGGINGFACEHUB_API_TOKEN = os.environ.get('HUGGINGFACEHUB_API_TOKEN', '')
 
     LOG_LEVEL = 'DEBUG'
-    COUNT_TOKENS = False
     APP_STRINGS_FILE = 'strings.json'
     PRELOAD_DATA_FILE = 'examples/example_02.json'
     SLIDES_TEMPLATE_FILE = 'langchain_templates/template_combined.txt'
-
-    REFINEMENT_PROMPT_TEMPLATE = 'langchain_templates/chat_prompts/refinement_template_v4_two_cols_img.txt'
-
-    LLM_PROGRESS_MAX = 90
-    ICONS_DIR = 'icons/png128/'
-    TINY_BERT_MODEL = 'gaunernst/bert-mini-uncased'
-    EMBEDDINGS_FILE_NAME = 'file_embeddings/embeddings.npy'
-    ICONS_FILE_NAME = 'file_embeddings/icons.npy'
 
     PPTX_TEMPLATE_FILES = {
-        '
             'file': 'pptx_templates/Blank.pptx',
-            'caption': 'A good start
         },
         'Ion Boardroom': {
             'file': 'pptx_templates/Ion_Boardroom.pptx',
-            'caption': 'Make some bold decisions
-        },
-        'Minimalist Sales Pitch': {
-            'file': 'pptx_templates/Minimalist_sales_pitch.pptx',
-            'caption': 'In high contrast ⬛'
         },
         'Urban Monochrome': {
             'file': 'pptx_templates/Urban_monochrome.pptx',
-            'caption': 'Marvel in a monochrome dream
-        }
     }
-
-    # This is a long text, so not incorporated as a string in `strings.json`
-    CHAT_USAGE_INSTRUCTIONS = (
-        'Briefly describe your topic of presentation in the textbox provided below. For example:\n'
-        '- Make a slide deck on AI.'
-        '\n\n'
-        'Subsequently, you can add follow-up instructions, e.g.:\n'
-        '- Can you add a slide on GPUs?'
-        '\n\n'
-        ' You can also ask it to refine any particular slide, e.g.:\n'
-        '- Make the slide with title \'Examples of AI\' a bit more descriptive.'
-        '\n\n'
-        'Finally, click on the download button at the bottom to download the slide deck.'
-        ' See this [demo video](https://youtu.be/QvAKzNKtk9k) for a brief walkthrough.\n\n'
-        'Currently, three LLMs providers and four LLMs are supported:'
-        ' **Mistral 7B Instruct v0.2** and **Mistral Nemo Instruct 2407** via Hugging Face'
-        ' Inference Endpoint; **Gemini 1.5 Flash** via Gemini API; and **Command R+** via Cohere'
-        ' API. If one is not available, choose the other from the dropdown list. A [summary of'
-        ' the supported LLMs]('
-        'https://github.com/barun-saha/slide-deck-ai/blob/main/README.md#summary-of-the-llms)'
-        ' is available for reference.\n\n'
-        ' SlideDeck AI does not have access to the Web, apart for searching for images relevant'
-        ' to the slides. Photos are added probabilistically; transparency needs to be changed'
-        ' manually, if required.\n\n'
-        '[SlideDeck AI](https://github.com/barun-saha/slide-deck-ai) is an Open-Source project,'
-        ' released under the'
-        ' [MIT license](https://github.com/barun-saha/slide-deck-ai?tab=MIT-1-ov-file#readme).'
-        '\n\n---\n\n'
-        '© Copyright 2023-2024 Barun Saha.\n\n'
-    )
-
-
-logging.basicConfig(
-    level=GlobalConfig.LOG_LEVEL,
-    format='%(asctime)s - %(levelname)s - %(name)s - %(message)s',
-    datefmt='%Y-%m-%d %H:%M:%S'
-)
-
-
-def get_max_output_tokens(llm_name: str) -> int:
-    """
-    Get the max output tokens value configured for an LLM. Return a default value if not configured.
-
-    :param llm_name: The name of the LLM.
-    :return: Max output tokens or a default count.
-    """
-
-    try:
-        return GlobalConfig.VALID_MODELS[llm_name]['max_new_tokens']
-    except KeyError:
-        return 2048
 
 @dataclass(frozen=True)
 class GlobalConfig:
+    HF_LLM_MODEL_NAME = 'mistralai/Mistral-7B-Instruct-v0.2'
+    LLM_MODEL_TEMPERATURE: float = 0.2
+    LLM_MODEL_MIN_OUTPUT_LENGTH: int = 50
+    LLM_MODEL_MAX_OUTPUT_LENGTH: int = 2000
+    LLM_MODEL_MAX_INPUT_LENGTH: int = 300
 
     HUGGINGFACEHUB_API_TOKEN = os.environ.get('HUGGINGFACEHUB_API_TOKEN', '')
+    METAPHOR_API_KEY = os.environ.get('METAPHOR_API_KEY', '')
 
     LOG_LEVEL = 'DEBUG'
     APP_STRINGS_FILE = 'strings.json'
     PRELOAD_DATA_FILE = 'examples/example_02.json'
     SLIDES_TEMPLATE_FILE = 'langchain_templates/template_combined.txt'
+    JSON_TEMPLATE_FILE = 'langchain_templates/text_to_json_template_02.txt'
 
     PPTX_TEMPLATE_FILES = {
+        'Blank': {
             'file': 'pptx_templates/Blank.pptx',
+            'caption': 'A good start'
         },
         'Ion Boardroom': {
             'file': 'pptx_templates/Ion_Boardroom.pptx',
+            'caption': 'Make some bold decisions'
         },
         'Urban Monochrome': {
             'file': 'pptx_templates/Urban_monochrome.pptx',
+            'caption': 'Marvel in a monochrome dream'
+        }
     }
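For context, here is a minimal, hypothetical sketch of how the environment-backed settings in the restored `global_config.py` above are typically checked at startup. The helper function below is an assumption for illustration; only the `GlobalConfig` field names come from the diff.

```python
# Hypothetical startup check (not part of this PR): verify that the API keys read by
# GlobalConfig from the environment are set before the app calls external services.
from global_config import GlobalConfig


def missing_api_keys() -> list:
    """Return the names of required API keys that are empty."""
    required = {
        'HUGGINGFACEHUB_API_TOKEN': GlobalConfig.HUGGINGFACEHUB_API_TOKEN,
        'METAPHOR_API_KEY': GlobalConfig.METAPHOR_API_KEY,
    }
    return [name for name, value in required.items() if not value]


if __name__ == '__main__':
    missing = missing_api_keys()
    if missing:
        print('Missing API keys:', ', '.join(missing))
    else:
        print('All required API keys are configured.')
```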
helpers/__init__.py
DELETED
File without changes
helpers/icons_embeddings.py
DELETED
@@ -1,166 +0,0 @@
"""
Generate and save the embeddings of a pre-defined list of icons.
Compare them with keywords embeddings to find most relevant icons.
"""
import os
import pathlib
import sys
from typing import List, Tuple

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from transformers import BertTokenizer, BertModel

sys.path.append('..')
sys.path.append('../..')

from global_config import GlobalConfig


tokenizer = BertTokenizer.from_pretrained(GlobalConfig.TINY_BERT_MODEL)
model = BertModel.from_pretrained(GlobalConfig.TINY_BERT_MODEL)


def get_icons_list() -> List[str]:
    """
    Get a list of available icons.

    :return: The icons file names.
    """

    items = pathlib.Path('../' + GlobalConfig.ICONS_DIR).glob('*.png')
    items = [
        os.path.basename(str(item)).removesuffix('.png') for item in items
    ]

    return items


def get_embeddings(texts) -> np.ndarray:
    """
    Generate embeddings for a list of texts using a pre-trained language model.

    :param texts: A string or a list of strings to be converted into embeddings.
    :type texts: Union[str, List[str]]
    :return: A NumPy array containing the embeddings for the input texts.
    :rtype: numpy.ndarray

    :raises ValueError: If the input is not a string or a list of strings, or if any element
    in the list is not a string.

    Example usage:
    >>> keyword = 'neural network'
    >>> file_names = ['neural_network_icon.png', 'data_analysis_icon.png', 'machine_learning.png']
    >>> keyword_embeddings = get_embeddings(keyword)
    >>> file_name_embeddings = get_embeddings(file_names)
    """

    inputs = tokenizer(texts, return_tensors='pt', padding=True, max_length=128, truncation=True)
    outputs = model(**inputs)

    return outputs.last_hidden_state.mean(dim=1).detach().numpy()


def save_icons_embeddings():
    """
    Generate and save the embeddings for the icon file names.
    """

    file_names = get_icons_list()
    print(f'{len(file_names)} icon files available...')
    file_name_embeddings = get_embeddings(file_names)
    print(f'file_name_embeddings.shape: {file_name_embeddings.shape}')

    # Save embeddings to a file
    np.save(GlobalConfig.EMBEDDINGS_FILE_NAME, file_name_embeddings)
    np.save(GlobalConfig.ICONS_FILE_NAME, file_names)  # Save file names for reference


def load_saved_embeddings() -> Tuple[np.ndarray, np.ndarray]:
    """
    Load precomputed embeddings and icons file names.

    :return: The embeddings and the icon file names.
    """

    file_name_embeddings = np.load(GlobalConfig.EMBEDDINGS_FILE_NAME)
    file_names = np.load(GlobalConfig.ICONS_FILE_NAME)

    return file_name_embeddings, file_names


def find_icons(keywords: List[str]) -> List[str]:
    """
    Find relevant icon file names for a list of keywords.

    :param keywords: The list of one or more keywords.
    :return: A list of the file names relevant for each keyword.
    """

    keyword_embeddings = get_embeddings(keywords)
    file_name_embeddings, file_names = load_saved_embeddings()

    # Compute similarity
    similarities = cosine_similarity(keyword_embeddings, file_name_embeddings)
    icon_files = file_names[np.argmax(similarities, axis=-1)]

    return icon_files


def main():
    """
    Example usage.
    """

    # Run this again if icons are to be added/removed
    save_icons_embeddings()

    keywords = [
        'deep learning',
        '',
        'recycling',
        'handshake',
        'Ferry',
        'rain drop',
        'speech bubble',
        'mental resilience',
        'turmeric',
        'Art',
        'price tag',
        'Oxygen',
        'oxygen',
        'Social Connection',
        'Accomplishment',
        'Python',
        'XML',
        'Handshake',
    ]
    icon_files = find_icons(keywords)
    print(
        f'The relevant icon files are:\n'
        f'{list(zip(keywords, icon_files))}'
    )

    # BERT tiny:
    # [('deep learning', 'deep-learning'), ('', '123'), ('recycling', 'refinery'),
    # ('handshake', 'dash-circle'), ('Ferry', 'cart'), ('rain drop', 'bucket'),
    # ('speech bubble', 'globe'), ('mental resilience', 'exclamation-triangle'),
    # ('turmeric', 'kebab'), ('Art', 'display'), ('price tag', 'bug-fill'),
    # ('Oxygen', 'radioactive')]

    # BERT mini
    # [('deep learning', 'deep-learning'), ('', 'compass'), ('recycling', 'tools'),
    # ('handshake', 'bandaid'), ('Ferry', 'cart'), ('rain drop', 'trash'),
    # ('speech bubble', 'image'), ('mental resilience', 'recycle'), ('turmeric', 'linkedin'),
    # ('Art', 'book'), ('price tag', 'card-image'), ('Oxygen', 'radioactive')]

    # BERT small
    # [('deep learning', 'deep-learning'), ('', 'gem'), ('recycling', 'tools'),
    # ('handshake', 'handbag'), ('Ferry', 'truck'), ('rain drop', 'bucket'),
    # ('speech bubble', 'strategy'), ('mental resilience', 'deep-learning'),
    # ('turmeric', 'flower'),
    # ('Art', 'book'), ('price tag', 'hotdog'), ('Oxygen', 'radioactive')]


if __name__ == '__main__':
    main()
helpers/image_search.py
DELETED
@@ -1,148 +0,0 @@
"""
Search photos using Pexels API.
"""
import logging
import os
import random
from io import BytesIO
from typing import Union, Tuple, Literal
from urllib.parse import urlparse, parse_qs

import requests
from dotenv import load_dotenv


load_dotenv()


REQUEST_TIMEOUT = 12
MAX_PHOTOS = 3


# Only show errors
logging.getLogger('urllib3').setLevel(logging.ERROR)
# Disable all child loggers of urllib3, e.g. urllib3.connectionpool
# logging.getLogger('urllib3').propagate = True


def search_pexels(
        query: str,
        size: Literal['small', 'medium', 'large'] = 'medium',
        per_page: int = MAX_PHOTOS
) -> dict:
    """
    Searches for images on Pexels using the provided query.

    This function sends a GET request to the Pexels API with the specified search query
    and authorization header containing the API key. It returns the JSON response from the API.

    [2024-08-31] Note:
    `curl` succeeds but API call via Python `requests` fail. Apparently, this could be due to
    Cloudflare (or others) blocking the requests, perhaps identifying as Web-scraping. So,
    changing the user-agent to Firefox.
    https://stackoverflow.com/a/74674276/147021
    https://stackoverflow.com/a/51268523/147021
    https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent/Firefox#linux

    :param query: The search query for finding images.
    :param size: The size of the images: small, medium, or large.
    :param per_page: No. of results to be displayed per page.
    :return: The JSON response from the Pexels API containing search results.
    :raises requests.exceptions.RequestException: If the request to the Pexels API fails.
    """

    url = 'https://api.pexels.com/v1/search'
    headers = {
        'Authorization': os.getenv('PEXEL_API_KEY'),
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20100101 Firefox/10.0',
    }
    params = {
        'query': query,
        'size': size,
        'page': 1,
        'per_page': per_page
    }
    response = requests.get(url, headers=headers, params=params, timeout=REQUEST_TIMEOUT)
    response.raise_for_status()  # Ensure the request was successful

    return response.json()


def get_photo_url_from_api_response(
        json_response: dict
) -> Tuple[Union[str, None], Union[str, None]]:
    """
    Return a randomly chosen photo from a Pexels search API response. In addition, also return
    the original URL of the page on Pexels.

    :param json_response: The JSON response.
    :return: The selected photo URL and page URL or `None`.
    """

    page_url = None
    photo_url = None

    if 'photos' in json_response:
        photos = json_response['photos']

        if photos:
            photo_idx = random.choice(list(range(MAX_PHOTOS)))
            photo = photos[photo_idx]

            if 'url' in photo:
                page_url = photo['url']

            if 'src' in photo:
                if 'large' in photo['src']:
                    photo_url = photo['src']['large']
                elif 'original' in photo['src']:
                    photo_url = photo['src']['original']

    return photo_url, page_url


def get_image_from_url(url: str) -> BytesIO:
    """
    Fetches an image from the specified URL and returns it as a BytesIO object.

    This function sends a GET request to the provided URL, retrieves the image data,
    and wraps it in a BytesIO object, which can be used like a file.

    :param url: The URL of the image to be fetched.
    :return: A BytesIO object containing the image data.
    :raises requests.exceptions.RequestException: If the request to the URL fails.
    """

    headers = {
        'Authorization': os.getenv('PEXEL_API_KEY'),
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20100101 Firefox/10.0',
    }
    response = requests.get(url, headers=headers, stream=True, timeout=REQUEST_TIMEOUT)
    response.raise_for_status()
    image_data = BytesIO(response.content)

    return image_data


def extract_dimensions(url: str) -> Tuple[int, int]:
    """
    Extracts the height and width from the URL parameters.

    :param url: The URL containing the image dimensions.
    :return: A tuple containing the width and height as integers.
    """
    parsed_url = urlparse(url)
    query_params = parse_qs(parsed_url.query)
    width = int(query_params.get('w', [0])[0])
    height = int(query_params.get('h', [0])[0])

    return width, height


if __name__ == '__main__':
    print(
        search_pexels(
            query='people'
        )
    )
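Note (not part of the diff): a minimal sketch of how the removed Pexels helpers chain together. The search query is a placeholder, and a valid PEXEL_API_KEY environment variable is assumed to be available.

# Hypothetical usage of the removed helpers/image_search.py module
import helpers.image_search as ims

results = ims.search_pexels(query='mountain sunrise', size='medium')
photo_url, page_url = ims.get_photo_url_from_api_response(results)

if photo_url:
    image_data = ims.get_image_from_url(photo_url)   # BytesIO, usable like a file
    width, height = ims.extract_dimensions(photo_url)
    print(f'Fetched a {width}x{height} photo; credit page: {page_url}')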
helpers/llm_helper.py
DELETED
@@ -1,201 +0,0 @@
"""
Helper functions to access LLMs.
"""
import logging
import re
import sys
from typing import Tuple, Union

import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry
from langchain_core.language_models import BaseLLM

sys.path.append('..')

from global_config import GlobalConfig


LLM_PROVIDER_MODEL_REGEX = re.compile(r'\[(.*?)\](.*)')
OLLAMA_MODEL_REGEX = re.compile(r'[a-zA-Z0-9._:-]+$')
# 6-64 characters long, only containing alphanumeric characters, hyphens, and underscores
API_KEY_REGEX = re.compile(r'^[a-zA-Z0-9_-]{6,64}$')
HF_API_HEADERS = {'Authorization': f'Bearer {GlobalConfig.HUGGINGFACEHUB_API_TOKEN}'}
REQUEST_TIMEOUT = 35

logger = logging.getLogger(__name__)
logging.getLogger('httpx').setLevel(logging.WARNING)
logging.getLogger('httpcore').setLevel(logging.WARNING)

retries = Retry(
    total=5,
    backoff_factor=0.25,
    backoff_jitter=0.3,
    status_forcelist=[502, 503, 504],
    allowed_methods={'POST'},
)
adapter = HTTPAdapter(max_retries=retries)
http_session = requests.Session()
http_session.mount('https://', adapter)
http_session.mount('http://', adapter)


def get_provider_model(provider_model: str, use_ollama: bool) -> Tuple[str, str]:
    """
    Parse and get LLM provider and model name from strings like `[provider]model/name-version`.

    :param provider_model: The provider, model name string from `GlobalConfig`.
    :param use_ollama: Whether Ollama is used (i.e., running in offline mode).
    :return: The provider and the model name; empty strings in case no matching pattern found.
    """

    provider_model = provider_model.strip()

    if use_ollama:
        match = OLLAMA_MODEL_REGEX.match(provider_model)
        if match:
            return GlobalConfig.PROVIDER_OLLAMA, match.group(0)
    else:
        match = LLM_PROVIDER_MODEL_REGEX.match(provider_model)

        if match:
            inside_brackets = match.group(1)
            outside_brackets = match.group(2)
            return inside_brackets, outside_brackets

    return '', ''


def is_valid_llm_provider_model(provider: str, model: str, api_key: str) -> bool:
    """
    Verify whether LLM settings are proper.
    This function does not verify whether `api_key` is correct. It only confirms that the key has
    at least five characters. Key verification is done when the LLM is created.

    :param provider: Name of the LLM provider.
    :param model: Name of the model.
    :param api_key: The API key or access token.
    :return: `True` if the settings "look" OK; `False` otherwise.
    """

    if not provider or not model or provider not in GlobalConfig.VALID_PROVIDERS:
        return False

    if provider in [
        GlobalConfig.PROVIDER_GOOGLE_GEMINI,
        GlobalConfig.PROVIDER_COHERE,
        GlobalConfig.PROVIDER_TOGETHER_AI,
    ] and not api_key:
        return False

    if api_key:
        return API_KEY_REGEX.match(api_key) is not None

    return True


def get_langchain_llm(
        provider: str,
        model: str,
        max_new_tokens: int,
        api_key: str = ''
) -> Union[BaseLLM, None]:
    """
    Get an LLM based on the provider and model specified.

    :param provider: The LLM provider. Valid values are `hf` for Hugging Face.
    :param model: The name of the LLM.
    :param max_new_tokens: The maximum number of tokens to generate.
    :param api_key: API key or access token to use.
    :return: An instance of the LLM or `None` in case of any error.
    """

    if provider == GlobalConfig.PROVIDER_HUGGING_FACE:
        from langchain_community.llms.huggingface_endpoint import HuggingFaceEndpoint

        logger.debug('Getting LLM via HF endpoint: %s', model)
        return HuggingFaceEndpoint(
            repo_id=model,
            max_new_tokens=max_new_tokens,
            top_k=40,
            top_p=0.95,
            temperature=GlobalConfig.LLM_MODEL_TEMPERATURE,
            repetition_penalty=1.03,
            streaming=True,
            huggingfacehub_api_token=api_key or GlobalConfig.HUGGINGFACEHUB_API_TOKEN,
            return_full_text=False,
            stop_sequences=['</s>'],
        )

    if provider == GlobalConfig.PROVIDER_GOOGLE_GEMINI:
        from google.generativeai.types.safety_types import HarmBlockThreshold, HarmCategory
        from langchain_google_genai import GoogleGenerativeAI

        logger.debug('Getting LLM via Google Gemini: %s', model)
        return GoogleGenerativeAI(
            model=model,
            temperature=GlobalConfig.LLM_MODEL_TEMPERATURE,
            max_tokens=max_new_tokens,
            timeout=None,
            max_retries=2,
            google_api_key=api_key,
            safety_settings={
                HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT:
                    HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
                HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
                HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
                HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT:
                    HarmBlockThreshold.BLOCK_LOW_AND_ABOVE
            }
        )

    if provider == GlobalConfig.PROVIDER_COHERE:
        from langchain_cohere.llms import Cohere

        logger.debug('Getting LLM via Cohere: %s', model)
        return Cohere(
            temperature=GlobalConfig.LLM_MODEL_TEMPERATURE,
            max_tokens=max_new_tokens,
            timeout_seconds=None,
            max_retries=2,
            cohere_api_key=api_key,
            streaming=True,
        )

    if provider == GlobalConfig.PROVIDER_TOGETHER_AI:
        from langchain_together import Together

        logger.debug('Getting LLM via Together AI: %s', model)
        return Together(
            model=model,
            temperature=GlobalConfig.LLM_MODEL_TEMPERATURE,
            together_api_key=api_key,
            max_tokens=max_new_tokens,
            top_k=40,
            top_p=0.90,
        )

    if provider == GlobalConfig.PROVIDER_OLLAMA:
        from langchain_ollama.llms import OllamaLLM

        logger.debug('Getting LLM via Ollama: %s', model)
        return OllamaLLM(
            model=model,
            temperature=GlobalConfig.LLM_MODEL_TEMPERATURE,
            num_predict=max_new_tokens,
            format='json',
            streaming=True,
        )

    return None


if __name__ == '__main__':
    inputs = [
        '[co]Cohere',
        '[hf]mistralai/Mistral-7B-Instruct-v0.2',
        '[gg]gemini-1.5-flash-002'
    ]

    for text in inputs:
        print(get_provider_model(text, use_ollama=False))
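Note (not part of the diff): a hypothetical sketch of how the removed helpers/llm_helper.py module was meant to be called. The provider-model string, token budget, and API key below are placeholders, not real credentials, and the provider code is assumed to be listed in GlobalConfig.VALID_PROVIDERS.

# Hypothetical usage of the removed helpers/llm_helper.py module
import helpers.llm_helper as llm_helper

provider, model = llm_helper.get_provider_model(
    '[gg]gemini-1.5-flash-002', use_ollama=False
)  # -> ('gg', 'gemini-1.5-flash-002')

if llm_helper.is_valid_llm_provider_model(provider, model, api_key='placeholder-key-123'):
    llm = llm_helper.get_langchain_llm(
        provider=provider,
        model=model,
        max_new_tokens=2048,
        api_key='placeholder-key-123',
    )
    if llm:
        print(llm.invoke('Suggest a title for a slide deck on renewable energy.'))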
helpers/pptx_helper.py
DELETED
@@ -1,987 +0,0 @@
"""
A set of functions to create a PowerPoint slide deck.
"""
import logging
import os
import pathlib
import random
import re
import sys
import tempfile
from typing import List, Tuple, Optional

import json5
import pptx
from dotenv import load_dotenv
from pptx.enum.shapes import MSO_AUTO_SHAPE_TYPE
from pptx.shapes.placeholder import PicturePlaceholder, SlidePlaceholder

sys.path.append('..')
sys.path.append('../..')

import helpers.icons_embeddings as ice
import helpers.image_search as ims
from global_config import GlobalConfig


load_dotenv()


# English Metric Unit (used by PowerPoint) to inches
EMU_TO_INCH_SCALING_FACTOR = 1.0 / 914400
INCHES_3 = pptx.util.Inches(3)
INCHES_2 = pptx.util.Inches(2)
INCHES_1_5 = pptx.util.Inches(1.5)
INCHES_1 = pptx.util.Inches(1)
INCHES_0_8 = pptx.util.Inches(0.8)
INCHES_0_9 = pptx.util.Inches(0.9)
INCHES_0_5 = pptx.util.Inches(0.5)
INCHES_0_4 = pptx.util.Inches(0.4)
INCHES_0_3 = pptx.util.Inches(0.3)
INCHES_0_2 = pptx.util.Inches(0.2)

STEP_BY_STEP_PROCESS_MARKER = '>> '
ICON_BEGINNING_MARKER = '[['
ICON_END_MARKER = ']]'

ICON_SIZE = INCHES_0_8
ICON_BG_SIZE = INCHES_1

IMAGE_DISPLAY_PROBABILITY = 1 / 3.0
FOREGROUND_IMAGE_PROBABILITY = 0.8

SLIDE_NUMBER_REGEX = re.compile(r"^slide[ ]+\d+:", re.IGNORECASE)
ICONS_REGEX = re.compile(r"\[\[(.*?)\]\]\s*(.*)")

ICON_COLORS = [
    pptx.dml.color.RGBColor.from_string('800000'),  # Maroon
    pptx.dml.color.RGBColor.from_string('6A5ACD'),  # SlateBlue
    pptx.dml.color.RGBColor.from_string('556B2F'),  # DarkOliveGreen
    pptx.dml.color.RGBColor.from_string('2F4F4F'),  # DarkSlateGray
    pptx.dml.color.RGBColor.from_string('4682B4'),  # SteelBlue
    pptx.dml.color.RGBColor.from_string('5F9EA0'),  # CadetBlue
]


logger = logging.getLogger(__name__)
logging.getLogger('PIL.PngImagePlugin').setLevel(logging.ERROR)


def remove_slide_number_from_heading(header: str) -> str:
    """
    Remove the slide number from a given slide header.

    :param header: The header of a slide.
    :return: The header without slide number.
    """

    if SLIDE_NUMBER_REGEX.match(header):
        idx = header.find(':')
        header = header[idx + 1:]

    return header


def generate_powerpoint_presentation(
        parsed_data: dict,
        slides_template: str,
        output_file_path: pathlib.Path
) -> List:
    """
    Create and save a PowerPoint presentation file containing the content in JSON format.

    :param parsed_data: The presentation content as parsed JSON data.
    :param slides_template: The PPTX template to use.
    :param output_file_path: The path of the PPTX file to save as.
    :return: A list of presentation title and slides headers.
    """

    presentation = pptx.Presentation(GlobalConfig.PPTX_TEMPLATE_FILES[slides_template]['file'])
    slide_width_inch, slide_height_inch = _get_slide_width_height_inches(presentation)

    # The title slide
    title_slide_layout = presentation.slide_layouts[0]
    slide = presentation.slides.add_slide(title_slide_layout)
    title = slide.shapes.title
    subtitle = slide.placeholders[1]
    title.text = parsed_data['title']
    logger.info(
        'PPT title: %s | #slides: %d | template: %s',
        title.text, len(parsed_data['slides']),
        GlobalConfig.PPTX_TEMPLATE_FILES[slides_template]['file']
    )
    subtitle.text = 'by Myself and SlideDeck AI :)'
    all_headers = [title.text, ]

    # Add content in a loop
    for a_slide in parsed_data['slides']:
        try:
            is_processing_done = _handle_icons_ideas(
                presentation=presentation,
                slide_json=a_slide,
                slide_width_inch=slide_width_inch,
                slide_height_inch=slide_height_inch
            )

            if not is_processing_done:
                is_processing_done = _handle_double_col_layout(
                    presentation=presentation,
                    slide_json=a_slide,
                    slide_width_inch=slide_width_inch,
                    slide_height_inch=slide_height_inch
                )

            if not is_processing_done:
                is_processing_done = _handle_step_by_step_process(
                    presentation=presentation,
                    slide_json=a_slide,
                    slide_width_inch=slide_width_inch,
                    slide_height_inch=slide_height_inch
                )

            if not is_processing_done:
                _handle_default_display(
                    presentation=presentation,
                    slide_json=a_slide,
                    slide_width_inch=slide_width_inch,
                    slide_height_inch=slide_height_inch
                )

        except Exception:
            # In case of any unforeseen error, try to salvage what is available
            continue

    # The thank-you slide
    last_slide_layout = presentation.slide_layouts[0]
    slide = presentation.slides.add_slide(last_slide_layout)
    title = slide.shapes.title
    title.text = 'Thank you!'

    presentation.save(output_file_path)

    return all_headers


def get_flat_list_of_contents(items: list, level: int) -> List[Tuple]:
    """
    Flatten a (hierarchical) list of bullet points to a single list containing each item and
    its level.

    :param items: A bullet point (string or list).
    :param level: The current level of hierarchy.
    :return: A list of (bullet item text, hierarchical level) tuples.
    """

    flat_list = []

    for item in items:
        if isinstance(item, str):
            flat_list.append((item, level))
        elif isinstance(item, list):
            flat_list = flat_list + get_flat_list_of_contents(item, level + 1)

    return flat_list


def get_slide_placeholders(
        slide: pptx.slide.Slide,
        layout_number: int,
        is_debug: bool = False
) -> List[Tuple[int, str]]:
    """
    Return the index and name (lower case) of all placeholders present in a slide, except
    the title placeholder.

    A placeholder in a slide is a place to add content. Each placeholder has a name and an index.
    This index is NOT a list index, rather a set of keys used to look up a dict. So, `idx` is
    non-contiguous. Also, the title placeholder of a slide always has index 0. User-added
    placeholder get indices assigned starting from 10.

    With user-edited or added placeholders, their index may be difficult to track. This function
    returns the placeholders name as well, which could be useful to distinguish between the
    different placeholder.

    :param slide: The slide.
    :param layout_number: The layout number used by the slide.
    :param is_debug: Whether to print debugging statements.
    :return: A list containing placeholders (idx, name) tuples, except the title placeholder.
    """

    if is_debug:
        print(
            f'Slide layout #{layout_number}:'
            f' # of placeholders: {len(slide.shapes.placeholders)} (including the title)'
        )

    placeholders = [
        (shape.placeholder_format.idx, shape.name.lower()) for shape in slide.shapes.placeholders
    ]
    placeholders.pop(0)  # Remove the title placeholder

    if is_debug:
        print(placeholders)

    return placeholders


def _handle_default_display(
        presentation: pptx.Presentation,
        slide_json: dict,
        slide_width_inch: float,
        slide_height_inch: float
):
    """
    Display a list of text in a slide.

    :param presentation: The presentation object.
    :param slide_json: The content of the slide as JSON data.
    :param slide_width_inch: The width of the slide in inches.
    :param slide_height_inch: The height of the slide in inches.
    """

    status = False

    if 'img_keywords' in slide_json:
        if random.random() < IMAGE_DISPLAY_PROBABILITY:
            if random.random() < FOREGROUND_IMAGE_PROBABILITY:
                status = _handle_display_image__in_foreground(
                    presentation,
                    slide_json,
                    slide_width_inch,
                    slide_height_inch
                )
            else:
                status = _handle_display_image__in_background(
                    presentation,
                    slide_json,
                    slide_width_inch,
                    slide_height_inch
                )

    if status:
        return

    # Image display failed, so display only text
    bullet_slide_layout = presentation.slide_layouts[1]
    slide = presentation.slides.add_slide(bullet_slide_layout)

    shapes = slide.shapes
    title_shape = shapes.title

    try:
        body_shape = shapes.placeholders[1]
    except KeyError:
        placeholders = get_slide_placeholders(slide, layout_number=1)
        body_shape = shapes.placeholders[placeholders[0][0]]

    title_shape.text = remove_slide_number_from_heading(slide_json['heading'])
    text_frame = body_shape.text_frame

    # The bullet_points may contain a nested hierarchy of JSON arrays
    # In some scenarios, it may contain objects (dictionaries) because the LLM generated so
    # ^ The second scenario is not covered

    flat_items_list = get_flat_list_of_contents(slide_json['bullet_points'], level=0)

    for idx, an_item in enumerate(flat_items_list):
        if idx == 0:
            text_frame.text = an_item[0].removeprefix(STEP_BY_STEP_PROCESS_MARKER)
        else:
            paragraph = text_frame.add_paragraph()
            paragraph.text = an_item[0].removeprefix(STEP_BY_STEP_PROCESS_MARKER)
            paragraph.level = an_item[1]

    _handle_key_message(
        the_slide=slide,
        slide_json=slide_json,
        slide_height_inch=slide_height_inch,
        slide_width_inch=slide_width_inch
    )


def _handle_display_image__in_foreground(
        presentation: pptx.Presentation(),
        slide_json: dict,
        slide_width_inch: float,
        slide_height_inch: float
) -> bool:
    """
    Create a slide with text and image using a picture placeholder layout. If not image keyword is
    available, it will add only text to the slide.

    :param presentation: The presentation object.
    :param slide_json: The content of the slide as JSON data.
    :param slide_width_inch: The width of the slide in inches.
    :param slide_height_inch: The height of the slide in inches.
    :return: True if the side has been processed.
    """

    img_keywords = slide_json['img_keywords'].strip()
    slide = presentation.slide_layouts[8]  # Picture with Caption
    slide = presentation.slides.add_slide(slide)
    placeholders = None

    title_placeholder = slide.shapes.title
    title_placeholder.text = remove_slide_number_from_heading(slide_json['heading'])

    try:
        pic_col: PicturePlaceholder = slide.shapes.placeholders[1]
    except KeyError:
        placeholders = get_slide_placeholders(slide, layout_number=8)
        pic_col = None
        for idx, name in placeholders:
            if 'picture' in name:
                pic_col: PicturePlaceholder = slide.shapes.placeholders[idx]

    try:
        text_col: SlidePlaceholder = slide.shapes.placeholders[2]
    except KeyError:
        text_col = None
        if not placeholders:
            placeholders = get_slide_placeholders(slide, layout_number=8)

        for idx, name in placeholders:
            if 'content' in name:
                text_col: SlidePlaceholder = slide.shapes.placeholders[idx]

    flat_items_list = get_flat_list_of_contents(slide_json['bullet_points'], level=0)

    for idx, an_item in enumerate(flat_items_list):
        if idx == 0:
            text_col.text_frame.text = an_item[0].removeprefix(STEP_BY_STEP_PROCESS_MARKER)
        else:
            paragraph = text_col.text_frame.add_paragraph()
            paragraph.text = an_item[0].removeprefix(STEP_BY_STEP_PROCESS_MARKER)
            paragraph.level = an_item[1]

    if not img_keywords:
        # No keywords, so no image search and addition
        return True

    try:
        photo_url, page_url = ims.get_photo_url_from_api_response(
            ims.search_pexels(query=img_keywords, size='medium')
        )

        if photo_url:
            pic_col.insert_picture(
                ims.get_image_from_url(photo_url)
            )

            _add_text_at_bottom(
                slide=slide,
                slide_width_inch=slide_width_inch,
                slide_height_inch=slide_height_inch,
                text='Photo provided by Pexels',
                hyperlink=page_url
            )
    except Exception as ex:
        logger.error(
            '*** Error occurred while running adding image to slide: %s',
            str(ex)
        )

    return True


def _handle_display_image__in_background(
        presentation: pptx.Presentation(),
        slide_json: dict,
        slide_width_inch: float,
        slide_height_inch: float
) -> bool:
    """
    Add a slide with text and an image in the background. It works just like
    `_handle_default_display()` but with a background image added. If not image keyword is
    available, it will add only text to the slide.

    :param presentation: The presentation object.
    :param slide_json: The content of the slide as JSON data.
    :param slide_width_inch: The width of the slide in inches.
    :param slide_height_inch: The height of the slide in inches.
    :return: True if the slide has been processed.
    """

    img_keywords = slide_json['img_keywords'].strip()

    # Add a photo in the background, text in the foreground
    slide = presentation.slides.add_slide(presentation.slide_layouts[1])
    title_shape = slide.shapes.title

    try:
        body_shape = slide.shapes.placeholders[1]
    except KeyError:
        placeholders = get_slide_placeholders(slide, layout_number=1)
        # Layout 1 usually has two placeholders, including the title
        body_shape = slide.shapes.placeholders[placeholders[0][0]]

    title_shape.text = remove_slide_number_from_heading(slide_json['heading'])

    flat_items_list = get_flat_list_of_contents(slide_json['bullet_points'], level=0)

    for idx, an_item in enumerate(flat_items_list):
        if idx == 0:
            body_shape.text_frame.text = an_item[0].removeprefix(STEP_BY_STEP_PROCESS_MARKER)
        else:
            paragraph = body_shape.text_frame.add_paragraph()
            paragraph.text = an_item[0].removeprefix(STEP_BY_STEP_PROCESS_MARKER)
            paragraph.level = an_item[1]

    if not img_keywords:
        # No keywords, so no image search and addition
        return True

    try:
        photo_url, page_url = ims.get_photo_url_from_api_response(
            ims.search_pexels(query=img_keywords, size='large')
        )

        if photo_url:
            picture = slide.shapes.add_picture(
                image_file=ims.get_image_from_url(photo_url),
                left=0,
                top=0,
                width=pptx.util.Inches(slide_width_inch),
            )

            _add_text_at_bottom(
                slide=slide,
                slide_width_inch=slide_width_inch,
                slide_height_inch=slide_height_inch,
                text='Photo provided by Pexels',
                hyperlink=page_url
            )

            # Move picture to background
            # https://github.com/scanny/python-pptx/issues/49#issuecomment-137172836
            slide.shapes._spTree.remove(picture._element)
            slide.shapes._spTree.insert(2, picture._element)
    except Exception as ex:
        logger.error(
            '*** Error occurred while running adding image to the slide background: %s',
            str(ex)
        )

    return True


def _handle_icons_ideas(
        presentation: pptx.Presentation(),
        slide_json: dict,
        slide_width_inch: float,
        slide_height_inch: float
):
    """
    Add a slide with some icons and text.
    If no suitable icons are found, the step numbers are shown.

    :param presentation: The presentation object.
    :param slide_json: The content of the slide as JSON data.
    :param slide_width_inch: The width of the slide in inches.
    :param slide_height_inch: The height of the slide in inches.
    :return: True if the slide has been processed.
    """

    if 'bullet_points' in slide_json and slide_json['bullet_points']:
        items = slide_json['bullet_points']

        # Ensure that it is a single list of strings without any sub-list
        for step in items:
            if not isinstance(step, str) or not step.startswith(ICON_BEGINNING_MARKER):
                return False

        slide_layout = presentation.slide_layouts[5]
        slide = presentation.slides.add_slide(slide_layout)
        slide.shapes.title.text = remove_slide_number_from_heading(slide_json['heading'])

        n_items = len(items)
        text_box_size = INCHES_2

        # Calculate the total width of all pictures and the spacing
        total_width = n_items * ICON_SIZE
        spacing = (pptx.util.Inches(slide_width_inch) - total_width) / (n_items + 1)
        top = INCHES_3

        icons_texts = [
            (match.group(1), match.group(2)) for match in [
                ICONS_REGEX.search(item) for item in items
            ]
        ]
        fallback_icon_files = ice.find_icons([item[0] for item in icons_texts])

        for idx, item in enumerate(icons_texts):
            icon, accompanying_text = item
            icon_path = f'{GlobalConfig.ICONS_DIR}/{icon}.png'

            if not os.path.exists(icon_path):
                logger.warning(
                    'Icon not found: %s...using fallback icon: %s',
                    icon, fallback_icon_files[idx]
                )
                icon_path = f'{GlobalConfig.ICONS_DIR}/{fallback_icon_files[idx]}.png'

            left = spacing + idx * (ICON_SIZE + spacing)
            # Calculate the center position for alignment
            center = left + ICON_SIZE / 2

            # Add a rectangle shape with a fill color (background)
            # The size of the shape is slightly bigger than the icon, so align the icon position
            shape = slide.shapes.add_shape(
                MSO_AUTO_SHAPE_TYPE.ROUNDED_RECTANGLE,
                center - INCHES_0_5,
                top - (ICON_BG_SIZE - ICON_SIZE) / 2,
                INCHES_1, INCHES_1
            )
            shape.fill.solid()
            shape.shadow.inherit = False

            # Set the icon's background shape color
            shape.fill.fore_color.rgb = shape.line.color.rgb = random.choice(ICON_COLORS)

            # Add the icon image on top of the colored shape
            slide.shapes.add_picture(icon_path, left, top, height=ICON_SIZE)

            # Add a text box below the shape
            text_box = slide.shapes.add_shape(
                MSO_AUTO_SHAPE_TYPE.ROUNDED_RECTANGLE,
                left=center - text_box_size / 2,  # Center the text box horizontally
                top=top + ICON_SIZE + INCHES_0_2,
                width=text_box_size,
                height=text_box_size
            )
            text_frame = text_box.text_frame
            text_frame.text = accompanying_text
            text_frame.word_wrap = True
            text_frame.paragraphs[0].alignment = pptx.enum.text.PP_ALIGN.CENTER

            # Center the text vertically
            text_frame.vertical_anchor = pptx.enum.text.MSO_ANCHOR.MIDDLE
            text_box.fill.background()  # No fill
            text_box.line.fill.background()  # No line
            text_box.shadow.inherit = False

            # Set the font color based on the theme
            for paragraph in text_frame.paragraphs:
                for run in paragraph.runs:
                    run.font.color.theme_color = pptx.enum.dml.MSO_THEME_COLOR.TEXT_2

        _add_text_at_bottom(
            slide=slide,
            slide_width_inch=slide_width_inch,
            slide_height_inch=slide_height_inch,
            text='More icons available in the SlideDeck AI repository',
            hyperlink='https://github.com/barun-saha/slide-deck-ai/tree/main/icons/png128'
        )

        return True

    return False


def _add_text_at_bottom(
        slide: pptx.slide.Slide,
        slide_width_inch: float,
        slide_height_inch: float,
        text: str,
        hyperlink: Optional[str] = None,
        target_height: Optional[float] = 0.5
):
    """
    Add arbitrary text to a textbox positioned near the lower left side of a slide.

    :param slide: The slide.
    :param slide_width_inch: The width of the slide.
    :param slide_height_inch: The height of the slide.
    :param target_height: the target height of the box in inches (optional).
    :param text: The text to be added
    :param hyperlink: The hyperlink to be added to the text (optional).
    """

    footer = slide.shapes.add_textbox(
        left=INCHES_1,
        top=pptx.util.Inches(slide_height_inch - target_height),
        width=pptx.util.Inches(slide_width_inch),
        height=pptx.util.Inches(target_height)
    )

    paragraph = footer.text_frame.paragraphs[0]
    run = paragraph.add_run()
    run.text = text
    run.font.size = pptx.util.Pt(10)
    run.font.underline = False

    if hyperlink:
        run.hyperlink.address = hyperlink


def _handle_double_col_layout(
        presentation: pptx.Presentation(),
        slide_json: dict,
        slide_width_inch: float,
        slide_height_inch: float
) -> bool:
    """
    Add a slide with a double column layout for comparison.

    :param presentation: The presentation object.
    :param slide_json: The content of the slide as JSON data.
    :param slide_width_inch: The width of the slide in inches.
    :param slide_height_inch: The height of the slide in inches.
    :return: True if double col layout has been added; False otherwise.
    """

    if 'bullet_points' in slide_json and slide_json['bullet_points']:
        double_col_content = slide_json['bullet_points']

        if double_col_content and (
                len(double_col_content) == 2
        ) and isinstance(double_col_content[0], dict) and isinstance(double_col_content[1], dict):
            slide = presentation.slide_layouts[4]
            slide = presentation.slides.add_slide(slide)
            placeholders = None

            shapes = slide.shapes
            title_placeholder = shapes.title
            title_placeholder.text = remove_slide_number_from_heading(slide_json['heading'])

            try:
                left_heading, right_heading = shapes.placeholders[1], shapes.placeholders[3]
            except KeyError:
                # For manually edited/added master slides, the placeholder idx numbers in the dict
                # will be different (>= 10)
                left_heading, right_heading = None, None
                placeholders = get_slide_placeholders(slide, layout_number=4)

                for idx, name in placeholders:
                    if 'text placeholder' in name:
                        if not left_heading:
                            left_heading = shapes.placeholders[idx]
                        elif not right_heading:
                            right_heading = shapes.placeholders[idx]

            try:
                left_col, right_col = shapes.placeholders[2], shapes.placeholders[4]
            except KeyError:
                left_col, right_col = None, None
                if not placeholders:
                    placeholders = get_slide_placeholders(slide, layout_number=4)

                for idx, name in placeholders:
                    if 'content placeholder' in name:
                        if not left_col:
                            left_col = shapes.placeholders[idx]
                        elif not right_col:
                            right_col = shapes.placeholders[idx]

            left_col_frame, right_col_frame = left_col.text_frame, right_col.text_frame

            if 'heading' in double_col_content[0] and left_heading:
                left_heading.text = double_col_content[0]['heading']
            if 'bullet_points' in double_col_content[0]:
                flat_items_list = get_flat_list_of_contents(
                    double_col_content[0]['bullet_points'], level=0
                )

                if not left_heading:
                    left_col_frame.text = double_col_content[0]['heading']

                for idx, an_item in enumerate(flat_items_list):
                    if left_heading and idx == 0:
                        left_col_frame.text = an_item[0].removeprefix(STEP_BY_STEP_PROCESS_MARKER)
                    else:
                        paragraph = left_col_frame.add_paragraph()
                        paragraph.text = an_item[0].removeprefix(STEP_BY_STEP_PROCESS_MARKER)
                        paragraph.level = an_item[1]

            if 'heading' in double_col_content[1] and right_heading:
                right_heading.text = double_col_content[1]['heading']
            if 'bullet_points' in double_col_content[1]:
                flat_items_list = get_flat_list_of_contents(
                    double_col_content[1]['bullet_points'], level=0
                )

                if not right_heading:
                    right_col_frame.text = double_col_content[1]['heading']

                for idx, an_item in enumerate(flat_items_list):
                    if right_col_frame and idx == 0:
                        right_col_frame.text = an_item[0].removeprefix(STEP_BY_STEP_PROCESS_MARKER)
                    else:
                        paragraph = right_col_frame.add_paragraph()
                        paragraph.text = an_item[0].removeprefix(STEP_BY_STEP_PROCESS_MARKER)
                        paragraph.level = an_item[1]

            _handle_key_message(
                the_slide=slide,
                slide_json=slide_json,
                slide_height_inch=slide_height_inch,
                slide_width_inch=slide_width_inch
            )

            return True

    return False


def _handle_step_by_step_process(
        presentation: pptx.Presentation,
        slide_json: dict,
        slide_width_inch: float,
        slide_height_inch: float
) -> bool:
    """
    Add shapes to display a step-by-step process in the slide, if available.

    :param presentation: The presentation object.
    :param slide_json: The content of the slide as JSON data.
    :param slide_width_inch: The width of the slide in inches.
    :param slide_height_inch: The height of the slide in inches.
    :return True if this slide has a step-by-step process depiction added; False otherwise.
    """

    if 'bullet_points' in slide_json and slide_json['bullet_points']:
        steps = slide_json['bullet_points']

        no_marker_count = 0.0
        n_steps = len(steps)

        # Ensure that it is a single list of strings without any sub-list
        for step in steps:
            if not isinstance(step, str):
                return False

            # In some cases, one or two steps may not begin with >>, e.g.:
            # {
            #     "heading": "Step-by-Step Process: Creating a Legacy",
            #     "bullet_points": [
            #         "Identify your unique talents and passions",
            #         ">> Develop your skills and knowledge",
            #         ">> Create meaningful work",
            #         ">> Share your work with the world",
            #         ">> Continuously learn and adapt"
            #     ],
            #     "key_message": ""
            # },
            #
            # Use a threshold, e.g., at most 20%
            if not step.startswith(STEP_BY_STEP_PROCESS_MARKER):
                no_marker_count += 1

        slide_header = slide_json['heading'].lower()
        if (no_marker_count / n_steps > 0.25) and not (
                ('step-by-step' in slide_header) or ('step by step' in slide_header)
        ):
            return False

        if n_steps < 3 or n_steps > 6:
            # Two steps -- probably not a process
            # More than 5--6 steps -- would likely cause a visual clutter
            return False

        bullet_slide_layout = presentation.slide_layouts[1]
        slide = presentation.slides.add_slide(bullet_slide_layout)
        shapes = slide.shapes
        shapes.title.text = remove_slide_number_from_heading(slide_json['heading'])

        if 3 <= n_steps <= 4:
            # Horizontal display
            height = INCHES_1_5
            width = pptx.util.Inches(slide_width_inch / n_steps - 0.01)
            top = pptx.util.Inches(slide_height_inch / 2)
            left = pptx.util.Inches((slide_width_inch - width.inches * n_steps) / 2 + 0.05)

            for step in steps:
                shape = shapes.add_shape(MSO_AUTO_SHAPE_TYPE.CHEVRON, left, top, width, height)
                shape.text = step.removeprefix(STEP_BY_STEP_PROCESS_MARKER)
                left += width - INCHES_0_4
        elif 4 < n_steps <= 6:
            # Vertical display
            height = pptx.util.Inches(0.65)
            top = pptx.util.Inches(slide_height_inch / 4)
            left = INCHES_1  # slide_width_inch - width.inches)

            # Find the close to median width, based on the length of each text, to be set
            # for the shapes
            width = pptx.util.Inches(slide_width_inch * 2 / 3)
            lengths = [len(step) for step in steps]
            font_size_20pt = pptx.util.Pt(20)
            widths = sorted(
                [
                    min(
                        pptx.util.Inches(font_size_20pt.inches * a_len),
                        width
                    ) for a_len in lengths
                ]
            )
            width = widths[len(widths) // 2]

            for step in steps:
                shape = shapes.add_shape(MSO_AUTO_SHAPE_TYPE.PENTAGON, left, top, width, height)
                shape.text = step.removeprefix(STEP_BY_STEP_PROCESS_MARKER)
                top += height + INCHES_0_3
                left += INCHES_0_5

        return True


def _handle_key_message(
        the_slide: pptx.slide.Slide,
        slide_json: dict,
        slide_width_inch: float,
        slide_height_inch: float
):
    """
    Add a shape to display the key message in the slide, if available.

    :param the_slide: The slide to be processed.
    :param slide_json: The content of the slide as JSON data.
    :param slide_width_inch: The width of the slide in inches.
    :param slide_height_inch: The height of the slide in inches.
    """

    if 'key_message' in slide_json and slide_json['key_message']:
        height = pptx.util.Inches(1.6)
        width = pptx.util.Inches(slide_width_inch / 2.3)
        top = pptx.util.Inches(slide_height_inch - height.inches - 0.1)
        left = pptx.util.Inches((slide_width_inch - width.inches) / 2)
        shape = the_slide.shapes.add_shape(
            MSO_AUTO_SHAPE_TYPE.ROUNDED_RECTANGLE,
            left=left,
            top=top,
            width=width,
            height=height
        )
        shape.text = slide_json['key_message']


def _get_slide_width_height_inches(presentation: pptx.Presentation) -> Tuple[float, float]:
    """
    Get the dimensions of a slide in inches.

    :param presentation: The presentation object.
    :return: The width and the height.
    """

    slide_width_inch = EMU_TO_INCH_SCALING_FACTOR * presentation.slide_width
    slide_height_inch = EMU_TO_INCH_SCALING_FACTOR * presentation.slide_height
    # logger.debug('Slide width: %f, height: %f', slide_width_inch, slide_height_inch)

    return slide_width_inch, slide_height_inch


if __name__ == '__main__':
    _JSON_DATA = '''
    {
        "title": "AI Applications: Transforming Industries",
        "slides": [
            {
                "heading": "Introduction to AI Applications",
                "bullet_points": [
                    "Artificial Intelligence (AI) is transforming various industries",
                    "AI applications range from simple decision-making tools to complex systems",
                    "AI can be categorized into types: Rule-based, Instance-based, and Model-based"
                ],
                "key_message": "AI is a broad field with diverse applications and categories",
                "img_keywords": "AI, transformation, industries, decision-making, categories"
            },
            {
                "heading": "AI in Everyday Life",
                "bullet_points": [
                    "Virtual assistants like Siri, Alexa, and Google Assistant",
                    "Recommender systems in Netflix, Amazon, and Spotify",
                    "Fraud detection in banking and credit card transactions"
                ],
                "key_message": "AI is integrated into our daily lives through various services",
                "img_keywords": "virtual assistants, recommender systems, fraud detection"
            },
            {
                "heading": "AI in Healthcare",
                "bullet_points": [
                    "Disease diagnosis and prediction using machine learning algorithms",
                    "Personalized medicine and drug discovery",
                    "AI-powered robotic surgeries and remote patient monitoring"
                ],
                "key_message": "AI is revolutionizing healthcare with improved diagnostics and patient care",
                "img_keywords": "healthcare, disease diagnosis, personalized medicine, robotic surgeries"
            },
            {
                "heading": "AI in Key Industries",
                "bullet_points": [
                    {
                        "heading": "Retail",
                        "bullet_points": [
                            "Inventory management and demand forecasting",
                            "Customer segmentation and targeted marketing",
                            "AI-driven chatbots for customer service"
                        ]
                    },
                    {
                        "heading": "Finance",
                        "bullet_points": [
                            "Credit scoring and risk assessment",
                            "Algorithmic trading and portfolio management",
                            "AI for detecting money laundering and cyber fraud"
                        ]
                    }
                ],
                "key_message": "AI is transforming retail and finance with improved operations and decision-making",
                "img_keywords": "retail, finance, inventory management, credit scoring, algorithmic trading"
            },
            {
                "heading": "AI in Education",
                "bullet_points": [
                    "Personalized learning paths and adaptive testing",
                    "Intelligent tutoring systems for skill development",
                    "AI for predicting student performance and dropout rates"
                ],
                "key_message": "AI is personalizing education and improving student outcomes",
            },
            {
                "heading": "Step-by-Step: AI Development Process",
                "bullet_points": [
                    ">> Define the problem and objectives",
                    ">> Collect and preprocess data",
                    ">> Select and train the AI model",
                    ">> Evaluate and optimize the model",
                    ">> Deploy and monitor the AI system"
                ],
                "key_message": "Developing AI involves a structured process from problem definition to deployment",
                "img_keywords": ""
            },
            {
                "heading": "AI Icons: Key Aspects",
                "bullet_points": [
                    "[[brain]] Human-like intelligence and decision-making",
                    "[[robot]] Automation and physical tasks",
                    "[[]] Data processing and cloud computing",
                    "[[lightbulb]] Insights and predictions",
                    "[[globe2]] Global connectivity and impact"
                ],
                "key_message": "AI encompasses various aspects, from human-like intelligence to global impact",
                "img_keywords": "AI aspects, intelligence, automation, data processing, global impact"
            },
            {
                "heading": "Conclusion: Embracing AI's Potential",
                "bullet_points": [
                    "AI is transforming industries and improving lives",
                    "Ethical considerations are crucial for responsible AI development",
                    "Invest in AI education and workforce development",
                    "Call to action: Explore AI applications and contribute to shaping its future"
                ],
                "key_message": "AI offers immense potential, and we must embrace it responsibly",
                "img_keywords": "AI transformation, ethical considerations, AI education, future of AI"
            }
        ]
    }'''

    temp = tempfile.NamedTemporaryFile(delete=False, suffix='.pptx')
    path = pathlib.Path(temp.name)

    generate_powerpoint_presentation(
        json5.loads(_JSON_DATA),
        output_file_path=path,
        slides_template='Basic'
    )
    print(f'File path: {path}')

    temp.close()
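Note (not part of the diff): a small illustration of the hierarchy flattening performed by get_flat_list_of_contents() in the removed helpers/pptx_helper.py. The nested list below is made up for the example.

# Hypothetical illustration of get_flat_list_of_contents()
nested_bullets = ['Overview', ['Detail A', 'Detail B', ['Sub-detail']], 'Wrap-up']
print(get_flat_list_of_contents(nested_bullets, level=0))
# [('Overview', 0), ('Detail A', 1), ('Detail B', 1), ('Sub-detail', 2), ('Wrap-up', 0)]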
helpers/text_helper.py
DELETED
@@ -1,83 +0,0 @@
"""
Utility functions to help with text processing.
"""
import json_repair as jr


def is_valid_prompt(prompt: str) -> bool:
    """
    Verify whether user input satisfies the concerned constraints.

    :param prompt: The user input text.
    :return: True if all criteria are satisfied; False otherwise.
    """

    if len(prompt) < 7 or ' ' not in prompt:
        return False

    return True


def get_clean_json(json_str: str) -> str:
    """
    Attempt to clean a JSON response string from the LLM by removing ```json at the beginning and
    trailing ``` and any text beyond that.
    CAUTION: May not be always accurate.

    :param json_str: The input string in JSON format.
    :return: The "cleaned" JSON string.
    """

    response_cleaned = json_str

    if json_str.startswith('```json'):
        json_str = json_str[7:]

    while True:
        idx = json_str.rfind('```')  # -1 on failure

        if idx <= 0:
            break

        # In the ideal scenario, the character before the last ``` should be
        # a new line or a closing bracket
        prev_char = json_str[idx - 1]

        if (prev_char == '}') or (prev_char == '\n' and json_str[idx - 2] == '}'):
            response_cleaned = json_str[:idx]

        json_str = json_str[:idx]

    return response_cleaned


def fix_malformed_json(json_str: str) -> str:
    """
    Try and fix the syntax error(s) in a JSON string.

    :param json_str: The input JSON string.
    :return: The fixed JSOn string.
    """

    return jr.repair_json(json_str, skip_json_loads=True)


if __name__ == '__main__':
    JSON1 = '''{
    "key": "value"
    }
    '''
    JSON2 = '''["Reason": "Regular updates help protect against known vulnerabilities."]'''
    JSON3 = '''["Reason" Regular updates help protect against known vulnerabilities."]'''
    JSON4 = '''
    {"bullet_points": [
    ">> Write without stopping or editing",
    >> Set daily writing goals and stick to them,
    ">> Allow yourself to make mistakes"
    ],}
    '''

    print(fix_malformed_json(JSON1))
    print(fix_malformed_json(JSON2))
    print(fix_malformed_json(JSON3))
    print(fix_malformed_json(JSON4))
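Note (not part of the diff): a quick illustration of get_clean_json() from the removed helpers/text_helper.py. The raw string below mimics a typical fenced LLM response and is made up for the example.

# Hypothetical illustration of get_clean_json()
raw = '```json\n{"title": "Demo", "slides": []}\n```\nSome trailing commentary.'
print(get_clean_json(raw))
# Prints the JSON body only; the ```json fences and the trailing commentary are stripped.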
icons/png128/0-circle.png - DELETED - Binary file (4.1 kB)
icons/png128/1-circle.png - DELETED - Binary file (3.45 kB)
icons/png128/123.png - DELETED - Binary file (2.5 kB)
icons/png128/2-circle.png - DELETED - Binary file (4.01 kB)
icons/png128/3-circle.png - DELETED - Binary file (4.24 kB)
icons/png128/4-circle.png - DELETED - Binary file (3.74 kB)
icons/png128/5-circle.png - DELETED - Binary file (4.12 kB)
icons/png128/6-circle.png - DELETED - Binary file (4.37 kB)
icons/png128/7-circle.png - DELETED - Binary file (3.78 kB)
icons/png128/8-circle.png - DELETED - Binary file (4.43 kB)
icons/png128/9-circle.png - DELETED - Binary file (4.44 kB)
icons/png128/activity.png - DELETED - Binary file (1.38 kB)
icons/png128/airplane.png - DELETED - Binary file (2.09 kB)
icons/png128/alarm.png - DELETED - Binary file (4.08 kB)
icons/png128/alien-head.png - DELETED - Binary file (4.73 kB)
icons/png128/alphabet.png - DELETED - Binary file (2.44 kB)
icons/png128/amazon.png - DELETED - Binary file (3.56 kB)
icons/png128/amritsar-golden-temple.png - DELETED - Binary file (4.44 kB)
icons/png128/amsterdam-canal.png - DELETED - Binary file (3.32 kB)
icons/png128/amsterdam-windmill.png - DELETED - Binary file (2.67 kB)
icons/png128/android.png - DELETED - Binary file (2.24 kB)
icons/png128/angkor-wat.png - DELETED - Binary file (2.64 kB)
icons/png128/apple.png - DELETED - Binary file (2.4 kB)
icons/png128/archive.png - DELETED - Binary file (1.27 kB)
icons/png128/argentina-obelisk.png - DELETED - Binary file (1.39 kB)
icons/png128/artificial-intelligence-brain.png - DELETED - Binary file (4.73 kB)
icons/png128/atlanta.png - DELETED - Binary file (2.87 kB)
icons/png128/austin.png - DELETED - Binary file (1.72 kB)
icons/png128/automation-decision.png - DELETED - Binary file (1.19 kB)
icons/png128/award.png - DELETED - Binary file (2.55 kB)
icons/png128/balloon.png - DELETED - Binary file (2.83 kB)
icons/png128/ban.png - DELETED - Binary file (3.32 kB)
icons/png128/bandaid.png - DELETED - Binary file (3.53 kB)
icons/png128/bangalore.png - DELETED - Binary file (2.4 kB)
icons/png128/bank.png - DELETED - Binary file (1.4 kB)