

Streamlit Research:

Creating the application:

  • Import Streamlit library
import streamlit as st
  • Running the Streamlit app:
    • This will open a separate browser window running the application locally
streamlit run <app_name>.py

State driven UI:

  • Elements can be assigned to variables, and those variables can then be used to update the element in place:

    text_element_state = st.text('Element State Test')  ## Initializes the element with this text
    text_element_state.text('New Element State Test')   ## Updates the displayed text
    
  • Caching can also be used:

    • Reduces the overhead of re-running functions every time the application reloads: Streamlit checks whether the values passed to the function have changed and only re-runs it if they have
      @st.cache_data
      def load_data(args):
          ## Some logic
          ...  ## Placeholder so the function body is valid Python
      
      text_element_state.text('Some data: (using st.cache_data)') ## This will appear immediately upon saving
      
  • There are some limitations with using the cache mechanism:

    • Not suited to non-deterministic functions (e.g. calculations that use random numbers): the cached result is reused, so the function does not produce a new value on each run
    • Change detection only covers code within the current working directory
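    • Cached objects should not be mutated: st.cache_resource hands every caller the same object by reference, so a change is visible everywhere (st.cache_data returns a copy of the stored value on each call)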
  • st.write() function:

    • Accepts more complex data types and displays them in an interactive format
      • e.g., a dataframe passed to st.write() is displayed automatically as a table
  • Additional functionality that is available:

    • Draw histograms using NumPy:
      import numpy as np  ## data is a dataframe with a datetime column named by DATE_COLUMN
      st.subheader('Number of pickups by hour')
      hist_values = np.histogram(
          data[DATE_COLUMN].dt.hour, bins=24, range=(0,24))[0]
      st.bar_chart(hist_values)
      
    • Plot data on map:
      hour_to_filter = 17
      filtered_data = data[data[DATE_COLUMN].dt.hour == hour_to_filter]
      st.subheader(f'Map of all pickups at {hour_to_filter}:00')
      st.map(filtered_data)
      
    • Filter data using slider:
      hour_to_filter = st.slider('hour', 0, 23, 17)  # min: 0h, max: 23h, default: 17h
      
    • Using a checkbox to toggle data:
      if st.checkbox('Show raw data'):
          st.subheader('Raw data')
          st.write(data)
      

Applications for Project:

Caching:

  • st.cache_data:
    • Best suited to caching serializable results of computations (function return values)
  • st.cache_resource:
    • Useful for storing the NLP model globally
    • Stores unserializable objects (like models) and loads them once
      • The model does not need to be reloaded for each session, since the stored object is shared across all sessions