import streamlit as st
import pandas as pd
import altair as alt

st.title('UFO Data Visualization Analysis Report')

@st.cache_data
def load_data():
    columns = [
        'datetime', 'city', 'state', 'country', 'shape', 
        'duration_seconds', 'duration_reported', 'description', 
        'date_posted', 'latitude', 'longitude'
    ]

    df = pd.read_csv(
        'https://github.com/UIUC-iSchool-DataViz/is445_data/raw/main/ufo-scrubbed-geocoded-time-standardized-00.csv',
        header=None,
        names=columns
    )
    
    df['datetime'] = pd.to_datetime(df['datetime'])
    df['year'] = df['datetime'].dt.year
    df['month'] = df['datetime'].dt.month
    df['shape'] = df['shape'].fillna('unknown').str.lower()
    return df

df = load_data()

st.markdown("## 1. Temporal Trends in UFO Sightings")

min_year = int(df['year'].min())
max_year = int(df['year'].max())
year_range = st.slider(
    "Select Year Range",
    min_value=min_year,
    max_value=max_year,
    value=(min_year, max_year)
)

time_agg = st.selectbox(
    "Select Time Aggregation",
    ['Yearly', 'Monthly']
)

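# Restrict to the selected year range once, then aggregate at the chosen granularity
in_range = df[df['year'].between(year_range[0], year_range[1])].copy()

if time_agg == 'Yearly':
    time_counts = in_range.groupby('year').size().reset_index(name='count')
    # format='d' keeps year ticks from rendering with thousands separators (e.g. "1,990")
    x_encoding = alt.X('year:Q', title='Year', axis=alt.Axis(format='d'))
else:
    # Collapse each sighting to the first day of its month
    in_range['date'] = pd.to_datetime(in_range[['year', 'month']].assign(day=1))
    time_counts = in_range.groupby('date').size().reset_index(name='count')
    x_encoding = alt.X('date:T', title='Date')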

base_chart = alt.Chart(time_counts).encode(
    x=x_encoding,
    y=alt.Y('count:Q', title='Number of Sightings'),
    tooltip=['count:Q']
)

temporal_viz = (
    base_chart.mark_area(opacity=0.3) +
    base_chart.mark_line(color='steelblue') +
    base_chart.mark_point(color='steelblue')
).properties(
    width=700,
    height=400
).interactive()

st.altair_chart(temporal_viz)

st.markdown("""**(1) Features Highlighted**

This visualization emphasizes the temporal evolution of UFO sightings from 1949 to 2013, highlighting both the overall trend and year-specific fluctuations in reporting frequency. The visualization reveals distinct patterns of increased reporting over time, with notable spikes in certain periods that could correlate with significant historical events or changes in reporting methods.

**(2) Design Choices**

I implemented several key design elements for optimal data representation: 
+ A line chart with point markers, layered over a translucent area fill, was chosen for its effectiveness in showing continuous time-series data while keeping each period's value readable
+ Interactive tooltips were added to provide exact sighting counts

**(3) Potential Improvements**

Given more time, I would implement dual y-axes to show both sighting frequency and duration, and extend the existing year-range filter and yearly/monthly aggregation with seasonal analysis options.""")

st.markdown("## 2. Analysis of UFO Shape Distribution")

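all_shapes = sorted(df['shape'].unique())
# Default to the 10 most frequently reported shapes rather than the first 10 alphabetically
top_shapes = df['shape'].value_counts().head(10).index.tolist()
selected_shapes = st.multiselect(
    "Select UFO Shapes to Display",
    options=all_shapes,
    default=top_shapes
)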

filtered_df = df[df['shape'].isin(selected_shapes)]
shape_counts = filtered_df['shape'].value_counts().reset_index()
shape_counts.columns = ['shape', 'count']

shape_viz = alt.Chart(shape_counts).mark_bar().encode(
    y=alt.Y('shape:N', 
            sort='-x',
            title='UFO Shape'),
    x=alt.X('count:Q',
            title='Number of Reports'),
    color=alt.Color('count:Q', scale=alt.Scale(scheme='viridis')),
    tooltip=['shape:N', 'count:Q']
).properties(
    width=700,
    height=max(len(selected_shapes) * 25, 400)
).interactive()

st.altair_chart(shape_viz)

st.markdown("""**(1) Features Highlighted**

This visualization focuses on the distribution of reported UFO shapes. The data shows clear preferences in how witnesses describe UFO shapes, with certain forms being consistently more common across reports. This analysis helps identify patterns in how people perceive and describe unidentified flying objects.

**(2) Design Choices** 

The visualization employs several intentional design elements: 
+ A horizontal bar chart format was chosen to accommodate long shape labels and enable easy comparison of quantities
+ Bars are sorted in descending order to immediately highlight the most common shapes
+ The chart focuses on the top 10 shapes to maintain clarity and prevent information overload
+ Interactive tooltips provide precise counts for each shape category

**(3) Potential Improvements**

With additional time, I would expand this visualization by adding temporal analysis to show how shape distributions have changed over the decades and by incorporating geographical analysis to reveal regional patterns in shape reporting.""")