Update README.md
README.md
CHANGED
@@ -7,4 +7,10 @@ sdk: static
 pinned: false
 ---
 
-BigBanyanTree is an initiative to empower engineering colleges to set up their data engineering clusters and drive interest in data processing and analysis using tools such as Apache Spark.
+BigBanyanTree is an initiative to empower engineering colleges to set up their data engineering clusters and drive interest in data processing and analysis using tools such as Apache Spark.
+
+As part of that initiative, we have open-sourced datasets processed from CommonCrawl data.
+
+The datasets offer two subsets having the specified columns:
+"script_extraction": ["ip", "host", "server", "script_src_attrs"]
+"ipmaxmind": ["ip", "host", "server", "postal_code", "latitude", "longitude", "accuracy_radius", "continent_code", "continent_name", "country_iso_code", "subdivision_code", "city_name", "metro_code", "time_zone", "year"]
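
The two subsets and their columns added in this change can be written down as a small schema. The following is a minimal sketch, assuming rows are handled as plain Python dictionaries; the names `SUBSET_COLUMNS`, `missing_columns`, and `shared_columns` are hypothetical and not part of any tooling published with the datasets.

```python
# Column names are copied from the README text in this change.
# The dict and helper names are illustrative assumptions only.
SUBSET_COLUMNS = {
    "script_extraction": ["ip", "host", "server", "script_src_attrs"],
    "ipmaxmind": [
        "ip", "host", "server", "postal_code", "latitude", "longitude",
        "accuracy_radius", "continent_code", "continent_name",
        "country_iso_code", "subdivision_code", "city_name",
        "metro_code", "time_zone", "year",
    ],
}


def missing_columns(subset, row):
    """Return the expected columns of `subset` that are absent from `row`."""
    return [col for col in SUBSET_COLUMNS[subset] if col not in row]


def shared_columns():
    """Return the columns common to both subsets, in README order."""
    other = set(SUBSET_COLUMNS["ipmaxmind"])
    return [col for col in SUBSET_COLUMNS["script_extraction"] if col in other]
```

A check such as `missing_columns("script_extraction", row)` could be run on a sample row before further processing, and `shared_columns()` lists the fields that appear in both subsets.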