Journey of creating an analytics dataset on S3 Table Buckets
Synopsis
In this talk we walk you through our journey of experimenting with different ETL solutions for creating and updating a dataset in the new AWS service, S3 Table Buckets. The source data resides in Glue tables and RDS. We start by experimenting with Athena and eventually select Glue with Apache Spark as the solution. The presentation focuses on the upsides and limitations of each solution, the costs, the challenges we faced, the solutions we considered, and the local setup of an Apache Spark job.
AWS services covered:
- Table Buckets, Glue, Lake Formation, Athena
Use case
- Building a reporting system for advertisers by aggregating multiple S3 files.
- A journey of discovery and experimentation, with false starts and iterations that did not lead to the desired outcome but yielded valuable lessons.
