Journey of creating an analytics dataset on S3 Table Buckets

Synopsis

In this talk we walk through our journey of experimenting with different ETL solutions for creating and updating a dataset in a new AWS service: S3 Table Buckets. The source data resides in Glue tables and RDS. We started by experimenting with Athena and eventually selected Glue with Apache Spark as the solution. The presentation focuses on the upsides and limitations of each solution, the costs, the challenges we faced, the alternatives we considered, and the local setup of an Apache Spark job.

AWS services covered:

  • Table Buckets, Glue, Lake Formation, Athena

Use case

  • Building a reporting system for advertisers by aggregating multiple S3 files.
  • A journey of discovery and experimentation, sprinkled with false starts and iterations that didn’t lead to the desired outcome but brought valuable learnings.
Andreea Olaru
eMag, Node.js Developer