Do you need a web scraping solution? Email me: email@example.com
Building a scraper involves two main considerations:
1 - Making the scraper.
2 - Deploying the scraper.
1- Making the Scraper: How to get the data?
Here I needed to load dynamic data (arrival and departure times for certain vessels), so I went with Selenium. It doesn't require digging through the network tab for API endpoints, and the data rendered on the browser page seemed unlikely to change; in fact, it would be easier to parse than trying to keep track of vague or even randomized data URLs.
Choosing Selenium does create some challenges for deployment on GCP, though.
2- Deploying the Scraper: Make it run in the cloud.
I wanted the cloud deployment to be relatively modular, pay-per-use, and to support Selenium well. Unfortunately, Google Cloud Run meets the first two requirements but not the last: hosting a standalone Selenium Docker image won't work behind Cloud Run's authentication mechanism, because the Selenium Grid server does not support authenticated requests.
To work around this, I modified Selenium's connection class in my Python client code to attach a Google authentication token to each request sent to the Selenium server running on Cloud Run. I then packaged this Python client code, which drives the Selenium server, into a separate Docker image that could be scheduled on Cloud Run as a job.