For my app Dunbar I needed to scrape information about people. The goal was to get profile picture options to show in the app. It wasn't easy. A whole new world opened up to me: the world of data scraping!
Puppeteer (Node.js) is a tool for automating browsing with a headless browser. I used it to create a bot that scrapes Facebook and LinkedIn images. See this repo for a working example. The biggest thing I had to learn was CSS selectors, which are how you pick the information you want out of a web page.
Unfortunately, after hours and hours of work, I noticed that the websites I most wanted to scrape have anti-scraping protection in place. There is rate limiting, and sometimes the site shows an auth wall instead of the page. How to counteract this?
https://proxycrawl.com/ is a paid SDK that puts a proxy in front of any web page you request.
https://serpapi.com/ is an API for Google Search.
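As a rough sketch of how a search API could replace direct scraping for the profile picture use case, the snippet below builds a SerpApi request and pulls image URLs out of the response. Treat the details as assumptions to check against SerpApi's documentation: the search.json endpoint, the google_images engine name, the api_key parameter and the images_results / original / thumbnail response fields. The function names are hypothetical, and this is not what Dunbar actually runs.

```javascript
// Sketch: look up candidate profile pictures through SerpApi.
// Endpoint, parameter names and response fields are assumptions based on
// SerpApi's public docs; verify them before relying on this.

function buildImageSearchUrl(name, apiKey) {
  const params = new URLSearchParams({
    engine: 'google_images', // assumed engine name for image results
    q: name,
    api_key: apiKey,
  });
  return `https://serpapi.com/search.json?${params}`;
}

// Pull up to `limit` image URLs out of a parsed response body.
function extractImageUrls(body, limit = 5) {
  const results = Array.isArray(body.images_results) ? body.images_results : [];
  return results
    .map((r) => r.original || r.thumbnail) // prefer the full-size image
    .filter(Boolean)
    .slice(0, limit);
}

async function findProfilePictures(name, apiKey) {
  // Uses the global fetch available in Node 18 and later.
  const res = await fetch(buildImageSearchUrl(name, apiKey));
  if (!res.ok) throw new Error(`SerpApi request failed: HTTP ${res.status}`);
  return extractImageUrls(await res.json());
}
```

Calling findProfilePictures('Jane Doe', process.env.SERPAPI_KEY) would then resolve to a short list of image URLs to offer as options, assuming the key is stored in an environment variable of that (made-up) name.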