Introduction
Web Scraping - An automated way to retrieve unstructured or semi-structured data from a website and store it in a structured format. Web scrapers can extract all the data on a particular site or only the specific data a user wants. Ideally, you should specify the data you want so that the scraper extracts only that data and finishes quickly.
Challenges of web scraping
- It is more complex than other ways of getting data, such as APIs
- It can be fragile - web page designs change frequently and break scrapers
- Ethical and legal risks
- Reality - websites are getting better at detecting scrapers ("Are you a robot?" checks)
 
Benefits of web scraping
- Can be run iteratively over many web pages (see the sketch after this list)
- Some websites have thousands or millions of pages
- Can construct large, robust data sets out of messy text that would otherwise only appear in your web browser
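
A minimal sketch of that iteration, using the quotes.toscrape.com practice site listed at the end of this section; the /page/<n>/ URL pattern is an assumption to confirm in your browser first:

```python
import requests

# A small loop over a paginated URL pattern.
# The /page/<n>/ pattern is an assumption based on the quotes.toscrape.com
# practice site; confirm the pattern in your browser before scaling up.
BASE_URL = "https://quotes.toscrape.com/page/{}/"

for page in range(1, 4):  # first three pages only, to stay polite
    url = BASE_URL.format(page)
    response = requests.get(url, timeout=10)
    print(url, response.status_code, len(response.text), "bytes")
```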
 
Basics of Web Scraping
1. Find the address - the URL or URLs
2. Send HTTP requests to the server
3. Parse the response
4. Store the results
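
These four steps map almost line for line onto Python. Below is a minimal sketch using requests, BeautifulSoup, and the csv module against the quotes.toscrape.com practice site; the CSS class names it looks for are assumptions based on that site's markup and will differ on other sites:

```python
import csv

import requests
from bs4 import BeautifulSoup

# 1. The address: one of the practice sites listed at the end of this section.
url = "https://quotes.toscrape.com/"

# 2. Send an HTTP request to the server.
response = requests.get(url, timeout=10)
response.raise_for_status()

# 3. Parse the returned HTML. The class names ("quote", "text", "author")
#    are assumptions taken from inspecting that site's markup.
soup = BeautifulSoup(response.text, "html.parser")
rows = []
for quote in soup.find_all("div", class_="quote"):
    text = quote.find("span", class_="text").get_text(strip=True)
    author = quote.find("small", class_="author").get_text(strip=True)
    rows.append([text, author])

# 4. Store the results in a structured format (CSV).
with open("quotes.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["quote", "author"])
    writer.writerows(rows)
```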
 
Important Steps to follow
1. Identify the URL patterns
2. Inspect the source code and locate the data elements
3. Think about the logical flow of your crawler (sketched after this list)
4. Persistence and creativity are often required to collect valuable data
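
For step 3, one common logical flow is to follow the site's own "Next" link until it disappears. A sketch under that assumption (the li.next selector comes from quotes.toscrape.com's pagination markup and is not universal):

```python
import requests
from bs4 import BeautifulSoup

# Logical flow of a simple crawler: scrape a page, then follow the site's
# own "Next" link until there is none. The li.next selector is an assumption
# drawn from quotes.toscrape.com's pagination markup.
base = "https://quotes.toscrape.com"
url = base + "/"

while url:
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.text, "html.parser")

    # ... locate and store the data elements you found during inspection ...

    next_link = soup.select_one("li.next > a")
    url = base + next_link["href"] if next_link else None  # stop on the last page
```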
 
Techniques covered in the 3 parts of web scraping
- Techniques for identifying URL patterns
- How to inspect HTML in a browser
- How to request HTML with Python
- Techniques for finding the data
- Techniques for parsing HTML
  - BeautifulSoup
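
As a preview of the parsing side, here is a small BeautifulSoup sketch over an inline, made-up HTML snippet; find(), find_all(), and the CSS select() methods shown here are the calls used most often:

```python
from bs4 import BeautifulSoup

# Parsing sketch over an inline, hypothetical HTML snippet, so it runs
# without any network request.
html = """
<div class="quote">
  <span class="text">An example quote.</span>
  <small class="author">Jane Doe</small>
  <a href="/author/Jane-Doe">about</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag; find_all() returns every match.
quote = soup.find("div", class_="quote")
print(quote.find("span", class_="text").get_text(strip=True))

# CSS selectors (select / select_one) are often more concise for nested markup.
for author in soup.select("div.quote small.author"):
    print(author.get_text(strip=True))

# Tag attributes are accessed like dictionary keys.
print(soup.find("a")["href"])
```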
 
The website URLs we will scrape data from
- https://www.indiana-demographics.com/cities_by_population
- https://www.getcompanyinfo.com/industry/information-technology-services/
- https://quotes.toscrape.com/