I need to scrape data. The fields should be business name, website, location, and category/industry.
in Education & Reference by (1.1k points) | 173 views

1 Answer

Best answer

You can write a Python script using Beautiful Soup. First install the required packages:
>pip install requests beautifulsoup4

import csv
import requests
from bs4 import BeautifulSoup

# Send a GET request to the page
# Set url to the listing page to scrape (left empty here)
url = ""
response = requests.get(url)

# Create a BeautifulSoup object to parse the HTML content
soup = BeautifulSoup(response.content, "html.parser")

# Find the pagination element
pagination = soup.find("div", class_="styles_paginationWrapper__fukEb styles_pagination__USObu")

if pagination:
    # Extract the total number of pages
    last_page = pagination.find_all("a")[-2].text

    # Create a CSV file to store the scraped data
    csv_filename = "business_data.csv"
    csv_file = open(csv_filename, "w", newline="")
    csv_writer = csv.writer(csv_file)
    csv_writer.writerow(["Website", "Name", "long", "lat", "loc", "cat"])

    # Scrape data from each page
    for page in range(1, int(last_page) + 1):
        # Build the URL for this page number (left empty here)
        page_url = f""
        response = requests.get(page_url)
        soup = BeautifulSoup(response.content, "html.parser")
        businesses = soup.find_all("div", class_="paper_paper__1PY90 paper_outline__lwsUX card_card__lQWDv card_noPadding__D8PcU styles_wrapper__2JOo2")

        # Iterate over each business and extract the name and website
        for business in businesses:
            website = business.find("a", class_="link_internal__7XN06 link_wrapper__5ZJEx styles_linkWrapper__UWs5j")["href"]
            name = business.find("p", class_="typography_heading-xs__jSwUz typography_appearance-default__AAY17 styles_displayName__GOhL2").text.strip()
            # Longitude and latitude are hard-coded placeholders, not scraped values
            long = "0.61"
            lat = "0.41"

            #location_text = business.find("span", class_="typography_body-m__xgxZ_ typography_appearance-subtle__8_H2l styles_metadataItem__Qn_Q2 styles_location__ILZb0").text.strip()

            location_element = business.find("span", class_="typography_body-m__xgxZ_ typography_appearance-subtle__8_H2l styles_metadataItem__Qn_Q2 styles_location__ILZb0")

            # Some listings have no location; fall back to an empty string
            try:
                location = location_element.text.strip()
            except AttributeError:
                location = ""

            category = business.find("span", class_="typography_body-s__aY15Q typography_appearance-default__AAY17").text.strip()
            csv_writer.writerow([website, name, long, lat, location, category])

    csv_file.close()
    print("Data scraping is complete. The results have been saved in", csv_filename)
else:
    print("Pagination element not found. Check the URL or website structure.")
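
The script reads the page count from the second-to-last pagination link, which only works if the last link is a "Next" button. A slightly more forgiving sketch is to take the largest numeric link text. The helper name last_page_number and the sample link texts are made up for illustration; they are not part of the original script.

```python
def last_page_number(link_texts, default=1):
    """Return the largest page number found among pagination link texts.

    Non-numeric texts such as "Previous" or "Next page" are ignored.
    If no numeric text is present, `default` is returned.
    """
    numbers = [int(text.strip()) for text in link_texts if text.strip().isdecimal()]
    return max(numbers, default=default)


# Hypothetical link texts, as they might come from
# [a.text for a in pagination.find_all("a")]
sample_texts = ["Previous", "1", "2", "3", "...", "17", "Next page"]
total_pages = last_page_number(sample_texts)
```

In the script this could replace the last_page line, for example last_page = last_page_number([a.text for a in pagination.find_all("a")]).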

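The long class strings in the script contain generated suffixes (such as __fukEb) that may change when the site is rebuilt, and any missing element raises an error. A possible alternative, sketched here, is to match on the stable part of the class name with a CSS selector and return a default when nothing is found. The sample HTML, the helper names, and the assumption that only the suffix changes are all illustrative. Category is left out because its class name in the script has no distinctive part to match on.

```python
from bs4 import BeautifulSoup


def text_or_default(parent, selector, default=""):
    """Return the stripped text of the first element matching `selector`, or `default`."""
    element = parent.select_one(selector)
    return element.get_text(strip=True) if element is not None else default


def parse_businesses(html):
    """Extract website, name and location from each business card in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # [class*="..."] matches any element whose class attribute contains the substring
    for card in soup.select('div[class*="styles_wrapper__"]'):
        link = card.select_one('a[class*="styles_linkWrapper__"]')
        rows.append({
            "website": link.get("href", "") if link is not None else "",
            "name": text_or_default(card, 'p[class*="styles_displayName__"]'),
            "location": text_or_default(card, 'span[class*="styles_location__"]'),
        })
    return rows


# Made-up markup that imitates the class naming pattern used in the script
SAMPLE_HTML = """
<div class="card_card__x styles_wrapper__abc">
  <a class="link_internal__x styles_linkWrapper__def" href="/review/example.com">
    <p class="typography_heading-xs__x styles_displayName__ghi"> Example Ltd </p>
  </a>
  <span class="styles_metadataItem__x styles_location__jkl">London, United Kingdom</span>
</div>
<div class="card_card__x styles_wrapper__abc">
  <a class="styles_linkWrapper__def" href="/review/no-location.example">
    <p class="styles_displayName__ghi">No Location Inc</p>
  </a>
</div>
"""

rows = parse_businesses(SAMPLE_HTML)
```

The second card has no location element, so its location comes back as an empty string and the loop keeps going.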

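For the CSV step, one option is to open the file in a with block so it is closed even if a request fails halfway through, and to use csv.DictWriter so that a missing field becomes an empty cell. This is a minimal sketch: the column names follow the header row in the script, while write_rows and the sample rows are assumptions for illustration.

```python
import csv
import os
import tempfile

# Same column names as the header row written by the script
FIELDS = ["Website", "Name", "long", "lat", "loc", "cat"]


def write_rows(path, rows):
    """Write a list of dicts to `path` as CSV and return the number of rows written."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        # restval fills in any column that a row dict does not provide
        writer = csv.DictWriter(f, fieldnames=FIELDS, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Made-up rows; the second one has no location or category
sample_rows = [
    {"Website": "/review/example.com", "Name": "Example Ltd",
     "long": "0.61", "lat": "0.41", "loc": "London", "cat": "Education"},
    {"Website": "/review/no-location.example", "Name": "No Location Inc",
     "long": "0.61", "lat": "0.41"},
]

path = os.path.join(tempfile.mkdtemp(), "business_data.csv")
count = write_rows(path, sample_rows)

# Read the file back to check what was saved
with open(path, newline="", encoding="utf-8") as f:
    saved = list(csv.DictReader(f))
```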
by (1.6k points)
In Python 3.12 (using the Windows py launcher), install the packages with:
py -m pip install requests
py -m pip install beautifulsoup4
