Writing Python code to scrape a web page (feat. chat.openai.com)

Scraping

Today I tried putting together Python code for scraping a web page. As before, I asked OpenAI for the code, and I would like to share the two versions I got back.

The first request produced the code below. However, the URL in it cannot be reached, so I do not think this code will be of much help.

    import requests
    from bs4 import BeautifulSoup

    # set the URL of the website to scrape
    url = "https://www.thecrazytourist.com/15-best-places-to-visit-in-south-korea/"

    # send a GET request to the URL
    response = requests.get(url)

    # parse the HTML content of the response using BeautifulSoup
    soup = BeautifulSoup(response.content, 'html.parser')

    # find all the article elements on the page
    articles = soup.find_all('article')

    # iterate over each article and extract the title and description
    for article in articles:
        title = article.find('h2').text
        description = article.find('div', {'class': 'entry-content'}).text.strip()

        # print the title and description of the article
        print(f"Title: {title}")
        print(f"Description: {description}\n")

I tried a second time. The URL suggested this time can actually be opened.

    import requests
    from bs4 import BeautifulSoup

    # the URL of the page to scrape
    url = 'https://www.tripadvisor.com/Attractions-g294196-Activities-South_Korea.html'

    # send a GET request to the URL
    response = requests.get(url)

    # parse the HTML content of the page with BeautifulSoup
    soup = BeautifulSoup(response.content, 'html.parser')

    # find all the links to attractions on the page
    attraction_links = soup.find_all('a', {'class': 'attractions-attraction-overview-pois-PoiCard__poiLink--3PWj5'})

    # print the titles and links of the attractions
    for link in attraction_links:
        title = link.get_text()
        href = link.get('href')
        print(f'{title}: https://www.tripadvisor.com{href}')

Can code generated this way be used as is?

That seems difficult to me. Still, I came to think that getting basic code like this can serve as a foundation for widening one's path of learning.
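One possible reason the second snippet prints nothing is that it searches for a single, very specific class name, and a class name like that may not exist in the HTML the site actually returns. As a rough sketch of a less brittle approach, the example below selects links by a fragment of their href instead of by class. Everything in it is an assumption made for illustration: the sample markup is invented, the names `LinkCollector` and `extract_links` are hypothetical, and the marker `/Attraction_Review` is only a guess at what attraction links might contain. It also swaps BeautifulSoup for the standard library's `html.parser` and parses an inline string rather than a live response, so it runs without any installation or network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Invented sample markup standing in for a downloaded page (not real site HTML).
SAMPLE_HTML = """
<div>
  <a class="x1-abc" href="/Attraction_Review-g1-d1-Reviews-Palace.html">Palace</a>
  <a class="x9-zzz" href="/Attraction_Review-g1-d2-Reviews-Tower.html"><span>Tower</span></a>
  <a class="x1-abc" href="/Hotels-g1.html">Hotels</a>
  <a>No link</a>
</div>
"""


class LinkCollector(HTMLParser):
    """Collect (title, href) pairs for <a> tags whose href contains a marker."""

    def __init__(self, href_marker):
        super().__init__()
        self.href_marker = href_marker
        self.links = []
        self._current_href = None  # href of the matching <a> being read, if any
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # Match on the href, ignoring whatever class the link carries.
            if href and self.href_marker in href:
                self._current_href = href
                self._text_parts = []

    def handle_data(self, data):
        # Gather text only while inside a matching link (including nested tags).
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            title = "".join(self._text_parts).strip()
            self.links.append((title, self._current_href))
            self._current_href = None


def extract_links(html, base_url, href_marker):
    """Return (title, absolute_url) pairs for links whose href contains href_marker."""
    parser = LinkCollector(href_marker)
    parser.feed(html)
    return [(title, urljoin(base_url, href)) for title, href in parser.links]


if __name__ == "__main__":
    base = "https://www.tripadvisor.com"
    for title, link in extract_links(SAMPLE_HTML, base, "/Attraction_Review"):
        print(f"{title}: {link}")
```

Even with a selector like this, a real site may return different HTML to a script than to a browser, so it is worth saving the response and inspecting it before settling on what to match.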

A great many news articles and YouTube videos about OpenAI are being produced. Is this just another passing fad? Either way, it seems I should at least try following along.

*** Please do not follow the code above. It does not produce any results. T_T