Python – Web Scraping – 101 – A Simple Code Example

Last updated: January 2, 2021

Package Installation

pip install beautifulsoup4

Code Example

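Let's start with a simple web scraping program. Broadly, it does two things:
1. Sends a request and receives the response data from the web server
2. Processes the response data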

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('https://kirin.idv.tw/python-csv-chinese-utf8/')
bs = BeautifulSoup(html.read(), 'html.parser')
print(bs.h1)

Code Explanation

from urllib.request import urlopen

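urllib is part of the Python standard library. Its features include requesting data over the network, handling cookies, and handling HTTP headers, among others. request is one of its modules, and urlopen is a function in the request module that opens a remote object on the network so that it can be read.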
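Requests can fail, so it may help to see how urlopen reports problems. The following is a minimal sketch, not part of the original program: fetch_html is a hypothetical helper name, the 10-second timeout is an arbitrary choice, and a data: URL stands in for a real page so the example runs without network access.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Return the response body as bytes, or None if the request fails.

    fetch_html is a hypothetical helper for illustration, not part of urllib.
    """
    try:
        # The with-block closes the response once the body has been read.
        with urlopen(url, timeout=timeout) as response:
            return response.read()
    except HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        print(f"HTTP error {err.code} for {url}")
    except URLError as err:
        # The server could not be reached (bad host name, no network, ...).
        print(f"Could not reach {url}: {err.reason}")
    return None


# A data: URL stands in for a real page so the sketch runs offline.
body = fetch_html("data:text/html,%3Ch1%3EHello%3C%2Fh1%3E")
print(body)
```

HTTPError is a subclass of URLError, which is why it is caught first. With a real site, the same helper would be called with an http or https address.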

from bs4 import BeautifulSoup

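bs4 is a third-party library whose full name is "Beautiful Soup 4". From it we import the BeautifulSoup class, which can repair poorly formed HTML and produces a Python object with an XML-like tree structure.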
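To illustrate the point about poorly formed HTML, here is a small sketch that parses a deliberately broken fragment. The fragment and the variable names are made up for this example, and the exact shape of the repaired tree can differ between parsers, so treat the printed output as something to inspect rather than a fixed result.

```python
from bs4 import BeautifulSoup

# Deliberately malformed markup: neither <h1> nor <p> is ever closed.
broken_html = "<h1>Title<p>Unclosed paragraph"

soup = BeautifulSoup(broken_html, "html.parser")

# The parsed tree can still be navigated as if the markup were well formed.
print(soup.p.get_text())

# Serializing the tree writes out the closing tags the source lacked.
print(soup.prettify())
```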

html = urlopen('https://kirin.idv.tw/python-csv-chinese-utf8/')

Sends the request and returns a response object from which the page's HTML content can be read.

bs = BeautifulSoup(html.read(), 'html.parser')

Converts the HTML content into a BeautifulSoup object to make later processing easier.

print(bs.h1)

Uses the BeautifulSoup object to retrieve the first h1 tag on the page.
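Printing bs.h1 shows the whole tag, markup included. As a rough sketch of what else can be done with that tag, the example below works on an inline HTML string (invented for this illustration, not taken from the page above) and shows how to read the text and attributes, and what happens when a tag does not exist.

```python
from bs4 import BeautifulSoup

# Made-up sample page, used here instead of a live request.
sample_html = """
<html>
  <body>
    <h1 class="entry-title"> Hello, scraping </h1>
    <p>Some body text.</p>
  </body>
</html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

heading = soup.h1                       # the first <h1> tag in the document
print(heading.name)                     # the tag's name
print(heading.get_text(strip=True))     # the text only, surrounding whitespace removed
print(heading["class"])                 # class can hold several values, so it is a list

# Accessing a tag that is not in the document yields None rather than an error.
missing = soup.h2
if missing is None:
    print("no <h2> found")
```

Checking for None before using the result avoids an AttributeError when a page lacks the expected tag.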
