How to get text with Selenium WebDriver in Python
Mastering Text Extraction with Selenium WebDriver in Python

Learn how to effectively retrieve text content from web elements using Selenium WebDriver in Python, covering various scenarios and best practices.
Extracting text from web pages is a fundamental task in web scraping and automation using Selenium WebDriver. Whether you need to verify content, collect data, or interact with elements based on their visible text, understanding the different methods for text retrieval is crucial. This article will guide you through the primary techniques for getting text from various HTML elements using Python and Selenium.
The .text Property: Your Primary Tool
The most common and straightforward way to get the visible text of a web element in Selenium is by using the .text property. This property returns the inner text of the element, including the text of all its sub-elements, as displayed on the web page. It intelligently handles line breaks and spacing, providing a clean, human-readable string.
from selenium import webdriver
from selenium.webdriver.common.by import By
# Initialize the WebDriver (e.g., Chrome)
driver = webdriver.Chrome()
driver.get("https://www.example.com")
# Find the heading by its tag name (example.com has an <h1> but no "main-heading" ID)
heading_element = driver.find_element(By.TAG_NAME, "h1")
heading_text = heading_element.text
print(f"Heading Text: {heading_text}")
# Find a paragraph element by its tag name
paragraph_element = driver.find_element(By.TAG_NAME, "p")
paragraph_text = paragraph_element.text
print(f"Paragraph Text: {paragraph_text}")
# Close the browser
driver.quit()
Using the .text property to extract visible text from elements.
The .text property only returns visible text. If an element or its children are hidden by CSS (e.g., display: none; or visibility: hidden;), their text will not be included in the result.
Handling Hidden Text and Attributes with .get_attribute()
Sometimes, the text you need isn't directly visible on the page but is stored within an element's attribute, or you might need to retrieve text from elements that are hidden. In such cases, the .get_attribute() method becomes invaluable. This method allows you to retrieve the value of any HTML attribute of an element, including value for input fields, href for links, or even textContent and innerText, which are JavaScript properties.
flowchart TD
    A[Start] --> B{Element Found?}
    B -- Yes --> C{Text Visible?}
    C -- Yes --> D["Use .text property"]
    C -- No --> E{Text in Attribute?}
    E -- Yes --> F["Use .get_attribute('attribute_name')"]
    E -- No --> G{Text in JS property?}
    G -- Yes --> H["Use .get_attribute('textContent') or .get_attribute('innerText')"]
    G -- No --> I[Consider JavaScript execution]
    B -- No --> J[Handle Element Not Found]
    D --> K[End]
    F --> K
    H --> K
    I --> K
    J --> K
Decision flow for extracting text from web elements.
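The decision flow above can be sketched as a small helper. This is a minimal illustration rather than anything Selenium provides: the name extract_text and its attribute parameter are hypothetical, and the function only assumes that the element exposes .text and .get_attribute(), as Selenium's WebElement does.

```python
def extract_text(element, attribute=None):
    """Return text for an element, following the decision flow above.

    `element` is assumed to behave like a Selenium WebElement, i.e. it
    exposes a `.text` property and a `.get_attribute(name)` method.
    `attribute` is a hypothetical optional parameter naming an HTML
    attribute (such as "value" or "href") to try when nothing is visible.
    """
    # 1. Prefer the visible text.
    visible = element.text
    if visible:
        return visible

    # 2. Fall back to a named attribute, if the caller supplied one.
    if attribute is not None:
        value = element.get_attribute(attribute)
        if value:
            return value

    # 3. Fall back to the textContent property, which includes hidden text.
    raw = element.get_attribute("textContent")
    return raw.strip() if raw else ""
```

With a live driver, this might be called as extract_text(driver.find_element(By.NAME, "q"), attribute="value"). Handling a missing element (the "Element Found?" branch) is left to the caller.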
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get("https://www.google.com")
# Get the value from a search input field
search_input = driver.find_element(By.NAME, "q")
search_input.send_keys("Selenium WebDriver")
input_value = search_input.get_attribute("value")
print(f"Input Field Value: {input_value}")
# Get the href attribute from a link
link_element = driver.find_element(By.PARTIAL_LINK_TEXT, "About")
link_href = link_element.get_attribute("href")
print(f"Link Href: {link_href}")
# Retrieve text using JavaScript properties (e.g., for hidden elements)
# Note: .text is generally preferred for visible text
# .textContent gets all text, including hidden text, without formatting
# .innerText gets the text roughly as rendered, skipping descendants hidden by CSS
# Example for an element that might have hidden text or specific formatting
# For demonstration, let's assume a div with some content
# driver.execute_script("document.body.innerHTML += '<div id=\"hidden-div\" style=\"display:none;\">Hidden Content</div>';")
# hidden_div = driver.find_element(By.ID, "hidden-div")
# text_content = hidden_div.get_attribute("textContent")
# inner_text = hidden_div.get_attribute("innerText")
# print(f"textContent (hidden): {text_content}")
# print(f"innerText (hidden): {inner_text}") # Same as textContent here: the div itself is not rendered
driver.quit()
Using .get_attribute() for input values, link URLs, and JavaScript properties.
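Because textContent comes back without the formatting that .text applies, the result often carries the indentation and line breaks of the HTML source. A small helper can tidy that up. The name normalize_whitespace is hypothetical, and collapsing every whitespace run into a single space is one reasonable choice, not an exact reproduction of what .text does.

```python
def normalize_whitespace(raw):
    """Collapse runs of whitespace in raw textContent into single spaces."""
    if raw is None:  # get_attribute() returns None when nothing is found
        return ""
    # str.split() with no arguments splits on any whitespace run and drops
    # leading/trailing whitespace, so joining with " " yields a tidy string.
    return " ".join(raw.split())


# Example: raw text as it might come back from get_attribute("textContent")
raw_text = "\n    Hidden\n      Content\n  "
print(normalize_whitespace(raw_text))
```

Note that this also flattens intentional line breaks, so it suits short labels and values better than multi-paragraph content.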
Although textContent and innerText can both be accessed via .get_attribute(), they behave differently. textContent returns the raw text content of an element and its descendants, regardless of styling. innerText returns the text approximately as rendered, omitting descendants hidden by CSS; note that when the element itself is not rendered (e.g., it has display: none), browsers return the same value as textContent rather than an empty string. For most visible text extraction, .text is the most reliable and recommended approach.
Extracting Text from Multiple Elements
When you need to extract text from a list of similar elements, such as items in a list, table cells, or search results, you'll typically use find_elements() (plural) to get a list of web elements, and then iterate through them to extract the text from each.
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get("https://www.w3schools.com/html/html_tables.asp")
# Find all table header elements
table_headers = driver.find_elements(By.TAG_NAME, "th")
header_texts = [header.text for header in table_headers if header.text]
print(f"Table Headers: {header_texts}")
# Find all cells in the first data row (tr[1] is assumed to be the header row)
first_row_cells = driver.find_elements(By.XPATH, "//table[1]/tbody/tr[2]/td")
cell_texts = [cell.text for cell in first_row_cells]
print(f"First Row Data: {cell_texts}")
driver.quit()
Iterating through multiple elements to extract their text content.
Include a check such as if element.text: to avoid adding empty strings to your list if some elements happen to have no visible text.
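The filtering pattern can be wrapped in a small reusable function. This is a sketch under stated assumptions: collect_texts is a hypothetical name, and the elements only need a .text property, as the items returned by find_elements() have.

```python
def collect_texts(elements):
    """Return the stripped, non-empty visible text of each element."""
    texts = []
    for element in elements:
        # Read .text once per element: with Selenium, each access sends
        # a new request to the browser.
        text = element.text.strip()
        if text:
            texts.append(text)
    return texts
```

With a live driver, this might be used as collect_texts(driver.find_elements(By.TAG_NAME, "th")). Storing the text in a local variable avoids the double lookup that a comprehension such as [e.text for e in elements if e.text] performs.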