API Reference
SEO Auditer
Site Parser
This module contains the SiteParser class, which defines a parser at the website level (a list of URLs).
Typical usage example:
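    from seoaudit.analyzer.page_parser import LXMLPageParser
    from seoaudit.analyzer.site_parser import SiteParser

    site_parser = SiteParser(url, LXMLPageParser(url), urls=None, parse_sitemap_urls=True)
    while site_parser.parse_next_page():
        print("Running checks for url: {}".format(site_parser.get_current_url()))
        # do something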
class seoaudit.analyzer.site_parser.SiteParser(base_url, page_parser: seoaudit.analyzer.page_parser.AbstractPageParser = None, urls=None, sitemap_link=None, parse_sitemap_urls=False)
Website-level parser. It uses a page parser object as the core of its parsing functionality, with the list of URLs either predefined or crawled from the sitemap file.
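When parse_sitemap_urls=True, the URL list comes from the sitemap file. As a rough illustration of that step, the sketch below extracts the page URLs from a standard XML sitemap (sitemaps.org protocol 0.9) using only the standard library. The helper name urls_from_sitemap is hypothetical, and this is not the SiteParser implementation; it assumes the sitemap is a plain <urlset> document rather than a sitemap index.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace defined by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(sitemap_xml: str) -> List[str]:
    """Hypothetical helper: return every <loc> URL listed in a <urlset> sitemap."""
    root = ET.fromstring(sitemap_xml)
    # Each <url> entry carries its address in a <loc> child element.
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


example = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

print(urls_from_sitemap(example))  # ['https://example.com/', 'https://example.com/about']
```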
get_current_url()
- Returns
URL of the currently indexed page
- Return type
str
parse_next_page()
Parse the next page using the page parser object.
- Returns
True if the next page was parsed, False if the end of the URL list was reached
- Return type
boolean
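Together, get_current_url and parse_next_page behave like a cursor over the URL list. The following minimal sketch models that contract; the UrlCursor class and its load_page callback (standing in for the page parser) are hypothetical, and the real SiteParser may manage its index and page parser differently.

```python
from typing import Callable, List, Optional


class UrlCursor:
    """Hypothetical stand-in for the iteration contract of SiteParser."""

    def __init__(self, urls: List[str], load_page: Callable[[str], None]):
        self.urls = urls
        self.load_page = load_page  # called with each URL, e.g. to parse that page
        self.index = -1  # no page parsed yet

    def get_current_url(self) -> Optional[str]:
        if 0 <= self.index < len(self.urls):
            return self.urls[self.index]
        return None

    def parse_next_page(self) -> bool:
        # Report False, without moving, once the URL list is exhausted.
        if self.index + 1 >= len(self.urls):
            return False
        self.index += 1
        self.load_page(self.urls[self.index])
        return True


visited = []
cursor = UrlCursor(["https://example.com/", "https://example.com/about"], visited.append)
while cursor.parse_next_page():
    print("Running checks for url: {}".format(cursor.get_current_url()))
```

Returning a boolean from parse_next_page keeps the calling loop as short as the usage example at the top of this module.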
Page Parser
This module contains page parser classes, which define page parser objects at the single web page level (a single URL).
Typical usage example:
    from seoaudit.analyzer.page_parser import LXMLPageParser

    page_parser = LXMLPageParser(url)
    sitemap_links = page_parser.get_elements("(/html/head/link[@rel='sitemap'])/@href")
    sitemap_link = sitemap_links[0] if len(sitemap_links) >= 1 else None
class seoaudit.analyzer.page_parser.AbstractPageParser(url)
Abstract web page parser, used as a blueprint for page parser implementations.
abstract get_element_attribute(element, attribute='textContent') → str
Given an HTML element and an attribute name, return the content of that attribute.
- Parameters
element – HTML element
attribute – attribute name, defaults to textContent
- Returns
HTML element’s attribute text value
abstract get_element_code(element) → str
Given an HTML element, return its HTML code.
- Parameters
element – HTML element
- Returns
string HTML code
abstract get_element_text(element) → str
Given an HTML element, return its text content.
- Parameters
element – HTML element
- Returns
string text content
abstract get_elements(xpath_query: str)
Get a list of HTML elements by running an XPath query on the parsed web page.
- Parameters
xpath_query (str) – XPath query that selects the elements
- Returns
list of HTML elements that can be used in other parser methods
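The blueprint maps naturally onto Python's abc module. The sketch below mirrors the four documented abstract methods and adds a toy subclass backed by the standard library's xml.etree.ElementTree, which supports only a limited XPath subset and requires well-formed markup. The class names PageParserBlueprint and EtreePageParser are hypothetical, and the toy parses an HTML string passed in directly rather than fetching the URL.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class PageParserBlueprint(ABC):
    """Hypothetical mirror of the AbstractPageParser interface."""

    def __init__(self, url: str):
        self.url = url

    @abstractmethod
    def get_elements(self, xpath_query: str) -> list: ...

    @abstractmethod
    def get_element_attribute(self, element, attribute: str = "textContent") -> str: ...

    @abstractmethod
    def get_element_code(self, element) -> str: ...

    @abstractmethod
    def get_element_text(self, element) -> str: ...


class EtreePageParser(PageParserBlueprint):
    """Toy implementation over a well-formed XHTML string (no network access)."""

    def __init__(self, url: str, html: str):
        super().__init__(url)
        self.root = ET.fromstring(html)

    def get_elements(self, xpath_query: str) -> list:
        # ElementTree understands only an XPath subset, e.g. ".//link[@rel='sitemap']".
        return self.root.findall(xpath_query)

    def get_element_attribute(self, element, attribute: str = "textContent") -> str:
        if attribute == "textContent":
            return self.get_element_text(element)
        return element.get(attribute, "")

    def get_element_code(self, element) -> str:
        return ET.tostring(element, encoding="unicode")

    def get_element_text(self, element) -> str:
        # Strings pass through unchanged, like the documented concrete parsers.
        if isinstance(element, str):
            return element
        return "".join(element.itertext())


page = EtreePageParser("https://example.com/", "<html><head><title>Home</title></head></html>")
title = page.get_elements(".//title")[0]
print(page.get_element_text(title))  # Home
print(page.get_element_code(title))  # <title>Home</title>
```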
class seoaudit.analyzer.page_parser.LXMLPageParser(url)
Web page parser with an lxml core.
get_element_attribute(element: lxml.html.HtmlElement, attribute='textContent') → str
Given an HTML element and an attribute name, return the content of that attribute.
- Parameters
element – HTML element of lxml HtmlElement type
attribute – attribute name, defaults to textContent
- Returns
HTML element’s attribute text value
get_element_code(element) → str
Given an HTML element, return its HTML code.
- Parameters
element (HtmlElement) – HTML element of lxml HtmlElement type
- Returns
string HTML code
get_element_text(element) → str
Return the visible text of an HTML element. If a string is passed instead of an element, it is returned unchanged. This lets the function be called iteratively on page parser results even when they are a mix of HtmlElement objects and str values.
- Parameters
element (HtmlElement | str) – HTML element of lxml HtmlElement type (which has the text_content() method), or a string representation of an HTML element
- Returns
string text content
get_elements(xpath_query: str)
Get a list of HTML elements by running an XPath query on the parsed web page.
- Parameters
xpath_query (str) – XPath query that selects the elements
- Returns
a list of lxml HtmlElement elements (queries that select attributes or text, such as @href, yield strings instead)
class seoaudit.analyzer.page_parser.SeleniumPageParser(url)
Web page parser with a Selenium WebDriver core.
get_element_attribute(element: selenium.webdriver.remote.webelement.WebElement, attribute='textContent') → str
Given an HTML element and an attribute name, return the content of that attribute.
- Parameters
element – HTML element of Selenium WebElement type
attribute – attribute name, defaults to textContent
- Returns
HTML element’s attribute text value
get_element_code(element) → str
Given an HTML element, return its HTML code.
- Parameters
element (WebElement) – HTML element of Selenium WebElement type
- Returns
string HTML code
get_element_text(element) → str
Return the visible text of an HTML element. If a string is passed instead of an element, it is returned unchanged. This lets the function be called iteratively on page parser results even when they are a mix of WebElement objects and str values.
- Parameters
element (WebElement | str) – HTML element of Selenium WebElement type (which has the text attribute), or a string representation of an HTML element
- Returns
string text content
get_elements(xpath_query: str)
Get a list of HTML elements by running an XPath query on the parsed web page.
- Parameters
xpath_query (str) – XPath query that selects the elements
- Returns
a list of Selenium WebDriver elements
Element Checks
This module contains all of the predefined element checks. An element check works at the level of a single DOM element.
Predefined element checks are enumerated in the ElementCheck enum; each enum value contains the name of the class that implements the check by extending the AbstractElementCheck class.
When the predefined element checks are not sufficient, a custom element check can be created by extending the AbstractElementCheck class.
Typical usage example:
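    from seoaudit.checks.element import ElementCheck, check_content

    content = "abc"
    check = check_content(ElementCheck.MIN_LENGTH, content, min_length=2)  # passes: len("abc") >= 2
    check = check_content(ElementCheck.MIN_LENGTH, content, min_length=4)  # fails: len("abc") < 4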
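A custom check subclasses AbstractElementCheck and implements check_content(content, **kwargs). So that the sketch runs without the package installed, it defines a local stand-in for AbstractElementCheck; in real code you would import AbstractElementCheck from seoaudit.checks.element instead. The KeywordPresentCheck class and its keyword argument are hypothetical.

```python
from abc import ABC, abstractmethod


class AbstractElementCheck(ABC):
    """Local stand-in mirroring the documented interface (not imported from seoaudit)."""

    @abstractmethod
    def check_content(self, content: str, **kwargs):
        ...


class KeywordPresentCheck(AbstractElementCheck):
    """Hypothetical custom check: does the content mention a keyword?"""

    def check_content(self, content: str, **kwargs):
        keyword = kwargs.get("keyword", "")
        # Case-insensitive containment test; an empty keyword always passes.
        return keyword.lower() in content.lower()


check = KeywordPresentCheck()
print(check.check_content("Best Hiking Boots 2024", keyword="hiking boots"))  # True
print(check.check_content("Best Hiking Boots 2024", keyword="sandals"))  # False
```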
class seoaudit.checks.element.AbstractElementCheck
Abstract class that serves as a blueprint for element check classes.
abstract check_content(content: str, **kwargs)
Return the result of performing the check on the given element content.
- Parameters
content (str) – element content value on which check is performed
**kwargs – keyword check arguments (e.g. a number representing a minimal length value)
- Returns
a boolean value representing the check's validity, optionally followed by extra check result information (e.g. a tuple of the result and the content length)
class seoaudit.checks.element.AttributeFoundCheck
Checks if the content attribute is found and not empty.
check_content(content: str, **unused)
- Parameters
content – element content value on which check is performed
unused – unused keyword arguments, accepted so that the signature matches AbstractElementCheck.check_content
- Returns
boolean check result
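A minimal sketch of the AttributeFoundCheck rule as a plain function. It assumes that a missing attribute arrives as None and that "empty" means a zero-length string; whether whitespace-only content counts as empty is not specified here. The name attribute_found is hypothetical.

```python
from typing import Optional


def attribute_found(content: Optional[str], **unused) -> bool:
    # Assumption: None means the attribute was not found; "" means it is empty.
    return content is not None and len(content) > 0


print(attribute_found("Home page"))  # True
print(attribute_found(""))  # False
print(attribute_found(None))  # False
```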
class seoaudit.checks.element.ElementCheck
Enum representing all of the predefined element check types.
class seoaudit.checks.element.MaxLengthCheck
Checks if the content length is less than or equal to the maximum length.
check_content(content: str, **kwargs)
- Parameters
content – element content value on which check is performed
kwargs – keyword arguments (map) that include a ‘max_length’ parameter, which defaults to 0 if not defined
- Returns
tuple including boolean check result and content length
- Return type
Tuple(boolean, int)
class seoaudit.checks.element.MinLengthCheck
Checks if the content length is greater than or equal to the minimum length.
check_content(content: str, **kwargs)
- Parameters
content – element content value on which check is performed
kwargs – keyword arguments (map) that include a ‘min_length’ parameter, which defaults to 0 if not defined
- Returns
tuple including boolean check result and content length
- Return type
Tuple(boolean, int)
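MaxLengthCheck and MinLengthCheck both return a (result, content length) tuple and read their limit from kwargs, with a default of 0. A sketch of that logic as plain functions follows; the names max_length_check and min_length_check are hypothetical, and this is not the package source.

```python
from typing import Tuple


def max_length_check(content: str, **kwargs) -> Tuple[bool, int]:
    length = len(content)
    # 'max_length' defaults to 0, so without it only empty content passes.
    return length <= kwargs.get("max_length", 0), length


def min_length_check(content: str, **kwargs) -> Tuple[bool, int]:
    length = len(content)
    # 'min_length' defaults to 0, so without it any content passes.
    return length >= kwargs.get("min_length", 0), length


title = "Hiking boots for every trail"
print(max_length_check(title, max_length=60))  # (True, 28)
print(min_length_check(title, min_length=30))  # (False, 28)
```

Returning the length alongside the boolean lets a report say by how much a title or description misses its limit.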
class seoaudit.checks.element.RegexMatchCheck
Implements a regex match check on the content.
check_content(content: str, **kwargs)
- Parameters
content – element content value on which check is performed
kwargs – keyword arguments (map) that include a ‘regex’ parameter, which defaults to ‘.*’ if not defined
- Returns
tuple including boolean check result and content length
- Return type
Tuple(boolean, int)
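A sketch of the RegexMatchCheck behaviour. It assumes the pattern is applied with re.search; the text above does not say whether a match must start at the beginning of the content or cover all of it. The returned tuple follows the documented (result, content length) shape, and regex_match_check is a hypothetical name.

```python
import re
from typing import Tuple


def regex_match_check(content: str, **kwargs) -> Tuple[bool, int]:
    pattern = kwargs.get("regex", ".*")  # documented default matches anything
    # Assumption: re.search, i.e. the pattern may match anywhere in the content.
    return re.search(pattern, content) is not None, len(content)


print(regex_match_check("Contact us | Example Ltd", regex=r"\| Example Ltd$"))  # (True, 24)
print(regex_match_check("Contact us", regex=r"^\d+$"))  # (False, 10)
```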
seoaudit.checks.element.check_content(check: seoaudit.checks.element.ElementCheck, content: str, **kwargs)
Wrapper function that performs the check defined by the given ElementCheck.
- Parameters
check (ElementCheck) – Enum identifying type of the check
content (str) – element content value on which check is performed
**kwargs – various content check arguments (e.g. a number representing a minimal length value)
- Returns
a boolean value representing the check's validity, optionally followed by extra check result information (e.g. a tuple of the result and the content length)
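The wrapper resolves an ElementCheck member to its implementing class, whose name is stored as the enum value, and delegates to that class's check_content. The self-contained sketch below shows this enum-to-class dispatch. The ATTRIBUTE_FOUND member name, the CHECK_CLASSES registry, and the two minimal check classes are assumptions for illustration; the real module may resolve class names differently.

```python
from enum import Enum


class MinLengthCheck:
    def check_content(self, content: str, **kwargs):
        return len(content) >= kwargs.get("min_length", 0), len(content)


class AttributeFoundCheck:
    def check_content(self, content: str, **unused):
        return content is not None and len(content) > 0


class ElementCheck(Enum):
    # Each value is the name of the class that implements the check.
    ATTRIBUTE_FOUND = "AttributeFoundCheck"  # hypothetical member name
    MIN_LENGTH = "MinLengthCheck"


# Hypothetical registry mapping class names to classes.
CHECK_CLASSES = {cls.__name__: cls for cls in (MinLengthCheck, AttributeFoundCheck)}


def check_content(check: ElementCheck, content: str, **kwargs):
    # Look up the implementing class by name, instantiate it, and delegate.
    return CHECK_CLASSES[check.value]().check_content(content, **kwargs)


print(check_content(ElementCheck.MIN_LENGTH, "abc", min_length=2))  # (True, 3)
print(check_content(ElementCheck.ATTRIBUTE_FOUND, ""))  # False
```

Storing class names in the enum keeps the set of predefined checks enumerable in one place while each check's logic lives in its own class.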