Snippet – Wiktor Tech Notes

How to quickly get the HTML meta tags of a page?

Here is a short Groovy example that fetches a page with jsoup and prints its HTML meta tags.

// Requires the jsoup library on the classpath.
import org.jsoup.Jsoup
import org.jsoup.nodes.Document
import org.jsoup.nodes.Element
import org.jsoup.select.Elements

// Page to inspect
def url = "http://wordpress.com"

// Download and parse the page
def document = Jsoup.connect(url).get()

// Collect every <meta> element in the document
def metaTags = document.getElementsByTag("meta")

// A missing attribute is returned as an empty string
for (Element metaTag : metaTags) {
    def tagName = metaTag.attr("name")
    def tagProperty = metaTag.attr("property")
    def tagContent = metaTag.attr("content")
    println String.format("Name: %s, property : %s, content : %s", tagName, tagProperty, tagContent)
}

Cucumber step definition parameters: 1st, 2nd, 3rd, 4th

Sometimes a step needs to refer to one of many ordered items on a page. We can use an integer as the identifier, but it reads unnaturally, for example:

User named “John” is author of 1 article

The number is misleading: it can be read as a count ("one article"), but we want it to be read as a position ("first").
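
Going the other way, from a position to its ordinal form, can be useful when building step text or failure messages. The following is a minimal sketch of that conversion, assuming English suffixes and positions of 1 or greater. The class name `Ordinals` and the method `toOrdinal` are hypothetical and are not part of Cucumber or of the original snippet.

```java
/** Hypothetical helper: turns a position into its English ordinal form. */
class Ordinals {

    /** Appends "st", "nd", "rd" or "th" to the given position (assumed to be >= 1). */
    static String toOrdinal(int n) {
        int lastTwo = n % 100;
        // 11, 12 and 13 are irregular: they always take "th".
        if (lastTwo >= 11 && lastTwo <= 13) {
            return n + "th";
        }
        switch (n % 10) {
            case 1:  return n + "st";
            case 2:  return n + "nd";
            case 3:  return n + "rd";
            default: return n + "th";
        }
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 3, 4, 11, 21, 112}) {
            System.out.println(toOrdinal(n));
        }
    }
}
```

With such a helper, a message could read "is author of 1st article" instead of "is author of 1 article".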

Below is a Java step definition that accepts both forms (1st or 1).

// The ordinal suffix is optional, so both "1st" and "1" match.
@Then("^User named \"(.*)\" is author of \"(\\d+)(?:st|nd|rd|th)?\" article$")
public void userNamedIsAuthorOfNthArticle(String name, int nth) {
    boolean isAuthor = testPage.getNthArticle(nth).getAuthor().equals(name);
    assertTrue("User named " + name + " is author of " + nth + " article", isAuthor);
}
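
The pattern can also be tried outside Cucumber with plain `java.util.regex`. The sketch below assumes the ordinal suffix group is optional (a trailing `?`), so that both 1st and 1 are accepted. The class name `OrdinalStepPatternDemo` and the method `articlePosition` are hypothetical and used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Standalone check of the step pattern, independent of Cucumber. */
class OrdinalStepPatternDemo {

    // Step expression with an optional ordinal suffix after the digits.
    static final Pattern STEP = Pattern.compile(
            "^User named \"(.*)\" is author of \"(\\d+)(?:st|nd|rd|th)?\" article$");

    /** Returns the captured article position, or -1 if the step text does not match. */
    static int articlePosition(String stepText) {
        Matcher m = STEP.matcher(stepText);
        return m.matches() ? Integer.parseInt(m.group(2)) : -1;
    }

    public static void main(String[] args) {
        // Ordinal form: prints 1
        System.out.println(articlePosition("User named \"John\" is author of \"1st\" article"));
        // Plain integer form: prints 1
        System.out.println(articlePosition("User named \"John\" is author of \"1\" article"));
        // Spelled-out ordinals are not covered by the pattern: prints -1
        System.out.println(articlePosition("User named \"John\" is author of \"first\" article"));
    }
}
```

Only the digits are captured, so the step method still receives a plain `int`. The suffix is matched but discarded by the non-capturing group.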

Category: Snippet
