From 2808185f9e644c8103788dd3a644c27ed841ace5 Mon Sep 17 00:00:00 2001
From: cyian-1756
Date: Sat, 10 Mar 2018 20:18:34 -0500
Subject: [PATCH] Documented normalizeUrl

---
 How-To-Create-A-Ripper-for-HTML-websites.md | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/How-To-Create-A-Ripper-for-HTML-websites.md b/How-To-Create-A-Ripper-for-HTML-websites.md
index 8e3c5b1..a0ca1bc 100644
--- a/How-To-Create-A-Ripper-for-HTML-websites.md
+++ b/How-To-Create-A-Ripper-for-HTML-websites.md
@@ -126,6 +126,24 @@ Throws: `IOException` if no next page can be retrieved.
 
 ---
 
+#### String normalizeUrl(String url)
+
+Returns: The url that will be either written to or checked against the url history
+
+This function normalizes a url so that non-static urls can be used with the url history file
+
+Here is an example removing the time stamp ID from instagram links
+
+```java
+    @Override
+    public String normalizeUrl(String url) {
+        // Remove the date sig from the url
+        return url.replaceAll("/[A-Z0-9]{8}/", "/");
+    }
+```
+
+---
+
 #### List getURLsFromPage(Document)
 
 Input: Jsoup `Document` retrieved in the `getFirstPage()` method (and optionally the `getNextPage()` method).
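
Note on the `normalizeUrl` section added by this patch: the sketch below is one way to see why normalizing matters for the history check. It is a stand-alone illustration, not the ripper's actual history code. The class `NormalizeUrlDemo`, the helper `recordIfNew`, the `HashSet` standing in for the url history file, and the `example.com` URLs are all hypothetical; only the `replaceAll` call is taken from the patch.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-alone demo; not part of the ripper class hierarchy.
class NormalizeUrlDemo {
    // Same replacement as the example in the patch: drop any path segment
    // made of exactly 8 uppercase letters or digits.
    static String normalizeUrl(String url) {
        return url.replaceAll("/[A-Z0-9]{8}/", "/");
    }

    // Hypothetical stand-in for the url history: stores the normalized url
    // and returns true only if it had not been recorded before.
    static boolean recordIfNew(Set<String> history, String url) {
        return history.add(normalizeUrl(url));
    }

    public static void main(String[] args) {
        Set<String> history = new HashSet<>();
        // Two urls that differ only in the 8-character changing segment.
        System.out.println(recordIfNew(history, "https://example.com/p/AB12CD34/photo.jpg")); // prints "true"
        System.out.println(recordIfNew(history, "https://example.com/p/ZZ99YY88/photo.jpg")); // prints "false"
    }
}
```

Without the normalization step, the two urls would be stored as distinct history entries and the same file would be treated as new each time the changing segment differs.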