| Modifier and Type | Field and Description |
|---|---|
| `protected WebcrawlerConnector.DocumentURLFilter` | `WebcrawlerConnector.ProcessActivityLinkHandler.filter` |

| Modifier and Type | Method and Description |
|---|---|
| `protected java.lang.String` | `WebcrawlerConnector.doCanonicalization(WebcrawlerConnector.DocumentURLFilter filter, WebURL url)` Code to canonicalize a URL. |
| `protected boolean` | `WebcrawlerConnector.extractLinks(java.lang.String documentIdentifier, org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities, WebcrawlerConnector.DocumentURLFilter filter)` Code to extract links from an already-fetched document. |
| `protected java.lang.String` | `WebcrawlerConnector.makeDocumentIdentifier(java.lang.String parentIdentifier, java.lang.String rawURL, WebcrawlerConnector.DocumentURLFilter filter, org.apache.manifoldcf.crawler.interfaces.IHistoryActivity activities)` Convert an absolute or relative URL to a document identifier. |
| `protected void` | `WebcrawlerConnector.processDocument(org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities, java.lang.String documentIdentifier, java.lang.String versionString, boolean indexDocument, java.util.Map<java.lang.String,java.util.Set<java.lang.String>> metaHash, java.lang.String[] acls, WebcrawlerConnector.DocumentURLFilter filter)` |

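The `doCanonicalization` and `makeDocumentIdentifier` entries above describe turning a raw (absolute or relative) URL into a stable identifier. The following is only a rough sketch of that general idea using the standard `java.net.URI` class; it is not the connector's actual implementation. The class name `UrlIdentifierSketch` and the method `toDocumentIdentifier` are hypothetical, and the sketch ignores the `DocumentURLFilter` and activity arguments that the real methods take.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical illustration only; not part of ManifoldCF.
final class UrlIdentifierSketch {

    // Resolves rawUrl against parentUrl (which may be null) and returns a
    // normalized absolute URL string, or null if the URL cannot be used.
    // Assumes the parent URL, when given, has a non-empty path.
    static String toDocumentIdentifier(String parentUrl, String rawUrl) {
        try {
            URI raw = new URI(rawUrl.trim());
            URI resolved = (parentUrl == null) ? raw : new URI(parentUrl).resolve(raw);
            resolved = resolved.normalize(); // collapse "." and ".." segments

            // Reject relative results and opaque URIs such as mailto: links.
            if (!resolved.isAbsolute() || resolved.getHost() == null) {
                return null;
            }

            // Scheme and host are case-insensitive, so lowercase them.
            StringBuilder sb = new StringBuilder();
            sb.append(resolved.getScheme().toLowerCase(Locale.ROOT));
            sb.append("://");
            sb.append(resolved.getHost().toLowerCase(Locale.ROOT));
            if (resolved.getPort() != -1) {
                sb.append(':').append(resolved.getPort());
            }

            // Keep the raw (still percent-encoded) path and query; drop the fragment.
            String path = resolved.getRawPath();
            sb.append(path == null || path.isEmpty() ? "/" : path);
            if (resolved.getRawQuery() != null) {
                sb.append('?').append(resolved.getRawQuery());
            }
            return sb.toString();
        } catch (URISyntaxException e) {
            return null; // malformed URL: no identifier
        }
    }
}
```

Returning `null` for unusable URLs is a design choice of this sketch: it lets a caller skip a link without handling an exception. Whether the real connector signals rejection the same way is not stated in this listing.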
| Constructor and Description |
|---|
| `ProcessActivityHTMLHandler(java.lang.String documentIdentifier, org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities, WebcrawlerConnector.DocumentURLFilter filter, int metaRobotTagsUsage)` Constructor. |
| `ProcessActivityLinkHandler(java.lang.String documentIdentifier, org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities, WebcrawlerConnector.DocumentURLFilter filter, java.lang.String contextDescription, java.lang.String linkType)` Constructor. |
| `ProcessActivityRedirectionHandler(java.lang.String documentIdentifier, org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities, WebcrawlerConnector.DocumentURLFilter filter)` Constructor. |
| `ProcessActivityXMLHandler(java.lang.String documentIdentifier, org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities, WebcrawlerConnector.DocumentURLFilter filter)` Constructor. |