public class RSSConnector
extends org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
| Modifier and Type | Class | Description |
|---|---|---|
| protected static class | RSSConnector.CanonicalizationPolicies | Class representing a list of canonicalization rules |
| protected static class | RSSConnector.CanonicalizationPolicy | Class representing a URL regular expression match, for the purposes of determining canonicalization policy |
| protected static class | RSSConnector.EvaluatorToken | Evaluator token. |
| protected static class | RSSConnector.EvaluatorTokenStream | Token stream. |
| protected class | RSSConnector.FeedAuthorContextClass | |
| protected class | RSSConnector.FeedContextClass | |
| protected class | RSSConnector.FeedItemContextClass | |
| protected static class | RSSConnector.Filter | Class that handles parsing and interpretation of the document specification. |
| protected static class | RSSConnector.MappingRule | Class representing a mapping rule |
| protected static class | RSSConnector.MappingRules | Class that represents all mappings |
| protected static class | RSSConnector.NameValue | Name/value class |
| protected class | RSSConnector.OuterContextClass | This class handles the outermost XML context for the feed document. |
| protected class | RSSConnector.RDFContextClass | |
| protected class | RSSConnector.RDFItemContextClass | |
| protected class | RSSConnector.RSSChannelContextClass | |
| protected class | RSSConnector.RSSContextClass | |
| protected class | RSSConnector.RSSItemContextClass | |
| protected static class | RSSConnector.ThrottleSpec | The throttle specification class. |
| protected class | RSSConnector.UrlsetContextClass | |
| protected class | RSSConnector.UrlsetItemContextClass | |

| Modifier and Type | Field | Description |
|---|---|---|
| static java.lang.String | _rcsid | |
| static java.lang.String | ACTIVITY_FETCH | |
| static java.lang.String | ACTIVITY_PROCESS | |
| static java.lang.String | ACTIVITY_ROBOTSPARSE | |
| protected static DataCache | cache | |
| static int | CHROMED_METADATA_ONLY | Chromed suppression mode - index metadata only if dechromed content not available |
| static int | CHROMED_SKIP | Chromed suppression mode - skip documents if dechromed content not available |
| static int | CHROMED_USE | Chromed suppression mode - use chromed content if dechromed content not available |
| static int | DECHROMED_CONTENT | Dechromed content mode - content field |
| static int | DECHROMED_DESCRIPTION | Dechromed content mode - description field |
| static int | DECHROMED_NONE | Dechromed content mode - none |
| protected ThrottledFetcher | fetcher | The throttled fetcher used by this instance |
| protected static java.util.Map<java.lang.String,ThrottledFetcher> | fetcherMap | Storage for fetcher objects |
| protected java.lang.String | from | The email address for this connector instance |
| protected boolean | isInitialized | Flag indicating whether session data is initialized |
| protected int | maxOpenConnectionsPerServer | The maximum open connections |
| protected double | minimumMillisecondsPerBytePerServer | The minimum milliseconds between bytes |
| protected long | minimumMillisecondsPerFetchPerServer | The minimum milliseconds between fetches |
| protected java.lang.String | proxyAuthDomain | Proxy auth domain |
| protected java.lang.String | proxyAuthPassword | Proxy auth password |
| protected java.lang.String | proxyAuthUsername | Proxy auth username |
| protected java.lang.String | proxyHost | The proxy host |
| protected int | proxyPort | The proxy port |
| protected Robots | robots | The robots object used by this instance |
| protected static int | ROBOTS_ALL | |
| protected static int | ROBOTS_DATA | |
| protected static int | ROBOTS_NONE | |
| protected static java.util.Map | robotsMap | Storage for robots objects |
| protected int | robotsUsage | Robots usage flag |
| protected static java.lang.String | rssThrottleGroupType | |
| protected java.lang.String | throttleGroupName | The throttle group name |
| protected static java.util.Map | understoodProtocols | |
| protected java.lang.String | userAgent | The user-agent for this connector instance |
| protected static java.util.Set<java.lang.String> | xmlContentTypes | |

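
Fields inherited from class org.apache.manifoldcf.core.connector.BaseConnector: currentContext, params
Fields inherited from interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector: GLOBAL_DENY_TOKEN, JOBMODE_CONTINUOUS, JOBMODE_ONCEONLY, MODEL_ADD, MODEL_ADD_CHANGE, MODEL_ADD_CHANGE_DELETE, MODEL_ALL, MODEL_CHAINED_ADD, MODEL_CHAINED_ADD_CHANGE, MODEL_CHAINED_ADD_CHANGE_DELETE, MODEL_PARTIAL

| Constructor | Description |
|---|---|
| RSSConnector() | Constructor. |
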
| Modifier and Type | Method | Description |
|---|---|---|
| java.lang.String | addSeedDocuments(org.apache.manifoldcf.crawler.interfaces.ISeedingActivity activities, org.apache.manifoldcf.core.interfaces.Specification spec, java.lang.String lastSeedVersion, long seedTime, int jobMode) | Queue "seed" documents. |
| java.lang.String | check() | Check status of connection. |
| protected static void | compileList(java.util.List<java.util.regex.Pattern> output, java.util.List<java.lang.String> input) | Compile all regexp entries in the passed in list, and add them to the output list. |
| void | connect(org.apache.manifoldcf.core.interfaces.ConfigParams configParams) | Connect. |
| void | disconnect() | Close the connection. |
| protected static java.lang.String | doCanonicalization(RSSConnector.CanonicalizationPolicy p, WebURL url) | Code to canonicalize a URL. |
| java.lang.String[] | getActivitiesList() | Return the list of activities that this connector supports (i.e. |
| java.lang.String[] | getBinNames(java.lang.String documentIdentifier) | Get the bin name string for a document identifier. |
| int | getConnectorModel() | Tell the world what model this connector uses for getDocumentIdentifiers(). |
| protected ThrottledFetcher | getFetcher() | Given the current parameters, find the correct throttled fetcher object (or create one if not there). |
| int | getMaxDocumentRequest() | Get the maximum number of documents to amalgamate together into one batch, for this connector. |
| protected Robots | getRobots(ThrottledFetcher fetcher) | Given the current parameters, find the correct robots object (or create one if none found). |
| protected void | getSession() | Establish a session |
| protected static void | handleIOException(java.io.IOException e, java.lang.String context) | |
| protected void | handleRSSFeedSAX(java.lang.String documentIdentifier, org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities, RSSConnector.Filter filter) | Handle an RSS feed document, using SAX to limit the memory impact |
| protected static java.lang.String | makeDocumentIdentifier(RSSConnector.CanonicalizationPolicies policies, java.lang.String parentIdentifier, java.lang.String rawURL) | Convert an absolute or relative URL to a document identifier. |
| void | outputConfigurationBody(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.ConfigParams parameters, java.lang.String tabName) | Output the configuration body section. |
| void | outputConfigurationHeader(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.ConfigParams parameters, java.util.List<java.lang.String> tabsArray) | Output the configuration header section. |
| void | outputSpecificationBody(org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.Specification ds, int connectionSequenceNumber, int actualSequenceNumber, java.lang.String tabName) | Output the specification body section. |
| void | outputSpecificationHeader(org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.Specification ds, int connectionSequenceNumber, java.util.List<java.lang.String> tabsArray) | Output the specification header section. |
| void | poll() | This method is periodically called for all connectors that are connected but not in active use. |
| java.lang.String | processConfigurationPost(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, org.apache.manifoldcf.core.interfaces.IPostParameters variableContext, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.ConfigParams parameters) | Process a configuration post. |
| void | processDocuments(java.lang.String[] documentIdentifiers, org.apache.manifoldcf.crawler.interfaces.IExistingVersions statuses, org.apache.manifoldcf.core.interfaces.Specification spec, org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities, int jobMode, boolean usesDefaultAuthority) | Process a set of documents. |
| java.lang.String | processSpecificationPost(org.apache.manifoldcf.core.interfaces.IPostParameters variableContext, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.Specification ds, int connectionSequenceNumber) | Process a specification post. |
| protected static java.util.List<java.lang.String> | stringToArray(java.lang.String input) | Read a string as a sequence of individual expressions, urls, etc. |
| void | viewConfiguration(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.ConfigParams parameters) | View configuration. |
| void | viewSpecification(org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.Specification ds, int connectionSequenceNumber) | View specification. |
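
The method summary describes stringToArray as reading a string as a sequence of individual expressions and compileList as compiling each regexp entry into an output list. The following is a minimal sketch of how such a pair could behave, not the connector's actual code. It assumes one entry per line, with blank lines and lines starting with # skipped; this page does not state the input format, so treat that as an assumption. The names RegexListSketch, splitEntries, and compileAll are hypothetical, and IllegalArgumentException stands in for ManifoldCFException so the example has no ManifoldCF dependency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical illustration; not part of RSSConnector.
final class RegexListSketch {

  // Split the input into trimmed entries, one per line.
  // Assumption: blank lines and lines starting with '#' are ignored.
  static List<String> splitEntries(String input) {
    List<String> entries = new ArrayList<>();
    for (String line : input.split("\\R")) {
      String trimmed = line.trim();
      if (trimmed.isEmpty() || trimmed.startsWith("#")) {
        continue;
      }
      entries.add(trimmed);
    }
    return entries;
  }

  // Compile every expression in input and append the result to output.
  // IllegalArgumentException is a stand-in for the connector's own exception type.
  static void compileAll(List<Pattern> output, List<String> input) {
    for (String expr : input) {
      try {
        output.add(Pattern.compile(expr));
      } catch (PatternSyntaxException e) {
        throw new IllegalArgumentException(
            "Bad regular expression '" + expr + "': " + e.getMessage(), e);
      }
    }
  }

  public static void main(String[] args) {
    String spec = "# feeds to include\n^https?://example\\.com/.*\n\n\\.xml$\n";
    List<Pattern> patterns = new ArrayList<>();
    compileAll(patterns, splitEntries(spec));
    System.out.println("Compiled patterns: " + patterns.size());
  }
}
```

Appending to a caller-supplied output list, as the compileList signature does, lets one list accumulate patterns from several specification fields.
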
Methods inherited from class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector: getFormCheckJavascriptMethodName, getFormPresaveCheckJavascriptMethodName, getRelationshipTypes, requestInfo
Methods inherited from class org.apache.manifoldcf.core.connector.BaseConnector: clearThreadContext, deinstall, getConfiguration, install, isConnected, outputConfigurationBody, outputConfigurationHeader, outputConfigurationHeader, pack, packFixedList, packList, packList, processConfigurationPost, setThreadContext, unpack, unpackFixedList, unpackList, viewConfiguration

public static final java.lang.String _rcsid
protected static final java.lang.String rssThrottleGroupType
protected static final int ROBOTS_NONE
protected static final int ROBOTS_DATA
protected static final int ROBOTS_ALL
public static final int DECHROMED_NONE
public static final int DECHROMED_DESCRIPTION
public static final int DECHROMED_CONTENT
public static final int CHROMED_USE
public static final int CHROMED_SKIP
public static final int CHROMED_METADATA_ONLY
protected int robotsUsage
protected java.lang.String userAgent
protected java.lang.String from
protected long minimumMillisecondsPerFetchPerServer
protected int maxOpenConnectionsPerServer
protected double minimumMillisecondsPerBytePerServer
protected java.lang.String throttleGroupName
protected java.lang.String proxyHost
protected int proxyPort
protected java.lang.String proxyAuthDomain
protected java.lang.String proxyAuthUsername
protected java.lang.String proxyAuthPassword
protected ThrottledFetcher fetcher
protected Robots robots
protected static java.util.Map<java.lang.String,ThrottledFetcher> fetcherMap
protected static java.util.Map robotsMap
protected boolean isInitialized
protected static DataCache cache
protected static final java.util.Map understoodProtocols
public static final java.lang.String ACTIVITY_FETCH
public static final java.lang.String ACTIVITY_ROBOTSPARSE
public static final java.lang.String ACTIVITY_PROCESS
protected static java.util.Set<java.lang.String> xmlContentTypes
protected void getSession()
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

public java.lang.String[] getActivitiesList()
Specified by: getActivitiesList in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
Overrides: getActivitiesList in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector

public int getConnectorModel()
Specified by: getConnectorModel in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
Overrides: getConnectorModel in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector

public void connect(org.apache.manifoldcf.core.interfaces.ConfigParams configParams)
Specified by: connect in interface org.apache.manifoldcf.core.interfaces.IConnector
Overrides: connect in class org.apache.manifoldcf.core.connector.BaseConnector
Parameters:
configParams - are the configuration parameters for this connection. Note well: There are no exceptions allowed from this call, since it is expected to mainly establish connection parameters.

public void poll()
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Specified by: poll in interface org.apache.manifoldcf.core.interfaces.IConnector
Overrides: poll in class org.apache.manifoldcf.core.connector.BaseConnector
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

public java.lang.String check()
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Specified by: check in interface org.apache.manifoldcf.core.interfaces.IConnector
Overrides: check in class org.apache.manifoldcf.core.connector.BaseConnector
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

public void disconnect()
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Specified by: disconnect in interface org.apache.manifoldcf.core.interfaces.IConnector
Overrides: disconnect in class org.apache.manifoldcf.core.connector.BaseConnector
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

public java.lang.String[] getBinNames(java.lang.String documentIdentifier)
Specified by: getBinNames in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
Overrides: getBinNames in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
Parameters:
documentIdentifier - is the document identifier.

public java.lang.String addSeedDocuments(org.apache.manifoldcf.crawler.interfaces.ISeedingActivity activities,
org.apache.manifoldcf.core.interfaces.Specification spec,
java.lang.String lastSeedVersion,
long seedTime,
int jobMode)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Specified by: addSeedDocuments in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
Overrides: addSeedDocuments in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
Parameters:
activities - is the interface this method should use to perform whatever framework actions are desired.
spec - is a document specification (that comes from the job).
seedTime - is the end of the time range of documents to consider, exclusive.
lastSeedVersion - is the last seeding version string for this job, or null if the job has no previous seeding version string.
jobMode - is an integer describing how the job is being run, whether continuous or once-only.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption

protected static java.lang.String makeDocumentIdentifier(RSSConnector.CanonicalizationPolicies policies,
java.lang.String parentIdentifier,
java.lang.String rawURL)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Parameters:
policies - are the canonicalization policies in effect.
parentIdentifier - the identifier of the document in which the raw url was found, or null if none.
rawURL - is the raw, un-normalized and un-canonicalized url.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

protected static java.lang.String doCanonicalization(RSSConnector.CanonicalizationPolicy p,
WebURL url)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
java.net.URISyntaxException
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
java.net.URISyntaxException

public void processDocuments(java.lang.String[] documentIdentifiers,
org.apache.manifoldcf.crawler.interfaces.IExistingVersions statuses,
org.apache.manifoldcf.core.interfaces.Specification spec,
org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities,
int jobMode,
boolean usesDefaultAuthority)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Specified by: processDocuments in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
Overrides: processDocuments in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
Parameters:
documentIdentifiers - is the set of document identifiers to process.
statuses - are the currently-stored document versions for each document in the set of document identifiers passed in above.
activities - is the interface this method should use to queue up new document references and ingest documents.
jobMode - is an integer describing how the job is being run, whether continuous or once-only.
usesDefaultAuthority - will be true only if the authority in use for these documents is the default one.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption

protected static void handleIOException(java.io.IOException e,
java.lang.String context)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption

public void outputConfigurationHeader(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext,
org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
java.util.Locale locale,
org.apache.manifoldcf.core.interfaces.ConfigParams parameters,
java.util.List<java.lang.String> tabsArray)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
java.io.IOException
Specified by: outputConfigurationHeader in interface org.apache.manifoldcf.core.interfaces.IConnector
Overrides: outputConfigurationHeader in class org.apache.manifoldcf.core.connector.BaseConnector
Parameters:
threadContext - is the local thread context.
out - is the output to which any HTML should be sent.
parameters - are the configuration parameters, as they currently exist, for this connection being configured.
tabsArray - is an array of tab names. Add to this array any tab names that are specific to the connector.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
java.io.IOException

public void outputConfigurationBody(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext,
org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
java.util.Locale locale,
org.apache.manifoldcf.core.interfaces.ConfigParams parameters,
java.lang.String tabName)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
java.io.IOException
Specified by: outputConfigurationBody in interface org.apache.manifoldcf.core.interfaces.IConnector
Overrides: outputConfigurationBody in class org.apache.manifoldcf.core.connector.BaseConnector
Parameters:
threadContext - is the local thread context.
out - is the output to which any HTML should be sent.
parameters - are the configuration parameters, as they currently exist, for this connection being configured.
tabName - is the current tab name.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
java.io.IOException

public java.lang.String processConfigurationPost(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext,
org.apache.manifoldcf.core.interfaces.IPostParameters variableContext,
java.util.Locale locale,
org.apache.manifoldcf.core.interfaces.ConfigParams parameters)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Specified by: processConfigurationPost in interface org.apache.manifoldcf.core.interfaces.IConnector
Overrides: processConfigurationPost in class org.apache.manifoldcf.core.connector.BaseConnector
Parameters:
threadContext - is the local thread context.
variableContext - is the set of variables available from the post, including binary file post information.
parameters - are the configuration parameters, as they currently exist, for this connection being configured.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

public void viewConfiguration(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext,
org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
java.util.Locale locale,
org.apache.manifoldcf.core.interfaces.ConfigParams parameters)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
java.io.IOException
Specified by: viewConfiguration in interface org.apache.manifoldcf.core.interfaces.IConnector
Overrides: viewConfiguration in class org.apache.manifoldcf.core.connector.BaseConnector
Parameters:
threadContext - is the local thread context.
out - is the output to which any HTML should be sent.
parameters - are the configuration parameters, as they currently exist, for this connection being configured.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
java.io.IOException

public void outputSpecificationHeader(org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
java.util.Locale locale,
org.apache.manifoldcf.core.interfaces.Specification ds,
int connectionSequenceNumber,
java.util.List<java.lang.String> tabsArray)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
java.io.IOException
Specified by: outputSpecificationHeader in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
Overrides: outputSpecificationHeader in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
Parameters:
out - is the output to which any HTML should be sent.
locale - is the locale the output is preferred to be in.
ds - is the current document specification for this job.
connectionSequenceNumber - is the unique number of this connection within the job.
tabsArray - is an array of tab names. Add to this array any tab names that are specific to the connector.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
java.io.IOException

public void outputSpecificationBody(org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
java.util.Locale locale,
org.apache.manifoldcf.core.interfaces.Specification ds,
int connectionSequenceNumber,
int actualSequenceNumber,
java.lang.String tabName)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
java.io.IOException
Specified by: outputSpecificationBody in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
Overrides: outputSpecificationBody in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
Parameters:
out - is the output to which any HTML should be sent.
locale - is the locale the output is preferred to be in.
ds - is the current document specification for this job.
connectionSequenceNumber - is the unique number of this connection within the job.
actualSequenceNumber - is the connection within the job that has currently been selected.
tabName - is the current tab name. (actualSequenceNumber, tabName) form a unique tuple within the job.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
java.io.IOException

public java.lang.String processSpecificationPost(org.apache.manifoldcf.core.interfaces.IPostParameters variableContext,
java.util.Locale locale,
org.apache.manifoldcf.core.interfaces.Specification ds,
int connectionSequenceNumber)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Specified by: processSpecificationPost in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
Overrides: processSpecificationPost in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
Parameters:
variableContext - contains the post data, including binary file-upload information.
locale - is the locale the output is preferred to be in.
ds - is the current document specification for this job.
connectionSequenceNumber - is the unique number of this connection within the job.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

public void viewSpecification(org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
java.util.Locale locale,
org.apache.manifoldcf.core.interfaces.Specification ds,
int connectionSequenceNumber)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
java.io.IOException
Specified by: viewSpecification in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
Overrides: viewSpecification in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
Parameters:
out - is the output to which any HTML should be sent.
locale - is the locale the output is preferred to be in.
ds - is the current document specification for this job.
connectionSequenceNumber - is the unique number of this connection within the job.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
java.io.IOException

protected void handleRSSFeedSAX(java.lang.String documentIdentifier,
org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities,
RSSConnector.Filter filter)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption

public int getMaxDocumentRequest()
Specified by: getMaxDocumentRequest in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
Overrides: getMaxDocumentRequest in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector

protected ThrottledFetcher getFetcher()

protected static java.util.List<java.lang.String> stringToArray(java.lang.String input)

protected static void compileList(java.util.List<java.util.regex.Pattern> output,
java.util.List<java.lang.String> input)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

protected Robots getRobots(ThrottledFetcher fetcher)
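
getFetcher() and getRobots(ThrottledFetcher) are both summarized as find-or-create lookups, and the class keeps static fetcherMap and robotsMap fields as storage for those objects. The sketch below shows that general pattern only. It is not the connector's implementation: StubFetcher is a hypothetical stand-in for ThrottledFetcher, and keying the map by a throttle group name is an assumption, since this page does not say what key the real maps use.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of a shared find-or-create registry.
final class FetcherRegistrySketch {

  // Stand-in for ThrottledFetcher, whose API is not shown on this page.
  static final class StubFetcher {
    final String group;

    StubFetcher(String group) {
      this.group = group;
    }
  }

  // Shared across all instances, in the spirit of the static fetcherMap field.
  private static final Map<String, StubFetcher> FETCHERS = new ConcurrentHashMap<>();

  // Return the fetcher registered for the key, creating it on first use.
  // Assumption: the key is the throttle group name.
  static StubFetcher getFetcher(String throttleGroupName) {
    return FETCHERS.computeIfAbsent(throttleGroupName, StubFetcher::new);
  }

  public static void main(String[] args) {
    StubFetcher first = getFetcher("group-a");
    StubFetcher second = getFetcher("group-a");
    System.out.println("Same instance reused: " + (first == second));
  }
}
```

Sharing one object per key is what would let throttling limits apply across every connector instance that uses the same group, rather than per instance.
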