Queue and history


Max urls in queue and history. The maximum total number of URLs allowed in the queue and history. If the setting is zero, the restriction is ignored.
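The limit check can be pictured with a minimal Python sketch. This is not Datacol code: the function name can_enqueue and its parameters are hypothetical, and it assumes the limit applies to the combined size of the queue and the history.

```python
def can_enqueue(queue, history, max_total):
    """Return True if one more URL fits under the combined limit.

    Hypothetical helper: assumes the limit counts queue and history together.
    """
    if max_total == 0:
        # Zero disables the restriction entirely.
        return True
    return len(queue) + len(history) < max_total
```

With one URL in the queue and one in the history, a limit of 3 still allows a new URL, while a limit of 2 does not.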

New URLs to the queue beginning. Adds newly found URLs to the beginning of the queue (by default they are added to the end), so Datacol processes them before the other URLs. This setting makes the parsing process more illustrative when a large number of category URLs are placed in the Starting URL list.
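The difference between the two insertion modes can be sketched in Python with a double-ended queue. The function add_found_urls and the flag to_beginning are hypothetical names used only for illustration, and keeping the found URLs in their original order is an assumption.

```python
from collections import deque


def add_found_urls(queue, found, to_beginning=False):
    """Add newly found URLs to a deque, at the front or at the back."""
    if to_beginning:
        # extendleft pushes items one by one, which reverses them,
        # so reverse first to keep the found URLs in their original order.
        queue.extendleft(reversed(found))
    else:
        queue.extend(found)
    return queue
```

Starting from a queue of two category URLs, adding two item URLs with to_beginning=True places the item URLs ahead of the categories, so they are processed first.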

Save / Use dump queue and history. If this checkbox is ON, Datacol saves the Queue and History to a dump when the parsing process finishes. On the next launch of the parsing process, Datacol loads the Queue and History from the dump.
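The save and load cycle can be illustrated with a short Python sketch. The JSON file layout and the function names save_dump and load_dump are assumptions made for this example; the actual Datacol dump format is not described here.

```python
import json
import os


def save_dump(path, queue, history):
    """Write the queue and history to a JSON file (assumed format)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"queue": list(queue), "history": sorted(history)}, f)


def load_dump(path):
    """Read the queue and history back; return empty ones if no dump exists."""
    if not os.path.exists(path):
        return [], set()
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data["queue"], set(data["history"])
```

The queue is kept as an ordered list because processing order matters, while the history only needs membership checks, so a set is enough.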

Starting URLs always (regardless dump presence). If this checkbox is ON, Datacol always adds the Starting URL list to the Queue when the parsing process is launched, even if those URLs are already present in the History (loaded from the dump).
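One way to picture this behavior is the Python sketch below. The function build_initial_queue and all of its parameters are hypothetical, and skipping starting URLs that are already in the History when the checkbox is OFF is an assumption inferred from the description.

```python
def build_initial_queue(starting_urls, dump_queue, dump_history,
                        always_add_starting=False):
    """Build the queue for a new run from the dump and the starting URLs."""
    queue = list(dump_queue)
    for url in starting_urls:
        if url in queue:
            # Already waiting in the queue; do not add it twice.
            continue
        # Assumption: without the checkbox, URLs already in history are skipped.
        if always_add_starting or url not in dump_history:
            queue.append(url)
    return queue
```

With the flag set, a starting URL that was processed in an earlier run is queued again, which is what allows it to be revisited.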

Reset history dump after parsing. If this checkbox is ON, Datacol clears the dump of the processed-pages History when the parsing process finishes.

Reset queue dump after parsing. If this checkbox is ON, Datacol clears the dump of the Queue of pages still to be processed when the parsing process finishes.

Queue to history dump after parsing. If this checkbox is ON, Datacol puts the URLs remaining in the Queue into the History dump when the parsing process finishes. This setting lets you harvest frequently updated blogs: when visiting the starting URLs, Datacol collects only new links, and old ones (found earlier) are ignored.
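The effect on a later run can be sketched in Python. Both functions and their names (finalize_history, links_to_enqueue) are hypothetical illustrations of the described behavior, not Datacol internals.

```python
def finalize_history(queue, history, queue_to_history):
    """Return the history set to store in the dump when parsing ends."""
    if queue_to_history:
        # URLs left unprocessed in the queue are recorded as already seen.
        return set(history) | set(queue)
    return set(history)


def links_to_enqueue(found_links, history):
    """On the next run, keep only links that are not in the history."""
    return [url for url in found_links if url not in history]
```

If a link was left in the Queue at the end of one run, it lands in the History, so on the next visit to the starting URL only links that appeared since then are queued.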

Avoid adding catalog URLs to history dump. If this checkbox is ON, catalog URLs (URLs where link collecting is performed) are not added to the History dump. This setting lets you regularly rescan goods (or ads) directories, where new links usually appear.
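A minimal Python sketch of this filter follows. The function history_for_dump and its parameters are hypothetical, and it assumes the set of catalog URLs is known when the dump is written.

```python
def history_for_dump(processed, catalog_urls, skip_catalog=False):
    """Return the set of processed URLs to store in the history dump."""
    if skip_catalog:
        # Catalog pages stay out of history so later runs visit them again.
        return {url for url in processed if url not in catalog_urls}
    return set(processed)
```

Because the catalog pages never enter the History, each new run rescans them and picks up any links added since the previous run, while item pages already in the History are not processed again.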
