The news crawler is a Datacol-based module that automatically extracts news from the Dailymail website. By default, the data are saved to an xlsx file, which you can open in Excel to explore. You can also adjust the Datacol export settings to publish the data to a website (WordPress, DLE, Joomla), a database, etc.
The main advantages of the Datacol-based news crawler are listed below:
Step-by-step test of the news crawler
To test the news extractor:
1. Install the Datacol trial version;
2. Select news-parsers/news-crawler.par in the campaign tree and click the Start button to launch the news extractor campaign.
Before launching news-parsers/news-crawler.par, you can adjust the Input data: select the campaign in the campaign tree, then set up the links to the news categories you need to extract data from.
Please contact us if the news extractor does not collect data after you have changed the Starting URL list.
3. Wait for the data extraction results to appear. Once you see the first results, you can stop the running campaign by clicking the Stop button.
4. After the campaign has finished or been stopped, you can find the news-crawler.xlsx file in the Documents folder.
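Since the extractor may stop collecting data after the Starting URL list is changed, it can help to sanity-check the links before pasting them into the Input data. The following is only a minimal sketch that runs outside Datacol and is not part of the product. The function name `clean_starting_urls`, the host `dailymail.co.uk`, and the normalization rules are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the campaign targets the Daily Mail site; change this for other sources.
ALLOWED_HOST_SUFFIX = "dailymail.co.uk"


def clean_starting_urls(lines, host_suffix=ALLOWED_HOST_SUFFIX):
    """Split a pasted Starting URL list into (kept, rejected) lists.

    Hypothetical helper: keeps http/https links on the expected site,
    drops blank lines and duplicates, and reports everything else.
    """
    kept, rejected, seen = [], [], set()
    for raw in lines:
        candidate = raw.strip()
        if not candidate:
            continue  # ignore blank lines
        parts = urlsplit(candidate)
        host = (parts.hostname or "").lower()
        on_site = host == host_suffix or host.endswith("." + host_suffix)
        if parts.scheme not in ("http", "https") or not on_site:
            rejected.append(candidate)
            continue
        # Normalize: lower-case the host and drop any #fragment.
        normalized = urlunsplit(
            (parts.scheme, parts.netloc.lower(), parts.path or "/", parts.query, "")
        )
        if normalized not in seen:
            seen.add(normalized)
            kept.append(normalized)
    return kept, rejected
```

Calling `clean_starting_urls(open("urls.txt"))` on a text file with one link per line (the file name is hypothetical) returns the links worth pasting into the campaign and the ones that probably need fixing first.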
Datacol Trial vs. Activated
|Feature|Trial|License (Full version)|
|Preset default configuration for data extraction|
|Maximum data extraction results|
|Free software updates|
|Free email tech support|
|Paid skype+teamviewer consultations|
If the source website blocks your IP address (after blocking, you will get no more extraction results), use a proxy.
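To check whether a block is the likely cause, you can request a page both directly and through a proxy and compare the results. The sketch below uses only the Python standard library and runs independently of Datacol; it does not show how Datacol itself handles proxies. The function names and the proxy address in the usage note are hypothetical.

```python
import urllib.error
import urllib.request


def build_proxy_opener(proxy_url):
    """Build an opener that sends HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_status(opener, url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrives."""
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403 or 429.
        return err.code
    except OSError:
        # Covers URLError, refused connections, and timeouts.
        return None
```

For example, `fetch_status(build_proxy_opener("http://127.0.0.1:8080"), url)` returns the status seen through a proxy at that placeholder address. If the direct request fails or returns an error status while the proxied one returns 200, an IP block is a plausible explanation, though not the only one.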
Data processing options for data harvested by the news extractor:
Data export options for data harvested by the news extractor:
- Basic: CSV/TXT/Database/Excel;
- Online stores: Magento/PrestaShop/osCommerce/OpenCart/ZENCart/VirtueMart;
- Content CMS: WordPress/Joomla/DLE;
- All options.
If you have any questions about the news extractor, please ask via the contact form.