Basic definitions


Parsing campaign - a set of configuration parameters that tells Datacol how to carry out a specific parsing task (e.g. collecting data from a particular website).


Link (as the term is used in Datacol) - a fragment of a webpage's source code that refers to another webpage on the same or a different website. In HTML a link is specified with the <a> tag. It has the following basic structure:


<a href="filename">link text</a>


where:

filename is the web address the link refers to;

link text is the hyperlink text that you see in the rendered HTML document.
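
As an illustration of the structure above, here is a minimal Python sketch that pulls the address and the link text out of <a> tags using the standard library's html.parser. This is not Datacol's own code; the names LinkCollector and extract_links are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from <a> tags that have an href."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> tag currently open, if any
        self._text = []      # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; href may be absent.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a href=...> tag.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, link text) tuples found in the HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    print(extract_links('<a href="filename">link text</a>'))
```

Calling extract_links on the snippet shown earlier yields one pair: the address "filename" and the text "link text".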


A link address can be absolute or relative. Absolute addresses start with the protocol prefix (usually http://) and include the website's domain name. Relative addresses give the path either from the website's root directory or from the current document.


Here are some examples of links:


Relative:

<a href="/photo/my_photo.html">My photo album</a> - refers to the my_photo.html document, located in the photo folder under the root directory of the same website. In the HTML document this link is displayed as the text My photo album.


Absolute:

<a href="http://www.site.com">Other site</a> - refers to another website (located on a remote web server). In the HTML document this link is displayed as the text Other site.
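
To make the difference between the two kinds of address concrete, the following Python sketch resolves the two href values from the examples above against the address of the page on which they were found. It uses urljoin and urlparse from the standard library; the base address http://www.example.com/blog/post.html and the helper names is_absolute and resolve are assumptions made for this example, not part of Datacol.

```python
from urllib.parse import urljoin, urlparse


def is_absolute(href):
    """True if the address carries both a protocol and a domain name."""
    parts = urlparse(href)
    return bool(parts.scheme and parts.netloc)


def resolve(base_url, href):
    """Turn any link address into an absolute one, relative to base_url."""
    return urljoin(base_url, href)


if __name__ == "__main__":
    # Hypothetical page on which the links were found.
    base = "http://www.example.com/blog/post.html"

    for href in ("/photo/my_photo.html", "my_photo.html", "http://www.site.com"):
        kind = "absolute" if is_absolute(href) else "relative"
        print(f"{href!r} is {kind} -> {resolve(base, href)}")
```

With that base, "/photo/my_photo.html" resolves from the site root to http://www.example.com/photo/my_photo.html, "my_photo.html" resolves from the current document's folder to http://www.example.com/blog/my_photo.html, and "http://www.site.com" is returned unchanged.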


Link examples are also shown in the screenshot below:


URL - the absolute address of a web document, e.g. http://websiteextractor.net/

Referer - the webpage on which the link to the currently processed webpage was found.
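
In HTTP the same idea is carried by the request header named Referer. The Python sketch below only builds a request object with that header set and does not send anything over the network. Whether and how Datacol transmits this value is not stated here, so treat this purely as an illustration; build_request and both addresses are hypothetical.

```python
from urllib.request import Request


def build_request(url, referer):
    """Build (but do not send) a request that names the page the link came from."""
    return Request(url, headers={"Referer": referer})


if __name__ == "__main__":
    # Hypothetical addresses: the link to the first page was found on the second.
    req = build_request(
        "http://www.example.com/photo/my_photo.html",
        "http://www.example.com/index.html",
    )
    print(req.full_url)
    print(req.get_header("Referer"))
```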

Data fields - the units of information to be collected, e.g. article title, article text, article category, item price, item photo, etc.

Webpage source code - the source code of a webpage exactly as returned by the server, before any JavaScript processing.


CSV file - definition from Wikipedia. Please use a semicolon (;) as the separator in CSV files. CSV is easy to work with: saving is fast, and further data processing and import into a CMS are convenient. Moreover, a CSV file can easily be opened and edited in Excel.
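
For readers who process such files outside Datacol, this is a minimal Python sketch of writing semicolon-separated CSV with the standard csv module. The field names title and price and the helper rows_to_csv are assumptions made for this example; they do not describe Datacol's actual output layout.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text using ';' as the separator."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        delimiter=";",        # semicolon separator, as recommended above
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical data fields collected from two pages.
    rows = [
        {"title": "First item", "price": "10"},
        {"title": "Second; with semicolon", "price": "25"},
    ]
    print(rows_to_csv(rows, ["title", "price"]))
```

A value that itself contains a semicolon is wrapped in double quotes by the writer, so the column structure stays intact when the file is read back.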


Dump (history, queue) - data saved to the database.


User Agent - definition from Wikipedia.
