Regex picking algorithm


Data collecting regex picking algorithm

Selector can make the regex picking process much easier. Here is an example. Let's say we want to collect the post title:


Garmin Edge 500 Cycling GPS (Neutral Color)


We therefore need to pick a regex that will collect the title from any post on the same site. The regex picking algorithm is as follows:


1. Load the page in the Datacol Selector assistant. For the most accurate regex picking, turn on the Datacol loader option. The webpage is then loaded the same way Datacol loads it (in particular, without JavaScript processing).



2. Left-click the post title in the browser (i.e. the text we need to extract) several times, until the selected source code substring is an occurrence surrounded by tag characters:


>OCCURRENCE<


Note that if no such occurrence can be found, use the alternative XPath data extraction mechanism instead.





Thus in this example the selection will contain the following code:


<h1>Garmin Edge 500 Cycling GPS (Neutral Color)</h1>
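To illustrate why the selection includes the surrounding tag characters, here is a minimal Python sketch. It is an assumption about the general shape of such a regex, not the pattern Datacol actually generates: the tags act as fixed anchors and the brackets capture whatever title sits between them. The second page snippet and its title are made up for the illustration, and `extract_title` is a hypothetical helper name.

```python
import re

# Hypothetical pattern: literal <h1> tags as anchors, one capturing group for the title.
# Datacol's real generated regex may look different.
TITLE_REGEX = re.compile(r"<h1>(.*?)</h1>", re.DOTALL)


def extract_title(html):
    """Return the text between the first <h1>...</h1> pair, or None if absent."""
    match = TITLE_REGEX.search(html)
    return match.group(1).strip() if match else None


# Page fragment built around the selection from the example.
page_a = "<body><h1>Garmin Edge 500 Cycling GPS (Neutral Color)</h1><p>...</p></body>"
# Made-up second post from the same site, to show the regex is not tied to one title.
page_b = "<body><h1>Another Post Title</h1><p>...</p></body>"

print(extract_title(page_a))
print(extract_title(page_b))
```

Because only the tags are hard-coded, the same pattern keeps working on other posts that use the same markup around the title.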

3. After selecting the source code substring, right-click it and choose the Create regex automatically option in the context menu that opens.



4. The Control window now opens, showing the regex Datacol generated, its source code and its browser representation. You can also change the Group setting (0 or 1) to better understand how brackets (...) are used in the regex.



If the Regex found matches number is 1 and the Regex collecting result is exactly the text you wanted to extract, this usually means that Datacol picked the regex correctly.
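The Group setting and the match-count check can be reproduced outside Datacol. The sketch below uses Python's `re` module with a hypothetical `<h1>(.*?)</h1>` pattern (an assumption, not necessarily what Datacol generates): group 0 is the whole match including the tags, group 1 is only the part inside the brackets. The helper name `check_regex` is invented for this illustration.

```python
import re


def check_regex(pattern, source, expected):
    """Return (match_count, group0, group1, looks_correct) based on the first match."""
    matches = list(re.finditer(pattern, source, re.DOTALL))
    if not matches:
        return 0, None, None, False
    first = matches[0]
    group0 = first.group(0)  # entire matched text, tags included
    group1 = first.group(1)  # only the text captured by (...)
    # Rule of thumb from the text: exactly one match, and the captured text is what we wanted.
    looks_correct = len(matches) == 1 and group1 == expected
    return len(matches), group0, group1, looks_correct


source = "<div><h1>Garmin Edge 500 Cycling GPS (Neutral Color)</h1></div>"
wanted = "Garmin Edge 500 Cycling GPS (Neutral Color)"

count, g0, g1, ok = check_regex(r"<h1>(.*?)</h1>", source, wanted)
print(count)  # number of matches found
print(g0)     # group 0: match with the surrounding tags
print(g1)     # group 1: the title only
print(ok)     # whether the rule of thumb is satisfied
```

If the page contained more than one `<h1>` element, the match count would exceed 1 and the check would fail, which is a hint that the regex needs more surrounding context.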


5. The generated regex needs to be entered into the appropriate Datacol setting. In this example it goes into the Collecting Regex setting, because it is used to extract the data field value.



Note that if a data collecting regex cannot be found using the above algorithm, use the alternative XPath data extraction mechanism instead.
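As a rough illustration of what an XPath-based extraction looks like, the sketch below uses Python's standard `xml.etree.ElementTree`, which supports only a limited XPath subset and requires well-formed markup. It is not Datacol's XPath engine, and the page snippet (including the `post` class name) is a simplified assumption; real HTML usually needs a more tolerant parser.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed page fragment; the "post" class is a made-up detail.
page = (
    "<html><body>"
    "<div class='post'><h1>Garmin Edge 500 Cycling GPS (Neutral Color)</h1></div>"
    "</body></html>"
)

root = ET.fromstring(page)

# Address the title by its position in the document tree instead of by surrounding text.
title_node = root.find(".//div[@class='post']/h1")
title = title_node.text if title_node is not None else None
print(title)
```

Unlike a regex, the expression describes where the element sits in the page structure, so it does not depend on the exact characters around the title in the source code.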
