Agent

The Agent is the most intelligent part of the scraping process. It simulates user activity: clicking on the web page, typing text into input fields, checking check boxes, selecting items in drop-down boxes and lists, and so on.
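To make "simulating user activity" concrete, here is a minimal sketch that models each kind of simulated action as a small record and replays them against an in-memory form. The action classes (`Click`, `TypeText`, `SetCheckbox`, `SelectItem`), the `simulate` function and the dictionary-based form are all hypothetical illustrations, not the Agent's actual API; a real Agent drives an actual browser page.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical action records, one per kind of user activity listed above.
@dataclass
class Click:
    target: str


@dataclass
class TypeText:
    field: str
    text: str


@dataclass
class SetCheckbox:
    field: str
    checked: bool


@dataclass
class SelectItem:
    field: str
    item: str


Action = Union[Click, TypeText, SetCheckbox, SelectItem]


def simulate(form: dict, actions: list[Action]) -> tuple[dict, list[str]]:
    """Replay simulated user actions on an in-memory form (hypothetical model).

    Returns the updated form values and the targets that were clicked,
    leaving the original form dictionary unchanged.
    """
    state = dict(form)
    clicks: list[str] = []
    for action in actions:
        if isinstance(action, TypeText):
            state[action.field] = action.text        # type text into an input field
        elif isinstance(action, SetCheckbox):
            state[action.field] = action.checked     # check or uncheck a check box
        elif isinstance(action, SelectItem):
            state[action.field] = action.item        # pick an item in a drop-down or list
        elif isinstance(action, Click):
            clicks.append(action.target)             # record a mouse click
    return state, clicks
```

For example, filling in a search form and pressing its button could be expressed as `simulate(form, [TypeText("query", "laptop"), SetCheckbox("in_stock", True), SelectItem("sort", "price"), Click("search-button")])`.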

On every page, the Agent can extract data, follow links, iterate through the linked pages of a search result, or perform other user-like actions.

The Agent consists of states, and each state is associated with a newly loaded page. The Agent starts from a special Init state, which is not associated with any web page; its main purpose is to load the initial web page. Here is an example of an Agent with two states:

Concepts: Agent Two States

  1. Init - loads the initial web page. This page will be associated with State1.
  2. State1 - captures list data.
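The two-state flow can be sketched as a tiny state machine. This is a minimal illustration under stated assumptions: the in-memory `PAGES` dictionary stands in for real page loads, and the function names (`init`, `state1`, `run_two_state_agent`) and the URL are hypothetical, not part of the Agent's configuration.

```python
# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "https://example.com/list": ["Item A", "Item B", "Item C"],
}


def init(start_url: str) -> tuple[str, str]:
    """Init state: not tied to any page. It only 'loads' the initial page
    and names the state that will process that page."""
    return "State1", start_url


def state1(url: str) -> list[str]:
    """State1: capture list data from the page it is associated with."""
    return list(PAGES[url])


def run_two_state_agent(start_url: str) -> list[str]:
    next_state, url = init(start_url)
    handlers = {"State1": state1}          # map state names to page handlers
    return handlers[next_state](url)       # run the state bound to the loaded page
```

The key design point mirrored here is that Init produces a page but processes nothing itself; all extraction happens in the state associated with that page.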

Each page load is associated with a state. Here is an example of an Agent with three states:

Concepts: Agent Three States

  1. Init - loads the initial web page. This page will be associated with State1.
  2. State1 - extracts links to the detail pages and loads them. Each loaded page will be associated with State2.
  3. State2 - captures data from the detail pages.
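The three-state flow can be sketched with a work queue of (state, page) pairs: each state either records data or schedules newly loaded pages together with the state that should handle them. This is a hypothetical illustration, not the Agent's implementation; the `SITE` dictionary replaces real page loads, and the URLs, field names and functions are invented for the example.

```python
from collections import deque

# Hypothetical in-memory site: a search-result page linking to detail pages.
SITE = {
    "https://example.com/search": {"links": ["https://example.com/p/1", "https://example.com/p/2"]},
    "https://example.com/p/1": {"name": "Widget", "price": "9.99"},
    "https://example.com/p/2": {"name": "Gadget", "price": "19.99"},
}


def init(start_url: str) -> list[tuple[str, str]]:
    # Init: load the initial page and associate it with State1.
    return [("State1", start_url)]


def state1(url: str) -> list[tuple[str, str]]:
    # State1: extract links to the detail pages; each loaded page goes to State2.
    return [("State2", link) for link in SITE[url]["links"]]


def state2(url: str, results: list[dict]) -> list[tuple[str, str]]:
    # State2: capture data from a detail page; it loads no further pages.
    page = SITE[url]
    results.append({"url": url, "name": page["name"], "price": page["price"]})
    return []


def run_agent(start_url: str) -> list[dict]:
    results: list[dict] = []
    queue = deque(init(start_url))         # pending (state, page) pairs
    while queue:
        state, url = queue.popleft()
        if state == "State1":
            queue.extend(state1(url))
        elif state == "State2":
            queue.extend(state2(url, results))
    return results
```

Processing pages in first-in, first-out order keeps the detail pages in the same order as the links on the search-result page.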
