Scrapers

The following scrapers are available:

  • groups.get
  • groups.getInfo
  • groups.join
  • stream.getByAuthor (works only with a group’s id)

Pass scrape=True to call a method through its scraper:

from aiomailru.scrapers import APIScraper

api = APIScraper(session)
groups = await api.groups.get(scrape=True)  # current user's groups

Scrapers have the following requirements:

  • Cookies
  • Pyppeteer
  • Browserless

Cookies

If the session is an instance of TokenSession, you must pass the cookies obtained through ImplicitSession:

from aiomailru import ServerSession

session = ServerSession(app_id, secret_key, access_token, cookies=cookies)
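If ImplicitSession and the token-based session run in different processes, the cookies have to be carried across somehow. A minimal sketch that stores them in a JSON file, assuming the cookies are a JSON-serializable list of dicts (for example, the shape pyppeteer's Page.cookies() returns). save_cookies, load_cookies and the cookies.json file name are hypothetical, not part of aiomailru:

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Union

# Assumption: each cookie is a plain dict such as {"name": ..., "value": ..., "domain": ...}.
Cookie = Dict[str, Any]


def save_cookies(cookies: List[Cookie], path: Union[str, Path]) -> None:
    """Write cookies to a JSON file so a later session can reuse them."""
    Path(path).write_text(json.dumps(cookies, indent=2), encoding="utf-8")


def load_cookies(path: Union[str, Path]) -> List[Cookie]:
    """Read cookies previously written by save_cookies."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

You would then pass cookies=load_cookies("cookies.json") to the session constructor shown above.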

Pyppeteer

Scrapers require an instance of Chrome.

You can start a new Chrome process:

from aiomailru.scrapers import APIScraper
from pyppeteer import launch

browser = await launch()
api = APIScraper(session, browser=browser)

print(browser.wsEndpoint)  # your browser's endpoint

or connect to an existing Chrome instance:

from aiomailru.scrapers import APIScraper
from pyppeteer import connect

browser_conn = {'browserWSEndpoint': 'your_endpoint'}
browser = await connect(browser_conn)
api = APIScraper(session, browser=browser)

Alternatively, export the environment variable

$ export PYPPETEER_BROWSER_ENDPOINT='your_endpoint'

to automatically connect to Chrome:

from aiomailru.scrapers import APIScraper
api = APIScraper(session)  # connects to PYPPETEER_BROWSER_ENDPOINT
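The launch and connect options above can be combined into one helper that connects when PYPPETEER_BROWSER_ENDPOINT is set and launches a local Chrome otherwise. This is a sketch under that assumption: get_browser is a hypothetical helper, not how APIScraper resolves the endpoint internally. It takes pyppeteer's launch and connect coroutines as arguments so it can be exercised without a real browser:

```python
import os
from typing import Any, Awaitable, Callable, Mapping, Optional

ENV_VAR = "PYPPETEER_BROWSER_ENDPOINT"


async def get_browser(
    launch: Callable[[], Awaitable[Any]],
    connect: Callable[[dict], Awaitable[Any]],
    env: Optional[Mapping[str, str]] = None,
) -> Any:
    """Connect to an existing Chrome if ENV_VAR is set, else launch a new one."""
    env = os.environ if env is None else env
    endpoint = env.get(ENV_VAR)
    if endpoint:
        # Same options dict as the connect example above.
        return await connect({"browserWSEndpoint": endpoint})
    # No endpoint configured: start a new Chrome process.
    return await launch()
```

In practice you would pass pyppeteer.launch and pyppeteer.connect and hand the result to APIScraper(session, browser=browser).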

Browserless

Instead of launching Chrome locally with pyppeteer.launch, you can use pyppeteer.connect to attach to a Browserless instance. See https://www.browserless.io

Start headless Chrome with

$ docker-compose up -d chrome

Then export the environment variable

$ export PYPPETEER_BROWSER_ENDPOINT=ws://localhost:3000

to connect to the Browserless container automatically:

from aiomailru.scrapers import APIScraper
api = APIScraper(session)  # connects to ws://localhost:3000
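Before pointing the scrapers at the container, it can help to confirm that something is listening on the endpoint. A minimal standard-library sketch; is_endpoint_reachable is a hypothetical helper, and it only checks that a TCP connection can be opened, not that the WebSocket handshake or Chrome itself works:

```python
import socket
from urllib.parse import urlsplit

# Default ports for WebSocket URLs that omit an explicit port.
DEFAULT_PORTS = {"ws": 80, "wss": 443}


def is_endpoint_reachable(endpoint: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the ws:// or wss:// endpoint succeeds."""
    parts = urlsplit(endpoint)
    if parts.scheme not in DEFAULT_PORTS or not parts.hostname:
        raise ValueError(f"not a WebSocket URL: {endpoint!r}")
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    try:
        # Open and immediately close a plain TCP connection.
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False
```

For example, is_endpoint_reachable(os.environ["PYPPETEER_BROWSER_ENDPOINT"]) should return True once docker-compose up -d chrome has started the container.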