Scrapers
The following scrapers are available:
- groups.get
- groups.getInfo
- groups.join
- stream.getByAuthor, works only with a group’s id
from aiomailru.scrapers import APIScraper
api = APIScraper(session)
groups = await api.groups.get(scrape=True) # current user's groups
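The snippet above uses await at top level, which works in an async REPL but not in a plain script. A minimal sketch of wrapping the call in a coroutine follows. The names fetch_groups, main and run are hypothetical, and session is assumed to be an already-authorized aiomailru session created elsewhere.

```python
import asyncio


async def fetch_groups(api):
    """Return the current user's groups using the scraper-backed method."""
    return await api.groups.get(scrape=True)


async def main(session):
    # Imported lazily so this module can be loaded without aiomailru installed.
    from aiomailru.scrapers import APIScraper

    api = APIScraper(session)
    groups = await fetch_groups(api)
    print(groups)


def run(session):
    """Synchronous entry point for scripts (hypothetical helper)."""
    asyncio.run(main(session))
```

Keeping the API object as a parameter of fetch_groups makes it possible to exercise the call with a stub object in tests, without a browser or network access.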
Scrapers have the following requirements:
- Cookies
- Pyppeteer
- Browserless
Cookies
If session is an instance of TokenSession, you must pass it the cookies
that were obtained from an ImplicitSession:
session = ServerSession(app_id, secret_key, access_token, cookies=cookies)
Pyppeteer
Scrapers require an instance of Chrome.
You can start a new Chrome process:
from aiomailru.scrapers import APIScraper
from pyppeteer import launch
browser = await launch()
api = APIScraper(session, browser=browser)
print(browser.wsEndpoint) # your browser's endpoint
or connect to an existing Chrome instance:
from aiomailru.scrapers import APIScraper
from pyppeteer import connect
browser_conn = {'browserWSEndpoint': 'your_endpoint'}
browser = await connect(browser_conn)
api = APIScraper(session, browser=browser)
Export the environment variable
$ export PYPPETEER_BROWSER_ENDPOINT='your_endpoint'
to automatically connect to Chrome:
from aiomailru.scrapers import APIScraper
api = APIScraper(session) # connects to PYPPETEER_BROWSER_ENDPOINT
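An application that wants to make the launch-or-connect choice explicit, instead of relying on APIScraper to read the variable, could do something like the sketch below. How APIScraper resolves the variable internally is not shown in this document, so resolve_endpoint and open_browser are hypothetical helpers, and treating a blank value as unset is an assumption.

```python
import os

ENV_VAR = 'PYPPETEER_BROWSER_ENDPOINT'


def resolve_endpoint(environ=None):
    """Return the configured WebSocket endpoint, or None if unset or blank."""
    environ = os.environ if environ is None else environ
    endpoint = environ.get(ENV_VAR, '').strip()
    return endpoint or None


async def open_browser(environ=None):
    """Connect to an existing Chrome if an endpoint is set, else launch one."""
    # Imported lazily so resolve_endpoint works without pyppeteer installed.
    from pyppeteer import connect, launch

    endpoint = resolve_endpoint(environ)
    if endpoint:
        return await connect({'browserWSEndpoint': endpoint})
    return await launch()
```

The returned browser can then be passed on as APIScraper(session, browser=browser), as in the examples above.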
Browserless
Browserless runs headless Chrome as a service, so you can replace
pyppeteer.launch with pyppeteer.connect.
See https://www.browserless.io
Start headless Chrome using
$ docker-compose up -d chrome
Export the environment variable
$ export PYPPETEER_BROWSER_ENDPOINT=ws://localhost:3000
to automatically connect to the Browserless container:
from aiomailru.scrapers import APIScraper
api = APIScraper(session) # connects to ws://localhost:3000
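Before creating the scraper it can be useful to confirm that something is listening on the exported endpoint. The following is a rough sketch using only the standard library. endpoint_address and is_reachable are hypothetical helpers, and a successful TCP connection only shows that the port is open, not that the listener is actually Browserless.

```python
import socket
from urllib.parse import urlsplit

# Default ports for the two WebSocket URL schemes.
DEFAULT_PORTS = {'ws': 80, 'wss': 443}


def endpoint_address(endpoint):
    """Split a ws:// or wss:// URL into a (host, port) pair."""
    parts = urlsplit(endpoint)
    if parts.scheme not in DEFAULT_PORTS or not parts.hostname:
        raise ValueError(f'not a WebSocket endpoint: {endpoint!r}')
    return parts.hostname, parts.port or DEFAULT_PORTS[parts.scheme]


def is_reachable(endpoint, timeout=2.0):
    """Return True if a TCP connection to the endpoint can be opened."""
    try:
        with socket.create_connection(endpoint_address(endpoint), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, is_reachable('ws://localhost:3000') can be polled after docker-compose up -d chrome until the container accepts connections.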