Scrapers
The following scrapers are available:
- groups.get
- groups.getInfo
- groups.join
- stream.getByAuthor, works only with a group’s id
from aiomailru.scrapers import APIScraper
api = APIScraper(session)
groups = await api.groups.get(scrape=True) # current user's groups
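The snippet above uses await at the top level, which only works in an async-aware REPL. In a script the same call has to live inside a coroutine. A minimal sketch, where fetch_user_groups and main are illustrative names (not part of aiomailru) and session is assumed to have been created beforehand:

```python
import asyncio


async def fetch_user_groups(api):
    """Return the current user's groups using the scraper-backed call."""
    # Same call as in the snippet above; `api` is expected to be an APIScraper.
    return await api.groups.get(scrape=True)


async def main(session):
    # Imported lazily so this module still loads where aiomailru is absent.
    from aiomailru.scrapers import APIScraper

    api = APIScraper(session)
    groups = await fetch_user_groups(api)
    print(groups)


# In a script, with an existing `session`:
# asyncio.run(main(session))
```

Keeping the call in a small coroutine that takes api as an argument also makes it easy to exercise with a stand-in object.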
Scrapers have the following requirements:
- Cookies
- Pyppeteer
- Browserless
Cookies
If session is an instance of TokenSession, you must set the cookies that were given by ImplicitSession:
session = ServerSession(app_id, secret_key, access_token, cookies=cookies)
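Because the cookies come from one session and are passed to another, it can be convenient to keep them on disk between runs. A minimal sketch, assuming the cookies are a JSON-serializable structure (this section does not specify their format); save_cookies and load_cookies are hypothetical helpers, not part of aiomailru:

```python
import json
import os


def save_cookies(cookies, path):
    """Write cookies to `path` as JSON and restrict the file to its owner."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(cookies, fh)
    # Cookies grant access to the account, so tighten the file permissions.
    os.chmod(path, 0o600)


def load_cookies(path):
    """Read back cookies previously written by save_cookies."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

The loaded value can then be passed as cookies=cookies when creating the session, as in the line above.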
Pyppeteer
Scrapers require an instance of Chrome.
You can start a new Chrome process:
from aiomailru.scrapers import APIScraper
from pyppeteer import launch
browser = await launch()
api = APIScraper(session, browser=browser)
print(browser.wsEndpoint) # your browser's endpoint
or connect to an existing Chrome instance:
from aiomailru.scrapers import APIScraper
from pyppeteer import connect
browser_conn = {'browserWSEndpoint': 'your_endpoint'}
browser = await connect(browser_conn)
api = APIScraper(session, browser=browser)
Export the environment variable
$ export PYPPETEER_BROWSER_ENDPOINT='your_endpoint'
to automatically connect to Chrome:
from aiomailru.scrapers import APIScraper
api = APIScraper(session) # connects to PYPPETEER_BROWSER_ENDPOINT
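The choice between connecting and launching can also be made explicitly in your own code. The following is a sketch of one way to do it, not a description of how APIScraper resolves the variable internally; browser_endpoint and get_browser are hypothetical helpers, and falling back to launch() when the variable is unset is an assumption of this example:

```python
import os

ENV_VAR = "PYPPETEER_BROWSER_ENDPOINT"


def browser_endpoint(environ=None):
    """Return the configured WebSocket endpoint, or None if unset or blank."""
    environ = os.environ if environ is None else environ
    endpoint = environ.get(ENV_VAR, "").strip()
    return endpoint or None


async def get_browser(environ=None):
    """Connect to the configured Chrome, or start a new one (assumed fallback)."""
    # Lazy import: pyppeteer is only needed once a browser is requested.
    from pyppeteer import connect, launch

    endpoint = browser_endpoint(environ)
    if endpoint:
        return await connect({"browserWSEndpoint": endpoint})
    return await launch()
```

The returned browser can be passed to the scraper as APIScraper(session, browser=browser), as in the examples above.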
Browserless
You can replace pyppeteer.launch with pyppeteer.connect.
See https://www.browserless.io
Start headless Chrome using
$ docker-compose up -d chrome
Export the environment variable
$ export PYPPETEER_BROWSER_ENDPOINT=ws://localhost:3000
to automatically connect to the Browserless container:
from aiomailru.scrapers import APIScraper
api = APIScraper(session) # connects to ws://localhost:3000
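If the connection fails, it can help to confirm that the container is actually accepting connections before creating the scraper. A minimal sketch using only the standard library; endpoint_reachable is a hypothetical helper, and it only checks that a TCP connection can be opened, not that the WebSocket handshake succeeds:

```python
import socket
from urllib.parse import urlsplit


def endpoint_reachable(ws_url, timeout=2.0):
    """Return True if a TCP connection to the ws:// or wss:// host succeeds."""
    parts = urlsplit(ws_url)
    if parts.scheme not in ("ws", "wss") or not parts.hostname:
        raise ValueError(f"not a WebSocket URL: {ws_url!r}")
    # Fall back to the default port of the scheme when none is given.
    port = parts.port or (443 if parts.scheme == "wss" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For the setup above, endpoint_reachable('ws://localhost:3000') should return True once the chrome container is up.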