Headless browser - Puppeteer
About
Puppeteer is a Node.js library that provides a high-level API to control Chrome or Chromium, which run headless by default.
Puppeteer communicates with the browser via the DevTools Protocol.
API
The Puppeteer API is hierarchical and mirrors the browser structure.
- A Browser instance can own multiple browser contexts.
- A BrowserContext instance defines a browsing session and can own multiple pages.
- A Page has at least one frame, the main frame. Other frames may be created by iframe or frame tags.
- A Frame has at least one execution context, the default execution context, where the frame's JavaScript is executed. A Frame might have additional execution contexts that are associated with extensions.
- A Worker has a single execution context and facilitates interacting with WebWorkers.
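The hierarchy above can be walked from an already-launched browser. The sketch below is only an illustration: describeHierarchy is a hypothetical helper (not part of Puppeteer), and it assumes the browser object was obtained from puppeteer.launch().

```javascript
// Sketch: list contexts, pages and frames of a launched browser.
// `describeHierarchy` is a hypothetical helper, not a Puppeteer API.
async function describeHierarchy(browser) {
  const lines = [];
  // A Browser owns one or more BrowserContexts.
  for (const context of browser.browserContexts()) {
    lines.push('context');
    // A BrowserContext owns pages; pages() resolves to an array.
    for (const page of await context.pages()) {
      lines.push(`  page ${page.url()}`);
      // A Page always has a main frame; iframe/frame tags add child frames.
      for (const frame of page.frames()) {
        const kind = frame === page.mainFrame() ? 'main frame' : 'child frame';
        lines.push(`    ${kind} ${frame.url()}`);
      }
    }
  }
  return lines;
}
```

The helper only reads from the objects it is given, so it can be called at any point of a test to see which pages and frames are currently open.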
Component
puppeteer-core
puppeteer-core is a library to help drive anything that supports DevTools protocol. puppeteer-core doesn't download Chromium when installed. Being a library, puppeteer-core is fully driven through its programmatic interface and disregards all the PUPPETEER_* env variables.
Usage examples:
- build a PDF generator with puppeteer-core and write a custom install.js script that downloads headless_shell instead of Chromium to save disk space
- bundle Puppeteer for use in a Chrome extension or in a browser, on top of the DevTools Protocol
Code Usage:
const puppeteer = require('puppeteer-core');
- then call puppeteer.launch() with an explicit executablePath option, since puppeteer-core does not bundle a browser
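Because puppeteer-core ignores the PUPPETEER_* variables, the browser location has to come from your own code. The sketch below shows one possible way to do it: coreLaunchOptions and the CHROME_PATH variable are hypothetical names chosen for this example, not something puppeteer-core defines.

```javascript
// Hypothetical helper: puppeteer-core ships no browser, so the caller
// must say where one lives. CHROME_PATH is an assumed variable name.
function coreLaunchOptions(env = process.env) {
  const executablePath = env.CHROME_PATH;
  if (!executablePath) {
    throw new Error('Set CHROME_PATH to a Chrome or Chromium binary');
  }
  return { executablePath, headless: true };
}

// Usage, assuming puppeteer-core is installed:
//   const puppeteer = require('puppeteer-core');
//   const browser = await puppeteer.launch(coreLaunchOptions());
```

Failing early with a clear message is a design choice here: it avoids a less readable launch error when no binary is configured.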
puppeteer
When installed, puppeteer downloads a version of Chromium, which it then drives using puppeteer-core. Unlike puppeteer-core, it honors the PUPPETEER_* environment variables: https://github.com/puppeteer/puppeteer/blob/master/docs/api.md#environment-variables
Example
Integration
Javascript - Jest-puppeteer with typescript configuration
API / Doc
Launch
const browser = await puppeteer.launch({
  headless: false, // show the browser window
  slowMo: 200, // slow down every operation by 200 ms
  devtools: true, // open DevTools for each tab
  args: [
    '--disable-infobars', // removes the butter bar (info bar)
    '--start-maximized',
    // '--start-fullscreen',
    // '--window-size=1920,1080',
    // '--kiosk',
  ],
});
- Puppeteer flags: the launch options (headless, slowMo, devtools, ...)
- Chromium flags: the command-line switches passed through the args launch option
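Since the same test suite is often run both interactively and unattended, the two kinds of flags can be grouped in one place. This is a minimal sketch under that assumption: buildLaunchOptions and its debug parameter are hypothetical, and the option values are only examples taken from the launch call above.

```javascript
// Hypothetical helper: pick launch options for debugging or for an
// unattended run. Only well-known launch options are used.
function buildLaunchOptions({ debug = false } = {}) {
  if (!debug) {
    // Unattended run: no window, fixed viewport size via a Chromium flag.
    return { headless: true, args: ['--window-size=1920,1080'] };
  }
  // Debug run: visible window, slowed down, DevTools open.
  return {
    headless: false,
    slowMo: 200,
    devtools: true,
    args: ['--start-maximized'],
  };
}

// Usage, assuming puppeteer is installed:
//   const browser = await puppeteer.launch(buildLaunchOptions({ debug: true }));
```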
Snippet
Serialize and Deserialize a date
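Values that cross the boundary between Node.js and the page are serialized, so a Date object may not arrive intact. One common workaround, sketched here as an assumption rather than as the only option, is to send an ISO 8601 string and rebuild the Date on the other side. serializeDate and deserializeDate are hypothetical helper names.

```javascript
// Turn a Date into a plain ISO 8601 string that survives serialization.
function serializeDate(date) {
  return date.toISOString();
}

// Rebuild the Date on the receiving side, rejecting unparsable input.
function deserializeDate(isoString) {
  const date = new Date(isoString);
  if (Number.isNaN(date.getTime())) {
    throw new TypeError(`Invalid date string: ${isoString}`);
  }
  return date;
}

// Usage, assuming `page` is a Puppeteer Page:
//   const iso = await page.evaluate(() => new Date().toISOString());
//   const pageNow = deserializeDate(iso);
```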
Execute Javascript inside the page
Example with local storage and passing parameters
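await page.evaluate(
  // This function runs in the browser, not in Node.js
  (storageKey) => { localStorage.removeItem(storageKey); },
  // Extra arguments are serialized and passed to the function
  'theKey'
);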
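The same pattern works for writing and reading values, because page.evaluate resolves to the serialized return value of the function. The sketch below assumes page is a Puppeteer Page; setStorageItem and getStorageItem are hypothetical helper names.

```javascript
// Hypothetical helpers; `page` is assumed to be a Puppeteer Page.
async function setStorageItem(page, key, value) {
  // The callback runs in the browser, so key and value are passed as
  // arguments: Node.js variables are not visible through its closure.
  await page.evaluate((k, v) => { localStorage.setItem(k, v); }, key, value);
}

async function getStorageItem(page, key) {
  // The callback's return value is serialized and sent back to Node.js.
  return page.evaluate((k) => localStorage.getItem(k), key);
}
```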
Add a breakpoint
There are two execution contexts, and each one needs its own kind of breakpoint:
- Node.js (running the test code)
- the browser (running the application code)
Timeout
If you are going to pause on breakpoints, increase the test timeout accordingly, otherwise the test fails while you are stepping.
In a test file, jest is available as a global object:
jest.setTimeout(100000);
This sets the default timeout, in milliseconds, for every test and hook in that file.
Node breakpoint
- Start the browser with a GUI
const browser = await puppeteer.launch({
  headless: false,
  slowMo: 250, // slow down every operation by 250 ms
});
- Set a breakpoint in your IDE and step over each Puppeteer call (open, click,…)
Browser breakpoint
- Start the browser with DevTools open
const browser = await puppeteer.launch({devtools: true});
- Add a breakpoint in code evaluated in the page; with DevTools open, execution pauses at the debugger statement
await page.evaluate(() => {debugger;});
Select
<div class="tweet">
<div class="retweet">10</div>
</div>
/**
* @type {import("puppeteer").ElementHandle<HTMLDivElement>}
*/
const tweetHandle = await page.$('.tweet .retweet');
expect(await tweetHandle.evaluate(node => node.innerText)).toBe('10');
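page.$ resolves to null when no element matches, so the snippet above throws if the selector is wrong. A more defensive variant might look like the sketch below; textOf is a hypothetical helper name and page is assumed to be a Puppeteer Page.

```javascript
// Hypothetical helper; `page` is assumed to be a Puppeteer Page.
async function textOf(page, selector) {
  // page.$ resolves to null when nothing matches the selector.
  const handle = await page.$(selector);
  if (handle === null) {
    return null;
  }
  try {
    // Runs in the browser with the matched element as `node`.
    return await handle.evaluate((node) => node.innerText);
  } finally {
    // Release the handle so the element can be garbage collected.
    await handle.dispose();
  }
}

// Usage in a Jest test:
//   expect(await textOf(page, '.tweet .retweet')).toBe('10');
```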
Debug
https://developers.google.com/web/tools/puppeteer/debugging
Documentation / Reference
- https://puppeteersandbox.com/ - A sandbox to run Puppeteer code
- https://github.com/smooth-code/jest-puppeteer - Jest with Puppeteer
- https://puppetry.app/ - An IDE to create tests via a UI
- https://checklyhq.com/pricing/ - A platform to run Puppeteer tests for continuous monitoring