Headless browser - Puppeteer
1 - About
Puppeteer is a Node.js library that provides a high-level API to control Chrome or Chromium, which it runs headless by default.
Puppeteer communicates with the browser via the DevTools Protocol.
2 - Articles Related
3 - API
The Puppeteer API is hierarchical and mirrors the browser structure.
- A Browser instance can own multiple browser contexts.
- A BrowserContext instance defines a browsing session and can own multiple pages.
- A Page has at least one frame: the main frame. Other frames may be created by iframe or frame tags.
- A Frame has at least one execution context, the default execution context, where the frame's JavaScript is executed. A Frame might have additional execution contexts that are associated with extensions.
- A Worker has a single execution context and facilitates interacting with WebWorkers.
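The hierarchy above can be walked programmatically. The following is a minimal sketch, assuming a `browser` object obtained from `puppeteer.launch()`. The helper name `describeBrowser` and the shape of the object it returns are illustrative choices, not part of the Puppeteer API.

```javascript
// Walks Browser -> BrowserContext -> Page -> Frame / Worker and collects URLs.
// `describeBrowser` is a hypothetical helper name, not a Puppeteer function.
async function describeBrowser(browser) {
  const contexts = [];
  for (const context of browser.browserContexts()) {
    const pages = [];
    for (const page of await context.pages()) {
      pages.push({
        mainFrame: page.mainFrame().url(),
        // page.frames() lists the main frame plus any iframe/frame children.
        frames: page.frames().map((frame) => frame.url()),
        workers: page.workers().map((worker) => worker.url()),
      });
    }
    contexts.push({ pages });
  }
  return { contexts };
}

// Usage (assumes the `puppeteer` package is installed):
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch();
//   console.log(JSON.stringify(await describeBrowser(browser), null, 2));
//   await browser.close();
```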
4 - Component
4.1 - puppeteer-core
puppeteer-core is a library that drives anything that supports the DevTools Protocol. It doesn't download Chromium when installed. Being a library, puppeteer-core is fully driven through its programmatic interface and disregards all the PUPPETEER_* environment variables.
Usage:
- Build a PDF generator using puppeteer-core, with a custom install.js script that downloads headless_shell instead of Chromium to save disk space.
- Bundle Puppeteer for use in a Chrome extension or a browser that supports the DevTools Protocol.
Code usage:
const puppeteer = require('puppeteer-core');
- then call puppeteer.launch() with an explicit executablePath option
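As a hedged illustration of the step above, the sketch below launches a browser that is already installed on the machine. The helper name `launchInstalledBrowser` and the example path are assumptions made for illustration. Only `puppeteer.launch()` and its `executablePath` option come from Puppeteer.

```javascript
// `launchInstalledBrowser` is a hypothetical helper; `puppeteerCore` is the
// module returned by require('puppeteer-core').
function launchInstalledBrowser(puppeteerCore, executablePath) {
  if (!executablePath) {
    // puppeteer-core downloads no browser, so this sketch insists on a path.
    throw new Error('this sketch needs an explicit executablePath');
  }
  // Resolves to a Browser instance once the process has started.
  return puppeteerCore.launch({ executablePath });
}

// Usage (the path below is an assumption; adjust it to your machine):
//   const puppeteer = require('puppeteer-core');
//   const browser = await launchInstalledBrowser(puppeteer, '/usr/bin/google-chrome');
//   await browser.close();
```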
4.2 - puppeteer
When installed, puppeteer downloads a version of Chromium, which it then drives using puppeteer-core. See the environment variables documentation: https://github.com/puppeteer/puppeteer/blob/master/docs/api.md#environment-variables
5 - Example
6 - Integration
7 - API / Doc
7.1 - Launch
const browser = await puppeteer.launch({
  headless: false,
  slowMo: 200, // slow down every operation by 200 ms
  devtools: true,
  args: [
    '--disable-infobars', // Removes the butter bar.
    '--start-maximized',
    // '--start-fullscreen',
    // '--window-size=1920,1080',
    // '--kiosk',
  ],
});
- Puppeteer flags: the options of puppeteer.launch (headless, slowMo, devtools, ...)
- Chromium flags: the command-line switches passed through the args option
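One way to keep these flags manageable is to derive them from a single debug switch. The sketch below is an assumption about how this could be organized: `buildLaunchOptions` and its `debug` and `windowSize` parameters are hypothetical names, while `headless`, `devtools`, `slowMo` and `args` are the launch options used above.

```javascript
// `buildLaunchOptions` is a hypothetical helper, not part of Puppeteer.
function buildLaunchOptions({ debug = false, windowSize = null } = {}) {
  const args = [];
  if (windowSize) {
    // Chromium switch, forwarded untouched through `args`.
    args.push(`--window-size=${windowSize.width},${windowSize.height}`);
  }
  return {
    headless: !debug,        // show the browser window only while debugging
    devtools: debug,         // open DevTools only while debugging
    slowMo: debug ? 200 : 0, // delay each Puppeteer operation (ms)
    args,
  };
}

// Usage:
//   const browser = await puppeteer.launch(buildLaunchOptions({ debug: true }));
```

Keeping the options in a plain function means the same test code can run headless in continuous integration and with a visible window on a developer machine.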
8 - Snippet
8.1 - Serialize and Deserialize a date
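Arguments and return values of `page.evaluate` cross the boundary between Node.js and the browser, so they need to be serializable. The following is a minimal sketch, assuming the date is carried as an ISO 8601 string in both directions. `addDaysInPage` is a hypothetical helper name used only for illustration.

```javascript
// `addDaysInPage` is a hypothetical helper, not part of Puppeteer.
async function addDaysInPage(page, date, days) {
  const iso = await page.evaluate(
    (isoString, dayCount) => {
      // Runs in the browser: rebuild the Date from its string form.
      const d = new Date(isoString);
      d.setUTCDate(d.getUTCDate() + dayCount);
      return d.toISOString(); // serialize again for the trip back
    },
    date.toISOString(), // serialize on the Node.js side
    days
  );
  return new Date(iso); // deserialize on the Node.js side
}

// Usage:
//   const later = await addDaysInPage(page, new Date(), 3);
```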
8.2 - Execute Javascript inside the page
Example with local storage and a parameter passed to the page function:
await page.evaluate(
  (storageKey) => { localStorage.removeItem(storageKey); },
  'theKey'
);
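`page.evaluate` also resolves to whatever the page function returns, provided the value is serializable. A short sketch under that assumption follows. `readStorageItem` is a hypothetical helper name.

```javascript
// `readStorageItem` is a hypothetical helper, not part of Puppeteer.
function readStorageItem(page, storageKey) {
  // Extra arguments after the function are passed to it inside the page.
  return page.evaluate((key) => localStorage.getItem(key), storageKey);
}

// Usage:
//   const value = await readStorageItem(page, 'theKey');
```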
8.3 - Add a breakpoint
There are two execution contexts:
- Node.js (running the test code)
- and the browser (running the application code)
8.3.1 - Timeout
If you are going to pause on breakpoints, increase the test timeout accordingly; otherwise the test fails while execution is paused.
In a test file, jest is available as a global object:
jest.setTimeout(100000);
This timeout (in milliseconds) then applies to every test and hook in the file.
8.3.2 - Node breakpoint
- Start the browser with a GUI
const browser = await puppeteer.launch({
  headless: false,
  slowMo: 250, // slow down every operation by 250 ms
});
- Set a breakpoint in your IDE and step over each Puppeteer step (open, click, …)
8.3.3 - Browser breakpoint
- Start the browser with DevTools open
const browser = await puppeteer.launch({devtools: true});
- Add a breakpoint
await page.evaluate(() => {debugger;});
8.4 - Select
<div class="tweet">
<div class="retweet">10</div>
</div>
/**
* @type {import("puppeteer").ElementHandle<HTMLDivElement>}
*/
const tweetHandle = await page.$('.tweet .retweet');
expect(await tweetHandle.evaluate(node => node.innerText)).toBe('10');
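`page.$` resolves to `null` when nothing matches the selector, so the handle is worth checking before it is used. The sketch below assumes the markup above. `readRetweetCount` is a hypothetical helper name.

```javascript
// `readRetweetCount` is a hypothetical helper, not part of Puppeteer.
async function readRetweetCount(page) {
  const handle = await page.$('.tweet .retweet');
  if (handle === null) {
    return null; // no matching element in the page
  }
  try {
    const text = await handle.evaluate((node) => node.textContent);
    return Number.parseInt(text.trim(), 10);
  } finally {
    await handle.dispose(); // release the reference held in the browser
  }
}

// Usage in a Jest test:
//   expect(await readRetweetCount(page)).toBe(10);
```

`page.$eval(selector, fn)` combines the lookup and the evaluation in one call, but it throws when no element matches instead of returning `null`.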
9 - Debug
10 - Documentation / Reference
- https://puppeteersandbox.com/ - A sandbox to run Puppeteer code
- https://github.com/smooth-code/jest-puppeteer - Jest with Puppeteer
- https://puppetry.app/ - An IDE to create tests via a UI
- https://checklyhq.com/pricing/ - A platform to run Puppeteer tests as continuous monitoring