Headless browser - Puppeteer

1 - About

Puppeteer is a Node library that provides a high-level API to control Chrome or Chromium, which it runs headless by default.

Puppeteer communicates with the browser over the DevTools Protocol.

3 - API

The Puppeteer API is hierarchical and mirrors the browser structure.

  • A Browser instance can own multiple browser contexts.
  • A BrowserContext instance defines a browsing session and can own multiple pages.
  • A Page has at least one frame, the main frame. Other frames may be created by iframe or frame tags.
  • A Frame has at least one execution context, the default execution context, where the frame's JavaScript is executed. A Frame might have additional execution contexts that are associated with extensions.
  • A Worker has a single execution context and facilitates interacting with WebWorkers.
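
The hierarchy above can be walked from a Browser instance. The following is a minimal sketch, assuming `browser` is an already launched Puppeteer Browser; the function name `describeBrowser` and the shape of the returned summary are illustrative choices, not part of the Puppeteer API.

```javascript
// Sketch: summarize the Browser > BrowserContext > Page > Frame / Worker hierarchy.
// `browser` is assumed to be a Puppeteer Browser, e.g. the result of puppeteer.launch().
async function describeBrowser(browser) {
  const summary = [];
  for (const context of browser.browserContexts()) {
    const pages = [];
    for (const page of await context.pages()) {
      pages.push({
        url: page.url(),
        mainFrame: page.mainFrame().url(),
        // Includes the main frame plus any frame created by iframe or frame tags.
        frames: page.frames().map((frame) => frame.url()),
        workers: page.workers().map((worker) => worker.url()),
      });
    }
    summary.push({ pages });
  }
  return summary;
}
```

Each entry of the returned array stands for one browser context and lists the pages it owns.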

4 - Component

4.1 - puppeteer-core

puppeteer-core is a library that can drive anything that supports the DevTools Protocol. It doesn't download Chromium when installed. Being a library, puppeteer-core is driven entirely through its programmatic interface and disregards all the PUPPETEER_* environment variables.

Usage:

  • Build a PDF generator on top of puppeteer-core, with a custom install.js script that downloads headless_shell instead of Chromium to save disk space.
  • Bundle it for use in a Chrome extension or in a browser that exposes the DevTools Protocol, where a separate Chromium download is unnecessary.

Code Usage:


const puppeteer = require('puppeteer-core');
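
Because puppeteer-core downloads no browser and disregards the PUPPETEER_* variables, the browser binary has to be named explicitly at launch. A minimal sketch, assuming the path is supplied through an environment variable called `CHROME_PATH`; that variable name and both helper functions are hypothetical, only the `executablePath` and `headless` launch options come from Puppeteer.

```javascript
// Sketch: build launch options for puppeteer-core, which ships without a browser.
// CHROME_PATH is a hypothetical variable name chosen for this example;
// puppeteer-core does not read it by itself.
function buildLaunchOptions(env = process.env) {
  const executablePath = env.CHROME_PATH;
  if (!executablePath) {
    throw new Error('Set CHROME_PATH to a Chrome or Chromium binary');
  }
  return { executablePath, headless: true };
}

// `puppeteer` is passed in so the caller decides which package to load,
// e.g. launchLocalChrome(require('puppeteer-core')).
async function launchLocalChrome(puppeteer, env = process.env) {
  return puppeteer.launch(buildLaunchOptions(env));
}
```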

4.2 - puppeteer

When installed, the puppeteer package downloads a version of Chromium, which it then drives using puppeteer-core. Its behavior can be adjusted through environment variables, see https://github.com/puppeteer/puppeteer/blob/master/docs/api.md#environment-variables

5 - Example

6 - Integration

7 - API / Doc

7.1 - Launch


const browser = await puppeteer.launch({
  headless: false,
  slowMo: 200, // slow down every operation by 200 ms
  devtools: true,
  args: [
    '--disable-infobars', // Removes the butter bar.
    '--start-maximized',
    // '--start-fullscreen',
    // '--window-size=1920,1080',
    // '--kiosk',
  ],
});
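
A launched browser keeps running until it is closed, so a launch is usually paired with a close. The sketch below shows one way to do that, assuming `puppeteer` is the loaded package; the helper name `withPage` is hypothetical.

```javascript
// Sketch: run a task against a freshly launched browser and always close it.
// `puppeteer` is the loaded package, `task` is any async function taking a Page.
async function withPage(puppeteer, launchOptions, task) {
  const browser = await puppeteer.launch(launchOptions);
  try {
    const page = await browser.newPage();
    return await task(page);
  } finally {
    // Runs even when the task throws, so no browser process is left behind.
    await browser.close();
  }
}
```

It could be called as `await withPage(require('puppeteer'), { headless: false, slowMo: 200 }, (page) => page.title())`.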

8 - Snippet

8.1 - Serialize and Deserialize a date
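
A possible approach, sketched under the assumption that values crossing page.evaluate are serialized and that a Date may therefore not arrive as a Date instance: send the date as an ISO 8601 string and rebuild it on the other side. The helper names below are hypothetical.

```javascript
// Sketch: move a Date across the Node/browser boundary as an ISO 8601 string.
const serializeDate = (date) => date.toISOString();
const deserializeDate = (iso) => new Date(iso);

// Hypothetical helper: adds `days` to `date` inside the page and returns a Date.
async function addDaysInPage(page, date, days) {
  const iso = await page.evaluate(
    (isoString, n) => {
      // Runs in the browser: rebuild the Date from its string form.
      const d = new Date(isoString);
      d.setUTCDate(d.getUTCDate() + n);
      return d.toISOString(); // serialize again for the trip back to Node
    },
    serializeDate(date),
    days
  );
  return deserializeDate(iso);
}
```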

8.2 - Execute Javascript inside the page

Example that removes a local storage entry, passing the key as a parameter:


await page.evaluate(
  (storageKey) => { localStorage.removeItem(storageKey); }, 
  'theKey'
);
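
The same mechanism can return a value to Node. A minimal sketch, assuming `page` is a Puppeteer Page; the helper name `readLocalStorage` is hypothetical.

```javascript
// Sketch: read a local storage entry from the page.
// The callback runs in the browser, so Node variables are not in scope there;
// `storageKey` has to be passed as an extra argument to page.evaluate.
async function readLocalStorage(page, storageKey) {
  return page.evaluate((key) => localStorage.getItem(key), storageKey);
}
```

A call such as `await readLocalStorage(page, 'theKey')` resolves to the stored string, or to null when the key is absent.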

8.3 - Add a breakpoint

There are two execution contexts:

  • node.js (running the test code)
  • and the browser (running application code)

8.3.1 - Timeout

If you are going to pause on breakpoints, raise the test timeout accordingly, otherwise the test times out while you are stepping.

In a test file, jest is available as a global object:


jest.setTimeout(100000);

This timeout (in milliseconds) then applies to every test and hook in the file.

8.3.2 - Node breakpoint

  • Start the browser with a GUI

const browser = await puppeteer.launch({
  headless: false,
  slowMo: 250, // slow down every operation by 250 ms
});

  • Set a breakpoint in your IDE and step over each Puppeteer call (open, click, …)

8.3.3 - Browser breakpoint

  • Start the browser with DevTools open

const browser = await puppeteer.launch({devtools: true});

  • Add a breakpoint

await page.evaluate(() => {debugger;});

8.4 - Select


<div class="tweet">
    <div class="retweet">10</div>
</div>


/**
* @type {import("puppeteer").ElementHandle<HTMLDivElement>}
*/
const tweetHandle = await page.$('.tweet .retweet');
expect(await tweetHandle.evaluate(node => node.innerText)).toBe('10');
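
When the selector matches nothing, page.$ resolves to null, so calling evaluate on the result would throw. The sketch below guards against that case, assuming `page` is a Puppeteer Page; the helper name `readText` is hypothetical.

```javascript
// Sketch: return the text of the first element matching `selector`,
// or null when nothing matches (page.$ resolves to null in that case).
async function readText(page, selector) {
  const handle = await page.$(selector);
  if (handle === null) return null;
  try {
    return await handle.evaluate((node) => node.textContent.trim());
  } finally {
    await handle.dispose(); // release the reference held in the browser
  }
}
```

With the markup above, `await readText(page, '.tweet .retweet')` is expected to resolve to '10'.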

9 - Debug

10 - Documentation / Reference

