Headless browser - Puppeteer

About

Puppeteer is a Node.js library that provides a high-level API to control Chrome or Chromium, which it runs headless by default (i.e. headless Chrome).

Puppeteer communicates with the browser via the DevTools Protocol.

API

The Puppeteer API is hierarchical and mirrors the browser structure.

  • A Browser instance can own multiple browser contexts.
  • A BrowserContext instance defines a browsing session and can own multiple pages.
  • A Page has at least one frame: the main frame. There might be other frames created by iframe or frame tags.
  • A Frame has at least one execution context, the default execution context, where the frame's JavaScript is executed. A Frame might have additional execution contexts that are associated with extensions.
  • A Worker has a single execution context and facilitates interacting with Web Workers.
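
The hierarchy above can be walked from the top down. The following is a minimal sketch, not a reference implementation: describeBrowser is a hypothetical helper name, and it assumes it receives a Browser instance obtained from puppeteer.launch() or puppeteer.connect().

```javascript
// Walks Browser -> BrowserContext -> Page -> Frame / Worker
// and returns a plain summary object.
// `describeBrowser` is a hypothetical helper, not part of Puppeteer.
async function describeBrowser(browser) {
  const summary = [];
  for (const context of browser.browserContexts()) {
    const pages = [];
    // context.pages() is asynchronous, the other calls are synchronous
    for (const page of await context.pages()) {
      pages.push({
        mainFrameUrl: page.mainFrame().url(),
        frameUrls: page.frames().map((frame) => frame.url()),
        workerUrls: page.workers().map((worker) => worker.url()),
      });
    }
    summary.push({ pages });
  }
  return summary;
}
```

For a page that embeds one iframe, frameUrls would hold two entries: the main frame first, then the iframe.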

Puppeteer Architecture

Component

puppeteer-core

puppeteer-core is a library to help drive anything that supports DevTools protocol. puppeteer-core doesn't download Chromium when installed. Being a library, puppeteer-core is fully driven through its programmatic interface and disregards all the PUPPETEER_* env variables.

Usage:

  • Build a PDF generator with puppeteer-core and write a custom install.js script that downloads headless_shell instead of Chromium to save disk space.
  • Drive a Chrome extension or any browser that supports the DevTools protocol.

Code Usage:

const puppeteer = require('puppeteer-core');
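
Because puppeteer-core does not download a browser, the caller has to point it at one that is already installed. The sketch below shows one way to do that. The helper names and the CHROME_PATH environment variable are assumptions made for illustration, not something puppeteer-core defines.

```javascript
// Hypothetical helper: builds the launch options for puppeteer-core.
// puppeteer-core ships no browser, so a path must be given explicitly.
function coreLaunchOptions(executablePath) {
  if (!executablePath) {
    throw new Error('puppeteer-core needs the path of an installed browser');
  }
  return { executablePath, headless: true };
}

// Hypothetical usage: opens `url` and prints the page title.
async function printTitle(url) {
  // Required lazily so that coreLaunchOptions stays usable on its own
  const puppeteer = require('puppeteer-core');
  // CHROME_PATH is an assumed variable name chosen for this sketch
  const browser = await puppeteer.launch(coreLaunchOptions(process.env.CHROME_PATH));
  try {
    const page = await browser.newPage();
    await page.goto(url);
    console.log(await page.title());
  } finally {
    await browser.close(); // always release the browser process
  }
}
```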

puppeteer

When installed, puppeteer downloads a version of Chromium, which it then drives using puppeteer-core. Unlike puppeteer-core, it honors the PUPPETEER_* environment variables: https://github.com/puppeteer/puppeteer/blob/master/docs/api.md#environment-variables

Example

Integration

Javascript - Jest-puppeteer with typescript configuration

API / Doc

Launch

const browser = await puppeteer.launch({
  headless: false,
  slowMo: 200, // slow down every operation by 200 ms
  devtools: true,
  args: [
    '--disable-infobars', // Removes the butter bar.
    '--start-maximized',
    // '--start-fullscreen',
    // '--window-size=1920,1080',
    // '--kiosk',
  ],
});

Snippet

Serialize and Deserialize a date

Puppeteer - How to pass back and forth a date (or a complex type) to the headless browser via the evaluate function
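
Arguments and return values of evaluate travel between Node.js and the browser in serialized form, so a Date does not survive the trip as a Date. A common workaround, sketched below under that assumption, is to send an ISO string and rebuild the Date on each side. addOneDayInPage is a hypothetical helper name, and the linked article may use a different approach.

```javascript
// Sends a date to the page as an ISO string, lets the page add one day,
// and rebuilds a Date from the ISO string that comes back.
// `addOneDayInPage` is a hypothetical helper used for illustration.
async function addOneDayInPage(page, date) {
  const iso = await page.evaluate((isoString) => {
    // Runs in the browser: deserialize, compute, serialize again
    const d = new Date(isoString);
    d.setUTCDate(d.getUTCDate() + 1);
    return d.toISOString();
  }, date.toISOString());
  return new Date(iso); // back in Node.js: deserialize the result
}
```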

Execute Javascript inside the page

Example with local storage and passing parameters

await page.evaluate(
  (storageKey) => { localStorage.removeItem(storageKey); }, 
  'theKey'
);
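
The call above passes a parameter in but ignores the result. evaluate also resolves with whatever the page function returns, as long as it can be serialized. A minimal sketch, where countElements is a hypothetical helper name:

```javascript
// Counts the elements matching `selector` inside the page
// and returns that number to Node.js.
// `countElements` is a hypothetical helper used for illustration.
async function countElements(page, selector) {
  return page.evaluate(
    (sel) => document.querySelectorAll(sel).length, // runs in the browser
    selector // serialized and passed to the page function as `sel`
  );
}
```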

Add a breakpoint

There are two execution contexts:

  • node.js (running the test code)
  • and the browser (running application code)

Timeout

If you are going to work with breakpoints, you need to raise the test timeout accordingly; otherwise the test fails while execution is paused.

In a test file, jest is available as a global object, so the timeout (in milliseconds) can be set directly:

jest.setTimeout(100000);

The value then applies to every test and hook in that file.

Node breakpoint

  • Start the browser with a GUI
const browser = await puppeteer.launch({
  headless: false,
  slowMo: 250, // slow down every operation by 250 ms
});
  • Set a breakpoint in your IDE and step over each Puppeteer step (open, click, …)

Browser breakpoint

  • The browser should be started with DevTools open
const browser = await puppeteer.launch({devtools: true});
  • Add a breakpoint
await page.evaluate(() => {debugger;});
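
The launch options used for the Node and browser breakpoints can be gathered behind a single switch, so that a normal run stays headless. This is only a sketch: launchOptions and its debug parameter are hypothetical names, and the values are the ones used in this section.

```javascript
// Hypothetical helper: returns Puppeteer launch options
// for a normal run or for a debugging session.
function launchOptions(debug) {
  if (!debug) {
    return { headless: true };
  }
  return {
    headless: false, // show the browser window
    devtools: true, // open DevTools so that `debugger;` statements pause
    slowMo: 250, // slow down every operation by 250 ms
  };
}
```

It would be passed as puppeteer.launch(launchOptions(true)) while debugging, together with a raised Jest timeout.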

Select

<div class="tweet">
    <div class="retweet">10</div>
</div>

/**
 * @type {import("puppeteer").ElementHandle<HTMLDivElement>}
 */
const tweetHandle = await page.$('.tweet .retweet');
expect(await tweetHandle.evaluate(node => node.innerText)).toBe('10');
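
When the element handle itself is not needed, page.$eval combines the selection and the evaluation in one call. A minimal sketch against the same markup, where readRetweetCount is a hypothetical helper name:

```javascript
// Reads the retweet counter from the page and returns it as a number.
// `readRetweetCount` is a hypothetical helper used for illustration.
async function readRetweetCount(page) {
  // $eval runs the callback in the browser on the first matching element
  const text = await page.$eval('.tweet .retweet', (node) => node.innerText);
  return Number.parseInt(text, 10);
}
```

Note that $eval rejects when no element matches the selector, whereas page.$ resolves to null.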

Debug

https://developers.google.com/web/tools/puppeteer/debugging
