📝 update readme.md
nberlette committed Jul 12, 2022
1 parent 9d34b2c commit 89322fd
Showing 1 changed file with 105 additions and 90 deletions.
195 changes: 105 additions & 90 deletions README.md
@@ -1,155 +1,170 @@
<div align="center">

# [DQL](https://deno.land/x/dql)
# [🦕 DQL](https://deno.land/x/dql)

### _**Web Scraping with Deno  –  DOM + GraphQL**_

</div>

---

**`DQL`** lets you use GraphQL queries to extract data from the DOM of a web page or HTML fragment (for sandboxing or use cases without network access). It accepts [**GraphQL Queries**](https://graphql.org/learn/queries) as input, and returns formatted JSON data as output.

> - [**Try out a real-world example of `useQuery` in the `Deno Playground`**](https://dash.deno.com/playground/dql)
> - [**View the example's JSON endpoint at `dql.deno.dev`**](https://dql.deno.dev)
## Summary

This is a fork of [**DenoQL**](https://deno.land/x/denoql) with some heavy refactoring and some additional features:
**`DQL`** is a web scraping module for Deno and Deno Deploy that integrates the power of [**GraphQL Queries**](https://graphql.org/learn/queries) with the DOM tree of a remote webpage or HTML document fragment. This is a fork of [**DenoQL**](https://deno.land/x/denoql) with some heavy refactoring and some additional features:

- [x] Compatibility with the [**Deno Deploy**](https://deno.com/deploy) architecture
- [x] Ability to pass variables alongside all queries
- [x] New state-management class with additional methods
- [x] Modular project structure (as opposed to a mostly single-file design)
- [x] Improved types and schema structure
- [ ] **This is a work-in-progress and there is still much to be done.** *

## Usage
> **Note**: _This is a work-in-progress and there is still a lot to be done._
The primary function exported by the module is the workhorse named `useQuery`:
### 🛝  [**`GraphQL Playground`**](https://dql.deno.dev)

```ts
import { useQuery } from "https://deno.land/x/dql/mod.ts";
### 📝  [**`HackerNews Scraper`**](https://dash.deno.com/playground/dql-hn)

const data = await useQuery(`query { ... }`);
```
### 🚛  [**`Junkyard Scraper`**](https://dash.deno.com/playground/dirty-sparrow-69)

### Query Options
---

You can also provide an options object for the second argument of `useQuery`:
## `useQuery`

The primary function exported by the module is the workhorse named `useQuery`:

```ts
const data = await useQuery(`query { ... }`, {
  concurrency: 8,
  fetch_options: {
    // passed as the second param to fetch()
  },
  variables: {
    // any variables used in your queries go here
  },
});
import { useQuery } from "https://deno.land/x/dql/mod.ts";

const data = await useQuery(`query { ... }`);
```

### Authenticated Requests
### `QueryOptions`

To authenticate your requests, you can add an `Authorization` header like so:
You can also provide a `QueryOptions` object as the second argument of `useQuery`, to further control the behavior of your query requests. All properties are optional.

```ts
const data = await useQuery(`query { ... }`, {
fetch_options: {
concurrency: 8, // passed directly to PQueue initializer
fetch_options: { // passed directly to Fetch API requests
headers: {
"Authorization": "Bearer ghp_a5025a80a24defd0a7d06b4fc215bb5635a167c6",
},
},
variables: {}, // variables defined in your queries
operationName: "", // when using multiple queries
});
```
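
As a rough illustration of the options above, the sketch below builds the options object in one place and takes the bearer token as a parameter instead of hard-coding it. The `QueryOptionsSketch` interface and the `buildOptions` helper are hypothetical names that only mirror the four fields listed in this README; they are not the module's actual exported types.

```ts
// Hypothetical shape mirroring the fields shown in this README.
// The real `QueryOptions` type exported by DQL may differ.
interface QueryOptionsSketch {
  concurrency?: number;
  fetch_options?: { headers?: Record<string, string> };
  variables?: Record<string, unknown>;
  operationName?: string;
}

// Hypothetical helper: attach an Authorization header only when a token is given.
function buildOptions(
  variables: Record<string, unknown>,
  token?: string,
): QueryOptionsSketch {
  const options: QueryOptionsSketch = { concurrency: 8, variables };
  if (token) {
    options.fetch_options = { headers: { Authorization: `Bearer ${token}` } };
  }
  return options;
}
```

The result could then be passed as the second argument, for example `useQuery(query, buildOptions({ url }, Deno.env.get("API_TOKEN")))`, where `API_TOKEN` is an assumed environment variable name.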

## GraphQL Playground

### Deno Deploy
## `createServer`

With [**Deno Deploy**](https://dash.deno.com/new), you can deploy **`DQL`** with a GraphQL Playground in **only 2 LOC**:
With [**Deno Deploy**](https://dash.deno.com/new), you can deploy **`DQL`** with a GraphQL Playground in **only 2 lines of code**:

```ts
import { createServer } from "https://deno.land/x/dql/mod.ts";

// change the endpoint to your unique URL ([...].deno.dev)
createServer(80, { endpoint: "https://dirty-sparrow-69.deno.dev" });
createServer(80, { endpoint: "https://dql.deno.dev" });
```

> - [**Try it out at `dirty-sparrow-69.deno.dev`**](https://dirty-sparrow-69.deno.dev)
> - [**View the public code in the `Deno Playground`**](https://dash.deno.com/playground/dirty-sparrow-69)
`🛝` [Try the **GraphQL Playground** at **`dql.deno.dev`**](https://dql.deno.dev)\
`🦕` [View the source code in the **`Deno Playground`**](https://dash.deno.com/playground/dql)

### Command Line Usage (CLI)
## Command Line Usage (CLI)

```bash
# spin up a playground on port 8080
deno run -A --unstable https://deno.land/x/dql/serve.ts
```

#### Custom port (default is `8080`)

```bash
# ... or using a custom port
deno run -A --unstable https://deno.land/x/dql/serve.ts --port 3000
deno run -A https://deno.land/x/dql/serve.ts --port 3000
```
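
The `--port` flag above can be pictured with a small argument parser. This is only a sketch of the general idea, not the actual code in `serve.ts`; `parsePort` is a hypothetical helper, and the fallback of `8080` is taken from the default stated above.

```ts
// Hypothetical helper, not part of DQL: read `--port <n>` from an argument list.
function parsePort(args: string[], fallback = 8080): number {
  const i = args.indexOf("--port");
  // no flag, or flag given without a value: use the default
  if (i === -1 || i + 1 >= args.length) return fallback;
  const port = Number(args[i + 1]);
  // accept only integers in the valid TCP port range
  return Number.isInteger(port) && port > 0 && port <= 65535 ? port : fallback;
}
```

In Deno the arguments would come from `Deno.args`, so under these assumptions `createServer(parsePort(Deno.args))` would behave like the CLI call shown above.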

> **Note**: you need to have the [**Deno CLI**](https://deno.land) installed for CLI usage.
> **Warning**: you need to have the [**Deno CLI**](https://deno.land) installed first.
### Programmatic Usage
---

```ts
import { createServer } from "https://deno.land/x/dql/mod.ts";
## 💻 Examples

// start a playground on port 8080
createServer();
### `🚛` Junkyard Scraper · [**`Deno Playground 🦕`**](https://dash.deno.com/playground/dirty-sparrow-69)

// or using a custom port
createServer(3000);
```ts
import { useQuery } from "https://deno.land/x/dql/mod.ts";
import { serve } from "https://deno.land/std@0.147.0/http/server.ts";

// the incoming request is unused: every request runs the same query
serve(async (_req: Request) =>
  await useQuery(
    `
    query Junkyard (
      $url: String
      $itemSelector: String = "table > tbody > tr"
    ) {
      vehicles: page(url: $url) {
        totalCount: count(selector: $itemSelector)
        nodes: queryAll(selector: $itemSelector) {
          id: index
          vin: text(selector: "td:nth-child(7)", trim: true)
          sku: text(selector: "td:nth-child(6)", trim: true)
          year: text(selector: "td:nth-child(1)", trim: true)
          model: text(selector: "td:nth-child(2) > .notranslate", trim: true)
          aisle: text(selector: "td:nth-child(3)", trim: true)
          store: text(selector: "td:nth-child(4)", trim: true)
          color: text(selector: "td:nth-child(5)", trim: true)
          date: attr(selector: "td:nth-child(8)", name: "data-value")
          image: src(selector: "td > a > img")
        }
      }
    }`,
    {
      variables: {
        "url": "http://nvpap.deno.dev/action=getVehicles&makes=BMW",
      },
    },
  )
    .then((data) => JSON.stringify(data, null, 2))
    .then((json) =>
      new Response(json, {
        headers: { "content-type": "application/json;charset=utf-8" },
      })
    )
);
```

## Examples

### Junkyard Inventory Scraper

> - [**Try it for yourself in the `Deno Playground`**](https://dash.deno.com/playground/dql)
> - [**View the JSON endpoint at `dql.deno.dev`**](https://dql.deno.dev)
### 📝 HackerNews Scraper · [**`Deno Playground 🦕`**](https://dash.deno.com/playground/dql-hn)

```ts
import { useQuery } from "https://deno.land/x/dql/mod.ts";

const query = `query Junkyard ($url: String, $itemSelector: String) {
vehicles: page(url: $url) {
totalCount: count(selector: $itemSelector)
items: queryAll(selector: $itemSelector) {
id: index
vin: text(selector: "td:nth-child(7)", trim: true)
sku: text(selector: "td:nth-child(6)", trim: true)
year: text(selector: "td:nth-child(1)", trim: true)
model: text(selector: "td:nth-child(2) > .notranslate", trim: true)
aisle: text(selector: "td:nth-child(3)", trim: true)
store: text(selector: "td:nth-child(4)", trim: true)
color: text(selector: "td:nth-child(5)", trim: true)
date: attr(selector: "td:nth-child(8)", name: "data-value")
image: src(selector: "td > a > img")
import { serve } from "https://deno.land/std@0.147.0/http/server.ts";

serve(async (res: Request) =>
await useQuery(`
query HackerNews (
$url: String = "http://news.ycombinator.com"
$rowSelector: String = "tr.athing"
) {
page(url: $url) {
title
totalCount: count(selector: $rowSelector)
nodes: queryAll(selector: $rowSelector) {
rank: text(selector: "td span.rank", trim: true)
title: text(selector: "td.title a", trim: true)
site: text(selector: "span.sitestr", trim: true)
url: href(selector: "td.title a")
attrs: next {
score: text(selector: "span.score", trim: true)
user: text(selector: "a.hnuser", trim: true)
date: attr(selector: "span.age", name: "title")
}
}
}
}
}`;

// pass any variables using the 'variables' key
const response = await useQuery(query, {
variables: {
"url": "http://nvpap.deno.dev/action=getVehicles&makes=BMW",
"itemSelector": "table > tbody > tr",
},
});

// do something with response (Object)
console.log(response);
}`)
.then((data) => JSON.stringify(data, null, 2))
.then((json) =>
new Response(json, {
headers: { "content-type": "application/json;charset=utf-8" },
})
)
);
```
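
The `rank` and `score` fields in the query above are read with `text(...)`, so numeric values likely need a small post-processing step before use. The sketch below assumes each node mirrors the aliases in the `HackerNews` query and that the scraped text looks like `"1."` and `"123 points"`; the `StoryNode` type and the `normalizeStory` helper are hypothetical and not part of DQL.

```ts
// Assumed node shape, derived from the aliases in the HackerNews query.
interface StoryNode {
  rank: string | null;
  title: string | null;
  site: string | null;
  url: string | null;
  attrs: { score: string | null; user: string | null; date: string | null } | null;
}

// Parse the leading integer of a string, or return null when there is none.
function leadingInt(text: string | null | undefined): number | null {
  const n = Number.parseInt(text ?? "", 10);
  return Number.isNaN(n) ? null : n;
}

// Hypothetical helper: convert the text fields of one node into typed values.
function normalizeStory(node: StoryNode) {
  return {
    rank: leadingInt(node.rank),
    title: node.title,
    url: node.url,
    score: leadingInt(node.attrs?.score),
    user: node.attrs?.user ?? null,
  };
}
```

Rows without a score (or with a missing `attrs` object) come out with `score: null` instead of `NaN`, which keeps the JSON output well-formed.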

---
## License

<div align="center">

MIT © [Nicholas Berlette](https://github.com/nberlette) • based on [DenoQL](https://deno.land/x/denoql) by [nyancodeid](https://github.com/nyancodeid)

</div>
MIT © [**Nicholas Berlette**](https://github.com/nberlette), based on [DenoQL](https://deno.land/x/denoql).
