mirror of
https://github.com/derfenix/webarchive.git
synced 2026-03-11 12:41:54 +03:00
# Own Webarchive

Own Webarchive aims to be a simple, fast, and easy-to-use web archive for personal or home-network use.

## Supported store formats

* **headers**: save all response headers
* **pdf**: save the page as a PDF
* **single_file**: save the HTML and all its resources (CSS, JS, images) in a single HTML file

## Requirements

* Go 1.19 or higher
* `wkhtmltopdf` binary in `$PATH` (required to save pages as PDF)

## Configuration

The service is configured via environment variables. The available variables are:

* **DB**
  * **DB_PATH**: path for the database files (default `./db`)
* **LOGGING**
  * **LOGGING_DEBUG**: enable debug logs (default `false`)
* **API**
  * **API_ADDRESS**: address the API server listens on (default `0.0.0.0:5001`)
* **UI**
  * **UI_ENABLED**: enable the built-in web UI (default `true`)
  * **UI_PREFIX**: prefix for the web UI (default `/`)
  * **UI_THEME**: UI theme name (default `basic`); no other themes are available yet
* **PDF**
  * **PDF_LANDSCAPE**: use landscape page orientation instead of portrait (default `false`)
  * **PDF_GRAYSCALE**: apply a grayscale filter to the output PDF (default `false`)
  * **PDF_MEDIA_PRINT**: use media type `print` for the request (default `true`)
  * **PDF_ZOOM**: page zoom factor (default `1.0`, i.e. no zoom)
  * **PDF_VIEWPORT**: viewport size to use (default `1280x720`)
  * **PDF_DPI**: DPI value for the output PDF (default `150`)
  * **PDF_FILENAME**: name of the output PDF file (default `page.pdf`)

*Note*: The prefix **WEBARCHIVE_** can be added to the environment variable names
to avoid conflicts with other variables.

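As a minimal sketch of the configuration described above, the block below exports a few of the listed variables using the **WEBARCHIVE_** prefix. The variable names come from the list; the values (database path, address, DPI) are illustrative assumptions, not recommended settings.

```shell
# Illustrative values only; variable names are taken from the list above,
# with the optional WEBARCHIVE_ prefix applied.
export WEBARCHIVE_DB_PATH="./data/db"
export WEBARCHIVE_API_ADDRESS="127.0.0.1:3001"
export WEBARCHIVE_LOGGING_DEBUG="true"
export WEBARCHIVE_PDF_LANDSCAPE="true"
export WEBARCHIVE_PDF_DPI="300"

# Print the prefixed settings so they can be reviewed before starting the server.
env | grep '^WEBARCHIVE_' | sort
```

A server started from the same shell session inherits these variables.
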
## ⚡ One-Click Deploy

| Cloud Provider | Deploy Button |
|----------------|---------------|
| AWS | <a href="https://deploystack.io/deploy/derfenix-webarchive?provider=aws&language=cfn"><img src="https://raw.githubusercontent.com/deploystackio/deploy-templates/refs/heads/main/.assets/img/aws.svg" height="38"></a> |
| DigitalOcean | <a href="https://deploystack.io/deploy/derfenix-webarchive?provider=do&language=dop"><img src="https://raw.githubusercontent.com/deploystackio/deploy-templates/refs/heads/main/.assets/img/do.svg" height="38"></a> |
| Render | <a href="https://deploystack.io/deploy/derfenix-webarchive?provider=rnd&language=rnd"><img src="https://raw.githubusercontent.com/deploystackio/deploy-templates/refs/heads/main/.assets/img/rnd.svg" height="38"></a> |

<sub>Generated by <a href="https://deploystack.io/c/derfenix-webarchive" target="_blank">DeployStack.io</a></sub>

## Usage

### 1. Start the server

#### Start without Docker

```shell
go run ./cmd/server/main.go
```

#### Change API address

```shell
API_ADDRESS=127.0.0.1:3001 go run ./cmd/server/main.go
```

#### Start in Docker

```shell
docker compose up -d webarchive
```

### 2. Add a page

```shell
curl -X POST --location "http://localhost:5001/api/v1/pages" \
    -H "Content-Type: application/json" \
    -d "{
          \"url\": \"https://github.com/wkhtmltopdf/wkhtmltopdf/issues/1937\",
          \"formats\": [
            \"pdf\",
            \"headers\"
          ]
        }" | jq .
```

Alternatively, pass the parameters in the query string:

```shell
curl -X POST --location \
    "http://localhost:5001/api/v1/pages?url=https%3A%2F%2Fgithub.com%2Fwkhtmltopdf%2Fwkhtmltopdf%2Fissues%2F1937&formats=pdf%2Cheaders&description=Foo+Bar"
```

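The query-string form needs percent-encoded values, which are tedious to write by hand. The sketch below builds such a URL with a hypothetical `urlencode` helper (not part of this project) written in plain POSIX shell. It only handles ASCII input reliably, and it encodes spaces as `%20` rather than `+`; whether the server treats both the same way is an assumption that has not been checked here.

```shell
# Hypothetical helper: percent-encode every character except the
# RFC 3986 unreserved set (letters, digits, '.', '~', '_', '-').
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}        # everything after the first character
    c=${s%"$rest"}     # the first character itself
    s=$rest
    case $c in
      [a-zA-Z0-9.~_-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
  done
  printf '%s\n' "$out"
}

base="http://localhost:5001/api/v1/pages"
page_url="https://github.com/wkhtmltopdf/wkhtmltopdf/issues/1937"

# Assemble the request URL from encoded parameter values.
request_url="$base?url=$(urlencode "$page_url")&formats=$(urlencode "pdf,headers")"
printf '%s\n' "$request_url"
```

The resulting `$request_url` can then be passed to `curl -X POST --location "$request_url"`.
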
### 3. Get the page's info

```shell
curl -X GET --location "http://localhost:5001/api/v1/pages/$page_id" | jq .
```
Here `$page_id` is the value of the `id` field from the previous command's response.
Once the `status` field in the response is `success` (or `with_errors`), the `results` field
contains all processed formats along with the ids of the stored files.

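Since the `results` field is only complete once `status` is `success` or `with_errors`, a script may want to wait for that. The sketch below is one way to do it, under the assumption that any other `status` value means processing is still in progress. The helper names `is_finished` and `wait_for_page`, the one-second interval, and the default attempt limit are all hypothetical choices, not part of the project.

```shell
# Succeeds when the status value means the page has finished processing.
is_finished() {
  case $1 in
    success|with_errors) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: poll the page info until processing has finished
# or the attempts run out.
# Usage: wait_for_page PAGE_ID [MAX_ATTEMPTS]
wait_for_page() {
  page_id=$1
  attempts=${2:-30}
  while [ "$attempts" -gt 0 ]; do
    status=$(curl -s --location "http://localhost:5001/api/v1/pages/$page_id" | jq -r '.status')
    if is_finished "$status"; then
      printf '%s\n' "$status"   # report the final status to the caller
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 1
  done
  return 1                      # gave up before the page finished
}
```

With a running server, `wait_for_page "$page_id"` prints the final status and returns zero once the page is ready; a non-zero return means the attempt limit was reached first.
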
### 4. Open file in browser

```shell
xdg-open "http://localhost:5001/api/v1/pages/$page_id/file/$file_id"
```
Here `$page_id` is the value of the `id` field from the previous command's response, and
`$file_id` is the id of the stored file you want to open.

### 5. List all stored pages

```shell
curl -X GET --location "http://localhost:5001/api/v1/pages" | jq .
```

## Roadmap

- [x] Save page to pdf
- [x] Save URL headers
- [x] Save page to the single-page html
- [ ] Save page to html with separate resource files (?)
- [ ] Basic web UI
- [ ] Optional authentication
- [ ] Multi-user access
- [ ] Support SQL database with or without separate files storage
- [ ] Tags/Categories
- [ ] Save page to markdown