# File Wizard
[Ko-fi](https://ko-fi.com/loredcast) · [PayPal](https://www.paypal.me/unterrikermanu)

A self-hosted, browser-based utility for file conversion, OCR, and audio transcription. It wraps common CLI and Python converters (FFmpeg, LibreOffice, Pandoc, ImageMagick, etc.), plus `faster-whisper` and Tesseract OCR.

## Features
- Convert between many file formats; extensible via `settings.yml`, which can register any CLI tool.
- OCR for PDFs and images (`tesseract` / `ocrmypdf`).
- Audio transcription using Whisper models.
- Simple, responsive dark UI with drag-and-drop and a file picker.
- Background job processing with real-time status updates and persistent history.
- `/settings` page for configuring conversion tools and OAuth (the app runs without auth in local mode).
- CPU-only by default; a `-cuda` image is available for GPU use.
## Security
**Warning:** Exposing this app publicly without authentication risks arbitrary code execution, since it can be configured to invoke arbitrary CLI tools. It is intended for local use or for deployment behind a properly configured OAuth/OIDC provider.
## Tech stack
FastAPI backend, lightweight vanilla HTML/JS/CSS frontend, Huey for task queuing, and SQLite for storage.
## Installation
### Recommended — Docker (pull from Docker Hub)
Images available:
- `loredcast/filewizard:0.3-latest`
- `loredcast/filewizard:0.3-small` (omits TeX and other large tools)
- `loredcast/filewizard:0.3-cuda` (CUDA-enabled)

Copy `docker-compose.yml` from the repo, adjust as needed, then:

```bash
docker compose up -d
```
### Build locally with Docker
```bash
git clone https://github.com/LoredCast/filewizard.git
cd filewizard
docker compose up --build
```
Note: the build can be slow because the image pulls in TeX and other large dependencies.
### Manual (no Docker)
```bash
git clone https://github.com/LoredCast/filewizard.git
cd filewizard
# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
# Install the Python dependencies
pip install -r requirements.txt
# Make the launcher script executable and run it
chmod +x run.sh
./run.sh
```

Python dependencies include `fastapi`, `uvicorn`, `sqlalchemy`, `huey`, `faster-whisper`, `ocrmypdf`, `pytesseract`, `python-multipart`, and `pyyaml`, among others.
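`pip install -r requirements.txt` covers only the Python packages. For a manual setup, the external converters that File Wizard wraps (FFmpeg, LibreOffice, Pandoc, and so on) presumably need to be installed separately, for example through the system package manager. The sketch below checks which of them are on `PATH`. The function name `check_tools` and the tool list are illustrative assumptions, and executable names can differ between distributions (ImageMagick 6 ships `convert` rather than `magick`, for instance).

```bash
#!/usr/bin/env bash
# Report, for each name given, whether an executable of that name is on PATH.
# check_tools is a hypothetical helper, not part of File Wizard.
check_tools() {
  local tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%s: found\n' "$tool"
    else
      printf '%s: missing\n' "$tool"
    fi
  done
}

# Illustrative subset of the converters named in this README.
check_tools ffmpeg soffice pandoc gs tesseract ocrmypdf magick sox
```

Any tool reported as missing can either be installed or simply left unused.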
## Configuration & docs
See the project Wiki for details and examples: https://github.com/LoredCast/filewizard/wiki
## Usage
1. Open `http://127.0.0.1:8000` in a browser.
2. Drag and drop files, or choose them with the file picker.
3. Select an action: Convert, OCR, or Transcribe.
4. Track job progress in the History table, which updates automatically.
## Tools table
| Tool | Common inputs (extensions / format names) | Common outputs (extensions / format names) | Notes |
|---|---|---|---|
| **LibreOffice (soffice)** | `.odt`, `.fodt`, `.ott`, `.doc`, `.docx`, `.docm`, `.dot`, `.dotx`, `.rtf`, `.txt`, `.html/.htm/.xhtml`, `.xml`, `.sxw`, `.wps`, `.wpd`, `.abw`, `.pdb`, `.epub`, `.fb2`, `.lit`, `.lrf`, `.pages`, `.csv`, `.tsv`, `.xls`, `.xlsx`, `.xlsm`, `.ods`, `.sxc`, `.123`, `.dbf`, `.ppt`, `.pptx`, `.odp`, images (`.png`, `.jpg`, `.jpeg`, `.bmp`, `.gif`, `.tiff`), `.pdf` | `.pdf`, `.pdfa`, `.odt`, `.fodt`, `.doc`, `.docx`, `.rtf`, `.txt`, `.html/.htm`, `.xhtml`, `.epub`, `.svg`, `.png`, `.jpg/.jpeg`, `.pptx`, `.ppt`, `.odp`, `.xls`, `.xlsx`, `.ods`, `.csv`, `.dbf`, `.pdb`, `.fb2` | Good for office/document conversions; fidelity varies with complex layouts. |
| **Pandoc** | Markdown flavors (`.md`, `.markdown`), `.html/.htm`, LaTeX (`.tex`), `.rst`, `.docx`, `.odt`, `.epub`, `.ipynb`, `.opml`, `.adoc`/asciidoc, `.bib`/citation inputs | `.html/.html5`, `.xhtml`, `.latex/.tex`, `.pdf` (via LaTeX engine), `.docx`, `.odt`, `.epub`, `.md` (flavors), `.gfm`, `.rst`, `.pptx`, `.man`, `.mediawiki`, `.docbook` | Highly configurable via templates/filters; requires LaTeX for PDF output. |
| **Ghostscript (gs)** | `.ps`, `.eps`, `.pdf`, PostScript streams | `.pdf` (various compat levels incl PDF/A), `.ps`, `.eps`, raster images (`.png`, `.jpg`, `.tiff`, `.bmp`, `.pnm`) | Useful for PDF manipulations, rasterization, and producing PDF/A. |
| **Calibre (ebook-convert)** | `.epub`, `.mobi`, `.azw3`, `.azw`, `.fb2`, `.html`, `.docx`, `.doc`, `.rtf`, `.txt`, `.pdb`, `.lit`, `.tcr`, `.cbz`, `.cbr`, `.odt`, `.pdf` (input with caveats) | `.epub`, `.mobi` (legacy), `.azw3`, `.pdf`, `.docx`, `.rtf`, `.txt`, `.fb2`, `.htmlz`, `.pdb`, `.lrf`, `.lit`, `.tcr`, `.cbz`, `.cbr` | Excellent for ebook format conversions and metadata handling; PDF input/output fidelity varies. |
| **FFmpeg** | Containers & codecs: `.mp4`, `.mkv`, `.mov`, `.avi`, `.webm`, `.flv`, `.wmv`, `.mpg/.mpeg`, `.ts`, `.m2ts`, `.3gp`, audio: `.mp3`, `.wav`, `.aac/.m4a`, `.flac`, `.ogg`, `.opus`, image sequences (`.png`, `.jpg`, `.tiff`), HLS (`.m3u8`) | Wide set: `.mp4`, `.mkv`, `.mov`, `.webm`, `.avi`, `.flv`, `.mp3`, `.aac/.m4a`, `.wav`, `.flac`, `.ogg`, `.opus`, `.gif` (animated), `.ts`, elementary streams, many codec/container combos | Extremely versatile: audio/video transcoding, extraction, container changes, filters. Supported formats depend on build flags and linked libraries. |
| **libvips (vips)** | `.jpg/.jpeg`, `.png`, `.tif/.tiff`, `.webp`, `.avif`, `.heif/.heic`, `.jp2`, `.gif` (frames), `.pnm`, `.fits`, `.exr`, PDF (via poppler delegate) | `.jpg/.jpeg`, `.png`, `.tif/.tiff`, `.webp`, `.avif`, `.heif`, `.jp2`, `.pnm`, `.v` (VIPS native), `.fits`, `.exr` | Fast, memory-efficient image processing; great for large images and tiling. |
| **GraphicsMagick (gm)** | `.jpg/.jpeg`, `.png`, `.gif`, `.tif/.tiff`, `.bmp`, `.ico`, `.eps`, `.pdf` (via Ghostscript/poppler), `.dpx`, `.pnm`, `.svg` (if delegate), `.webp` (if built), `.exr` | `.jpg/.jpeg`, `.png`, `.webp` (if enabled), `.tif/.tiff`, `.gif`, `.bmp`, `.pdf` (from images), `.eps`, `.ico`, `.xpm`, `.dpx` | Similar to ImageMagick but with different performance/behavior; supported formats depend on build/delegates. |
| **ImageMagick (convert / magick)** | Same as GraphicsMagick (large set; many delegates) | Same as GraphicsMagick | Often used interchangeably; watch for security considerations when processing untrusted images. |
| **Inkscape** | `.svg/.svgz`, `.pdf`, `.eps`, `.ps`, `.ai` (legacy imports), `.dxf`, raster images (`.png`, `.jpg`, `.jpeg`, `.gif`, `.tiff`, `.bmp`) | `.svg`, `.pdf`, `.ps`, `.eps`, `.png`, `.emf`, `.wmf`, `.xaml`, `.dxf` | Vector editing and export; CLI useful for batch SVG → PNG/PDF conversions. |
| **libjxl (cjxl / djxl)** | Raster inputs: `.png`, `.jpg/.jpeg`, `.ppm/.pbm/.pgm`, `.gif`, etc. | `.jxl` (JPEG XL) | Encoder/decoder for JPEG XL; availability depends on build. |
| **resvg** | `.svg/.svgz` | `.png` (raster) | Fast, accurate SVG renderer; good for SVG→PNG conversion. |
| **Potrace** | Bitmaps: `.pbm`, `.pgm`, `.ppm` (PNM family), `.bmp` | Vector: `.svg`, `.pdf`, `.eps`, `.ps`, `.dxf`, `.geojson` | Traces bitmaps to vector paths; other raster formats need a pre-conversion step. |
| **Potrace GUI / autotrace alternatives** | n/a | n/a | Not included but sometimes available in toolchains; behavior varies. |
| **MarkItDown / markitdown** | `.pdf`, `.docx`, `.doc`, `.pptx`, `.ppt`, `.xlsx`, `.xls`, `.html`, `.eml`, `.msg`, `.md`, `.txt`, images, `.epub` | `.md` (Markdown) | Utility to extract/produce Markdown from various formats; implementation details vary. |
| **pngquant** | `.png` (truecolor/rgba) | `.png` (quantized palette PNG) | Lossy PNG quantization for smaller PNGs. |
| **MozJPEG (cjpeg, jpegtran)** | `.ppm/.pbm/.pgm` (PNM), `.bmp`, existing `.jpg` | `.jpg/.jpeg` (MozJPEG-optimized) | Produces smaller JPEGs with improved compression; good for recompression. |
| **SoX (Sound eXchange)** | `.wav`, `.aiff`, `.mp3` (if libmp3lame), `.flac`, `.ogg/.oga`, `.raw`, `.au`, `.voc`, `.w64`, `.gsm`, `.amr`, `.m4a` (if libs present) | `.wav`, `.aiff`, `.flac`, `.mp3`, `.ogg`, `.raw`, `.w64`, `.opus`, `.amr`, `.m4a` | Audio processing, normalization, effects; exact formats depend on linked libraries. |
| **pngcrush / zopflipng / optipng** | `.png` | `.png` (optimized) | Lossless PNG optimization tools; choose depending on use-case and compression/compatibility trade-offs. |
| **Tesseract OCR / ocrmypdf** | Image formats (`.png`, `.jpg`, `.jpeg`, `.tiff`), PDFs (image PDFs) | Plain text (`.txt`), searchable PDF (PDF with text layer), HOCR, ALTO XML | OCR engine; language/training data required for best accuracy. `ocrmypdf` is a wrapper for PDF workflows. |
| **faster-whisper / OpenAI Whisper (local)** | Audio: `.mp3`, `.wav`, `.m4a`, `.flac`, `.ogg`, `.opus`, `.aac` | Plain text transcripts (`.txt`), `.srt`, `.vtt`, other subtitle formats | Local Whisper implementations for speech-to-text. Models and speed depend on CPU/GPU and model variant. |
| **WhisperX / forced alignment tools** | Same as Whisper | Time-aligned transcripts, word-level timestamps | Useful for precise timestamping and alignment. |
| **Calibre tools (ebook-meta, ebook-convert)** | See Calibre row | See Calibre row | Additional CLI tools for metadata editing and bulk operations. |
| **Ghostscript-based PDF tools (pdftk alternatives)** | `.pdf` | `.pdf`, extracted pages, raster outputs | For splitting/merging, linearization, compatibility conversion. |
| **djvulibre / ddjvu / djvutool** | `.djvu` | `.djvu`, `.png` (raster), `.pdf` | For DjVu document handling and conversions. |
| **Raster→Vector helpers (autotrace, potrace, trace-layers)** | Raster formats (`.png`, `.bmp`, `.tiff`) | Vector (`.svg`, `.eps`, `.pdf`) | Useful pipeline components; exact choices depend on quality/needs. |
| **OCR & layout tools (abbyy/paid SDKs not included)** | n/a | n/a | Proprietary solutions may offer higher accuracy/format support but are not bundled. |
| **Custom CLI tools via `settings.yml`** | Any formats accepted by the configured tool | Any outputs produced by the configured tool | File Wizard can invoke arbitrary CLI tools; add entries to `settings.yml` to expose them in the UI. |

---
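As a rough guide to the tools in the table above, the sketch below prints (without running) a typical standalone command line for a few common conversions. This can help when deciding which CLI tool to register in `settings.yml`. The function name `suggest_command` and the format-to-tool choices are illustrative assumptions. They are not File Wizard's own mapping or its configuration syntax, which is documented in the Wiki.

```bash
#!/usr/bin/env bash
# Print a typical standalone command for converting one file to another.
# suggest_command is a hypothetical helper; it only prints, it never runs a tool.
# Printed paths are not shell-quoted, so treat the output as a hint.
suggest_command() {
  local src_file="$1" dst_file="$2"
  local src_ext="${src_file##*.}" dst_ext="${dst_file##*.}"
  case "$src_ext:$dst_ext" in
    docx:pdf|odt:pdf|pptx:pdf|xlsx:pdf)
      # LibreOffice names the output after the input file, so only a directory is passed.
      printf 'soffice --headless --convert-to pdf --outdir %s %s\n' "$(dirname "$dst_file")" "$src_file" ;;
    md:docx|md:html|rst:html|docx:md)
      printf 'pandoc %s -o %s\n' "$src_file" "$dst_file" ;;
    wav:mp3|flac:ogg|mkv:mp4|mp4:webm)
      printf 'ffmpeg -i %s %s\n' "$src_file" "$dst_file" ;;
    epub:mobi|epub:azw3|mobi:epub)
      printf 'ebook-convert %s %s\n' "$src_file" "$dst_file" ;;
    svg:png)
      printf 'resvg %s %s\n' "$src_file" "$dst_file" ;;
    png:txt|jpg:txt|tiff:txt)
      # Tesseract takes an output base name and appends .txt itself.
      printf 'tesseract %s %s\n' "$src_file" "${dst_file%.*}" ;;
    pdf:pdf)
      printf 'ocrmypdf %s %s\n' "$src_file" "$dst_file" ;;
    *)
      printf 'no suggestion for .%s -> .%s\n' "$src_ext" "$dst_ext"
      return 1 ;;
  esac
}

suggest_command report.docx report.pdf
suggest_command notes.md notes.docx
suggest_command talk.wav talk.mp3
```

The printed commands use each tool's defaults. Real conversions usually need extra options such as codecs, OCR languages, or PDF engines.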