File Wizard

A self-hosted, browser-based utility for file conversion, OCR and audio transcription. It wraps common CLI and Python converters (FFmpeg, LibreOffice, Pandoc, ImageMagick, etc.), plus faster-whisper and Tesseract OCR.

Features

  • Convert between many file formats; extensible via settings.yml to add any CLI tool.
  • OCR for PDFs and images (Tesseract / OCRmyPDF).
  • Audio transcription using Whisper models (via faster-whisper).
  • Simple, responsive dark UI with drag-and-drop and a file picker.
  • Background job processing with real-time status updates and persistent history.
  • A /settings page for configuring conversion tools and OAuth (the app runs without authentication in local mode).
  • CPU-only by default; a CUDA-enabled -cuda image is available for GPU acceleration.
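Transcription produces timed segments, and the subtitle outputs (.srt, .vtt) listed in the Tools Table are a formatting step on top of them. The sketch below assumes segments shaped like faster-whisper's (objects with `start` and `end` in seconds plus `text`); the `Segment` dataclass and the `to_srt` helper are hypothetical illustrations, not File Wizard's own code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    """Hypothetical stand-in for a transcription segment (times in seconds)."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [Segment(0.0, 2.5, " Hello there."), Segment(2.5, 61.04, " Second line.")]
    print(to_srt(demo))
```

With faster-whisper, the `segments` iterator returned by `WhisperModel.transcribe(...)` could be passed in directly, since its segments expose the same `start`, `end` and `text` attributes.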

Security

Warning: exposing this app publicly without authentication risks arbitrary code execution, since File Wizard can be configured to invoke arbitrary CLI tools. It is intended for local use or for deployment behind a properly configured OAuth/OIDC provider.
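One way to enforce the "local use" guidance is to refuse to listen on a non-loopback address unless authentication is configured. The helpers below (`is_loopback_host`, `check_bind`) are hypothetical and not part of File Wizard; they only sketch the check.

```python
import ipaddress


def is_loopback_host(host: str) -> bool:
    """Return True if `host` is a loopback IP literal or the name 'localhost'."""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a public hostname): treat as non-local.
        return False


def check_bind(host: str, auth_enabled: bool) -> None:
    """Refuse a non-loopback bind address unless authentication is enabled."""
    if not auth_enabled and not is_loopback_host(host):
        raise RuntimeError(
            f"Refusing to listen on {host!r} without authentication; "
            "use 127.0.0.1 or enable OAuth/OIDC."
        )
```

Inside a Docker container the server typically has to listen on 0.0.0.0, so with Docker the equivalent restriction belongs on the published port instead, e.g. a ports entry of 127.0.0.1:8000:8000 in docker-compose.yml (assuming the app listens on port 8000, as in the Usage section).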

Tech stack

FastAPI backend, vanilla HTML/JS/CSS frontend (lightweight), Huey for task queuing, SQLite for storage.
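Huey workers run the jobs, while SQLite keeps the persistent history shown in the UI. As a minimal stdlib-only sketch of the storage side (the `jobs` table and the helper names are assumptions, not File Wizard's actual schema), a job moves through status updates like this:

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema; File Wizard's real tables may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    filename   TEXT NOT NULL,
    action     TEXT NOT NULL,            -- e.g. 'convert', 'ocr', 'transcribe'
    status     TEXT NOT NULL DEFAULT 'pending',
    updated_at TEXT NOT NULL
)
"""


def now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def create_job(conn: sqlite3.Connection, filename: str, action: str) -> int:
    """Insert a pending job and return its id."""
    cur = conn.execute(
        "INSERT INTO jobs (filename, action, updated_at) VALUES (?, ?, ?)",
        (filename, action, now()),
    )
    conn.commit()
    return cur.lastrowid


def set_status(conn: sqlite3.Connection, job_id: int, status: str) -> None:
    """Record a status change such as 'running', 'done' or 'failed'."""
    conn.execute(
        "UPDATE jobs SET status = ?, updated_at = ? WHERE id = ?",
        (status, now(), job_id),
    )
    conn.commit()


def history(conn: sqlite3.Connection) -> list[tuple]:
    """Return (id, filename, action, status) rows, newest first."""
    return conn.execute(
        "SELECT id, filename, action, status FROM jobs ORDER BY id DESC"
    ).fetchall()
```

In a Huey-based setup, a task function would call `set_status` as it starts and finishes; the queue itself is Huey's concern and is separate from a history table like this one.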

Installation

Available images:

  • loredcast/filewizard:0.3-latest
  • loredcast/filewizard:0.3-small (omits TeX and other large tools)
  • loredcast/filewizard:0.3-cuda (CUDA-enabled)

Copy docker-compose.yml from the repo, adjust as needed, then:

docker compose up -d

Build locally with Docker

git clone https://github.com/LoredCast/filewizard.git
cd filewizard
docker compose up --build

Note: building can be slow because the image includes TeX and other large dependencies.

Manual (no Docker)

git clone https://github.com/LoredCast/filewizard.git
cd filewizard
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install -r requirements.txt
chmod +x run.sh
./run.sh

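Python dependencies include fastapi, uvicorn, sqlalchemy, huey, faster-whisper, ocrmypdf, pytesseract, python-multipart and pyyaml, among others. pip does not install the external converters (LibreOffice, Pandoc, FFmpeg, Tesseract, etc.); install those separately for the tools you want to use.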
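Because a manual install still needs the converter binaries on PATH, a quick check can save a failed first job. A stdlib sketch (the list of executable names is an assumption; trim it to the tools you plan to use):

```python
import shutil

# Executable names as they are commonly installed; distributions may differ
# (e.g. ImageMagick 7 ships `magick`, older versions `convert`).
REQUIRED_TOOLS = ["soffice", "pandoc", "ffmpeg", "tesseract", "ocrmypdf", "gs"]


def missing_tools(names: list[str]) -> list[str]:
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools found.")
```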

Configuration & docs

See the project Wiki for details and examples:
https://github.com/LoredCast/filewizard/wiki

Usage

  1. Open http://127.0.0.1:8000.
  2. Drag & drop or choose files.
  3. Select an action: Convert, OCR, or Transcribe.
  4. Track job progress in the History table (updates automatically).
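After docker compose up -d, the container may need a moment before step 1 works. A small readiness check, sketched with the standard library only (it polls the root URL from step 1 and assumes no File Wizard API endpoint):

```python
import time
import urllib.error
import urllib.request


def wait_until_up(url: str = "http://127.0.0.1:8000", timeout: float = 60.0) -> bool:
    """Poll `url` until it returns any HTTP response or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2):
                return True
        except urllib.error.HTTPError:
            # Any HTTP status (even 401 or 404) means the server is listening.
            return True
        except (urllib.error.URLError, OSError):
            time.sleep(0.5)  # not accepting connections yet; retry
    return False
```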

Tools Table

| Tool | Common inputs (extensions / format names) | Common outputs (extensions / format names) | Notes |
| --- | --- | --- | --- |
| LibreOffice (soffice) | .odt, .fodt, .ott, .doc, .docx, .docm, .dot, .dotx, .rtf, .txt, .html/.htm/.xhtml, .xml, .sxw, .wps, .wpd, .abw, .pdb, .epub, .fb2, .lit, .lrf, .pages, .csv, .tsv, .xls, .xlsx, .xlsm, .ods, .sxc, .123, .dbf, .ppt, .pptx, .odp, images (.png, .jpg, .jpeg, .bmp, .gif, .tiff), .pdf | .pdf, .pdfa, .odt, .fodt, .doc, .docx, .rtf, .txt, .html/.htm, .xhtml, .epub, .svg, .png, .jpg/.jpeg, .pptx, .ppt, .odp, .xls, .xlsx, .ods, .csv, .dbf, .pdb, .fb2 | Good for office/document conversions; fidelity varies with complex layouts. |
| Pandoc | Markdown flavors (.md, .markdown), .html/.htm, LaTeX (.tex), .rst, .docx, .odt, .epub, .ipynb, .opml, .adoc/asciidoc, .bib/citation inputs | .html/.html5, .xhtml, .latex/.tex, .pdf (via LaTeX engine), .docx, .odt, .epub, .md (flavors), .gfm, .rst, .pptx, .man, .mediawiki, .docbook | Highly configurable via templates/filters; requires LaTeX for PDF output. |
| Ghostscript (gs) | .ps, .eps, .pdf, PostScript streams | .pdf (various compatibility levels, incl. PDF/A), .ps, .eps, raster images (.png, .jpg, .tiff, .bmp, .pnm) | Useful for PDF manipulation, rasterization, and producing PDF/A. |
| Calibre (ebook-convert) | .epub, .mobi, .azw3, .azw, .fb2, .html, .docx, .doc, .rtf, .txt, .pdb, .lit, .tcr, .cbz, .cbr, .odt, .pdf (input with caveats) | .epub, .mobi (legacy), .azw3, .pdf, .docx, .rtf, .txt, .fb2, .htmlz, .pdb, .lrf, .lit, .tcr, .cbz, .cbr | Excellent for ebook format conversions and metadata handling; PDF input/output fidelity varies. |
| FFmpeg | Containers & codecs: .mp4, .mkv, .mov, .avi, .webm, .flv, .wmv, .mpg/.mpeg, .ts, .m2ts, .3gp; audio: .mp3, .wav, .aac/.m4a, .flac, .ogg, .opus; image sequences (.png, .jpg, .tiff); HLS (.m3u8) | Wide set: .mp4, .mkv, .mov, .webm, .avi, .flv, .mp3, .aac/.m4a, .wav, .flac, .ogg, .opus, .gif (animated), .ts, elementary streams, many codec/container combos | Extremely versatile: audio/video transcoding, extraction, container changes, filters. Supported formats depend on build flags and linked libraries. |
| libvips (vips) | .jpg/.jpeg, .png, .tif/.tiff, .webp, .avif, .heif/.heic, .jp2, .gif (frames), .pnm, .fits, .exr, PDF (via poppler delegate) | .jpg/.jpeg, .png, .tif/.tiff, .webp, .avif, .heif, .jp2, .pnm, .v (VIPS native), .fits, .exr | Fast, memory-efficient image processing; great for large images and tiling. |
| GraphicsMagick (gm) | .jpg/.jpeg, .png, .gif, .tif/.tiff, .bmp, .ico, .eps, .pdf (via Ghostscript/poppler), .dpx, .pnm, .svg (if delegate), .webp (if built), .exr | .jpg/.jpeg, .png, .webp (if enabled), .tif/.tiff, .gif, .bmp, .pdf (from images), .eps, .ico, .xpm, .dpx | Similar to ImageMagick but with different performance/behavior; supported formats depend on build/delegates. |
| ImageMagick (convert / magick) | Same as GraphicsMagick (large set; many delegates) | Same as GraphicsMagick | Often used interchangeably; watch for security considerations when processing untrusted images. |
| Inkscape | .svg/.svgz, .pdf, .eps, .ps, .ai (legacy imports), .dxf, raster images (.png, .jpg, .jpeg, .gif, .tiff, .bmp) | .svg, .pdf, .ps, .eps, .png, .emf, .wmf, .xaml, .dxf | Vector editing and export; CLI useful for batch SVG → PNG/PDF conversions. |
| libjxl (cjxl / djxl) | Raster inputs: .png, .jpg/.jpeg, .ppm/.pbm/.pgm, .gif, etc. | .jxl (JPEG XL) | Encoder/decoder for JPEG XL; availability depends on build. |
| resvg | .svg/.svgz | .png (raster) | Fast, accurate SVG renderer; good for SVG → PNG conversion. |
| Potrace | Bitmaps: .pbm, .pgm, .ppm (PNM family), .bmp | Vector: .svg, .pdf, .eps, .ps, .dxf, .geojson | Traces bitmaps to vector paths; often used with pre-conversion steps. |
| Potrace GUI / autotrace alternatives | | | Not included but sometimes available in toolchains; behavior varies. |
| MarkItDown (markitdown) | .pdf, .docx, .doc, .pptx, .ppt, .xlsx, .xls, .html, .eml, .msg, .md, .txt, images, .epub | .md (Markdown) | Utility to extract/produce Markdown from various formats; implementation details vary. |
| pngquant | .png (truecolor/RGBA) | .png (quantized palette PNG) | Lossy PNG quantization for smaller PNGs. |
| MozJPEG (cjpeg, jpegtran) | .ppm/.pbm/.pgm (PNM), .bmp, existing .jpg | .jpg/.jpeg (MozJPEG-optimized) | Produces smaller JPEGs with improved compression; good for recompression. |
| SoX (Sound eXchange) | .wav, .aiff, .mp3 (if libmp3lame), .flac, .ogg/.oga, .raw, .au, .voc, .w64, .gsm, .amr, .m4a (if libs present) | .wav, .aiff, .flac, .mp3, .ogg, .raw, .w64, .opus, .amr, .m4a | Audio processing, normalization, effects; exact formats depend on linked libraries. |
| pngcrush / zopflipng / optipng | .png | .png (optimized) | Lossless PNG optimization tools; choose based on use case and compression/compatibility trade-offs. |
| Tesseract OCR / ocrmypdf | Image formats (.png, .jpg, .jpeg, .tiff), PDFs (image PDFs) | Plain text (.txt), searchable PDF (PDF with text layer), hOCR, ALTO XML | OCR engine; language/training data required for best accuracy. ocrmypdf is a wrapper for PDF workflows. |
| faster-whisper / OpenAI Whisper (local) | Audio: .mp3, .wav, .m4a, .flac, .ogg, .opus, .aac | Plain text transcripts (.txt), .srt, .vtt, other subtitle formats | Local Whisper implementations for speech-to-text. Models and speed depend on CPU/GPU and model variant. |
| WhisperX / forced alignment tools | Same as Whisper | Time-aligned transcripts, word-level timestamps | Useful for precise timestamping and alignment. |
| Calibre tools (ebook-meta, ebook-convert) | See Calibre row | See Calibre row | Additional CLI tools for metadata editing and bulk operations. |
| Ghostscript-based PDF tools (pdftk alternatives) | .pdf | .pdf, extracted pages, raster outputs | For splitting/merging, linearization, compatibility conversion. |
| djvulibre / ddjvu / djvutool | .djvu | .djvu, .png (raster), .pdf | For DjVu document handling and conversions. |
| Raster→Vector helpers (autotrace, potrace, trace-layers) | Raster formats (.png, .bmp, .tiff) | Vector (.svg, .eps, .pdf) | Useful pipeline components; exact choices depend on quality needs. |
| OCR & layout tools (ABBYY/paid SDKs not included) | | | Proprietary solutions may offer higher accuracy/format support but are not bundled. |
| Custom CLI tools via settings.yml | Any formats accepted by the configured tool | Any outputs produced by the configured tool | File Wizard can invoke arbitrary CLI tools; add entries to settings.yml to expose them in the UI. |
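Since File Wizard drives these converters as command-line programs, a custom tool entry amounts to a command template filled in with file paths. The sketch below assumes a `{input}`/`{output}` placeholder syntax and a hypothetical `TOOLS` mapping that only approximate what settings.yml does (see the Wiki for the actual schema); the flags used are standard options of each CLI.

```python
import shlex

# Hypothetical tool entries in the spirit of settings.yml; the real file's
# keys and placeholder syntax may differ.
TOOLS = {
    "libreoffice_pdf": "soffice --headless --convert-to pdf --outdir {outdir} {input}",
    "pandoc": "pandoc {input} -o {output}",
    "ffmpeg": "ffmpeg -y -i {input} {output}",
    "ocrmypdf": "ocrmypdf -l eng {input} {output}",
}


def build_argv(tool: str, **paths: str) -> list[str]:
    """Expand a command template into an argv list.

    The template is split into tokens *before* substitution, so a filename
    containing spaces or shell metacharacters stays a single argument.
    Run the result with subprocess.run(argv, check=True), without a shell.
    """
    tokens = shlex.split(TOOLS[tool])
    return [token.format(**paths) for token in tokens]
```

Building an argument list instead of a shell string is one way to limit injection through uploaded filenames; it does not help if an untrusted user can edit the templates themselves, which is why the Security warning above still applies.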
