alfablend

@alfablend@lemmy.world

Posts: 1 · Comments: 3 · Joined: 4 mo. ago

4mo ago

"What’s Your Preferred Self-Hosted Solution for Deep Monitoring (Beyond Simple Page Changes)?"
alfablend @lemmy.world 4mo ago
@xyro Ah, I see! I’m not using Ollama at the moment — my setup is based on GPT4All with a locally hosted DeepSeek model, which handles the semantic parsing directly.
As mentioned earlier, the pipeline doesn’t just diff pages — it detects new document URLs from the source feed (via selectors), downloads them, and generates structured summaries. Here's a snippet from the YAML config to illustrate how that works:
yaml:

    extract:
      events:
        selector: "results[*]"
        fields:
          url: pdf_url
          title: title
          order_number: executive_order_number
    download:
      extensions: [".pdf"]
    gpt:
      prompt: |
        Analyze this Executive Order document:
        - Purpose: 1–2 sentences
        - Key provisions: 3–5 bullet points
        - Agencies involved: list
        - Revokes/amends: if any
        - Policy impact: neutral analysis
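For anyone who wants to see what a config like this could translate to, here is a rough Python sketch of the field-mapping step. It is not the actual pipeline code: the `CONFIG` dict mirrors the snippet above (assuming `download` is a top-level section), `select_items` only understands selectors of the form `key[*]`, and the function names and sample feed are made up for illustration.

```python
from typing import Any

# Hypothetical in-memory copy of the `extract` and `download` config sections.
CONFIG = {
    "extract": {
        "events": {
            "selector": "results[*]",
            "fields": {
                "url": "pdf_url",
                "title": "title",
                "order_number": "executive_order_number",
            },
        }
    },
    "download": {"extensions": [".pdf"]},
}

# Made-up feed payload; a real source would return its own JSON structure.
SAMPLE_FEED = {
    "results": [
        {"pdf_url": "https://example.org/docs/eo-1.pdf",
         "title": "Sample order", "executive_order_number": 1},
        {"pdf_url": "https://example.org/docs/page.html",
         "title": "Not a PDF", "executive_order_number": 2},
    ]
}


def select_items(feed: dict[str, Any], selector: str) -> list[dict[str, Any]]:
    """Resolve selectors of the form 'key[*]' only (an assumed subset)."""
    if not selector.endswith("[*]"):
        raise ValueError(f"unsupported selector: {selector!r}")
    return list(feed.get(selector[:-3], []))


def extract_events(feed: dict[str, Any], config: dict[str, Any]) -> list[dict[str, Any]]:
    """Map feed items to the configured field names, keeping allowed extensions."""
    events_cfg = config["extract"]["events"]
    extensions = tuple(config["download"]["extensions"])
    events = []
    for item in select_items(feed, events_cfg["selector"]):
        # Rename source keys (e.g. pdf_url) to the configured names (e.g. url).
        event = {name: item.get(source) for name, source in events_cfg["fields"].items()}
        url = event.get("url") or ""
        if url.lower().endswith(extensions):
            events.append(event)
    return events
```

Calling `extract_events(SAMPLE_FEED, CONFIG)` keeps only the item whose URL ends in `.pdf`; each surviving event would then be downloaded and passed to the LLM with the `gpt.prompt` text.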
To keep things efficient, I also support regex-based extraction before passing content to the LLM. That way, I can isolate relevant blocks (e.g. addresses, client names, conclusions) and reduce the noise in the prompt. Example from another config:
yaml:

    processing:
      extract_regex:
        - "object of cultural heritage"
        - "address[:\\s]\\s*(.{10,100}?)(?=\\n|$)"
        - "project(?:s)?"
        - "circumstances"
        - "client\\s*:?\\s*(.{10,100}?)(?=\\n|$)"
        - "(?:conclusions?)\\s*(.{50,300}?)(?=\\n|$)"
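As a rough illustration of the pre-filtering idea, the sketch below applies the same patterns with Python's `re` module and collects the matched fragments, so that only those would go into the prompt. The patterns are the YAML strings after unescaping; the `extract_blocks` name, the case-insensitive matching, and the choice to prefer the captured group are my assumptions here, not a description of the real implementation.

```python
import re

# The patterns from the config above, written as raw strings
# (YAML's "\\s" becomes \s once the double-quoted string is parsed).
PATTERNS = [
    r"object of cultural heritage",
    r"address[:\s]\s*(.{10,100}?)(?=\n|$)",
    r"project(?:s)?",
    r"circumstances",
    r"client\s*:?\s*(.{10,100}?)(?=\n|$)",
    r"(?:conclusions?)\s*(.{50,300}?)(?=\n|$)",
]


def extract_blocks(text: str, patterns: list[str] = PATTERNS) -> list[str]:
    """Return unique matched fragments, ordered by pattern and then by position."""
    blocks: list[str] = []
    for pattern in patterns:
        # Case-insensitive matching is an assumption made for this sketch.
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            # Use the captured group when the pattern has one, else the whole match.
            value = match.group(1) if match.lastindex else match.group(0)
            value = value.strip()
            if value and value not in blocks:
                blocks.append(value)
    return blocks
```

Joining the returned fragments (for example with newlines) gives a much shorter input than the full document text, which is the point of running the regexes before the LLM step.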
Let me know if you're experimenting with similar flows — I’d be happy to share templates or compare how DeepSeek performs on your sources!

4mo ago

"What’s Your Preferred Self-Hosted Solution for Deep Monitoring (Beyond Simple Page Changes)?"

Hello! For changedetection.io, there are setup instructions for installing it with pip: https://github.com/dgtlmoon/changedetection.io/wiki/Microsoft-Windows. What is your use case?

4mo ago

"What’s Your Preferred Self-Hosted Solution for Deep Monitoring (Beyond Simple Page Changes)?"

alfablend @lemmy.world 4mo ago

@xyro Thanks for sharing your case! I’ve also tested changedetection.io — it’s a great tool for basic site monitoring.

But in my tests, it doesn’t go beyond the surface. If there’s a page with multiple document links, it’ll detect changes in the list (via diff), but it won’t automatically download and analyze the new documents themselves.

Here’s how I’ve approached this:

1. Crawl the page to extract links
2. Detect new document URLs
3. Download each document and extract keywords
4. Generate an AI summary using a local LLM
5. Add the result to a readable feed
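To make the flow above more concrete, here is a minimal Python sketch of one monitoring pass using only the standard library. It is an illustration under assumptions, not the actual tool: `fetch` and `summarize` are hypothetical callables standing in for the document download and the local LLM call, keyword extraction is left out, and new documents are detected with a plain in-memory set of seen URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute href targets from <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Step 1: crawl the page markup and return every link as an absolute URL."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def find_new_documents(links, seen, extensions=(".pdf",)):
    """Step 2: return unseen document URLs in page order and record them in `seen`."""
    new = []
    for link in links:
        if link.lower().endswith(extensions) and link not in seen:
            seen.add(link)
            new.append(link)
    return new


def run_once(html, base_url, seen, fetch, summarize):
    """Steps 3-5: fetch each new document, summarize it, and build feed entries."""
    entries = []
    for url in find_new_documents(extract_links(html, base_url), seen):
        text = fetch(url)  # hypothetical: download the document and return its text
        entries.append({"url": url, "summary": summarize(text)})  # hypothetical LLM call
    return entries
```

Because `seen` is updated on every pass, running `run_once` again on an unchanged page returns no entries; in a real setup the set would need to be persisted between runs (a file or a small database).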

P.S. If it helps, I can create a YAML template tailored to your grant-tracking case and run a quick test.