Alex Soto & Markus Eisele

RAG like a hero with Docling

Your RAG pipeline has security holes you haven't considered. Learn to defend against data poisoning and a new class of vector store attacks.

#1 about 3 minutes

Using RAG to enrich LLMs with proprietary data

Retrieval-augmented generation (RAG) is the key to making large language models useful for enterprises by providing them with up-to-date, proprietary information.

#2 about 4 minutes

The challenge of parsing complex document structures

Simple document parsers can misinterpret layouts like multi-column text, leading to corrupted data and incorrect outputs from the language model.
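The failure mode can be illustrated with a minimal sketch. The two-column page below is invented sample data, and both reader functions are hypothetical helpers rather than part of any real parser; the point is only that reading straight across the page interleaves the columns.

```python
# A two-column page as a naive extractor sees it: each physical line holds
# a fragment of the left column and a fragment of the right column.
# (Invented sample data, for illustration only.)
PAGE_LINES = [
    ("Revenue grew in", "Costs rose due to"),
    ("the third quarter.", "higher energy prices."),
]


def naive_read(lines):
    """Read straight across the page, ignoring column boundaries."""
    return " ".join(f"{left} {right}" for left, right in lines)


def column_aware_read(lines):
    """Read the whole left column first, then the whole right column."""
    left = " ".join(l for l, _ in lines)
    right = " ".join(r for _, r in lines)
    return f"{left} {right}"


print(naive_read(PAGE_LINES))
print(column_aware_read(PAGE_LINES))
```

The naive reader produces sentences that never existed in the document, and that corrupted text is what the language model would receive as context.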

#3 about 3 minutes

Using Docling to convert documents into structured formats

Docling is an open-source tool that acts like an advanced OCR service, converting various binary document formats into a structured, parsable tree.
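To show why a parsable tree helps, here is a small sketch that walks a document tree and emits each leaf together with the headings above it. The `Node` class and `chunks_with_context` function are hypothetical and do not reflect Docling's actual data model or API; they only assume that the converted document is some nested structure of sections and paragraphs.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; NOT Docling's real document model."""
    kind: str                      # e.g. "section" or "paragraph"
    text: str = ""                 # heading title or paragraph body
    children: list["Node"] = field(default_factory=list)


def chunks_with_context(node, path=()):
    """Yield (heading path, text) for every leaf node, depth first."""
    if node.children:
        here = path + (node.text,) if node.text else path
        for child in node.children:
            yield from chunks_with_context(child, here)
    else:
        yield " > ".join(path), node.text


doc = Node("section", "Annual Report", [
    Node("section", "Finance", [Node("paragraph", "Revenue grew 4%.")]),
    Node("paragraph", "Prepared by the board."),
])

for heading_path, text in chunks_with_context(doc):
    print(f"[{heading_path}] {text}")
```

Keeping the heading path next to each chunk preserves context that a flat text dump would lose.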

#4 about 7 minutes

Demo of a basic RAG ingestion pipeline

A live demonstration shows how a Quarkus application uses Docling to ingest a PDF, generate embeddings, and store the resulting chunks and vectors in Redis.
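The shape of such a pipeline (chunk, embed, store, then search) can be sketched with the standard library alone. This is not the demo's code: the hashing-based `embed` function is a toy stand-in for a real embedding model, a plain dictionary stands in for Redis, and every function name here is an assumption made for illustration.

```python
import hashlib
import math

DIM = 64  # toy embedding size, chosen arbitrarily


def embed(text):
    """Toy embedding: hash each word into a bucket, then L2-normalize."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def chunk(text, size=8):
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def ingest(text, store):
    """Store every chunk next to its vector (dict stands in for Redis)."""
    for i, piece in enumerate(chunk(text)):
        store[f"chunk:{i}"] = {"text": piece, "vector": embed(piece)}


def search(query, store):
    """Return the stored chunk text with the highest dot-product score."""
    q = embed(query)
    best = max(
        store.values(),
        key=lambda entry: sum(a * b for a, b in zip(q, entry["vector"])),
    )
    return best["text"]


store = {}
ingest(
    "Docling converts PDF files into a structured tree. "
    "Redis stores each chunk together with its vector.",
    store,
)
print(search("Redis stores each chunk together with its vector.", store))
```

Because the vectors are normalized, the dot product used in `search` equals cosine similarity.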

#5 about 3 minutes

Securing RAG against data poisoning and leaks

To prevent data poisoning and sensitive data leaks, it is crucial to sanitize documents, verify their signatures, and use tools for PII masking.
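Two of these defenses can be sketched briefly: rejecting documents whose signature does not verify, and masking PII before the text is embedded. The sketch makes simplifying assumptions: a shared-key HMAC stands in for a real digital signature scheme, a single email regex stands in for a dedicated PII-masking tool, and all function names are hypothetical.

```python
import hashlib
import hmac
import re

# Deliberately simple email pattern; real PII detection covers far more.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def sign(document: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag (stand-in for a digital signature)."""
    return hmac.new(key, document, hashlib.sha256).hexdigest()


def verify(document: bytes, key: bytes, signature: str) -> bool:
    """Constant-time comparison of the expected and supplied tags."""
    return hmac.compare_digest(sign(document, key), signature)


def mask_pii(text: str) -> str:
    """Replace every email address with a placeholder token."""
    return EMAIL.sub("[EMAIL]", text)


def safe_ingest(document: bytes, key: bytes, signature: str) -> str:
    """Return masked text, or refuse documents that fail verification."""
    if not verify(document, key, signature):
        raise ValueError("signature mismatch: refusing to ingest")
    return mask_pii(document.decode("utf-8"))


key = b"demo-key-not-for-production"
doc = b"Contact jane.doe@example.com for details."
print(safe_ingest(doc, key, sign(doc, key)))
```

Verification runs before any parsing or embedding, so a tampered document never reaches the vector store.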

#6 about 4 minutes

Mitigating vector store attacks and encryption challenges

Vector stores are vulnerable to attacks such as close vector modification and reversal. Standard encryption also destroys the distances between vectors that similarity search relies on, so encrypting a vector store requires specialized solutions.
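The distance problem can be made concrete with a toy sketch. A conventional cipher yields output with no geometric relation to its input, so nearest-neighbor search over ciphertexts is meaningless. One idea behind specialized schemes is to apply a secret transformation that preserves distances. The `keyed_transform` function below is a hypothetical illustration that uses a secret signed permutation; it is not secure on its own and is not the scheme shown in the talk.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))


def keyed_transform(vec, key):
    """Apply a secret signed permutation derived from `key`.

    A signed permutation is an orthogonal map, so dot products and
    norms (and therefore cosine similarity) are unchanged.
    Toy illustration only; NOT a secure encryption scheme.
    """
    rng = random.Random(key)
    order = list(range(len(vec)))
    rng.shuffle(order)
    signs = [rng.choice((-1.0, 1.0)) for _ in vec]
    return [signs[i] * vec[j] for i, j in enumerate(order)]


a = [0.1, 0.7, 0.2, 0.4]
b = [0.3, 0.5, 0.1, 0.6]
secret = 1234  # hypothetical key

plain = cosine(a, b)
hidden = cosine(keyed_transform(a, secret), keyed_transform(b, secret))
print(abs(plain - hidden) < 1e-9)
```

Both vectors must be transformed with the same key; a query transformed with a different key would no longer line up with the stored vectors.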

#7 about 5 minutes

Demo of a secure ingestion pipeline in action

A final demonstration showcases a secure pipeline that verifies document signatures, anonymizes sensitive data, and encrypts vectors before storing them.
