I built a semantic arXiv search engine with AI-generated TL;DRs, claim classification, and paper comparison

Reddit r/artificial 06/08/26, 01:29 PM Tools

semantic-search arxiv ai-summaries open-source nextjs cloudflare paper-comparison claim-classification

Summary

A semantic search engine for arXiv papers featuring AI-generated TL;DRs, claim classification, paper comparison, and more. Built with Next.js, Cloudflare, and open-source models.

Original Article

Cached at: 06/08/26, 03:21 PM

Teycir/ArxivExplorer

Source: https://github.com/Teycir/ArxivExplorer

Support Development

If this project helps your work, support ongoing maintenance and new features.

ETH Donation Wallet
0x11282eE5726B3370c8B480e321b3B2aA13686582

Scan the QR code or copy the wallet address above.

Fast semantic arXiv paper search with AI-powered summaries — no login required.

“Research papers, decoded.”

Video Demo

Screenshots: Landing Page, Advanced Search Filters, Search papers similar to abstracts, Claim Assessment, Author Pages, Paper Comparison, Explore & Discover

Features

Core Search & Discovery

Hybrid Search — Combines FTS5 keyword search and Vectorize semantic search for accurate results
Advanced Filtering — Filter by author (substring match), citation count, category, and date range
Smart Caching — KV-based caching with 2h TTL for search results, 24h for embeddings
Related Papers — Pre-computed top-8 semantically similar papers via Vectorize
Topic Collections — Curated topics with category mappings (stored in topics table)
Author Pages — Author statistics, timeline visualization, and all papers
Full-Text Search — SQLite FTS5 virtual table with automatic triggers

AI-Powered Features

Pre-Generated Summaries — TL;DR, key contributions, methods, limitations, beginner/technical explanations
Entity Extraction — Keywords, entities (models/datasets/benchmarks), paper type classification
Claim Classification — AI-powered support/contradiction analysis for scientific claims
Smart Abstracts — Enhanced paper metadata with prerequisites and follow-up questions

Paper Management

Bookmarks — Client-side collections with 90-day TTL (100 bookmark soft cap)
Export Options — JSON and BibTeX export for collections
Paper Comparison — Side-by-side comparison view (up to 6 papers)
Revision History — Track paper updates and version differences
Share & Copy — Quick copy for arXiv ID and BibTeX entries

Enrichment & Metadata

Citation Tracking — Semantic Scholar integration with citation count + influential citations
Citation Snapshots — Historical citation data stored in citation_snapshots table
CrossRef Integration — Journal metadata, publisher, license, funders
OpenAlex Data — Concepts, affiliations, institutional data (ROR IDs)
Papers With Code — Code repositories, benchmarks, SOTA rankings (schema ready)

User Engagement

Achievements System — Gamified badges stored client-side with activity tracking
Recent Searches — Search history with suggestions
Personalized Feed — Recommendations based on bookmark history
RSS Feed — /rss.xml with 20 recent papers (1h cache)

Developer Tools

CLI Interface — arxiv-cli for AI assistants (search, trending, topics, authors)
Admin API — Vectorize bulk operations, maintenance endpoints, enrichment triggers

SEO & Discoverability

Dynamic Meta Tags — Open Graph and Twitter Card tags on all paper pages
Sitemap.xml — Auto-generated sitemap with all papers, topics, and authors
Robots.txt — Search engine crawler configuration
Structured Data — JSON-LD schema markup for papers and authors
SSR Content — Server-side rendered pages with full content for crawlers
Canonical URLs — Proper canonical tags to prevent duplicate content
AI Agent Discovery — /ai.txt and /llms.txt routes for LLM tool integration

Performance

Edge Caching — Cloudflare KV with intelligent TTL strategies
ISR Rendering — Next.js ISR with 10-minute revalidation
Zero Login — Instant access to all features
Global CDN — Cloudflare Workers edge deployment

Security

Rate Limiting — Per-IP token bucket on all public endpoints (60-100 req/min) with lockout
SQL Injection Protection — 100% parameterized queries via D1 .prepare().bind()
Input Sanitization — Strict validation on all user inputs (control chars, length limits, allowlists)
Timing-Safe Auth — Admin endpoints use crypto.timingSafeEqual (no timing oracles)
Strict CORS — Explicit origin only (wildcard rejected at startup)
AI Quota Protection — Hard character limits + rate limiting on /api/classify-claim
Error Sanitization — Generic 500 messages (internal details logged server-side only)
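
As an illustration of the per-IP token bucket mentioned in the list above, here is a minimal sketch. It is not the project's code: the class name, the in-memory Map, and the injected clock are assumptions made so the example runs on its own, and the lockout behaviour is left out.

```typescript
// Hypothetical in-memory limiter. A real Worker would need shared state
// (for example KV or a Durable Object), which this sketch does not cover.
interface Bucket {
  tokens: number;
  lastRefillMs: number;
}

class TokenBucketLimiter {
  private buckets = new Map<string, Bucket>();
  private capacity: number;    // burst size, e.g. 60 requests
  private refillPerMs: number; // e.g. 60 / 60_000 for 60 req/min

  constructor(capacity: number, refillPerMs: number) {
    this.capacity = capacity;
    this.refillPerMs = refillPerMs;
  }

  /** Returns true if a request from `ip` is allowed at time `nowMs`. */
  allow(ip: string, nowMs: number): boolean {
    const bucket = this.buckets.get(ip) ?? { tokens: this.capacity, lastRefillMs: nowMs };
    // Refill in proportion to elapsed time, never above capacity.
    const elapsed = Math.max(0, nowMs - bucket.lastRefillMs);
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsed * this.refillPerMs);
    bucket.lastRefillMs = nowMs;
    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= 1;
    this.buckets.set(ip, bucket);
    return allowed;
  }
}
```

Passing the clock in as `nowMs` keeps the limiter deterministic in tests; a caller would typically pass `Date.now()`.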

See SECURITY.md for full details.

Architecture

Built on Cloudflare’s edge platform for global performance:

Frontend: Next.js deployed as a Cloudflare Worker (via OpenNext in main + assets mode)
API: Cloudflare Workers
Database: Cloudflare D1 (SQLite)
Vector Search: Cloudflare Vectorize
Cache: Cloudflare KV
AI: Workers AI (Llama 3.1 + BGE embeddings) for live inference; local Ollama for bulk ingestion

Deployment note: The frontend is deployed as a Worker (not Cloudflare Pages) to avoid the per-request nonce injection that Pages unconditionally adds to script-src, which breaks the app’s CSP.

System Design

Browser → Next.js Worker → API Worker → KV Cache → D1 Database
                                            ↓
                                       Vectorize
                                            ↑
                                  Ingest Worker (Cron)
                                            ↑
                              Workers AI  /  local Ollama

Data Pipeline

Papers flow through a multi-stage pipeline:

1. Fetch Stage

Ingest worker polls the arXiv API on its cron trigger (* * * * *, every minute; see Cron Schedule) and writes new papers to D1 with summary_ready = 0.

2. Summarize Stage

Either the ingest worker (Workers AI, rate-limited) or the local bulk script (Ollama, unlimited) generates:

Structured summaries (tldr, contributions, methods, limitations, explanations)
Paper embeddings for semantic search
Sets summary_ready = 1 when complete

3. Enrichment Stage (optional)

Citations: Semantic Scholar API updates citation counts via cron
CrossRef: DOI-based metadata enrichment (triggered via POST /admin/crossref-batch; see Cron Schedule)
OpenAlex: Concepts, affiliations, open access metadata
Papers With Code: Code repositories, benchmarks, SOTA rankings

4. Related Papers

Pre-computes top-8 semantically similar papers using Vectorize and stores in related_papers table.
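
To make the "top-8 semantically similar" step concrete, the sketch below ranks papers by cosine similarity in plain TypeScript. In the project this ranking is delegated to Vectorize; the brute-force loop, the `Embedding` type, and the function names are assumptions used only for illustration.

```typescript
// Hypothetical shape: one embedding vector per paper.
type Embedding = { id: string; vector: number[] };

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
}

function cosineSimilarity(a: number[], b: number[]): number {
  const denom = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
  return denom === 0 ? 0 : dot(a, b) / denom;
}

/** Returns the k most similar papers to `target`, excluding the paper itself. */
function topKRelated(
  target: Embedding,
  corpus: Embedding[],
  k = 8,
): { id: string; score: number }[] {
  return corpus
    .filter((e) => e.id !== target.id)
    .map((e) => ({ id: e.id, score: cosineSimilarity(target.vector, e.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The result rows map naturally onto a (paper, related paper, score) layout, though the actual columns of related_papers are not shown in this README.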

Cron Schedule

The ingest worker runs on a single cron trigger:

* * * * * — Every minute (processes 1 paper per run with 1 retry on failure; citation updates via Semantic Scholar run in the same cron)

CrossRef enrichment is triggered via the admin endpoint (POST /admin/crossref-batch) rather than a separate cron.

Bulk Local Processing

When remote Workers AI hits rate limits, use the local Ollama pipeline to catch up:

# Process all pending/failed papers from remote D1 using local Ollama
ADMIN_SECRET=<secret> npx tsx scripts/process-pending-local.ts

# Push a fully-processed local DB up to remote D1 + Vectorize
ADMIN_SECRET=<secret> npx tsx scripts/push-local-to-remote.ts

# Bulk ingest (fetch + summarize + embed in one pass)
npx tsx scripts/bulk-ingest.ts --days 7 --categories cs.LG,cs.CL

Both scripts use the D1 REST API directly (no wrangler subprocess per paper), which is ~100× faster than the naive approach and avoids shell-escaping issues with special characters in paper text.

Ollama models used locally:

Role	Model
Summarisation	`gemma4:e4b` (8 B, Q4_K_M)
Embeddings	`nomic-embed-text` (137 M, F16)

Quick Start

Prerequisites

Node.js 18+
Cloudflare account (free tier works)
Wrangler CLI: npm install -g wrangler

Installation

git clone https://github.com/yourusername/arxiv-explorer.git
cd arxiv-explorer
npm install
wrangler login

# Create infrastructure
wrangler d1 create arxiv-explorer
wrangler kv namespace create CACHE   # older Wrangler releases: wrangler kv:namespace create CACHE
wrangler vectorize create arxiv-papers --dimensions=768 --metric=cosine

# Update wrangler config files with your IDs
# Edit: wrangler.api.toml, wrangler.ingest.toml, wrangler.jsonc

# Apply database schema (canonical version)
wrangler d1 execute arxiv-explorer --remote --file=migrations/schema.sql

# Copy and fill env files
cp .env.local.example .env.local
cp scripts/config.local.example.ts scripts/config.local.ts
# Edit scripts/config.local.ts with your Cloudflare credentials

Development

npm run dev                                  # Next.js dev server
wrangler dev --config wrangler.api.toml      # API worker
wrangler dev --config wrangler.ingest.toml   # Ingest worker

Visit http://localhost:3000

Deployment

# Full deployment (Next.js + API worker)
./deploy.sh

# Or individually:
npm run deploy          # Next.js frontend (Worker mode via OpenNext)
npm run deploy:api      # API worker
npm run deploy:ingest   # Ingest worker

# Note: deploy.sh does NOT deploy ingest worker
# Deploy ingest worker manually when needed

Project Structure

├── app/                        # Next.js 16 app directory
│   ├── page.tsx               # Home page
│   ├── search/                # Search results
│   ├── paper/[id]/            # Paper detail pages
│   ├── topic/[slug]/          # Topic pages
│   ├── author/[name]/         # Author pages
│   ├── compare/               # Paper comparison
│   ├── diff/[id]/             # Paper revision history
│   ├── bookmarks/             # Bookmark management
│   ├── explore/               # Explore page
│   ├── achievements/          # Achievement tracking
│   ├── claim/                 # Claim classification
│   ├── faq/                   # FAQ page
│   ├── how-to-use/            # User guide
│   ├── rss.xml/               # RSS feed route
│   │   └── route.ts
│   ├── ai.txt/                # LLM discovery route
│   │   └── route.ts
│   ├── llms.txt/              # LLM discovery route
│   │   └── route.ts
│   └── components/            # React components
│       ├── SummarySection.tsx
│       ├── PaperCard.tsx
│       ├── SearchFilters.tsx
│       ├── BookmarkButton.tsx
│       ├── CollectionManager.tsx
│       ├── SearchBoxHome.tsx
│       ├── Navbar.tsx
│       ├── Footer.tsx
│       └── ... (40+ components)
├── src/
│   ├── api-worker/            # Cloudflare Workers API
│   │   ├── index.ts           # Router
│   │   └── routes/
│   │       ├── search.ts      # Hybrid search (FTS5 + semantic)
│   │       ├── paper.ts       # Paper details
│   │       ├── related.ts     # Related papers
│   │       ├── trending.ts    # Trending papers
│   │       ├── topic.ts       # Topic endpoints
│   │       ├── topics.ts      # List topics
│   │       ├── author.ts      # Author endpoints
│   │       ├── authors.ts     # List authors
│   │       ├── claim.ts       # Claim classification
│   │       ├── admin.ts       # Admin endpoints (Vectorize, maintenance)
│   │       ├── stats.ts       # Database statistics
│   │       └── sitemap.ts     # Sitemap generation
│   ├── ingest-worker/         # Background processing (cron)
│   │   ├── index.ts           # Cron entrypoint
│   │   ├── pipeline.ts        # Main ingestion pipeline
│   │   ├── fetch-arxiv.ts     # arXiv API fetcher
│   │   ├── generate-summary.ts
│   │   ├── generate-embedding.ts
│   │   ├── generate-entities.ts
│   │   ├── update-citations.ts # Semantic Scholar sync
│   │   ├── fetch-crossref.ts  # CrossRef enrichment
│   │   ├── fetch-openalex.ts  # OpenAlex enrichment
│   │   ├── fetch-pwc.ts       # Papers With Code enrichment
│   │   ├── compute-related.ts # Related papers computation
│   │   └── tfidf.ts           # TF-IDF utilities
│   └── shared/                # Shared types & utils
│       ├── types.ts           # TypeScript interfaces
│       ├── db.ts              # Database helpers
│       └── utils.ts           # Utilities
├── scripts/
│   ├── push-local-to-remote.ts   # Sync local → remote D1 + Vectorize
│   ├── retry-failed-local.ts     # Reprocess pending papers via Ollama
│   ├── bulk-ingest.ts            # Full bulk ingest pipeline
│   ├── sync-remote-to-local.ts   # Sync remote → local
│   ├── backfill-*.ts             # Various backfill scripts
│   ├── upload-embeddings.ts      # Standalone Vectorize uploader
│   ├── test-*.sh                 # Test scripts
│   ├── config.local.example.ts   # Local config template
│   └── ... (25+ utility scripts)
├── migrations/
│   ├── schema.sql             # Canonical D1 schema (single source of truth)
│   ├── 0001_schema.sql        # Initial migration (legacy)
│   └── 000*.sql               # Other migrations
├── helper/                    # API client helpers
├── lib/                       # Frontend libraries
├── wrangler.api.toml          # API worker config
├── wrangler.ingest.toml       # Ingest worker config
├── wrangler.jsonc             # Next.js worker config (frontend)
├── next.config.ts             # Next.js configuration
├── open-next.config.ts        # OpenNext Cloudflare adapter config
└── deploy.sh                  # Deployment script

API Reference

GET  /api/search?q=attention+mechanisms              # Hybrid FTS5 + semantic search
GET  /api/search?q=...&author=Hinton                  # Filter by author (substring match)
GET  /api/search?q=...&minCitations=10                # Filter by minimum citations
GET  /api/search?q=...&category=cs.LG                 # Filter by arXiv category
GET  /api/search?q=...&date=week                      # Filter by date (day/week/month)
GET  /api/search?q=...&author=X&minCitations=Y&...    # Combine multiple filters
GET  /api/paper/:id                                   # Paper detail + summary
GET  /api/paper/:id/related                           # Semantically similar papers
GET  /api/trending                                    # Trending papers (KV cached)
GET  /api/topic/:slug                                 # Topic paper collection
GET  /api/topics                                      # List all topics
GET  /api/author/:name                                # Author papers and statistics
GET  /api/authors                                     # List authors
GET  /api/stats                                       # Database statistics
GET  /api/sitemap                                     # Sitemap for SEO
GET  /rss.xml                                         # RSS feed (20 recent papers, 1h cache)
GET  /compare?ids=id1,id2,id3                         # Compare up to 6 papers side-by-side

POST /api/classify-claim                             # AI-powered claim classification

# Admin endpoints (x-admin-secret required)
POST /admin/vectorize/upsert                          # Bulk embed upsert
POST /admin/retry-failed                              # Reset summary_ready=2 → 0
POST /admin/backfill-related                          # Backfill related papers
POST /admin/crossref-batch                            # CrossRef batch enrichment
POST /admin/related/clear                             # Clear related papers
POST /admin/related/bulk-insert                       # Bulk insert related papers
POST /admin/kv/delete                                 # Delete KV cache entries
GET  /admin/papers/all                                # Export all papers
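
A small client-side helper can keep the search query parameters listed above consistent. This is a sketch only: `buildSearchUrl` and the `SearchParams` type are hypothetical names, and the response shape of /api/search is not documented here, so the example stops at building the URL.

```typescript
// Hypothetical parameter object mirroring the documented query parameters.
interface SearchParams {
  q: string;
  author?: string;
  minCitations?: number;
  category?: string;
  date?: "day" | "week" | "month";
}

function buildSearchUrl(apiBase: string, params: SearchParams): string {
  const url = new URL("/api/search", apiBase);
  url.searchParams.set("q", params.q);
  if (params.author) url.searchParams.set("author", params.author);
  if (params.minCitations !== undefined) {
    url.searchParams.set("minCitations", String(params.minCitations));
  }
  if (params.category) url.searchParams.set("category", params.category);
  if (params.date) url.searchParams.set("date", params.date);
  return url.toString();
}
```

The resulting string can be passed straight to `fetch`; `URLSearchParams` takes care of escaping spaces and special characters in the query.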

Configuration

Environment Variables

# .env.local (Next.js frontend)
NEXT_PUBLIC_API_BASE=https://arxiv-api.yourdomain.workers.dev
API_BASE=https://arxiv-api.yourdomain.workers.dev

// scripts/config.local.ts (for local scripts)
export const CF_TOKEN = 'your-cloudflare-api-token';
export const CF_ACCOUNT_ID = 'your-account-id';
export const CF_D1_ID = 'your-d1-database-id';

Ingestion Settings (`wrangler.ingest.toml`)

[vars]
ARXIV_FETCH_CATEGORIES = "cs.AI,cs.LG"                 # Default fetch categories (add more as needed)
ARXIV_FETCH_LIMIT_PER_CATEGORY = "0"                   # Papers per category per cron (0 = process pending only)
INGEST_MAX_CONCURRENT = "1"                            # Concurrent AI processing
ARXIV_RATE_LIMIT_DELAY_MS = "3000"                     # Delay between arXiv requests
SUMMARY_MODEL = "@cf/meta/llama-3.1-8b-instruct"       # Workers AI summary model
EMBEDDING_MODEL = "@cf/baai/bge-base-en-v1.5"          # Workers AI embedding model
INGEST_PHASE = "hourly"                                # Phase label (informational only)
POLITE_EMAIL = "<your-contact-email>"                  # Contact email for arXiv API

# Optional Ollama (local AI)
# OLLAMA_BASE = "https://your-tunnel.trycloudflare.com"
# OLLAMA_SUMMARY_MODEL = "gemma4:e4b"
# OLLAMA_EMBEDDING_MODEL = "nomic-embed-text"

Minutely cron schedule:

Processes exactly 1 pending paper per run (summary_ready = 0 or failed within 7 days)
Retries once on failure (2 total attempts)
Daily quota: 113 papers/day max (5,000-neuron ingest budget; the other 50% of the daily budget is reserved for tooltips)
Quota tracking via KV with automatic reset at 00:00 UTC
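
The quota rules above can be sketched as two small functions. The per-paper cost of 44 neurons is an assumption inferred from 5,000 / 113 and is not stated in the README; the `quota:` key prefix is likewise hypothetical. Keying the counter by UTC date is one simple way to get a reset at 00:00 UTC without a separate cleanup job.

```typescript
const DAILY_NEURON_BUDGET = 5_000; // ingest budget stated in the README
const NEURONS_PER_PAPER = 44;      // assumption: inferred from 5,000 / 113

/** KV key for the day's counter; a new UTC date yields a new key. */
function quotaKey(now: Date): string {
  return `quota:${now.toISOString().slice(0, 10)}`;
}

/** How many more papers fit into today's budget. */
function papersRemaining(neuronsUsed: number): number {
  const left = Math.max(0, DAILY_NEURON_BUDGET - neuronsUsed);
  return Math.floor(left / NEURONS_PER_PAPER);
}
```

With these numbers, `papersRemaining(0)` is `Math.floor(5000 / 44)`, which is 113 and matches the documented daily maximum.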

Admin Secret

Required for Vectorize upserts, maintenance endpoints, and enrichment endpoints:

# Set for API worker
wrangler secret put ADMIN_SECRET --config wrangler.api.toml

# Use in local scripts
ADMIN_SECRET=your-secret npx tsx scripts/push-local-to-remote.ts
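
For reference, a timing-safe check of the x-admin-secret header could look roughly like the sketch below. It uses Node's `node:crypto` module so it runs locally; how the project wires this into the Worker is not shown in this README, and `isAdminRequest` is a hypothetical name. Hashing both values first gives `timingSafeEqual` equal-length inputs, since it throws when the lengths differ.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

/** Compares the provided secret against the expected one in constant time. */
function isAdminRequest(providedSecret: string | null, expectedSecret: string): boolean {
  if (providedSecret === null) return false;
  // SHA-256 digests are always 32 bytes, so the comparison never throws.
  const provided = createHash("sha256").update(providedSecret).digest();
  const expected = createHash("sha256").update(expectedSecret).digest();
  return timingSafeEqual(provided, expected);
}
```

A request handler would call it with something like `isAdminRequest(request.headers.get("x-admin-secret"), env.ADMIN_SECRET)`, where `env.ADMIN_SECRET` is assumed to be the secret set with wrangler.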

Database Schema

papers

arXiv metadata (id, title, authors, abstract, categories, dates, URLs)
authors_normalized — lowercased for fast prefix search
citation_count — from Semantic Scholar (updated hourly via cron)
citations_updated_at — last citation sync timestamp
summary_ready: 0 = pending · 1 = done · 2 = failed
Additional fields: comment, journal_ref, doi, primary_category

summaries

tldr — one-sentence result
key_contributions — JSON array
methods — JSON array
limitations — JSON array
beginner_explain — plain-language paragraph
technical_summary — researcher-level paragraph
model_version — which model generated it

Supporting tables

paper_categories — normalized category rows (indexed for topic queries)
papers_fts — FTS5 virtual table with insert/update/delete triggers
embeddings_meta — tracks embedding generation per paper
related_papers — pre-computed top-8 semantic neighbors
topics — curated topic collections with category mappings
citation_snapshots — historical citation data for velocity tracking
entity_definitions — terminology definitions for entities

Canonical schema file

The single source of truth is migrations/schema.sql. Additional columns added via incremental migrations (e.g. 0012_summaries_extended.sql adds problem_statement to summaries) must be applied on top with wrangler d1 execute.

Rebuild from scratch

# Apply canonical schema (wipes and recreates all tables)
wrangler d1 execute arxiv-explorer --remote --file=migrations/schema.sql

# Push local data (papers, summaries, categories, FTS, embeddings)
ADMIN_SECRET=<secret> npx tsx scripts/push-local-to-remote.ts

Performance

Search: <240 ms average (KV cache hit) · <400 ms (D1 fallback)
Paper detail: <190 ms average (KV cache hit) · <500 ms (D1 fallback)
Cache hit rate: ~85% (188ms average cache hit time)
Throughput: 50 req/s under mixed load
Edge deployment: Global CDN via Cloudflare Workers
Stress tested: 100 concurrent requests, 0% error rate

Key Features

Citation Tracking

Source: Semantic Scholar API integration
Updates: Automatic cron job (part of ingest worker)
Storage: citation_count and citations_updated_at fields in papers table
History: Citation snapshots stored in citation_snapshots table
Rate Limiting: Respects Semantic Scholar rate limits

Paper Collections

Location: /bookmarks page
Storage: Client-side localStorage
Features:
- Create named collections
- Assign bookmarks to collections
- Export as JSON or BibTeX
- Export all bookmarks or by collection
Capacity: 100 bookmarks (soft cap), 90-day TTL
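
As a rough idea of what the BibTeX export produces, the sketch below formats one bookmarked paper as an `@misc` entry with the usual arXiv fields. The `BookmarkedPaper` type, the citation-key scheme, and the field layout are assumptions; the project's exporter may differ, and this version does not escape braces inside titles.

```typescript
// Hypothetical bookmark shape; the real localStorage schema is not documented here.
interface BookmarkedPaper {
  id: string;              // arXiv ID, e.g. "2605.30353"
  title: string;
  authors: string[];
  year: number;
  primaryCategory: string; // e.g. "cs.LG"
}

function toBibTeX(p: BookmarkedPaper): string {
  // Citation keys may not contain dots or slashes in every tool, so replace them.
  const key = `arxiv_${p.id.replace(/[^0-9A-Za-z]/g, "_")}`;
  return [
    `@misc{${key},`,
    `  title         = {${p.title}},`,
    `  author        = {${p.authors.join(" and ")}},`,
    `  year          = {${p.year}},`,
    `  eprint        = {${p.id}},`,
    `  archivePrefix = {arXiv},`,
    `  primaryClass  = {${p.primaryCategory}},`,
    `}`,
  ].join("\n");
}
```

Exporting a whole collection is then a matter of mapping `toBibTeX` over its papers and joining the entries with blank lines.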

Advanced Search Filters

Author Filter: ?author=Hinton — substring match across all authors
Citation Filter: ?minCitations=10 — minimum citation threshold
Category Filter: ?category=cs.LG — arXiv category code (cs.LG, cs.CL, cs.CV, etc.)
Date Filter: ?date=week — time window (day/week/month)
Combined Filters: All filters work together and with hybrid search
Caching: Separate KV cache keys per filter combination (2h TTL)
Example: /api/search?q=transformer&author=Vaswani&minCitations=100&category=cs.LG&date=month
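
One way to get a separate cache key per filter combination is to normalise the query and sort the filter names, so that equivalent requests share a key. The sketch below assumes a `search:` prefix and a colon-separated layout, neither of which is taken from the project.

```typescript
// Hypothetical filter bag: parameter name to value, undefined when unset.
type SearchFilters = Record<string, string | number | undefined>;

function searchCacheKey(query: string, filters: SearchFilters = {}): string {
  // Collapse whitespace and case so trivially different queries share a key.
  const q = query.trim().toLowerCase().replace(/\s+/g, " ");
  const parts = Object.keys(filters)
    .sort() // stable order regardless of how the caller listed the filters
    .filter((name) => filters[name] !== undefined && filters[name] !== "")
    .map((name) => `${name}=${encodeURIComponent(String(filters[name]))}`);
  return ["search", encodeURIComponent(q), ...parts].join(":");
}
```

For example, `searchCacheKey("Transformer  attention", { category: "cs.LG", author: "Vaswani" })` and the same call with the filters in the opposite order both yield `search:transformer%20attention:author=Vaswani:category=cs.LG`.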

RSS Feed

Endpoint: /rss.xml
Content: 20 most recent papers with AI-generated summaries
Format: RSS 2.0 with full TL;DR, key contributions, and methods
Cache: 1-hour TTL via Cloudflare KV
Use Case: Subscribe in your RSS reader to stay updated on new papers
Example: https://arxiv-explorer.yourdomain.com/rss.xml
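
The sketch below shows how a single RSS 2.0 `<item>` might be assembled from a paper and its TL;DR. The `FeedPaper` type and function names are hypothetical, and only the TL;DR is included in the description; the `/paper/` link path follows the app's paper/[id] route.

```typescript
// Hypothetical subset of paper fields needed for one feed entry.
interface FeedPaper {
  id: string;
  title: string;
  tldr: string;
  published: Date;
}

/** Escapes the five XML special characters. */
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function rssItem(p: FeedPaper, siteBase: string): string {
  const link = `${siteBase}/paper/${encodeURIComponent(p.id)}`;
  return [
    "<item>",
    `  <title>${escapeXml(p.title)}</title>`,
    `  <link>${escapeXml(link)}</link>`,
    `  <guid isPermaLink="true">${escapeXml(link)}</guid>`,
    `  <description>${escapeXml(p.tldr)}</description>`,
    `  <pubDate>${p.published.toUTCString()}</pubDate>`,
    "</item>",
  ].join("\n");
}
```

Escaping matters here because paper titles and summaries frequently contain `<`, `>` and `&` (for example in inequalities), which would otherwise produce an invalid feed.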

Paper Comparison

Route: /compare?ids=id1,id2,id3
Capacity: Up to 6 papers side-by-side
Sections: TL;DR, Key Contributions, Methods, Limitations, Technical Summary
Layout: Responsive grid adapts to paper count
Example: /compare?ids=2605.30353,2302.13971,2303.08774

Testing

Integration Tests

cd scripts
./test-integration.sh      # Core functionality tests
./test-new-features.sh     # New features tests
./test-full.sh             # Comprehensive test suite

Stress Testing

cd scripts
./test-stress.sh           # Production load testing

API Deep Testing

cd scripts
./test-api-deep.sh         # Deep API endpoint testing

CLI Tool for AI Assistants

A command-line interface designed for AI assistants (Claude Code, ChatGPT, etc.) to programmatically search and explore papers.

Installation

# Quick install
./install-cli.sh

# Manual
cd cli
npm run build
npm link

Usage

# Search papers
arxiv-cli search "transformer attention" 5

# Get paper details with AI summary
arxiv-cli paper 2605.30353

# Show trending papers
arxiv-cli trending 10

# Browse topics
arxiv-cli topics
arxiv-cli topic large-language-models 20

# Author papers
arxiv-cli author "Yann LeCun" 10

Output Format

Clean, structured text optimized for AI parsing:

ID: 2605.30353
Title: Physics Is All You Need...
Authors: John Doe, Jane Smith...
Published: 2026-06-03
Categories: cs.LG, cs.AI
TL;DR: This paper introduces...
URL: https://arxiv.org/abs/2605.30353

See cli/README.md for complete documentation.
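
Because each line of the output is a `Key: value` pair, a consumer can turn one record into an object with a few lines of code. This is a sketch under that assumption; `parseCliRecord` is a hypothetical helper, and how multiple records are separated in search output is not covered here.

```typescript
/** Parses one "Key: value" block into a plain object, skipping other lines. */
function parseCliRecord(text: string): Record<string, string> {
  const record: Record<string, string> = {};
  for (const line of text.split("\n")) {
    // Split on the first ": " only, so URLs in the value stay intact.
    const idx = line.indexOf(": ");
    if (idx <= 0) continue;
    record[line.slice(0, idx).trim()] = line.slice(idx + 2).trim();
  }
  return record;
}
```

Splitting on the first `": "` rather than on every colon keeps values such as `https://arxiv.org/abs/2605.30353` in one piece.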

Troubleshooting

Check paper counts

npx wrangler d1 execute arxiv-explorer --remote --config wrangler.api.toml \
  --command="SELECT summary_ready, COUNT(*) as cnt FROM papers GROUP BY summary_ready"

Retry pending/failed papers locally

# Retry up to 50 papers
ADMIN_SECRET=<secret> LIMIT=50 npx tsx scripts/process-pending-local.ts

# Process with higher concurrency (careful with GPU memory)
ADMIN_SECRET=<secret> LIMIT=100 CONCURRENCY=2 npx tsx scripts/process-pending-local.ts

Push local DB to remote

ADMIN_SECRET=<secret> npx tsx scripts/push-local-to-remote.ts

Watch live logs

wrangler tail arxiv-api    --format=pretty   # API worker
wrangler tail arxiv-ingest --format=pretty   # Ingest worker

Sync remote DB to local

npx tsx scripts/pull-remote-to-local.ts

Reset database

./scripts/reset-and-ingest.sh

Design Notes

Why Worker instead of Pages

Deploying the Next.js frontend as a Cloudflare Worker (via OpenNext main + assets) rather than Cloudflare Pages avoids the per-request nonce that Pages unconditionally injects into script-src. That injection happens at the CDN layer before the response reaches the browser, so no amount of middleware or _headers file can override it. The Worker deployment has no such injection and serves the app’s own CSP intact.

The deployment uses:

@opennextjs/cloudflare adapter
OpenNext build: npx opennextjs-cloudflare build
Output: .open-next/worker.js + .open-next/assets/
Wrangler config: wrangler.jsonc with main and assets bindings

Search Algorithm

Normalise query
Check KV cache (2 h TTL)
Parallel:
- D1 FTS5 keyword search (title boosted 10:1:5)
- Vectorize semantic search (query embedding cached 24 h)
Merge (25 % keyword · 75 % semantic), deduplicate
Return top 10, write to KV
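
The merge step can be sketched as a weighted sum over two scored lists. This is an illustration, not the project's implementation: it assumes both score sets are already normalised to [0, 1] and that each list contains a paper at most once, and the `Scored` type and function name are hypothetical.

```typescript
interface Scored {
  id: string;
  score: number; // assumed normalised to [0, 1]
}

const KEYWORD_WEIGHT = 0.25;
const SEMANTIC_WEIGHT = 0.75;

/** Combines keyword and semantic hits, deduplicating by paper ID. */
function mergeHybrid(keyword: Scored[], semantic: Scored[], limit = 10): Scored[] {
  const combined = new Map<string, number>();
  for (const r of keyword) {
    combined.set(r.id, (combined.get(r.id) ?? 0) + KEYWORD_WEIGHT * r.score);
  }
  for (const r of semantic) {
    combined.set(r.id, (combined.get(r.id) ?? 0) + SEMANTIC_WEIGHT * r.score);
  }
  return Array.from(combined, ([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

A paper found by both searches accumulates both weighted scores, so it tends to outrank a paper found by only one of them.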

Caching Strategy

Lazy KV writes: paper detail written to KV on first access, not at ingestion
Query embedding cache: popular search vectors cached 24 h in KV
Trending KV cache: 60-minute TTL, auto-invalidated on new papers
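
The lazy-write pattern above (read the cache, fall back to the database, then store the result) can be sketched with an in-memory stand-in for KV. `TtlCache` and `getOrLoad` are hypothetical names; the real KV API is asynchronous and handles expiry itself, so this synchronous version only illustrates the control flow.

```typescript
// In-memory stand-in for a TTL key-value store (assumption, not the KV API).
class TtlCache<V> {
  private store = new Map<string, { value: V; expiresAtMs: number }>();

  get(key: string, nowMs: number): V | undefined {
    const hit = this.store.get(key);
    if (!hit) return undefined;
    if (hit.expiresAtMs <= nowMs) {
      this.store.delete(key); // expired entries behave like misses
      return undefined;
    }
    return hit.value;
  }

  put(key: string, value: V, ttlMs: number, nowMs: number): void {
    this.store.set(key, { value, expiresAtMs: nowMs + ttlMs });
  }
}

/** Returns the cached value, or loads and caches it on first access. */
function getOrLoad<V>(
  cache: TtlCache<V>,
  key: string,
  ttlMs: number,
  nowMs: number,
  load: () => V,
): V {
  const cached = cache.get(key, nowMs);
  if (cached !== undefined) return cached;
  const fresh = load();
  cache.put(key, fresh, ttlMs, nowMs);
  return fresh;
}
```

Writing on first access means papers nobody opens never occupy cache space, at the cost of one slower request per paper per TTL window.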

AI Processing

Single consolidated prompt per paper → structured JSON output
Workers AI uses @cf/meta/llama-3.1-8b-instruct for summaries, @cf/baai/bge-base-en-v1.5 for embeddings
Local Ollama fallback: gemma4:e4b (summaries) + nomic-embed-text (embeddings)
Failed papers marked summary_ready = 2 and retried on next run
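
Since the model returns structured JSON, the ingest step needs to validate it before marking a paper as done. The sketch below is one possible shape for that check, using field names from the summaries table; `parseSummary`, the `ParseResult` type, and the error strings are assumptions, and only four of the fields are validated.

```typescript
interface Summary {
  tldr: string;
  key_contributions: string[];
  methods: string[];
  limitations: string[];
}

// summaryReady follows the papers table: 1 = done, 2 = failed.
type ParseResult =
  | { summaryReady: 1; summary: Summary }
  | { summaryReady: 2; error: string };

function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((x) => typeof x === "string");
}

function parseSummary(raw: string): ParseResult {
  // Models sometimes wrap JSON in prose or code fences; take the outermost {...}.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) return { summaryReady: 2, error: "no JSON object found" };
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return { summaryReady: 2, error: "invalid JSON" };
  }
  const { tldr, key_contributions, methods, limitations } = parsed as Record<string, unknown>;
  if (
    typeof tldr !== "string" ||
    !isStringArray(key_contributions) ||
    !isStringArray(methods) ||
    !isStringArray(limitations)
  ) {
    return { summaryReady: 2, error: "missing or mistyped fields" };
  }
  return { summaryReady: 1, summary: { tldr, key_contributions, methods, limitations } };
}
```

Returning the target `summary_ready` value alongside the result keeps the decision to retry in one place instead of scattering it across the pipeline.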

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request

License

This project is licensed under the Business Source License 1.1 (BSL 1.1).

✅ Free for personal, academic, and non-commercial use
❌ Commercial use requires a separate license
📅 Converts to MIT License on 2029-06-01

See LICENSE.md for full terms, or contact the author for commercial licensing.

Acknowledgments

arXiv for open access to research papers
Cloudflare for the edge platform
Next.js / OpenNext for the framework + Worker adapter
Ollama for local model inference
SeekYou for BackgroundBeams, DecryptedText, and AnimatedTagline components

🌐 Related Projects

Explore more privacy-first and security tools:

Privacy & Encryption

Timeseal - Time-locked encryption vault with Dead Man’s Switch. AES-256 split-key crypto, ephemeral seals.
Sanctum - Zero-trust encrypted vault with cryptographic plausible deniability. XChaCha20-Poly1305, Argon2id.
GhostChat - True P2P encrypted chat via WebRTC. No servers, no storage, self-destructing messages.
xmrproof - Monero payment verification, 100% client-side.
GhostReceipt - Anonymous receipt generation with zero-knowledge proofs.

Security Tools

BurpAPISecuritySuite - Burp Suite extension for API security testing. 15 attack types, 108+ payloads, BOLA/IDOR detection.
Mcpwn - Automated security scanner for Model Context Protocol servers. Detects RCE, path traversal, prompt injection.
DiffCatcher - Git repo discovery, diff capture, code element extraction.
HoneypotScan - Honeypot detection service for security research.
CheckAPI - LLM API key validator for multiple providers. Privacy-first, client-side validation.
SeekYou - Host intelligence aggregator — unified OSINT across 15 sources for IPs, domains, and ASNs.

MCP Security Servers

burp-mcp-server - MCP server for Burp Suite Professional. Vulnerability scanning via AI assistants.
nuclei-mcp - MCP server for Nuclei. Multi-target scanning, severity filtering.
nmap-mcp - MCP server for Nmap. Stealth recon, vuln/NSE scanning.
frida-mcp - MCP server for Frida. Dynamic instrumentation, SSL pinning bypass.

💼 Services Offered

🔒 Privacy-First Development - P2P applications, encrypted communication, zero-knowledge systems
🚀 Web Application Development - Full-stack development with Next.js, React, TypeScript
🔧 Edge Computing Solutions - Cloudflare Workers, Pages, D1, KV, Durable Objects
🛡️ Security Tool Development - Burp extensions, penetration testing tools, automation frameworks
🤖 AI Integration - LLM-powered applications, intelligent automation, custom AI solutions
🔍 OSINT & Threat Intelligence - Custom reconnaissance tools, threat feed aggregation, IOC correlation

Get in Touch: teycirbensoltane.tn | Available for freelance projects and consulting

Built with 💚 by Teycir Ben Soltane
