Salma Mayorquin PRO

salma-remyx

AI & ML interests

None yet

Recent Activity

reacted to pbhappliedsystems's post with 🔥 about 1 hour ago
🚀 **New flagship dataset, and an argument about what a dataset card should be.**

Most synthetic datasets on the Hub ship row counts, a license, and little else: pipeline opaque, rejection criteria unstated, compliance unaudited. We published the opposite.

**SynthEval Cloud - Regulated-Domain Synthetic Instruction Dataset**
👉 https://huggingface.co/datasets/pbhappliedsystems/syntheval-cloud-regulated-instruct-1k

**1,116** quality-gated instruction records across **7 regulated domains** (medical, legal, GDPR, privacy, education, e-commerce, transport). Every record cleared a documented cascade, not a vibe check:

- 🧪 **Dual-signal hallucination gate**: rejects only when embedding cosine *and* keyword-overlap both fail; a low score alone never rejects.
- 🔒 **Layered PII masking + independent leak audit**: a separate over-reporting scanner found **0.0% residual leak** across all 1,116 records.
- 📊 **Whole-corpus evaluation, not a sample**: MATTR **0.769**, mean cosine **0.73**, **0%** near-duplicates, **96.9%** yield.
- 🧾 **The 36 rejections ship too**, each tagged with its failing gate. Removal at the gate is the product; we show our work.

Every number on the card is a field in the `evaluation_report.json` shipped beside the data, with full methodology + provenance (Mistral-Nemo AWQ W4A16 · vLLM 0.8.5.post1 · Modal A10G).

One release from **SynthEval**: Studio (local GPU) + Cloud (Modal+vLLM), proving quality parity across substrates.

📄 Whitepaper: https://pbhappliedsystems.com/SynthEval_Studio_and_Cloud_Quality-Gated_Synthetic_Data_Generation.pdf
🔎 Overview: https://pbhappliedsystems.com/synthetic-data.html

**CC BY 4.0**: commercial use welcome, just credit it. Need defensible synthetic data at scale? Let's talk.

Patrick Hill, PBH Applied Systems
liked a model 3 days ago
remyxai/dockergen-0.5b

Organizations

Remyx AI