AI & ML interests
None yet
Recent Activity
reacted to sergiopaniego's post with 🔥 about 1 hour ago
OpenEnv has a new home: github.com/huggingface/OpenEnv
Starting today, it's coordinated by a committee that includes Meta-PyTorch, Reflection, Unsloth, Modal, Prime Intellect, Nvidia, Mercor, Fleet AI, and Hugging Face
frontier labs train their models and their harnesses together. Claude knows Claude Code. GPT-5.5 knows Codex. that's not an accident, it's training. open-source models deserve the same magic, but pulling that off requires infrastructure that belongs to everyone, not one lab
OpenEnv is that layer. one api, any harness, any trainer, any environment
Rewards and training loops stay in TRL, Unsloth, wherever you already work. OpenEnv is the socket they all plug into
Get involved!
Full announcement: https://huggingface.co/blog/openenv-agentic-rl
reacted to pbhappliedsystems's post with 🔥 about 1 hour ago
**New flagship dataset, and an argument about what a dataset card should be.**
Most synthetic datasets on the Hub ship row counts, a license, and little else: pipeline opaque, rejection criteria unstated, compliance unaudited. We published the opposite.
**SynthEval Cloud - Regulated-Domain Synthetic Instruction Dataset**
https://huggingface.co/datasets/pbhappliedsystems/syntheval-cloud-regulated-instruct-1k
**1,116** quality-gated instruction records across **7 regulated domains** (medical, legal, GDPR, privacy, education, e-commerce, transport). Every record cleared a documented cascade, not a vibe check:
- **Dual-signal hallucination gate**: rejects only when embedding cosine *and* keyword-overlap both fail; a low score alone never rejects.
- **Layered PII masking + independent leak audit**: a separate over-reporting scanner found **0.0% residual leak** across all 1,116 records.
- **Whole-corpus evaluation, not a sample**: MATTR **0.769**, mean cosine **0.73**, **0%** near-duplicates, **96.9%** yield.
- **The 36 rejections ship too**, each tagged with its failing gate. Removal at the gate is the product; we show our work.
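The dual-signal rule in the first bullet can be sketched as below. This is only an illustration of the "reject when both signals fail" logic, not the SynthEval implementation: the function names, the thresholds `cos_min` and `overlap_min`, the token-based overlap measure, and the assumption that embeddings arrive as plain lists of floats are all hypothetical.

```python
import math
import re


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def keyword_overlap(source, answer):
    """Fraction of distinct answer tokens that also appear in the source."""
    src = set(re.findall(r"[a-z0-9]+", source.lower()))
    ans = set(re.findall(r"[a-z0-9]+", answer.lower()))
    if not ans:
        return 0.0
    return len(src & ans) / len(ans)


def dual_signal_reject(src_vec, ans_vec, source, answer,
                       cos_min=0.5, overlap_min=0.3):
    """Reject only when BOTH signals fall below their (hypothetical) thresholds."""
    cosine_fails = cosine(src_vec, ans_vec) < cos_min
    overlap_fails = keyword_overlap(source, answer) < overlap_min
    return cosine_fails and overlap_fails
```

With this rule, a record whose embedding score is low but whose keywords still match the source is kept, which is the behaviour the bullet describes as "a low score alone never rejects".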
Every number on the card is a field in the `evaluation_report.json` shipped beside the data, with full methodology + provenance (Mistral-Nemo AWQ W4A16 · vLLM 0.8.5.post1 · Modal A10G).
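For readers who want to sanity-check figures like these, here is a minimal sketch of two of the reported metrics. MATTR is taken in its usual sense (the mean type-token ratio over every sliding window of fixed length); the window size of 50 and whitespace tokenization are assumptions, since the post does not state how the score was computed. The yield check uses only the counts quoted in the post (1,116 kept, 36 rejected).

```python
def mattr(tokens, window=50):
    """Moving-average type-token ratio over all windows of `window` tokens.

    Falls back to the plain type-token ratio when the text is shorter
    than one window. The window size is an assumed default.
    """
    if not tokens:
        return 0.0
    if len(tokens) <= window:
        return len(set(tokens)) / len(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


def yield_percent(kept, rejected):
    """Share of generated records that passed every gate, in percent."""
    return round(100.0 * kept / (kept + rejected), 1)


if __name__ == "__main__":
    sample = "the gate keeps the record when the gate passes".split()
    print(mattr(sample, window=4))
    print(yield_percent(1116, 36))
```

The yield figure on the card is consistent with the counts in the post: 1,116 of 1,152 generated records rounds to 96.9%.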
One release from **SynthEval**: Studio (local GPU) + Cloud (Modal+vLLM), proving quality parity across substrates.
Whitepaper: https://pbhappliedsystems.com/SynthEval_Studio_and_Cloud_Quality-Gated_Synthetic_Data_Generation.pdf
Overview: https://pbhappliedsystems.com/synthetic-data.html
**CC BY 4.0**: commercial use welcome, just credit it. Need defensible synthetic data at scale? Let's talk.
Patrick Hill, PBH Applied Systems
Organizations
salma-remyx/vqasynth_testing_evals_eval • Viewer • Updated • 5 • 49
salma-remyx/vqasynth_testing_evals • Viewer • Updated • 5 • 32
salma-remyx/vqasynth_testing_evals_full_reasoning • Viewer • Updated • 5 • 21
salma-remyx/vqasynth_sample_processed • Viewer • Updated • 5 • 16
salma-remyx/vqasynth_sample_processed_full • Viewer • Updated • 5 • 15
salma-remyx/remyxai_docker_images_with_content • Viewer • Updated • 10.4k • 25 • 1
salma-remyx/remyxai_docker_images • Viewer • Updated • 10.4k • 9 • 1
salma-remyx/vqasynth_sample_processed_test • Viewer • Updated • 5 • 17
salma-remyx/vqasynth_sample_processed_test_full • Viewer • Updated • 5 • 17
salma-remyx/SpaceOm_MindCube_Results • Updated • 25
salma-remyx/SpaceThinker_SpatialScore-Hard • Updated • 3
salma-remyx/SpaceOm_SpatialScore-Hard • Updated • 3
salma-remyx/SpaceOm_OmniSpatial • Updated • 5
salma-remyx/SpaceThinker_SpaCE-10_Results • Preview • Updated • 4
salma-remyx/SpaceQwen_SpaCE-10_Results • Preview • Updated • 4
salma-remyx/SpaceOm_SpaCE-10_Results • Preview • Updated • 3
salma-remyx/SpaceOm_SpatialScore • Updated • 9 • 1
salma-remyx/SpaceThinker_SpatialScore • Updated • 5 • 1
salma-remyx/Q-Spatial-Bench-sMAPE-Comparison • Viewer • Updated • 13 • 10 • 1
salma-remyx/vqasynth_sample_processed_dummy • Viewer • Updated • 5 • 2
salma-remyx/vqasynth_sample_processed_dummy_full • Viewer • Updated • 5 • 6
salma-remyx/localllama-sentiment-Why-new-models-feel-dumber • Viewer • Updated • 20 • 5 • 1
Viewer • Updated • 8 • 6
salma-remyx/vqasynth_processed_r1_12k • Viewer • Updated • 12.7k • 6
salma-remyx/vqasynth_processed_r1_12k_full_reasoning • Viewer • Updated • 12.7k • 16
salma-remyx/ffmperative-sample • Viewer • Updated • 1.89k • 15
Viewer • Updated • 6.38k • 41 • 1
salma-remyx/vqasynth_nas_example_ds • Viewer • Updated • 51 • 14
salma-remyx/vqasynth_nas_example_ds_full • Viewer • Updated • 51 • 7
salma-remyx/nas_example_ds • Viewer • Updated • 58 • 8