Not able to have an output with a smaller size than the given `max_length`

by Loheek - opened Apr 25, 2024

Apr 25, 2024 (edited Apr 25, 2024)

Hello, when using the 3B model, I cannot get an output shorter than the given `max_length`.
It always gives a correct answer in the first tokens, and then outputs garbage to fill the rest until `max_length` is reached.

I use the code in `generate_openelm.py` as a template.
I tried adjusting the generation options as described here, such as the length penalty or the generation strategy, but without success.

Is it possible? Any advice would be very welcome.

reginaldlu

Apr 26, 2024

It seems that Apple has not released its chat template, so currently the model only works for plain text generation (using Llama 2's tokenizer).
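
Without a chat template, the model may never emit an end-of-sequence token, in which case generation just runs until `max_length`. One possible workaround, as a minimal sketch rather than anything taken from `generate_openelm.py`, is to trim the output yourself: cut the generated token ids at the first EOS id, or cut the decoded text at the first stop string. The helper names `truncate_at_eos` and `truncate_at_stop` and the example stop string are hypothetical, chosen only for illustration.

```python
from typing import Iterable, List, Sequence


def truncate_at_eos(token_ids: Sequence[int], eos_token_id: int) -> List[int]:
    """Return the ids up to, but not including, the first EOS id."""
    ids = list(token_ids)
    if eos_token_id in ids:
        return ids[: ids.index(eos_token_id)]
    return ids


def truncate_at_stop(text: str, stop_strings: Iterable[str]) -> str:
    """Cut text at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stop_strings:
        if not stop:
            continue  # an empty stop string would cut everything
        pos = text.find(stop)
        if pos != -1:
            cut = min(cut, pos)
    return text[:cut]


if __name__ == "__main__":
    # Hypothetical output: a correct answer followed by filler text.
    raw = "Paris is the capital of France.\n\nQuestion: What is"
    print(truncate_at_stop(raw, ["\n\n"]))
```

With `transformers`, the same idea could be applied to the ids returned by `model.generate(...)` before decoding them, or to the decoded string afterwards. Which stop string makes sense depends on your prompt format, so treat the blank-line stop above as an assumption.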
