Replies: 3 comments 1 reply
-
That's not how it works. If you paste your full posting into a tokenizer, you'll see that most words get replaced by a single token, a single symbol. The number of characters per word does not matter, as long as you exclude misspelled words.
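The point above can be illustrated with a toy sketch. This is not a real tokenizer; the vocabulary and the per-character fallback are made up purely to show why a long but frequent word still costs one token:

```python
# Hypothetical word-level vocabulary: frequent words map to single token IDs,
# so the character length of a word barely affects the token count.
vocab = {"the": 1, "internationalization": 2, "cat": 3, "sat": 4}

def tokenize(text):
    """Greedy word lookup; unknown (e.g. misspelled) words fall back to a
    crude per-character encoding, which is where length starts to matter."""
    tokens = []
    for word in text.lower().split():
        if word in vocab:
            tokens.append(vocab[word])          # one token, however long
        else:
            tokens.extend(1000 + ord(c) for c in word)  # made-up fallback
    return tokens

short = tokenize("the cat sat")
long_word = tokenize("the internationalization sat")
print(len(short), len(long_word))  # → 3 3: word length did not matter
```

Only out-of-vocabulary words (typos, rare terms) fall through to the expensive path, which matches the caveat about misspelled words.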
Not necessarily faster processing, but, more importantly, faster training. While a more minimalist language could help a bit in that regard, it isn't needed. Check out the TinyStories paper, which followed a similar approach with plain English. The resulting model is so tiny that it fits into the CPU cache and is thus extremely fast. The description of the language in your git repo is incomplete. Furthermore, you will not be able to train a model without sufficient text in that designed language. Once you have that, you could use train-text-from-scratch from llama.cpp to see what happens. As far as I remember, dropout wasn't implemented there yet, so you might want to use another framework if you do not have a lot of training data.
-
@BrickBee Yes, you are right; I understand what you mean. There are areas other than pure training that could be assisted by a language where we control and limit the parameters of how it is used, which could be highly beneficial for cross-use on CPUs or in old-school search engines. Can you give a reference for "dropout"? What does that mean? The training data will come from translating an existing English corpus into the new language.
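For context on the dropout question: dropout is a regularization trick that randomly zeroes activations during training so the network cannot over-rely on any single unit, which helps when training data is scarce. A minimal sketch of the common "inverted dropout" variant (pure Python, not any particular framework's API):

```python
import random

def dropout(xs, p, training=True, rng=random):
    """Inverted dropout: during training, zero each value with probability p
    and scale survivors by 1/(1-p) so the expected value is unchanged.
    At inference time the input passes through untouched."""
    if not training or p == 0.0:
        return list(xs)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in xs]

random.seed(0)
acts = [1.0, 2.0, 3.0, 4.0]
out = dropout(acts, p=0.5)
# each element is either 0.0 (dropped) or the original value scaled by 2
```

This is why a framework without dropout is a worse fit for small datasets: with little data, the regularization matters more.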
-
@BrickBee I've updated https://ouvaa.com/; please take a look.
Can you please give me feedback on this rudimentary version of what I have done? What else do you think I need to improve, other than having a large set of examples, dictionary words, etc.? (Those I can get ChatGPT to generate.) Is there anything else you think I will need to enhance for llama training?
-
Let's say we have this new language: will we get faster inference / training speed?
https://github.com/ouvaa/ouvaa.github.io