inserting hyphens in words
I worked on that script on the English Wiktionary and it's not perfect. It's actually really hard to automatically syllabize words with a script.
I would syllabize that word as "pa‧fi‧lo" and the script was supposed to syllabize it like that. Somebody overrode the script. I corrected it.
There are no obligatory rules nor is there an overwhelming tendency in practice for hyphenation in Esperanto. In a Lingva Respondo ("Linguistic Answer") from 1893*, Zamenhof stated that morphological hyphenation would be most logical, but that the question in fact is not important and you can divide words as you like.
It is my impression, however, that the most common kind of hyphenation today, when words are hyphenated at all, is by syllable. Even so, the syllable structure of Esperanto is partly flexible; you can get an impression of it from §§2-3 of the Fundamenta Ekzercaro.
(*) La Esperantisto, 1893, p. 32
When carrying words over from one line to the next, we ordinarily divide them by their grammatical parts, because each grammatical part in our language constitutes a separate word. Thus we divide, for example, «Esper-anto», «ricev-ita», etc. But this is by no means an obligatory rule; we do it only so as not to break abruptly with the customs of other languages: in fact this manner has no purpose or significance, because carrying words over is a purely paper matter that has nothing in common with the laws of the language; we advise you not to trouble yourself in any way over the division of words and to do it exactly as is most convenient for you in the given case. Even if you were to divide, for example, «aparteni-s», we would see nothing irregular in it, although the other languages (entirely without any logical reason) do not permit such a division.
I got into a discussion with someone recently about the syllabification of <nothing> and whether it was <no-thing> (what I was saying) or <noth-ing> (what they were saying). I was saying that I'm a Linguistics undergrad and I've had to do a lot of weekly problem sets and tutorial activities with TAs on syllabifying stuff in different languages, and one of the first things I learned was that languages will always add as many things to the onset as possible. In the case of <nothing>, /ɪŋ/ has no onset and /θ/ is a valid onset in English, so /θ/ should act as the onset; it's not even creating a consonant cluster.
However, they rightly pointed out that several different dictionaries syllabified it their way: dictionary.com did [ nuhth-ing ] and even in IPA did / ˈnʌθ ɪŋ /, not marking the syllable boundary with a . but still with a space. https://www.dictionary.com/browse/nothing And while they didn't mention Wiktionary, Wiktionary has a thing called "hyphenation" where for <nothing> it's "Hyphenation: noth‧ing", and assuming this is meant to mark syllabification (I don't see what else it could be), then it's more evidence in their favour.
Now they pointed out that they had actual sources and all I had were my words, and of course they were right. I'd never actually done a reading on syllabification; all I had were lecture slides and the grades on my homework assignments, not actual sources, and they had actual sources, actual dictionaries. They suggested to me 3 possible explanations: I misremembered, which is unlikely given how much time I'd spent on this over 2 years so far; it was a regional difference, also unlikely given that I've had TAs and profs from all over the anglosphere (Southern US, California, Canada, Nigeria for phonology), and a regional difference upending what I was taught as the golden rule of syllabification seems odd to me; or I was mistaught, the most likely of the 3.
Now obviously I don't think all these people like messed up in teaching me. Afaik it's a good program at a good school, though of course if my entire education were misinformed I wouldn't have the skills to comprehend that, because the skills I was given were flawed, but that's a path that makes me uncomfortable. I understand that teachers often simplify things for newer students, and maybe this rule actually has way more exceptions than I was taught, but this was left for 3rd, or 4th, or master's, or PhD phonology. If this is the case, then how does this rule actually work, and what conditions <nothing> to behave differently from how I was taught? If this is not the case and I was taught correctly, why do so many dictionaries use this method that doesn't actually represent phonology, and what are they representing instead? Sorry if this was too long, I just like phonology and don't like the idea of thinking I understand something and having that all upended.
Edit: weirdly Merriam Webster has for the IPA https://www.merriam-webster.com/dictionary/nothing "ˈnə-thiŋ" so I don't even know anymore
The easiest thing to do, and the only way of being sure you agree with the authorities, is to look words up in the dictionary. Some of the hyphenations currently in American dictionaries make no sense at all. For example, the reason that prai-rie and fair-y are hyphenated the way they are seems to be that 150 years ago, the editors of Webster's dictionary thought they didn't rhyme [1]; prairie was pronounced pray-ree with a long 'a', while fairy was pronounced fair-ee with an r-colored 'a'.
That said, there are a few hyphenation rules that will let you hyphenate 90% of English words properly (and your hyphenations of the remaining 10% will be perfectly reasonable, even if they disagree with the authorities'). Here they are, in roughly decreasing order of priority:
- Break words at morpheme boundaries (inter-face, pearl-y, but ear-ly).
- Break words between doubled consonants; 'sc' counts here but not 'ck' (bat-tle, as-cent, jack-et).
- Never separate an English digraph (e.g., th, ch, sh, ph, gh, ng, qu) when pronounced as a single unit (au-thor but out-house).
- Never break a word before a string of consonants that cannot begin a word in English (anx-ious and not an-xious).
- Never break a word after a short vowel in an accented syllable (rap-id but stu-pid).
Finally, if the above rules leave more than one acceptable break between syllables, use the Maximal Onset Principle:
- If there is a string of consonants between syllables, break this string as far to the left as you can (mon-strous).
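The Maximal Onset Principle in the last rule can be sketched in a few lines of Python. This is only an illustration, not a full hyphenator: it works on spelling rather than pronunciation, it ignores the higher-priority rules above (morphemes, doubled consonants, short accented vowels), and `LEGAL_ONSETS`, `split_cluster`, and `first_break` are hypothetical names with a deliberately incomplete onset list.

```python
import re

VOWELS = "aeiouy"  # assumption: treat 'y' as a vowel letter for this sketch

# Hypothetical, incomplete set of consonant spellings that can begin an English word.
LEGAL_ONSETS = {
    "", "b", "c", "d", "f", "g", "h", "j", "k", "l", "m", "n",
    "p", "r", "s", "t", "v", "w", "z",
    "ch", "sh", "th", "ph",
    "bl", "br", "cl", "cr", "dr", "fl", "fr", "gl", "gr", "pl", "pr", "tr",
    "sc", "sk", "sl", "sm", "sn", "sp", "st", "sw",
    "scr", "spl", "spr", "str",
}


def split_cluster(cluster, legal_onsets=LEGAL_ONSETS):
    """Split a consonant string into (coda, onset), breaking as far left as possible.

    The onset is the longest suffix of the cluster that may begin a word.
    """
    for i in range(len(cluster) + 1):
        if cluster[i:] in legal_onsets:
            return cluster[:i], cluster[i:]
    return cluster, ""  # only reached if "" is missing from legal_onsets


def first_break(word):
    """Insert a hyphen at the first vowel-consonants-vowel boundary of a lowercase word."""
    match = re.search(rf"[{VOWELS}]([^{VOWELS}]+)(?=[{VOWELS}])", word)
    if match is None:
        return word  # no consonant string between two vowels
    coda, _onset = split_cluster(match.group(1))
    cut = match.start(1) + len(coda)
    return word[:cut] + "-" + word[cut:]


print(first_break("monstrous"))  # mon-strous
print(first_break("pastry"))     # pa-stry
```

Because the sketch applies only this one rule, it will disagree with the dictionaries wherever a higher-priority rule or an editorial convention takes over.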
There are lots of exceptions to these rules:
Sometimes the rules conflict with each other. For example, ra-tio-nal gets hyphenated after a short vowel in an accented syllable because ti acts as a digraph indicating that the 't' should be pronounced 'sh'.
Sometimes it's not clear what constitutes a morpheme boundary: why ger-mi-nate and not germ-i-nate?
Sometimes the pronunciation of a word varies—/væpɪd/ or /veɪpɪd/? Merriam-Webster and American Heritage dictionaries agree that both pronunciations are valid, but they disagree about the hyphenation.
And some hyphenations I can't figure out the reason for: the Maximal Onset Principle would suggest pa-stry, but the authorities all agree on pas-try.
[1] I believe some American dialects still make this distinction in pronunciation; the editors of Webster's dictionary weren't imagining things.
Vincent McNabb gives good advice generally on when to hyphenate: never if you can get away with it, and if you must, in a sensible place.
However, the question of where to hyphenate is something that dictionaries have answered for generations. Every entry has a word split into syllables, and technically speaking, according to traditional rules of typesetting, you can hyphenate a word at any syllable boundary. For example, in Merriam-Webster's online dictionary, the entry for "dictionary" reads "dic·tio·nary", so you could hyphenate anywhere a centered dot appears. Of course there are various rules of thumb and heuristics for choosing the best place to hyphenate, and in many cases hyphenating a word dramatically reduces readability, but as a strict answer to OP's original question, it is acceptable to hyphenate a word at any syllable boundary, and you can find all the syllable boundaries in a dictionary.
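As a small illustration of reading break points off a dictionary headword, here is a Python sketch that lists every way to hyphenate a word once, given its entry with centered dots. The function name `hyphenation_options` is hypothetical, and the sketch assumes the entry uses "·" as its only separator.

```python
def hyphenation_options(entry, sep="·"):
    """Return each way to break the word once at a marked syllable boundary."""
    parts = entry.split(sep)
    return [
        "".join(parts[:i]) + "-" + "".join(parts[i:])
        for i in range(1, len(parts))
    ]


print(hyphenation_options("dic·tio·nary"))  # ['dic-tionary', 'dictio-nary']
```

Which of the returned options reads best is still a judgment call; the sketch only enumerates the permitted ones.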