inserting hyphens in words
I got into a discussion with someone recently about the syllabification of <nothing> and whether it was <no-thing> (what I was saying) or <noth-ing> (what they were saying). I said that I'm a Linguistics undergrad, that I've done a lot of weekly problem sets and tutorial activities with TAs on syllabifying words in different languages, and that one of the first things I learned was that languages will always put as many consonants in the onset as possible. In the case of <nothing>, /ɪŋ/ has no onset and /θ/ is a valid onset in English, so /θ/ should act as the onset; it's not even creating a consonant cluster.
However, they rightly pointed out that several different dictionaries syllabified it their way: dictionary.com gives [ nuhth-ing ] and even in IPA gives / ˈnʌθ ɪŋ /, marking the syllable boundary not with a . but with a space. https://www.dictionary.com/browse/nothing And while they didn't mention Wiktionary, Wiktionary has a field called "hyphenation", which for <nothing> reads "Hyphenation: noth‧ing"; assuming this is meant to mark syllabification (I don't see what else it could be), it is more evidence in their favour.
Now they pointed out that they had actual sources and all I had were my words, and of course they were right. I'd never actually done a reading on syllabification; all I had were lecture slides and the grades on my homework assignments, while they had actual dictionaries. They suggested 3 possible explanations: (1) I misremembered, which is unlikely given how much time I've spent on this over 2 years so far; (2) it was a regional difference, also unlikely given that I've had TAs and profs from all over the anglosphere (Southern US, California, Canada, Nigeria for phonology), and a regional difference upending what I was taught as the golden rule of syllabification seems odd to me; or (3) I was mistaught, the most likely of the 3.
Now obviously I don't think all these people, like, messed up in teaching me; afaik it's a good program at a good school. Of course, if my entire education were misinformed I wouldn't have the skills to comprehend that, because the skills I was given would be flawed, but that's a path that makes me uncomfortable. I understand that teachers often simplify things for newer students, and maybe this rule actually has way more exceptions than I was taught, and these were left for 3rd- or 4th-year, master's, or PhD phonology. If this is the case, then how does this rule actually work, and what conditions <nothing> to behave differently from how I was taught? If this is not the case and I was taught correctly, why do so many dictionaries use this method that doesn't actually represent phonology, and what are they representing instead? Sorry if this was too long; I just like phonology and don't like the idea of thinking I understand something and having that all upended.
Edit: weirdly Merriam Webster has for the IPA https://www.merriam-webster.com/dictionary/nothing "ˈnə-thiŋ" so I don't even know anymore
The easiest thing to do, and the only way of being sure you agree with the authorities, is to look words up in the dictionary. Some of the hyphenations currently in American dictionaries make no sense at all. For example, the reason that prai-rie and fair-y are hyphenated the way they are seems to be that 150 years ago, the editors of Webster's dictionary thought they didn't rhyme[1]; prairie was pronounced pray-ree with a long 'a', while fairy was pronounced fair-ee with an r-colored 'a'.
That said, there are a few hyphenation rules that will let you hyphenate 90% of English words properly (and your hyphenations of the remaining 10% will be perfectly reasonable, even if they disagree with the authorities'). Here they are, in roughly decreasing order of priority:
- Break words at morpheme boundaries (inter-face, pearl-y, but ear-ly).
- Break words between doubled consonants — 'sc' counts here but not 'ck'. (bat-tle, as-cent, jack-et).
- Never separate an English digraph (e.g., th, ch, sh, ph, gh, ng, qu) when pronounced as a single unit (au-thor but out-house).
- Never break a word before a string of consonants that cannot begin a word in English (anx-ious and not an-xious).
- Never break a word after a short vowel in an accented syllable (rap-id but stu-pid).
Finally, if the above rules leave more than one acceptable break between syllables, use the Maximal Onset Principle:
- If there is a string of consonants between syllables, break this string as far to the left as you can (mon-strous).
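As a rough illustration of the last rule only, here is a minimal Python sketch. It assumes a deliberately tiny, hypothetical list of consonant spellings that can begin an English word (`LEGAL_ONSETS`), treats `y` as a vowel, and ignores the morpheme, digraph, doubled-consonant, and short-vowel rules above. The function names are made up for this example.

```python
import re

# Hypothetical, far-from-complete sample of consonant spellings
# that can begin an English word (an assumption for illustration).
LEGAL_ONSETS = {"b", "c", "d", "f", "g", "l", "m", "n", "p", "r", "s", "t",
                "br", "pr", "st", "tr", "str"}


def split_cluster(cluster):
    """Split a consonant string as far to the left as LEGAL_ONSETS allows.

    Returns (end of the first syllable, start of the next syllable).
    """
    for i in range(len(cluster)):
        if cluster[i:] in LEGAL_ONSETS:
            return cluster[:i], cluster[i:]
    # No legal onset found: keep the whole string with the first syllable.
    return cluster, ""


def break_at_first_cluster(word):
    """Insert one hyphen in the first consonant string found between two vowels."""
    match = re.search(r"(?<=[aeiouy])([^aeiouy]+)(?=[aeiouy])", word)
    if match is None:
        return word
    left, right = split_cluster(match.group(1))
    return word[:match.start()] + left + "-" + right + word[match.end():]


print(break_at_first_cluster("monstrous"))  # mon-strous
print(break_at_first_cluster("pastry"))     # pa-stry
```

On its own the rule gives pa-stry, where dictionaries give pas-try, so a real hyphenator would need the higher-priority rules and an exception list on top of this.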
There are lots of exceptions to these rules:
Sometimes the rules conflict with each other. For example, ra-tio-nal gets hyphenated after a short vowel in an accented syllable because ti acts as a digraph indicating that the 't' should be pronounced 'sh'.
Sometimes it's not clear what constitutes a morpheme boundary: why ger-mi-nate and not germ-i-nate?
Sometimes the pronunciation of a word varies—/væpɪd/ or /veɪpɪd/? Merriam-Webster and American Heritage dictionaries agree that both pronunciations are valid, but they disagree about the hyphenation.
And some hyphenations I can't figure out the reason for: the Maximal Onset Principle would suggest pa-stry, but the authorities all agree on pas-try.
[1] I believe some American dialects still make this distinction in pronunciation; the editors of Webster's dictionary weren't imagining things.
Vincent McNabb gives good advice generally on when to hyphenate—never if you can get away with it, and if you must, in a sensible place.
However, the question of where to hyphenate is something that dictionaries have answered for generations. Every entry has a word split into syllables, and technically speaking, according to traditional rules of typesetting, you can hyphenate a word at any syllable boundary. For example in the Merriam-Webster's online dictionary, the entry for "dictionary" reads "dic·tio·nary"—so you could hyphenate anywhere there appears a centered dot. Of course there are various rules of thumb and heuristics to choose the best place to hyphenate, and in many cases hyphenating a word dramatically reduces readability, but in a strict answer to OP's original question, it is acceptable to hyphenate a word at any syllable boundary, and you can find all the syllable boundaries in a dictionary.
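If you have a headword written with centered dots, as in the entry quoted above, listing the permitted break points is mechanical. The sketch below is only an illustration: `break_points` is a made-up helper name, and it assumes the separator is the centered dot `·` and that the entry string has been typed in by hand rather than fetched from any dictionary API.

```python
def break_points(entry, sep="·"):
    """Return every way of hyphenating a word once, given its dotted headword."""
    parts = entry.split(sep)
    word = "".join(parts)
    results = []
    position = 0
    # Each separator (that is, the end of every part except the last)
    # marks one permitted break.
    for part in parts[:-1]:
        position += len(part)
        results.append(word[:position] + "-" + word[position:])
    return results


print(break_points("dic·tio·nary"))  # ['dic-tionary', 'dictio-nary']
```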
Syllables (which are a unit of spoken language and have nothing per se to do with punctuation or hyphenation) are generally considered to be governed by something called the Maximal Onset Principle: a syllable consists of a vowel at its centre, or nucleus, and zero or more consonants at its two edges (the onset and coda), with the onset filled first with as many consonants as the language in question allows.
These are the principles of syllabification and you'll find a few corner cases. In English, for example, in a word such as "strengths", it might be argued that the final -s actually functions as though it were a vowel, forming the nucleus of a syllable. In a word such as "university", which intuitively appears to have five syllables, in actual pronunciation it's not clear that the "i" really heads a syllable; it might in fact get "merged" into the coda of the previous syllable. Within a language, different speakers can syllabify some sound combinations differently. For example, to most speakers from England, "film" consists of one syllable; to most speakers from Wales, it consists of two syllables. In Spanish, the word "atlas" is probably syllabified "at-las" by a speaker from Spain and "a-tlas" by a speaker from Mexico. But, barring these occasional corner cases, the principle I've just mentioned holds pretty much across languages and there's reasonable consistency and predictability in how speakers of a given language syllabify.
Then, loosely based on syllabification, are rules of hyphenation. Taking something close to "real" syllable divisions as a starting point, in various languages these are then modified so as not to split up parts of a word that go together as a "unit", or to avoid "odd-looking" hyphenations. So in "university", one might avoid hyphenating as "u-niversity" as it looks a bit odd leaving one letter on its own and also splits up the unit "uni-". The rules might also take account of spelling phenomena which don't reflect pronunciation. So for example in English, there might be a rule to always place a hyphen between consecutive letters representing consonants even where phonologically there is no corresponding syllable break, e.g. im-mune (only one [m] is actually pronounced).
There are no God-given, universally agreed-upon "rules" for hyphenation, but there are preferences of individual editors and style guide writers. And as I say, syllabification is more or less consistent, but not 100% so. So dictionary "syllabifications" will differ because (a) what they are giving may or may not be syllabification in the true sense; and (b) there's not necessarily a consensually agreed syllabification or hyphenation.
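To make the Maximal Onset Principle concrete, here is a small Python sketch. It is a toy under stated assumptions: the vowel sets and legal-onset inventories are hand-picked for the examples, every vowel symbol is treated as its own nucleus, and `syllabify` is a made-up name. It shows only what the principle by itself predicts, not what any dictionary prints.

```python
def syllabify(phones, vowels, legal_onsets):
    """Split a list of phoneme symbols into syllables by maximal onset.

    Each run of consonants between two vowels gives the following syllable
    the longest stretch that appears in `legal_onsets`; the rest becomes
    the coda of the preceding syllable.
    """
    nuclei = [i for i, p in enumerate(phones) if p in vowels]
    if not nuclei:
        return [list(phones)]
    syllables = []
    start = 0
    for current, following in zip(nuclei, nuclei[1:]):
        cluster = phones[current + 1:following]
        split = len(cluster)  # fallback: every consonant goes to the coda
        for k in range(len(cluster)):
            if tuple(cluster[k:]) in legal_onsets:
                split = k
                break
        boundary = current + 1 + split
        syllables.append(list(phones[start:boundary]))
        start = boundary
    syllables.append(list(phones[start:]))
    return syllables


# English "nothing": /θ/ is a legal onset, so it goes with the second syllable.
print(syllabify(["n", "ʌ", "θ", "ɪ", "ŋ"], {"ʌ", "ɪ"}, {("n",), ("θ",)}))
# [['n', 'ʌ'], ['θ', 'ɪ', 'ŋ']]

# Spanish "atlas": the result depends on whether /tl/ counts as a legal onset.
atlas = ["a", "t", "l", "a", "s"]
print(syllabify(atlas, {"a"}, {("t",), ("l",)}))              # [['a', 't'], ['l', 'a', 's']]
print(syllabify(atlas, {"a"}, {("t",), ("l",), ("t", "l")}))  # [['a'], ['t', 'l', 'a', 's']]
```

Passing the onset inventory as a parameter is what lets the same function reproduce the two "atlas" syllabifications mentioned above.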
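The "don't leave one letter on its own" adjustment can be sketched as a filter over syllable boundaries. In this Python sketch the function name `usable_breaks` and the thresholds `min_left` and `min_right` are assumptions chosen for illustration; real style guides and typesetting systems set their own minimums.

```python
def usable_breaks(syllables, min_left=2, min_right=2):
    """List hyphenations at syllable boundaries, skipping breaks that would
    leave fewer than min_left letters before the hyphen or fewer than
    min_right letters after it."""
    word = "".join(syllables)
    results = []
    position = 0
    for syllable in syllables[:-1]:
        position += len(syllable)
        if position >= min_left and len(word) - position >= min_right:
            results.append(word[:position] + "-" + word[position:])
    return results


print(usable_breaks(["u", "ni", "ver", "si", "ty"]))
# ['uni-versity', 'univer-sity', 'universi-ty']
```

The break after "u" is dropped, which matches the "u-niversity" example; keeping the unit "uni-" intact falls out here only because of the length limit, not because the code knows anything about morphemes.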
My recommended rules of hyphenation in English:
- Never hyphenate words. In 2011, what is the real need to hyphenate words?[*]
- If you absolutely absolutely must hyphenate: just leave the hyphen wherever your word processor puts it. There are more important things in life for you to worry about. (Of course, if you are writing the hyphenation algorithm of a word processor, then you need to care a little more, but that's about the only occasion I can think of.)
[*] If you're writing in a more agglutinative language like German or, worse, Finnish, where you get an average of about 2 words per line of A4, then I would posit that there is more of a case for hyphenation.
Those dots in Merriam Webster do not denote syllables. Note that Merriam Webster gives this pronunciation for university, clearly showing five syllables:
\ˌyü-nə-ˈvər-sə-tē\
I expect this is the same for Collins Cobuild. The dots and pipes show hyphenation points.
I worked on that script on the English Wiktionary and it's not perfect. It's actually really hard to automatically syllabize words with a script.
I would syllabize that word as "pa‧fi‧lo" and the script was supposed to syllabize it like that. Somebody overrode the script. I corrected it.
There are no obligatory rules nor is there an overwhelming tendency in practice for hyphenation in Esperanto. In a Lingva Respondo ("Linguistic Answer") from 1893*, Zamenhof stated that morphological hyphenation would be most logical, but that the question in fact is not important and you can divide words as you like.
It is my impression, however, that the most common kind of hyphenation today, where words are hyphenated at all, is by syllables. The syllable structure of Esperanto is nevertheless partly flexible; you can get an impression of it from §§2-3 of the Fundamenta Ekzercaro.
(*) La Esperantisto, 1893, p. 32
When carrying words over from one line to the next, we ordinarily divide them by their grammatical parts, because each grammatical part in our language represents a separate word. Thus we divide, for example, «Esper-anto», «ricev-ita», etc. But this is by no means an obligatory rule; we do it only so as not to break abruptly with the customs of other languages. In fact this practice has no purpose or significance, because carrying words over is purely a matter of paper, having nothing in common with the laws of the language. We advise you not to trouble yourself at all about the division of words and to do it just as is most convenient for you in the given case. Even if you were to divide, for example, «aparteni-s», we would see nothing irregular in it, although other languages (entirely without any logical reason) do not permit such a division.