Glossary Updated

This post describes a recent process to update the glossary found on this blog. I believe a reader should know how a glossary is assembled in order to know how much to trust its accuracy, so I’m trying to be as transparent about the process as possible. Furthermore my glossary has two “biases”: 1) it is aimed at terms found in Spain, not any Spanish term from anywhere, and, 2) I (mostly) only include terms I’ve actually found on the hundreds of menus from restaurants in Spain I’ve collected and analyzed to create a highly curated corpus. So while considerable effort went into constructing the glossary, naturally it still has errors as it was manually compiled. But I believe it is one of the better and more exhaustive glossaries you’ll find, at least for free on the Net.

After eight more days of work since my post about this effort I decided to call it “done” and update my glossary page as version 4.0. The glossary gained about 150 items, had numerous errors corrected (especially spelling, especially accents), had some definitions changed or enhanced, and adopted my “syntax” to show all the forms of a word under a single “lemma” (just learned this term from linguistics).

Despite all the work I did there are still mistakes, omissions, inconsistencies in the lemma representations and other errors. This is the challenge of manually editing a large amount of material, even while trying to be very careful. Each time I do this manually I learn a bit more about how I’ll have to create the software to create and manage a properly curated corpus which I’ll need for my translation application.

Not every term in this glossary is really a “translation” to English as often there is no translation. So instead, based on terms I have found in the many menus from Spain restaurants that I’ve analyzed as the “raw” data, I have sometimes had to supply a description instead of either a “definition” or a translation. For instance, I researched and added most of the names of grapes used in Spanish wines, olives used in tapas and cheeses used in various dishes. While one might translate Cabrales as “blue cheese” this isn’t that helpful so descriptions work better.

So almost every term in my glossary I have found in menus. There are more terms in the various glossaries I’ve found and assembled but unless I actually see a term used in a menu in Spain I can’t be certain some term from some other glossary actually applies to Spain. Or, of course, Spanish food terms in other parts of the world may mean something entirely different than they do in Spain and so I’m trying (as best I can) to focus on the vocabulary one would encounter in Spain.

I may do some more “fixes” or additions to this glossary but I don’t expect to do another major revision. As it is this is now one of the largest glossaries you’ll find anywhere on the net (and perhaps the easiest to access, just a single, albeit long, webpage, not some more complex access scheme). So while this glossary, like anything you find on the Net, is easily available one should ALWAYS be somewhat skeptical as the editor is human and makes mistakes, so check with authoritative sources for any terms that might really matter for you.


A look at my drill application

Since I’ve mentioned this in multiple posts I thought I’d provide a little more detail. Here’s a screen shot with some food terms.

Ugh, WordPress is hard to get images right, hope this looks OK after saving. Good, for some reason the image looks bad in WordPress’ post editor but I chopped the screenshot to fit and it looks OK after posting.

BTW: Spanish readers out there will note kokotxa in this list which is really Basque, not Castilian which would be cococha.

Anyway, the basic idea is to load a random (though biased to get most effective drilling) set of words and then I visually examine them. Most drills do some sort of “quiz” but this is for me so I just scan the list.

If I don’t instantly know the translation I click the word. That gives me a score of -1 (otherwise if I don’t click a word it gets a score of 0, for appearing but “known”). I don’t “cheat”, since this is just for me, so I don’t need a quiz.

But if I have the least bit of doubt I click and then I see the translation. Then I decide: a) if the click was a mistake, I click the Ignore button, b) if I thought I knew the answer but was wrong, I click the Wrong button and my score becomes -3, and, c) if I really didn’t know at all (or my “guess” was wildly wrong) I click the “no clue” button and get a score of -10.
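For readers who think in code, the scoring rules above can be sketched in a few lines of Python. This is only an illustration: the outcome labels and function names are invented here, and treating Ignore as a score of 0 is an assumption, since the rules above only say the click was a mistake.

```python
# Hypothetical outcome labels; the scores follow the rules described above.
SCORES = {
    "not_clicked": 0,   # word appeared and was treated as known
    "clicked": -1,      # clicked to see the translation, nothing further
    "ignore": 0,        # the click was a mistake (assumed to reset to 0)
    "wrong": -3,        # thought I knew the answer but was wrong
    "no_clue": -10,     # did not know at all
}


def score_word(outcome: str) -> int:
    """Return the drill score for one word in one round."""
    if outcome not in SCORES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    return SCORES[outcome]


def score_round(outcomes: dict[str, str]) -> dict[str, int]:
    """Score every word shown in a round; `outcomes` maps word -> outcome."""
    return {word: score_word(outcome) for word, outcome in outcomes.items()}
```

Calling score_round on a round's outcomes gives one score per word, which is the sort of record that Done would then store.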

After I’ve looked at all the words I click Done to record the results. Then I click Drill to get a new set of words (which is more likely to repeat words with scores other than 0). I continue as long as I can stand and then click Save (unless I’m just testing code) and the scores are then added to the XML database.
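One simple way to get a "random but biased" set of words is weighted sampling without replacement. The sketch below is not the application's actual selection logic; the function name, the weighting formula and the penalty_weight parameter are all assumptions made for illustration.

```python
import random


def pick_drill_words(scores, count, penalty_weight=1.0, rng=None):
    """Pick `count` distinct words, favoring those with worse (more negative) totals.

    `scores` maps word -> accumulated score (0 or negative).
    """
    rng = rng or random.Random()
    # Every word keeps a base weight of 1 so known words still show up sometimes.
    pool = {word: 1.0 + penalty_weight * abs(total) for word, total in scores.items()}
    chosen = []
    while pool and len(chosen) < count:
        words = list(pool)
        weights = [pool[w] for w in words]
        word = rng.choices(words, weights=weights, k=1)[0]
        chosen.append(word)
        del pool[word]  # remove it so the same word is not picked twice
    return chosen
```

With this weighting a word with a total of -10 is eleven times as likely to be drawn as a word with a total of 0, and raising penalty_weight makes the bias stronger.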

And if I’m sure I want to record the results then I can use the File menu item to save a new copy of the XML. The XML Editor and XML Update are what I use to fix issues in the database itself.

All the drill results are saved in another part of the XML (eventually making it very large, hurrah for having lots of RAM to have all this in memory – I come from the days when RAM was scarce and had to do lots of programming tricks, now I just brute force all this).

Then I have an analysis routine (WIP) to consolidate all the scores over all the drill sessions to find out which words are worst (lots of mistakes, therefore drill more) and which are best (few or no mistakes, so only drill after some time has passed).
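Since that routine is still a work in progress, the following is only a guess at its simplest possible form: add up each word's scores across sessions and sort by the average score per appearance. The data layout (one dict of word to score per session) and the function name are assumptions.

```python
from collections import defaultdict


def consolidate(sessions):
    """Combine per-session scores into one record per word.

    `sessions` is a list of dicts mapping word -> score for that session.
    Returns a list of (word, total_score, appearances), worst words first.
    """
    totals = defaultdict(int)
    seen = defaultdict(int)
    for session in sessions:
        for word, score in session.items():
            totals[word] += score
            seen[word] += 1
    # Rank by average score per appearance so a word is not penalized just
    # for being shown more often; ties are broken alphabetically.
    return sorted(
        ((word, totals[word], seen[word]) for word in totals),
        key=lambda rec: (rec[1] / rec[2], rec[0]),
    )
```

The head of the returned list holds the words to drill more, and the tail holds the ones that can wait until some time has passed.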

While I intend to create other types of drills this is “good enough” to have me looking at a fair portion of my vocabulary every day (todos los días) and thus keep refreshing my wetware memory. I can’t do this very long (so the magenta number on the screen shot is a timer of how long I’ve been doing drills, rarely do I exceed 20 minutes) because I’ll start having “short-term” memory (since my mistakes are more likely to repeat in the drill, by design) and so I begin to “know” them, but not really.

I’m focusing the drill (really the way I’ve created the XML database) on recognizing the Spanish, since, again, my goal is reading menus, not writing them. So my database is (now) poorly structured for doing English drills, which is harder than the Spanish drills, but more useful if I need to be able to ask questions about the menus.

And of course this is all “written” rather than spoken drills and to be really helpful I actually need to know how hablar a camarero but I’m getting there.

Back to menus; a big project

My primary purpose for this blog is to record my progress in developing an application to translate menus in Spain. I worked diligently on this for about nine months but then got into some side-trips in other projects. But now I’m trying to get back to that primary objective.

For 78 days now I’ve also been trying to actually learn Spanish via the nice online application, Duolingo. While this diverted me from my primary task it has been useful. My sister always thought my idea was silly and that instead I should just learn the language. That’s not a bad idea but it looked harder (and more time consuming) than my primary limited work just to read menus, based on the assumption I’d soon be heading to Spain to tour along the route of the Camino de Santiago. Therefore I needed results sooner than I could learn the language.

To build my application I’d first need a large corpus of terms from menus with accurate English equivalents. To do that I’d import the text from websites into a working document and crunch through all the terms. Often that gave me some interesting observations that I was converting to posts, hopefully also interesting to my readers. Obviously there are going to be mistakes in manually collating data so my corpus needed to be carefully curated, with the terms and my “guesses” at translation with a “confidence” factor. Then via the large corpus I could extract the accurate equivalent Spanish to English translations I’d need for the application.

That’s a long slog so a couple of times I went ahead and created a minimally curated “glossary” which I have as a page here at this site. In my searches I found a number of glossaries, or even dictionaries in Spanish, covering food. Years ago when I first got interested in these I just extracted all the glossaries I could find and manually collated them into a single glossary. It was a mess!

The trouble is that food terms in Spanish (my searches) yield results that either don’t apply to Spain’s food dialect or were just wrong. After all any other person who compiles glossaries makes mistakes too. Or I’d make mistakes extracting and collating them. And my lack of any fluency in Spanish meant I often misinterpreted the raw material I was attempting to organize. That previous experience convinced me I needed to be very precise about collating material AND focused on Spain as the source of the raw material and so my idea about creating a corpus evolved.

But in nearly a year I still don’t have that corpus. And without it I can’t build my application. And in the meantime I needed to get some “drill” code done since I reached the point where I was forgetting more than I was learning. And while Duolingo is fairly good for learning Spanish it’s not as good for repeating previous lessons (and their vocabulary). And repetition is the key to learning a language. So I found myself forgetting vocabulary I’d once before acquired.

So I set out to build a drill application, which has some of the same elements I’d need in the translation application. And like compiling glossaries I’ve done this also, in the past – the first time for Italian food terms. So I’ve built drill programs before with only limited success.

The key to a drill program is to be efficient and force me to do repetitions of the vocabulary I know the least well. That’s harder than it sounds. Plus most of the types of drill I did (glorified flashcards, a common language learning technique) took so much time that as my vocabulary grew my repetition, of any particular word, got less and less frequent. Even with an hour a day I could only repeat a fraction of the vocabulary I’d acquired.

So I had some ideas how to improve this and make the drill more efficient. But I needed data even to do the programming. So I fairly quickly assembled the glossary I posted at this blog without being too concerned about its accuracy.

So with that lengthy background now I can describe what I’ve more recently done and the “big project” I’m now doing. I built my first version of the drill application centered around the Duolingo vocabulary. As I’d do each lesson I would fairly carefully assemble the “database” (a complex XML) to feed the drill program. For my Duo vocabulary that now contains about 1100 “terms” and 1400 “forms” of those terms. By forms I mean the usual four spellings of adjectives (in Spanish both gender and number) and the first set of conjugations for verbs. Getting all that going for Duo vocabulary drills got me a fairly useful and efficient drill program which is helpful as a supplement to Duolingo.
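To illustrate what the "forms" of a term look like, here is a small sketch that generates the forms of a regular adjective from its masculine singular. It is a simplification written for this post, not the code in the drill program: it ignores nationality adjectives such as español/española and any accent changes between singular and plural (inglés/ingleses).

```python
def adjective_forms(masc_singular: str) -> list[str]:
    """Return the forms of a regular Spanish adjective, given its masculine singular."""

    def plural(word: str) -> str:
        if word[-1] in "aeiou":
            return word + "s"
        if word.endswith("z"):
            return word[:-1] + "ces"
        return word + "es"

    if masc_singular.endswith("o"):
        fem = masc_singular[:-1] + "a"
        # Order: masculine singular, feminine singular, masculine plural, feminine plural.
        return [masc_singular, fem, plural(masc_singular), plural(fem)]
    # Adjectives such as "verde" or "azul" share one form for both genders.
    return [masc_singular, plural(masc_singular)]
```

So rojo expands to rojo, roja, rojos, rojas, while verde only has verde and verdes.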

So then using that code and crunching the glossary I’d assembled here I started on the food terms. And that was a bit of a mess because the glossary sucked.

So to fix this I went back to my 30 or so working documents of all the menus I’d processed. Rather than the more difficult chore of extracting material for a well curated corpus I just quickly (a couple of days) just extracted all the accumulated Spanish. That’s a tedious chore but it does reveal some of the problems of getting “raw” material from the websites. Naturally I found lots of spelling mistakes (easier for me to recognize now that I know a little Spanish) but also the inconsistencies in gender and sometimes number. Also many instances of words are very inconsistent on the use of accents in the Spanish words. My Duolingo study also let me learn the rule that accents sometimes change (for real, not typos) in certain circumstances.

So once I’d compiled all my “words” from all menus I had about 10,000 “raw” bits that I was able to clean up, de-duplicate and consolidate (like all the forms of adjectives under a single “term”) and ended up with about 5500 lines.
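Much of that clean-up is judgment, but the mechanical part (finding spellings that differ only by capitalization, stray spaces or a missing accent) can be sketched in code. The version below is an assumption about how one might do it, not what I actually ran. It strips only the acute accent, so ñ and ü stay distinct, and it leaves the choice of the correct spelling to a human.

```python
import unicodedata
from collections import defaultdict

ACUTE = "\u0301"  # combining acute accent


def match_key(term: str) -> str:
    """Lowercase, collapse whitespace, and drop acute accents (keeping ñ and ü)."""
    decomposed = unicodedata.normalize("NFD", " ".join(term.lower().split()))
    stripped = "".join(ch for ch in decomposed if ch != ACUTE)
    return unicodedata.normalize("NFC", stripped)


def group_variants(raw_terms):
    """Group raw spellings that differ only by case, spacing, or acute accents."""
    groups = defaultdict(set)
    for term in raw_terms:
        cleaned = " ".join(term.split())
        if cleaned:
            groups[match_key(cleaned)].add(cleaned)
    return {key: sorted(variants) for key, variants in groups.items()}
```

Any group with more than one variant is a candidate for a spelling or accent fix.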

Then in a separate process I took the latest (v3.3) copy of my glossary and then combined that with about six other glossaries. That was a chore and resulted in about 4000 entries.

So then I combined these, all the glossary “words” and all the menu “words” and started going through all that by hand. I’m now done with everything through M (since I sort all 9000 or so lines into alphabetic order). I’ve done a few hundred “fixes” to my glossary and about 100 additions. But more importantly all those changes are in my XML “database” for the drill program. With a bit of code I can then extract from that XML to create text I can paste into the glossary page here.
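As a rough idea of what that "bit of code" could look like, here is a sketch using Python's standard xml.etree.ElementTree. The element and attribute names (vocabulary, term, lemma, form, english) are invented for this example; the real XML is more complex and is not shown in this post.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the element and attribute names are assumptions.
SAMPLE = """
<vocabulary>
  <term lemma="queso">
    <form>queso</form>
    <form>quesos</form>
    <english>cheese</english>
  </term>
  <term lemma="asado">
    <form>asado</form>
    <form>asada</form>
    <english>roasted</english>
  </term>
</vocabulary>
"""


def glossary_lines(xml_text: str) -> list[str]:
    """Turn <term> elements into 'lemma (other forms): english' lines, sorted."""
    root = ET.fromstring(xml_text)
    lines = []
    for term in root.iter("term"):
        lemma = term.get("lemma", "")
        others = [
            form.text.strip()
            for form in term.findall("form")
            if form.text and form.text.strip() != lemma
        ]
        english = (term.findtext("english") or "").strip()
        label = f"{lemma} ({', '.join(others)})" if others else lemma
        lines.append(f"{label}: {english}")
    # Plain case-insensitive sort; accented letters would need a locale-aware sort.
    return sorted(lines, key=str.casefold)
```

Joining the result of glossary_lines(SAMPLE) with newlines gives text that can be pasted straight into a page.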

So when I’m finally done with all that tedious manual work I can update my glossary and it will be a big change so I’ll make that the v4.0 version which I believe will be quite a bit better than my current v3.3 but not as good as a curated corpus needs to be. And, really my glossary will then mostly contain words that exist in reference sources (several online dictionaries I use) and/or reconciliation with the other glossaries I found.

Please note, therefore, that my work product is derived from many sources combined with my own editorial work and thus constitutes “original” work. I’m quite conscious of never (almost never) posting anything in this blog that would violate copyright, i.e. the wholesale use of someone else’s glossary.

And now all my material is synchronized – my XML database for the drill program, my derived glossary with reconciliation to other glossaries or reference sources, and I’m only including terms in either place that I’ve found in menus so my product is more closely aligned with Spain dialect and I can exclude other Spanish food terms.

Now, while that isn’t done, I’m back into the code for my drill program. In the case of my Duolingo vocabulary I feed into the drill program I (mostly) know that vocabulary by memory. Duolingo is divided into lessons (aka skills) that require 40 actual drills (to pass the skill and unlock the next one) which means about 800 individual drills. At Duolingo I’ve now done 16,843 “XPs” over 31 skills. On average each skill introduces around 30 words (forms actually). So when I do my “refresh my memory” drills with that vocabulary I have relatively few words I ever mark as uncertain, or worse, “I’m wrong” or “I’m clueless” (really forgot). That means all the scoring I’ve done with that vocabulary has relatively few “errors” and my aggregate score on most terms is 100%.

In contrast I’m much worse on my new food vocabulary. As I’d work on menus I’d “learn” many words, but I had almost no repetition of those (only the most common words appear on many menus, so that was my only repetition) and I’d done none of my own drills. Now that I have something to feed my drill program I’m getting a lot more “bad” scores. That’s good and bad. It’s bad because it means I don’t know those words very well, by memory. It’s good because now all the scoring of the drills I record in the XML has a lot more data than the drills on Duolingo vocabulary.

So that means back to programming. How do I consolidate tens of thousands of individual drills into some sort of metric that rates each word in the vocabulary as to how well I know it (and/or don’t confuse similar terms)? Because I want to drill myself on what I know the least. I don’t very much need to drill on carne or agua or cerveza or a few hundred other food words and I don’t want to waste the limited time I have for drills (even less than my free time because drill is tedious and I can only tolerate a certain amount each day). So those are now the algorithms I’m trying to develop so my drill program is even more efficient and therefore more useful.

So while I thought I’d be done with this by now I have probably another week to finish cleaning up my food vocabulary and enhancing my drill program. But once I’m done with that I can spend 15-30 minutes every day (or most days) so I get more of the food vocabulary into longer-term memory along with a growing Duolingo vocabulary. Thus I’d hope to have reasonable fluency within a few months so soon I may need to head to some Spanish speaking country to test myself.

Now, note, all this is “reading” (and less “writing”) Spanish. Hearing or speaking is an entirely different problem. But without mastery over much of the vocabulary actual conversation is pretty hopeless. I’d originally assumed I’d have no more audible Spanish than a few phrases and the rest I’d do through reading (plenty of time to study a menu, have to be fast to have conversation).

Now, finally, all this I’m just doing for myself, other than relating some hopefully “interesting” tidbits here in the blog. While I’ve built many software products over my working life all this I’m just doing for myself. But at least, as a derivative from this work, I do hope to end up with the best glossary for food terms in Spain here at this blog as my contribution to others who might need this.

 

Still chugging along the Camino, still learning Spanish

I’ve been so much buried in digressions I haven’t had any time to post. You might remember that my project, which is the primary subject of this blog, is to find as many menus as possible from restaurants in Spain, figure out what they “mean” (not just purely translate), and build up a corpus of menu terminology to drive the creation of an application to translate menus.

So much for that, as I haven’t been doing any of that for about a month. In addition I continue to do stationary exercise in my basement to try to stay in shape and/or control my weight (lose a little ideally) and potentially build up to a real walk. So I take my mileage on a treadmill and convert it to a location along the Camino (the French route). While I’ve kept up exercise I’ve meanwhile been digressing into another area that has interfered with my primary goals.

But nonetheless I can report that I’m now at mile 368.9, having covered 21 miles thus far in January. That may not sound like much, given most peregrinos can do 12-20 miles/day, but I’ve also done 480 miles in just January on a stationary bike, or roughly the length of the entire Camino.

So I had planned to do a post when I was around 344 miles, which is then near the cruz de ferro, which as Henri Sebastian (in the movie The Way) says is a place of much significance. For those of you who watched the movie or especially those of you who have actually walked the Camino you know cruz de ferro is a small iron cross at the top of a tall wooden pole with a bunch of pebbles at the base. The idea is that pilgrims carry a stone from their starting location and then deposit it along with a prayer. The location happens to also be almost the highest point along the entire route.

It all looks very quaint in the movie but looking at that location via my “virtual” walk (i.e. looking at Google Maps, satellite views and the geotagged photos Google shows; you can search for ‘cruz de ferro’ and see what I’m talking about, I don’t reproduce photos from online sources due to implied copyright) it’s not quite the same as the image of the movie. The site is near a major road and is surrounded by parking lots and picnic areas. The cross itself is unimpressive so only interesting due to its historical perspective. Plus visitors leave a lot of mess at the site so again it’s not so quaint.

Also in the movie a collection of rustic signposts is shown. It turns out that’s just a short distance from the cross in the town of Manjarín (you can search for this to see). It appears to be part of a somewhat bizarre albergue/bar near all those signs, the Manjarín Encomienda Templaria.  That too is a bit less quaint than the movie made it look. So much for fiction.

And this raises an interesting point that I couple with other observations. A “virtual” walk certainly isn’t the same as a real one, but I’ve “seen” enough to get a much better understanding of what the Camino is like. And, frankly, a lot of it isn’t that great. The people who have the spiritual connection to the route don’t care, but for merely a “tourist” who’d like a more physical experience than riding tour buses I now question whether I’d really want to ever walk the Camino.

Or at least the classic (aka French) route. So now I’ve begun to focus on the Camino del Norte route. What is still appealing to me is visiting the northern (Atlantic) coast of Spain, from France to Galicia. The country looks prettier (certainly greener) and I think the food would be better. Since my wife doesn’t want to do the walking as a compromise we’ll do part tourist stuff (driving, hitting hot spots like Bilbao) and then some more rural touring in the vicinity of the Camino del Norte and thus have some of the same experience.

But that’s in the future.  Now as to the digressions that are bogging me down.

My original idea was that I could merely focus on a mechanical aid to “translate” the written menus without actually learning Spanish. It’s not that I didn’t want to learn Spanish, I just saw that as too difficult. My sister (RIP) disagreed with my idea and said I should learn the language. So as I recently posted I’ve started to do that since I suspect some conversation with camareros  (waiters) would be required.

But I’m not going to fill this blog with many comments about my efforts. Any reader interested in that language has a lot better resources than I can provide. And my personal issues with it are mostly a digression so I don’t want to fill this blog with my adventures. But I’ll mention a bit.

As I previously posted I found what first appeared to be a good resource for learning a bit of conversational Spanish, which I do think I’d need to be able to order in restaurants. So I’m doing the Duolingo online study and have had decent results, thus far (up to about 600 words now, still struggling with verbs, of course). But as useful as Duolingo is I find that I fairly quickly master their “skills” (aka lessons) but then almost as fast forget most of what I learned. Without repeating some of the vocabulary (or having some other way to practice) I forget.

So, naturally, given an entire lifetime of developing software I began to think about building my own drills. I’ve done this before, several times in fact. Basically I’ve built software “flash cards” but with “intelligent” repetition, where I’ve developed some, not so good, algorithms to maximize drill on the vocabulary (or to some degree grammar) on what I’m not getting. Now learning vocabulary and grammar are helpful but speaking, and worse, hearing Spanish is tough. Duolingo helps a bit for hearing, but Spanish is a language my ear/brain simply don’t get. First of all, most Spanish speakers speak really quickly (this, I’ve found from online sources, is well known in comparison to other languages). And even with Duolingo, the full speed recorded sentences that I have to either translate or simply write what I hear, I miss lots of little bits. I have a terrible time hearing the gender or verb tenses which can be critical. I figure I can botch my pronunciation, as well as gender or conjugation, and probably still be understood, but hearing any response is really going to be tough. But the better I know the vocabulary, without a big mental delay to translate in my head, the more likely I can understand the spoken part. Fortunately there are many Spanish language TV channels in my cable subscription, often with good subtitling, so I have some opportunity, beyond Duolingo, to “practice” hearing, which will be more important to me than actually speaking well.

So, of course I started working on my own software to supplement Duolingo. That does have advantages over just using online courses. To write software one really has to understand some of the structure of the language (“teaching” something to a computer is a good way to find out what I do and don’t understand). So, for instance, I just finished, after considerable study and coding, how to do all the conjugations of regular verbs. And I’ve extracted all the vocabulary I’m learning in Duolingo to put into drills as well. So, IOW, I’ve switched from learning about menus to learning the language to writing code to help me learn the language. Hence, the “digressions” that have diverted my time from my original goal.
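To give a flavor of what "teaching" conjugation to a computer involves, here is a minimal sketch for the present tense of regular verbs only. It is not the code I wrote (which covers more than this); the names are made up for the example, and irregular or stem-changing verbs such as tener or querer would come out wrong.

```python
# Present-tense endings for regular verbs, by infinitive ending.
PRESENT_ENDINGS = {
    "ar": ["o", "as", "a", "amos", "áis", "an"],
    "er": ["o", "es", "e", "emos", "éis", "en"],
    "ir": ["o", "es", "e", "imos", "ís", "en"],
}
PERSONS = ["yo", "tú", "él/ella/usted", "nosotros", "vosotros", "ellos/ellas/ustedes"]


def conjugate_present(infinitive: str) -> dict[str, str]:
    """Conjugate a regular verb in the present tense, keyed by person."""
    stem, ending = infinitive[:-2], infinitive[-2:]
    if not stem or ending not in PRESENT_ENDINGS:
        raise ValueError(f"not a regular -ar/-er/-ir infinitive: {infinitive!r}")
    return {
        person: stem + suffix
        for person, suffix in zip(PERSONS, PRESENT_ENDINGS[ending])
    }
```

For hablar this produces hablo, hablas, habla, hablamos, habláis, hablan, and every one of those becomes a separate "form" to drill.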

But I’m beginning to see the light at the end of that tunnel (plus my coding skills were rusty, so doing my menu translation app will now be a bit easier) and maybe I can get back to my original plan and more, hopefully, interesting posts about menus, instead of my experience with learning Spanish or writing programs.

So stay tuned when I get back on track.

 

Quiero hablar más español

It’s been quite a while since my last post. In addition to all the activities of the holidays I have continued, sporadically, to work on my project that is one of the subjects of this blog. So now I can report some progress.

As a reminder I am (slowly) working my way to develop a mobile application to translate restaurant menus in Spain. To accomplish this I am finding many menus from restaurants in Spain (only Spain to avoid Spanish terms from other Spanish-speaking lands). I translate these using machine translation (mostly Google Translate), then look for discrepancies in that translation and use either online dictionaries or Google searches to make better “guesses” about translation. Often terms on menus are not translated accurately (or at all) by machine translation.

Once I have accumulated enough raw data (a never ending process) I can create a corpus with Spanish terms and the best English translation I can produce with a “confidence” factor (expressed as a probability). Once the corpus is large enough I’ll write code to extract the best food related (and a few other terms) vocabulary with the highest confidence levels of the accuracy of the translation. Once the vocabulary is “complete” (again a never ending process) I can build my application and then test it on all the menus I’ve accumulated. I’ll judge how well I’ve done this by expecting my translation tool to work much better than other machine translations.
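The extraction step could be as simple as keeping, for each Spanish term, the English guess with the highest confidence and dropping anything below a cutoff. The sketch below assumes the corpus can be read as (spanish, english, confidence) tuples; that layout, the function name and the 0.8 cutoff are all placeholders, since none of this is built yet.

```python
def best_translations(observations, min_confidence=0.8):
    """Keep, for each Spanish term, the English guess with the highest confidence.

    `observations` is an iterable of (spanish, english, confidence) tuples,
    with confidence expressed as a probability between 0 and 1.
    Terms whose best guess falls below `min_confidence` are left out.
    """
    best = {}
    for spanish, english, confidence in observations:
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence out of range for {spanish!r}: {confidence}")
        current = best.get(spanish)
        if current is None or confidence > current[1]:
            best[spanish] = (english, confidence)
    return {
        spanish: english
        for spanish, (english, confidence) in best.items()
        if confidence >= min_confidence
    }
```

Lowering min_confidence trades accuracy for coverage, which is one of the knobs I’d expect to tune when testing against the accumulated menus.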

Fine, a useful exercise as someday I hope to actually need to do this while touring Spain, an indefinite “wish” for me. Being able to accurately translate menus, as well as having knowledge of Spain’s cuisine I’d be able to wisely select my choices.

But, my sister, who was quite dedicated to mastering Spanish, albeit focused more on Mexican cuisine, was critical of my approach. Instead of just building an application her strong suggestion was merely that I should just become fluent in Spanish. A fine idea, but one I find very challenging.

Several times in my past I’ve attempted (not very vigorously) to learn Spanish. Since I lived much of my life in California some fluency in Spanish is almost a necessity. I first tried, decades ago, using the best technology then available, i.e. cassette tapes and accompanying text. Ugh. That was a bust. Later as computer tutorials became more common I also tried those, initially using DVDs (as the sound source, later just online voice recordings). These attempts all failed for me.

Why? For one thing I’m not very good at foreign languages. While I studied both French and German in several years of school classes I never got very far with those. My first trip to Germany was a joke at how badly I could either speak or hear. My only real exposure to having to use French was in Québec, during the time when speaking French was a strong “political” issue. I had a bit more success with that partly because everyone, e.g. waiters in restaurants, insisted on French. My stumbling attempts were at least considered a sufficiently sensitive effort that I had some success.

But with Spanish I have a different problem. The sounds of the language are much more alien to my ear: I really can’t hear the words, especially since, it seems to me, native speakers speak very fast and to my ear the words run together. And my attempts at speaking were even worse than my attempts to hear and understand. So this has been very discouraging and so I rejected my sister’s urging to just actually learn the language. Additionally I had the joke running through my head that, after her years of vigorous effort, several other people judged her pronunciation atrocious, barely intelligible to a native Spanish speaker. If she couldn’t do it how could I possibly succeed?

BUT, in my effort to translate menus I’ve also found a serious stumbling block. Even with English menus often I need to have some conversation with the server to really understand the menu. And as I translated more and more menus I found this was even more true in Spain. Certainly discussing food with a knowledgeable server adds to the enjoyment of food (another lesson I learned from my sister who was more skilled at cooking than me and through example demonstrated how dining was more pleasant after discussing menu items in some detail).

So I happened to stumble on a new possible learning method. Just happening on an article on the Net about the best apps for “your new smartphone” (naturally timed with the assumption of Christmas gifts) I discovered Duolingo. Previously I’d done the demos with several of the subscription or purchased online tools with little success. But at least: a) Duolingo was free, and, b) it was available for my phone and so I could do the exercises at any time, not just during some study time while on my computer.

So I downloaded the app (both to phone and multiple computers) and committed myself to really giving an earnest effort to learn at least some basic Spanish. Now, as best I know, traveling in Spain in the larger cities, especially those popular with tourists, probably doesn’t require speaking or hearing Spanish. When I visited Portugal I knew zero Portuguese but managed to get by OK (with some help from hotel staff making phone calls for me). And I managed to get by in both Japan and China, although with considerable help from the people I was visiting.

But my interest in visiting Spain is out in the countryside, initially focusing on the Camino de Santiago (the French route). Now I’m looking more at the Del Norte route since that part of Spain is more appealing to me than the dull plodding through country that looks a bit too much like the Great Plains or Central Valley of California. In such areas I would expect that at least some minimal conversational skill would be necessary. My hope would be: a) I could ask Spanish speakers to speak more slowly and thus hear each word, and, b) that my poor pronunciation wouldn’t prevent them from (mostly) understanding me.

So I’ve now worked as hard as I can on Duolingo. I strongly recommend this for anyone following my blog who might have the same need, especially as it is free (gracias to the community who create these lessons). I’ve made it through 12 days and 12 of the lessons. Duolingo requires a LOT of repetition and thus this forces me to work hard enough at estudio that I actually have made some progress.  Even the sentence I used as the title of this post would have been impossible for me prior to Duolingo.

In the first part of each exercise Duolingo introduces one to vocabulary (and without the more academic approach to grammar, i.e. simple conjugation of verbs). Then the exercises move more and more to responding to spoken phrases or sentences by: a) writing what was said in English, and, b) much harder, writing what was said in Spanish. Each exercise gets steadily harder making it difficult to “guess” and thus requiring actually learning something, especially when one has to actually type the Spanish (from an utterance), and especially since it is picky about getting gender and verb conjugation right. The sheer repetition is working for me.

Despite my best progress ever attempting to learn Spanish I still find it difícil to “hear” an utterance spoken at full speed. I often either cannot hear the spaces between words or miss subtle bits (I really have trouble hearing una vs un). But since I must get every drill question right before I can proceed I muddle through. So thus far Duolingo reports I’ve now encountered 308 words (many useless for my purpose; also they count each version of a verb as a separate word). As far as verbs go I’m still only in the present tense and with the singular persons (figuring out that usted is third person like él or ella was fun since Duolingo mostly uses the informal second person tú as ‘you’, which often would be rude for me to use in conversation).

While Duolingo focuses on conversation instead of the typical more “academic” language study (all the grammar details, especially conjugations) I’ve done more exploration with other tools (especially spanishdict.com and Wikipedia) to go beyond the Duolingo simple lessons. I’m accumulating some of my own “lessons” to supplement the Duolingo lessons.

Now another challenge for me is that I’ve also learned, in past language learning efforts, that I’m fairly good at immediate duration memory. So while I’m intensely involved I learn to recognize many words. Unfortunately weeks later I’ve forgotten most of those. So, with Duolingo I actually repeat finished exercises to continue repetition which is key.

BUT, repeating everything is time-consuming and not that helpful. The real repetition I need to do is the vocabulary (or sometimes grammar) that I do badly. So now I’m thinking about another bit of programming for my own learning tool.

Once before I built a fairly complex bit of code to extend my English vocabulary. Using something built into Kindle I would mark English words that I either didn’t know at all (like reading more “academic” texts that use more esoteric vocabulary) or that I wasn’t really sure about. Kindle had a drill application that accumulated the words I’d mark as I encountered them in some book. But the Kindle drill, like Duolingo, wasn’t very “smart” about focusing my drill time on the words that gave me the most trouble. So in my own app I developed a scoring system that adjusted my drill to the words I most often missed and also then made sure all but the easiest (for me) words were at least repeated some. I spent a lot of time tuning how that algorithm worked but never was completely satisfied with it.
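A minimal sketch of that kind of scoring, assuming each word carries a simple record of misses and attempts. The names drill_weights and pick_words, the smoothing formula and the floor value are all hypothetical choices for illustration, not the algorithm from the original app. Words missed often get a higher chance of being drilled, while the floor keeps even the easy words coming back occasionally.

```python
import random


def drill_weights(stats, floor=0.1):
    """Map each word to a selection weight from its (misses, attempts) record.

    stats: dict of word -> (misses, attempts). Hypothetical data shape.
    floor: minimum weight so that easy words are still repeated sometimes.
    """
    weights = {}
    for word, (misses, attempts) in stats.items():
        # Smoothed miss rate: a word never drilled starts at 0.5
        miss_rate = (misses + 1) / (attempts + 2)
        weights[word] = max(miss_rate, floor)
    return weights


def pick_words(stats, k, rng=random):
    """Pick k words for the next drill, favoring the ones missed most often."""
    weights = drill_weights(stats)
    words = list(weights)
    return rng.choices(words, weights=[weights[w] for w in words], k=k)


if __name__ == "__main__":
    history = {"cecina": (8, 10), "vino": (0, 10), "puerro": (0, 0)}
    print(drill_weights(history))
    print(pick_words(history, k=5))
```

Sampling with weights (rather than always drilling the worst words first) is one way to avoid the drill becoming the same few words over and over; tuning the floor is the part that would likely need the most experimenting.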

So with Duolingo as a model (incomplete for what I need) and all my past efforts at learning languages I soon will begin to build my study app (a fancy version of the classic flashcards, especially for verbs and gender). I can move all my Duolingo vocabulary to that app, plus much of what I’ve accumulated from menu study, plus just grabbing more words not found in either source from either: a) various lists I’ve found of the “most common” Spanish words, or, b) from going through a couple of dictionaries, tourist phrase books and grammar books I’ve purchased for my Kindle.

Eventually I would expect my drill app to be sufficient to potentially get by in parts of Spain where I might not find any English speakers. One thing I have learned from my foreign travel is that travel itself (public transportation, getting directions) often requires speaking to people who don’t know English (say, unlike typical tourist destinations, i.e. city hotels, museums and restaurants).

But all this is just a start. I know, largely from my experience in Québec that “immersion” is the real way to learn a language. To be someplace where there is no English mandates that I at least stumble through some sort of conversation to get what I need. Mi esposa loved her weeks in Oaxaca and wants to go back (which I’ve resisted) so perhaps I’ll give in and make the trip she wants as preparation for Spain (just as Québec can be a shorter preparation trip for going to France).

So, I won’t belabor this point much more in posts since I’ve focused this blog on food in Spain and the Camino. My efforts to learn a language are probably even more boring to my readers. But I will supplement some of my posts purely about food terms with a bit more of the conversational stuff I pick up through this other study.

 

 

a consultar about cecina

Even though I’ve now marched past León on my virtual trek I’m slowly plodding through the restaurant menus I found there. One menu, for the restaurant attached to Royal Collegiate of Saint Isidoro Hotel, has an English version as well as the Spanish. This is relatively rare and provides a unique opportunity to compare online machine translation of Spanish to the same material written in English. Of course, and as I found, the English text on a webpage may be different than the Spanish; after all it is aimed at a different audience and probably is not just a translation from the Spanish. Nonetheless a careful analysis may provide some interesting clues.

So I’ll start with a menu phrase, a consultar, which appears in three places (Spanish in first column, Google Translate in second, English from the website in third):

Pescado del Día (a consultar) | Fish of the Day (to consult) | Fish of the Day
Postre del día (a consultar) | Dessert of the day (to consult) | Dessert of the day
Domingo: Arroz / Fideuá (A consultar) | Sunday: Rice / Fideuá (On request) | Sunday: Rice / Fideuá (To consult)

Now consultar is a typical Spanish verb which has various meanings (the sense of each literal translation is given in parentheses):

  1. to consult (to seek advice from) (to refer for information to)
  2. to discuss with (to talk about)
  3. to look up (to look for)

or (Google translations of the Spanish definitions in parentheses)

  1. Pedir información, opinión o consejo sobre una determinada materia (Ask for information, opinion or advice on a certain subject)
  2. Buscar información en una fuente de documentación (Search information in a documentation source)

Note that Google translated this differently, as either ‘to consult’ or ‘on request’. Now to my sense ‘on request’ makes less sense, both compared to the dictionary definitions and because por encargo is more common on menus for ‘on request’. Unfortunately the author of the English part of the website doesn’t provide an English equivalent in two cases and uses ‘to consult’ (the most literal translation) in the third.

So we’re really left without a good English equivalent. I would submit ‘ask your server’ as the common phrase you’d see in the USA for these items. IOW, X del día is the counterpart of the common USA phrase ‘X of the day’ (though it is less common in Spain). In most cases it means what the chef was interested in making today or what ingredients might have been available. So the customer can’t know, from the menu, what the item is and thus has to ask (btw, I don’t think this is the same as the “specials” often rattled off by servers so that wouldn’t be my preferred translation).

So if I’m right (and I am getting the context right, if not the translation) this presents another interesting flaw in my project. There is NO way to read the menu and determine what this item is – you will have to speak to the server or the chef to find out and, of course, that requires some amount of fluency in both speaking and hearing Spanish (perhaps some type of aided communication app on a smartphone might work but it’s unlikely the server would know how to use it; I tried this in China and totally confused a cab driver). My sister dismissed the idea of my project in favor of just learning to speak and hear Spanish conversationally and maybe focusing a bit more on restaurant and food vocabulary. I think this is a fine idea, but: a) it takes a lot of work I’d prefer software to do, and, b) I’ve actually tried and for some reason, despite modest fluency in a couple of languages other than English, I just cannot hear Spanish (the sounds and the speed really confuse me; I watch movies with subtitles and rarely “hear” even words I know and know, from the subtitles, were in the audible portion). And, like the jokes some more fluent Spanish speakers made about my sister, my pronunciation would be awful and at minimum irritate a native Spanish speaker or very likely totally confuse them. So I have to try to continue on my path of using software (not brainware) to navigate menus. Perhaps I’ll just have to skip the del día items or perhaps see them on another table and point.

So on to cecina.

This is a common item on menus I’ve encountered before but it tends to be featured more on menus in Castilla y León. In fact this geographical connection is so strong there is also the specific Cecina de León, an IGP (Indicación Geográfica Protegida, the EU protected geographical indication). This specific item even has its own website (https://www.cecinadeleon.org/) explaining how it must be produced.

It’s not actually a mystery what this is. For a long time it was unavailable in the USA, and it appears actual cecina from Spain is still not available there, but you can now buy an imitation made in the style of León online, where it is described:

Tender sliced cured beef with a deep red color and rich smoky flavor is León’s answer to jamón. This is cecina, a premium cut of beef cured with sea salt and smoked over oakwood with no preservatives. Cecina is Spain’s culinary secret, just as worthy of culinary acclaim as Spain’s famous hams. And like jamón, over thousands of years the people of Spain have transformed the curing of beef from a necessity to an art, creating a delicate, flavorful meat unlike any other in the world.

In another article I saw it described as ‘chipped beef’ which would possibly be close but is certainly an insult to this seriously expensive dried meat.

So, what should the translation be? Or is this one of those terms, say like chorizo or lomo, that you just have to know what it is?

But Google thinks it has the answer. Most of the time (and often it doesn’t translate cecina at all) Google thinks it is ‘jerky’. While the official description of its elaboración (method/recipe of production) has various similarities to most recipes for making jerky, the best descriptions I can find suggest that jerky is not really equivalent.

So what does the English version of the menu at this restaurant say? Here are a couple of references, again with Spanish in first column, Google Translate in second and website English translation in third:

Ofrecemos servicios de corte de jamón/cecina, quesos artesanos al corte, cervezas artesanas… | We offer ham / cecina cutting services, cut artisan cheeses, craft beers … | We offer professional ham / beef jerky cutting services, sliced local artisan cheeses, craft beers and more.

Note that in this case Google didn’t translate cecina at all but the website does refer to it as ‘beef jerky’ and the human translation otherwise seems very close to the original Spanish.

And another reference:

Lunes: Salmorejo con Cecina IGP. | Monday: Salmorejo with Cecina IGP. | Monday: Salmorejo with Smooked Beef IGP.

Note that ‘smooked’ is in the menu itself as is another typo ‘Thuesday’ which certainly makes it look likely this is the work of a person.

And then our final reference:

Spanish:
El menú del cabildo es una
salmorejo de tomates de mansilla con cecina IGP, puerros de sahagun, escalibada de pimientos del Bierzo…

Google Translate:
The menu of the cabildo is a
salmorejo de tomates de mansilla with cecina IGP, leeks of sahagun, escalivada of peppers of the Bierzo …

Website English:
The Cabildo menu is a proposal ‘Salmorejo’ or cold-tomato soup made with local ‘Mansilla’ tomatoes and beef-jerky, ‘Sahagun’ leeks, ‘Escalivada’ or roasted vegetables on flat rustic bread and made with local ‘Bierzo’ peppers…

So here we see beef jerky again. So either the author believes calling it jerky will best describe it to an English speaking person or they had to use some dictionary lookup, which, btw, lists: ‘smoked’, ‘cured’ and ‘salted’ meat (each as a separate term when the elaboración explains ALL these steps are involved in creating cecina).

Now the imitation online stuff refers to cecina as “The “beef version” of jamón” and the picture shows a solid piece of meat whereas the elaboración  is quite clear the meat must be thinly sliced before any other processing so a solid ham-like chunk certainly doesn’t match the IGP definition.

And, finally, our sometimes reliable English version of Wikipedia adds this information in its description:

is made by curing beef, horse or (less frequently) goat, rabbit, or hare

Emphasis on ‘horse’, since I’ve also found this item on a different León menu: Cecina de Burro. Now burro might be a brand or a geographical reference but it might also be, in fact, its literal translation, ‘donkey’. Poor beasts, they work in the hot sun and when they’re worn out they end up on the table – no thanks.

So finally I might end up calling cecina “thin slice of mystery meat cured in salt, then dried (by heat or sun) and (usually, but not always) smoked”. So I think a consultar ties in nicely with cecina and strongly recommends spoken fluency to find out what you’re eating (or at least know the phrase ¿Qué animal es este de.

Too many menus, too little time

I’m only about five miles away from León (on my virtual trek, previously mentioned) where I’m bound to find a lot of online restaurant menus so I’ve been rushing to finish my list from the city of Palencia. I can work on the menus in bits and pieces, extracting and formatting the material into my source files and then analyzing the entries, doing lookups and searches on terms that machine translations handled badly. This isn’t easy and sometimes goes beyond the merely mechanical, but I can pick it up and put it down, thus squeezing this work into the nooks and crannies of my day.

But the real work, actually generating a corpus and then, even more, creating the software to collate all this and actually create a Spain food translator that is far better than the extant machine translations requires a really concentrated effort and so I’ve essentially done none of this. I have to remember what it was like to work hard all day long on this kind of task, day after day, as I did when I was in a real job of software architect. But I find I can never get around to this for a “fun” project.

In between is writing these posts. I can’t do that in bits and pieces either. While a post is a shorter task I still require some concentration and focus, plus usually even more research. But that’s the good part. My quick cursory analysis of menus is sufficient to find specific translation issues for posts and thus, wanting to get it right in the posts, the need for more careful research and conclusions. And even though this may only be a few hours it’s hard to get that hunk of uninterrupted time. So my posts have really been infrequent.

I write the posts as part of a discipline to do this work more carefully. Knowing someone might notice my mistakes and then (and I’d love it if they did) comment as to my mistakes forces me to be more careful. Plus, sometimes, I try to tell more story than just the translations and that even enriches my data collection more.

So posts are great to do (and hopefully of some interest to you, Dear Reader) but it’s hard to get them done.

I have material for at least six posts about the menus from Palencia that I’ve studied. I really hope I can apply myself and get these posts done before I start digging in León menus.

So here are some restaurants you might find interesting. There were 159 restaurants in my starting list but I only looked at the ones with real websites (the Facebook sites are useless to my purpose and frankly, IMHO, worthless to a potential customer). Many of the websites then have little information and especially lack menus. Then often the menus are in two formats I just barely can use: 1) just images (i.e. no text to extract from browser so have to manually transcribe, hard to do accurately) or, 2) PDF’s. While I can usually (not always) get text from the PDF’s it: a) takes a lot of manual post-processing to organize, and, b) then it’s not easy to get Google translations (I have to build my own temporary webpage from the extracted and processed PDF information to let Google chomp on it), and, c) using Microsoft’s translation within MSWord is both a bit clumsier and overall somewhat inferior to Google (although in some cases it is better as well).
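The temporary-webpage step can be sketched roughly as below, assuming the menu text has already been extracted from the PDF and cleaned up into one item per line. The function name build_translation_page and the output file name are hypothetical. The sketch only builds a bare HTML page marked as Spanish, so that a browser-based translator has something to work on; it does not call any translation service itself.

```python
from html import escape


def build_translation_page(title, lines):
    """Wrap extracted menu lines in a minimal HTML page, one paragraph per line.

    Blank lines are dropped and special characters are escaped so that the
    menu text cannot break the markup.
    """
    items = "\n".join(f"<p>{escape(line)}</p>" for line in lines if line.strip())
    return (
        "<!DOCTYPE html>\n"
        '<html lang="es">\n'
        f'<head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        f"<body>\n{items}\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    menu_lines = ["Pescado del Día (a consultar)", "", "Postre del día (a consultar)"]
    page = build_translation_page("Menú de ejemplo", menu_lines)
    # Hypothetical output file; open it in a browser and let the translator run on it
    with open("menu_temp.html", "w", encoding="utf-8") as handle:
        handle.write(page)
```

Keeping one menu item per paragraph makes it easier to line the translated output back up against the original Spanish afterwards.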

So my criteria for looking at restaurants in the following list has little to do with any sense of their quality or interesting cuisine. BUT, that said, usually I’ve found what appear to be the better restaurants often also have the better websites. I encourage them (not that any of them will be listening) to put more work in it. Perhaps for local clientele websites are not very important but for tourists I believe they’re beginning to be critical. I have another post about how I was persuaded to recently visit, even going out of my way, a particular restaurant in Ohio solely on the grounds of its website, although later learning it was also “rated” as one of the best in Columbus. And while pretty pictures of the food and glowing descriptions are nice online menus are far more important, again IMHO, for “selling” your restaurant to new customers.

So here’s the list I’ve processed, hopefully with stories to come when I can find the time for posts.

Bar Comedor El Garaje http://barelgaraje.es
Bar El Cobre https://barelcobrepalencia.es/
Casa Pepe’s http://casapepes.es/
Dominos (just wanted to compare to both US menus and local restaurants but some new vocabulary did appear) https://www.dominospizza.es/carta-de-pizzas
El Majuelo http://www.elmajuelopalencia.es
El Rincon de Istambul (interesting since they focus on Turkish food and so had non-Spanish items I had to look up) http://rincondeistambul.es
Gastrobar Donde Dani http://gastrobardondedani.es
Habana Cafeteria (interesting that a cafeteria has different selection which revealed some new terms) https://habanacafeteria.com
La Barra de Villoldo https://labarradevilloldo.com
Ponte Vecchio (interesting since they focus on Italian food and so had non-Spanish items I had to look up) http://www.pontevecchio.es
Restaurante – Cerveceria Las Hurdes http://cervecerialashurdes.com
Restaurante Asador Palencia La Encina http://www.asadorlaencina.com/es/palencia/
Restaurante El Brezo http://www.elbrezo.com
Restaurante La Cantara https://restaurantelacantara.com
Restaurante La Traserilla http://www.latraserilla.es/
Restaurante-Bar Maño http://barmaño.es
Restaurante-Cervecería Moesia https://moesia.es/

 

At home menu to translate

Sometimes one doesn’t have to leave home to encounter menus that need translation. In this case the menu is German, not Spanish, and in Omaha, Nebraska, not Hamburg, Germany, where the chef trained for several years. One of our favorite restaurants, Dolce, has an inspired chef, Anthony Kueper. He loves his usual menu but also loves to do special menus which he emails to his loyal fans.

In this case it turns out it was his wife’s birthday. And she is from Germany and much of her family came for her birthday. And Chef Kueper worked several years in a one-star restaurant in Hamburg (that gained its second star while he was working there). So it became his task to create a special menu, with wine pairings, for his wife and her family and then share it with his loyal customers.

Now frankly, originally I was completely unenthusiastic about this when my wife wanted to do it. I’ve made both business and recreational trips to Germany, and, well, uh, frankly, I wasn’t impressed with the cuisine. In fact, in my last trip for a week in Köln we ate most often at an Italian restaurant run by Bulgarians instead of the German selections.

But I was blown away by Chef Kueper’s dishes. As we were one of a few tables trying the special menu the chef came out to explain each dish. Of course, local is a big deal and it turns out that, via my wife’s connection to the state agricultural organization, we had actually visited several of these local suppliers. Being able to converse with the chef wasn’t critical to the meal (since the menu was fixed and we had no choices to make so translation didn’t really matter) BUT it certainly made the meal more interesting.

But the point of this post was my attempt to actually figure out what the menu items were! I know just a tiny bit of German but had little success reading the menu (like I got rotkohl and obviously spätzle). AND, critically, despite having considerable time between courses, using a smartphone and its available resources only helped a bit in deciding what the menu items were. When they actually arrived and were explained by Chef Kueper there was only a limited match to what I found on my phone: not contradictions per se (mostly) but just inadequate descriptions online. Had we had to make choices, especially with limited time to study a multi-item menu, it would have been tough.

Since my blog is about food in Spain a bunch of German translations are irrelevant but just for fun I’ll list the items in the excellent meal we enjoyed:

Jakobsmuscheln · Zwiebeln
Saffran Soße · fritierter Spinat

Königsberger Klopse
Servietten Knödel · Kapern · Sahne

Tafelspitzsülze · Frisee
sauce vert · Ei

Spanferkel · Spätzle
Rotkohl · Apfelgelee

Schwarzwälder Kirschotorte

You can have fun trying to figure this out. The second item was amazing and the spanferkel (local, a supplier we’ve visited) was outstanding.

 

Back to work – lists

As I don’t have any more travel planned I can get back to work, perhaps with a renewed effort. So I returned to looking at lists, at least three I’ve found and with more to go. Lists come as: plain translations of terms between English and Spanish, glossaries, and dictionaries, where dictionaries supply an actual definition and glossaries sometimes just provide a translation (where a literal one is possible) or a definition otherwise. The Net is full of these but using them can be a challenge. Also I’ve usually looked only at lists where the terms are Spanish but the translation or definition is in English. It’s more interesting, although more work, to get the lists entirely in Spanish. And ideally they would apply to Spain rather than anywhere Spanish is used.

So in my first attempt to build up a translation dictionary I only used lists I could find. It never dawned on me to use purely sources in Spanish and in particular menus, but of course machine translation has advanced a lot since my V1.0 attempt years ago so now sources entirely in Spanish and especially as applied to Spain are my primary sources.

But lists provide a lot of information in a hurry. And despite the issues they often provide terms that are unlikely to be found elsewhere. But the biggest issue is that whole thing of Spanish throughout the world versus Spanish gastronomy terms for Spain. As I’ve mentioned, tortilla is common in the western hemisphere but is something entirely different from what you’d get in Spain even if the menu does say tortilla patatas. Now where lists include New World terms not used in Spain it’s just a waste of time, at least for my purpose, to process them. But when terms conflict in meaning between Spain and elsewhere that is a problem.

So I’ve been crunching through three lists. Finding more lists is a lot easier (at least until I’ve found most of them) than processing the lists, especially when the lists are entirely in Spanish. Plus some types of webpages are hard to “mine” (also known as scraping when code is doing it). Web authors design pages to be most useful for their intended audience and not for someone accumulating a corpus. And even when I’ve processed lists I have to be careful with the whole copyright issue. If I published (except in the fair use case, i.e. a small sample with attribution) any substantial portion of any list I find that is improper. But since my real notion is accumulating a large corpus from many sources and then basing my final translation vocabulary on a meta-analysis of many sources I think I should be OK. Also whenever I only have a term translation from a single source I need to be suspicious of the accuracy of that as well.

So thus far I’ve looked at: 1) the Gallina Blanca Diccionario, from the website of a food company in Spain that produces packaged products for the Spanish market and supplies the diccionario to aid users of the recetas it also provides; this has Spanish terms and definitions in Spanish but does not apply, at least not exclusively, to Spain; 2) Nitty Grits, a glossary with Spanish terms and English definitions, not exclusive to Spain, but, as I learned after crunching through most of it, each term is clickable and often (not always) then indicates where the term is used; Nitty Grits is a large list and allows me to get fairly unambiguous definitions (since they’re in English) and avoid the often incorrect machine translations (such as occurred with Gallina Blanca); and, 3) ARecetas, a complex recipe website I’ve now returned to after doing some work on it back in May, which has multiple glossaries, especially the largest and most directly useful, Glosario de Alimentos. And there are more I’ve found but haven’t yet crunched through at all. Of these the ARecetas glosario is the hardest to process so I only briefly looked at it in May and instead focused on Nitty Grits. But for several months Nitty Grits was not operational (at first I thought they might have blocked me but that was not the case).

Anyway now I have more issues having finished two of these sources and now resumed work on the third. First, the way I’ve extracted information (often a tedious process) is inconsistent between the three lists (meaning the tables I created manually in MSWord). Second, my notation system was inconsistent, i.e. I annotated much of what I found with no particular notation to distinguish the original source text from my annotation. These issues mean I can’t possibly consolidate the three lists manually. So I had started some code to create a consistent format across all lists (in XML which is more robust than just text in MSWord with a few fonts and colors). I was able to do Nitty Grits fairly easily but ARecetas and GallinaBlanca are tougher, i.e. it’s not just code I need, but I have to go back to the manually compiled lists and use consistent inline markup so the code can parse all entries to the common XML I want for all three lists.
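To make the idea of consistent inline markup concrete, here is a minimal sketch that assumes a purely hypothetical convention: each entry is one line of the form "term :: source text {{annotation}}", with the annotation part optional. The element names (list, entry, term, source_text, annotation) are likewise made up for illustration and are not the actual XML structure used for the corpus. Lines that don’t fit the convention are returned separately so they can be fixed by hand.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical inline markup: term :: source text {{my annotation}}
ENTRY = re.compile(
    r"^(?P<term>[^:]+?)\s*::\s*(?P<source>.*?)\s*(?:\{\{(?P<note>.*)\}\})?\s*$"
)


def entries_to_xml(list_name, lines):
    """Convert marked-up lines to an XML tree; also return the lines that failed."""
    root = ET.Element("list", name=list_name)
    rejected = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        match = ENTRY.match(line)
        if match is None:
            rejected.append(line)  # needs manual cleanup of the markup
            continue
        entry = ET.SubElement(root, "entry")
        ET.SubElement(entry, "term").text = match["term"].strip()
        ET.SubElement(entry, "source_text").text = match["source"]
        if match["note"]:
            ET.SubElement(entry, "annotation").text = match["note"].strip()
    return root, rejected


if __name__ == "__main__":
    sample = [
        "cecina :: carne curada {{cured beef}}",
        "rancio :: vino oxidado, licoroso y seco",
        "broken line",
    ]
    tree, bad = entries_to_xml("Sample", sample)
    print(ET.tostring(tree, encoding="unicode"))
    print("needs fixing:", bad)
```

Keeping the source text and the annotation in separate elements is the point of the exercise: once they are separated, the same code can process every list regardless of how it was originally extracted.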

Now I need to finish ARecetas (and perhaps also some other smaller sites I found and also do a thorough job of searching) before moving on to the real world. Once I can convert each list, with my annotations and markup, to a consistent XML structure then I can attempt a “merge”. Once that is done I can then look for agreement or disagreement between the sources (as I processed them) and start fixing errors or doing more searching to get more accurate answers (although without wasting much time on non Spain terms).
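The merge and the agreement check can be sketched as follows, assuming each list has already been reduced to a simple mapping of term to definition. The function names merge_sources and classify and the sample data are hypothetical. Agreement here means an exact match after trimming and lowercasing, which is deliberately crude: two definitions worded differently will land in the "disagree" pile and still need a human look.

```python
from collections import defaultdict


def merge_sources(sources):
    """Turn {source: {term: definition}} into {term: {source: definition}}."""
    merged = defaultdict(dict)
    for name, entries in sources.items():
        for term, definition in entries.items():
            # Normalize the term so 'Cecina' and 'cecina' merge into one entry
            merged[term.strip().lower()][name] = definition.strip()
    return dict(merged)


def classify(merged):
    """Sort terms into single-source, agreeing and disagreeing groups."""
    report = {"single_source": [], "agree": [], "disagree": []}
    for term, by_source in sorted(merged.items()):
        if len(by_source) == 1:
            report["single_source"].append(term)  # treat with suspicion
        elif len({d.lower() for d in by_source.values()}) == 1:
            report["agree"].append(term)
        else:
            report["disagree"].append(term)  # needs more searching
    return report


if __name__ == "__main__":
    # Made-up sample entries, not taken from the real lists
    sources = {
        "list_a": {"Cecina": "cured beef", "tortilla": "omelet"},
        "list_b": {"cecina": "Cured beef", "tortilla": "flat corn bread"},
        "list_c": {"rancio": "oxidized wine"},
    }
    print(classify(merge_sources(sources)))
```

The single-source group matters as much as the disagreements, since a term that appears in only one list has nothing to be checked against.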

People who compile lists usually have some other work. They usually want to get their list with minimal effort to achieve their purpose. Simply put, this means they make mistakes, sometimes even blatantly obvious to simple analysis, sometimes more subtle. I’m well familiar with this from my career, a concept of “good enough”. No compilation of information is ever perfect anyway so it’s more a question of how good does it need to be for the intended purpose versus how much work (usually measured as cost since some paid person is doing the work). So online lists have many flaws. And it’s not just online lists. I’ve bought a few books about food in Spain back in my V1.0 effort and these books have inconsistencies and errors (where error means they disagree with other sources). I’ve looked and I’ve never found a “best” or even highly accurate and comprehensive source.

And that’s part of why I’m even doing this project. Unlike the other people creating materials, either free on the Net or in for-sale published works I don’t have a cost issue with my work. As I’m retired and unlikely to ever even be a temporary consultant the marginal value of my time, measured in money, is zero. Therefore I can spend an infinite amount of it trying to be as accurate and comprehensive as I can be, even (and that would be fun) doing original field research, i.e. actually going to lots of restaurants in Spain with some consultant I could hire who’d be fluent in Spanish and cooking (then the bills do add up). So at least my “free” effort is just a question of how much work I wish to put in it.

So I do believe, despite my lack of fluency in the Spanish language, it is feasible that I could compile the best list, meaning the most comprehensive and accurate. Of course my list would have mistakes too but I think it could be better than any I’ve seen. AND, if I write good code that does the bulk of the work consolidating the raw materials for my corpus and then extracting, I should have an easier time making corrections, especially as my targeted application is either machine-generated webpages or a smartphone app, i.e. updates should be possible once I actually get feedback (too many sites or apps fail to take advantage of the knowledge of their users to provide very valuable feedback to constantly improve the product, either its usability or its underlying database of Spain culinary terminology).

So I hope to get back into it and finishing these three lists would be a critical milestone because then I can really get down to designing my corpus and the code for importing and consolidating and proofing the information in the corpus.

cata de vinos

I’ve been spending a lot of (too much?) time trying to mine Spanish terms associated with wine. Discovering a large list of these is only somewhat useful for reading menus in Spain which is the primary purpose of my project. But sometimes you look where the light is, not where your keys are (this is a cliche in the USA, perhaps not obvious to others).

Anyway cata de vinos is not quite what it says literally. The literal translation is simple – ‘wine tasting’, something rather obvious that any of us do when we drink wine, at a restaurant or at a party or wherever. BUT, there is a more formal meaning which is spelled out in this Spanish language Wikipedia article.  This is the kind of tasting “professionals” do to write all those articles (or a description of a particular wine on a menu) in all that wonderful (and frankly somewhat snobbish) wine jargon.

Any kind of tasting that involves comparative analysis requires training but also requires a vocabulary that can be fairly precisely defined and used by different tasters in the same way. We amateur wine “tasters” often don’t really know these terms.

I was surprised to find a number of fairly detailed sources, in Spanish (both the terms and definitions) covering “official” cata de vinos. While many of these terms would not have a precise (or sometimes any) meaning to us amateurs it’s still worthwhile to attempt to dig them out.

So I’ve been at this for a long time since I found such rich and extensive, but difficult to process, sources. By now I’d hoped to provide a more complete post on this subject but I’m still not done so this is just a fragment to demonstrate some of the issues of decoding vocabulary like this, especially for a non Spanish-speaker.

The source I’ll discuss here is Vocabulario del Vino, which is reached by the Glosario tab at a site © 2011-2017 Enominer. Try as I have I can’t actually figure out who/what Enominer is! (no translation I can find) It is a web domain name as per https://www.enominer.com/ but it doesn’t have an About… page to actually figure out what this is. I suspect it’s a publisher of magazines about wine but that’s just a guess. The page name containing the glossary is diccivino.html which, again I’m guessing, I think is just a contraction of diccionario and vino. And in the many searches I’ve done trying to expand on the definitions here I seem to have encountered very similar lists at other URLs so despite the © at this site (no idea if it really is their copyrighted material or a copy from elsewhere) some/all of this glossary is published elsewhere on the web. Which, btw, doesn’t help me when I search only to find what I already have as text from this glossary. The sub-heading under the name at this site just says:

cultura del vino, desarrollo rural y ciencias de la tierra

Wine culture, rural development and Earth sciences

As explanation of their glossary the webpage explains that it is presenting a formal terminology.

Toda ciencia o materia cuenta con un conjunto ordenado y sistemático de términos y de su correspondiente significado.

La viticultura y la enología no son una excepción.

Aún siendo comúnmente admitido que la cata de vinos es una acción de los sentidos que aprecian sensaciones de aromas y sabor con un contenido más subjetivo que objetivo,
no es menos cierto que hay un conjunto de normas y reglas no escritas que permiten traducir las apreciaciones sensoriales que influyen principalmente en la cata de un vino (vista, olfato y gusto) en valores que pueden comprobarse de una forma objetiva.

All science or matter has an ordered and systematic set of terms and their corresponding meaning.

Viticulture and winemaking are no exception.

Although it is commonly accepted that wine tasting is an action of the senses that appreciate sensations of aromas and flavor with a more subjective than objective content,
it is no less true that there is a set of rules and unwritten rules that allow the translation of sensory appreciations that influence mainly in the tasting of a wine (sight, smell and taste) in values that can be checked in an objective way.

They divide their glossary into four sets:

Términos relativos al color (Color-related terms)
Términos relativos al aroma (Terms related to the aroma)
Términos relativos al sabor (Terms related to taste)
Otros términos (Other terms)

So I’ve been churning through these using both Google and Microsoft to do the translations. As a fragment of this work, here are a few terms (from the sabor/taste set under R):

rancio

Vino oxidado, licoroso y seco. Es un defecto en los vinos de mesa, pero no en los vinos generosos.

stale Rancio

Rusty, dry and dried wine. It is a flaw in table wines, but not in generous wines.

Oxidized wine, liqueur and dry. It is a defect in table wines, but not in generous wines. 

Purple text is the Google translation and black text is the Microsoft one (from the translator inside MS Word); in each pair above, Microsoft’s version comes first and Google’s second. Note that Google doesn’t translate rancio to ANY English word. This has been common in analyzing the cata terms, as many don’t seem to have a direct English equivalent and thus require a lot of research to make a guess. Microsoft picked ‘stale’. Looking at my usual two online dictionaries, spanishdict.com and Oxford, I get a variety of English terms for rancio: rancid (the obvious cognate), mellow (interestingly, this is the wine sense), ancient, long-established, stale (bread sense), antiquated, old-fashioned, sour and unpleasant. That’s a lot to choose from to decide what rancio means in the cata sense; IOW, how would a professional taster apply this term and, if they were also fluent in English, what English term would they use?

So we look at how it is defined. In the first phrase of the definition:

Vino oxidado, licoroso y seco.

Google and Microsoft differ significantly here. MSFT translates oxidado as ‘rusty’ (a valid literal dictionary translation) but Google uses the more appropriate ‘oxidized’. Even a somewhat amateur taster like me is familiar with ‘oxidized’ as a flaw in wine; rust is also a product of chemical oxidation but it’s not likely to apply in this case. Likewise for licoroso MSFT and Google disagree, and from my research I think both are wrong (although Google’s ‘liqueur’ is closer). licoroso is a concept that doesn’t really have a single English equivalent, only a definition, which is ‘strong; of high alcoholic content’.
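Spotting where the two machine translations part ways can be done mechanically. Here is a rough sketch in Python, using the standard difflib module, that lines up two translations word by word and lists only the stretches where they differ. The function names (tokenize, differing_words) are just mine for illustration, and the two strings are the Microsoft and Google renderings of the rancio definition quoted above. This is not how I actually did the comparison (I did it by eye); it only shows the idea.

```python
import difflib
import re


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def differing_words(translation_a, translation_b):
    """Return (a_words, b_words) pairs for each stretch where the two differ.

    An empty string on one side means that side has nothing corresponding
    to the words on the other side.
    """
    words_a = tokenize(translation_a)
    words_b = tokenize(translation_b)
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    differences = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag != "equal":
            differences.append(
                (" ".join(words_a[a_start:a_end]), " ".join(words_b[b_start:b_end]))
            )
    return differences


# The two machine translations of the rancio definition from this post.
microsoft = ("Rusty, dry and dried wine. It is a flaw in table wines, "
             "but not in generous wines.")
google = ("Oxidized wine, liqueur and dry. It is a defect in table wines, "
          "but not in generous wines.")

for ms_words, google_words in differing_words(microsoft, google):
    print(f"Microsoft: {ms_words!r:30} Google: {google_words!r}")
```

A word-level comparison like this only tells you where to look. Deciding which translator is closer (or whether both are wrong, as with licoroso) still takes dictionary work.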

So we still haven’t quite got this figured out, but the critical clue lies in the next sentence and the words vinos generosos. Both Google and Microsoft translate this literally (generous wines) BUT in this case this is a very specific word pair that really means a type of sherry, as explained in this source, which indicates generoso is a regulated term of the Consejo Regulador.

Now actually this issue (sherry versus table wines) has occurred many times in studying the cata vocabulary. I’ve learned that Spain is one of the world’s leading wine producers (by volume), in the same league as France and Italy and easily ahead of California (which to me, as a former citizen, is the US when it comes to wine). Simply put, the fortified sherry wines are quite different from the lower alcohol table wines and thus their taste, aroma (bouquet) and color attributes can be quite different.

So in this case this source is telling us that an acceptable (possibly desirable) taste in sherry is not attractive in table wines BUT it is hardly the same as rancid (I doubt this is good even in sherry) or oxidized or any of the other translations of rancio. So if I were forced to pick an English equivalent I would go with ‘mellow’/‘ancient’. And this shows the problem – these words don’t really describe this taste but none of the other translations do either.

In short, to understand the specialized vocabulary of cata de vinos you really have to have experience tasting, in Spain, in the context of all the wines available in Spain. It’s basically not possible to translate this over to English.

And since rancio looks a lot like rancid, a non-Spanish speaker who saw it as a term describing a wine would be unlikely to try that wine, which, according to this, is the right call if it is table wine but the wrong one if it’s sherry.

I had planned to discuss several other R taste terms but this post is already too long so I’ll merely mention one more:

retronasal

Es el aroma de menor intensidad que el olfato que se percibe por vía interna desde el paladar cuando respiramos por la boca con una pequeña cantidad de vino en la cavidad bucal.

Aftertaste Retronasal

It is the aroma of less intensity than the smell that is perceived by internal way from the palate when we breathe through the mouth with a small amount of wine in the oral cavity.

It is the aroma of less intensity than the smell that is perceived internally from the palate when we breathe through the mouth with a small amount of wine in the oral cavity.

Again the stuff in purple is Google’s Translation. Interestingly Microsoft actually picked a translated English word (aftertaste) for retronasal. But to my eye retronasal doesn’t even look Spanish at all and thus might be a loanword from English. In fact it is. But what does it mean? Actually finding a description of this in English wine tasting sources shows approximately the same thing as the translation (almost identical between Google and Microsoft) of the definition.

The funny thing is I didn’t know what retronasal meant BUT I’ve actually done exactly what its definition describes (if I was told this term I’ve forgotten it, but I don’t believe I ever knew it). Not long after moving to California, and just as California was becoming a major player in wine (hard to believe it once was poorly regarded, decades ago), I took a course on California wines and how to do tasting at a community college in the Bay Area. We were actually taught how to do this – take a sip, hold the wine in your mouth, open your mouth slightly and breathe in. The sensation one gets is entirely different from just tasting (mouth closed) or the aftertaste (breathing in after swallowing). And if you’ve ever watched a professional tasting you see the tasters doing this (and of course, also spitting out the possibly very expensive wines they’re tasting).

Anyway this diversion in my project has taken a lot of time and hasn’t provided a great deal of material to put in my corpus for my menu translation app but it has certainly provided a lot of opportunity to see challenges in translation.

So I’ll leave you, Dear Reader, with a couple of quiz questions.

aguja

Vino con contenido carbónico perceptible al paladar y visiblemente observado al descorchar la botella. El gas carbónico procede de su propia fermentación y da sensación picante y agradable

needle

Wine with carbonic content perceptible to the palate and visibly observed when uncork the bottle. Carbon dioxide comes from its own fermentation and gives a pungent and pleasant feeling

quebrado

Vino alterado por las quiebras, que afectan al color.

broken

It was altered by bankruptcies which affect the color.

What English equivalent would you use for aguja and quebrado?

And there are about 50 more of these just in this source!