A blogging dilemma

I’m using this blog (partially) to “document” interesting tidbits I encounter while doing research for my anticipated smartphone app to translate menus in Spain. That app needs to have a comprehensive and accurate dataset to use in the translation, not just the equivalent English term (which doesn’t always exist) but also some description. For example, what is sobrasada? Yes, it’s ‘sausage’ but saying that (or even ‘spicy pork sausage’) doesn’t tell you very much.

So I’m using various sources to build up a “big data” corpus which will have translation errors and other errors. But algorithmically I can extract from that corpus what I’ll need to power the app. But I have to build that corpus manually, often exploring “puzzles” I find in trying to figure out a proper equivalent in English for some culinary item I find in Spain (btw, I am focusing on Iberian Spanish and trying to prevent terms only found (or used differently) in the New World from defocusing my corpus).

So I’m doing several things with these posts. First they are a kind of journal (or lab notebook) for various translation/description puzzles I try to solve. While I have many MSWord files with the raw work the blog posts highlight some interesting (at least to me) bits. Second by writing for potential readers I have to work a bit harder to try to have my posts accurate and at least somewhat coherent (instead of the real-time stream-of-consciousness in my raw material). This more careful writing makes the posts better but does have a real downside – it’s SLOW. It might not seem like it to you, Dear Reader, but I probably spend more time writing a post about something interesting in a menu than it took me to decipher the entire menu. So at some point the blogging gets in the way of my work.

But the real “dilemma” I have is that I just don’t get the posts done at anywhere near the rate I’m discovering the tidbits I want to write about. And days later, when I go back over my raw data, I often can’t recreate my thoughts, or I discover I forgot to include links or definitions or whatever, and I don’t much feel like repeating my work.

My posts are fairly long which is good and bad. It’s good because I try to weave multiple points into a post, often with some background research. It’s bad, because the posts are probably too long for most readers’ attention spans and because I don’t get them done.

So every now and then I’m tempted to do short posts, literally one for each situation I encounter, rather than trying to organize multiple examples into a single post.

For instance, I’ve started looking at a new source. Previously I’d used menus I could extract from restaurant websites along the course of the Camino de Santiago, and several online glossaries and dictionaries. But I’d also stumbled on many sites (focused on Spain and entirely in Spanish) for recetas (recipes). These are more tedious to process but often contain information I don’t find elsewhere and therefore can stuff in my corpus so potentially less frequently used (in menus) terms are still incorporated.

So I just started a small trial to look at this recipe site. Under its recetas tab it has 14 categories, and under Pasta y Arroz (pasta and rice) there are 15 webpages with about 12-16 recetas per page. IOW, this is a lot. And every receta is presented on the webpage as a caption (to a photo) where I can use Google Translate and then manually produce a side-by-side Spanish and English pair, such as:

Ñoquis de calabaza y boniato con salsa de gorgonzola Pumpkin and sweet potato gnocchi with gorgonzola sauce

For this I’d extract for my corpus ñoquis (gnocchi), calabaza (pumpkin), boniato (sweet potato), salsa (sauce), and gorgonzola (gorgonzola). If I double check these term associations by looking in the Oxford dictionary or the DLE (more authoritative, but harder to use than Oxford) I could add these associations to my corpus with higher confidence levels. IOW, mistakes are bound to get into the corpus without a lot of checking, but I’m also hoping the “big data” type filtering will eliminate the spurious pairs.
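To make the “higher confidence” and “big data filtering” ideas a bit more concrete, here is a minimal sketch in Python. It is not my actual tooling; the function names, the weight of 3 for a dictionary-checked pair, and the threshold of 2 are all made-up values for illustration. The idea is simply that each Spanish/English pair accumulates a score, checked pairs count for more, and pairs seen only once unchecked get filtered out.

```python
from collections import Counter

def add_pairs(corpus, pairs, verified=False):
    """Record Spanish/English term pairs; dictionary-checked pairs count more."""
    weight = 3 if verified else 1  # illustrative weights, not tuned values
    for spanish, english in pairs:
        corpus[(spanish.strip().lower(), english.strip().lower())] += weight

def best_translations(corpus, min_score=2):
    """Keep, for each Spanish term, the highest-scoring English term at or above min_score."""
    best = {}
    for (spanish, english), score in corpus.items():
        if score >= min_score and score > best.get(spanish, ("", 0))[1]:
            best[spanish] = (english, score)
    return {spanish: english for spanish, (english, _) in best.items()}

corpus = Counter()
# Pairs pulled from the gnocchi recipe caption, not yet checked.
add_pairs(corpus, [("ñoquis", "gnocchi"), ("calabaza", "pumpkin"),
                   ("boniato", "sweet potato"), ("salsa", "sauce"),
                   ("gorgonzola", "gorgonzola")])
# One pair confirmed in a dictionary (hypothetical check).
add_pairs(corpus, [("calabaza", "pumpkin")], verified=True)
# A second, hypothetical source that partly disagrees.
add_pairs(corpus, [("calabaza", "squash"), ("boniato", "sweet potato")])

result = best_translations(corpus)
print(result)
```

With these made-up inputs only calabaza and boniato survive the threshold; the single-sighting pairs wait until another source (or a dictionary check) backs them up.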

But what I just described as the process in this post took me quite a bit more time than it did for me to extract the side-by-side pair (still tedious but relatively quick) and do a quick visual parsing (really looking for any terms that require more research). Note that while I have no fluency in Spanish I do know a bit about the grammar and thus know how to spot parts-of-speech and change the word order used in Spanish to my normal English and thus find the term-by-term association. This entry was simple to do and the only (slightly) interesting part is that the original ‘gnocchi’ does have a different word in Spanish but ‘gorgonzola’ doesn’t (and, as a somewhat interesting question, are these “Italian” words, or are they now so incorporated in English, at least by foodies, that we should consider them English words, known linguistically as ‘loanwords’?).

So of the first webpage of pastas this was the most interesting puzzle:

Escudella con sopa de galets, el plato estrella de la Navidad catalana Escudella (in Oxford as -dilla, but some searches appeared with this spelling; is it a typo? here? and on web?) with soup of galets (is this short for galettas?), the star dish of Catalan Christmas

but Oxford has it with a definition (didn’t have translation) in which case it was a specific dish

no, galets appears to be a type of pasta (shells) https://www.tienda.com/products/galets-nadal-pasta-sandro-desii-su-40.html

This is my raw entry. Since escudella and galets appear in the Google Translate output as the same words in English (i.e. not translated, or perhaps there is no translation), this is the type of thing I look for to do more research. When I merely asked Oxford for the translation of escudella it said that was missing. What it does show (helpfully) is close matches, and in this case I tried its suggestion of escudilla (which is ‘bowl’ and kinda seems to fit this recipe name). So you see the note I made to myself (in Oxford as -dilla, but some searches appeared with this spelling; is it a typo? here? and on web?) but that’s just a start. Since I’ve done this a lot I immediately used Oxford a different way; instead of asking for a translation I asked for a definition (of escudella) and it had this in Spanish (followed by Google’s English):

Plato que consiste en un caldo de carne y hortalizas, colado, en el que se cuece arroz, fideos u otro tipo de pasta; es un plato típico de Cataluña, comunidad autónoma de España. Plate consisting of a broth of meat and vegetables, strained, in which rice, noodles or other type of pasta are cooked; It is a typical dish of Catalonia, autonomous community of Spain.

Now I could immediately point out that Google’s translation of plato as ‘plate’ is not correct as plato also means ‘dish’ which fits better but that’s the typical kind of digression I get into that just makes posts take even longer.

Now meanwhile I thought I recognized galets. I did a previous post about the menu from a store selling cookies (as a bit of diversity from just restaurant menus). So I double checked by asking Oxford for the Spanish translation of ‘cookie’ (which it lists also as ‘biscuit’ in British English) and it has galletas (as I thought I recalled). So I thought this might be some colloquial term for cookie.

But now my “translation” ‘bowl with soup of cookies’ is pretty obvious nonsense and so no better than the untranslated correspondence. So, since this is a new source and I’d already discovered I could click on each receta and get a full page explanation (intro to the dish, ingredients, preparation) I began to see the flaws in my attempt to unravel this puzzle. As the recipe page itself is entirely in Spanish I have the same kind of puzzle, i.e. Google again botched some of the translation. But there is enough text, and importantly a picture, that I could try some searching and I found galets as an item I can buy online (I’ve often used this source in this project). These look (in both the recipe picture and the tienda picture) like fairly ordinary pasta shells (I don’t see what’s special about them) but pasta shells are pasta shells (except maybe tiny details) so now I’d know what I am getting if I’d picked this off a menu in a restaurant.

So finally I know both these words don’t have English translations so I’d want a different kind of entry in my corpus of a short description and then potentially a longer one. Thus a diner using my app could learn about this dish.
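As a rough sketch of what that “different kind of entry” might look like, here is one possible shape in Python. The class name, the field names and the display rule are all hypothetical (I haven’t settled on a format); the point is only that an entry can carry either a direct English equivalent or a short and a long description.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MenuTerm:
    """One corpus entry; field names are illustrative, not a fixed schema."""
    spanish: str
    english: Optional[str] = None   # direct equivalent, when one exists
    short_description: str = ""     # one-line gloss for terms with no equivalent
    long_description: str = ""      # fuller explanation a diner can expand

    def display(self) -> str:
        """Prefer the English equivalent; fall back to the short description."""
        if self.english:
            return f"{self.spanish}: {self.english}"
        return f"{self.spanish}: {self.short_description}"

calabaza = MenuTerm("calabaza", english="pumpkin")
galets = MenuTerm("galets", short_description="a type of pasta (shells)")
escudella = MenuTerm(
    "escudella",
    short_description="typical Catalan dish of strained broth with rice or pasta",
    long_description=(
        "Dish consisting of a broth of meat and vegetables, strained, in which "
        "rice, noodles or other type of pasta are cooked; a typical dish of Catalonia."
    ),
)

for term in (calabaza, galets, escudella):
    print(term.display())
```

A term like calabaza shows its translation, while galets and escudella show a description instead, with the longer text held in reserve.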

So there, you see what I mean. This post has taken me far longer than the original analysis. Yet it’s good (for my purposes, hopefully somewhat interesting to you, Dear Reader) to have this more complete explanation (I can re-read this post someday when I’ve completely forgotten this and have to resolve something in my app). But if I’d simply written this one item in the most brief form (to jog my memory later, plus at least some glue prose to make it read better than my raw notes) I would have gotten this done.

But it also means I’d probably have many more posts, which is a mixed benefit as well. So, IOW, there really isn’t a great answer.

So I have a solution. I can use categories to distinguish the posts that are really minimal and that I create almost immediately after doing the work for the corpus. These will really be post “fragments” but at least I get more recorded.

For instance, I was looking at a menu on Friday and its Menu del Dia was for Mother’s Day so I had in mind a post to create on the 5th. But instead I spent most of the day cooking for our Cinco de Mayo feast (and drinking a few too many margaritas). So I never did that post and now the “joke” of it is gone as its timeliness is past.

So I’ll continue to struggle with this, fragmentary and terse posts, or (sometimes too long) complete posts.


Where did I go?

I was generating fairly regular posts but then dropped out of sight for almost two weeks – what happened? Well I’ve been out of town and thus mostly offline, south to Oklahoma. It’s not that Oklahoma doesn’t have the Net – I was just busy and my work on food terms in Spain is on a computer back home so I had nothing new to post.

Oklahoma is a long and not very interesting drive from Nebraska with most of the distance in Kansas. To most people the variation in scenery is so slight they’d say it all looks the same (and it has some of the same dusty and dry character of the part of Spain now along my virtual trek on the Camino). But to those of us starved for something to see there is a difference, even several regional variations (e.g. the Flint Hills) on the drive, and it is easier to make that drive with brief excursions off the main route.

I was doing the trip to meet with a new attorney to finally start the process in Oklahoma to transfer my mother’s estate to her heirs. Her/our family has had a farm there for four generations. The farm isn’t much, as a farm. It served, many decades ago, as a subsistence farm for the family with most of its production for the family’s own consumption. Some cream and eggs got sold for cash to buy things. But industrialized agriculture, in the USA, has largely driven this type of farm out of operation. Today it serves just as grazing land for a tenant rancher. Much of the land in the immediate area is abandoned for agriculture.

But today the land grows something else – energy. On our 1/8th section (80 acres, which sounds large but is small in the USA) there is one wind turbine from a fairly large wind farm (just like the turbines one sees along the Camino, as Spain is more advanced in use of wind power than the USA). It was chugging away most of the time we were there (this is the windy and stormy time of year) and every revolution puts some cash in the pocket of landowners. Wind is new, oil and natural gas are old. The new and sometimes controversial technology of horizontal drilling and fracking has drastically increased production. So there is a new well, over a mile away on the surface, that has sent out its horizontal shafts under our land. And these horizontal wells, with a much larger collection area (than a vertical shaft), are also a nice income.

That is, if I can ever get the deeds settled. Back when the land was just for low value farming the legal standards of ownership records were lower. Today there is more at stake and so the standards are higher. Probably in any multigenerational ownership story, almost anywhere, there are gaps – some probate was never filed with the county clerk, some conveyance deed wasn’t properly signed or dated, or some change in marital status wasn’t recorded, or whatever. Everyone (local) knows Person X owns the land but, if challenged, these claims may not stand up. So I will have substantial legal bills and years of chasing lost documents to ever establish ownership (by my mother) which no one challenges. What fun!

Meanwhile the drive, as I mentioned, is fairly boring so we try to spice it up a bit with geodashing. Once upon a time there was no GPS (at all, then for a while it was only massively expensive military technology). I happened to work next door to Trimble, who developed the first civilian GPS technology (later made more affordable), and so I learned of GPS before most people. So when commercial GPS was new and just barely available to the public it was a novelty and a number of “games” evolved using GPS. Geocaching is the best known. For a while everyone wanted to rush out to those spots on the globe, known as confluences, where the GPS would read XX.0000 and YY.0000.

There are only so many of those and all that could be found have been. So geodashing was developed to create artificial, and thus sustainable, purely random locations to find. And to make a game out of the search. Why? For fun. What is there? Geocaching goes to some place, for sure, that another person has been (they left the cache there) but geodashing goes to a completely unknown (to outsiders, obviously locals know it) location. The game insists on not trespassing so often the location is not reachable (we must get within 100 meters). So each month when the new dashpoints are published we silly folks doing this game put them on maps and figure out whether they can be reached via public rights-of-way and then, more importantly, if there is any pattern that can allow reaching the most dashpoints with the least driving.

OTOH, when we have a long drive we look for something to break up the monotony by locating nearby dashpoints along the route. The drive from Nebraska to Oklahoma can be done purely on freeways (really limited access multilane highways, as one part, the Kansas Turnpike, is definitely not “free”). It’s really boring to just see 550 miles of pavement. Tourists drive through the midwestern “fly-over” USA states, especially along I-80 in Iowa or Nebraska, hoping to get to the interesting tourist destinations further west, so I-80 looks really monotonous (and is).

But get off the main route, designed for speed, and even in a non-tourist part of the USA interesting things can be found. Before the Interstate highway system drivers were on two-lane roads that deliberately went into every town along the way. Frankly this is a lot like what I see on the Camino, a route that reaches a new small town every few miles. As in the USA there is some parallel high-speed highway to go between the major spots, e.g. Logroño to Burgos, that bypasses all these towns. But the Camino walking moves at a different pace and that is exactly the point.

And it is the same point with geodashing. There is no there-there at a random longitude and latitude (sometimes there actually is). It is the JOURNEY, not the destination. The slogan of geodashing is “getting there is all the fun” and that’s why we crazy people do this. There are surprises everywhere and interesting things one never even knew existed. Sure everyone knows about Yellowstone or Glacier or Grand Canyon or Yosemite but what is in Templeton Iowa or Arthur Nebraska? Scale is everything and that is part of the appeal, to me, of the Camino. When you zip by at 120kph in a car everything outside is a blur, but passing on foot at 5kph (and easy to stop and look around) the world is different. And driving on a farm road (which here look much like most of the Camino route) at 50kph and being able to stop anywhere since it might be hours before another car comes by is a very different way to see the world.

So the route from Nebraska to Oklahoma is really boring, unless you can get off the main road, if only for a bit, and see something you never expected by going to someplace entirely random. There may be huge historical differences between geodashing and a pilgrimage on the Camino but there is also a lot of similarity.


Post formatting problem found and fixed

I thought I was going to have to do a bunch of tedious work here at WordPress.com and then I discovered the problem.

I had a post (and thought there was more than one) where the entire body of the post had been converted to italics. This is bad since I make an effort to clearly mark Spanish (or other non-English) terms in italics with the English part of the post in non-italics. I noticed at least one post “screwed up”.

I tried various things to recreate the problem and failed to find what was causing this. The body of the post looks fine in the WordPress WYSIWYG editor but is wrong when viewed. I thought I’d have to repost a bunch of posts to correct this and that was going to mess up the history of this blog. But better to have the formatting of the posts correct.

So I started with the most recent post that had this problem. In one window I’d have the bad post open so I could copy and paste its text to a new post. A pain but the only way I thought I could fix it.

Then I saw the problem.

WordPress.com’s editor has a toolbar to select italics for some text in the body of the post. BUT that doesn’t work on the title. So being familiar with direct editing of HTML I used the <i>word</i> in the title, which works to get the Spanish word in italics in the title.

BUT, what if one forgets to include the </i> in the title?

Well, that messes up the HTML WordPress.com generates when they display the post and so italics is turned on but never off, so it applies to the entire post.
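For anyone who wants to check titles before publishing, here is a small sketch of the idea in Python using the standard library’s html.parser. It is not anything WordPress.com provides; the class and function names are my own inventions and the sample titles are made up. It just counts italic tags that are opened and never closed.

```python
from html.parser import HTMLParser

class ItalicBalanceChecker(HTMLParser):
    """Counts <i>/<em> tags that are opened but never closed."""

    def __init__(self):
        super().__init__()
        self.open_italics = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("i", "em"):
            self.open_italics += 1

    def handle_endtag(self, tag):
        if tag in ("i", "em") and self.open_italics > 0:
            self.open_italics -= 1

def has_unclosed_italics(title: str) -> bool:
    """Return True if the title leaves an italic tag open."""
    checker = ItalicBalanceChecker()
    checker.feed(title)
    checker.close()
    return checker.open_italics > 0

# Hypothetical titles: the first is fine, the second forgets the </i>.
good_title = "Puzzling over <i>escudella</i>"
bad_title = "Puzzling over <i>escudella"

print(has_unclosed_italics(good_title))
print(has_unclosed_italics(bad_title))
```

The second title is the kind that would leave italics turned on for the whole post.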

So mystery solved and a few posts are now repaired and in their proper (i.e. original) sequence in the blog.

Whew! Glad I spotted this and now know what I have to avoid.

Do I care if anyone is reading this?

Of course, but not very much.

I’ve done several blogs before. Most have gotten more (essentially) random hits than this one, which is a bit surprising. It’s not like I expected this to be a trending hashtag, destined for viral spread, but the response has been underwhelming. How can a bunch of political rants get more attention than this? (Perhaps because the destruction of the U.S. by Trump is more important than Spanish food terminology – you think?)

It’s a fair amount of work, in comparison to just keeping some notes and/or some thoughts about interesting things I’m discovering doing this project. But writing them up pushes me a bit. Even if you, Dear Reader, aren’t really out there you could be and I’d feel stupid to write something public that is badly done or especially wrong.

At the same time I’m just having fun with this project and using the observations I make in blog posts to keep me motivated, keep me sharp, and keep me looking for something interesting in what is otherwise a tedious process. Who knows if I’ll ever finish this and end up with some super app (Android only, sorry, Apple makes it too difficult for app developers) to assist real people in real restaurants in real Spain.

But just thinking there are readers out there keeps me on my toes. Unlike our idiot president I can’t (won’t) just say any stupid thing I think about. I do, in some cases, think I’m being very clever about figuring stuff out. But then, unlike our insane president, I discover I’m wrong. Hey, being wrong is part of life and saying stuff in blog posts that isn’t quite correct is to be expected BUT the important point is to try – at least do some fact checking, at least try to be logical and consistent, at least try to be CORRECT.

I believe that nothing ever disappears in the digital world and, like the real world, every now and then works that attracted little attention in their day end up, eventually, having some value. Months into this project I’ve done a lot of searches to learn origins and/or definitions of terms, to disambiguate similar but not quite identical terms, AND, thus far, I’ve not found anyone else out there doing that. I have no doubt that someday an app will exist that will actually correctly and usefully translate menus, probably for most countries, not the literal and approximate (though still somewhat impressive) stuff existing software is doing – NO, something way better than that, which is what I aspire to accomplish. But, as I mention in my About, I am not a likely candidate to solve this problem. Some real foodie, especially one steeped in Spain’s traditions and fluent in Spanish, could do this a lot better but, thus far, I don’t see any such people stepping up to the plate.

So I’m on a virtual trek across Spain and I have (mostly) virtual readers of these posts. But that doesn’t mean the energy I use on a treadmill isn’t real, even though it’s not on the Camino, and it doesn’t mean that having some discipline to write posts carefully because I might have readers isn’t also real.

So therefore thank you virtual Dear Readers – I know you’re really out there and I will continue to find every interesting thing I can about food in Spain and how to interpret the descriptive language into something meaningful for English speakers with some amount of foodie knowledge.

Purpose of this blog

I’m using this blog to document my progress and explain amusing (at least to me) challenges I’ve found to creating an easy-to-use, yet comprehensive vocabulary of what one would find on menus in restaurants in Spain. Once I’ve compiled my glossary/dictionary/phrasebook I’ll publish it, at least on the web, perhaps as an Android app, and perhaps as some self-published eBook. But for now this will be the story of how I reach that final goal. The individual posts will tell the story of some of the more challenging translation issues with actual examples.

For me to attempt this is truly audacious, which at least one dictionary defines as ‘intrepidly daring’, which this project will certainly be. Why? Well, first I speak only a few words of Spanish and actually little else than English. Second, I live a long way from Spain (or even any Spanish restaurants) here in the midwest; in fact, I’ve never even been to Spain (although I could see it from my visit to Portugal). And, third, I know little about Spanish cuisine (but hope to learn much more via this effort), or really cuisine in general, even though I’m a decent cook (definitely not chef). That is a bunch of strikes against me succeeding with creating anything but a mess.

But I think I can do it. In a separate post (to keep this one somewhat brief) I’ll explain the numerous attempts I’ve made over the years (and with better methodology and technology to help) to assemble food glossaries for languages I don’t know. I made my first attempt long before general public access to the Internet but I did have a computer and a few books. My attempts improved each time but the results didn’t – largely because more isn’t better if it isn’t well-organized. And access to more sources just means more wrong information from other people to edit, along with good information. Also ‘more’ doesn’t fit on a one-page cheat sheet I could carry in my wallet and thus actually use in a restaurant. But alas, technology to the rescue there as well as I do carry electronic assistants into restaurants so at least I have a practical delivery vehicle for my final result. And of the various attempts I’ve made Spanish is particularly intriguing due to some unusual issues attempting to build a translation to English.

And, besides, maybe if I finally get this done right I will actually get moving on an actual culinary trip to Spain and that should be a good incentive to keep grinding on, with this blog to remind me to get some work done.

Short posts are a contradiction for me as I tend to go on and on, but a blog does provide a way to break up a 30 minute monologue (for me, short) into more manageable segments. So I’ll finish this introductory post and then move on to several ideas I could have included here but, for the sake of brevity, I’ll do as successor posts: 1) my history and now “final” technique for assembling this list, 2) an example of the challenges of “literal” translation, which is about all you’ll get from a voice assistant on your phone or web translations, 3) the challenges of Spanish food vocabulary, which turn out to be greater than for other countries/languages, and 4) a bit more about my methodology.

Along the way I’ll enhance this site with some additional pages, primarily of resources (like glossaries other people have assembled or general descriptions of Spanish cuisine or lists of my cookbooks, or whatever) and links to some of the restaurants I’m using as the raw material in my research.

So I hope, Dear Reader, you’ll be interested in following this process and also in understanding how this list was finally assembled and how much you can trust it.