GeneaBloggers

Thursday, 29 September 2016

Reaping What We Sow — Part II



Has genealogy found itself in a rut? Part I of this article looked at the major contributors to modern genealogy so I now want to examine the repercussions of their efforts and question whether, collectively, they’ve resulted in good or bad for us as users.

Digitised Sources

Back in Internet Genealogy - is this progress?, Janet Few mentioned that many end-users do not venture beyond the brief details that have been transcribed for them, often doing no more than saving any images to their computer. As well as wasting any un-transcribed content, this would also mean that they have not verified the transcribed portions. Since only enough is transcribed to support a database index, valuable information may be missed. How many researchers look at the neighbours in a census, or travelling companions in a passenger list? But not all online sources have accompanying images, and so end-users are then expected to blindly accept someone else’s limited transcription.

A converse to this situation occurs with certain newspaper archives. Although they will have used OCR to generate text from the image of a whole article, as opposed to merely selected items, they do not allow the end-user to save that fully searchable and editable text; instead, expecting them to simply save the image or transcribe it themselves. In other words, rather than relying on someone’s transcription, end-users are then forced to rely on the image and make their own transcriptions (if any).

Full and accurate transcription may be an effort but it is also very valuable, so why are there so few tools that support it, or standards for its representation? What we have here is a failure of software and commercial genealogy to appreciate the importance of a transcription, and its relationship to an image copy of the original. Particularly in the newspaper case, there is no excuse whatsoever for not providing access to the already transcribed text. But even if you have a full transcript of a document, your online trees offer no way to keep that and the associated image together as a single item associated with your source reference. Where are users expected to include an analysis of such a source?

Finally, it may be difficult-to-impossible to determine the provenance of the digital information (image or transcript). Although this is gradually changing, that change has been the result of pressure from traditional genealogy, and generally from research-orientated genealogists who need that information in order to make a considered argument. Maybe this was initially considered too complicated for the host’s intended market, or maybe they just didn’t understand genealogical research to that extent.

There are adequate technologies for recording provenance, and other information, in the image itself, in meta-data that would be invisible to the end-user but not to the software. What is missing is an agreed standard from software genealogy and commitment within commercial genealogy. What we have is zilch! A tentative proposal was produced within FHISO but there was no feedback or discussion over it.
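To make the point about existing technology concrete, here is a minimal sketch in Python that stores provenance in the text chunks (tEXt) that the PNG format already defines, and then reads it back. It is an illustration only, not the FHISO proposal or any agreed standard: the keywords Provenance, Repository, and Transcription-Status, and all the sample values, are assumptions invented for this example.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, then CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def tiny_png() -> bytes:
    """A 1x1 greyscale PNG standing in for a scanned record image."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    pixel_row = b"\x00\x00"  # filter-type byte followed by one grey pixel
    return (PNG_SIGNATURE
            + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", zlib.compress(pixel_row))
            + make_chunk(b"IEND", b""))


def add_text(png: bytes, fields: dict) -> bytes:
    """Insert one tEXt chunk per field just before the closing IEND chunk."""
    iend = make_chunk(b"IEND", b"")
    if not png.endswith(iend):
        raise ValueError("expected a PNG ending in an IEND chunk")
    text_chunks = b"".join(
        make_chunk(b"tEXt", key.encode("latin-1") + b"\x00" + value.encode("latin-1"))
        for key, value in fields.items()
    )
    return png[:-len(iend)] + text_chunks + iend


def read_text(png: bytes) -> dict:
    """Walk the chunk list and collect every tEXt keyword/value pair."""
    found = {}
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        kind = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        if kind == b"tEXt":
            key, _, value = data.partition(b"\x00")
            found[key.decode("latin-1")] = value.decode("latin-1")
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return found


# Hypothetical provenance fields; the keywords are not part of any standard.
provenance = {
    "Provenance": "Digitised from microfilm of the original register",
    "Repository": "Example County Record Office",
    "Transcription-Status": "partial; index fields only",
}
tagged = add_text(tiny_png(), provenance)
```

An image viewer would display the tagged file exactly as before, while software calling read_text(tagged) gets the provenance back as a dictionary. The same idea applies to other image formats that carry embedded meta-data.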

Online Trees

Software is all about organisation and visualisation of your data, and these are quite different to each other. In effect, the organisation helps make data accessible, but the visualisation is designed to convey information to the end-user. One organisational schema will typically support multiple visualisations, dependent upon the perspective being studied, the regional settings and preferences of the user, and the sophistication of the product.

A family tree is essentially just one way of visualising lineage — another being a pedigree chart — but the simplistic model used to attract the mass market has made it into an organisational structure upon which anything and everything is expected to be hung. This effectively treats a tree like a wardrobe full of hangers, and results in an inappropriate organisation of the data.

But what is meant by “your family tree”? A search for that phrase suggests there are hundreds of thousands of references, combined with verbs such as: find, discover, climb, Google, trace, flesh-out, grow, build, create, and research. Do we all have just one family tree? Obviously not, unless you believe that only male ancestors carrying your own surname are significant in your history; however, even if we accept that we have multiple family trees then where do we put step relations, half siblings, adopted or foster parents? Where do we put incidental people who may have been so important? Where do we record non-vital historical events in all those lives?

Surprisingly, many casual genealogists do look only at their “surname tree” and commercial genealogy eagerly accommodates that, but what happens if you want to store the lineage associated with every paternal and maternal branch of each generation? How many distinct trees are we limited to?

What happens, too, when you want to attach a document or photograph involving multiple people, maybe from distinct trees or even from none of your trees? If you’re not forced to duplicate it then are you allowed to see all the persons it is attached to? What about shared events where multiple persons were involved, such as a wedding, or even a census? Merely attaching the same item to multiple people leaves no room for a representation of the event and its history. Such requirements should be common sense, but the unswerving adherence to using trees as an organisational concept means that they’re hard to accommodate.

Figure 1 – Documents or images relating to people on different trees.

As an aside, notice that the linkages shown in this diagram are bidirectional. Yes, it is possible for specific parts of an image to be connected to entries in a tree — or some other organisational framework — a little like being tagged on a social-networking site. As mentioned already, the technology is there but the standards and commitment are not.

Maybe the biggest criticism of online trees is that they’re conclusion-based; they usually represent someone’s conclusions based on absent evidence. Support for source citations began to appear when it became obvious that unsubstantiated conclusions were propagating like a virus, but then citations are insufficient unless the evidence is direct and non-conflicting. Where is the incentive for writing any justification? Is it incompatible with the mass-market objective of commercial genealogy?

Trees are generally about discrete data: names, vital-event dates and places, and biological relationships. Combined with the point-and-click ease with which search results can be added to a tree, this leaves no room for justifying an operation. If the expected name isn’t visible in the search results then it is a brick wall for many people, and similarly if there are too many close alternatives. That simple model offered by the commercial sites presumes that the discrete data you require are all in their records, somewhere, and that you’ll just be assembling them into your tree. From that perspective, a citation serves only to say where a datum came from, and not why you believe it to be relevant or correct. In effect, online trees bypass huge parts of the research process. A source-based approach would fix that, and yet still allow the tree as a visualisation of the underlying data.

Collaboration

Ask any group of people what they think collaboration means and the majority will mention unified online trees or exchanging GEDCOM files. They’re not wrong but there is more: there are forums, groups, message boards, and wikis devoted to helping people and that is also a form of collaboration. Although collaboration existed before Internet genealogy there are many more possibilities now, but do they really help?

Let’s try to break down the types of collaboration into some broad categories.

  • Operational: helping others with questions about the how, why, or where. This is one of the biggest uses of the groups and forums.
  • Research: working with others on a given research topic, such as a family or a surname. Working with other family members qualifies, but so too would one-name and one-place groups, as would any crowd-sourcing initiative. I will also add unified trees to this category.
  • Publication: effectively sharing work that we’ve already done with others who may be interested. Examples include blogs, dedicated Web sites, and user-owned trees.

It’s interesting to compare which of these are currently supported within commercial genealogy. Most sites provide public user-owned trees and some provide unified trees. Not all offer a community area for operational collaboration; only FamilySearch offers an area for “memories” that can be linked to entries in trees; and Findmypast have no offerings that fit these categories. But who supports collaborative research? Even collaborative publication is not supported well.

In Part I, I mentioned that sites could allow their patrons to upload images or documents under a Creative Commons licence, and that this would help researchers elsewhere to make use of them with appropriate attribution. My emphasis is to indicate that those sites invariably have an insular approach to collaboration. They are not really concerned with non-subscribers, but it’s a two-way street and they stand to benefit from a little more vision.

Let me present an example. I recently published an article entitled A Sad Career in which I researched the short life of a girl who was not related to me at all. I therefore do not have a tree for her family, but I still want to share all my research with descendants of that family. Try as I did, the best I could do was to add a link to a tree on Ancestry, but I could not contact the owner there or elsewhere. It was about this time that I suggested people like me could volunteer meta-data for their articles (or Web pages) that would allow these sites to make them freely searchable (see Blogs as Genealogical Sources). Making written research articles available as another type of source means that these sites could retain their existing focus on family trees, and would not have to embark on a major enhancement to support narrative locally.

As that article states: this proposal should be a win-win for all concerned; however, commercial genealogy has not even acknowledged it. The worrying aspect to this is that if, as these commercial sites claim, they have genealogists on their teams then they would be well aware of the written suggestion and subsequent discussions. Alternatively, they must be in an ivory tower.

Software Tools

There are a couple of software tools that currently acknowledge evidence and its respective sources, but I am not aware of any that allow you to work upwards from the information in sources without a precisely predefined goal. I have discussed support for source-based genealogy at Our Days of Future Passed — Part III, and a particular way of working at Source Mining. The difference is that the source-based approach dissects a source to identify information to be associated with several persons (rather than searching for specific details for a given person), and with source-mining it assembles the history of a person, family, place, etc., from all the information that can be found for it across multiple sources. All the same issues of handling source references, conflicts, and interpretation apply but the goal is much more general than finding, say, a birth or marriage. Under Online Trees, above, I suggested that a scaled-down version of this approach would also apply when building family trees, but an inappropriate data organisation might render the approach difficult-to-impossible to implement.

I recently had to make contributions to online trees at Ancestry and FamilySearch and was shocked to find how hard it was to do something as simple as take a given census page and associate all the details of a household with the respective tree entries. I had to switch from person to person and continually repeat myself, but when looking at it from the perspective of the census event for that household then it is so much more natural — the citation is the same; the event, place and date are the same; the people are usually part of the same family; proof arguments relating to identification of the family, analysis of errors, etc., are the same. So, as well as the existing ergonomics being very poor, the obvious place to hold a written source analysis is stolen away.
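The census example can be sketched as a small data model in Python, in which the event rather than the person is the unit of entry. This is a hedged illustration and not how Ancestry, FamilySearch, or STEMMA actually store data: the class names Person and SharedEvent, their fields, and the household itself are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Person:
    name: str
    # Back-links to every shared event this person took part in.
    events: list = field(default_factory=list)


@dataclass(eq=False)
class SharedEvent:
    kind: str
    date: str
    place: str
    citation: str        # recorded once for the whole household
    analysis: str = ""   # the written source analysis has an obvious home
    participants: list = field(default_factory=list)  # (Person, role) pairs

    def attach(self, person: Person, role: str) -> None:
        """Link a person to this event in both directions."""
        self.participants.append((person, role))
        person.events.append(self)


# Hypothetical household; none of these names or references are real.
head = Person("Thomas Example")
wife = Person("Sarah Example")
son = Person("William Example")

census = SharedEvent(
    kind="Census",
    date="1881-04-03",
    place="Hypothetical Street, Exampletown",
    citation="1881 census of England, placeholder piece/folio/page",
    analysis="Surname is spelt 'Exampel' on the page; identification rests on address and ages.",
)
for person, role in [(head, "head"), (wife, "wife"), (son, "son")]:
    census.attach(person, role)
```

The citation, date, place, and analysis are entered once, and each person still sees the event through their own events list, so a tree view loses nothing by the data being organised this way.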

But what else has been missed by our tools? In addition to source-based genealogy, that series of posts beginning at Our Days of Future Passed — Part I identified the following alternative approaches that have been tried within software genealogy: arboreal (tree) genealogy, event-based genealogy, and narrative genealogy. One of the main thrusts of that series was that these should not have to be exclusive alternatives. As already explained, the many possible visualisations, including the user interactions with them, must be supported by a separate organisational framework, which in the STEMMA case is a single all-embracing one.

Another thrust was that of generalising the subjects of our research from people to include places, groups, and animals, including an orthogonal treatment of events, evidence, sources, narrative, and hierarchical relationships. This level of generalisation was the reason why I prefer the term micro-history to either family history or genealogy.

There are many variations in the way that software tools can be written, with each having its own benefits, as recently detailed by Tamura Jones in a series of posts related to genealogical software choices. In Do Genealogists Really Need a Database?, I justified my elimination of a relational database on the basis that:

  • they use indexes that are inefficient and not a good fit for historical data,
  • they limit sharing because each product has its own schema,
  • they risk data loss as disk-based linkages may become corrupt,
  • they force an extra stage before data can be accessed or viewed,
  • they reduce the longevity of the data as the database is tied to the longevity of a specific product.

As that same article concludes, having an all-embracing representation in a file that doesn’t require a database gives a special degree of freedom that genuinely helps with sharing. These genealogical contributions, or “bundles” as I refer to them internally, may be likened to Word, Excel, and PowerPoint files in the Microsoft Office paradigm (so termed not because Microsoft invented it but because Office is a ubiquitous example of it). The essential elements of this paradigm are that files can be directly exchanged and immediately visualised by recipients, and that other products may use an API to access the associated data for alternative visualisations.

Figure 2 – Visualising Excel data.

If an Excel spreadsheet were received in an email then you could immediately click on it to see its information in a registered viewer. This would be the Excel tool in this case, but note that an Adobe PDF file could use a dedicated read-only viewer that is freely available; you only need a licence if you want to create such files yourself. Alternatively, software developers can write other tools that use the publicly documented API to present information in different ways, and possibly to supplement suites of proprietary software.
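The paradigm described above, one exchangeable file feeding several visualisations through a small API, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the JSON layout, the field names, and the functions load_bundle, pedigree_lines, and timeline_lines are all hypothetical and do not represent the STEMMA format or any existing product.

```python
import json

# A hypothetical self-contained "bundle"; in practice this would be a file
# received by email or download, with no database needed to read it.
BUNDLE_TEXT = """
{
  "persons": {
    "p1": {"name": "Ann Example", "parents": ["p2", "p3"]},
    "p2": {"name": "John Example", "parents": []},
    "p3": {"name": "Mary Sample", "parents": []}
  },
  "events": [
    {"year": 1862, "what": "Marriage", "who": ["p2", "p3"]},
    {"year": 1841, "what": "Census", "who": ["p2"]},
    {"year": 1865, "what": "Baptism", "who": ["p1"]}
  ]
}
"""


def load_bundle(text: str) -> dict:
    """The whole 'API' for this sketch: parse the file into plain data."""
    return json.loads(text)


def pedigree_lines(bundle: dict, person_id: str, depth: int = 0) -> list:
    """Visualisation 1: an indented pedigree, ancestors below the subject."""
    person = bundle["persons"][person_id]
    lines = ["  " * depth + person["name"]]
    for parent_id in person["parents"]:
        lines.extend(pedigree_lines(bundle, parent_id, depth + 1))
    return lines


def timeline_lines(bundle: dict) -> list:
    """Visualisation 2: the same data as a chronological list of events."""
    persons = bundle["persons"]
    lines = []
    for event in sorted(bundle["events"], key=lambda e: e["year"]):
        names = ", ".join(persons[pid]["name"] for pid in event["who"])
        lines.append(f'{event["year"]} {event["what"]}: {names}')
    return lines


bundle = load_bundle(BUNDLE_TEXT)
```

Both views read the same organisational data, and neither one dictates how that data is stored, which is the separation of organisation from visualisation argued for throughout this article.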

Figure 3 – Visualising STEMMA data.

These failures by software genealogy may be linked to the prevailing notion of genealogy as family trees maintained in databases. A particular concern of mine is that this model restricts what parts of our history we can leave for the future, and how long its digital representation will be accessible. Consider that a JPG image file is backed by an international standard that is publicly accessible, whereas a genealogical database generally has a proprietary schema implemented in a proprietary database engine (I can list several such engines that are no longer available).

Narrative

In Part I, I suggested that the fundamental medium of narrative is poorly supported by our tools, and that it is either squeezed into some internal (to a tree) plain-text field, or relegated to some external tool such as a word-processor, blog, or wiki. I know of no public tool that supports the integration of narrative — whether for research write-ups, stories and memories, proof arguments, or transcription — with trees, timelines, geography, and other forms of genealogical visualisation. Again, I am making the distinction between organisation and visualisation so your first reading of this paragraph may be misleading.

The concept of genealogical narrative has been tainted by the laughable claims of some products that they can generate narrative from discrete data in trees, and I want to distance myself from that. As an example, the ProGen manual for professional genealogy implies that software combined with narrative must mean template-generated robot-speak,[1] and this perception will spread as long as such claims exist.

There may be a more subtle issue with narrative which relates to its usage on the Web. In Our Days of Future Passed — Part II, I started distinguishing narrative essay from narrative report, and rather more recently noted that ProGen (p.354) also used the term narrative report in the same fashion, albeit with negative connotations. A forum post from June 2016, entitled Hereinafter Unsure, began as a simple question about potential confusion with the use of hereinafter in a citation, but quickly degenerated into a circular debate about writing for the Web and the use of this same term. The term is widely used outside of genealogy, but I’d adopted it to describe the format of my own research articles. These articles are certainly not research reports, which are far more rigorous and intended to describe research undertaken for a client. Neither are they historical accounts, which would be wholly about lives or events, or even case studies, which might be more appropriate in an academic journal. My usage reflects the fact that they embrace both the research journey and the uncovered history in a single narrative form, together with identification of sources and analysis of evidence. It soon became clear that there was a deep difference of opinion over whether a readable account of research, plus the inclusion of evidence analysis and citations, were a bad combination; furthermore, that the latter belong in scholarly journals and books rather than in narrative shared on the Web or with family. My belief is that the Web is the primary mechanism for sharing our narrative, and that such material will be found by corresponding search operations rather than reading journals. If true then the proper handling of sources and evidence makes the difference between more throwaway genealogical claims and something of value that can be used and cited by others.

The sad element of this is that online narrative would provide the missing venue for the full use of the good methods taught within traditional genealogy, the very ones that are now being poorly applied to online trees.

So is there a general feeling that the majority of genealogists are not up to the challenge of a written account? I sincerely hope not. In fact, I believe that the majority would not only be capable, they would welcome the encouragement from both commercial genealogy and traditional genealogy. It is true that there are some so-called genealogy police who can deliver heavy-handed criticism, and they will effectively discourage these people. But remember two things: experts who know this craft inside-out did not arrive by parachute out of thin air (it took them time to achieve it), and these people will not be writing for academic publications. I personally don’t care if such work isn’t grammatically perfect, or without every citation perfectly crafted and punctuated, as long as all the details and reasoning are captured. People can always learn to do it better, but presuming it to be the preserve of academics is a self-fulfilling prophecy.

Conclusion

The price we’ve paid for having access to online records was giving genealogy a mass-market appeal — meaning ease and simplicity — but the momentum associated with that market has changed the face of genealogy, probably forever. It’s now very difficult to take a step back and look at what might have been achieved with more vision and in the absence of this legacy.

There are many genealogists who have only ever known computer-based genealogy — using the Internet and/or local applications — and their perception will therefore have been determined by the currently available sites and products. A case in point is that many of these users do not know how to handle conflicting information from different sources, such as ages in different census returns — not because it’s too complicated but because their tools offer them the wrong orientation. If their family tree only accommodates conclusions then it’s hardly surprising if they resort to creating multiple birth events.
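The census-age case shows how a different orientation changes the handling of conflict. The following minimal Python sketch keeps each stated age as its own item of evidence and derives a birth window from it, instead of recording several competing birth events. The class AgeEvidence, the function common_window, and the three census entries are hypothetical; only the census dates for England and Wales are real.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class AgeEvidence:
    source: str
    census_date: date
    stated_age: int

    def birth_window(self):
        """Earliest and latest birth dates consistent with the stated age.

        Assumes the census date is not 29 February, which holds for the
        census nights used here.
        """
        latest = self.census_date.replace(
            year=self.census_date.year - self.stated_age
        )
        earliest = self.census_date.replace(
            year=self.census_date.year - self.stated_age - 1
        ) + timedelta(days=1)
        return earliest, latest


def common_window(evidence):
    """Overlap of all birth windows, or None when the sources conflict."""
    earliest = max(item.birth_window()[0] for item in evidence)
    latest = min(item.birth_window()[1] for item in evidence)
    return (earliest, latest) if earliest <= latest else None


# Hypothetical entries for one individual across three census returns.
evidence = [
    AgeEvidence("1851 census (hypothetical entry)", date(1851, 3, 30), 12),
    AgeEvidence("1861 census (hypothetical entry)", date(1861, 4, 7), 22),
    AgeEvidence("1871 census (hypothetical entry)", date(1871, 4, 2), 30),
]
```

Here the first two returns agree on a window between 8 April 1838 and 30 March 1839, while adding the third yields no overlap at all. That result is a prompt for written analysis of which source to trust, and all three items of evidence remain on record whatever conclusion is reached.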

At some point, the software industry was always going to look at genealogy as a market for new applications, but what would prospective software developers currently see? Family trees. It is hardly surprising that their data models, databases, and the products themselves, all try to deliver variations on this theme.

In an episode of Mondays with Myrt on 10 Aug 2015 (timestamp 1:14:20), I suggested that genealogy was serving the interests of the software industry, rather than software serving the interests of genealogists. Had genealogy not turned into a big enough market then the software industry would have selected some other field of endeavour to focus its talents on. Unfortunately, it’s still rare to find professional software designers who are also research-orientated genealogists, and I believe this is hindering further progress.

These endeavours (software and history/genealogy) would almost certainly have been exclusive career choices since they would each have required a wealth of experience to become truly proficient, and each would have required a quite different academic background. On top of this, both have their own jargon, but with much room for confusion and ambiguity when participants interact.

The main problem is one of education, but not the traditional forms such as how to use a given Web site, or how to use a given product, or even attaining qualifications such as BCG certification; there is a lack of reciprocating knowledge, and vision, in both software genealogy and traditional genealogy. Software genealogy certainly has a naïve and overly-simple view of its goal, and so it needs input as part of that education. I have mentioned a perception within traditional genealogy, though, of software as necessarily conclusion-based, and possibly even a mistrust of its capabilities or reliability. It’s a poor analogy but one that everyone can relate to: no one would now prepare a document using pen-and-paper before entering it into a word-processor. A word-processor is a complicated program, but it has evolved to the point where almost everyone can use it to some level of productivity, often with no documentation.

Traditional genealogy acknowledges the power and importance of narrative, for all its many uses such as the handling of evidence and inference, but it does not sell this notion within the digital world. As a result, the requirement has not been picked up by software genealogy or by commercial genealogy. This one deficiency, alone, can be linked to poorly researched trees, a dearth of reasoning and justification on these trees, and a disrespect of genealogy by academic historians. It is welcome, therefore, to see kindex — whom I met at RootsTech 2016 — progressing with their mark-up for narrative; and also to hear FamilySearch reinventing itself and emphasising stories and memories over trees — although I hope they can also see a place for fully researched, reasoned, and sourced narrative.

So where does this Gordian knot leave me? Well, I admit that I struggle to find common ground. Software genealogy is too concerned with something different to what I'm doing; traditional genealogy has taught me much, but it appears uninterested in its future within a digital world; commercial genealogy provides me with valuable data, but I fear it is not interested in what I really want to achieve. Someone please tell me that I'm not alone!




[1] Elizabeth Shown Mills, ed., Professional Genealogy: A Manual for Researchers, Writers, Editors, Lecturers, and Librarians (Baltimore: Genealogical Publishing Co., 2001), p.355; hereinafter cited as ProGen.

Friday, 23 September 2016

Reaping What We Sow — Part I



Several pundits have questioned what genealogy really is, usually focusing their answers on the interpretation of the word. Even I’ve contrasted the semantics of the terms genealogy and family history, as used in the US and the UK, at What Is Genealogy? In this article, though, I want to consider the question in a much wider arena: is what we’re doing what we really want to do, and how has the Internet influenced this? Or, with a Monty Python twist, what has the digital age ever done for us?

On 9 Aug 2015, genealogist Janet Few prompted a flurry of diverse opinion with a post entitled Internet Genealogy - is this progress? She suggested that although ease of access to record images was of great benefit, thoroughness and rigour had been compromised in the interests of speed. Also, that Web site changes were largely in the interests of profit rather than of “serious researchers”.

Only a couple of days before that, at Is It Time to Let Go of the Internet in Genealogy?, Amy Johnson Crow bemoaned the continued use of the adjective online as though it indicated some fundamentally different resource. In other words, the Internet is here to stay, and is now a fundamental part of genealogical research, so why emphasise it?

So who is right? Is the Internet simply part-and-parcel of our pursuit, or is it a crucial opportunity that has been missed through a combination of commercial interests and a hands-off fear of the technological leviathan?

I want to make the case that genealogy has come off its rails with the advent of Internet genealogy, and that the different interests, diverse skills, and entrenched viewpoints within our community have unintentionally left it injured, disrespected, and a pale shadow of what it should be. In order to do this, I will first look at the most change-laden contributions to genealogy of recent times. In Part II, I will examine the repercussions of those contributions and consider whether they have collectively been good or bad for genealogy.

Figure 1 – Barren tree in an infertile landscape.[1]

Communities

Back in Are we a Genealogical Community?, I was naïve enough to suggest that we were a single community. I now want to renege on that and suggest that we currently have a multitude of largely independent communities operating under the same umbrella. I will refer to the main three driving forces as:

  • Software Genealogy
  • Traditional Genealogy
  • Commercial Genealogy

The recipients of their uncoordinated efforts are the everyday devotees, enthusiasts and hobbyists, including the end-users of any technology.

Software Genealogy

This group includes those with strong software backgrounds who are either producing products or who are researching into the application of software to genealogy. I accept that I fall into this category myself, and so when I criticise it then I am implicitly accepting my own failings.

For the first eight years of my research, I used no genealogy product or database. The initial task through which I entered genealogy was a complex family mystery that left me with many clues and hypotheses, other writings, verbal recollections, and newspaper cuttings, but comparatively few official records. When the time came, I found that no software came remotely close to taking over so I had to write my own, and so was born my STEMMA project.

Current software tools appear to force a binary choice: your primary focus must be either lineage or family history, and so your respective tool of choice must involve either a tree or some form of narrative aid. This is woefully inadequate and prevents the proper integration of, say, a family history write-up with access to the associated biological and marital relationships, events, timelines, and geography. There may be a growing number of sites advocating written history, but there is an implicit assumption that such writing will separately use either a normal word-processor, a blog, or some wiki-style tool.

So what about those people who are working upwards from information encountered in various sources, and making inferences or arguments as they go? I examined this source-based approach back in Source Mining, and discussed its advantages and differing goals, but it is not a feature of any mainstream products or Web sites. There are some newer products that help keep track of your evidence, and its relationship to sources, but they are not — as far as I’m aware — advocating a different methodology.

So where lies the root-cause of this discrepancy between what I want to do and what’s expected of me? In Light-bulb Moments I suggested that programmers were effectively writing specifications for whatever form of genealogy they happen to indulge in themselves. Also that it was hard to assess the type of genealogy they indulged in, or to what depth of knowledge they aspired, if they didn’t publish their work. This was the main reason why I decided to publish some of my own research on this blog; putting it in the spotlight would allow people to assess whether my still-evolving software ideas had any merit in the wider world. In practice, though, my association with software is something of a stigma that makes it hard to be taken seriously in certain quarters, or to cooperate in productive debate.

Software people generally have a talent for looking at things in abstract ways that can lead to clever and efficient designs that may have longevity beyond their originally-envisaged functional requirements. This is a two-edged sword, though, and it can lead to oversimplification of a problem, or to approaches that are just too abstract to be useful in the real world. STEMMA has been criticised for being an overly-complex data model, to which I would counter that it is modelling data and relationships that are part of the real world, and that reductive software thinking can ultimately lead to reduced potential.

A good example of this is narrative. I have heard statements that genealogical narrative is too free-form for computer representation, and that what it expresses is therefore too opaque for software to understand. This speaks volumes about a particular mindset, and those commentators must be reminded that it’s people that do genealogy, not software. Narrative is a very rich medium that is essential for genealogists, but it must not be supported alone.

One of the most important contributions from software genealogy should have been data standards but all attempts have been unsuccessful to date. My work within FHISO has shown a number of things: that it is impossible to get the major software people around the same table; that our different ideas of genealogy are often at odds, and sometimes not grounded with sufficient experience; that the industry is content to sit on the sidelines and wait for something to appear, which may then be ignored; and that only a very small number of non-software people have been able to tolerate the abstract discussions and make valuable contributions.

Traditional Genealogy

This group includes those who undertake professional genealogy, publish books and write for academic journals, or who promote the rigorous handling of evidence and sources in research methodology. Judging by the membership of organisations such as APG and NGS, this influential group makes up a surprisingly small proportion of all US genealogists, and the same pattern is probably evident in Europe too. It is undeniable that their guidance can be found in books and on certain Web sites, but it is not linked or advocated by any of the big commercial sites, and that puts it in a different domain to the ones frequented by the majority of genealogists. In other words, why would they hunt it out if they’ve never heard of it?

The importance of promoting rigorous research, and the clear and detailed writing-up of its fruits, cannot be overstated. Unfortunately, these recommendations are deeply rooted in traditional printed forms of media. Publishing books involving genealogical research, or writing articles for academic journals, may attract more kudos (and may even be more profitable) but the readership will be smaller; the average genealogist will not be consulting those sources, and that is a loss in more than one respect.

Ideally, such work should be published online, not simply as a source of information but as a beacon to guide other researchers. We might all benefit from reading well-researched and clearly-presented write-ups from professionals, but most genealogists will never see one. Is there a reason behind this?

There is a gulf between printed and online genealogy that may be traced simply to technology, but one that is rapidly becoming a chasm. There is a perception of software genealogy as being related only to databases of conclusions. For instance, the following is a quote from Evidence Explained QuickLesson 20.

Step 4: Data entry?…this is the point—but not until this point—that we cherry-pick individual bits of data and record them in a spread sheet or other data-management software. We need only cut-and-paste them from our research report…[2]

In effect, this says that genealogical software plays little part in the research process, and is simply a repository for discrete so-called “facts” derived from the real research. So while a word-processor, blog, or wiki might be employed by a user, they would not be considered genealogical software, and by implication any notion of a product that embraced both narrative and research aids could not be entertained.

Instead, most serious genealogists attempt to employ those good teachings in the area of the prevailing software tool: the family tree. This dilutes their intent to the mere association of a source citation with a “fact”, such as a date, name, or place. These could be construed as proof summaries, but that assumes that the evidence from those sources is direct and non-conflicting for each claim. It is no wonder then that online trees are still full of errors since a “fact” is worthless — no matter how many citations you offer for it — if it is for the wrong person. Although rarely done, proof arguments (the why rather than the where) could be provided in notes fields, but then bigger claims such as why a whole family upped and left for faraway climes would require real narrative to convey it properly. Hanging snippets of narrative off a conclusion-based tree is like putting the cart before the horse.

Furthermore, as I recently commented on one of James Tanner’s blog posts (Important updates to the FamilySearch.org Website and with the Family Tree), a 'source' is a source of information that you've mentioned in a work (positively or negatively), not necessarily a source of so-called “facts”, and so the skewed usage of citations in online trees will eventually lead people to misunderstand what a source is.

Despite this group collectively publishing many works, it does not embrace or direct any software group on its own behalf. The net effect of this slightly obvious statement is that it has no direct influence on software research, and so carte blanche is effectively given to the other groups.

Commercial Genealogy

On the face of it, access to digitised sources should be a windfall for every genealogist who has a computer. The benefits include immediate access to records that we might have to travel to see in another form, and faster searching due to their being indexed on selected items of information. This is clearly progress, but at what cost?

The commercial Web sites that host such records need to finance their digitisation, transcription, indexing, storage, and purchase of more data, as well as making a profit. Creating a mass-market genealogy was a fundamental requirement to make this work: too few users and the subscription cost would be too high; too complicated and it would put off the newcomers. In other words, that progress has only been possible by providing a simple model where you give the end-users masses of data to satisfy their searches, and some simple tools to make use of their finds.

That simple model is the ubiquitous online family tree. I believe this model was too simplistic, and out of necessity has since been distorted beyond the original concept, but more on that in Part II. For now, I want to highlight the fact that a simplistic model combined with mass-market advertising will undoubtedly redefine what genealogy is, and so it has been; it is now clear that the majority of genealogists equate the pursuit with family trees. Historical research and the determination of events in people’s lives have been replaced by a philatelic point-and-click collecting of names and vital-event dates and places. There is nothing wrong with online trees, per se, except that the concept has been sold to the public through relentless advertising until the majority of genealogists now talk about building family trees without so much as a blink. All their limitations and failings are then reflected negatively upon the pursuit of genealogy.

Collaboration is an essential element of genealogy; if you can’t share then progress is impeded and future generations are robbed of their histories. Being able to exchange genealogical data in a static file, such as GEDCOM, has fallen way behind modern requirements, mainly due to the inability of software genealogy to come up with correspondingly modern standards. It is left to commercial genealogy to support collaboration and sharing but that is then impacted by both their simplistic model and their commercial considerations. They can only share tree-based data — either unified or user-owned — and primarily with subscribers to the same site. Anyone who doubts that should try to contact a researcher on a site they don’t subscribe to, or add a constructive piece of information to their tree. I have written about other forms of collaboration, such as working on identification of census individuals (Collaboration Without Tears), but they would be so far removed from their existing model that they are dismissed as distractions. In the case of a unified tree then the desire to keep things simple has resulted in naïve models that spawn both edit wars and a diffusion of less-rigorous research into the collective effort.

Although FamilySearch does not strictly qualify for this category, I am including it because their software uses similar models. In particular, I recently queried their site’s conditions of use as it appeared to hinder collaboration involving research written up on other sites, and I received the following response on 2 Sep 2016:

As you may know, nearly all of the records within the collection of FamilySearch International are governed by contracts between the original record custodian and FamilySearch. For most contracts, FamilySearch merely acquires rights for a patron to use the records for incidental, personal, noncommercial genealogical research purposes. This includes the right to extract factual data of the patron's direct family line and then reformat that data to add to the patron's personal family tree which the patron may then use as desired.

However, publication or distribution of the actual record images/documents (including via print or the Web) and wholesale indexing, transcribing, and/or translating of the records (even when these activities are for non-profit purposes) are prohibited under the contracts. Therefore, you must acquire written permission from the custodian of the original records before publishing an image of a record or document. Once this is accomplished, you may proceed as the record custodian directs. FamilySearch will have no further objections.

I'm pretty sure that this doesn't happen much in the real world, and that most people think images displayed there are in the public domain. For real collaboration, it should be possible for patrons to declare that their images or documents are available under a Creative Commons licence, rather than a blanket restriction and the expectation that patrons will respond to such written requests.

Next Step

I will wait until Part II to look at how these contributions have left us where we are now.



[1] Dead tree, Salton Sea, taken 16 Feb 2012; image credit: Dan Eckert (https://www.flickr.com/photos/cloudchaser32000/8455073038 : accessed 28 Jul 2016).
[2] Elizabeth Shown Mills, “QuickLesson 20: Research Reports for Research Success”, Evidence Explained: Historical Analysis, Citation & Source Usage (https://www.evidenceexplained.com/content/quicklesson-20-research-reports-research-success : accessed 14 Sep 2016).

Monday, 15 August 2016

Blogs as Genealogical Sources


Many people publish their family history stories or research on a blog, or some other type of Web site. Why aren’t these searched by the large genealogical providers? Is there a problem, or simply a lack of vision? This should be a win-win possibility.

Apparently, some time ago, Ancestry used to search blog material, but their concept had a number of fundamental problems:

  • The material was searched without the permission of the owners.
  • The content was copied and so diverted traffic away from the original blogs.
  • The search was based on simple name matches, or other word matches.

So isn’t there a way of doing this with the permission of the owners, without copying the material to a separate location, and with better integration into their normal search function? Yes of course there is!

I actually made a suggestion to them that they allow members to voluntarily list the links to their blog or Web articles, together with relevant meta-data for the referenced individuals and their relationships. This meta-data would be essential if their search had solicited information from the end-user such as a name, place-of-birth, date-of-birth, parent names, spouse name, etc.

So what’s different here? Well, the meta-data allows a more functional search operation, and doesn’t require them to copy and pre-index the raw text of the articles. Also, the end-user would be directed to the actual blog rather than a copy of it if they selected an associated match. All Ancestry would retain is the URL link and the associated meta-data, and no copying also means no copyright issues.
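As a rough sketch of that idea, and assuming a hypothetical `ArticleEntry` record and `search_articles` function (neither is part of any Ancestry software, and the URLs and people are placeholders), the provider would hold nothing more than a link and the member-supplied meta-data, and a match would simply hand back the link to the original blog:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PersonMeta:
    """Member-supplied meta-data for one person referenced in an article."""
    name: str
    birth_year: Optional[int] = None
    birth_place: Optional[str] = None


@dataclass(frozen=True)
class ArticleEntry:
    """All the provider retains: a link plus meta-data, never the article text."""
    url: str
    people: Tuple[PersonMeta, ...]


def search_articles(entries, name, birth_year=None, birth_place=None):
    """Return URLs of articles mentioning a person who fits the criteria.

    Criteria left as None are ignored; meta-data fields left as None
    are treated as unknown rather than as a mismatch.
    """
    def fits(person):
        if person.name.casefold() != name.casefold():
            return False
        if birth_year is not None and person.birth_year not in (None, birth_year):
            return False
        if birth_place is not None and person.birth_place is not None:
            if person.birth_place.casefold() != birth_place.casefold():
                return False
        return True

    return [e.url for e in entries if any(fits(p) for p in e.people)]


# Hypothetical registrations; the URLs and details are placeholders.
registry = [
    ArticleEntry("https://example.org/blog/article-one",
                 (PersonMeta("Jane Doe", 1850, "Nottingham"),)),
    ArticleEntry("https://example.org/blog/article-two",
                 (PersonMeta("Jane Doe", 1902), PersonMeta("John Roe"))),
]

matches = search_articles(registry, "jane doe", birth_year=1850)
```

Because the result is only a list of URLs, the end-user necessarily lands on the author's own site. Treating missing meta-data as "unknown" is one possible design choice; a stricter search could equally reject entries that leave a requested field blank.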

Surely, this sounds like the win-win that it should be. Ancestry would have more sources to search, and for zero cost other than for providing the corresponding software; members get increased traffic on those blogs from other researchers, thus increasing collaboration. Lastly, it would be entirely voluntary and so there would be no licensing issues. A search through a new category of, say, “Members’ Articles” (or implied by “Search All Records”) would show links to online articles that are relevant to the current search criteria.

So why can't this be done by simply attaching an article to an existing tree?

  • The author may prefer to write research articles rather than maintaining trees.
  • The article may already be published on a separate site or blog.
  • The article may reference people from several distinct families, and so a single tree would not suffice.
  • Some references may be to "incidental people" where no lineage information is available, and yet the article could provide invaluable historical context for them.

Also, you would still need meta-data to support a genealogical search rather than a plain text search. For instance, in my previous blog post A Sad Career, I could indicate that Ellen Poland also went under the names Helen Polin and Elenor Polin. Also, that her parents were Owen Polin (aka MacPolin) and Rosanna Polin (aka Poland, and with many given-name variants). As well as providing basic “facts” such as names, date-of-birth, etc., the meta-data would also indicate relationships to other individuals referenced in the corresponding article, such as a spouse or parents. This FOAF (Friend of a Friend) concept will one day be a basic tenet of Web sites and blogs, but for now it would have to be done by the genealogical provider.
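To make the difference from a plain text search concrete, here is a minimal sketch of such meta-data for the A Sad Career example. The `Person` class, its field names, and the `matches` function are hypothetical illustrations, not part of STEMMA, FOAF, or any provider's software; only the names and the parentage come from the article, and the full alias forms (such as "Owen MacPolin") are my expansion of the surname variants.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Person:
    """One individual referenced in an article (hypothetical structure)."""
    key: str                      # identifier local to this article's meta-data
    names: List[str]              # primary name first, then known variants
    parents: List[str] = field(default_factory=list)  # keys of parent records


def knows_name(person: Person, name: str) -> bool:
    """True if the name equals any recorded variant, ignoring case."""
    return name.casefold() in (n.casefold() for n in person.names)


def matches(people: Dict[str, Person], key: str, name: str, parent_name: str) -> bool:
    """Genealogical match: the person's name AND one parent's name must both fit."""
    person = people[key]
    return knows_name(person, name) and any(
        knows_name(people[p], parent_name) for p in person.parents
    )


# Meta-data an author might supply for the article "A Sad Career".
# Alias forms are assumed expansions of the surname variants in the text.
people = {
    "ellen": Person("ellen", ["Ellen Poland", "Helen Polin", "Elenor Polin"],
                    parents=["owen", "rosanna"]),
    "owen": Person("owen", ["Owen Polin", "Owen MacPolin"]),
    "rosanna": Person("rosanna", ["Rosanna Polin", "Rosanna Poland"]),
}

# A search for "Helen Polin, daughter of Owen MacPolin" still reaches the
# article about Ellen Poland, which a plain text match would miss.
found = matches(people, "ellen", "helen polin", "Owen MacPolin")
```

The relationship links are what turn a list of names into something a genealogical search can use: the same query with an unrelated parent name would be rejected even though the person's own name matches.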

Obviously there would have to be a way of avoiding misuse, such as offensive material or advertising, but that sort of moderation must already occur in their message boards.

This suggestion follows quite naturally from previous work I have been involved in with STEMMA.  Our Days of Future Passed — Part II discussed the importance of narrative generally, and especially of marked-up narrative. Before that, What to Share, and How - Part II tried to explain how a STEMMA file contained different types of data, including lineage and other relationships that could be used as meta-data. However, the watered-down scheme presented here is achievable now, using standard technologies.