Amongst all my many hobbies is researching my family history, which I have been doing on and off (well, more off actually) for the last 30 years. Over that time, how you research has changed enormously with so many records now being available online. Also, when I first started, the only way to record your information was on paper, but now there are numerous software packages that have largely replaced the need for paper.
Data Mining
The main issue with recording your family history information in a software package is that you are beholden to the developer’s choice of what is displayed and how. If you want more flexibility, you need to get your information out and manipulate it in some third-party tool. The de facto standard for exporting genealogical data is GEDCOM, but that is a flat text file, and there’s not much you can do with it other than use it to reimport into another program.
SQL has entered the building
Of course, what you really want to be able to do is manipulate the data using SQL, writing queries to extract the information and then formatting it for display.
The software I use is called MacFamilyTree, and it works really well. Underneath the hood, it uses a SQLite database to store the information, and so that was my first port of call. I cracked open the database and had a look to see if I could work out how the data was stored and the relationships between the tables. This proved too difficult, and some people got a bit sniffy when I asked on a MacFamilyTree forum about reverse engineering the database structure.
Turns out the users of rival software package RootsMagic aren’t quite so sniffy about it and the database structure is very well documented including an entity relationship diagram. This was the way forward but how to get my data from MacFamilyTree to RootsMagic? This, of course, is where GEDCOM comes into it’s own as it is designed for exactly this purpose.
Writing the queries
I transferred my data via GEDCOM into RootsMagic and, having validated that it had arrived safely, I opened up DB Browser for SQLite and had a poke around. Despite the well documented data structure there were still some qwerks in how the data was linked but I had some extra help in the form of a course called SQL for Genealogy which walked me through some queries and gave me a headstart.
The reporting in MacFamilyTree I find a bit obtuse and so I wanted to write myself something simple that allowed me to quickly get access to reports I needed. Top of my list was a report showing all records that had no citation or source but I also wanted:
- Individuals without either a forename or surname
- Those missing either a birth or death date
- Orphaned indivduals
- Upcoming anniversaries
- Free text search across all notes.
Wanting a quick and intuative interface to access these queries from I reused the front end I had built for the What should I watch next? project and got ChatGPT to create me a logo to use. The result is what you see below. All the code for this can be found on my GitHub project page.
Conclusions
By using RootsMagic and the work of others in documenting the underlying data structure, it has been possible to quickly put together some queries into an app I can use to help support my family history research.
The workflow is a bit cumbersome: update my records in MacFamilyTree, export the data in GEDCOM format, import the GEDCOM file into RootsMagic and finally copy the SQLite database to the webserver but it is worth it for the help it gives me.
One idea I have for an enhancement that would help the workflow would be an upload button that allows me to upload the latest GEDCOM file to the server through the interface rather than SSH.
If you are intending to use SQL to query your family history then I thoroughly recommend Margaret’s course.
Finally, you can find the code on the GitHub project page.
