
Converting lines in an svg image to csv

During a search for data on programming language usage I discovered Stack Overflow Trends, showing an interesting plot of language tags appearing on Stack Overflow questions (see below). Where was the csv file for these numbers? Somebody had asked this question last year, but there were no answers.

Stack Overflow language tag trends over time.

The graphic is in svg format; has anybody written an svg to csv conversion tool? I could only find conversion tools for specialist uses, e.g., geographical data processing. The svg file format is all xml, and using a text editor I could see the numbers I was after. How hard could it be (it had to be easier than a png heatmap)?

Extracting the x/y coordinates of the line segments for each language turned out to be straightforward (after some trial and error). The svg generation process made matching language to line trivial; the language name was included as an xml attribute.
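A minimal sketch of this kind of extraction, using R's xml2 package (this is not the code linked in this post). The miniature svg, the `class="line"` selector, the `data-legend` attribute name and the helper `parse_path` are assumptions made for illustration, and the path parsing assumes the `d` attribute contains only absolute `M` and `L` commands; a real file may well differ.

```r
library(xml2)

# Hypothetical miniature svg; the real file's structure may differ.
svg_text <- '<svg xmlns="http://www.w3.org/2000/svg">
  <path class="line" data-legend="python" d="M0,90L10,80L20,60"/>
  <path class="line" data-legend="r" d="M0,95L10,93L20,88"/>
</svg>'

doc <- read_xml(svg_text)
xml_ns_strip(doc)  # drop the default namespace so plain XPath names match

paths <- xml_find_all(doc, "//path[@class='line']")
language <- xml_attr(paths, "data-legend")  # assumed attribute name
d <- xml_attr(paths, "d")                   # path data, e.g. "M0,90L10,80"

# Convert one path string into a data frame of x/y points.
# Assumes only absolute M (move to) and L (line to) commands.
parse_path <- function(d, language) {
  pairs <- strsplit(d, "[ML]")[[1]]
  pairs <- pairs[nzchar(pairs)]              # drop the empty piece before "M"
  xy <- do.call(rbind, strsplit(pairs, ","))
  data.frame(language = language,
             x = as.numeric(xy[, 1]),
             y = as.numeric(xy[, 2]))
}

lines_df <- do.call(rbind, Map(parse_path, d, language))
rownames(lines_df) <- NULL
print(lines_df)
```

The resulting data frame has one row per point, so `write.csv(lines_df, "lines.csv", row.names = FALSE)` would produce the csv file. The coordinates are still in svg pixel units at this stage.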

Programmatically extracting the x/y axis information exhausted my patience, and I hard-coded the numbers (code+data). The process involves walking an xml structure and R’s list processing, two pet hates of mine (the data is for a book that uses R, so I try to do everything data-related in R).
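To illustrate what hard-coded axis numbers can be used for, here is a sketch of mapping svg pixel coordinates to data values by linear interpolation between two axis ticks. The tick positions and values below are invented for the example (they are not the numbers from the Stack Overflow plot), and `pixel_to_value` is a hypothetical helper; it also assumes a linear axis.

```r
# Hypothetical calibration: pixel positions of two y-axis ticks and the
# values printed next to them (read by eye from the svg file).
# svg y-coordinates grow downwards, so the larger pixel value is the
# bottom of the plot.
y_pixel <- c(400, 0)
y_value <- c(0, 10)

# Linear map from pixel coordinate to data value, given two reference points.
pixel_to_value <- function(pixel, pixel_ref, value_ref) {
  slope <- diff(value_ref) / diff(pixel_ref)
  value_ref[1] + (pixel - pixel_ref[1]) * slope
}

values <- pixel_to_value(c(400, 200, 0), y_pixel, y_value)
print(values)
```

The same function works for the x axis if dates are first converted to numbers, e.g. with `as.numeric(as.Date(...))`.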

I used R’s xml2 package to read the svg files. Perhaps if my mind had a better fit to xml and R lists, I would have been able to do everything using just the functions in this package. My aim was always to get far enough down to convert the subtree to a data frame.

Extracting data from graphs represented in svg files is so easy (says he). Where is the wonderful conversion tool that my search failed to locate? Pointers welcome.

  1. August 16, 2019 10:50 | #1

    The page draws the chart using JavaScript (https://cdn.sstatic.net/insights/Js/Trends/trends.js?v=b902f297172c) and pulls the data in via an asynchronous XMLHttpRequest call. The URL it loads, https://insights.stackoverflow.com/trends/get-data, returns ready-to-use JSON data.

    You can discover ^^ by opening up Developer Tools and reloading the SO URL, then inspecting the Network tab.

    https://paste.sr.ht/~hrbrmstr/3171a840f867209c35ccdb38418d76b6f56ff631 has some code you can use to acquire, reshape, and plot the data.

  2. August 16, 2019 11:54 | #2

    @Bob Rudis
    Thanks for this information.

    I have a copy of your data-oriented book, Data-Driven Security, which contains some interesting data.

  3. CJ Yetman
    August 16, 2019 14:48 | #3

    Just for fun, if you save the SVG (either with the download button, or by pulling it out from the inspector), you could extract the y-coordinates, and they would reflect that data, at least relatively speaking…

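    library(rvest)
    library(dplyr)

    # Language names, one per plotted line (the data-legend attribute)
    legend <- read_html('~/Desktop/so_data.svg') %>%
      html_nodes('path.line') %>%
      html_attr('data-legend')

    # y-coordinates of each line, one column per language
    read_html('~/Desktop/so_data.svg') %>%
      html_nodes('path.line') %>%
      html_attr('d') %>%
      setNames(legend) %>%
      # split the path data at its command letters
      strsplit('[[:alpha:]]') %>%
      # keep only the number after the comma (the y value)
      sapply(function(x) as.numeric(sub('^.*,', '', x))) %>%
      as_tibble() %>%
      # drop the empty piece before the first command letter
      slice(-1)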

  4. August 18, 2019 04:50 | #4

    With the following URL, it is easy to obtain the data behind the svg:
    https://insights.stackoverflow.com/trends/get-data

  5. A.N. Spiess
    August 19, 2019 18:05 | #5

    A Python svg => data module:
    https://github.com/peterstangl/svg2data
    Maybe someone can port it to R…
