A GraphQL tale: What else is in there besides introspection?

TL;DR: Finding and extracting GraphQL endpoints / queries / mutations / data types (partially) without relying on introspection or fuzzing, from publicly available javascript files.

Story time: A couple of days ago I was hanging out on Reddit when I saw someone talking about this one event I wanted to attend. As it had just been confirmed, I quickly went to the website so I could check the venue, but unfortunately couldn't find the location. I rapidly learnt that I had to first (surprise, surprise) set up an account and agree with all the terms and conditions in order to see the address.

The thing is, I hate going around the web setting up accounts on websites I rarely go back to and sadly, this was one of those situations. I went for the no-brainer first: got myself a disposable email and tried setting up a dummy account. Kid's stuff. Problem was, it didn't work. They had whitelisted the email providers, so unless you had gmail, outlook, etc. you just couldn't do it. "What an absolute bummer", I thought to myself.

Eventually, fed up with the whole thing, I decided to do what any reasonable person in my position would: Proxy the app through BurpSuite and see if they were somehow leaking the damn address.

Investigatin’…

I put my hoodie on lol and after dodging an inexplicable amount of requests crafted solely with the intention of extracting every piece of information available, with the excuse of bETTer uNdErsTanDiNg uSeR bEhAviOur, I noticed a few requests flying over my proxy history tab, all referring to this /gql endpoint. Immediately, I knew this was the place I should probably be focusing on.

The gql is short for GraphQL. I'd read a few things about GraphQL here and there when it first came out, but I didn't pay much attention. Now, as you can imagine, approaching it took some probing and experimentation, which is usually a good recipe for an interesting article. I don't think anything you will see here is actually groundbreaking, but there are definitely some nuggets you can benefit from, especially when it comes to analysing javascript files.

Anyways, interested in finding out if I ever managed to get that address without setting up an account, while learning some GraphQL shenanigans along the way? If so, grab yourself a piece of that 4 formaggi pizza that's been sitting on your desk for the last three days and keep on reading, because you are in for a ride.

A GraphQL primer (not really)

Let me first set the stage to make sure we are all on the same page: If you haven't been around web development for the last couple of years, or don't necessarily care because well, you actually have a healthy and balanced life, let me fill you in: GraphQL stands for Graph Query Language and it was initially created by some folks from that social media platform & company which almost had democracy dismantled, also responsible for creating / maintaining technologies such as React. It serves as an alternative to the REST architecture and became popular around the time single page apps became a thing (maybe).

The general idea is that you can basically tweak queries and adapt them to your needs, removing the necessity of creating multiple endpoints just to fulfil corner cases that you might eventually require. This is probably a very grotesque simplification and I’m sure there’s a whole lot more to cover, but I won’t.

So yeah, right, these are just… words. What does that actually mean? Alright, let's say you want to fetch data from a user endpoint, something like GET /user/1000. In a classic REST app, this can be directly translated into "get data from the user whose ID equals 1000":

GET /user/1000

Response:

HTTP/2 200 OK
Content-Type: application/json
X-Other-Headers: idunno
Content-Length: 100

{
    "user": {
        "id": 1000,
        "name": "Mevin Kitnick",
        "address": {
            "street": "Whatever street",
            "zipCode": 1234,
            "city": "Los Angeles"
            "additionalInfo": {
                ...
            }
        }
    }
}

The first thing to notice when checking the server response (besides the fact it's 2022, so why the f*ck are you still using incremental IDs?) is that when the endpoint /user/<id> is called, it gives back the whole user whose id is 1000. Easy, right? However, let's say we now only care about the address piece of that same user, maybe for a different part of the website. What happens then? Well, we would have to either fetch the entire user again and manually extract the address or, if the back-end kids were nice enough, ask them to introduce this new endpoint:

GET /user/1000/address

Response:

HTTP/2 200 OK
Content-Type: application/json
X-Other-Headers: idunno
Content-Length: 40

{
    "address": {
        "street": "Whatever street",
        "zipCode": 1234,
        "city": "Los Angeles"
        "additionalInfo": {
            ...
        }
    }
}

When tinkering with graphs and queries (?) however, we instead always send a POST request, which looks a little odd at first, but sort of makes sense considering GraphQL expects a little more information in order to provide what we need (see the request's body below).

In GraphQL fashion, this would be the equivalent of the GET /user/1000 we previously saw:

POST /graphql HTTP/2
Host: www.someapp.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:103.0) Gecko/20100101 Firefox/103.0
Accept: */*
Accept-Language: undefined
Accept-Encoding: gzip, deflate
Content-Type: application/json
Origin: https://www.someapp.com

{
    "query": "query getUser($id: ID) { user(id: $id) { username, lastUpdate } }",
    "variables": {
        "id": 1000
    }
}

Same for address GET /user/1000/address:

POST /graphql HTTP/2
Host: www.someapp.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:103.0) Gecko/20100101 Firefox/103.0
Accept: */*
Accept-Language: undefined
Accept-Encoding: gzip, deflate
Content-Type: application/json
Origin: https://www.someapp.com

{
    "query": "query getUserAddress($id: ID) { user(id: $id) { address { street, zipCode, city } } }",
    "variables": {
        "id": 1000
    }
}

Have you noticed the endpoint hasn’t really changed? This means that once the schema is defined, we can easily request different subsets of data and the GraphQL server will gladly provide us just that, while the endpoint remains the same. Hopefully that makes sense. If it doesn’t, I guess maybe… read the docs?

GraphQL has massive adoption amongst the cool kids these days - considering you are reading this in 2022, I don't really know what the future holds - so you won't have any trouble finding a shit load of redundant content about it out there.

Now that this is out of the way, we move on…

GraphQL Recon - The classic medieval style

Every once in a while, when I'm dumbly typing things into Google and recklessly trying them out in the wild (a.k.a. performing a penetration test on a given application), I come across a technology I'm not necessarily knowledgeable about - as in, most cases. My approach then is the following: I start by poking at it directly as soon as I can. No joke, that's it.

But if you allow me to elaborate: this is my attempt to get a sense of what it does without checking any other resources. This case wasn't any different, although I knew beforehand that, GraphQL being an open-source project, it would have been so much easier to just dive into the documentation - which I eventually did. But I like to believe there's some value in approaching things while clueless, as I'm still fresh and not bound by any usage expectations. Ultimately, this gives me full permission to be deliberately stupid and I might try things the designers hadn't foreseen when building the solution. Believe me when I say sometimes good things can come out of it (that's what I like telling myself, at least). In this case however, nothing particularly weird popped out. So ugh, a bit of a waste, but not entirely: by this point I was already familiar with the syntax and had noticed some GraphQL errors were quite informative, which definitely gave me some leverage.

As my rEsEaRCh continued, I decided to take advantage of our collective brains and do some snooping. If there's a lesson to be taken from this, it's that in a world of 7.9 billion people, THERE'S ALWAYS SOMEONE putting off the gym or bailing on their friends because they've convinced themselves there's this incredible idea that needs to be shared with the world, right here right now, right? Right. It's called functional procrastination, apparently. And as a matter of fact, this was no different.

So when I googled "how do I Hakc GraphQL!11", a bunch of articles came up where some slick folks who had already gone through the trouble explained their methodology. One thing kept popping up however: they wouldn't shut up about this thing called introspection, so I thought there was maybe something in there.

Toying with introspection for fun and zero profit

Introspection gives you the ability to extract metadata out of the schema provided or, to put it simply, get information about the schema itself. Think of it as information_schema's younger sibling, who never got the chance to hang out with his older bro because his family, the rich type, don't really want them to have anything to do with one another. But you don't have to stretch too far to know they do. Got it?

Ok, equipped with this stupid analogy, I decided to give it a try:

POST /graphql HTTP/2
Host: www.secreteventapp.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:103.0) Gecko/20100101 Firefox/103.0
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Content-Type: application/json
Origin: https://www.secreteventapp.com

{
    "query": "query { __schema { types { name, fields { name } } } }",
    "variables": {
    }
}

But it didn't work. I know this because the response said Error: Unable to use introspection, so I guess we can all agree it didn't. And I'll be telling you why in just a second but first, let me tell you what we could've done if things had gone differently:

POST /graphql HTTP/2
Host: www.secreteventapp.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:103.0) Gecko/20100101 Firefox/103.0
Accept: */*
Accept-Language: undefined
Accept-Encoding: gzip, deflate
Content-Type: application/json
Origin: https://www.secreteventapp.com

{
    "query": "query { __schema { types { name, fields { name, args { name, description, type { name, kind, ofType { name, kind } } } } } } }",
    "variables": {
    }
}

Hopefully I haven't left you feeling alone and bamboozled by so many curly braces, but if that's the case, here's a quick walkthrough: when sending this query, the server immediately gives back all the graph's node names along with their respective fields and, last but not least, every field's data type. Isn't that cute? They just purposefully give us information, free of charge. I mean, it doesn't give you the queries per se, but knowing the schema and the format expected, you can craft your own. Easy peasy.

Now, going back to the point where it didn't work: apparently, people realized it doesn't make much sense having introspection enabled for the most part, so the standard recommendation is actually disabling it. A bummer, but a tiny win for our broken little industry.

To wrap it up: introspection might not work - as it didn't for me, because it's a technique discovered in the early 60's, so maybe a little outdated? But when testing, we have a moral obligation to give it a try. Seriously. It will take you seconds to figure out whether it works or not and if it does, it's just smiles from there. Moreover, every time you blindly assume something won't work and put it aside when it comes to infosec, a tree randomly bursts into flames. So if you truly care about saving the earth, make sure to keep tabs on that. Regardless, chances are, it won't work.
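
By the way, you don't even need Burp for that first sanity check. Here's a minimal sketch straight from the terminal (the host and path are the placeholders we've been using, swap in whatever endpoint you actually found):

curl -s -X POST https://www.secreteventapp.com/graphql \
  -H "Content-Type: application/json" \
  -d '{"query": "query { __schema { types { name } } }"}'
# if introspection is enabled you get back a wall of type names;
# if not, you get an error along the lines of the one I got above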

Pr0 H4x0r t1p: If you’ve got your recon done right, might be worth checking for dev / testing / staging environments as those have a better chance of having introspection enabled.

What do you do then?

GraphQL Recon - The contemporary / artsy approach

So, introspection didn't work and I was back to stage 1. I thought about fuzzing GraphQL endpoints, but I didn't want to do it just yet. I wanted something easier. Everyone loves when things are easy, like when you randomly trip on something just to find out there's five bucks sitting right next to it and you can get yourself that bag of greasy doritos without feeling guilty. So I set out to answer the following questions:

  • How does the client get ahold of those queries?
  • Are they being "pre-mounted" in the back-end and immediately sent out to the client when the app starts?
  • Are the queries only available when interacting with the feature in question?

There’s an old saying: “If you don’t see a piece of information coming from a previous request when analysing Burp’s history, it probably means it’s sitting in the client-side all along”. It’s very popular amongst prehistoric civilizations as you might have noticed based on the sentence’s structure and outdated words.

I had BurpSuite fired up, remember? So I went back to it and this time, armed with ancient knowledge, checked whether they were somehow retrieving queries as you navigate through pages. When I couldn't find any, it became obvious: javascript files.

So, back on my target's page, I opened up Chrome Developer Tools and clicked on the Console tab. Then, I fiercely typed:

$$('script').map(script => script.src).filter(src => src !== '')

This pretty little snippet brought me back a bunch of javascript URLs. After that, I threw this list into a file, did some sed-fu to clean up all the unwanted characters and finally, dumped all the files while beautifying them because they were all ugly… as in, minified.
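
In case you're curious about the sed-fu, it was roughly something along these lines (a sketch only; raw-console-output.txt is just wherever you pasted the console output, and the exact characters you need to strip depend on how you copied it):

# split the copied console output on commas, strip quotes / brackets /
# spaces, and keep only the lines that look like URLs
cat raw-console-output.txt | tr ',' '\n' | sed "s/[]['\" ]//g" | grep '^http' > script-urls.txt

With the URLs cleaned up and sitting in script-urls.txt, the dumping / beautifying part looked like this: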

cat script-urls.txt | xargs -I@ sh -c "curl -s @ | js-beautify | tee -a target.js"

Isn't that nice? This appends every single javascript file's content into one big old file, because I don't necessarily care about investigating functionality separately. I knew the GraphQL queries were somewhere amongst that spaghetti code and I just needed to get them out of there. Next, my idea was to come up with a regex pattern that could spot queries, so I went back to the ones I previously saw to make sure I didn't screw up the format and this is what I came up with:

cat target.js | grep -Eo "query\s\w+\(.*?\)"

Let me quickly walk you through this:

  • Print the contents of target.js
  • Using the following regex (-E), extract only (-o) whatever matches instead of getting the entire line
  • Regex breakdown: search for the word query, followed by a space (\s), followed by one or more word characters (\w+), followed by any character(s) that exist between ( and ).

And as a result, I got something like this:

query eventById($eventId: ID)
query userEventsById($userId: ID)
... and a bunch of other things :)

Now we’re talking. Maybe I should also take advantage of this hook to explain the difference between queries and mutations.

Queries and Mutations

You see, GraphQL sees the world through a different set of lenses when compared with REST APIs. While REST APIs rely on HTTP methods to define and specify behaviour (retrieve = GET, insert = POST, update = PUT, etc.), GraphQL always performs HTTP POST requests, communicating its intention through the reserved query and mutation keywords. The former stands for retrieving data while the latter, for inserting / modifying data.
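
To make that a little more concrete, here's what a hypothetical mutation could look like on the wire. The createEvent name and its fields are made up for illustration (we'll extract the real signatures in a second), but the shape is the important part:

curl -s -X POST https://www.someapp.com/graphql \
  -H "Content-Type: application/json" \
  -d '{
    "query": "mutation createEvent($input: Event!) { createEvent(input: $input) { id } }",
    "variables": { "input": { "name": "Some event", "city": "Los Angeles" } }
  }'
# same endpoint, same POST verb - only the reserved keyword (mutation)
# and the payload change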

Finally, the sole purpose of having APIs is to provide the ability to access / modify data while keeping it consistent across all clients. This application, being just another client which can also modify data, meant mutations could also be found throughout the javascript files:

cat target.js | grep -Eo "mutation\s\w+\(.*?\)"

Here’s what I’ve got:

mutation createEvent($input: Event!)
mutation updateEvent($input: Event!)
... truncated

Ultimately, you can compose the patterns together by writing:

cat target.js | grep -Eo '(query|mutation)\s\w+\(.*?\)\s{.*?\"'

which this time, yielded something like this:

query eventById($eventId: ID) { event(id: $eventId) { ...EventById } } "

Detail-oriented fellows might have spotted a subtle difference between the pattern above and the ones before it. The reason is that I expanded it to capture entire queries / mutations instead of just their signatures. I did this because I wanted to easily navigate through the app's functionalities while still being able to analyse queries separately when needed. This helps to summarise the features and gives me a sense of where / what to look for.

And speaking of entire queries: have you noticed some of the expected arguments look like $variable: Something, or that a query sometimes contains something like { ...Something }? In the first case, that Something is the expected type (e.g. Int, String, complex types, etc.) and the exclamation point (!) means it's mandatory. For strings, in order to make the query work you can basically throw something between quotes and call it a day. Even if the value you passed is wrong, the query will still work as long as the type is correct. Same for numbers. But when it comes to complex types, it's harder to infer what is expected (introspection would especially come in handy here, but life ain't easy, am I right?). Which brings us to the last topic worth covering here.
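
Before we get there, here's roughly what that guessing game looks like in practice, using the eventById query we extracted earlier. A sketch only: the fields inside the selection set (name, startDate) are assumptions, and so is the value I'm throwing at $eventId:

curl -s -X POST https://www.secreteventapp.com/graphql \
  -H "Content-Type: application/json" \
  -d '{
    "query": "query eventById($eventId: ID) { event(id: $eventId) { name startDate } }",
    "variables": { "eventId": "1000" }
  }'
# ID happily accepts any string-looking value, so even a wrong guess
# should get you a well-formed response (or a very chatty error telling
# you which fields don't exist)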

Fragments

Again, what if you didn't actually have to guess anything? A fragment is a reusable selection of fields defined on an existing type. Confusing? Maybe. But let's check this snippet together, which is gonna make things as clear as day:

fragment EventAddress on Event {
  location
  number
  city
}

Fragments encourage reusability (is that a word?), so when you see something like { ...EventAddress } as part of a query, it's actually being expanded into { location number city }. Why's that important? Well, you can also search for fragments and extract more information about each node and its respective fields. As always, here's a slick oneliner to get you up to speed:

cat *.js | grep -Eo '(fragment)\s\w+\s\w+\s\w+\s\{.*?\"'

which gets us:

fragment EventAdditionalInfo on Event {\n    id\n  baseUrl\n  }\n"
... truncated
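
And once you know what a fragment expands to, you can either inline the fields yourself or just ship the fragment definition along with the query you're replaying. A rough sketch of the second option, reusing the bits we just extracted:

curl -s -X POST https://www.secreteventapp.com/graphql \
  -H "Content-Type: application/json" \
  -d '{
    "query": "query eventById($eventId: ID) { event(id: $eventId) { ...EventAdditionalInfo } } fragment EventAdditionalInfo on Event { id baseUrl }",
    "variables": { "eventId": "1000" }
  }'
# the fragment definition travels inside the same "query" string, right
# next to the operation that spreads it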

Pheeeew. That's a whole lot of things. Some of you might be wondering if it was worth putting in this much effort instead of creating the stupid account. You are right, it probably wasn't, which maybe now got you thinking about your own life and how it isn't that bad after all, which lastly, somehow ties everything together since the whole secret point of this article is to be read as a meme.

Wait, what were we talking about again?

So, all this information, countless hours of restless research (it was actually barely 5 hours with my ADHD hyperfocus at its best)… for what? To get an address for an event I was already considering bailing on because of all the hassle I had to go through. But let's wrap it up so we can all go home with something new to talk about when striking up conversations with strangers in random bars.

Here's the thing: if you made it this far (I thank you in advance for that), I'm sure you've already put 2 + 2 together and have some ideas of what you can do with this. But maybe a couple of ideas, if I may?

Couple of ideas

Expanding queries

In my case, as you might recall, I only cared about that address bit. I found the query I wanted and noticed that when the user was logged in, it would include an additional field called eventAddress, which didn't show up when you just hit the event's page without having an account. Simple as that. As their API would still respond regardless, I just added the extra field to the query, replayed the request and voilà. The lack of authentication didn't prevent the additional field from being returned; it was simply omitted when the application was performing the request for us, so I guess security through obscurity, as they say.
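
For the sake of illustration, the replayed request looked more or less like this. The exact field names are reconstructed from the signatures and fragments we extracted, so take the precise shape with a grain of salt:

curl -s -X POST https://www.secreteventapp.com/graphql \
  -H "Content-Type: application/json" \
  -d '{
    "query": "query eventById($eventId: ID) { event(id: $eventId) { name eventAddress { location number city } } }",
    "variables": { "eventId": "1000" }
  }'
# eventAddress is the field the client only asks for when you're logged
# in - the server returns it regardless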

And that was it, I was in… (not really, because there was an additional field called confirmedAttendance which was set to false), but you get the point.

Proxy all queries to map out the application’s features

Another thing I thought was, we could move all the queries into a wordlist kind of structure and run a script against it, requesting everything while proxying it (maybe you can also do it with Burp Intruder, up to you).
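
A quick-and-dirty way of doing that, assuming you've got one query per line in a queries.txt file and Burp listening on its default port (all names here are placeholders):

# fire every extracted query at the endpoint, routing through Burp
# (127.0.0.1:8080) so everything lands in the proxy history
while IFS= read -r q; do
  curl -s -k -x http://127.0.0.1:8080 \
    -X POST https://www.secreteventapp.com/graphql \
    -H "Content-Type: application/json" \
    -d "{\"query\": \"$q\", \"variables\": {}}"
done < queries.txt
# note: queries containing double quotes will need extra escaping
# before being stuffed into the JSON body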

One caveat is that if you immediately forward the requests, you will see that some of the queries (the ones with arguments) will fail, and that's expected, since we got those queries through static analysis and providing the right arguments is the application's job, happening at runtime. You can still play around with the queries as you go. Moreover, you now have access to everything the application needs to communicate with the GraphQL server.

Fuzzing

I said earlier I didn't want to try fuzzing right at the beginning, but there's definitely value in doing it once you know how the queries are structured along with their naming conventions. This can save you a lot of time. Maybe there's some query or field that didn't show up in the javascript files, but your chances of finding it have now increased as you have a bit more understanding of their internals.
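
If you do go down that road, something like ffuf can do the heavy lifting. A sketch of fuzzing field names inside a known query - the wordlist and the error string you filter on are assumptions, adjust them to whatever your target actually returns:

ffuf -w fields.txt -u https://www.secreteventapp.com/graphql \
  -X POST -H "Content-Type: application/json" \
  -d '{"query": "query { event(id: \"1000\") { FUZZ } }"}' \
  -fr "Cannot query field"
# anything that doesn't come back with the "unknown field" error is a
# field that probably exists on the Event type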

Wrapping up

You may be wondering how that event turned out in the end. Was it worth it? So, funny thing: a friend of mine ended up setting up an account and added me as a +1, but I decided to take a nap before going, and it turns out 6 hours don't fall into the nap category, so I missed it. I guess my conclusion is: the journey matters more than the outcome? I don't know. Bye.

Shoot, this has to be the best ever “first post” from a forum member.
Welcome to 0x00sec and thank you for the post!

That’s actually pretty cool. Def helps my limited knowledge about graphql.

Also, welcome!

Appreciate the feedback, folks. Glad it was somewhat useful :)
