# Script for extracting tcp/ssl streams with tshark

## miroR

title (since 2015-11-29 21:20+01:00): Script for extracting tcp/ssl streams with tshark

The current links to the script are further down in this topic. The posts before that one got me there, a step at a time...

I also removed [SOLVED], because this should be moved into Tips and Tricks, IMO.

---

title (previously): How to extract content from tshark-saved streams?

---

[[ The instructions are, of course, only for people who, even if advanced, haven't delved into network traffic analysis ]]

First, familiarize yourself with how to follow streams:

SSL Decode & My Hard-Earned Advice for SPDY/HTTP2 in Firefox

https://forums.gentoo.org/viewtopic-t-1029408.html

(beginners should study there or better the links from there; it is about SSL decrypt, but tcp and ssl streams are saved in similar fashion)

You will need to open in Wireshark a file... (don't know if Wireshark-2 still has issues:

net-analyzer/wireshark-2.0.0_rc3 saves different tcp streams (non-decryptable/non-gunzip'able)

https://bugs.gentoo.org/show_bug.cgi?id=565152

Wireshark-1.x is fine for this.)

The file you need to open in Wireshark is:

dump_150927_1848_g0n.pcap

(find it at, and download it from:

http://www.CroatiaFidelis.hr/foss/cap/cap-150927-TLS-why-js/

)

I won't repeat the whole procedure for extracting the content from the stream below; I explained it in the "SSL Decode & My Hard-Earned Advice for SPDY/HTTP2 in Firefox" topic, linked above.

I'll (only!) partly repeat the procedure, for clarity.

Enter the filter "tcp.stream eq 9" (without quotes), follow that TCP stream, select "Raw" and save it as:

```

dump_150927_1848_g0n_s00009-W.bin

```

(where the infix "-W" is for Wireshark; the same stream saved with tshark, below, will be named without the infix)

That is as much of the procedure as I repeat here. Find how to extract the JavaScript file, which is the content of that stream, in the Gentoo Forums topic linked above.

Download (into an empty dir where you have all the permissions...) the script that I wrote, based on a fellow Unixer's script, from:

http://www.CroatiaFidelis.hr/foss/cap/cap-150927-TLS-why-js/Add-151119/

tshark-streams.sh

Look up what the issue is, which I figured out in part but still can't solve the last few steps of, as I'm missing some understanding, at:

[Wireshark-users] follow [tcp|ssl].stream with tshark

https://www.wireshark.org/lists/wireshark-users/201511/msg00033.html

https://www.wireshark.org/lists/wireshark-users/201511/msg00047.html

and finally today, when it dawned on me that the method was right:

https://www.wireshark.org/lists/wireshark-users/201511/msg00048.html

(Running the script without the second parameter will automatically list, and then follow and save, all the streams from a PCAP for you!)

For now, run the script with the first parameter, the PCAP file, and the second, the same tcp stream filter expression we just worked with in Wireshark (above):

```

$ tshark-streams.sh dump_150927_1848_g0n.pcap "tcp.stream eq 9"

```
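For readers who want to see the moving parts, here is a minimal sketch of what such a script can do with tshark. It is not tshark-streams.sh itself: the function names are made up for illustration, and it assumes a tshark that supports `-T fields -e tcp.stream` and the `-z follow,tcp,raw,N` option.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the author's tshark-streams.sh.

# unique_streams: reduce per-packet tcp.stream indexes (one per line on stdin)
# to a sorted list without duplicates; empty lines (non-TCP packets) are dropped.
unique_streams() {
    grep -E '^[0-9]+$' | sort -n -u
}

# list_streams PCAP: print every tcp.stream index found in the capture.
list_streams() {
    tshark -r "$1" -T fields -e tcp.stream | unique_streams
}

# follow_stream PCAP INDEX: print tshark's "follow" output for one stream.
# Assumption: the installed tshark accepts a stream index as the follow filter.
follow_stream() {
    tshark -r "$1" -q -z "follow,tcp,raw,$2"
}
```

Usage would be something like `list_streams dump_150927_1848_g0n.pcap`, and then `follow_stream dump_150927_1848_g0n.pcap 9 > some-output-file`.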

Examine the files that the script got you.

My question is pretty simple:

How do you extract the same JavaScript file content from that tshark-followed and -saved stream:

```

dump_150927_1848_g0n_s00009.bin

```

which the procedure above got you?

( and it's without the infix "-W" as we earlier promised )

I really, at this time, don't have any inklings as to how to do it...

So if anybody helps out, thanks in advance!

Last edited by miroR on Sun Nov 29, 2015 8:28 pm; edited 2 times in total

----------

## miroR

I have posted what anybody could have gotten with the PCAP in the first post, with Wireshark, and with tshark, as explained, in:

http://www.CroatiaFidelis.hr/foss/cap/cap-150927-TLS-why-js/Add-151121/

Download dLo.sh, run it in a (proverbial) empty dir, it'll download the other files.

This issue can be looked at as an encoding issue of sorts.

These are the two files gotten in the ways explained:

```

$ ls -ltr dump_150927_1848_g0n_s00009-W.bin dump_150927_1848_g0n_s00009.bin

-rw-r--r-- 1 miro miro 16743 2015-11-21 15:21 dump_150927_1848_g0n_s00009-W.bin

-rw-r--r-- 1 miro miro 33738 2015-11-21 15:32 dump_150927_1848_g0n_s00009.bin

$

```

I'm pretty sure the same content can be extracted from both.

Maybe I should still repeat the howto from:

https://forums.gentoo.org/viewtopic-t-1029408.html#7822484

Just instead of what you can find there:

```

$ hexedit dump_150927_1848_g0n_s09.dump

```

since the better naming of those is with the ".bin" extension and more zeroes, you have now (if you ran the script correctly):

```

$ hexedit dump_150927_1848_g0n_s00009-W.bin

```

In hex, search for "1F8B08" (the gzip magic bytes 1F 8B, followed by 08 for the deflate method). Ctrl-SPACE sets the mark, then Esc-> selects to the end, then Esc-w copies, then Esc-y opens the dialog asking you to provide a filename.

Type in there: dump_150927_1848_g0n_s00009-W.gz

Hit Enter, and you can exit with Ctrl-C.
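The same carving can be sketched without hexedit. The function below is only an illustration (the name `carve_gzip` is made up), and it assumes GNU grep built with `-P` support. It looks for the first occurrence of the bytes 1F 8B 08 and writes everything from there to the end of the file into a `.gz` file, which is what the mark-and-copy-to-end steps above do by hand.

```shell
#!/bin/sh
# carve_gzip FILE.bin: copy FILE.bin from the first gzip magic (1f 8b 08)
# to its end into FILE.gz. Hypothetical helper, assumes GNU grep with -P.
carve_gzip() {
    in=$1
    # -a: treat binary as text, -o: print only the match, -b: print its byte offset.
    # LC_ALL=C makes grep match raw bytes instead of multibyte characters.
    offset=$(LC_ALL=C grep -obUaP '\x1f\x8b\x08' "$in" | head -n 1 | cut -d: -f1)
    if [ -z "$offset" ]; then
        echo "no gzip magic found in $in" >&2
        return 1
    fi
    # grep offsets start at 0, tail -c +N starts counting at 1.
    tail -c +"$((offset + 1))" "$in" > "${in%.bin}.gz"
}
```

With the file from this post that would be `carve_gzip dump_150927_1848_g0n_s00009-W.bin`, after which `file` and `gunzip` can be tried on the resulting `.gz` as shown next. If anything follows the compressed data in the stream, gunzip may warn about trailing garbage.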

And lo, in the (proverbial empty) dir where you tried this, you have a file:

```

$ ls -l dump_150927_1848_g0n_s00009-W.gz

-rw-r--r-- 1 miro miro 15887 2015-11-21 19:59 dump_150927_1848_g0n_s00009-W.gz

$

```

And:

```

$ file dump_150927_1848_g0n_s00009-W.gz

dump_150927_1848_g0n_s00009-W.gz: gzip compressed data, from Unix

$ gunzip --verbose dump_150927_1848_g0n_s00009-W.gz

dump_150927_1848_g0n_s00009-W.gz:    64.2% -- replaced with dump_150927_1848_g0n_s00009-W

$

```

It's a:

```

$ file dump_150927_1848_g0n_s00009-W

dump_150927_1848_g0n_s00009-W: ASCII text, with very long lines

$
```

it is not just a few chars...:

```

$ ls -l dump_150927_1848_g0n_s00009-W

-rw-r--r-- 1 miro miro 44381 2015-11-21 19:59 dump_150927_1848_g0n_s00009-W

$
```

and I'll show you the first line of it with head (except that I'll break it into lines at ";", else the page would be way too wide in, say, my Dillo browser):

```

$ head -1 dump_150927_1848_g0n_s00009-W

function _truste_eu(){truste=self.truste||{};

truste.eu=truste.eu||{};

truste.eu.version="v3.12-21";

truste.eu.COOKIE_DAX_NAME="notice_dax_signature";

truste.eu.COOKIE_PREF_NAME="notice_preferences";

truste.eu.COOKIE_CATEGORY_NAME="optout_domains";

truste.util=truste.util||{};

truste.util.getUniqueID=function(){return"truste_"+Math.random()};

truste.util.getIntValue=function(h){h=parseInt(h);

return isNaN(h)?null:h};

truste.util.getScriptElement=function(h,k){"string"==typeof h&&(h=RegExp(h));

if(!(h instanceof

$
```

Since it's a javascript file, it ought to be renamed:

```

$ mv -iv dump_150927_1848_g0n_s00009-W dump_150927_1848_g0n_s00009-W.js

```

(you can inspect that entire JavaScript file via the link at the top)

That ends the extraction of what anybody can get from the said PCAP file with Wireshark.

I showed it to make the problem easier to present.

Because none of this can simply be done, in this or any similar way, with the dump_150927_1848_g0n_s00009.bin file!

I want to say here that I would like it best if somebody came along and told us how to do this. Some advanced users surely know how to solve this!

But maybe they are busy or not around (or there are other reasons, who knows?).

To keep matters simple, I would like to make another post and show precisely why I think that these files do have the same content but are just encoded differently.

I want to try and reach the solution through my own efforts, rather than only wait...

Last edited by miroR on Sun Nov 22, 2015 11:14 am; edited 1 time in total

----------

## miroR

If the two files:

```

$ ls -l dump_150927_1848_g0n_s00009{,-W}.bin 

-rw-r--r-- 1 miro miro 33738 2015-11-21 15:32 dump_150927_1848_g0n_s00009.bin

-rw-r--r-- 1 miro miro 16743 2015-11-21 15:21 dump_150927_1848_g0n_s00009-W.bin

```

are looked at in two different ways, and with some swapping of bytes, they hold the same numbers!

Have a look. I'll demonstrate it on only the first few dozen chars. Pick chars from other corresponding sections, and you should find the same correspondence.

```

$ cat dump_150927_1848_g0n_s00009.bin | head -6 | tail -c 721 | head -c 48 ; echo

474554202f6765743f6e616d653d6e6f746963652e6a7326

```

And:

```

$ hexdump dump_150927_1848_g0n_s00009-W.bin | head -c76 ; echo

0000000 4547 2054 672f 7465 6e3f 6d61 3d65 6f6e

0000010 6974 6563 6a2e 2673 

```

For the cat line, I'll extend the command so that it adds a space after every four hex digits. In the hexdump output I'll manually remove the offsets and join it into one line. And then I'll set the two one beneath the other:

```

$ cat dump_150927_1848_g0n_s00009.bin | head -6 | tail -c 721 | head -c 48 | sed 's/\([0-9a-f][0-9a-f][0-9a-f][0-9a-f]\)/\1 /g' ; echo

4745 5420 2f67 6574 3f6e 616d 653d 6e6f 7469 6365 2e6a 7326

4547 2054 672f 7465 6e3f 6d61 3d65 6f6e 6974 6563 6a2e 2673 

$
```

The top one is from the dump_150927_1848_g0n_s00009.bin, the bottom one is from the dump_150927_1848_g0n_s00009-W.bin.

It's just that the bytes are swapped. The first byte of each two-byte group in the tshark-saved file is the second byte of the corresponding group in the Wireshark-saved file, and vice versa.
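A hedged side note on the swapped pairs: hexdump without options prints a file as 16-bit words in the host's byte order, so on a little-endian machine each pair of bytes looks swapped on screen even though the file holds them in the usual order (`hexdump -C` shows them byte by byte). The sketch below shows the same effect with od on a made-up five-byte sample, "GET /", rather than on the files from this post.

```shell
#!/bin/sh
# The same five bytes, "GET /", displayed three ways.

# One byte at a time, in file order.
printf 'GET /' | od -An -tx1

# As 16-bit words in host byte order: on a little-endian machine
# each pair of bytes is displayed swapped.
printf 'GET /' | od -An -tx2

# Plain hex in file order, the same layout as a file of hex digits.
printf 'GET /' | xxd -p
```

Read this way, the top line of the comparison above is plain hex text in file order, and the bottom line is the word-wise display of the binary file.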

The problem, however, is that these two files are fundamentally different kinds of files, in a way:

```

$ file dump_150927_1848_g0n_s00009.bin

dump_150927_1848_g0n_s00009.bin: ASCII text, with very long lines

$ file dump_150927_1848_g0n_s00009-W.bin

dump_150927_1848_g0n_s00009-W.bin: data

$

```

How do you deal with the tshark one now? If you open it with hexedit, hexedit thinks it is a text file, and the whole screen comes out wrong, with nothing sensible in the ASCII column! Do try!

The words little-endian and big-endian come to mind... Also, the difference in size is like that between UTF-16 and UTF-8... Wikipedia is, I'm afraid, the order of the day.

More studying... But these two files must have the same content. They come from the same source, so it must be possible to extract the same content from the tshark-saved one.

The thing is, I prefer tshark. I could eventually automate things a bit with tshark. But how can I routinely check what is going on when my machine connects somewhere if I have to save streams manually with Wireshark? When you go online with Firefox, you get tcp/ssl streams in the order of a hundred per minute!

----------

## miroR

I'm getting convinced that this is a bug with Wireshark. Tried posting a bug on their bugzilla, couldn't do it. So I posted with Gentoo Bugzilla:

tshark (net-analyzer/wireshark-1.12.8-r1) saves tcp/ssl raw streams in ascii file, content unrecoverable

https://bugs.gentoo.org/show_bug.cgi?id=566472

Cheers!

----------

## miroR

Managed to file it with Wireshark:

tshark saves raw stream in ascii file, content unrecoverable

https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=11750

----------

## miroR

This is pretty much on the bleeding edge...

Because, judging by the reactions to my as yet very imperfect bug reports, it seems that I haven't really invented these issues with Wireshark. They are real issues.

Only, they had already been fixed, as a senior (somewhat angrily) wrote. And so no replies, as yet, on the Wireshark bug that I reported (and I'm trembling a little, fearing flak ;) ...).

Have a look:

net-analyzer/wireshark-1.12.8-r1 - tshark saves tcp/ssl raw streams in ascii file, content unrecoverable

https://bugs.gentoo.org/show_bug.cgi?id=566472#c2

 *Jeroen Roovers wrote:*   

> I don't see how Gentoo is responsible for wireshark's behaviour. If there is such a link, we should see upstream refer it back to us.

 

I really hope I'll learn and be able to report bugs better than I have done so far, and report only what is needed to be reported.

But this is being close to the bleeding edge:

https://packages.gentoo.org/packages/net-analyzer/wireshark

and:

https://gitweb.gentoo.org/repo/gentoo.git/log/net-analyzer/wireshark?showmsg=1

which tells us that Wireshark-2.0.0 was committed by Jeroen Roovers 41 hours ago.

It's an honor.

However, I am not able to rsync the new wireshark, as the change does not seem to have propagated to German mirrors that I use

(I use the late-to-update but so great, because it is PGP signed by the Engineering Team, the emerge-webrsync to update my machines...)

And so I still can't update to 2.0.0 to see if I can wade my way through and extract content with Wireshark-2.0.0 and with its tshark.

---

Pls. if somebody cares, have a look at that link (already given above also):

https://bugs.gentoo.org/show_bug.cgi?id=566472#c2

and from it click the link under the words:

net-analyzer/wireshark: Version bump (bug #566180 by Pavel Půlpán)

which is:

https://bugs.gentoo.org/show_bug.cgi?id=566180

Search for the words:

A Follow Stream dialog crash has been fixed. Bug Bug 11711.

and go to that bug (which has the link under the words: "Bug 11711").

It's not the content that the writer of that comment meant, is it? It opens the bug:

tuxracer complains about missing /usr/share/games/tuxracer

https://bugs.gentoo.org/show_bug.cgi?id=11711

which has nothing to do with "A Follow Stream dialog crash has been fixed".

So that ought to be fixed there, if I'm not wrong.

And I'm posting this here first, so that somebody else can hopefully confirm what I see.

I have a hard time reporting bugs, as I don't want to make mistakes, and I still do...

Regards.

----------

## miroR

That bug #11711 is relative/local to bugs.wireshark.org, not to bugs.gentoo.org:

https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=11711

But I'm tired and sleepy (Europe here, good USians; I know there are lots of you worthy USians, who are in the greatest part responsible for my addiction, addiction to Gentoo ;) )

----------

## miroR

 *miroR wrote:*   

> ...
> 
> However, I am not able to rsync the new wireshark, as the change does not seem to have propagated to German mirrors that I use
> 
> (I use the late-to-update but so great, because it is PGP signed by the Engineering Team, the emerge-webrsync to update my machines...)
> ...

 

I use emerge-webrsync on my local mirror, and I usually mirror it from German mirrors, but not even on what is, I guess, the earliest mirror to get the changes propagated do I see portage-20151123.tar.xz.

I only see:

http://gentoo.gossamerhost.com/snapshots/

```

...

portage-20151122.tar.xz          22-Nov-2015 16:45   61M  

...

```

(and I hope I understand correctly the time is in UTC)

[I only see] portage-20151122.tar.xz being currently the latest. And wireshark-2.0.0 is not in it (although I got the package from my usual mirrors just three hours ago, with something like, first:

```

# rsync -nav rsync://mirror.netcologne.de/gentoo/distfiles/ distfiles/ 2>&1 | tee rsync_mirror.netcologne.de-nav_`date +%y%m%d_%H%M`.log

```

to see what I would get, and then get it with:

```

# rsync -av rsync://mirror.netcologne.de/gentoo/distfiles/ distfiles/ 2>&1 | tee rsync_mirror.netcologne.de-av_`date +%y%m%d_%H%M`.log

```

(that also logs it for you, and with the right, but local, timestamp)

).

So I guess I can only get it in the afternoon (Europe), or IOW in at least some 10 hours' time :( .

----------

## miroR

I decided 10 hours was too long to wait ;) :

An Example of Local Overlay Install: Wireshark

https://forums.gentoo.org/viewtopic-t-1033942.html

----------

## miroR

I have finally solved this.

But it wasn't a bug with tshark.

It was when I improved the script

http://www.CroatiaFidelis.hr/foss/cap/cap-150927-TLS-why-js/Add-151119/tshark-streams.sh

(

but if you are reading later, that incomplete script will be renamed:

http://www.CroatiaFidelis.hr/foss/cap/cap-150927-TLS-why-js/Add-151119/tshark-streams-INCOMPLETE.sh

and the correct script, the one that produces binary files with extractable content, will be at its address up until 2015-11-23 or so...

)

[It was when I improved the script] (see it in the first page, or at Matt's heapspray.net address that I got it from): when fitting in the

```

egrep '[[:print:]]'

```

as a replacement, I left out piping the result to xxd. Pure ignorance.

Since I'm tired of this quest (in which I learned quite a lot), I'll just show you how to get the extractable-content binary file out of:

wget http://www.CroatiaFidelis.hr/foss/cap/cap-150927-TLS-why-js/Add-151121/dump_150927_1848_g0n_s00009.bin

```

$ cat dump_150927_1848_g0n_s00009.bin | xxd -r -p > dump_150927_1848_g0n_s00009-REAL.bin

```
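Why this works can be sketched in a few lines: the tshark-saved file is the stream written out as hex digits, two characters per byte (which also fits with it being roughly twice the size of the Wireshark-saved one), and `xxd -r -p` turns plain hex back into bytes. The function below is only an illustration with a made-up name, not part of tshark-streams.sh. It assumes input that looks like tshark's follow output in raw mode: a few banner lines, then lines of hex digits, possibly indented.

```shell
#!/bin/sh
# follow_hex_to_bin: read hex-encoded stream text on stdin, write the
# decoded bytes on stdout. Hypothetical helper for illustration.
follow_hex_to_bin() {
    # Keep only lines made purely of hex digits (optionally indented),
    # which drops banner lines such as "Follow: ..." or "=====".
    grep -E '^[[:space:]]*[0-9a-fA-F]+$' |
        # Remove the indentation so only hex digits and newlines remain.
        tr -d ' \t' |
        # -r reverses a dump, -p selects the plain (offset-less) hex format.
        xxd -r -p
}
```

A quick check on made-up input: `printf '474554202f\n' | follow_hex_to_bin` prints `GET /`.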

And from that dump_150927_1848_g0n_s00009-REAL.bin, the same JavaScript file as:

dump_150927_1848_g0n_s00009-W.js

in the dir:

http://www.CroatiaFidelis.hr/foss/cap/cap-150927-TLS-why-js/Add-151121/

can be extracted in exactly the same fashion as I explained in the second post of this topic.

Thanks for reading this.

Marking this as [SOLVED].

----------

## miroR

I've completed writing the script:

tshark-streams.sh

Find it in the original location:

http://www.CroatiaFidelis.hr/foss/cap/cap-150927-TLS-why-js/Add-151119/

And still there for some time in the future (God willing), I'll keep the old:

tshark-streams-INCOMPLETE.sh

I really think this script correctly extracts all the streams, tcp and ssl, from any PCAP.

So far I have tested it only with dumpcap-made files, but pcap-dump-made (or whatever the name is) and surely tcpdump-made captures should make no difference; it's the same libpcap library they all use.

I really think so, but then again, I misreported bugs, and thought different things at different stages of this quest ;) ...

Regards!

----------

## miroR

The script is now (I recently tried hard to improve it, it's much nicer than before) at:

https://github.com/miroR/tshark-streams.git

(

also without '.git':

https://github.com/miroR/tshark-streams

)

as previously (kind of) promised.

----------

