
Scraping Data Twitter Menggunakan Twint Tool

In this tutorial we will look at one of the OSINT tools, Twint, which you can use to gather information from popular sites and social media. Twitter is the social network most often used for OSINT experiments: as we know, it has a huge number of users, which makes it a rich source of information.

Twint is an advanced Twitter scraping tool written in Python that allows for scraping Tweets from Twitter profiles without using Twitter’s API. Twint utilizes Twitter’s search operators to let you scrape Tweets from specific users, scrape Tweets relating to certain topics, hashtags & trends, or sort out sensitive information from Tweets like e-mail and phone numbers. I find this very useful, and you can get really creative with it too. Twint also makes special queries to Twitter allowing you to also scrape a Twitter user’s followers, Tweets a user has liked, and who they follow without any authentication, API, Selenium, or browser emulation.
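As described above, Twint builds its requests on top of Twitter's public search operators (from:, since:, until:, near:). As a rough illustration of how such a query string is composed, here is a small helper; the function itself is hypothetical and not part of Twint, only the operator names come from Twitter's search syntax:

```python
def build_search_query(phrase=None, from_user=None, since=None, until=None, near=None):
    """Compose a query string from Twitter's public search operators."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')          # exact phrase, quoted
    if from_user:
        parts.append(f"from:{from_user}")    # Tweets by a specific account
    if since:
        parts.append(f"since:{since}")       # on or after this date
    if until:
        parts.append(f"until:{until}")       # before this date
    if near:
        parts.append(f"near:{near}")         # geographic filter
    return " ".join(parts)

print(build_search_query(phrase="pineapple", from_user="username", since="2015-12-20"))
# "pineapple" from:username since:2015-12-20
```

Twint combines operators like these with its scraping logic, which is why no API key is needed.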

To install it, follow these steps:

$ git clone https://github.com/twintproject/twint.git
$ cd twint
$ python3 -m pip install . -r requirements.txt

Once the installation finishes, run the following command to view the help output:

twint -h

Output:

[hidayat@code ~]$ twint -h
usage: python3 twint [options]

TWINT - An Advanced Twitter Scraping Tool.

optional arguments:
  -h, --help            show this help message and exit
  -u USERNAME, --username USERNAME
                        User's Tweets you want to scrape.
  -s SEARCH, --search SEARCH
                        Search for Tweets containing this word or phrase.
  -g GEO, --geo GEO     Search for geocoded Tweets.
  --near NEAR           Near a specified city.
  --location            Show user's location (Experimental).
  -l LANG, --lang LANG  Search for Tweets in a specific language.
  -o OUTPUT, --output OUTPUT
                        Save output to a file.
  -es ELASTICSEARCH, --elasticsearch ELASTICSEARCH
                        Index to Elasticsearch.
  --year YEAR           Filter Tweets before specified year.
  --since DATE          Filter Tweets sent since date (Example: "2017-12-27 20:30:15" or 2017-12-27).
  --until DATE          Filter Tweets sent until date (Example: "2017-12-27 20:30:15" or 2017-12-27).
  --email               Filter Tweets that might have email addresses
  --phone               Filter Tweets that might have phone numbers
  --verified            Display Tweets only from verified users (Use with -s).
  --csv                 Write as .csv file.
  --json                Write as .json file
  --hashtags            Output hashtags in separate column.
  --cashtags            Output cashtags in separate column.
  --userid USERID       Twitter user id.
  --limit LIMIT         Number of Tweets to pull (Increments of 20).
  --count               Display number of Tweets scraped at the end of session.
  --stats               Show number of replies, retweets, and likes.
  -db DATABASE, --database DATABASE
                        Store Tweets in a sqlite3 database.
  --to USERNAME         Search Tweets to a user.
  --all USERNAME        Search all Tweets associated with a user.
  --followers           Scrape a person's followers.
  --following           Scrape a person's follows
  --favorites           Scrape Tweets a user has liked.
  --proxy-type PROXY_TYPE
                        Socks5, HTTP, etc.
  --proxy-host PROXY_HOST
                        Proxy hostname or IP.
  --proxy-port PROXY_PORT
                        The port of the proxy server.
  --tor-control-port TOR_CONTROL_PORT
                        If proxy-host is set to tor, this is the control port
  --tor-control-password TOR_CONTROL_PASSWORD
                        If proxy-host is set to tor, this is the password for the control port
  --essid [ESSID]       Elasticsearch Session ID, use this to differentiate scraping sessions.
  --userlist USERLIST   Userlist from list or file.
  --retweets            Include user's Retweets (Warning: limited).
  --format FORMAT       Custom output format (See wiki for details).
  --user-full           Collect all user information (Use with followers or following only).
  --profile-full        Slow, but effective method of collecting a user's Tweets and RT.
  --translate           Get tweets translated by Google Translate.
  --translate-dest TRANSLATE_DEST
                        Translate tweet to language (ISO2).
  --store-pandas STORE_PANDAS
                        Save Tweets in a DataFrame (Pandas) file.
  --pandas-type [PANDAS_TYPE]
                        Specify HDF5 or Pickle (HDF5 as default)
  -it [INDEX_TWEETS], --index-tweets [INDEX_TWEETS]
                        Custom Elasticsearch Index name for Tweets.
  -if [INDEX_FOLLOW], --index-follow [INDEX_FOLLOW]
                        Custom Elasticsearch Index name for Follows.
  -iu [INDEX_USERS], --index-users [INDEX_USERS]
                        Custom Elasticsearch Index name for Users.
  --debug               Store information in debug logs
  --resume TWEET_ID     Resume from Tweet ID.
  --videos              Display only Tweets with videos.
  --images              Display only Tweets with images.
  --media               Display Tweets with only images or videos.
  --replies             Display replies to a subject.
  -pc PANDAS_CLEAN, --pandas-clean PANDAS_CLEAN
                        Automatically clean Pandas dataframe at every scrape.
  -cq CUSTOM_QUERY, --custom-query CUSTOM_QUERY
                        Custom search query.
  -pt, --popular-tweets
                        Scrape popular tweets instead of recent ones.
  -sc, --skip-certs     Skip certs verification, useful for SSC.
  -ho, --hide-output    Hide output, no tweets will be displayed.
  -nr, --native-retweets
                        Filter the results for retweets only.
  --min-likes MIN_LIKES
                        Filter the tweets by minimum number of likes.
  --min-retweets MIN_RETWEETS
                        Filter the tweets by minimum number of retweets.
  --min-replies MIN_REPLIES
                        Filter the tweets by minimum number of replies.
  --links LINKS         Include or exclude tweets containing one or more links. If not specified you will get both tweets that might contain links or not.
  --source SOURCE       Filter the tweets for specific source client.
  --members-list MEMBERS_LIST
                        Filter the tweets sent by users in a given list.
  -fr, --filter-retweets
                        Exclude retweets from the results.
  --backoff-exponent BACKOFF_EXPONENT
                        Specify an exponent for the polynomial backoff in case of errors.
  --min-wait-time MIN_WAIT_TIME
                        Specify a minimum wait time in case of a scraping-limit error. This value will be adjusted by Twint if the value provided does not satisfy the limit constraints.

To start scraping, use the following command:

$ twint -u username

The command above searches for Tweets from the specified username.

$ twint -u username --year 2xxx

The command above searches a user's Tweets filtered by the specified year. To save all of a user's Tweets to a .txt file, use the following command:

$ twint -u username -o file.txt

Besides the txt extension, you can also export to csv:

$ twint -u username -o file.csv --csv
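The csv export can then be processed with any standard tool. As a sketch, here is how you might read it with Python's csv module; the sample data and the column names (date, username, tweet) are assumptions for illustration, as a real Twint export contains many more columns:

```python
import csv
import io

# Hypothetical sample mimicking the shape of a Twint CSV export.
sample = """date,username,tweet
2020-01-15,username,Scraping data with Twint
2020-01-14,username,Trying out OSINT tools
"""

# For a real export, replace io.StringIO(sample) with open("file.csv").
with io.StringIO(sample) as f:
    for row in csv.DictReader(f):
        print(row["date"], "-", row["tweet"])
```

DictReader uses the header row as keys, so each Tweet becomes a dictionary you can filter or aggregate.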

Other commands you can run include:

1. twint -u username - Scrape all the Tweets of a user (doesn't include retweets but includes replies).
2. twint -u username -s pineapple - Scrape all Tweets from the user's timeline containing pineapple.
3. twint -s pineapple - Collect every Tweet containing pineapple from everyone's Tweets.
4. twint -u username --year 2014 - Collect Tweets that were tweeted before 2014.
5. twint -u username --since "2015-12-20 20:30:15" - Collect Tweets that were tweeted since 2015-12-20 20:30:15.
6. twint -u username --since 2015-12-20 - Collect Tweets that were tweeted since 2015-12-20 00:00:00.
7. twint -u username -o file.txt - Scrape Tweets and save to file.txt.
8. twint -u username -o file.csv --csv - Scrape Tweets and save as a csv file.
9. twint -u username --email --phone - Show Tweets that might have phone numbers or email addresses.
10. twint -s "Donald Trump" --verified - Display Tweets by verified users that Tweeted about Donald Trump.
11. twint -g="48.880048,2.385939,1km" -o file.csv --csv - Scrape Tweets from a radius of 1 km around a place in Paris and export them to a csv file.
12. twint -u username -es localhost:9200 - Output Tweets to Elasticsearch.
13. twint -u username -o file.json --json - Scrape Tweets and save as a json file.
14. twint -u username --database tweets.db - Save Tweets to a SQLite database.
15. twint -u username --followers - Scrape a Twitter user's followers.
16. twint -u username --following - Scrape who a Twitter user follows.
17. twint -u username --favorites - Collect all the Tweets a user has favorited (gathers ~3200 tweets).
18. twint -u username --following --user-full - Collect full user information of the accounts a person follows.
19. twint -u username --timeline - Use an effective method to gather Tweets from a user's profile (gathers ~3200 Tweets, including retweets & replies).
20. twint -u username --retweets - Use a quick method to gather the last 900 Tweets (that includes retweets) from a user's profile.
21. twint -u username --resume resume_file.txt - Resume a search starting from the last saved scroll-id.
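When you need to run many of these commands programmatically (for example, scraping a list of usernames), it can help to assemble the command line from a script. This is an illustrative helper, not part of Twint; it only uses the flags shown in the help output above:

```python
import shlex

def twint_cmd(username=None, search=None, since=None, until=None,
              limit=None, output=None, csv=False):
    """Assemble a twint command line from keyword options (flags as in the help above)."""
    parts = ["twint"]
    if username:
        parts += ["-u", username]
    if search:
        parts += ["-s", search]
    if since:
        parts += ["--since", since]
    if until:
        parts += ["--until", until]
    if limit is not None:
        parts += ["--limit", str(limit)]
    if output:
        parts += ["-o", output]
    if csv:
        parts.append("--csv")
    # shlex.join (Python 3.8+) quotes arguments safely for the shell.
    return shlex.join(parts)

print(twint_cmd(username="username", output="file.csv", csv=True))
# twint -u username -o file.csv --csv
print(twint_cmd(search="pineapple", since="2015-12-20", limit=20))
# twint -s pineapple --since 2015-12-20 --limit 20
```

The resulting string can be passed to subprocess.run(..., shell=True) or written into a batch script.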

